CCSR for ICD-10-CM Diagnosis Codes

The Clinical Classifications Software Refined (CCSR) for ICD-10-CM diagnosis codes provides both a one-to-one mapping (classify_ccsr_dx1()) and a one-to-many mapping (ccsr_dx()) to CCSR categories. The one-to-many mapping is necessary because many ICD-10-CM codes span multiple meaningful categories; forcing each code into a single category would lose some of the conditions that the code describes.

A simple demonstration

For example, consider the code I11.0 (Hypertensive heart disease with heart failure), which encompasses both heart failure (CCSR category CIR019) and hypertension with complications (CIR008). Classifying this code as heart failure alone would miss meaningful information; on the other hand, some analyses require mutually exclusive categories.

The ccsr_dx() function identifies all CCSR categories associated with an ICD-10 diagnosis code. Each ICD-10 code can have one or more CCSR categories. In the example used above, I11.0 has two CCSR categories, so the function returns a character vector of two CCSR categories:

library(hcup)
library(dplyr)
ccsr_dx("I110")
#> [1] "CIR008" "CIR019"
ccsr_dx("I110") %>% explain_ccsr()
#> [1] "Hypertension with complications and secondary hypertension"
#> [2] "Heart failure"

If we need a one-to-one mapping (e.g., for descriptive statistics on the number of hospital discharges), we use classify_ccsr_dx1(). Importantly, this algorithm is only valid for the principal diagnosis (for inpatient records) or the first-listed diagnosis code (for outpatient data):

classify_ccsr_dx1("I110", setting = "inpatient")
#> `setting = inpatient` only applies to the principal/first listed diagnosis
#> This message is displayed once every 8 hours.
#> [1] "CIR019"

One-to-One with classify_ccsr_dx1

The classify_ccsr_dx1() function is similar to the other classify_ functions used in this package, so we will begin here.

Although a comprehensive review of the CCSR methodology is beyond the scope of this article (I’d highly recommend reading the appendices of the CCSR DX user guide), I do want to highlight a few important points here.

Only valid for the principal/first diagnosis

This algorithm was developed for use with the principal diagnosis (for inpatient records) or the first-listed diagnosis code (for outpatient data). Consider the sample of two patients below, with their principal diagnosis listed in DX_1 and subsequent diagnoses listed in DX_2 and DX_3.

library(dplyr)
df <- tibble::tribble(
  ~Pt_id,    ~DX_1,    ~DX_2,  ~DX_3,
     "A",    "I10", "F17210",     NA,
     "B",    "D62",   "Z370", "E876"
  )

## GOOD
df %>%
  mutate(prin_CCSR = classify_ccsr_dx1(DX_1, setting="ip"))
#> # A tibble: 2 × 5
#>   Pt_id DX_1  DX_2   DX_3  prin_CCSR
#>   <chr> <chr> <chr>  <chr> <chr>    
#> 1 A     I10   F17210 NA    CIR007   
#> 2 B     D62   Z370   E876  BLD004

## BAD
df %>%
  mutate(prin_CCSR = classify_ccsr_dx1(DX_2, setting="ip"))
#> # A tibble: 2 × 5
#>   Pt_id DX_1  DX_2   DX_3  prin_CCSR
#>   <chr> <chr> <chr>  <chr> <chr>    
#> 1 A     I10   F17210 NA    XXX000   
#> 2 B     D62   Z370   E876  XXX000

Take care when working with tidy (long-format) data to avoid classifying the non-principal diagnoses:

df_tidy <- df %>%
  # pivot_longer() lives in tidyr, which is not attached above
  tidyr::pivot_longer(cols      = -Pt_id,
                      names_to  = "DX_NUM",
                      values_to = "ICD")

## GOOD
df_tidy %>%
  filter(DX_NUM=="DX_1") %>%
  mutate(prin_CCSR = classify_ccsr_dx1(ICD, setting="ip"),
         expln     = explain_ccsr(prin_CCSR))
#> # A tibble: 2 × 5
#>   Pt_id DX_NUM ICD   prin_CCSR expln                       
#>   <chr> <chr>  <chr> <chr>     <chr>                       
#> 1 A     DX_1   I10   CIR007    Essential hypertension      
#> 2 B     DX_1   D62   BLD004    Acute posthemorrhagic anemia

## BAD
df_tidy %>%
  mutate(prin_CCSR = classify_ccsr_dx1(ICD, setting="ip"),
         expln     = explain_ccsr(prin_CCSR))
#> # A tibble: 6 × 5
#>   Pt_id DX_NUM ICD    prin_CCSR expln                                           
#>   <chr> <chr>  <chr>  <chr>     <chr>                                           
#> 1 A     DX_1   I10    CIR007    Essential hypertension                          
#> 2 A     DX_2   F17210 XXX000    Code is unacceptable as a principal diagnosis P…
#> 3 A     DX_3   NA     NA        NA                                              
#> 4 B     DX_1   D62    BLD004    Acute posthemorrhagic anemia                    
#> 5 B     DX_2   Z370   XXX000    Code is unacceptable as a principal diagnosis P…
#> 6 B     DX_3   E876   END011    Fluid and electrolyte disorders

rm(df, df_tidy)
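
One way to make the long-format case harder to get wrong is to compute the category only on rows flagged as the principal diagnosis and leave every other row as NA. The sketch below is base R and assumes a hypothetical stand-in, toy_classify(), in place of classify_ccsr_dx1() so that it runs without the package; its two lookup entries are copied from the output above. In practice you would call the real function on the filtered rows instead.

```r
# Hypothetical stand-in for classify_ccsr_dx1(); the two entries are
# copied from the output shown above and nothing else is implied.
toy_classify <- function(icd) {
  lookup <- c(I10 = "CIR007", D62 = "BLD004")
  unname(lookup[icd])
}

# Same long-format data as df_tidy above, rebuilt by hand
df_long <- data.frame(
  Pt_id  = c("A", "A", "A", "B", "B", "B"),
  DX_NUM = rep(c("DX_1", "DX_2", "DX_3"), times = 2),
  ICD    = c("I10", "F17210", NA, "D62", "Z370", "E876"),
  stringsAsFactors = FALSE
)

# Classify only the principal diagnosis; every other row stays NA
is_principal <- df_long$DX_NUM == "DX_1"
df_long$prin_CCSR <- NA_character_
df_long$prin_CCSR[is_principal] <- toy_classify(df_long$ICD[is_principal])

df_long
```

Keeping all rows (rather than filtering) can be convenient when the secondary diagnoses are still needed later, for example as input to ccsr_dx().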

Unacceptable principal/first diagnosis

Not all ICD-10 codes are acceptable as a principal/first-listed diagnosis, so not all of them have a default category. For example, if B95.2 were listed as the principal/first diagnosis, it would map to XXX000 (for inpatient data) or XXX111 (for outpatient data).

# B95.2: Enterococcus as the cause of diseases classified elsewhere
classify_ccsr_dx1("B952", setting = "inpatient")
#> [1] "XXX000"
classify_ccsr_dx1("B952", setting = "outpatient")
#> [1] "XXX111"

This could be because B95.2 was inappropriately listed as the principal diagnosis, and if you run into this problem frequently, it may be helpful to consult the CCSR DX user guide.
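
To judge how often this happens in a dataset, it may help to count the records whose default category is one of the two "unacceptable" codes. The following is a minimal base R sketch; the prin_ccsr vector is illustrative only, assembled from category values that appear elsewhere in this article, and stands in for the result of calling classify_ccsr_dx1() on real data.

```r
# Illustrative default categories, standing in for the output of
# classify_ccsr_dx1() on a real column of principal diagnoses.
prin_ccsr <- c("CIR007", "XXX000", "BLD004", "XXX000", "CIR019")

# XXX000 (inpatient) and XXX111 (outpatient) flag an unacceptable code
unacceptable <- prin_ccsr %in% c("XXX000", "XXX111")

sum(unacceptable)    # number of flagged records
mean(unacceptable)   # share of flagged records
```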

Setting isn’t that important

As of this version of the CCSR (see ?hcup.data::CCSR_DX_mapping to check the version), the setting (i.e., inpatient or outpatient) doesn’t drastically change the results. In the dataset this function is based on, the only cases where setting = "inpatient" and setting = "outpatient" disagree are codes that are acceptable as the first-listed diagnosis in the outpatient setting but not as the principal diagnosis in the inpatient setting.

if(interactive()){
  hcup.data::CCSR_DX_mapping %>%
    
    # Recode the Unacceptable category to be the same for ip/op
    mutate(default_CCSR_IP = recode(default_CCSR_IP, "XXX000"="Unccpt"),
           default_CCSR_OP = recode(default_CCSR_OP, "XXX111"="Unccpt")) %>%
    
    # Remove default categories that match across settings
    filter(default_CCSR_IP!=default_CCSR_OP) %>%
    
    count(default_CCSR_IP, default_CCSR_OP, sort = TRUE)
}

One-to-Many with ccsr_dx()

ccsr_dx() differs from most of the other functions in this package because a single ICD code can return multiple categories. This means that we’ll be dealing with lists. To demonstrate, consider the two ICD-10 codes below:

pt_id	DX_NUM	`I10_DX`
John Doe	1	`A401`
John Doe	2	`K432`

The code A401 maps to two CCSR categories (INF002 and INF003), while K432 has only one associated CCSR category (DIG010):

ccsr_dx("A401")
#> [1] "INF002" "INF003"
ccsr_dx("K432")
#> [1] "DIG010"

Appending this to the previous table gives us:

pt_id	DX_NUM	`I10_DX`	CCSR
John Doe	1	`A401`	INF002, INF003
John Doe	2	`K432`	DIG010

df.listcol <- dplyr::tibble(pt_id = "John Doe",
                            DX_NUM = 1:2,
                            I10_DX = c("A401", "K432")) %>%
  mutate(CCSR = ccsr_dx(I10_DX))

df.listcol
#> # A tibble: 2 × 4
#>   pt_id    DX_NUM I10_DX CCSR     
#>   <chr>     <int> <chr>  <list>   
#> 1 John Doe      1 A401   <chr [2]>
#> 2 John Doe      2 K432   <chr [1]>

str(df.listcol$CCSR)
#> List of 2
#>  $ : chr [1:2] "INF002" "INF003"
#>  $ : chr "DIG010"

The column CCSR in the table above is now a list. For those unfamiliar with list columns, the book Functional Programming has a helpful overview chapter, as does the Rectangle vignette in tidyr. You can also use View(df.listcol) in RStudio to see the contents of the list column in a more familiar structure.
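
A few base R calls are also enough to peek inside a list column. The sketch below rebuilds the contents of df.listcol$CCSR by hand (the name ccsr_list is only for illustration) so that it runs on its own.

```r
# Same contents as df.listcol$CCSR above, rebuilt by hand
ccsr_list <- list(c("INF002", "INF003"), "DIG010")

lengths(ccsr_list)    # number of categories per ICD-10 code
ccsr_list[[1]]        # all categories for the first code
ccsr_list[[1]][1]     # first category of the first code
```

Note the double brackets: ccsr_list[1] would return a list of length one, while ccsr_list[[1]] returns the character vector stored inside it.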

If you want to turn this list-column into an atomic vector, tidyr::unnest_longer() will make a new row for every element within the list:

df.listcol %>%
  tidyr::unnest_longer(CCSR)
#> # A tibble: 3 × 4
#>   pt_id    DX_NUM I10_DX CCSR  
#>   <chr>     <int> <chr>  <chr> 
#> 1 John Doe      1 A401   INF002
#> 2 John Doe      1 A401   INF003
#> 3 John Doe      2 K432   DIG010
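
If you would rather keep one row per diagnosis, as in the hand-written table earlier in this section, the list can instead be collapsed into a single comma-separated string per row. This is a base R sketch rather than anything provided by the package; ccsr_list again stands in for df.listcol$CCSR.

```r
# Same contents as df.listcol$CCSR above, rebuilt by hand
ccsr_list <- list(c("INF002", "INF003"), "DIG010")

# One string per row, categories separated by ", "
ccsr_chr <- vapply(ccsr_list, paste, character(1), collapse = ", ")
ccsr_chr
```

The collapsed form is handy for display, but the unnested form is usually easier to filter and count.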

A longer example

df <- tibble::tribble(
  ~pt_id, ~DX_NUM, ~ICD10,
  "A",     1,      "K432",
  "A",     2,      "A401",
  "B",     1,      "I495",
  "B",     2,   "BadCode",
  "C",     1,     "E8771",
  "C",     2,     "A5442",
  "C",     3,      "A564"
)

df %>%
 mutate(CCSR = ccsr_dx(ICD10)) %>%
 tidyr::unnest_longer(CCSR) %>%
 mutate(CCSR_expl = explain_ccsr(CCSR))
#> # A tibble: 10 × 5
#>    pt_id DX_NUM ICD10   CCSR   CCSR_expl                                        
#>    <chr>  <dbl> <chr>   <chr>  <chr>                                            
#>  1 A          1 K432    DIG010 Abdominal hernia                                 
#>  2 A          2 A401    INF002 Septicemia                                       
#>  3 A          2 A401    INF003 Bacterial infections                             
#>  4 B          1 I495    CIR017 Cardiac dysrhythmias                             
#>  5 B          2 BadCode NA     NA                                               
#>  6 C          1 E8771   END011 Fluid and electrolyte disorders                  
#>  7 C          2 A5442   INF010 Sexually transmitted infections (excluding HIV a…
#>  8 C          2 A5442   MUS001 Infective arthritis                              
#>  9 C          3 A564    INF010 Sexually transmitted infections (excluding HIV a…
#> 10 C          3 A564    RSP006 Other specified upper respiratory infections
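
A common next step is to ask which patients have any diagnosis in a given category, or how many distinct categories each patient has. The base R sketch below assumes the list column has already been computed; here it is typed in by hand from the output above as a stand-in for ccsr_dx(ICD10), and the names dx, has_sti and n_cat are only illustrative.

```r
dx <- data.frame(
  pt_id = c("A", "A", "B", "B", "C", "C", "C"),
  ICD10 = c("K432", "A401", "I495", "BadCode", "E8771", "A5442", "A564"),
  stringsAsFactors = FALSE
)

# Categories copied from the output above (stands in for ccsr_dx(ICD10))
dx$CCSR <- list(
  "DIG010", c("INF002", "INF003"), "CIR017", NA_character_,
  "END011", c("INF010", "MUS001"), c("INF010", "RSP006")
)

# TRUE when a diagnosis maps to the category of interest (INF010)
has_sti <- vapply(dx$CCSR, function(x) "INF010" %in% x, logical(1))

# Patients with at least one such diagnosis
unique(dx$pt_id[has_sti])

# Number of distinct, non-missing categories per patient
n_cat <- vapply(split(dx$CCSR, dx$pt_id), function(x) {
  cats <- unlist(x)
  length(unique(cats[!is.na(cats)]))
}, integer(1))
n_cat
```

Because the check runs against every category of every diagnosis, a code such as A5442 counts toward both INF010 and MUS001, which is the information a one-to-one mapping would discard.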