CCSR for ICD-10-CM Diagnosis Codes
ccsr-dx.Rmd
The Clinical Classifications Software Refined (CCSR) for ICD-10-CM diagnosis codes provides both one-to-one mapping (classify_ccsr_dx1()) and one-to-many mapping (ccsr_dx()) to CCSR categories. The one-to-many mapping is necessary because many ICD-10-CM codes span multiple meaningful categories, and the identification of all conditions related to a diagnosis code would be lost by categorizing some ICD-10-CM codes into a single category.
A simple demonstration
For example, consider the code I11.0 (Hypertensive heart disease with heart failure), which encompasses both heart failure (CCSR category CIR019) and hypertension with complications (CIR008). Classifying this code as heart failure alone would miss meaningful information. On the other hand, some analyses require mutually exclusive categorization.
The ccsr_dx() function identifies all CCSR categories associated with an ICD-10 diagnosis code. Each ICD-10 code can have one or more CCSR categories. In the example used above, I11.0 has two CCSR categories, so the function returns a character vector of two CCSR categories:
library(hcup)
library(dplyr)
ccsr_dx("I110")
#> [1] "CIR008" "CIR019"
ccsr_dx("I110") %>% explain_ccsr()
#> [1] "Hypertension with complications and secondary hypertension"
#> [2] "Heart failure"
If we need to do one-to-one mapping (e.g. descriptive statistics on the number of hospital discharges), we use classify_ccsr_dx1(). Importantly, this algorithm is only valid for the principal diagnosis (for inpatient records) or the first listed diagnosis code (for outpatient data):
classify_ccsr_dx1("I110", setting = "inpatient")
#> `setting = inpatient` only applies to the principal/first listed diagnosis
#> This message is displayed once every 8 hours.
#> [1] "CIR019"
One-to-One with classify_ccsr_dx1()
The classify_ccsr_dx1() function is similar to the other classify_ functions used in this package, so we will begin here.
Although a comprehensive review of the CCSR methodology is beyond the scope of this article (I’d highly recommend reading the appendices of the CCSR DX user guide), I do want to highlight a few important points here.
Only valid for the principal/first diagnosis
This algorithm was developed for use with the principal diagnosis (for inpatient records) or the first listed diagnosis code (for outpatient data). Consider the sample of two patients below, with their principal diagnosis listed in DX_1 and subsequent diagnoses listed in DX_n:
library(dplyr)
df <- tibble::tribble(
  ~Pt_id, ~DX_1, ~DX_2,    ~DX_3,
  "A",    "I10", "F17210", NA,
  "B",    "D62", "Z370",   "E876"
)
## GOOD
df %>%
  mutate(prin_CCSR = classify_ccsr_dx1(DX_1, setting = "ip"))
#> # A tibble: 2 × 5
#> Pt_id DX_1 DX_2 DX_3 prin_CCSR
#> <chr> <chr> <chr> <chr> <chr>
#> 1 A I10 F17210 NA CIR007
#> 2 B D62 Z370 E876 BLD004
## BAD
df %>%
  mutate(prin_CCSR = classify_ccsr_dx1(DX_2, setting = "ip"))
#> # A tibble: 2 × 5
#> Pt_id DX_1 DX_2 DX_3 prin_CCSR
#> <chr> <chr> <chr> <chr> <chr>
#> 1 A I10 F17210 NA XXX000
#> 2 B D62 Z370 E876 XXX000
Caution should be used when working with tidy (long-format) data to avoid classifying the non-principal diagnoses:
df_tidy <- df %>%
  tidyr::pivot_longer(cols = -Pt_id,
                      names_to = "DX_NUM",
                      values_to = "ICD")
## GOOD
df_tidy %>%
  filter(DX_NUM == "DX_1") %>%
  mutate(prin_CCSR = classify_ccsr_dx1(ICD, setting = "ip"),
         expln = explain_ccsr(prin_CCSR))
#> # A tibble: 2 × 5
#> Pt_id DX_NUM ICD prin_CCSR expln
#> <chr> <chr> <chr> <chr> <chr>
#> 1 A DX_1 I10 CIR007 Essential hypertension
#> 2 B DX_1 D62 BLD004 Acute posthemorrhagic anemia
## BAD
df_tidy %>%
  mutate(prin_CCSR = classify_ccsr_dx1(ICD, setting = "ip"),
         expln = explain_ccsr(prin_CCSR))
#> # A tibble: 6 × 5
#> Pt_id DX_NUM ICD prin_CCSR expln
#> <chr> <chr> <chr> <chr> <chr>
#> 1 A DX_1 I10 CIR007 Essential hypertension
#> 2 A DX_2 F17210 XXX000 Code is unacceptable as a principal diagnosis P…
#> 3 A DX_3 NA NA NA
#> 4 B DX_1 D62 BLD004 Acute posthemorrhagic anemia
#> 5 B DX_2 Z370 XXX000 Code is unacceptable as a principal diagnosis P…
#> 6 B DX_3 E876 END011 Fluid and electrolyte disorders
rm(df, df_tidy)
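One way to guard against this in long-format data is to classify only the rows flagged as the principal diagnosis and leave the rest as NA. The sketch below is illustrative rather than part of the package: classify_principal_only() and stub_classifier() are hypothetical names, and the stub simply looks up the two principal-diagnosis results shown above so that the example runs without hcup. With hcup loaded, the classifier argument could instead be function(x) classify_ccsr_dx1(x, setting = "ip").

```r
# Hypothetical helper (not an hcup function): classify only principal-diagnosis
# rows of a long-format table and leave every other row as NA.
classify_principal_only <- function(data, classify_fn,
                                    dx_num_col = "DX_NUM",
                                    icd_col = "ICD",
                                    principal = "DX_1") {
  is_principal <- !is.na(data[[dx_num_col]]) & data[[dx_num_col]] == principal
  out <- rep(NA_character_, nrow(data))
  out[is_principal] <- classify_fn(data[[icd_col]][is_principal])
  data$prin_CCSR <- out
  data
}

# Same two patients as above, already in long format
df_long <- data.frame(
  Pt_id  = rep(c("A", "B"), each = 3),
  DX_NUM = rep(c("DX_1", "DX_2", "DX_3"), times = 2),
  ICD    = c("I10", "F17210", NA, "D62", "Z370", "E876"),
  stringsAsFactors = FALSE
)

# Stand-in classifier built from the results shown earlier
# (I10 -> CIR007, D62 -> BLD004); swap in the hcup call for real use.
lookup <- c(I10 = "CIR007", D62 = "BLD004")
stub_classifier <- function(x) unname(lookup[x])

res <- classify_principal_only(df_long, stub_classifier)
res
```

Keeping all rows (rather than filtering to DX_1) means the secondary diagnoses stay available for one-to-many mapping later, while the one-to-one column is never filled in for them by accident.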
Unacceptable principal/first diagnosis
Second, not all ICD-10 codes are acceptable for default classification. For example, if B95.2 were listed as the principal/first diagnosis, it would map to XXX000 (for inpatient data) and XXX111 (for outpatient data).
# B95.2: Enterococcus as the cause of diseases classified elsewhere
classify_ccsr_dx1("B952", setting = "inpatient")
#> [1] "XXX000"
classify_ccsr_dx1("B952", setting = "outpatient")
#> [1] "XXX111"
This could be because B95.2 was inappropriately listed as the principal diagnosis. If you run into this problem frequently, it may be helpful to consult the CCSR DX user guide.
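To see how often this happens in your own data, it may help to flag the placeholder categories before summarising. This is a minimal base R sketch; is_unacceptable_ccsr() is a hypothetical helper (not an hcup function), and it assumes XXX000 and XXX111 are the only placeholder values, as in the outputs above.

```r
# Placeholder categories for unacceptable principal/first diagnoses,
# taken from the outputs shown above (assumed to be the only two).
unacceptable_ccsr <- c("XXX000", "XXX111")

# Hypothetical helper: TRUE where a default CCSR category is a placeholder.
# Missing values are treated as "not flagged" rather than as unacceptable.
is_unacceptable_ccsr <- function(ccsr) {
  ccsr %in% unacceptable_ccsr
}

# Default categories copied from examples earlier in this article
prin_ccsr <- c("CIR007", "BLD004", "XXX000", "XXX111", NA)

flagged <- is_unacceptable_ccsr(prin_ccsr)
flagged

# Share of records whose principal/first diagnosis could not be classified
mean(flagged)
```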
Setting isn’t that important
As of this version of the CCSR (see ?hcup.data::CCSR_DX_mapping to check the version), the setting (i.e. inpatient or outpatient) doesn’t drastically change the results. Using the dataset this function is based on, we can see that the only situations where setting = "inpatient" and setting = "outpatient" disagree are codes that are acceptable as the first listed diagnosis in the outpatient setting but not as the principal diagnosis in the inpatient setting.
if (interactive()) {
  hcup.data::CCSR_DX_mapping %>%
    # Recode the Unacceptable category to be the same for ip/op
    mutate(default_CCSR_IP = recode(default_CCSR_IP, "XXX000" = "Unccpt"),
           default_CCSR_OP = recode(default_CCSR_OP, "XXX111" = "Unccpt")) %>%
    # Remove default categories that match across settings
    filter(default_CCSR_IP != default_CCSR_OP) %>%
    count(default_CCSR_IP, default_CCSR_OP, sort = TRUE)
}
One-to-Many with ccsr_dx()
ccsr_dx() differs from most of the other functions in this package because each ICD code can map to multiple categories. This means that we’ll be dealing with lists. To demonstrate, consider the two ICD-10 codes below:
pt_id | DX_NUM | I10_DX |
---|---|---|
John Doe | 1 | A401 |
John Doe | 2 | K432 |
The code A401 maps to two CCSR categories (INF002 & INF003), while K432 has only one associated CCSR category (DIG010).
Appending this to the previous table gives us:
pt_id | DX_NUM | I10_DX | CCSR |
---|---|---|---|
John Doe | 1 | A401 | INF002, INF003 |
John Doe | 2 | K432 | DIG010 |
df.listcol <- dplyr::tibble(pt_id = "John Doe",
                            DX_NUM = 1:2,
                            I10_DX = c("A401", "K432")) %>%
  mutate(CCSR = ccsr_dx(I10_DX))
df.listcol
#> # A tibble: 2 × 4
#> pt_id DX_NUM I10_DX CCSR
#> <chr> <int> <chr> <list>
#> 1 John Doe 1 A401 <chr [2]>
#> 2 John Doe 2 K432 <chr [1]>
str(df.listcol$CCSR)
#> List of 2
#> $ : chr [1:2] "INF002" "INF003"
#> $ : chr "DIG010"
The column CCSR in the table above is now a list. For those unfamiliar with list columns, the book Functional Programming has a helpful overview chapter, as does the Rectangle vignette in tidyr. You can also use View(df.listcol) in RStudio to see the contents of the list column in a more familiar structure.
If you want to get this list-column into an atomic vector, tidyr::unnest_longer() will make a new row for every element within the list:
df.listcol %>%
  tidyr::unnest_longer(CCSR)
#> # A tibble: 3 × 4
#> pt_id DX_NUM I10_DX CCSR
#> <chr> <int> <chr> <chr>
#> 1 John Doe 1 A401 INF002
#> 2 John Doe 1 A401 INF003
#> 3 John Doe 2 K432 DIG010
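If you would rather keep one row per diagnosis code, similar to the hand-written table earlier in this section, the list-column can instead be collapsed into a single string per row. A minimal base R sketch, using a list copied from the str() output above rather than calling ccsr_dx() directly:

```r
# List copied from str(df.listcol$CCSR) above; in practice this would be
# the CCSR list-column itself.
ccsr_list <- list(c("INF002", "INF003"), "DIG010")

# Collapse each element to one comma-separated string.
# vapply() guarantees a character vector with one entry per row.
# Note: an NA element would become the string "NA", so handle missing
# codes separately if that matters for your analysis.
ccsr_chr <- vapply(ccsr_list, paste, character(1), collapse = ", ")
ccsr_chr
```

The collapsed string is convenient for display, but the unnested long format is usually easier to filter and count.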
A longer example
df <- tibble::tribble(
  ~pt_id, ~DX_NUM, ~ICD10,
  "A",    1,       "K432",
  "A",    2,       "A401",
  "B",    1,       "I495",
  "B",    2,       "BadCode",
  "C",    1,       "E8771",
  "C",    2,       "A5442",
  "C",    3,       "A564"
)
df %>%
  mutate(CCSR = ccsr_dx(ICD10)) %>%
  tidyr::unnest_longer(CCSR) %>%
  mutate(CCSR_expl = explain_ccsr(CCSR))
#> # A tibble: 10 × 5
#> pt_id DX_NUM ICD10 CCSR CCSR_expl
#> <chr> <dbl> <chr> <chr> <chr>
#> 1 A 1 K432 DIG010 Abdominal hernia
#> 2 A 2 A401 INF002 Septicemia
#> 3 A 2 A401 INF003 Bacterial infections
#> 4 B 1 I495 CIR017 Cardiac dysrhythmias
#> 5 B 2 BadCode NA NA
#> 6 C 1 E8771 END011 Fluid and electrolyte disorders
#> 7 C 2 A5442 INF010 Sexually transmitted infections (excluding HIV a…
#> 8 C 2 A5442 MUS001 Infective arthritis
#> 9 C 3 A564 INF010 Sexually transmitted infections (excluding HIV a…
#> 10 C 3 A564 RSP006 Other specified upper respiratory infections
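Because one-to-many mapping can reach the same category through more than one ICD-10 code (patient C reaches INF010 via both A5442 and A564), counting rows in the result above would overstate how many patients fall in a category. The sketch below retypes the pt_id and CCSR columns from the printed output and uses base R only. It de-duplicates patient/category pairs before counting. With dplyr, distinct() followed by count() would be one equivalent approach.

```r
# pt_id and CCSR columns retyped from the printed output above
long <- data.frame(
  pt_id = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C"),
  CCSR  = c("DIG010", "INF002", "INF003", "CIR017", NA,
            "END011", "INF010", "MUS001", "INF010", "RSP006"),
  stringsAsFactors = FALSE
)

# Drop unmapped codes (NA), then keep one row per patient/category pair so a
# category reached through two ICD codes counts once for that patient.
pairs <- unique(long[!is.na(long$CCSR), c("pt_id", "CCSR")])

# Number of distinct patients in each CCSR category
n_patients <- table(pairs$CCSR)
n_patients
```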