Abstract
Patient readmission is costly for hospitals and insurers, averaging around USD 16,037 per patient (Kum Ghabowen et al., 2024). In the United States (US), hospitals are also penalized if their readmission rates are higher than expected. Despite substantial research in the area, existing predictive models have low to moderate accuracy and are susceptible to violations of data integrity and of their underlying assumptions. The digitization of healthcare services, the evolving landscape of medical terminology, and developments in machine learning technologies open avenues for continuous improvement of existing readmission risk prediction models. One such opportunity lies in the transition from the ninth to the tenth revision of the International Classification of Diseases (ICD). When a patient visits a hospital, their diagnoses and procedures are recorded using ICD codes. Since the US adopted the tenth revision of ICD codes in 2015, the number of possible unique diagnosis codes has increased from approximately 17,000 to 155,000 (Hirsch et al., 2016). ICD-10 codes are alphanumeric, three to seven characters long, and hierarchically structured: specificity increases as the code is read from left to right. One challenge with using ICD-10 codes is that some diagnosis codes are rare and therefore provide very little information for model training. The curse of dimensionality is another challenge: the sheer number of possible codes can lead to overfitting, with noise incorporated into the trained model. One way to tackle these challenges when using diagnosis codes to predict readmission risk is to incorporate the contextual information carried in the code. For instance, codes whose first three characters are ‘I05’ and ‘I06’ are both associated with chronic rheumatic heart disease.
Several verified resources maintain up-to-date libraries of diagnosis codes and their descriptions, including the part of the body affected and the severity of the condition. Instead of treating the diagnosis code as a nominal variable, a textual description of the patient’s diagnosis can be created, which not only reduces dimensionality but also allows contextual features to be learned using recent developments in generative artificial intelligence (AI) models.
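As an illustration of the idea only, and not the authors' implementation, the following minimal Python sketch rolls full ICD-10 codes up to their three-character category and assembles a plain-text description that a language model could consume. The lookup table `CATEGORY_DESCRIPTIONS`, the helper names `category` and `describe`, and the sample visit are hypothetical; a real pipeline would load descriptions from a maintained code library.

```python
# Hypothetical lookup from 3-character ICD-10 categories to short descriptions.
# Assumption: in practice this would come from a maintained code library.
CATEGORY_DESCRIPTIONS = {
    "I05": "rheumatic mitral valve diseases",
    "I06": "rheumatic aortic valve diseases",
    "E11": "type 2 diabetes mellitus",
}


def category(code: str) -> str:
    """Return the 3-character category, the leftmost and least specific level."""
    return code.replace(".", "").upper()[:3]


def describe(codes: list[str]) -> str:
    """Build a plain-text description of a patient's diagnoses."""
    # dict.fromkeys removes duplicate categories while keeping first-seen order.
    categories = dict.fromkeys(category(c) for c in codes)
    parts = [
        CATEGORY_DESCRIPTIONS.get(cat, f"unlisted category {cat}")
        for cat in categories
    ]
    return "Patient diagnosed with " + "; ".join(parts) + "."


# Hypothetical visit with four specific codes that fall into three categories.
visit = ["I05.0", "I05.1", "I06.2", "E11.9"]
print(sorted({category(c) for c in visit}))
print(describe(visit))
```

Grouping by prefix is one simple way to pool rare codes with related ones, and the resulting sentence replaces a sparse nominal variable with text whose wording carries the clinical context.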
Recommended Citation
Shrivastava, Utkarsh; Razi, Muhammad A.; Han, Bernard; and Tarn, Mike, "Understanding readmission risk from context aware models using diagnosis codes" (2025). AMCIS 2025 TREOs. 212.
https://aisel.aisnet.org/treos_amcis2025/212