Journal of the Midwest Association for Information Systems (JMWAIS)


The healthcare industry generates streams of data across many problem domains. Analyzing such data requires stream analytics tools and techniques that can generate useful insights. Stream analytics involves the analysis of time-variant events, in which specific patterns can indicate an imminent outcome, such as the state of a patient's heart. Novel ways to find specific patterns in events generated by multiple sources are therefore required. A key requirement for applying any such method is preparing and organizing the data so that this analysis becomes possible. In this paper, we extend the CRISP-DM process to include data preparation approaches for sequence mining. We present progression analysis, an approach for converting multidimensional, time-variant streams of health records into a form in which useful sequential signals can be detected. To illustrate the process, we use patient health histories stored in an Electronic Medical Record (EMR) system and present a healthcare application that compares the progression of diseases over time between patients diagnosed with Tobacco Use Disorder (TUD) and non-tobacco users. Interestingly, many diseases follow the same path for TUD and non-TUD patients. Finally, we discuss the generalizability of progression analysis.
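To make the idea of converting time-variant health records into sequences more concrete, the following is a minimal sketch under stated assumptions. It is not the authors' progression analysis method. It assumes each EMR record can be reduced to a hypothetical (patient_id, diagnosed_on, code) triple. Each patient's diagnoses are ordered by date, only the first occurrence of each diagnosis is kept, and consecutive A -> B transitions are counted across patients. The names `Diagnosis`, `to_sequences`, and `transition_counts` and the sample records are illustrative only and do not come from the paper's data.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Diagnosis:
    # Hypothetical, simplified EMR record: one diagnosis for one patient on one date.
    patient_id: str
    diagnosed_on: date
    code: str  # diagnosis label (assumption: e.g. a disease name or ICD code)


def to_sequences(records):
    """Group records by patient, sort each patient's records by date, and keep
    only the first occurrence of each diagnosis code (a progression sequence)."""
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec.patient_id].append(rec)

    sequences = {}
    for pid, recs in by_patient.items():
        seen = set()
        seq = []
        for rec in sorted(recs, key=lambda r: r.diagnosed_on):
            if rec.code not in seen:  # skip repeat diagnoses of the same code
                seen.add(rec.code)
                seq.append(rec.code)
        sequences[pid] = seq
    return sequences


def transition_counts(sequences):
    """Count consecutive (earlier, later) diagnosis pairs across all patients."""
    counts = Counter()
    for seq in sequences.values():
        counts.update(zip(seq, seq[1:]))
    return counts


# Illustrative records only; these are not data from the study.
records = [
    Diagnosis("p1", date(2010, 1, 5), "Hypertension"),
    Diagnosis("p1", date(2012, 3, 1), "Diabetes"),
    Diagnosis("p1", date(2011, 6, 1), "Hypertension"),  # repeat, dropped
    Diagnosis("p2", date(2009, 2, 2), "Hypertension"),
    Diagnosis("p2", date(2013, 8, 9), "Diabetes"),
]

sequences = to_sequences(records)
counts = transition_counts(sequences)
print(sequences)
print(counts)
```

To compare cohorts in the spirit of the TUD application, one could run `transition_counts` separately on TUD and non-TUD patients and compare the resulting pair frequencies. Keeping only first occurrences is one simplifying design choice. A real pipeline might instead retain repeat visits or time gaps between diagnoses.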