In this paper, we investigate Chronic Obstructive Pulmonary Disease (COPD) in the United States from 2012 to 2017. We integrate data from multiple sources and use them to analyze COPD at the level of the core-based statistical area. We include cigarette smoking and race/ethnicity categories because of well-known health disparities in the United States. We develop a baseline model with multiple linear regression and then attempt to improve upon it with machine learning methods, including Lasso Regression, Ridge Regression, Generalized Additive Models, Support Vector Machines, Artificial Neural Networks, Random Forests, and Gradient Boosted Trees. The best machine learning model, a Support Vector Machine, explains an additional 6% of the variance relative to the baseline in a strongly predictive model. Overall, cigarette smoking and household income are the strongest predictors. We discuss future directions for research and practice.
