Big Data Analytics: Predicting Obesity Epidemic through Socioeconomic Data Analysis

Oluwafemi Akanfe, University of Texas
Myung Ko, University of Texas

Abstract

The prevalence of obesity has increased continuously over the past decades, especially in the United States and some other developed nations. It is growing at an alarming rate, and because many chronic diseases are associated with it, obesity has become a grave threat to public health and a substantial economic burden on nations. To tackle this challenge, many researchers have focused on understanding its antecedents in order to recommend plausible preventive mechanisms. While many studies have examined traditional health data, such as lab reports, medical billing, and periodic checkup records, to study and predict health risks for targeted populations, these traditional sources have often been found inadequate for making accurate obesity predictions. Even in this era of big data, most studies have conducted ex-post analyses of socioeconomic data to reveal inherent obesity challenges and have yet to tap into the potential benefits of socioeconomic data for predicting obesity-prone populations. This study extends previous research on the application of big data analytics by adopting socioeconomic data from multiple sources, such as population and economic censuses, transportation and infrastructure inventories, and labor statistics, among others. It draws on previously unrecognized potential variables involving social influence, type and frequency of shared physical activities, social engagement, availability of recreational facilities, and types of food consumption, among other factors, and proposes to use them to predict an obesity epidemic. In this study, socioeconomic data are considered to be large data sets consisting of the interactions of individuals' personality, lifestyle, and economic factors that may be analyzed computationally to reveal patterns and associations.
Thus, the study seeks to answer the following research question using big data: can data analytics be used to analyze socioeconomic data to reveal valuable information regarding an impending obesity epidemic in a target population, as well as lifestyle patterns that could signal an increase in the disease? Specifically, in this paper, we propose a predictive model intended to provide accurate predictions of impending obesity using decision tree and naïve Bayes classifiers, which are well-known data mining techniques for classification problems. The preliminary results of our pilot study suggest that the performance of these predictive models is promising, and thus we can apply them to our targeted population data, which will be collated from archival data on census-related variables, economic system indices, and health-related surveys from sources such as the American Community Survey (ACS) and the Centers for Disease Control and Prevention (CDC). Consequently, this study contributes to the IS literature on big data analytics in healthcare and its applicability in predicting an obesity epidemic, and it helps health professionals better understand the factors influencing obesity, profile their patient populations, and prioritize interventions by targeting the population at high risk.
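To illustrate the kind of classifier named above, the following is a minimal sketch of a naïve Bayes model written in plain Python. It is not the authors' implementation: the Gaussian likelihood, the three features (median income, recreational facilities, fast-food outlets), the risk labels, and every data value are assumptions invented for illustration and are not drawn from ACS or CDC data.

```python
import math
from collections import defaultdict

# Hypothetical toy records, invented for illustration only (not ACS/CDC data).
# Features: (median income in $1000s, recreational facilities per 10k residents,
#            fast-food outlets per 10k residents) -> assumed risk label.
TRAIN = [
    ((42.0, 1.0, 9.0), "high_risk"),
    ((38.0, 0.5, 11.0), "high_risk"),
    ((45.0, 1.5, 8.0), "high_risk"),
    ((40.0, 0.8, 10.0), "high_risk"),
    ((75.0, 4.0, 3.0), "low_risk"),
    ((82.0, 5.0, 2.5), "low_risk"),
    ((68.0, 3.5, 4.0), "low_risk"),
    ((90.0, 4.5, 2.0), "low_risk"),
]


def fit_gaussian_nb(rows):
    """Estimate the class prior and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for features, label in rows:
        by_class[label].append(features)
    model = {}
    for label, samples in by_class.items():
        n = len(samples)
        stats = []
        for column in zip(*samples):
            mean = sum(column) / n
            # Small floor keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / n + 1e-9
            stats.append((mean, var))
        model[label] = (n / len(rows), stats)
    return model


def predict(model, features):
    """Return the class with the highest log posterior under the naive
    (feature-independence) assumption."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(features, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = fit_gaussian_nb(TRAIN)
prediction_low_income = predict(model, (41.0, 0.9, 9.5))
prediction_high_income = predict(model, (80.0, 4.2, 2.8))
print(prediction_low_income, prediction_high_income)
```

In a real study, the toy records would be replaced by area-level socioeconomic variables joined to observed obesity outcomes, and the model would be evaluated on held-out data alongside a decision tree.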

 
