
A Methodology for Developing Normalized Formative Indices Using Messy Data

The global expansion of information technology is giving MIS researchers greater access to secondary data sources than ever before. Unfortunately, practically all of the variables in this era of Big Data are non-normally distributed, and many have large proportions of missing values. In many cases, extreme distributional problems such as inflated frequencies (e.g., stacks of zeroes) inhibit statistical analysis for data mining or theory testing. Given the massive amounts of ‘messy’ data now available, practitioners and researchers need methods for prioritizing and analyzing their archival datasets. We therefore propose a methodology for developing normalized formative indices that progresses through five stages: content specification, data collection, data reduction, technical validation, and norming. Each stage of the proposed methodology is supported by its successful use in multidisciplinary research on formative index construction.