Location

Level 0, Open Space, Owen G. Glenn Building

Start Date

12-15-2014

Description

Poor data quality can have a significant impact on system and organizational performance. With the growth in data gathering and storage, the number of data sources that must be merged in data warehouse and Enterprise Resource Planning (ERP) implementations has increased significantly. This makes data cleansing, as part of the implementation conversion, increasingly difficult. In this research we expand the traditional Extract-Transform-Load (ETL) process to identify sub-processes between the main stages. We then identify the decisions and tradeoffs involved in allocating time and resources to the data cleansing process under accuracy constraints. We develop a mathematical model of the process to identify the optimal configuration of these factors in the data cleansing process, and we use empirical data to test the proposed model. Three levels of cleansing complexity are tested in the preliminary analysis to demonstrate the feasibility of the optimization modeling process.

How Clean is Clean Enough? Determining the Most Effective Use of Resources in the Data Cleansing Process
