Location
Level 0, Open Space, Owen G. Glenn Building
Start Date
12-15-2014
Description
Existing methodologies for identifying data quality issues are inevitably user-centric, wherein data quality requirements are determined in a top-down manner following organizational structures and data governance frameworks. In the current data landscape, however, users are often confronted with new, unexplored data sets that may have relevance and potential to create value. In such scenarios, applying top-down approaches is not feasible. Users need to be empowered with data exploration capabilities that allow them to investigate and understand the quality of data sets and, subsequently, the implications for use. The question is to what extent the quality of a data set can be explored in a bottom-up manner without access to well-defined data quality measures. Accordingly, in this paper we present an approach for discovering data quality issues using generic exploratory methods, which we derived through experimentation with a real data set based on public transport.
Recommended Citation
Zhang, Ruojing; Jayawardene, Vimukthi; Indulska, Marta; Sadiq, Shazia; and Zhou, Xiaofang, "A Data Driven Approach for Discovering Data Quality Requirements" (2014). ICIS 2014 Proceedings. 13.
https://aisel.aisnet.org/icis2014/proceedings/DecisionAnalytics/13