Location

Level 0, Open Space, Owen G. Glenn Building

Start Date

12-15-2014

Description

Existing methodologies for identifying data quality issues are inevitably user-centric: data quality requirements are determined top-down, following organizational structures and data governance frameworks. In the current data landscape, however, users are often confronted with new, unexplored data sets that may be relevant and have the potential to create value. In such scenarios, top-down approaches are not feasible. Users need data exploration capabilities that allow them to investigate and understand the quality of a data set and, subsequently, its implications for use. The question is to what extent the quality of a data set can be explored bottom-up, without access to well-defined data quality measures. Accordingly, in this paper we present an approach for discovering data quality issues using generic exploratory methods, which we derived through experimentation with a real-world public transport data set.

A Data Driven Approach for Discovering Data Quality Requirements
