Paper Type
Completed Research Paper
Abstract
Data quality assessment is a key factor in data-intensive domains. The data deluge is aggravated by an increasing need for interoperability and cooperation across groups and organizations. New alternatives must be found to select the data that best satisfy users’ needs in a given context. This paper presents a strategy to provide information to support the evaluation of the quality of data sets. This strategy is based on combining metadata on the provenance of a data set (derived from workflows that generate it) and quality dimensions defined by the set’s users, based on the desired context of use. Our solution, validated via a case study, takes advantage of a semantic model to preserve data provenance related to applications in a specific domain.
Recommended Citation
Gonzales Malaverri, Joana E.; Mota, Matheus S.; and Bauzer Medeiros, Claudia M., "Estimating the quality of data using provenance: a case study in eScience" (2013). AMCIS 2013 Proceedings. 4.
https://aisel.aisnet.org/amcis2013/DataQuality/GeneralPresentations/4
Estimating the quality of data using provenance: a case study in eScience
Data quality assessment is a key factor in data-intensive domains. The data deluge is aggravated by an increasing need for interoperability and cooperation across groups and organizations. New alternatives must be found to select the data that best satisfy users’ needs in a given context. This paper presents a strategy to provide information to support the evaluation of the quality of data sets. This strategy is based on combining metadata on the provenance of a data set (derived from workflows that generate it) and quality dimensions defined by the set’s users, based on the desired context of use. Our solution, validated via a case study, takes advantage of a semantic model to preserve data provenance related to applications in a specific domain.