Assessing Accuracy with Locality-Sensitive Hashing in Multiple Source Environment

Abstract

Accuracy assessment is a key issue in data quality management. Most current studies focus on qualitatively analyzing the accuracy dimension, and the analysis depends heavily on experts’ knowledge. Little work has addressed how to quantify the accuracy dimension automatically. Based on the Jensen-Shannon Divergence (JSD) measure, we propose that the accuracy of data can be quantified automatically by comparing the data with the closest approximation of its entity in the available context. To identify the closest approximation quickly in large-scale data sources, Locality-Sensitive Hashing (LSH) is employed to extract it at multiple levels, namely the column, record, and field levels. As long as a context member is available, our approach not only gives each data source an objective accuracy score very quickly but also avoids laborious human interaction. Theory and experiments show that our approach performs well in obtaining metadata on the accuracy dimension.
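
As a rough illustration of the JSD measure named in the abstract, the following minimal Python sketch computes the Jensen-Shannon divergence between the value distributions of two columns. It is not the authors' implementation: the function names (distribution, jsd, accuracy_score) and the mapping of divergence to a score as 1 - JSD are assumptions made here for illustration only, and the LSH step that would locate the closest approximation is omitted.

```python
import math
from collections import Counter


def distribution(values):
    """Empirical probability distribution of the values in a column."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    p and q map outcomes to probabilities. With log base 2 the result
    lies in [0, 1]: 0 for identical distributions, 1 for disjoint ones.
    """
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        # Kullback-Leibler divergence KL(a || m); zero-probability terms contribute 0
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def accuracy_score(column, reference):
    """Hypothetical score: 1 - JSD between a column and a reference column.

    The 1 - JSD mapping is an assumption of this sketch, not a formula
    taken from the paper.
    """
    return 1.0 - jsd(distribution(column), distribution(reference))
```

Under these assumptions, a column whose value distribution matches its reference scores 1, and a column sharing no values with its reference scores 0.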

Recommended Citation

Han, Jingyu; Jiang, Dawei; Li, Lingjuan; and Ding, Zhiming, "Assessing Accuracy with Locality-Sensitive Hashing in Multiple Source Environment" (2009). AMCIS 2009 Proceedings. 278.
https://aisel.aisnet.org/amcis2009/278

AMCIS 2009 Proceedings
