Abstract

Because data sources are heterogeneous, data integration is often one of the most challenging tasks in managing modern information systems. The challenges arise at three levels: schema heterogeneity, entity heterogeneity, and data heterogeneity. Existing literature has focused largely on schema and entity heterogeneity, and the very limited work on data heterogeneity either avoids attribute value conflicts or resolves them in an ad hoc manner. This research focuses on data heterogeneity. We propose a decision-theoretic framework that resolves attribute value conflicts cost-efficiently. The framework accounts for the consequences of incorrect data values and selects the value that minimizes the total expected error cost across all application problems. Numerical results show that adopting the proposed framework instead of ad hoc approaches can yield significant savings.
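
The selection rule described above is an expected-cost (Bayes) decision: for each candidate value, sum the probability-weighted error costs over every application that uses the data, then keep the cheapest candidate. The following is a minimal, generic sketch of that rule, not the authors' formulation. It assumes that the probability of each possible true value is already known and that each application supplies its own cost function. The names `expected_total_cost`, `resolve_conflict`, `mismatch_cost`, and `overstatement_cost`, and all of the figures, are hypothetical and used only for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical signature: the cost an application incurs when it uses
# `chosen` while the real-world value is actually `true`.
CostFn = Callable[[float, float], float]


def expected_total_cost(chosen: float, belief: Dict[float, float],
                        cost_fns: List[CostFn]) -> float:
    """Expected error cost of storing `chosen`, summed over all applications.

    `belief` maps each possible true value to its (assumed known) probability.
    """
    return sum(
        p * cost(chosen, true)
        for true, p in belief.items()
        for cost in cost_fns
    )


def resolve_conflict(candidates: List[float], belief: Dict[float, float],
                     cost_fns: List[CostFn]) -> float:
    """Return the candidate value with the lowest total expected error cost."""
    return min(candidates, key=lambda v: expected_total_cost(v, belief, cost_fns))


# --- Illustrative usage (all figures are made up) ---

def mismatch_cost(chosen: float, true: float) -> float:
    """Hypothetical application A: every wrong value costs the same."""
    return 0.0 if chosen == true else 1.0


def overstatement_cost(chosen: float, true: float) -> float:
    """Hypothetical application B: overstating is ten times worse than understating."""
    if chosen == true:
        return 0.0
    return 10.0 if chosen > true else 1.0


# Two sources disagree on a credit limit; the second source is judged more reliable.
candidates = [5000.0, 8000.0]
belief = {5000.0: 0.3, 8000.0: 0.7}
costs = [mismatch_cost, overstatement_cost]

most_probable = max(belief, key=belief.get)            # 8000.0
choice = resolve_conflict(candidates, belief, costs)   # 5000.0
```

Picking the most probable value would store 8000. Because overstatement is costly for application B, however, the expected-cost rule keeps 5000 (expected cost 1.4 versus 3.3). This illustrates how the consequences of errors, and not only their likelihood, can change which conflicting value is kept. A practical system would also need a way to estimate `belief`, which this sketch takes as given.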
