Abstract

The rapid growth of organizational data resources makes it increasingly difficult to locate and understand the relevant data subsets within large datasets, which can be seen as a severe information quality issue in today's decision-support environments. This study proposes a quantitative methodology, based on the mutual-information metric, for assessing the relative importance of different data subsets within a large dataset. Such assessments can give end users faster access to the relevant subsets of a large dataset, help them better understand its contents, and allow them to gain deeper insights from analyzing it, for example, when the dataset is used for Business Intelligence (BI) applications. This manuscript provides the background and motivation for integrating the proposed assessments of relative importance. It then defines the calculations behind the mutual-information metric and demonstrates their application using illustrative examples.
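For reference, the standard definition of mutual information between two discrete attributes X and Y is I(X;Y) = Σ p(x,y) · log2[p(x,y) / (p(x) · p(y))], summed over all observed value pairs. The minimal sketch below estimates this quantity from paired observations of two attributes. The function name `mutual_information` and the choice of base-2 logarithms (results in bits) are assumptions for illustration. The sketch does not reproduce how the study maps this quantity onto the relative importance of data subsets.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from paired observations (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of X
    py = Counter(ys)              # marginal counts of Y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x,y) / (p(x) p(y)) simplifies to count * n / (count_x * count_y)
        mi += p_xy * log2(count * n / (px[x] * py[y]))
    return mi


# Fully dependent attributes share 1 bit; independent ones share none.
print(mutual_information(["a", "a", "b", "b"], [0, 0, 1, 1]))  # 1.0
print(mutual_information(["a", "a", "b", "b"], [0, 1, 0, 1]))  # 0.0
```

Only pairs that actually occur contribute to the sum, so the 0 · log 0 terms that the definition treats as zero never need special handling.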
