Paper Type

Complete Research Paper

Description

Data quality is a critical issue in scientific databases, since the reliability of empirical data can have a major impact on the formation of scientific theories and on policy decisions. Yet while several conceptual frameworks for data quality have been proposed, there is still a lack of general tools and metrics to measure and control the quality of empirical data in practice. As a first step in this direction, we carried out a detailed study of data quality requirements in a system designed to support food scientists by managing data about food composition. Our users included system designers and developers as well as food compilers and project managers. In addition to determining which dimensions of data quality specified in existing conceptual frameworks users consider important in assessing the reliability of data, we also asked users to assess the importance of various criteria related specifically to empirical data. These criteria were organized around the four steps typical of the life cycle of empirical data, namely sampling, analysis, data acquisition and data processing. Another novel feature of our study was to investigate not only which dimensions of data quality are considered important but also how their perceived importance depends on the role of the user.


A STUDY OF DATA QUALITY REQUIREMENTS FOR EMPIRICAL DATA IN THE FOOD SCIENCES
