Abstract

Organisational data often contains business secrets that the organisation does not want to release. However, an organisation sometimes has to release its data, for example when outsourcing work or using the cloud for Data Quality (DQ) tasks such as data cleansing. Currently, there is no mechanism that allows organisations to release their data for DQ tasks while ensuring that business secrets are suitably protected. This paper therefore presents our current progress on determining which methods can modify secret data while retaining its DQ problems. So far, we have identified ways in which data swapping and SHA-2 hash function alteration methods can be used to preserve three kinds of DQ problem (missing data, incorrectly formatted values, and domain violations) while minimising the risk of disclosing secrets.
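As a rough illustration of the two families of methods named above, the following Python sketch shows one way a column could be altered while keeping some DQ problems detectable. It is not the procedure from the paper: the function names, the set of missing-value markers, the salt, and the choice of SHA-256 as the SHA-2 variant are all assumptions made for this example.

```python
import hashlib
import random

# Assumed markers for "missing" values; a real data set may use others.
MISSING_MARKERS = {"", "NULL", "N/A"}


def is_missing(value):
    """Return True if the value counts as missing under the assumed markers."""
    return value is None or value.strip() in MISSING_MARKERS


def hash_value(value, salt="demo-salt"):
    """Replace a value with its salted SHA-256 digest.

    Missing values are returned unchanged, so the missing-data problem
    remains visible in the altered column.
    """
    if is_missing(value):
        return value
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def swap_column(values, seed=0):
    """Shuffle the non-missing values of a column among themselves.

    Missing cells stay where they are, and every malformed or out-of-domain
    value is still present in the column, only attached to a different record.
    """
    rng = random.Random(seed)
    present = [i for i, v in enumerate(values) if not is_missing(v)]
    shuffled = [values[i] for i in present]
    rng.shuffle(shuffled)
    result = list(values)
    for index, value in zip(present, shuffled):
        result[index] = value
    return result


if __name__ == "__main__":
    # Hypothetical column: one missing value, one bad format, one domain violation.
    ages = ["34", None, "thirty", "-5", "41"]
    print(swap_column(ages))
    print([hash_value(v) for v in ages])
```

In this sketch, swapping keeps formatting errors and domain violations in the column because the values themselves are unchanged, whereas plain hashing only keeps missing values recognisable. Preserving format or domain problems through hashing would need extra handling that is not shown here.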

A Preliminary Study on Methods for Retaining Data Quality Problems in Automatically Generated Test Data
