Abstract

Many organizations publish anonymized medical data for sociology research, health research, education and other useful studies. Although attributes that clearly identify individuals, such as names and certain personal identity numbers, are removed, the combination of other information, such as date of birth, gender and postcode, can still be used to identify an individual. Existing data perturbation techniques can de-identify the data prior to publishing, but they make the process irreversible, so the original data cannot be fully recovered. A major issue is how to maintain the usability and utility of privacy-protected data while making the published data restorable for authorized users. In this paper, we propose a novel robust data perturbation algorithm that can withstand brute-force attacks, while the perturbed data pattern is indistinguishable from the original data pattern. A distinguishing feature of our method is that, by using fractal theory to derive perturbation vectors, it provides high privacy protection together with fully reversible data perturbation while maintaining maximal data utility. Experiments on practical data confirm that the algorithm operates as intended and is effective. The results lead us to conclude that the proposed approach computationally resists brute-force attacks and maintains the same data distribution type as the original data.
