Abstract

To deliver better public services to citizens, there is an increasing need to share individual-level data. Such data sharing must preserve privacy by not disclosing any information that can be used to identify an individual. In the big data context, although e-government initiatives based on big data sharing appear to have made considerable progress, it remains doubtful whether citizens will embrace such services. Privacy preservation is an important challenge when large volumes of private data are analyzed by parties ranging from individuals to organizations. Considerable research has been done on formalizing data privacy techniques, but most of these techniques either deal only with structured datasets or limit the size of the unstructured datasets they can handle. Moreover, current privacy techniques do not maintain an adequate trade-off between data privacy and data utility. In this article, we attempt to fill these gaps and present a framework for de-identifying heterogeneous citizen data that is also able to handle privacy at big data scale. The framework consists of three modules: a) big data collection, b) information extraction, and c) anonymization. We deploy a conditional random field (CRF) classifier to extract identifying attributes, and a k-anonymization technique to de-identify the extracted data through minimal generalization and suppression. We also present a set of preliminary experimental results showing the effectiveness of the proposed framework.
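
As a rough illustration of what k-anonymization through generalization and suppression means, the following minimal Python sketch may help. It is not the framework's implementation: the field names (`age`, `zip`, `condition`), the helper functions, and the single fixed generalization level are all assumptions made for this example, and no search for a minimal generalization is attempted. Quasi-identifiers are coarsened first, and any record whose generalized group still has fewer than k members is suppressed.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Replace an exact age with a range such as '30-39' (hypothetical helper)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep=3):
    """Keep the leading digits of a ZIP code and mask the rest (hypothetical helper)."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def k_anonymize(records, k):
    """Generalize the quasi-identifiers, then suppress groups smaller than k."""
    generalized = [
        {**r, "age": generalize_age(r["age"]), "zip": generalize_zip(r["zip"])}
        for r in records
    ]
    # Count how many records share each generalized (age, zip) combination.
    counts = Counter((r["age"], r["zip"]) for r in generalized)
    # Suppression: drop records that remain distinguishable within a group of size < k.
    return [r for r in generalized if counts[(r["age"], r["zip"])] >= k]


# Illustrative records only; these values are made up for the example.
records = [
    {"age": 34, "zip": "47906", "condition": "flu"},
    {"age": 37, "zip": "47905", "condition": "asthma"},
    {"age": 52, "zip": "47302", "condition": "diabetes"},
]
anonymized = k_anonymize(records, k=2)
```

With k = 2, the first two records fall into the same generalized group and are kept, while the third has no peer in its group and is suppressed. A wider age range or a shorter ZIP prefix would retain more records at the cost of utility, which is the privacy and utility trade-off the abstract refers to.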
