Start Date
10-12-2017 12:00 AM
Description
Many organizations are starting to make datasets, such as customer review data and service usage logs, publicly available. To protect the privacy of the individuals involved, these datasets are usually pseudonymized or anonymized before they are released. A method called k-anonymization is widely used in such open datasets. Recent literature has shown, however, that this method can be unsafe and compromise individuals’ privacy. In this paper, we address this problem by analyzing the New York Citi Bike dataset. Through our analyses, we show that given some generalized and payload data, it is possible to recover other payload data of an individual in the k-anonymized dataset. We also demonstrate that it is possible to achieve a high success rate in re-identification of records. These findings shed additional light on the weakness of the k-anonymization method, thus evidencing a trade-off between data availability and privacy protection. We finally provide some implications for both academics and practitioners.
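The kind of attack described in the abstract can be illustrated with a minimal sketch. It is not the authors' method or data: the table, the column names (`age_range`, `gender`, `start_station`, `end_station`), and the helper functions are all hypothetical and chosen only to show how a table that is k-anonymous on its generalized attributes can still leak payload data once an attacker knows one payload value.

```python
from collections import Counter

# Hypothetical toy table: every value below is invented for illustration.
RECORDS = [
    {"age_range": "30-39", "gender": "F", "start_station": "A", "end_station": "X"},
    {"age_range": "30-39", "gender": "F", "start_station": "B", "end_station": "Y"},
    {"age_range": "40-49", "gender": "M", "start_station": "A", "end_station": "Z"},
    {"age_range": "40-49", "gender": "M", "start_station": "C", "end_station": "X"},
]

# Generalized attributes assumed to be the quasi-identifiers.
QUASI_IDENTIFIERS = ("age_range", "gender")


def anonymity_level(records, columns):
    """Return the size of the smallest group of records sharing the same values in `columns`."""
    groups = Counter(tuple(r[c] for c in columns) for r in records)
    return min(groups.values())


def candidates(records, known):
    """Return the records consistent with everything the attacker knows."""
    return [r for r in records if all(r[key] == value for key, value in known.items())]


# The table looks safe when only the generalized attributes are considered.
k = anonymity_level(RECORDS, QUASI_IDENTIFIERS)

# The attacker knows the generalized attributes plus one payload value.
known = {"age_range": "30-39", "gender": "F", "start_station": "A"}
matches = candidates(RECORDS, known)

print("k on quasi-identifiers:", k)
print("records matching attacker knowledge:", len(matches))
if len(matches) == 1:
    print("recovered end_station:", matches[0]["end_station"])
```

In this toy table every combination of generalized attributes is shared by two records, yet adding a single known payload value narrows the candidates to one record and exposes its remaining payload.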
Recommended Citation
Pennarola, Ferdinando; Pistilli, Luca; and Chau, Michael, "Angels and Daemons: Is more Knowledge better than less Privacy? An Empirical Study on a K-anonymized openly available Dataset" (2017). ICIS 2017 Proceedings. 6.
https://aisel.aisnet.org/icis2017/Security/Presentations/6