Abstract
Individuals are becoming increasingly concerned with privacy. This curtails their willingness to share sensitive attributes such as age, gender, or personal preferences; yet firms rely heavily on customer data for virtually any type of predictive analytics. Hence, organizations face a dilemma: they must trade off a sparse use of data against the utility gained from better predictive analytics. This paper proposes a masking mechanism that obscures sensitive attributes while maintaining a large degree of predictive power. More precisely, we efficiently identify data partitions that are best suited for (i) shuffling, (ii) swapping and, as a form of randomization, (iii) perturbing attributes by conditional replacement. By operating on data partitions that are derived from a predictive algorithm, we mask privacy-sensitive attributes with only marginal downsides for predictive modeling. The resulting trade-off between masking and predictive utility is empirically evaluated in the context of customer churn where, for instance, a stratified shuffling of attribute values rarely reduces predictive accuracy by more than one percentage point. Our proposed framework entails direct managerial implications, as a growing share of firms adopts predictive analytics and thus requires mechanisms that better adhere to user demands for information privacy.
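To illustrate the idea of shuffling within partitions, the following is a minimal sketch and not the authors' implementation. It assumes that each record already carries a partition label (the hypothetical field `leaf`, as if taken from the leaves of a fitted decision tree) and that `age` is the sensitive attribute. The function name `shuffle_within_partitions` and the toy records are illustrative only; how the paper actually derives its partitions is not specified in the abstract.

```python
import random
from collections import defaultdict


def shuffle_within_partitions(rows, partition_key, sensitive_key, seed=0):
    """Return copies of `rows` with `sensitive_key` permuted inside each partition.

    Assumption: `partition_key` holds a precomputed partition label, e.g. the
    leaf of a decision tree fitted on the prediction task (hypothetical here).
    """
    rng = random.Random(seed)

    # Collect the row positions that belong to each partition.
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[row[partition_key]].append(idx)

    # Work on copies so the input records are left untouched.
    masked = [dict(row) for row in rows]

    # Permute the sensitive values among the rows of the same partition.
    for indices in groups.values():
        values = [rows[i][sensitive_key] for i in indices]
        rng.shuffle(values)
        for i, value in zip(indices, values):
            masked[i][sensitive_key] = value

    return masked


# Toy data (made up for illustration only).
customers = [
    {"id": 1, "leaf": "A", "age": 23},
    {"id": 2, "leaf": "A", "age": 31},
    {"id": 3, "leaf": "A", "age": 27},
    {"id": 4, "leaf": "B", "age": 58},
    {"id": 5, "leaf": "B", "age": 64},
]

masked = shuffle_within_partitions(customers, "leaf", "age")
```

Because values only move between records in the same partition, the distribution of the sensitive attribute within each partition is unchanged, while the link between an individual record and its original value is obscured. If the partitions group records that a model treats alike, this is one plausible reason why predictive accuracy would suffer little.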
Recommended Citation
Banholzer, Nicolas and Feuerriegel, Stefan, "The misty crystal ball: Efficient concealment of privacy-sensitive attributes in predictive analytics" (2018). WISP 2018 Proceedings. 4.
https://aisel.aisnet.org/wisp2018/4