Privacy-preserving Data clustering in Cloud Computing based on Fully Homomorphic Encryption
Cloud infrastructure with its massive storage and computing power is an ideal platform to perform large scale data analysis tasks to extract knowledge and support decision-making. However, there are critical data privacy and security issues associated with this platform, as the data is stored in a public infrastructure that can be accessed by cloud service providers or other potentially malicious intruders. Recently, fully homomorphic data encryption has been proposed as a solution due to its capabilities in performing computations over encrypted data. However, it is demonstrably slow for practical data mining applications. To address this and related concerns, we introduce a fully homomorphic and distributed data processing framework that utilizes MapReduce to perform distributed computations for data clustering tasks on a large number of cloud Virtual Machines (VMs). We illustrate how a variety of fully homomorphic-based computations can be carried out to accomplish data clustering tasks independently in the cloud without the need to interact with data owners or any Trusted Third Party (TTP) and show that the distributed execution of data clustering tasks based on MapReduce can significantly reduce the execution time overhead caused by fully homomorphic computations. To evaluate our framework, we performed experiments using electricity consumption measurement data on the Google cloud platform with 100 VMs. We found the proposed distributed data processing framework to be highly efficient when compared to a centralized approach and as accurate as a plaintext implementation.