Location
Hilton Hawaiian Village, Honolulu, Hawaii
Event Website
https://hicss.hawaii.edu/
Start Date
January 3, 2024
End Date
January 6, 2024
Description
In this paper, we focus on defending against adversarial attacks for privacy-preserving Natural Language Processing (NLP) under a model partitioning scenario, where the model is split into a local, on-device part and a remote, cloud-based part. Model partitioning improves scalability and protects the privacy of the model's inputs. However, we argue that this privacy protection breaks down during inference: an adversary who eavesdrops on the hidden representations output by the local device can use them to recover private information about the input text. We study two types of adversarial attacks, i.e., adversarial classification and adversarial generation. Based on these two attack models, we propose two corresponding defenses: defending against adversarial classification (DAC) and defending against adversarial generation (DAG). Both DAC and DAG are bilevel optimization-based defense methods: each optimally modifies a subpopulation of the neural representations so as to maximally decrease the adversary's ability to recover private information. Representations trained with this bilevel optimization keep sensitive information away from the adversary while maintaining their utility for downstream tasks. Our experiments show that both DAC and DAG improve the performance of the main text classifier and achieve higher privacy for the neural representations than current state-of-the-art methods.
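To make the partitioned-inference setting and the bilevel defense concrete, the sketch below shows a toy version of the adversarial-classification case: a local encoder produces representations that a cloud-side classifier consumes, an eavesdropping classifier probes those representations for a private attribute, and a learned perturbation on a subset of representation dimensions is optimized to hurt the adversary while preserving main-task accuracy. All module names, dimensions, the toy data, the masked perturbation, the loss weighting, and the alternating inner/outer updates are illustrative assumptions, not the authors' DAC/DAG implementation.

# Hypothetical sketch of the partitioned-inference threat model and a
# bilevel-style defense on the intermediate representations.
import torch
import torch.nn as nn

torch.manual_seed(0)

DIM_IN, DIM_REP, N_MAIN, N_PRIV = 32, 16, 4, 2

local_encoder = nn.Sequential(nn.Linear(DIM_IN, DIM_REP), nn.ReLU())  # on-device part
main_classifier = nn.Linear(DIM_REP, N_MAIN)                          # cloud-based part
adversary = nn.Linear(DIM_REP, N_PRIV)                                # eavesdropper's probe

# Toy data: inputs, main-task labels, and a private attribute (all random).
x = torch.randn(256, DIM_IN)
y_main = torch.randint(0, N_MAIN, (256,))
y_priv = torch.randint(0, N_PRIV, (256,))

# "Subpopulation" of representation dimensions the defense may modify
# (how this subset is chosen is an assumption here).
mask = torch.zeros(DIM_REP)
mask[: DIM_REP // 2] = 1.0
delta = torch.zeros(DIM_REP, requires_grad=True)  # learned perturbation

ce = nn.CrossEntropyLoss()
opt_model = torch.optim.Adam(
    list(local_encoder.parameters()) + list(main_classifier.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
opt_delta = torch.optim.Adam([delta], lr=1e-2)

for step in range(200):
    # Defended representation that would be sent to the cloud.
    z = local_encoder(x) + mask * delta

    # Inner problem: the adversary fits the current (detached) representations.
    adv_loss = ce(adversary(z.detach()), y_priv)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # Outer problem: keep main-task utility while maximizing the adversary's loss.
    z = local_encoder(x) + mask * delta
    main_loss = ce(main_classifier(z), y_main)
    privacy_loss = -ce(adversary(z), y_priv)
    total = main_loss + 0.5 * privacy_loss  # 0.5 is an arbitrary illustrative weight
    opt_model.zero_grad()
    opt_delta.zero_grad()
    total.backward()
    opt_model.step()
    opt_delta.step()

For the adversarial-generation case (DAG), the adversary module would instead be a decoder that tries to reconstruct the input text from the representations, with the same alternating structure; that variant is not shown here.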
Recommended Citation
Zhan, Huixin; Zhang, Kun; Chen, Zhong; and Sheng, Victor, "Defense Against Adversarial Attacks for Neural Representations of Text" (2024). Hawaii International Conference on System Sciences 2024 (HICSS-57). 4.
https://aisel.aisnet.org/hicss-57/st/threat_hunting/4
Defense Against Adversarial Attacks for Neural Representations of Text