Location

Hilton Hawaiian Village, Honolulu, Hawaii

Event Website

https://hicss.hawaii.edu/

Start Date

January 3, 2024, 12:00 AM

End Date

January 6, 2024, 12:00 AM

Description

In this paper, we focus on defending against adversarial attacks for privacy-preserving Natural Language Processing (NLP) under a model partitioning scenario, where the model is split into a local, on-device part and a remote, cloud-based part. Model partitioning improves scalability and protects the privacy of the model's inputs. However, we argue that this privacy protection breaks down during inference: an adversary can eavesdrop on the hidden representations output by the local device and use them to recover private information about the input text. We study two types of adversarial attacks, namely adversarial classification and adversarial generation. Based on these two attack models, we propose two corresponding defenses: defending against adversarial classification (DAC) and defending against adversarial generation (DAG). Both DAC and DAG are bilevel optimization-based defense methods that optimally modify a subpopulation of the neural representations so as to maximally degrade the adversary's ability to recover private information. Representations trained with this bilevel optimization protect sensitive information from the adversary while maintaining their utility for downstream tasks. Our experiments show that both DAC and DAG improve the performance of the main text classifier and achieve even stronger privacy for the neural representations than current state-of-the-art methods.
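
The core idea described above, training the on-device encoder so that its hidden representations stay useful for the main task while limiting what an eavesdropper can recover, can be illustrated with a short sketch. The code below is a minimal, hypothetical PyTorch illustration of adversarially training a partitioned text model; the module names, sizes, loss weighting, and the simple alternating update scheme are assumptions for illustration only, not the paper's exact DAC or DAG formulation, which is a bilevel optimization over a subpopulation of the representations.

# Minimal sketch (not the paper's exact DAC/DAG algorithm): a partitioned text
# model where a local encoder is trained so that its hidden representations
# remain useful for the main task while degrading an eavesdropping adversary's
# ability to recover a private attribute. All names, sizes, and the alternating
# update scheme are illustrative assumptions.
#
# Generic bilevel form (illustrative):
#   min_theta  L_task(theta) - lam * L_adv(theta, phi*(theta))
#   s.t.       phi*(theta) = argmin_phi L_adv(theta, phi)
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB, EMB, HID, N_CLASSES, N_PRIVATE = 1000, 64, 128, 4, 2

class LocalEncoder(nn.Module):
    """On-device part: produces the hidden representation sent to the cloud."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
    def forward(self, tokens):
        _, h = self.rnn(self.emb(tokens))
        return h.squeeze(0)            # (batch, HID) representation z

encoder   = LocalEncoder()             # local, on-device part
task_head = nn.Linear(HID, N_CLASSES)  # remote, cloud-based main classifier
adversary = nn.Linear(HID, N_PRIVATE)  # eavesdropper predicting a private attribute

opt_main = torch.optim.Adam(list(encoder.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_adv  = torch.optim.Adam(adversary.parameters(), lr=1e-3)
ce, lam  = nn.CrossEntropyLoss(), 0.5  # lam trades utility against privacy

def train_step(tokens, task_labels, private_labels):
    # Inner step: the adversary learns to extract private information from z.
    z = encoder(tokens).detach()
    adv_loss = ce(adversary(z), private_labels)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

    # Outer step: encoder and task head keep utility while *increasing* the
    # adversary's loss (a simple adversarial surrogate for the bilevel objective).
    z = encoder(tokens)
    loss = ce(task_head(z), task_labels) - lam * ce(adversary(z), private_labels)
    opt_main.zero_grad(); loss.backward(); opt_main.step()
    return loss.item()

# Toy usage with random data, just to show the training interface.
tokens         = torch.randint(0, VOCAB, (8, 16))
task_labels    = torch.randint(0, N_CLASSES, (8,))
private_labels = torch.randint(0, N_PRIVATE, (8,))
for _ in range(3):
    train_step(tokens, task_labels, private_labels)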

Defense Against Adversarial Attacks for Neural Representations of Text

https://aisel.aisnet.org/hicss-57/st/threat_hunting/4