Abstract

In the context of products employing speech recognition, where the speech signal is sent from the device to centralized servers that process data, or simply products that involve data storage on servers, privacy for audio data is an important issue, just as it is for other types of data. Ignoring privacy has consequences for both, speakers (information leaks) and server administrators (legal issues). In this paper, we propose a speaker de-identification solution based on sound processing, altering voice characteristics, along with an API. Our solution consisting of pitch shift and noise mix (the latter is an optional augmentation method) has a great speaker de-identification performance, without an important loss in terms of word intelligibility. It is worth mentioning that sometimes the recordings may not be easy to understand in the initial (i.e., not de-identified) form, due to the speaker’s pronunciation, talking speed, and other related factors.

Recommended Citation

Costandache, M. A., Iftene, A., & Gifu, D. (2021). A Speaker De-Identification System Based on Sound Processing. In E. Insfran, F. González, S. Abrahão, M. Fernández, C. Barry, H. Linger, M. Lang, & C. Schneider (Eds.), Information Systems Development: Crossing Boundaries between Development and Operations (DevOps) in Information Systems (ISD2021 Proceedings). Valencia, Spain: Universitat Politècnica de València.

Paper Type

Short Paper

Share

COinS
 

A Speaker De-Identification System Based on Sound Processing

In the context of products employing speech recognition, where the speech signal is sent from the device to centralized servers that process data, or simply products that involve data storage on servers, privacy for audio data is an important issue, just as it is for other types of data. Ignoring privacy has consequences for both, speakers (information leaks) and server administrators (legal issues). In this paper, we propose a speaker de-identification solution based on sound processing, altering voice characteristics, along with an API. Our solution consisting of pitch shift and noise mix (the latter is an optional augmentation method) has a great speaker de-identification performance, without an important loss in terms of word intelligibility. It is worth mentioning that sometimes the recordings may not be easy to understand in the initial (i.e., not de-identified) form, due to the speaker’s pronunciation, talking speed, and other related factors.