Abstract

This article presents a case study on the development of a biometric voice verification system for an intercom solution, based on the DeepSpeaker neural network architecture. Although the literature offers a variety of solutions, few evaluate text-independent systems under real conditions and at varying distances between the speaker and the microphone. This article aims to bridge that gap. The study examines how different types of parameterization affect network performance, the effects of signal augmentation, and the results obtained under low Signal-to-Noise Ratio (SNR) and reverberation. The findings show substantial room for improvement and point to a significant need for further research.

Recommended Citation

Zaporowski, S., Górski, F. & Kotus, J. (2024). Developing a Low SNR Resistant, Text Independent Speaker Recognition System for Intercom Solutions - A Case Study. In B. Marcinkowski, A. Przybylek, A. Jarzębowicz, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Harnessing Opportunities: Reshaping ISD in the post-COVID-19 and Generative AI Era (ISD2024 Proceedings). Gdańsk, Poland: University of Gdańsk. ISBN: 978-83-972632-0-8. https://doi.org/10.62036/ISD.2024.38

Paper Type

Full Paper

DOI

10.62036/ISD.2024.38
