Abstract

The research involved creating synthetic samples to enrich the training set and improve classification performance. Data generation was a key element of a gait biometrics system based on wearable sensors. The aim of the study was to investigate which parameters of Long Short-Term Memory–Mixture Density Network (LSTM–MDN) models would provide the greatest increase in recognition metrics. Validation was conducted on both non-normalized and normalized data from a large 100-person dataset. For non-normalized data, synthetic data from VAE-type generative models increased the F1-score from 0.754 to 0.776, while the proposed architectures increased it to 0.789. For normalized data, VAE-based models worsened recognition performance, whereas the proposed model increased the F1-score from a baseline of 0.928 to 0.966. The experiments indicate that generating synthetic data with MDN-based models is more beneficial when there is a distribution shift between the training and test sets.

Recommended Citation

Sawicki, A., Khalid, S. & Walendziuk, W. (2024). Generation Of Synthetic Data for Behavioral Gait Biometrics. In B. Marcinkowski, A. Przybylek, A. Jarzębowicz, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Harnessing Opportunities: Reshaping ISD in the post-COVID-19 and Generative AI Era (ISD2024 Proceedings). Gdańsk, Poland: University of Gdańsk. ISBN: 978-83-972632-0-8. https://doi.org/10.62036/ISD.2024.56

Paper Type

Short Paper

DOI

10.62036/ISD.2024.56
