Paper Type
Complete
Abstract
With the rising complexity of regulations and the growing volume of data, audit professionals increasingly use machine learning methods to analyze business processes, detect anomalies, and assess risks, with the aim of improving audit effectiveness and efficiency. However, data privacy regulations and confidentiality constraints significantly limit access to real-world accounting data for training machine learning models. To address this challenge, this paper evaluates state-of-the-art tabular synthetic data generation methods by comparing their ability to create realistic accounting records based on a real-world dataset. We systematically evaluate these methods across four quality dimensions: fidelity, privacy, utility, and compliance with domain-specific constraints. Our results show that autoencoder-based generation methods generally outperform other generative approaches in this task. However, all methods struggle to fully capture certain accounting-specific constraints, such as complex relationships between journal entries. The paper also explores potential strategies to mitigate these challenges.
Paper Number
1589
Recommended Citation
Kröger, Tobias and Schultz, Martin, "Generation of Synthetic Accounting Data in Audit: A Comparison of Tabular Generation Methods" (2025). AMCIS 2025 Proceedings. 4.
https://aisel.aisnet.org/amcis2025/acctinfosys/acctinfosys/4
Generation of Synthetic Accounting Data in Audit: A Comparison of Tabular Generation Methods
Comments
SIGASYS