Paper Type

Complete

Abstract

With the rising complexity of regulations and the growing volume of data, audit professionals are increasingly leveraging machine learning methods to analyze business processes, detect anomalies, and assess risks, with the aim of improving audit effectiveness and efficiency. However, data privacy regulations and confidentiality constraints significantly limit access to real-world accounting data for training machine learning models. To address this challenge, this paper evaluates state-of-the-art tabular synthetic data generation methods by comparing their ability to create realistic accounting records based on a real-world dataset. We systematically evaluate these methods across four quality dimensions: fidelity, privacy, utility, and compliance with domain-specific constraints. Our results show that autoencoder-based generation methods generally outperform the other generative approaches on this task. However, all methods struggle to fully capture certain accounting-specific constraints, such as complex relationships between journal entries. We also discuss potential strategies to mitigate these challenges.

Paper Number

1589

Author Connect URL

https://authorconnect.aisnet.org/conferences/AMCIS2025/papers/1589

Comments

SIGASYS

Aug 15th, 12:00 AM

Generation of Synthetic Accounting Data in Audit: A Comparison of Tabular Generation Methods
