Fraudulent financial information released by public companies not only causes significant financial losses to shareholders but also seriously undermines confidence in the capital market. Conventional auditing practices, which focus primarily on statistical analysis of structured financial ratios, perform poorly when financial reports are misleading. This research taps the large volume of largely ignored textual content in financial statements. Guided by Systemic Functional Linguistics (SFL) theory, we develop a systematic text analytic framework for financial statement fraud detection. Seven information types (topics, opinions, emotions, modality, personal pronouns, writing style, and genres) are identified based on the ideational, interpersonal, and textual metafunctions in SFL. Within the framework, the Latent Dirichlet Allocation algorithm, computational linguistics techniques, and the term frequency-inverse document frequency method are integrated to extract both word-level and document-level features. All of these features serve as input to a Liblinear Support Vector Machine classifier. Finally, applied to fraud detection in 1610 firm-year samples from U.S. listed companies, the framework achieves an average classification accuracy of 82.36% under ten-fold cross-validation, much better than a baseline method using financial ratios.
Dong, Wei; Liao, Shaoyi; and Liang, Liang, "FINANCIAL STATEMENT FRAUD DETECTION USING TEXT MINING: A SYSTEMIC FUNCTIONAL LINGUISTICS THEORY PERSPECTIVE" (2016). PACIS 2016 Proceedings. 188.
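As an illustration of the word-level features mentioned in the abstract, the following is a minimal sketch of term frequency-inverse document frequency weighting. The abstract does not say which TF-IDF variant the authors use, so the plain form tf × log(N / df) is an assumption here, as are the function name `tfidf` and the toy snippets, which are made up and not drawn from the paper's data.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (count / doc length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical snippets standing in for financial statement text.
docs = [
    "we expect revenue growth".split(),
    "we restated revenue".split(),
    "we expect strong growth".split(),
]
weights = tfidf(docs)
```

A term that appears in every document, such as "we" above, receives a weight of zero, while a term confined to one document is weighted most heavily. In a pipeline like the one described, vectors of this kind would be combined with document-level features before being passed to the classifier.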