Among the growing number of Chinese companies that have gone public overseas, many have been accused of financial fraud by market research firms or by the U.S. Securities and Exchange Commission (SEC). As a result, investors lost money and, in some cases, confidence in all overseas-listed Chinese companies. The companies themselves saw their share prices fall sharply, and some were delisted from the stock exchange. Conventional auditing practices failed in these cases, in which misleading financial reports were presented. This is partly because existing auditing practice and academic research focus primarily on statistical analysis of structured financial ratios and market activity data, while ignoring the large amount of textual information about these companies in their financial statements. In this paper, we build an integrated language model, which combines a statistical language model (SLM) with latent semantic analysis (LSA), to detect the strategic use of deceptive language in financial statements. By integrating the SLM into the LSA framework, the integrated model not only overcomes the SLM's inability to capture long-span information but also extracts the semantic patterns that distinguish fraudulent financial statements from non-fraudulent ones. Four different modes of the integrated model are also studied and compared. Applied to assessing fraud risk in overseas-listed Chinese companies, the integrated model flags fraudulent financial statements with high accuracy.
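As a rough illustration of the SLM component only, the sketch below trains one add-one-smoothed bigram model per class and labels a text by whichever model assigns it the higher log-likelihood. It is not the integrated model described above: the LSA component, which would require a singular value decomposition of a term-document matrix, is omitted. The bigram order, the smoothing scheme, the names `BigramLM` and `classify`, and the toy sentences are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, docs, vocab):
        self.vocab = vocab
        self.contexts = Counter()  # how often each word occurs as a context
        self.bigrams = Counter()   # how often each (previous, current) pair occurs
        for doc in docs:
            tokens = ["<s>"] + tokenize(doc)
            for prev, cur in zip(tokens, tokens[1:]):
                self.contexts[prev] += 1
                self.bigrams[(prev, cur)] += 1

    def prob(self, prev, cur):
        """Smoothed P(cur | prev); sums to 1 over the vocabulary."""
        return (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + len(self.vocab))

    def log_prob(self, text):
        """Log-likelihood of a text; out-of-vocabulary words map to <unk>."""
        tokens = ["<s>"] + [t if t in self.vocab else "<unk>" for t in tokenize(text)]
        return sum(math.log(self.prob(p, c)) for p, c in zip(tokens, tokens[1:]))


# Toy training sentences, invented purely for this example.
fraud_docs = [
    "revenue growth was exceptional and extraordinary",
    "management expects exceptional extraordinary growth",
]
clean_docs = [
    "revenue declined due to lower demand",
    "management reported lower revenue and higher costs",
]

# Shared vocabulary so the two models' likelihoods are comparable.
vocab = {t for d in fraud_docs + clean_docs for t in tokenize(d)} | {"<unk>"}
fraud_lm = BigramLM(fraud_docs, vocab)
clean_lm = BigramLM(clean_docs, vocab)


def classify(text):
    """Label a text by the class model with the higher log-likelihood."""
    if fraud_lm.log_prob(text) > clean_lm.log_prob(text):
        return "fraudulent"
    return "non-fraudulent"


print(classify("exceptional and extraordinary growth"))
print(classify("lower revenue and higher costs"))
```

Because each bigram probability depends only on the immediately preceding word, a model of this kind cannot use information from earlier in the document. This is the long-span limitation that the LSA component is meant to address.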
Dong, Wei; Liao, Stephen Shaoyi; Fang, Bing; Cheng, Xian; Chen, Zhu; and Fan, Wenjie, "THE DETECTION OF FRAUDULENT FINANCIAL STATEMENTS: AN INTEGRATED LANGUAGE MODEL" (2014). PACIS 2014 Proceedings. 383.