Abstract

Managerial financial fraud is estimated to cost billions of dollars annually in the United States. Because fraud involves obfuscation, misdirection, and fabrication, this study proposes using deception theory as a means of detecting fraud in the textual portions of annual financial statements (10-Ks). A corpus of 101 fraudulent 10-Ks was collected from the Securities and Exchange Commission, along with 101 matching non-fraudulent 10-Ks. Natural Language Processing techniques were applied to the corpus to generate raw counts and usage rates of hedging devices: hedging modal verbs, hedging adjectives, hedging adverbs, hedging conjunctions, hedging nouns, and hedging lexical verbs. A classification model based on logistic regression discriminates between fraudulent and non-fraudulent filings with 69.3% accuracy and accounts for nearly 20% of the observed variance. Two machine-learning algorithms were also investigated: a Bayesian network and JRip achieved accuracies of 62.4% and 67.8%, respectively. Both results exceed chance and typical human deception-detection rates, suggesting the possibility of a diagnostic tool for auditors.
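The counting step that produces the raw counts and usage rates can be illustrated with a minimal sketch. It assumes a small, hypothetical lexicon (`HEDGE_LEXICON`) and a hypothetical helper (`hedge_profile`). The study's actual word lists and NLP pipeline are not given here. Matching uses lowercase word forms only, with no part-of-speech tagging or lemmatization, so inflected forms such as "believes" are not counted. Usage rates are expressed per 1,000 words.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; not the study's word lists.
# Categories are kept disjoint because this sketch does no POS tagging.
HEDGE_LEXICON = {
    "modal_verbs": {"may", "might", "could", "would", "should"},
    "adjectives": {"possible", "probable", "likely", "uncertain"},
    "adverbs": {"possibly", "probably", "perhaps", "approximately", "generally"},
    "conjunctions": {"although", "though", "unless", "whereas"},
    "nouns": {"possibility", "probability", "assumption", "estimate"},
    "lexical_verbs": {"believe", "suggest", "appear", "seem", "anticipate"},
}


def hedge_profile(text, per=1000):
    """Return ({category: (raw_count, rate_per_`per`_words)}, total_words)."""
    # Lowercase word tokens; keeps simple contractions like "don't" intact.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    total = len(tokens)
    freq = Counter(tokens)
    profile = {}
    for category, words in HEDGE_LEXICON.items():
        count = sum(freq[w] for w in words)
        # Usage rate normalizes the raw count by document length.
        rate = count * per / total if total else 0.0
        profile[category] = (count, rate)
    return profile, total


if __name__ == "__main__":
    sample = "We believe revenue may decline, although it could possibly improve."
    profile, total = hedge_profile(sample)
    print(total)
    for category, (count, rate) in profile.items():
        print(f"{category}: count={count}, rate={rate:.1f} per 1000 words")
```

For the ten-word sample sentence, the two modal verbs ("may", "could") give a rate of 200 per 1,000 words. Per-document vectors of this kind are the sort of input a logistic regression or rule learner could be trained on; normalizing by length keeps long and short filings comparable.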