Abstract
The widespread use of generative AI tools has significantly changed academic and professional writing, because these tools can produce text that mimics human writing styles. As a result, there are growing concerns about academic integrity, authorship, and the possible spread of misinformation. This study addresses the challenge of identifying clear linguistic features that distinguish AI-generated texts from human-written ones, a gap that current detection tools have not resolved. Prior work shows that AI can produce coherent, context-relevant text by learning from large datasets and that features such as readability, lexical diversity, perplexity and burstiness, and sentiment are useful for detection, though results have been mixed. Our main goal is to determine which of these linguistic features best predict AI authorship and to compare these machine-identified signals with the cues that human reviewers use. We analyze 100 mental health abstracts published in 2022, before OpenAI's release of ChatGPT, and generate 100 additional abstracts using ChatGPT. We take a quantitative approach, applying natural language processing methods to extract features such as readability, the analytic writing index, lexical diversity (including the measure of textual lexical diversity and the type-token ratio), perplexity, burstiness, sentiment, common word groups, term frequency-inverse document frequency scores, voice usage, punctuation, and tone. These measures are then used to train a machine learning model that identifies the top predictors of AI-generated content. In addition, we will survey an expected 200 participants from Toronto Metropolitan University to collect ratings of abstract quality, judgments of whether each abstract was written by a human or generated by AI, and background and AI-usage information. We expect the analysis to show that AI-generated abstracts have lower lexical diversity, simpler sentence structures, and lower perplexity, and that human reviewers will struggle to correctly identify AI-generated abstracts, especially when the differences are subtle. The findings will add to existing knowledge of the key linguistic features that signal AI authorship and support the development of better detection tools that combine machine analysis with human judgment, ultimately helping to protect academic integrity and guide ethical authorship.
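For readers unfamiliar with these measures, the Python sketch below illustrates, under simplifying assumptions, how a few of the named features (a type-token ratio, mean sentence length, and a variance-based burstiness proxy) could be extracted and fed to a classifier that ranks them as predictors of AI authorship. The feature definitions, placeholder texts, and labels are hypothetical stand-ins for illustration only and do not reproduce the study's actual pipeline.

```python
import re
import numpy as np
from sklearn.linear_model import LogisticRegression

def simple_features(text: str) -> np.ndarray:
    """Simplified versions of a few markers named in the abstract."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    ttr = len(set(tokens)) / max(len(tokens), 1)                   # type-token ratio
    sent_lens = np.array([len(s.split()) for s in sentences], dtype=float)
    mean_len = float(sent_lens.mean()) if sent_lens.size else 0.0  # sentence-complexity proxy
    # Burstiness proxy: sentence-length variance relative to the mean;
    # human writing is expected to vary more than AI-generated writing.
    burstiness = float(sent_lens.var() / mean_len) if mean_len else 0.0
    return np.array([ttr, mean_len, burstiness])

# Hypothetical placeholder corpus: in the study, this would be the 100 human-written
# and 100 ChatGPT-generated mental health abstracts (label 0 = human, 1 = AI).
texts = [
    "Human-written example abstract. It varies its sentence length quite a bit, sometimes sharply.",
    "AI-generated example abstract. Sentences are uniform. Sentences stay simple. Sentences repeat.",
]
labels = np.array([0, 1])

X = np.vstack([simple_features(t) for t in texts])
clf = LogisticRegression().fit(X, labels)  # in practice, evaluate with held-out data or cross-validation

# Coefficient magnitudes give a rough ranking of which features drive the prediction.
for name, coef in zip(["type-token ratio", "mean sentence length", "burstiness"], clf.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```

A full pipeline along these lines would add the remaining measures (readability, perplexity from a language model, sentiment, TF-IDF features, and so on) as additional columns before training and ranking predictors.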
Recommended Citation
Fu, Karen and Yang, Xingwei, "Linguistic Markers of AI-Generated Text: A Comparative Analysis of Machine-Identified and Human-Inferred Predictors" (2025). AMCIS 2025 TREOs. 1.
https://aisel.aisnet.org/treos_amcis2025/1