Abstract

Language is not just a medium for transmitting information; it also embodies the subtleties of human thought, culture, and emotion. Word choice, sentence construction, and narrative style all provide insight into an author's intentions and biases and into the credibility of the content presented (Molina et al., 2021). Fake news articles, for instance, often use specific pronouns, adjectives, and verbs more frequently (Grieve & Woodfield, 2023). Analyzing the semantic characteristics of a large text corpus with advanced natural language processing (NLP) techniques is challenging, particularly when the goal is to classify fake news, which spreads rapidly online. This study investigates how linguistic elements such as parts of speech, sentiment, and subjectivity, among others, can help differentiate fake news from genuine articles. An initial review of 6,892 political articles, 32% of which were identified as fake, revealed patterns that could assist in predicting fake news. For example, articles that express negative sentiment and use a higher proportion of adjectives are more likely to be fake. Moreover, a one percentage point increase in the share of unique verbs is associated with a 78% increase in the likelihood of an article being fake. The presence of grammatical errors is also strongly linked to the likelihood of an article being fake. Furthermore, using topic modeling techniques (Ahammad, 2024), the study found that the correlation between individual linguistic features and fake news varies across topics. This preliminary evidence underscores the need for further research in this field and suggests that large language models could be effective in extracting key linguistic features associated with fake news, thereby aiding its early and efficient detection online.
