Description

Lexicons are dictionaries of sentiment words and their matching polarity. Some assign each word a numerical score that reflects the degree of positivity or negativity of the underlying sentiment; because each lexicon has its own scoring process, the score ranges differ. Others label words with polarity tags (i.e., positive/negative/neutral) instead of scores. Lexicons are important in text mining and sentiment analysis, which motivates researchers to develop and publish them. Larger lexicons train better sentiment models, which in turn classify sentiments in text more accurately. Hence, it is useful to combine the various available lexicons. However, these lexicons contain many duplicates, overlaps, and contradictions. In this paper, we define a method to combine different lexicons. We used the method to normalize and unify lexicon items and to merge duplicated items from twelve lexicons for formal and informal Arabic. This resulted in a coherent Arabic sentiment lexicon with the largest number of terms.
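The abstract does not spell out the combination procedure, so what follows is only a minimal sketch under stated assumptions, not the authors' method. It assumes that numeric scores can be rescaled linearly from each lexicon's own range to [-1, 1] (treating the range midpoint as neutral), that polarity labels map to 1/0/-1, that terms can be normalized by stripping Arabic diacritics and tatweel and unifying alef variants, that duplicates are merged by averaging, and that a term whose sources disagree in sign counts as a contradiction. The names `combine`, `normalize_term`, `normalize_score`, and `LABEL_SCORES` are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed mapping from polarity tags to scores (hypothetical, not from the paper).
LABEL_SCORES = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}

# Arabic diacritics (U+064B..U+0652) and tatweel (U+0640) are removed.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Alef with madda/hamza (U+0622, U+0623, U+0625) is unified to bare alef.
_ALEF = re.compile(r"[\u0622\u0623\u0625]")


def normalize_term(term: str) -> str:
    """Assumed orthographic normalization so spelling variants collide."""
    term = _DIACRITICS.sub("", term.strip())
    return _ALEF.sub("\u0627", term)


def normalize_score(value, score_range=None) -> float:
    """Map a label, or a score in [lo, hi], onto a common [-1, 1] scale."""
    if isinstance(value, str):
        return LABEL_SCORES[value.lower()]
    lo, hi = score_range
    return 2.0 * (value - lo) / (hi - lo) - 1.0


def combine(lexicons):
    """lexicons: list of (entries, score_range); score_range is None for labelled lexicons.

    Returns the merged lexicon (mean normalized score per term) and the set of
    terms whose sources disagree on polarity sign.
    """
    collected = defaultdict(list)
    for entries, score_range in lexicons:
        for term, value in entries.items():
            collected[normalize_term(term)].append(normalize_score(value, score_range))

    merged, conflicts = {}, set()
    for term, scores in collected.items():
        if any(s > 0 for s in scores) and any(s < 0 for s in scores):
            conflicts.add(term)  # contradictory polarity across lexicons
        merged[term] = mean(scores)
    return merged, conflicts
```

Averaging is just one possible merge rule; a real combination might instead keep conflicting terms out of the unified lexicon or resolve them by majority vote.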


Aug 10th, 12:00 AM

Combining Sentiment Lexicons of Arabic Terms