Abstract

Named Entity Recognition (NER) is the computational task of identifying mentions of real-world entities in text documents. An open research challenge is to identify these entities and use them to improve downstream NLP applications. This paper proposes a method that clusters the prominent names of people and organizations in a text corpus based on their semantic similarity. The method relies on common named entity recognition techniques and word embedding models: semantic similarity scores computed from the embeddings of named entities are used to cluster similar entities of the person and organization types. A human judge evaluated ten variations of the method after it was run on a corpus of 4,821 articles on a specific topic. Performance was measured using three quantitative metrics, and the results on all three indicate that the method is effective in clustering semantically similar named entities.
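As a rough illustration of the similarity-based clustering step, the sketch below groups entity names whose embedding vectors are close under cosine similarity. It is not the paper's implementation: the abstract does not name the similarity measure, the clustering algorithm, or any threshold, so cosine similarity, single-link grouping, the 0.8 cutoff, the entity names, and the three-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a trained word embedding model and the names from an NER tagger.

```python
import math
from itertools import combinations

# Hypothetical toy vectors standing in for word embeddings of entities
# that an NER step has already extracted (names and values are made up).
EMBEDDINGS = {
    "Alice Smith": [0.9, 0.1, 0.0],
    "Bob Jones": [0.8, 0.2, 0.1],
    "Acme Corp": [0.1, 0.9, 0.2],
    "Globex Inc": [0.0, 0.8, 0.3],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_entities(embeddings, threshold=0.8):
    """Single-link grouping: two entities share a cluster if they are
    connected by a chain of pairs with similarity >= threshold.
    The threshold value is an assumption, not taken from the paper."""
    parent = {name: name for name in embeddings}

    def find(name):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(embeddings, 2):
        if cosine_similarity(embeddings[a], embeddings[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in embeddings:
        clusters.setdefault(find(name), []).append(name)
    # Sort members and clusters so the output is deterministic.
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    for members in cluster_entities(EMBEDDINGS):
        print(members)
```

With these toy vectors the two person names fall into one cluster and the two organization names into another. Raising the threshold yields smaller, tighter clusters, and lowering it merges more entities.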

Clustering Prominent Named Entities in Topic-Specific Text Corpora
