Abstract

How are IT concepts related to each other, what is the best way to automatically detect these relationships, and how do such automatic methods compare with traditional methods? We address these questions by developing and evaluating two statistical natural language processing methods, co-occurrence and Kullback-Leibler (KL) divergence, used in combination with hierarchical clustering. The results of these automatic methods were then compared to a ground truth classification scheme using statistical methods as well as a survey of IT experts. Co-occurrence outperformed KL divergence according to both the statistical and survey results. Further, co-occurrence had some benefits in comparison to the ground truth, and was preferred by some of the experts included in the survey. The main contribution of this research is the demonstration that automatic methods can be used effectively to classify IT concepts, and that success does not always depend on the complexity of the methods.

Evaluating Two Automatic Methods for Classifying Information Technology Concepts
