Abstract

Recently, text mining has received special attention from both researchers and practitioners, since it enables the development of intelligent and automated services. Text mining has been influenced by different disciplines such as computer science, statistics, computational linguistics, and library and information sciences. However, text mining features that evolved in one particular discipline are often unknown or rarely used in the others. No scientific feature framework exists that would facilitate the costly process of feature engineering and evaluation. Therefore, we aim to develop a novel text mining feature taxonomy that helps researchers and practitioners develop, refine, compare, and evaluate their text mining studies. In this research-in-progress paper, we lay the foundation for our taxonomy development by presenting our first two research cycles, in which we aimed for diversity rather than completeness. We derived five dimensions and classified different text features accordingly to provide a deeper understanding of these features.
