Abstract

Document clustering is an intentional act: it should reflect an individual’s preferences regarding the semantic coherency or relevant categorization of documents, and it should conform to the context of the target task under investigation. Effective document-clustering techniques therefore need to take into account a user’s categorization context, as defined by or relevant to that target task. However, existing document-clustering techniques generally rely on purely content-based analysis and thus cannot support context-aware document clustering. In response, we propose a Context-Aware document-Clustering (CAC) technique that takes into consideration a user’s categorization preference (expressed as a list of anchoring terms) relevant to the context of a target task and then generates a set of document clusters from this specific contextual perspective. Our empirical evaluation results suggest that the proposed CAC technique outperforms the purely content-based document-clustering technique.
