Clustering over categorical attributes is an important yet difficult task. In this paper, we present a new algorithm, K-meansⅡ, that extends the well-known K-means algorithm, which is efficient but limited to numerical data, by introducing new cluster center definitions and new similarity measures. Our algorithm can therefore be applied to categorical clustering while preserving the efficiency of K-means. Experiments on both real-life and synthetic datasets show that K-meansⅡ produces high-quality results and scales well.
Tang, Chunbin and Zhao, Weidong, "A New Clustering Algorithm for Categorical Attributes" (2004). ICEB 2004 Proceedings. 219.