Abstract

Many studies show that named entities are closely related to users' search behaviors, which brings increasing interest in studying named entities in search logs recently. This paper addresses the problem of forming fine grained semantic clusters of named entities within a broad domain such as “company”, and generating keywords for each cluster, which help users to interpret the embedded semantic information in the cluster. By exploring contexts, URLs and session IDs as features of named entities, a three-phase approach proposed in this paper first disambiguates named entities according to the features. Then it properly weights the features with a novel measurement, calculates the semantic similarity between named entities with the weighted feature space, and clusters named entities accordingly. After that, keywords for the clusters are generated using a text-oriented graph ranking algorithm. Each phase of the proposed approach solves problems that are not addressed in existing works, and experimental results obtained from a real click through data demonstrate the effectiveness of the proposed approach.

Share

COinS