Abstract

With the pervasion of digital textual data, text mining is becoming more and more important to deriving competitive advantages. One factor for successful text mining applications is the ability of finding significant topical terms for discovering interesting patterns or relationships. Document keyphrases are phrases carrying the most important topical concepts for a given document. In many applications, keyphrases as textual elements are better suited for text mining and could provide more discriminating power than single words. This paper describes an automatic keyphrase identification program (KIP). KIP’s algorithm examines the composition of noun phrases and calculates their scores by looking up a domain-specific glossary database; the ones with higher scores are extracted as keyphrases. KIP’s learning function can enrich its glossary database by automatically adding new identified keyphrases. KIP’s personalization feature allows the user build a glossary database specifically suitable for the area of his/her interest.

Share

COinS