Abstract

Locating specific, structured information on the World Wide Web (WWW) is becoming increasingly difficult because of the rapid growth of the Web and the distributed nature of information. Although existing search engines do a good job of ranking web pages by topical relevance, they provide limited assistance to free-choice learners who want to leverage the nonlinear nature of information spaces for knowledge acquisition. We hypothesize that free-choice learners would benefit more from structured topical information spaces than from a list of individual pages spread across multiple websites. We conceptualize a within-site topical information space as a sphere formed by linked pages centered on a web page. In this paper, we investigate techniques and heuristics for forming this space. In particular, we propose a hybrid method that relies not only on content-based characteristics and user queries but also on a site's global structure. Experimental results show that taking website topology into account improves page relevance estimation, which indicates that relevant pages tend to cluster.
