Abstract

This paper presents a process for enriching the representation of web documents in order to populate an information warehouse and to support more precise queries than web search engines allow. The warehouse is stored in an object-oriented database (OODB) so that powerful set-based query languages can be used. One of the main contributions of the paper is the enrichment of HTML documents as they are loaded into the warehouse. This enrichment is based on decomposing each document and indexing its components. Both processes take into account the logical structure, the hyperlink structure, and the appearance of the web documents. A prototype has been developed using the O2 OODBMS.
