Abstract

Information extraction systems that retain only novel information (facts that differ semantically from those previously extracted) can be used to build lean knowledge bases fed from multiple, possibly overlapping sources. In previous research, the authors used natural language processing techniques to build a system that extracts financial facts from international corporate reports in the Wall Street Journal. We will enhance that system to extract the same types of financial facts from a second source of corporate financial reports: Reuters. The improved system will be more general because it can extract from multiple sources rather than just one. In addition, it will apply novelty filtering to extracted information, admitting only novel facts into the database while recording every source in which a redundant fact appeared.
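
As an illustration of the novelty filtering described above, the following is a minimal sketch and not the authors' implementation. The `Fact` fields, the `KnowledgeBase` class, and the example company are hypothetical, and the semantic comparison is replaced by an assumed stand-in: two facts are treated as equivalent when their fields match after case and whitespace normalization.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fact:
    """Hypothetical extracted financial fact; the field names are assumptions."""
    company: str
    attribute: str
    value: str


def normalize(fact: Fact) -> tuple:
    # Assumed stand-in for semantic comparison: ignore case and extra spaces.
    parts = (fact.company, fact.attribute, fact.value)
    return tuple(" ".join(p.lower().split()) for p in parts)


@dataclass
class KnowledgeBase:
    facts: dict = field(default_factory=dict)    # normalized key -> Fact
    sources: dict = field(default_factory=dict)  # normalized key -> list of sources

    def add(self, fact: Fact, source: str) -> bool:
        """Store the fact only if it is novel; always record its source.

        Returns True when the fact was novel, False when it was redundant.
        """
        key = normalize(fact)
        if key in self.facts:
            # Redundant fact: keep one copy, but remember the extra source.
            if source not in self.sources[key]:
                self.sources[key].append(source)
            return False
        self.facts[key] = fact
        self.sources[key] = [source]
        return True


# Made-up example data: the same fact reported by two sources.
kb = KnowledgeBase()
first = kb.add(Fact("Acme Corp", "net income", "12 million"), "Wall Street Journal")
second = kb.add(Fact("ACME  corp", "Net Income", "12 million"), "Reuters")
```

In this sketch the second call does not add a new database entry; it only appends "Reuters" to the list of sources for the existing fact. A real system would need a genuinely semantic equivalence test in place of `normalize`.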
