Abstract

Information extraction is the process of extracting relevant data, in a specified structured format, from semi-structured and unstructured data sources. Extracting information from a collection of unstructured documents while allowing a reasonable range of fault tolerance is a challenging problem. Existing methods include statistical training approaches that require enormous training time and yield trained models that are biased toward erroneous data. To avoid these weaknesses, we propose a similarity maximization methodology that requires very little human coding and uses an integer-programming (IP) framework to extract the appropriate information.

Recommended Citation

Sheikh, Mahmudul and Conlon, Sumali, "Extracting Information from a Domain of Unstructured Data" (2005). AMCIS 2005 Proceedings. 26.
https://aisel.aisnet.org/amcis2005/26

AMCIS 2005 Proceedings

Extracting Information from a Domain of Unstructured Data
