This paper describes our recent efforts in developing a text segmentation technique in our business document management system. The document analysis is based upon a knowledge-based analysis of the documents’ contents, by automating the coherence identification process, without a full semantic understanding. In the technique, document boundaries can be identified by observing the shifts of segments from one cluster to another. Our experimental results show that the combination of the heterogeneous knowledge is capable to address the topic shifts. Given the increasing recognition of document structure in the fields of information retrieval as well as knowledge management, this approach provides a quantitative model and automatic classification of documents in a business document management system. This will beneficial to the distribution of documents or automatic launching of business processes in a workflow management system.
Chan, Samuel W.K., "Coherence Identification of Business Documents: Towards an Automated Message Processing System" (2001). ICEB 2001 Proceedings. 166.