Abstract

Multi-structured documents (MSDs) are documents whose structure is composed of a set of concurrent hierarchical structures. Several distinct structures, such as a logical structure and a physical structure, may be defined simultaneously for the same original document. Each hierarchy analyses the text from a different point of view, which depends on a different use of that text. These structures may overlap over the document content.

XML has become the most widely used language for encoding electronic documents. XML documents are tree-based, and because different structures may overlap, a single tree hierarchy can encode a document according to only one structure.

Some applications need to consider more than one hierarchy over the same text, each corresponding to a different analysis for a different use of the document. When several structures must be represented, maintaining a separate version of the same information for each structure is not only ineffective and expensive in time and resources, but also does not allow, for example, a search for information that relates to two different structures of the same document.

One notable solution to this problem is a generic model called the Multi-Structure Document Model (MSDM), which is independent of any encoding formalism. In practice, MSDM is encoded with a formalism called MultiX, which uses XML syntax. MultiX serializes the MSDM model into XML and expresses the different structures and their correspondences in a single XML file. However, it retains some complexity because it must conform to the XML tree model.

In this paper, we present how to encode MSDs based on MSDM but by means of a non-tree-based (graph-based) data model. We use the Web Ontology Language (OWL) to represent the metadata that corresponds to the XML schema in MultiX. To illustrate our work, we choose as a running example an application in philology (the science dedicated to the study of text history). The example is a fragment of an old manuscript written in the Occitan language.

Keywords: Multi-Structured documents, XML, MSDM, MultiX, OWL, encoding manuscripts.
