Abstract

The growing importance of IT systems implies an increased demand for reliable data. Such data can be used for different purposes, including application testing, AI training, or domain querying. Existing tools struggle to generate realistic data consistent with the business rules of the domain under consideration. The paper proposes a data generation method based on an ontology, which is treated as a description of domain knowledge. The WordNet taxonomy supports the generation process by allowing the selection of appropriate external resources to create instance properties. An ontology reasoner is used to enrich the generated properties. The proposed method has been implemented as a prototype tool capable of processing ontologies expressed in OWL 2. Tests of the tool showed that the generated data is complete and correct within the supported set of constraints. Data realism depends on the domain definition, the provided sources of data, and the instrumentation of the generation process through configuration.

Recommended Citation

Hnatkowska, B., & Kimmel, M. (2023). Data Generation Based on Domain Ontology. In A. R. da Silva, M. M. da Silva, J. Estima, C. Barry, M. Lang, H. Linger, & C. Schneider (Eds.), Information Systems Development, Organizational Aspects and Societal Trends (ISD2023 Proceedings). Lisbon, Portugal: Instituto Superior Técnico. ISBN: 978-989-33-5509-1. https://doi.org/10.62036/ISD.2023.16

Paper Type

Full Paper

DOI

10.62036/ISD.2023.16

