Abstract

Data lineage is the set of techniques for tracking the flow of data throughout its lifecycle. These techniques are crucial for data management, governance, and compliance with regulations. Lineage links are maintained between data and database objects, but they are often broken by temporary objects and user-defined functions. To the best of our knowledge, discovering broken lineage links has not yet been addressed in research. In this paper, we present a method for detecting broken lineage links between database objects. To this end, we apply machine learning techniques to the available metadata. We extract feature vectors and employ a classification approach to determine whether one database object is a source for another. Initial experiments on large database schemas show that broken lineage links can be discovered with an acceptably high probability.

Recommended Citation

Boiński, P., Andrzejewski, W., Grocholewski, M., Gruszczyński, T. & Wrembel, R. (2025). Leveraging machine learning techniques for discovering broken lineage links between database objects. In I. Luković, S. Bjeladinović, B. Delibašić, D. Barać, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Empowering the Interdisciplinary Role of ISD in Addressing Contemporary Issues in Digital Transformation: How Data Science and Generative AI Contributes to ISD (ISD2025 Proceedings). Belgrade, Serbia: University of Gdańsk, Department of Business Informatics & University of Belgrade, Faculty of Organizational Sciences. ISBN: 978-83-972632-1-5. https://doi.org/10.62036/ISD.2025.109

Paper Type

Short Paper

DOI

10.62036/ISD.2025.109
