Abstract

The paper presents an innovative method for imputing missing data in historical vital statistics, focusing on parish records from two Prussian towns, Stargard and Słupsk. Several simple statistical techniques (seasonal averages, moving medians, and trend-based extrapolations) are applied, and their outputs are aggregated by predictive models, including linear and polynomial regression as well as AI-based algorithms such as Random Forest and Gradient Boosted Trees. Numerical experiments demonstrate the effectiveness of the approach, with the AI models achieving particularly low relative error rates, often below 1%. This multi-method aggregation framework significantly improves the quality of historical datasets and supports demographic analysis over extended time periods, even when records are incomplete. The approach is easily adaptable to various datasets and scales.
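The aggregation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a monthly series with gaps marked as NaN, uses three simple base estimators, and aggregates them with ordinary least squares (the simplest of the aggregators the abstract names). All function names, the window size, and the synthetic data are hypothetical choices made for illustration.

```python
import numpy as np

def seasonal_average(series, idx, period=12):
    """Mean of the observed values that share idx's position in the cycle."""
    same_season = series[idx % period::period]
    return np.nanmean(same_season)

def moving_median(series, idx, window=3):
    """Median of the observed values within `window` steps of idx."""
    lo, hi = max(0, idx - window), min(len(series), idx + window + 1)
    return np.nanmedian(series[lo:hi])

def linear_trend(series, idx):
    """Value at idx of a straight line fitted to all observed points."""
    t = np.arange(len(series))
    ok = ~np.isnan(series)
    slope, intercept = np.polyfit(t[ok], series[ok], 1)
    return slope * idx + intercept

def base_estimates(series, idx):
    """Feature vector of the three simple estimates, with idx itself hidden."""
    hidden = series.copy()
    hidden[idx] = np.nan  # never let a point predict itself
    return np.array([
        seasonal_average(hidden, idx),
        moving_median(hidden, idx),
        linear_trend(hidden, idx),
    ])

def impute(series):
    """Fill NaN gaps with a least-squares combination of the base estimates."""
    series = np.asarray(series, dtype=float)
    observed = np.flatnonzero(~np.isnan(series))
    missing = np.flatnonzero(np.isnan(series))
    # Train the aggregator on observed points, each hidden in turn.
    X = np.array([base_estimates(series, i) for i in observed])
    X = np.column_stack([np.ones(len(X)), X])  # intercept column
    coef, *_ = np.linalg.lstsq(X, series[observed], rcond=None)
    filled = series.copy()
    for i in missing:
        features = np.concatenate(([1.0], base_estimates(series, i)))
        filled[i] = features @ coef
    return filled

# Synthetic monthly counts (trend plus yearly cycle), not data from the paper.
months = np.arange(48)
counts = 30 + 0.1 * months + 5 * np.sin(2 * np.pi * months / 12)
with_gaps = counts.copy()
with_gaps[[7, 20, 33]] = np.nan
filled = impute(with_gaps)
```

Swapping the least-squares step for a tree-based regressor would follow the same pattern: the base estimates serve as features and the observed values as targets.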

Recommended Citation

Kiersztyn, A., Rachwał, P. & Kiersztyn, K. (2025). Filling the Gap in Time: Intelligent Imputation of Historical Parish Records. In I. Luković, S. Bjeladinović, B. Delibašić, D. Barać, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Empowering the Interdisciplinary Role of ISD in Addressing Contemporary Issues in Digital Transformation: How Data Science and Generative AI Contributes to ISD (ISD2025 Proceedings). Belgrade, Serbia: University of Gdańsk, Department of Business Informatics & University of Belgrade, Faculty of Organizational Sciences. ISBN: 978-83-972632-1-5. https://doi.org/10.62036/ISD.2025.52

Paper Type

Short Paper

DOI

10.62036/ISD.2025.52

