Abstract
Genealogical research relies on historical records, most of which are digitized as images without transcription or indexing. This makes searching them slow, manual, and error-prone. The RecordIndex project proposes an automated solution for transcribing handwritten genealogical documents by combining Handwritten Text Recognition (HTR) with Large Language Models (LLMs). The methodology comprises collecting historical records in Portuguese, generating preliminary transcriptions with Transkribus, training an HTR model with PyLaia, and post-processing the output with local LLMs. Text similarity techniques were also applied to group complete records and preserve the document structure. Testing showed that the system is effective at organizing the data and improving its readability, despite limitations in the transcription of individual names. The results highlight the tool's potential for applications in genealogy, history, and document preservation.
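The abstract does not specify which text similarity technique was used to group records. As an illustration only, the following minimal sketch shows one way such grouping could work: each transcribed line is compared against a record-opening formula, and a sufficiently similar line starts a new record, which tolerates small HTR spelling errors. The opening phrase `certifico que`, the function names, and the `0.6` threshold are all assumptions made for this example and are not taken from the RecordIndex system.

```python
from difflib import SequenceMatcher

# Hypothetical opening formula (assumption); real registers may open differently.
RECORD_OPENING = "certifico que"


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_records(lines, opening=RECORD_OPENING, threshold=0.6):
    """Group transcribed lines into records.

    A line starts a new record when its leading characters are similar
    enough to the opening formula; otherwise it is appended to the
    current record. The threshold is illustrative, not tuned.
    """
    records = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines left by the transcription
        prefix = text[:len(opening)]
        starts_new = similarity(prefix, opening) >= threshold
        if starts_new or not records:
            records.append([text])
        else:
            records[-1].append(text)
    return records


if __name__ == "__main__":
    sample = [
        "Certifico que no dia cinco de maio",
        "nasceu Maria, filha de Jose",
        "",
        "Certfico que no dia dez de junho",  # simulated HTR misspelling
        "Padrinhos: Joao e Ana",
    ]
    for i, record in enumerate(group_records(sample), start=1):
        print(i, record)
```

Matching on a fuzzy ratio rather than an exact prefix is what lets the misspelled opening in the sample still start a new record; in practice the formula and threshold would need to be chosen per collection.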
Recommended Citation
Gleizer, Alan Meniuk and de Oliveira, Ivan Carlos Alcântara, "RecordIndex: Um Sistema De Transcrição Automatizada De Registros Genealógicos Manuscritos" (2025). ISLA 2025 Proceedings. 25.
https://aisel.aisnet.org/isla2025/25