Abstract

Recurrence after lung cancer treatment is seldom recorded in population registries, limiting their value for research and decision support. This study introduces and externally validates a survival-based information systems framework that reconstructs recurrence trajectories from heterogeneous clinical annotations and cause-of-death data. The approach was applied to The Cancer Genome Atlas (TCGA), which includes explicit recurrence fields, enabling validation against observed outcomes. Stage-specific trajectories were estimated using Kaplan–Meier survival analysis and Aalen–Johansen competing-risk functions. The reconstructed trajectories showed close concordance with ground-truth annotations, with absolute differences consistently within ±5% at two-, three-, and five-year horizons. Clinically, the framework provides calibrated recurrence probabilities to support surveillance and patient counselling. From an information systems perspective, it delivers a reproducible artefact that standardises fragmented registry data, reconstructs missing endpoints, and generates analysis-ready outputs for predictive modelling and decision-support systems.
