Track 01: Artificial Intelligence and Machine Learning

From News to Data: A Hybrid LLM Pipeline for Robust Person-Centric Information Extraction

Paper Type

Complete

Paper Number

PACIS2026-2135

Description

Analyzing media representation requires scalable data extraction, but proprietary LLMs pose data governance and cost risks, limiting adoption in journalism. We address this by developing and evaluating a locally deployable, hybrid LLM pipeline using small, open-source models to extract attributes such as occupations, quotes, and demographics. We test this architecture on 200 human-annotated German news articles from a 100,000-article corpus, spanning diverse genres such as interviews, reports, and reviews. We compare its performance against monolithic, single-prompt LLM baselines such as GPT-4o. Results show the hybrid pipeline significantly outperforms all baselines, solving the critical recall deficits of single-prompt methods. It remains robust to genre variations while baseline performance degrades on narrative texts. This research demonstrates that strategic task decomposition within a local LLM pipeline yields superior extraction performance, establishing a highly accurate and governable alternative to commercial LLMs.

Comments

01-AIML

Recommended Citation

Yu, Joe; Born, Nadja; and Treffers, Theresa, "From News to Data: A Hybrid LLM Pipeline for Robust Person-Centric Information Extraction" (2026). PACIS 2026 Proceedings. 21.
https://aisel.aisnet.org/pacis2026/ai_ml/ai_ml/21

Jul 5th, 12:00 AM
