Abstract
This paper evaluates the \texttt{gpt-4-turbo} model’s proficiency in recognizing named entities within clinical trial eligibility criteria. We apply prompt learning to a dataset comprising $49\,903$ criteria from $3\,314$ trials, with $120\,906$ annotated entities in 15 classes. We compare the performance of \texttt{gpt-4-turbo} to state-of-the-art BERT-based Transformer models\footnote{Due to page limits, detailed results and code listings are presented in the supplementary material available at https://github.com/megaduks/isd24}. Contrary to expectations, BERT-based models outperform \texttt{gpt-4-turbo} after moderate fine-tuning, particularly in low-resource settings. The \texttt{CODER} model consistently surpasses the others in both low- and high-resource settings, likely owing to term normalization and extensive pre-training on the UMLS thesaurus. However, it is important to recognize that traditional NER evaluation metrics, such as precision, recall, and the $F_1$ score, can unfairly penalize generative language models even when they correctly identify entities.
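To make the prompt-learning setup concrete, the following is a minimal sketch of how a generative model such as \texttt{gpt-4-turbo} might be queried to extract entities from a single eligibility criterion. It is not the authors' implementation: the prompt wording, the example entity classes shown, and the use of the \texttt{openai} Python client are illustrative assumptions only; the supplementary material cited above contains the actual code.

\begin{verbatim}
# Illustrative sketch of prompt-based NER with gpt-4-turbo.
# Assumptions: the `openai` Python package (>= 1.0) is installed and
# OPENAI_API_KEY is set; prompt text and entity classes are hypothetical.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You annotate clinical trial eligibility criteria. "
    "Extract every named entity and assign it one of the given classes "
    "(e.g. Condition, Drug, Procedure, Value). "
    "Return a JSON list of {\"text\": ..., \"class\": ...} objects."
)

criterion = "Patients with type 2 diabetes on metformin >= 1500 mg daily."

response = client.chat.completions.create(
    model="gpt-4-turbo",
    temperature=0,  # deterministic output for reproducible evaluation
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": criterion},
    ],
)

# The returned annotations must still be aligned with gold spans before
# computing precision, recall, and F1 -- the step at which strict span
# matching can penalize otherwise correct generative outputs.
print(response.choices[0].message.content)
\end{verbatim}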
Paper Type
Poster
DOI
10.62036/ISD.2024.53
Fine-Tuned Transformers and Large Language Models for Entity Recognition in Complex Eligibility Criteria for Clinical Trials
Recommended Citation
Kantor, K. & Morzy, M. (2024). Fine-Tuned Transformers and Large Language Models for Entity Recognition in Complex Eligibility Criteria for Clinical Trials. In B. Marcinkowski, A. Przybylek, A. Jarzębowicz, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Harnessing Opportunities: Reshaping ISD in the post-COVID-19 and Generative AI Era (ISD2024 Proceedings). Gdańsk, Poland: University of Gdańsk. ISBN: 978-83-972632-0-8. https://doi.org/10.62036/ISD.2024.53