Abstract

We present an approach for identifying persistently misclassified images in real-world thyroid ultrasound data. Using 484 images of thyroid nodules, we evaluated four different convolutional neural network architectures. Persistent misclassification refers to images that are repeatedly misclassified across models and cross-validation folds. These cases are validated by an experienced radiologist and subjected to Grad-CAM analysis. The results confirm that images with a negative impact on model performance often exhibit atypical or ambiguous features. We emphasize that persistent misclassification is an important source of diagnostic error, independent of model choice. Recognizing misleading cases is crucial for dataset quality, model robustness, and the trustworthiness of AI systems in clinical applications. This work highlights the need to incorporate data validation strategies alongside standard performance metrics in the development of deep learning models.
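The selection rule sketched in the abstract (flagging images that are misclassified across several models and cross-validation folds) could be implemented along the following lines. This is an illustrative sketch only: the function name, the data layout, and the 75% threshold are assumptions, not the authors' actual implementation.

```python
from collections import Counter

def find_persistent_misclassifications(predictions, labels, threshold=0.75):
    """Flag images misclassified in at least `threshold` of the
    (model, fold) evaluations in which they appear.

    predictions: dict mapping (model_name, fold_id) -> {image_id: predicted label}
    labels:      dict mapping image_id -> true label
    """
    errors = Counter()  # image_id -> number of misclassifications
    seen = Counter()    # image_id -> number of evaluations
    for (_model, _fold), preds in predictions.items():
        for image_id, pred in preds.items():
            seen[image_id] += 1
            if pred != labels[image_id]:
                errors[image_id] += 1
    # Persistently misclassified: wrong in >= threshold of evaluations
    return sorted(img for img in seen if errors[img] / seen[img] >= threshold)
```

For example, with two hypothetical models evaluated on two folds each, an image predicted wrongly in all four runs would be flagged, while one wrong in only a single run would not.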

Recommended Citation

Rafało, M. & Żyłka, A. (2025). Persistent Misclassification Analysis for Improving Thyroid Cancer Classification from Ultrasound Images. In I. Luković, S. Bjeladinović, B. Delibašić, D. Barać, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Empowering the Interdisciplinary Role of ISD in Addressing Contemporary Issues in Digital Transformation: How Data Science and Generative AI Contributes to ISD (ISD2025 Proceedings). Belgrade, Serbia: University of Gdańsk, Department of Business Informatics & University of Belgrade, Faculty of Organizational Sciences. ISBN: 978-83-972632-1-5. https://doi.org/10.62036/ISD.2025.74

Paper Type

Full Paper

DOI

10.62036/ISD.2025.74

