Abstract
We present an approach for identifying persistently misclassified images in real-world thyroid ultrasound data. Using 484 images of thyroid nodules, we evaluate four different convolutional neural network architectures. Persistently misclassified images are defined as those repeatedly misclassified across models and cross-validation folds. These cases are validated by an experienced radiologist and subjected to Grad-CAM analysis. The results confirm that images that negatively affect model results often exhibit atypical or ambiguous features. We emphasize that persistent misclassification is an important source of diagnostic error, independent of model choice. Recognizing misleading cases is crucial for dataset quality, model robustness, and the trustworthiness of AI systems in clinical applications. This work highlights the need to incorporate data validation strategies alongside standard performance metrics in the development of deep learning models.
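The definition of persistent misclassification given above can be illustrated with a minimal sketch. It assumes that each prediction is recorded as a tuple of image identifier, model name, fold index, true label, and predicted label, and that an image counts as persistently misclassified when its error rate across all evaluations reaches a chosen threshold. The function name, the record layout, the toy data, and the 0.75 threshold are all hypothetical choices made for illustration; the abstract does not state the exact criterion used in the paper.

```python
from collections import defaultdict


def persistent_misclassifications(records, min_error_rate=0.75):
    """Return {image_id: error_rate} for images whose error rate across
    all recorded evaluations is at least `min_error_rate`.

    `records` is an iterable of
    (image_id, model_name, fold, y_true, y_pred) tuples.
    The threshold is an illustrative assumption, not the paper's value.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for image_id, _model, _fold, y_true, y_pred in records:
        totals[image_id] += 1
        if y_true != y_pred:
            errors[image_id] += 1
    return {
        image_id: errors[image_id] / totals[image_id]
        for image_id in totals
        if errors[image_id] / totals[image_id] >= min_error_rate
    }


# Toy records (hypothetical): two images, four models each.
records = [
    ("img_001", "model_a", 0, 1, 0),
    ("img_001", "model_b", 0, 1, 0),
    ("img_001", "model_c", 0, 1, 0),
    ("img_001", "model_d", 0, 1, 1),
    ("img_002", "model_a", 1, 0, 0),
    ("img_002", "model_b", 1, 0, 1),
    ("img_002", "model_c", 1, 0, 0),
    ("img_002", "model_d", 1, 0, 0),
]

flagged = persistent_misclassifications(records)
print(flagged)  # {'img_001': 0.75}
```

In this toy example only `img_001` is flagged, because three of its four evaluations are wrong. Images flagged this way would then be candidates for expert review and Grad-CAM inspection, as the abstract describes.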
Paper Type
Full Paper
DOI
10.62036/ISD.2025.74
Persistent Misclassification Analysis for Improving Thyroid Cancer Classification from Ultrasound Images
Recommended Citation
Rafało, M. & Żyłka, A. (2025). Persistent Misclassification Analysis for Improving Thyroid Cancer Classification from Ultrasound Images. In I. Luković, S. Bjeladinović, B. Delibašić, D. Barać, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Empowering the Interdisciplinary Role of ISD in Addressing Contemporary Issues in Digital Transformation: How Data Science and Generative AI Contributes to ISD (ISD2025 Proceedings). Belgrade, Serbia: University of Gdańsk, Department of Business Informatics & University of Belgrade, Faculty of Organizational Sciences. ISBN: 978-83-972632-1-5. https://doi.org/10.62036/ISD.2025.74