Abstract

This study explores the application of artificial intelligence (AI) methods for the automated detection and classification of laryngeal pathologies in fiberoptic laryngoscopy videos. From recordings of 292 patients, a total of 885 informative image frames were automatically extracted and subsequently segmented manually by experienced clinicians. Seven distinct pathology categories were examined using two deep learning models: Mask R-CNN, designed for classification, object detection, and segmentation tasks; and EfficientNet V2L, used solely for classification. For the classification task, the imbalance-resistant F1-score averaged across classes was higher for the Mask R-CNN model, 0.95 (confidence interval, CI: 0.90–0.98), than for EfficientNet V2L, 0.74 (CI: 0.66–0.81; McNemar’s test p<0.001). In object detection, a mean average precision of 0.36 (CI: 0.35–0.37) was achieved at an intersection over union threshold of 50%. However, the segmentation models reached lower performance, with an average precision of 0.29 (CI: 0.28–0.30). In sum, for laryngeal pathology analysis, deep neural networks (DNNs) show more potential for classification than for segmentation tasks, with Mask R-CNN holding an advantage over the EfficientNet architecture.

Recommended Citation

Nowak, J., Buchwald, M., Kupinski, S., Pukacki, J., Klimza, H., Nogal, P., Jackowska, J., Wierzbicka, M. & Dyczkowski, K. (2025). Deep Neural Networks for Automatic Detection and Classification of Laryngeal Pathologies in Endoscopic Imaging. In I. Luković, S. Bjeladinović, B. Delibašić, D. Barać, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Empowering the Interdisciplinary Role of ISD in Addressing Contemporary Issues in Digital Transformation: How Data Science and Generative AI Contributes to ISD (ISD2025 Proceedings). Belgrade, Serbia: University of Gdańsk, Department of Business Informatics & University of Belgrade, Faculty of Organizational Sciences. ISBN: 978-83-972632-1-5. https://doi.org/10.62036/ISD.2025.40

Paper Type

Poster

DOI

10.62036/ISD.2025.40

