Abstract
This study explores the application of artificial intelligence (AI) methods to the automated detection and classification of laryngeal pathologies in fiberoptic laryngoscopy videos. From recordings of 292 patients, a total of 885 informative image frames were automatically extracted and subsequently segmented manually by experienced clinicians. Seven distinct pathology categories were examined using two deep learning models: Mask R-CNN, used for classification, object detection, and segmentation; and EfficientNet V2L, used solely for classification. For the classification task, the imbalance-resistant F1-score averaged across classes was higher for the Mask R-CNN model, 0.95 (confidence interval, CI: 0.90–0.98), than for EfficientNet V2L, 0.74 (CI: 0.66–0.81; McNemar’s test p<0.001). In object detection, a mean average precision of 0.36 (CI: 0.35–0.37) was achieved at an intersection over union threshold of 50%. Segmentation performance was lower, with an average precision of 0.29 (CI: 0.28–0.30). In sum, for laryngeal pathology analysis, deep neural networks (DNNs) show more potential for classification than for segmentation, with an advantage of Mask R-CNN over the EfficientNet architecture.
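For readers unfamiliar with the two metrics named in the abstract, the following is a minimal Python sketch of a class-averaged F1-score and of intersection over union (IoU) for bounding boxes. It assumes that the "across-class average" F1 corresponds to the usual macro average, which the abstract does not state explicitly. The function names `macro_f1` and `box_iou` and the `(x_min, y_min, x_max, y_max)` box layout are illustrative choices, not taken from the paper.

```python
from typing import Sequence, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores (macro average).

    Each class contributes equally regardless of how many samples it has,
    which is why this average is less sensitive to class imbalance.
    """
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Toy labels, not data from the study
    print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))
    # A detection counts as a match at the 50% threshold when IoU >= 0.5
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)) >= 0.5)
```

At an IoU threshold of 50%, a predicted region is treated as a correct detection only when it overlaps the annotated region by at least half of their combined area. Mean average precision is then computed over detections matched in this way.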
Paper Type
Poster
DOI
10.62036/ISD.2025.40
Deep Neural Networks for Automatic Detection and Classification of Laryngeal Pathologies in Endoscopic Imaging
Recommended Citation
Nowak, J., Buchwald, M., Kupinski, S., Pukacki, J., Klimza, H., Nogal, P., Jackowska, J., Wierzbicka, M. & Dyczkowski, K. (2025). Deep Neural Networks for Automatic Detection and Classification of Laryngeal Pathologies in Endoscopic Imaging. In I. Luković, S. Bjeladinović, B. Delibašić, D. Barać, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Empowering the Interdisciplinary Role of ISD in Addressing Contemporary Issues in Digital Transformation: How Data Science and Generative AI Contributes to ISD (ISD2025 Proceedings). Belgrade, Serbia: University of Gdańsk, Department of Business Informatics & University of Belgrade, Faculty of Organizational Sciences. ISBN: 978-83-972632-1-5. https://doi.org/10.62036/ISD.2025.40