Paper Type
Complete
Abstract
This paper proposes a generative AI framework that classifies magnetic resonance imaging (MRI) scans into four diagnostic classes: glioma, meningioma, non-tumor, and pituitary. The system uses a multi-agent workflow built with LangChain, LangGraph, and Google Gemini 2.5 Pro in an inference-only setting, without any model fine-tuning. The dataset comprises 13,351 MRI images: 10,680 available for few-shot exemplar selection and 2,671 held-out test images used for evaluation. The workflow combines multimodal few-shot prompting, deterministic state routing, and strict JSON output validation. The results show that a large language model (LLM)-driven approach can deliver classification performance that is useful in an assistive role for image-conditioned inference, while still falling short of what clinically acceptable autonomous diagnosis requires. The paper presents the dataset, system architecture, prompting strategy, classification workflow, and evaluation results, and situates the artifact within a broader discussion of AI reliability, incomplete knowledge, and the need for human-in-the-loop oversight in medical applications.
Paper Number
1818
Recommended Citation
Halder, Arnab and Nguyen, Thuan, "Generative AI Multi-Agent Framework for Brain Tumor Classification" (2026). AMCIS 2026 Proceedings. 24.
https://aisel.aisnet.org/amcis2026/sig_dsa/sig_dsa/24
Generative AI Multi-Agent Framework for Brain Tumor Classification
Comments
SIG DSA