Paper Type

Complete

Abstract

This paper proposes a generative AI framework that classifies magnetic resonance imaging (MRI) scans into four diagnostic classes: glioma, meningioma, non-tumor, and pituitary. The system uses a multi-agent workflow built with LangChain, LangGraph, and Google Gemini 2.5 Pro in an inference-only setting, without any model fine-tuning. The dataset comprises 13,351 MRI images: 10,680 available for few-shot exemplar selection and 2,671 held out for testing. The workflow combines multimodal few-shot prompting, deterministic state routing, and strict JSON output validation. The results show that a large language model (LLM)-driven approach can deliver useful assistive classification performance for image-conditioned inference, while still falling short of the reliability required for clinically acceptable autonomous diagnosis. The paper presents the dataset, system architecture, prompting strategy, classification workflow, and evaluation results, and situates the artifact within a broader discussion of AI reliability, incomplete knowledge, and the need for human-in-the-loop oversight in medical applications.
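
To make the combination of strict JSON output validation and deterministic state routing more concrete, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the reply schema (a JSON object with the single key `label`), the state names `done`, `retry`, and `human_review`, and the `max_attempts` limit are all assumptions introduced for illustration. Only the four class names come from the abstract.

```python
import json

# The four diagnostic classes named in the abstract.
LABELS = ("glioma", "meningioma", "non-tumor", "pituitary")


def validate_reply(raw: str) -> str:
    """Return the label if `raw` is exactly {"label": <one of LABELS>}.

    The single-key schema is a hypothetical choice for this sketch.
    Any deviation (free text, extra keys, unknown label) raises ValueError.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != {"label"}:
        raise ValueError("expected a JSON object with the single key 'label'")
    label = data["label"]
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return label


def route(raw: str, attempt: int, max_attempts: int = 3) -> str:
    """Pick the next workflow state from the reply and attempt count alone.

    The state names and the retry limit are assumptions. The same inputs
    always yield the same state, which is what makes the routing deterministic.
    """
    try:
        validate_reply(raw)
    except ValueError:
        # Retry while attempts remain, then hand the case to a human reviewer.
        return "retry" if attempt + 1 < max_attempts else "human_review"
    return "done"
```

In this sketch, a reply such as `{"label": "glioma"}` routes to `done`, whereas a free-text reply routes to `retry` and, once the assumed attempt limit is reached, to `human_review`. In a graph-based workflow, a function of this shape could plausibly serve as the condition that selects the next node.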

Paper Number

1818

Comments

SIG DSA

Aug 15th, 12:00 AM

Generative AI Multi-Agent Framework for Brain Tumor Classification
