Paper Number
ICIS2025-1538
Paper Type
Short
Abstract
This research presents a computational framework that leverages crowd-sourced video data and a dynamic attention mechanism to detect and simulate anomalies encountered by collaborative robots, enhancing their navigational capabilities. To address data scarcity in robotics research, the study introduces three novel attention mechanisms—Relevance-Driven, Group-Focused, and Context-Modulated Attention—informed by the Contour Detector Theory of Visual-Spatial Attention. These mechanisms selectively extract salient visual anomalies from social media videos and efficiently generate structured prompts for fine-tuning a multimodal large language model. Synthetic simulation environments are then created for robot training using reinforcement learning techniques. Experimental comparisons demonstrate that simulations derived from crowd-sourced videos showcasing anomalies yield better robotic navigation outcomes than baseline and randomized models, underscoring the value of representing real-world anomalies. This integration of crowd intelligence and AI methodologies enables scalable and effective robot training, contributing to safer and more efficient human-robot interactions.
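The abstract describes the pipeline only at a high level, so the following is a minimal, purely illustrative sketch of how a relevance-driven attention step might score anomaly-relevant video frames and assemble a structured prompt for a multimodal model. The frame-embedding inputs, function names, temperature parameter, and prompt format are all assumptions made for illustration; none are taken from the authors' implementation.

```python
import numpy as np

def relevance_attention(frame_embeddings, anomaly_query, temperature=0.1):
    """Score each frame by softmax-normalized cosine similarity to an anomaly query vector."""
    # Normalize frame embeddings and the query, then take dot products (cosine similarity).
    frames = frame_embeddings / np.linalg.norm(frame_embeddings, axis=1, keepdims=True)
    query = anomaly_query / np.linalg.norm(anomaly_query)
    scores = frames @ query
    # A softmax turns raw similarities into attention weights over frames.
    weights = np.exp(scores / temperature)
    return weights / weights.sum()

def select_salient_frames(frame_embeddings, anomaly_query, k=3):
    """Return the indices of the k frames with the highest attention weights."""
    weights = relevance_attention(frame_embeddings, anomaly_query)
    return np.argsort(weights)[::-1][:k], weights

def build_prompt(frame_ids, weights, video_id):
    """Format the selected frames into a structured text prompt (hypothetical format)."""
    lines = [f"Video {video_id}: candidate anomaly frames (attention weight in parentheses):"]
    for idx in frame_ids:
        lines.append(f"  - frame {idx} ({weights[idx]:.2f})")
    lines.append("Describe the anomaly and how a mobile robot should respond.")
    return "\n".join(lines)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(30, 512))   # stand-in for per-frame visual embeddings
    query = rng.normal(size=512)              # stand-in for an "anomaly" query embedding
    top, w = select_salient_frames(embeddings, query, k=3)
    print(build_prompt(top, w, video_id="clip_001"))
```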
Recommended Citation
Benjamin, Victor, "Dynamic Attention Mechanism for Robot Video Anomaly Detection" (2025). ICIS 2025 Proceedings. 11.
https://aisel.aisnet.org/icis2025/gen_ai/gen_ai/11
Dynamic Attention Mechanism for Robot Video Anomaly Detection
Comments
12-GenAI