Paper Number
ECIS2026-1673
Paper Type
SP
Abstract
This study addresses a central challenge in multimodal AI training data preparation: integrating privacy, copyright, and utility at the preprocessing stage. Existing approaches typically treat these regulatory dimensions in isolation, whereas our Compliance-Aware Data Pipeline (CAP) unifies them within a coherent technical artifact. Building on seven design requirements from the literature, the study develops a minimum viable product as a modular multi-agent pipeline that detects, transforms, and evaluates sensitive data towards compliance while preserving utility. An empirical evaluation using the Hateful Memes dataset confirms the expected compliance-utility trade-off, where higher compliance corresponds to reduced semantic proximity. Grounded in Design Science Research, the approach demonstrates the technical feasibility of proactive compliance and provides a foundation for further iterations incorporating additional modalities, human-in-the-loop mechanisms, and legal evaluation.
Recommended Citation
Riekers, Nils and Risius, Marten, "A Design Science Research Approach Towards Compliance-Aware Multimodal AI Training Data Preparation: Integrating Privacy, Copyright, and Utility" (2026). ECIS 2026 Proceedings. 9.
https://aisel.aisnet.org/ecis2026/datasc_isresearch/datasc_isresearch/9