Journal of the Midwest Association for Information Systems (JMWAIS)
Abstract
Data scarcity poses a significant challenge for training robust machine learning models in safety-critical applications like snow detection, where real-world data collection is often limited and seasonal. This study explores the potential of synthetic data sets generated by prompt-based image synthesis models to enhance machine learning applications in such data-scarce environments. Using OpenAI's DALL·E 3 and xAI's Aurora, synthetic images of snowy and clear sidewalks were compared against a real-world data set for training image-classification models. The findings reveal that an Aurora-based model achieved the highest F2 scores, excelling in snow detection because of its high photorealism and contextual relevance. However, the real-world data set demonstrated greater accuracy in detecting clear sidewalks, resulting in fewer overall classification errors. These results highlight the potential of synthetic data to supplement real-world data sets, particularly in data-scarce domains, while also emphasizing that real-world data remains crucial for balanced classification. This research underscores the necessity for advancements in generative models to more effectively capture complex environmental conditions and improve the generalizability of AI-generated data sets for scalable and practical machine learning applications.
Recommended Citation
de Deijn, Ricardo and Bukralia, Rajeev (2025) "Leveraging Synthetic Data from Generative Models for Snow Detection in Data-Scarce Environments," Journal of the Midwest Association for Information Systems (JMWAIS): Vol. 2025: Iss. 2, Article 3.
DOI: 10.17705/3jmwa.000095
Available at: https://aisel.aisnet.org/jmwais/vol2025/iss2/3
DOI
10.17705/3jmwa.000095