Abstract
The common belief that more data yields better results often means that all available data is used in pursuit of the best possible decision. However, the age of data can strongly affect data-driven decision making. The desire for both a larger data volume and more recent data therefore creates the “volume vs. age” dilemma, which has not yet been sufficiently researched. In this work, we rigorously investigate this dilemma for textual data in four experiments on real-world customer reviews from the Yelp platform. Contributing to theory and practice, we show that more data is not always better: the effect of data age can outweigh the effect of data volume, resulting in poorer overall performance. Moreover, we demonstrate that different aspects within textual data can exhibit different temporal effects and that accounting for these effects when selecting training data can clearly outperform existing practices.
Recommended Citation
Hägele, Lukas; Klier, Mathias; Obermeier, Andreas; and Widmann, Torben, "Age Ain’t Just a Number: Exploring the Volume vs. Age Dilemma for Textual Data to Enhance Decision Making" (2024). Wirtschaftsinformatik 2024 Proceedings. 17.
https://aisel.aisnet.org/wi2024/17