Abstract

As the volume of available data increases exponentially, traditional data warehouses struggle to transform this data into actionable knowledge. This study explores the potentialities of Hadoop as a data transformation tool in the setting of a traditional data warehouse environment. Hadoop’s distributed parallel execution model and horizontal scalability offer great capabilities when the amounts of data to be processed require the infrastructure to expand. Through a typification of the SQL statements, responsible for the data transformation processes, we were able to understand that Hadoop, and its distributed processing model, delivers outstanding performance results associated with the analytical layer, namely in the aggregation of large data sets. We demonstrate, empirically, the performance gains that can be extracted from Hadoop, in comparison to a Relational Database Management System, regarding speed, storage usage, and scalability potential, and suggest how this can be used to evolve data warehouses into the age of Big Data.

Share

COinS