CONF-IRM 2019 Proceedings

Optimising HYBRIDJOIN to Process Semi-Stream Data in Near-real-time Data Warehousing

M Asif Naeem, Auckland University of Technology
Omer Aziz, NFC Institute of Engineering & Technology
Noreen Jamil, Unitec Institute of Technology

Abstract

Near-real-time data warehousing plays an essential role in decision making for organizations where the latest data must be fed from various data sources on a near-real-time basis. The stream of sales data coming from the data sources needs to be transformed into the data warehouse format using disk-based master data. This transformation is challenging because disk access is slow compared with the fast stream data. For this purpose, an adaptive semi-stream join algorithm called HYBRIDJOIN (Hybrid Join) has been presented in the literature. The algorithm uses a single buffer to load partitions from the master data. Therefore, the algorithm has to wait until the next disk partition overwrites the existing partition in the buffer. As loading a disk partition into the buffer accounts for a major share of the algorithm's total processing cost, this leaves the performance of the algorithm sub-optimal. This paper presents an optimisation of the existing HYBRIDJOIN that introduces a second buffer. This enables the algorithm to load the second buffer while the first one is under join execution. This reduces the time the algorithm waits for a master data partition to load and consequently improves the performance of the algorithm significantly.
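The two-buffer idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the real HYBRIDJOIN also maintains a hash table and a queue for stream tuples, which are omitted here. The names `load_partition` and `double_buffered_join`, the in-memory list standing in for disk-based master data, and the fixed partition size are all assumptions made for illustration. The sketch shows only how the next partition can be loaded in the background while the current one is being joined.

```python
from concurrent.futures import ThreadPoolExecutor


def load_partition(master, index, size):
    """Hypothetical stand-in for a disk read: return one partition as a dict."""
    start = index * size
    return dict(master[start:start + size])


def double_buffered_join(stream, master, partition_size):
    """Join stream tuples (key, value) with master tuples (key, attribute).

    While the active buffer is being joined, a background thread fills the
    second buffer with the next partition, so the join waits for a load only
    when that load has not finished yet.
    """
    n_partitions = (len(master) + partition_size - 1) // partition_size
    results = []
    if n_partitions == 0:
        return results

    with ThreadPoolExecutor(max_workers=1) as pool:
        # Fill the first buffer before any join work can start.
        pending = pool.submit(load_partition, master, 0, partition_size)
        for i in range(n_partitions):
            active = pending.result()  # swap: the loaded buffer becomes active
            if i + 1 < n_partitions:
                # Start loading the second buffer while the first is joined.
                pending = pool.submit(
                    load_partition, master, i + 1, partition_size
                )
            for key, value in stream:
                if key in active:
                    results.append((key, value, active[key]))
    return results
```

With a single buffer, the load and the join would run strictly one after the other. Here the two steps overlap, which is the source of the reduced waiting time that the abstract describes.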

Recommended Citation

Naeem, M Asif; Aziz, Omer; and Jamil, Noreen, "Optimising HYBRIDJOIN to Process Semi-Stream Data in Near-real-time Data Warehousing" (2019). CONF-IRM 2019 Proceedings. 27.
https://aisel.aisnet.org/confirm2019/27
