Near-real-time data warehousing plays an essential role for decision making in organizations where latest data is to be fed from various data sources on near-real-time basis. The stream of sales data coming from data sources needs to be transformed to the data warehouse format using disk-based master data. This transformation process is a challenging task due to slow disk access rate as compare to the fast stream data. For this purpose, an adaptive semi-stream join algorithm called HYBRIDJOIN (Hybrid Join) is presented in the literature. The algorithm uses a single buffer to load partitions from the master data. Therefore, the algorithm has to wait until the next disk partition overwrites the existing partition in the buffer. As the cost of loading the disk partition into the buffer is a major cost in the total algorithm’s processing cost, this leaves the performance of the algorithm sub-optimal. This paper presents optimisation of existing HYBRIDJOIN by introducing another buffer. This enables the algorithm to load the second buffer while the first one is under join execution. This reduces the time that the algorithm wait for loading of master data partition and consequently, this improves the performance of the algorithm significantly.
Naeem, M Asif; Aziz, Omer; and Jamil, Noreen, "Optimising HYBRIDJOIN to Process Semi-Stream Data in Near-real-time Data Warehousing" (2019). CONF-IRM 2019 Proceedings. 27.