Paper Type
Complete Research Paper
DATA STREAM MINING WITH MULTIPLE SLIDING WINDOWS FOR CONTINUOUS PREDICTION
Data stream mining (DSM) deals with the continuous online processing and evaluation of fast-accumulating data, in cases where storing and evaluating large historical datasets is neither feasible nor efficient. This research introduces the Multiple Sliding Windows (MSW) algorithm and demonstrates its application to a DSM scenario with discrete independent variables and a continuous dependent variable. MSW was developed in response to the need to dynamically allocate computational resources that are shared by many tasks; it predicts the resources required per task. The algorithm was evaluated with a large real-world dataset that reflects resource allocation in Intel's global cloud of data servers. The evaluation assesses three MSW treatments: the use of multiple sliding windows, a novel iterative mechanism for feature selection, and adaptive detection of concept drifts. The evaluation showed positive and significant results in terms of prediction quality and the ability to adapt to swift and/or gradual changes in data stream characteristics. Following the successful evaluation, Intel adopted the proposed MSW solution, which led to cost savings estimated at millions of dollars annually. While MSW was evaluated in a specific context, its generic and modular definition permits implementation in other domains that deal with DSM problems of a similar nature.
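To make the sliding-window idea concrete, the following is a minimal sketch and not the MSW algorithm itself, whose details are not given in this abstract. It assumes that each combination of discrete feature values forms a key, that every key keeps several windows of different lengths over its recent target values, and that the prediction is the mean of whichever window has had the lowest recent error. The class name `MultiWindowPredictor`, the window sizes, and the `decay` parameter are all hypothetical; feature selection and concept-drift detection are omitted.

```python
from collections import defaultdict, deque
from statistics import fmean


class MultiWindowPredictor:
    """Illustrative only: per-key sliding windows of several sizes."""

    def __init__(self, sizes=(5, 50)):
        self.sizes = sizes
        # key (tuple of discrete feature values) -> one deque per window size
        self.windows = defaultdict(lambda: [deque(maxlen=s) for s in sizes])
        # exponentially decayed absolute error of each window, per key
        self.errors = defaultdict(lambda: [0.0] * len(sizes))

    def predict(self, key, default=0.0):
        """Return the mean of the window with the lowest recent error."""
        wins = self.windows.get(key)
        if not wins or not wins[0]:
            return default  # nothing observed yet for this key
        errs = self.errors[key]
        best = min(range(len(wins)), key=lambda i: errs[i])
        return fmean(wins[best])

    def update(self, key, y, decay=0.9):
        """Score every window against the new target y, then store y."""
        wins, errs = self.windows[key], self.errors[key]
        for i, w in enumerate(wins):
            if w:  # error of the value this window would have predicted
                errs[i] = decay * errs[i] + (1 - decay) * abs(fmean(w) - y)
            w.append(y)  # deque(maxlen=...) drops the oldest value itself
```

In this sketch a short window reacts quickly when the target level shifts, while a long window gives a steadier estimate when the stream is stable; choosing by recent error lets the predictor switch between them without storing the full history.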