Time series forecasting is relevant to a wide range of problems across many areas of human activity. One way to produce predictions is to build a forecasting model, whose main quality criterion is its accuracy. Researchers have used different approaches to reach the required accuracy, including feature engineering. This paper presents an automated feature engineering method for time series data based on Bayesian optimization. It describes the process of selecting an optimal set of features so as to minimize an objective function. The method can also create new features from existing ones by applying various algebraic operations. Because it treats the machine learning model as a black box, it can be used with different algorithms, such as linear regression, decision trees, and neural networks. The experiments demonstrated the high efficiency of the proposed approach: in a comparative analysis, the developed algorithm outperformed manual, human-made feature engineering in most cases. High-quality feature engineering greatly improves the accuracy of machine learning models. Mean squared error and the coefficient of determination were used as quality metrics for the machine learning models. The method was tested on open time series data from different subject areas (energy, manufacturing, air pollution), which supports the reliability of the verification.
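To make the ideas in the abstract concrete, the following is a minimal sketch and not the author's implementation. It builds candidate features from lagged values with a few algebraic operations, scores each candidate with a black-box model using mean squared error, and reports the coefficient of determination for the winner. Several parts are assumptions made for illustration: the operation set `OPS`, the helper names, the toy series, and the one-feature least-squares model used as the black box. An exhaustive search over the candidates stands in for Bayesian optimization, which the sketch does not implement, and scoring is done in-sample for brevity.

```python
import itertools
import operator

# Hypothetical operation set; the abstract only mentions "algebraic operations".
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def candidate_features(base):
    """Combine every pair of base feature columns with every operation."""
    out = {}
    for (name_a, a), (name_b, b) in itertools.combinations(base.items(), 2):
        for op_name, op in OPS.items():
            out[f"{op_name}({name_a},{name_b})"] = [op(x, y) for x, y in zip(a, b)]
    return out


def fit_predict_1d(x, y):
    """Stand-in black-box model: least-squares line on a single feature."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return [slope * xi + intercept for xi in x]


def select_best(features, target, model=fit_predict_1d):
    """Score every feature with the black box and return the lowest-MSE one.

    Exhaustive search is used here in place of Bayesian optimization.
    """
    scores = {name: mse(target, model(col, target)) for name, col in features.items()}
    return min(scores, key=scores.get), scores


# Toy series where each value is the sum of the two previous values.
series = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
target = series[2:]
base = {"lag1": series[1:-1], "lag2": series[:-2]}

# Search over the original features plus the generated ones.
features = {**base, **candidate_features(base)}
best_name, scores = select_best(features, target)
best_mse = scores[best_name]
best_r2 = r2(target, fit_predict_1d(features[best_name], target))
print(best_name, best_mse, best_r2)
```

Because the model is passed in as a plain callable, any other estimator with the same call shape could replace `fit_predict_1d` without changing the search. A realistic setup would score candidates on a held-out validation split and search over feature subsets rather than single features.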
Danilov, Konstantin, "Automated feature engineering based on Bayesian optimization for time series" (2020). International Conference on Information Systems 2020 Special Interest Group on Big Data Proceedings. 3.