Abstract

Machine learning techniques are increasingly employed in business research to discover or extract new, simple features from large, unstructured data. These machine-learned features (MLFs) are then used as independent or explanatory variables in the main econometric models of empirical research. Despite this growing trend, there has been little research on how the use of MLFs affects statistical inference. In this paper, we address parameter estimation issues that arise when using topics/features extracted by Latent Dirichlet Allocation, a popular machine learning technique for text mining. We propose a novel method to extract features that yield minimum-variance estimates of the regression model parameters. This enables better use of unstructured text data for econometric modeling in empirical research. The effectiveness of the proposed method is validated through an experimental evaluation on real-world text data.
