Abstract

Topic modeling has become an increasingly prominent tool for the empirical analysis of text data in business research. Standard studies often follow a two-step procedure: a topic model first identifies latent themes in the textual data, and these themes are then combined with observed variables as explanatory factors in a statistical model. A key limitation of this framework is that topics are extracted independently of the response and observed variables, which can reduce their usefulness in the downstream analysis. To address this concern, we introduce a unified topic model grounded in the latent Dirichlet allocation (LDA) framework that jointly incorporates the documents, the observed variables, and the response variable within a single regression analysis. Empirical validation with real-world data confirms the model’s advantages.
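The two-step baseline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' unified model: step 1 fits standard LDA by collapsed Gibbs sampling, and step 2 runs ordinary least squares of the response on the estimated topic proportions plus observed covariates. The function names `lda_gibbs` and `two_step_regression`, the hyperparameters, and the toy data are hypothetical. Note that step 1 never sees `X` or `y`, which is exactly the limitation the abstract points out.

```python
import numpy as np


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Step 1 (hypothetical helper): collapsed Gibbs sampler for standard LDA.

    docs is a list of integer word-id arrays. Returns the estimated
    document-topic proportions theta with shape (n_docs, n_topics).
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))  # topic counts per document
    n_kw = np.zeros((n_topics, vocab_size))  # word counts per topic
    n_k = np.zeros(n_topics)                 # total words assigned to each topic
    z = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = k | all other assignments).
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    # Posterior-mean estimate of each document's topic proportions.
    return (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + n_topics * alpha)


def two_step_regression(docs, X, y, n_topics, vocab_size, **lda_kwargs):
    """Hypothetical two-step baseline: topics first, then OLS on topics + covariates."""
    theta = lda_gibbs(docs, n_topics, vocab_size, **lda_kwargs)  # ignores X and y
    # Topic proportions sum to one per document, so drop the last topic
    # to avoid perfect collinearity with the intercept column.
    design = np.column_stack([np.ones(len(y)), theta[:, :-1], X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return theta, coef


# Toy usage with synthetic data (purely illustrative, not from the study).
rng = np.random.default_rng(1)
docs = [rng.integers(0, 10, size=30) for _ in range(20)]
X = rng.normal(size=(20, 2))
y = 1.0 + X @ np.array([0.5, -0.2]) + rng.normal(scale=0.1, size=20)
theta, coef = two_step_regression(docs, X, y, n_topics=3, vocab_size=10, n_iter=50)
print(theta.shape, coef.shape)
```

The returned `coef` holds the intercept, the coefficients on the first `n_topics - 1` topic proportions, and the covariate coefficients. A unified model of the kind the abstract proposes would instead let the response and covariates inform the topic assignments during estimation rather than only afterwards.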
