Advances in Research Methods
Event Title
Improving Causal Inference with Text as Data in Empirical IS Research: A Machine Learning Approach
Paper Type
Short
Paper Number
2475
Description
This study combines two streams of literature, text representation and machine learning-based causal inference, to study how to represent text as data to improve causal inference, i.e., to estimate treatment effects more accurately. Using Yelp reviews as a real problem context, we demonstrate how to train a topic model or Word2Vec model to transform review text into meaningful metrics, and how to use the causal forest to estimate the treatment effect of the ‘Elite’ badge awarded by Yelp on the number of votes a review receives. Results show that the estimated average treatment effect (ATE) decreases significantly after quantitative text representations are added to the model, which implies that the positive effect of the ‘Elite’ badge was overestimated without text information. We also present specific steps to help other researchers use the causal forest to estimate heterogeneous effects across subgroups. Overall, we show that transforming text into quantitative data makes treatment effect estimation more accurate.
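The overestimation described in the abstract can be illustrated with a small synthetic sketch. It is not the authors' pipeline and uses no Yelp data: simple stratification stands in for the causal forest, and a hypothetical discrete `quality` score stands in for a topic-model or Word2Vec text metric. All names, coefficients, and the data-generating process below are assumptions chosen for illustration only.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=1.0, seed=0):
    """Generate synthetic (quality, elite, votes) rows.

    `quality` is a hypothetical text-derived score in {0, ..., 4}. It raises
    both the chance of holding the badge and the number of votes, so it
    confounds the badge effect.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        quality = rng.randrange(5)
        p_elite = 0.1 + 0.2 * quality  # better text -> more likely 'Elite'
        elite = 1 if rng.random() < p_elite else 0
        votes = true_effect * elite + 2.0 * quality + rng.gauss(0, 1)
        rows.append((quality, elite, votes))
    return rows


def naive_ate(rows):
    """Difference in mean votes between badge holders and others (no text)."""
    treated = [v for _, e, v in rows if e == 1]
    control = [v for _, e, v in rows if e == 0]
    return mean(treated) - mean(control)


def stratified_ate(rows):
    """Average of within-stratum differences, weighted by stratum size."""
    n = len(rows)
    total = 0.0
    for q in sorted({q for q, _, _ in rows}):
        stratum = [(e, v) for qq, e, v in rows if qq == q]
        treated = [v for e, v in stratum if e == 1]
        control = [v for e, v in stratum if e == 0]
        if not treated or not control:
            continue  # a stratum without both groups drops out of the sum
        total += (mean(treated) - mean(control)) * len(stratum) / n
    return total


if __name__ == "__main__":
    data = simulate()
    print("naive ATE (text ignored):   ", round(naive_ate(data), 2))
    print("adjusted ATE (text included):", round(stratified_ate(data), 2))
```

In this simulated setting the estimate that ignores the text score is inflated well above the simulated effect of 1.0, while the adjusted estimate lands close to it. A causal forest plays a similar adjusting role with high-dimensional text features and also yields subgroup-level effects, which this sketch does not attempt.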
Recommended Citation
Yin, Guopeng and Chen, Jian, "Improving Causal Inference with Text as Data in Empirical IS Research: A Machine Learning Approach" (2020). ICIS 2020 Proceedings. 10.
https://aisel.aisnet.org/icis2020/adv_research_methods/adv_research_methods/10
Comments
19-Methods