Advances in Research Methods


Paper Type

Short

Paper Number

2475

Description

This study combines two streams of literature, text representation and machine learning-based causal inference, to examine how representing text as data can improve causal inference, that is, yield more accurate treatment effect estimates. Using Yelp reviews as a real problem context, we demonstrate how to train a topic model or a Word2Vec model to transform review text into meaningful metrics, and how to use a causal forest to estimate the treatment effect of the ‘Elite’ badge awarded by Yelp on the votes a review receives. Results show that the estimated average treatment effect (ATE) decreases significantly after quantitative text representations are added to the model, which implies that the positive effect of the ‘Elite’ badge is overestimated when text information is omitted. We also present specific steps to help other researchers use the causal forest to estimate heterogeneous effects across subgroups. Overall, we show that transforming text into quantitative data makes treatment effect estimation more accurate.
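The overestimation described above can be illustrated with a minimal sketch on purely synthetic data. It is not the paper's pipeline: a single discrete `quality` score stands in for the topic-model or Word2Vec features, and simple stratification stands in for the causal forest. All names, probabilities, and effect sizes below are assumptions chosen for illustration only.

```python
import random
from statistics import mean


def simulate_reviews(n=20000, true_effect=1.0, seed=0):
    """Generate synthetic (quality, elite, votes) rows.

    `quality` is a hypothetical text-derived score (0..4) that raises both
    the chance of holding the badge and the number of votes, so it confounds
    the badge effect. Nothing here comes from real Yelp data.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        quality = rng.randrange(5)
        p_elite = 0.1 + 0.2 * quality  # better text -> more likely Elite
        elite = 1 if rng.random() < p_elite else 0
        votes = 2.0 * quality + true_effect * elite + rng.gauss(0, 1)
        rows.append((quality, elite, votes))
    return rows


def naive_ate(rows):
    """Difference in mean votes between badge holders and others, ignoring text."""
    treated = [v for _, e, v in rows if e == 1]
    control = [v for _, e, v in rows if e == 0]
    return mean(treated) - mean(control)


def stratified_ate(rows):
    """Difference in means within each quality bucket, weighted by bucket size."""
    weighted_sum = 0.0
    n_used = 0
    for q in sorted({q for q, _, _ in rows}):
        treated = [v for qq, e, v in rows if qq == q and e == 1]
        control = [v for qq, e, v in rows if qq == q and e == 0]
        if treated and control:  # skip buckets lacking one of the two groups
            size = len(treated) + len(control)
            weighted_sum += size * (mean(treated) - mean(control))
            n_used += size
    return weighted_sum / n_used


rows = simulate_reviews()
naive = naive_ate(rows)
adjusted = stratified_ate(rows)
print(f"naive ATE (no text information):  {naive:.2f}")
print(f"adjusted ATE (stratified on text): {adjusted:.2f}")
```

In this toy setup the naive estimate absorbs the effect of text quality on votes, while the stratified estimate lands near the simulated effect of 1.0. A causal forest plays a similar adjusting role with high-dimensional text features and additionally yields subgroup-level effects.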

Comments

19-Methods

Best Paper Nominee badge
 
Dec 14th, 12:00 AM

Improving Causal Inference with Text as Data in Empirical IS Research: A Machine Learning Approach

