Abstract

Organizations increasingly rely on data-driven approaches to social media marketing to develop content strategies that resonate with consumers. In particular, marketers can examine online engagement trends to identify the characteristics of content that appeal to a target audience. In an online setting, engagement includes any digital interaction consumers have with media published by a business or organization. Marketers and analysts can use advanced data analytics and business intelligence techniques to optimize the content these firms publish. By refining its content publishing strategy, an organization can gain heightened brand awareness and improved consumer relationships. This study advances our understanding of how organizations should conduct marketing research to identify what drives their audience to engage with their social media content. We propose a framework to investigate online engagement through a case study of an organization’s Instagram posts, identifying the attributes that make them engaging. We first define online engagement as the sum of likes and comments on a post, then use it as the target variable in a binary classification task that predicts whether a post is high-engaging or low-engaging. Posts are labeled high- or low-engaging relative to the median engagement threshold. To determine which features contribute to high or low engagement, we analyze posts’ captions, images, and metadata. We investigate caption data by first vectorizing the text using three techniques: Bag-of-Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and Bidirectional Encoder Representations from Transformers (BERT). We then train a set of classifiers on each set of vectorized features to determine which textual features are linked to high or low engagement.
The classifiers applied are Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGB), Support Vector Machine (SVM), and Multilayer Perceptron (MLP). To explore our image dataset, we apply ResNet-18 to extract visual features, then train another set of classifiers to determine which of the organization’s published images resulted in higher or lower engagement. The classifiers applied here are LR, RF, XGB, and MLP models. We further evaluate the image dataset by using sets of histograms to explore the relationship between posts’ performance and their RGB and HSV color values. To ensure the comprehensiveness of our methodology, we concatenate ResNet-18 image features with either BoW or TF-IDF text features, forming joint feature representations. LR, RF, XGB, and MLP models are trained on these combined vectors. SHAP (SHapley Additive exPlanations) values are then calculated to show which features have the greatest impact on these models. We also cluster these features to discuss the contributing themes within the posts’ captions. Our study finds that text-mining models perform better overall when models are evaluated primarily on recall. An RF classifier trained on TF-IDF vectorized features proves most effective. Our framework is suited to determining which words and themes to incorporate into a business or organization’s Instagram post captions to increase the likelihood of high engagement. While our framework is comprehensive and can integrate textual and visual features, its current performance in image and combined text-image analysis is limited. Our methodology and results can be used across the digital marketing landscape to inform marketing analysts about the strategies to employ when analyzing social media data. Moreover, our framework is applicable to organizations seeking to better understand and improve their online engagement.
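
As a minimal illustration of the target variable described above, the following Python sketch computes engagement as likes plus comments and labels each post against the median. The field names (`likes`, `comments`), the sample values, and the choice to treat posts exactly at the median as low-engaging are assumptions made for illustration; the abstract does not specify them.

```python
from statistics import median


def label_by_median_engagement(posts):
    """Return (engagement, threshold, labels) for a list of post dicts.

    Labels are 1 for high-engaging and 0 for low-engaging posts.
    """
    # Engagement as defined in the abstract: likes plus comments.
    engagement = [p["likes"] + p["comments"] for p in posts]
    threshold = median(engagement)
    # Assumption: a post exactly at the median is treated as low-engaging.
    labels = [1 if e > threshold else 0 for e in engagement]
    return engagement, threshold, labels


# Hypothetical sample posts, not data from the study.
posts = [
    {"likes": 120, "comments": 8},
    {"likes": 45, "comments": 2},
    {"likes": 300, "comments": 25},
    {"likes": 80, "comments": 5},
]
engagement, threshold, labels = label_by_median_engagement(posts)
```

Splitting at the median yields roughly balanced classes, which keeps metrics such as recall easier to interpret than they would be under a heavily skewed split.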
