We live in an evolving world of rapidly accumulating and highly granular big data, constantly growing in volume and increasing in velocity. This TREO paper focuses on the data quality dimension of believability, specifically as it applies to evaluating influence in linked data. Believability is highly relevant in social media and other forms of linked data. Referring to the extent to which data are regarded as credible, believability is closely related to the fourth “V” of big data, veracity, which describes accuracy and trustworthiness (Shankaranarayanan and Blake, 2017), and contributes to the fifth “V”, value. Prior work has focused on provenance-based (Prat and Madnick, 2008), context-based (Serra and Marotta, 2016), and reputation-based (Cai and Zhu, 2016) approaches to believability. In each case, the efficacy of the approach is situational, depending on the specific data under analysis. We propose a structure-based approach, exploiting the fact that regardless of its dynamic content and meta-content (e.g., provenance, context, reputation), the structure of linked data remains the same (if not, it ceases to be linked data). We illustrate our structural approach to believability by analyzing influence in linked data using a network from Yelp, an online linked directory service and crowd-sourced review forum largely about food. There are many ways to understand and measure influence in linked data. Consider one particular person in a graph containing people. We might be interested in determining their influence by looking at the number of their immediate friends (which can be structurally calculated by determining their vertex degree), along with how well connected and relevant those friends are (which can be structurally calculated through clustering coefficient and PageRank, respectively).
Each of those metrics provides insight into influence (or lack thereof) within linked data, but none presents the whole picture because there are ways to artificially inflate or otherwise “game” those individual measures. However, a holistic approach for evaluating the believability of influence measures in linked data seems attainable by combining individual measures. We will present phase one of this research-in-progress: combining graph analytics to develop a Structural Holistic Believability Metric for influence in linked data. Intuitively, incorporating multiple believability measures seems like it could increase data quality by improving credibility and value judgements of influence in linked data. But how do we know that? How can we test and validate it? Generating a sense of believability is difficult, as it is an inherently human concept. Although this human ability seems like it might be helpful for validating believability metrics, it is not, because the volume, velocity, and variety inherent in big data are too great for humans to process. We will encourage discussion of this important topic, which addresses phase two of this research-in-progress.
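To make the three structural measures named above concrete, the following is a minimal sketch in plain Python that computes vertex degree, local clustering coefficient, and PageRank on a small undirected friendship graph. The toy graph, the person names, the function names, and the damping factor of 0.85 are all illustrative assumptions; none of them come from the Yelp network or from the authors' implementation. The abstract does not say how the measures are combined into the holistic metric, so the sketch only reports them side by side.

```python
from itertools import combinations

# Hypothetical toy friendship graph (undirected): person -> set of friends.
graph = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
}

def degree(g, v):
    """Number of immediate friends of v."""
    return len(g[v])

def clustering_coefficient(g, v):
    """Fraction of pairs of v's friends that are themselves friends."""
    k = len(g[v])
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(g[v], 2) if b in g[a])
    return 2.0 * linked / (k * (k - 1))

def pagerank(g, damping=0.85, iterations=100):
    """Power-iteration PageRank; each undirected edge counts in both directions."""
    n = len(g)
    rank = {v: 1.0 / n for v in g}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in g}
        for v, friends in g.items():
            if friends:
                share = damping * rank[v] / len(friends)
                for u in friends:
                    new_rank[u] += share
            else:
                # A vertex with no friends spreads its rank evenly.
                for u in g:
                    new_rank[u] += damping * rank[v] / n
        rank = new_rank
    return rank

ranks = pagerank(graph)

# Side-by-side structural profile for each person (no combined score).
profile = {
    v: (degree(graph, v), clustering_coefficient(graph, v), ranks[v])
    for v in graph
}

for person, (deg, cc, pr) in sorted(profile.items()):
    print(f"{person}: degree={deg}, clustering={cc:.2f}, pagerank={pr:.3f}")
```

In this toy graph the three measures already disagree: "ann" has the highest degree and PageRank but a low clustering coefficient, while "bob" and "cat" have fewer friends but a fully connected neighborhood. That disagreement is the kind of signal a combined metric could draw on.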




