Big Data and Analytics: Pathways to Maturity

IRIS-DS: A New Approach for Identifiers and References Discovery in Document Stores

Manel Souibgui, Conservatoire national des arts et métiers (CNAM)
Faten Atigui, CEDRIC, Conservatoire National des Arts et Métiers (CNAM)
Sadok Ben Yahia, Department of Software Science, Tallinn University of Technology
Samira Si-Said Cherfi, CEDRIC - Conservatoire National des Arts et Métiers

Location

Online

Event Website

https://hicss.hawaii.edu/

Start Date

4 January 2021, 12:00 AM

End Date

9 January 2021, 12:00 AM

Description

NoSQL stores offer new cost-effective, schema-free storage systems. Although NoSQL stores are widely accepted today, Business Intelligence & Analytics (BI&A) remains associated with relational databases. Exploiting schema-free data for analytical purposes poses a challenge, since it requires reviewing all the BI&A phases, particularly the Extract-Transform-Load (ETL) process, to fit big data sources such as document stores. In the ETL process, joining several collections when the join fields are not explicitly known is a significant challenge. Detecting these fields manually is time- and effort-consuming, and even infeasible for large-scale datasets. In this paper, we study the problem of discovering join fields automatically and introduce an algorithm that detects both identifiers and references in several document stores. Our approach consists of two core stages: (i) discovery of identifier candidates; and (ii) identification of candidate pairs of identifier and reference fields. We use scoring features and pruning rules based on both syntactic and semantic aspects to efficiently discover the true candidates among a huge number of initial ones. Finally, we report experimental findings that show very promising results.
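The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the IRIS-DS algorithm: the paper's actual scoring features and pruning rules are not given here, so the sketch substitutes two simple, hypothetical criteria. A top-level field counts as an identifier candidate if every document in the collection has a non-null, distinct value for it. A field in another collection counts as a reference candidate if its values are included in the identifier's values (an inclusion-dependency check) and its name is syntactically similar to the identifier's collection and field name, measured with the standard-library `difflib.SequenceMatcher`. The function names, the thresholds `min_inclusion` and `min_name_sim`, and the assumption of flat documents are illustrative only; semantic features are not modeled.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, List, Tuple

# Hypothetical representation: a document is a flat dict, a collection a list of them.
Document = Dict[str, Any]


def field_values(docs: List[Document], field: str) -> List[Any]:
    """Collect the non-null, hashable (non-list, non-dict) values of a top-level field."""
    return [d[field] for d in docs
            if d.get(field) is not None and not isinstance(d[field], (list, dict))]


def identifier_candidates(docs: List[Document]) -> List[str]:
    """Stage (i), simplified: keep fields present in every document with all-distinct values."""
    fields = {f for d in docs for f in d}
    candidates = []
    for f in sorted(fields):
        values = field_values(docs, f)
        if len(values) == len(docs) and len(set(values)) == len(values):
            candidates.append(f)
    return candidates


def reference_candidates(
    collections: Dict[str, List[Document]],
    min_inclusion: float = 1.0,   # hypothetical: share of reference values found among identifier values
    min_name_sim: float = 0.3,    # hypothetical: minimum syntactic name similarity
) -> List[Tuple[str, str, str, str, float]]:
    """Stage (ii), simplified: pair identifiers with fields of other collections."""
    ids = {name: identifier_candidates(docs) for name, docs in collections.items()}
    pairs = []
    for id_coll, id_fields in ids.items():
        for id_field in id_fields:
            id_values = set(field_values(collections[id_coll], id_field))
            # Name the reference is expected to resemble, e.g. "customers_id".
            target_name = f"{id_coll}_{id_field.lstrip('_')}".lower()
            for ref_coll, ref_docs in collections.items():
                if ref_coll == id_coll:
                    continue
                for ref_field in sorted({f for d in ref_docs for f in d}):
                    ref_values = set(field_values(ref_docs, ref_field))
                    if not ref_values:
                        continue
                    # Inclusion score: fraction of distinct reference values that are identifier values.
                    inclusion = len(ref_values & id_values) / len(ref_values)
                    # Syntactic score: similarity of the two field names.
                    name_sim = SequenceMatcher(None, ref_field.lower(), target_name).ratio()
                    if inclusion >= min_inclusion and name_sim >= min_name_sim:
                        pairs.append((id_coll, id_field, ref_coll, ref_field, round(inclusion, 2)))
    return pairs


if __name__ == "__main__":
    customers = [{"_id": 1, "name": "Ann"}, {"_id": 2, "name": "Bob"}]
    orders = [
        {"_id": 10, "customer_id": 1, "total": 5},
        {"_id": 11, "customer_id": 2, "total": 7},
        {"_id": 12, "customer_id": 1, "total": 7},
    ]
    print(reference_candidates({"customers": customers, "orders": orders}))
    # [('customers', '_id', 'orders', 'customer_id', 1.0)]
```

The name-similarity filter acts as a crude pruning rule: without it, a numeric field such as a quantity could be paired with an integer identifier whenever its values happen to fall inside the identifier's value set.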

https://aisel.aisnet.org/hicss-54/da/big_data_and_analytics/5