DataHub and Apache Atlas: A Comparative Analysis of Data Catalog Tools

Diogo Rodrigues, Centro de Computação Gráfica Guimarães
Mariana Almeida, Centro de Computação Gráfica Guimarães
Pedro Guimarães, Centro de Computação Gráfica Guimarães
Maribel Yasmina Santos, University of Minho

Abstract

Big Data significantly increases the complexity of projects, in which the use of inadequate data inevitably produces inadequate and incorrect analyses. Data Catalogs centralize a system’s metadata in one place, providing a global view of the stored data, so it is essential to use appropriate data catalog tools. The choice of the tool that best suits a project’s needs must be well-founded. This paper uses the OSSpal methodology, commonly used to compare open-source technologies, to perform a comparative analysis of two tools: DataHub and Apache Atlas.