Abstract

Data-driven disciplines like data mining and knowledge management already provide process-based frameworks for data analysis projects, such as the well-known cross-industry standard process for data mining (CRISP-DM) or knowledge discovery in databases (KDD). Although data science addresses a much broader problem space, in that it also considers the economic, social, and ecological impacts of data-driven projects, a corresponding domain-specific process model is still missing. Based on four identified meta requirements and 17 corresponding requirements collected from experts in theory and practice, this contribution therefore proposes the empirically grounded data science process model (DASC-PM), a framework that maps a data science project as a four-step process model and contextualizes it among scientific procedures, various areas of application, IT infrastructures, and impacts. To illustrate the phase-oriented specification capabilities of the DASC-PM, we present example competence and role profiles for the analysis phase of a data science project.
