Online software engineering repositories like GitHub are rich sources of socio-technical data about the software development process. GitHub, as a large-scale social coding environment, hosts various types of open source projects. Selecting a suitable project from a developer's perspective is a difficult and time-consuming task. In this paper, general Big Data approaches and machine learning techniques are used to analyse GitHub data. A variety of socio-technical metrics and factors are extracted from online repositories for data analysis. We find that data pre-processing plays an important role in the proposed approach for GitHub mining. The design science research method is applied to the pre-processed data on open source software (OSS) projects to design a recommendation system for project selection. Content-based recommendation techniques are proposed, together with an evaluation mechanism.
Bayati, Shahabedin and Tripathi, Arvind K., "DESIGNING A KNOWLEDGE BASE FOR OSS PROJECT RECOMMENDER SYSTEM: A BIG DATA ANALYTICS APPROACH" (2016). Research-in-Progress Papers. 37.