Abstract

Source code classification is an important step in archiving and reusing code. Given the complex nature of software, source code is often organized into categories manually by field experts. Such a categorization process not only requires a pre-existing category schema but is also labor intensive, making it difficult to keep up with the fast-growing volume of available source code. In this paper, we propose an innovative method that automatically groups a collection of source code into clusters based on the similarity of their functionality. We used a neural-network-based algorithm, the Self-Organizing Map (SOM), to cluster a list of source code extracted from an open-source software hosting site, SourceForge (sourceforge.net). Experiments were conducted to test the feasibility of our approach. The results showed that SOM can automatically and effectively cluster source code with proper training. The implications of this study are discussed.
