Journal of the Midwest Association for Information Systems (JMWAIS)


Source code authorship attribution is the task of determining the author of a program. It has many useful applications, such as detecting plagiarism and settling copyright infringement disputes. As the R programming language has grown in popularity in the Data Science community, so has the need for authorship attribution of R programs. In this research note, we propose and evaluate the use of a tool called “ASAP: A Source-code Authorship Program” for attributing authorship of R code. We run experiments on two datasets of R code: a “clean” one (where we are sure of each program’s author) and an “unclean” one (with more realistic data, where the authorship of some code files is not certain). We find that, on both datasets, the ASAP tool with the Source Code Author Profile (SCAP) algorithm attributes authorship of R programs successfully. Based on these experiments, we formulate a number of implications for both academics and practitioners, together with directions for future research in this area.



