Abstract
Code comments are essential for software maintenance, enhancing code readability and facilitating effective collaboration among developers. As software projects grow in complexity, Large Language Models (LLMs) have become crucial tools for automating natural language processing tasks, including code comment classification. This study evaluates the performance of several LLMs, including GPT-4, PaLM-2, Llama-2, Llama-3, Mistral AI, and Gemini Pro, in classifying the usefulness of code comments. Our findings indicate that GPT-4 excels in both accuracy and runtime efficiency, making it well suited for time-sensitive applications. PaLM-2 also shows notable efficiency. In contrast, Llama-2, Llama-3, and Mistral AI demonstrate lower classification performance and longer running times, presenting scalability challenges. The study underscores the importance of robust model management to address operational issues such as tokenization errors and API limitations. Furthermore, the integration of Explainable AI (XAI) techniques is highlighted as crucial for ensuring the transparent and ethical use of these "black-box" models. Future work should focus on optimizing model configurations and incorporating XAI methods to enhance the deployment of LLMs in real-world applications. This research provides valuable insights into the strengths and limitations of current LLMs, guiding the development of more robust and versatile AI solutions for software engineering tasks.
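To make the classification task concrete, the sketch below shows how a comment-usefulness query might be posed to one of the evaluated models. It is a minimal illustration assuming the OpenAI Python client and GPT-4; the prompt wording, label set, and backoff logic are illustrative assumptions, not the authors' actual experimental setup.

```python
# Minimal sketch of LLM-based comment-usefulness classification.
# Assumes the OpenAI Python client (pip install openai); the prompt,
# labels, and retry behavior are hypothetical, not the paper's setup.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You are a code reviewer. Classify the following code comment as "
    "'useful' or 'not useful' for software maintenance. "
    "Reply with exactly one of those two labels.\n\n"
    "Comment: {comment}"
)

def classify_comment(comment: str, retries: int = 3) -> str:
    """Classify one comment, backing off on transient API errors."""
    for attempt in range(retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user",
                           "content": PROMPT.format(comment=comment)}],
                temperature=0,  # deterministic labels for evaluation
            )
            return response.choices[0].message.content.strip().lower()
        except Exception:
            # Rate limits and other API errors surface here;
            # back off exponentially before retrying.
            time.sleep(2 ** attempt)
    return "error"

if __name__ == "__main__":
    print(classify_comment("// increment i by 1"))
```

Per-comment latency measured around such calls is one simple way to compare the running times the abstract refers to, alongside agreement with human usefulness labels.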
Recommended Citation
Sebin, Busra; Taskin, Nazim; and Mehdiyev, Nijat, "Generative AI For Code Comment Classification: A Comparative Analysis" (2024). MCIS 2024 Proceedings. 42.
https://aisel.aisnet.org/mcis2024/42