Abstract

The rapid development of large language models is significantly reshaping software development, particularly code generation. This paper analyzes the performance and features of the ChatGPT and DeepSeek chatbots, based on their GPT-4o and V3 models, respectively, with an emphasis on code generation. Particular attention is given to model architecture, multimodality, open-source status, and token limits. In an experimental evaluation on 60 TypeScript LeetCode problems across different difficulty levels, we measured accuracy, debugging ability, and the number of attempts needed to reach a correct solution. The results show that DeepSeek achieved an accuracy of 68.3%, while ChatGPT achieved 61.7%. The paper highlights the advantages of DeepSeek as an open-source option and points to opportunities for improving the generated code, contributing to the understanding of how large language models can be applied in programming.
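The reported accuracy figures correspond to the fraction of the 60 benchmark problems each model solved (68.3% ≈ 41/60, 61.7% ≈ 37/60). A minimal TypeScript sketch of this tally, using hypothetical result records (the `Result` type and field names are illustrative, not from the paper):

```typescript
// Hypothetical per-problem outcome record; not the paper's actual data format.
type Result = { problem: number; solved: boolean };

// Accuracy as the percentage of problems solved.
function accuracy(results: Result[]): number {
  const solved = results.filter((r) => r.solved).length;
  return (solved / results.length) * 100;
}

// Example: 41 of 60 problems solved yields 68.3%, matching the DeepSeek figure.
const deepseekResults: Result[] = Array.from({ length: 60 }, (_, i) => ({
  problem: i + 1,
  solved: i < 41,
}));
console.log(accuracy(deepseekResults).toFixed(1)); // "68.3"
```

The same computation with 37 solved problems gives 61.7%, the ChatGPT figure.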

Recommended Citation

Stamenković, F., Stanojević, J., & Simić, D. (2025). Comparing Code Generation Capabilities of ChatGPT-4o and DeepSeek V3 in Solving TypeScript Programming Problems. In I. Luković, S. Bjeladinović, B. Delibašić, D. Barać, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Empowering the Interdisciplinary Role of ISD in Addressing Contemporary Issues in Digital Transformation: How Data Science and Generative AI Contributes to ISD (ISD2025 Proceedings). Belgrade, Serbia: University of Gdańsk, Department of Business Informatics & University of Belgrade, Faculty of Organizational Sciences. ISBN: 978-83-972632-1-5. https://doi.org/10.62036/ISD.2025.21

Paper Type

Short Paper

DOI

10.62036/ISD.2025.21
