Abstract
Knowledge Distillation (KD) has emerged as an effective model compression technique to transfer knowledge from large, computationally expensive teacher models to smaller, more efficient student models. However, traditional KD methods often suffer from information loss during the distillation process, limiting the student model's performance. In this research-in-progress study, we propose an innovative approach that integrates Generative Adversarial Networks (GANs) into the knowledge distillation framework to enhance the distillation of large language models (LLMs). Our method utilizes an adversarial training mechanism in which a generator synthesizes pseudo-representations to bridge the knowledge gap between the teacher and student models, while a discriminator enforces better alignment between their latent feature distributions. We hypothesize that this approach will improve the student model's ability to capture the rich representations learned by the teacher, leading to better generalization and downstream task performance. Our findings may contribute to the ongoing efforts in compressing and deploying high-performing language models in resource-constrained environments.
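
The abstract does not specify an implementation, so the following is only a minimal sketch of one way the described mechanism could be wired together: a generator maps student features to pseudo-representations, a discriminator tries to tell them apart from teacher features, and a feature-matching KD term is combined with the adversarial term. PyTorch, the toy `teacher`/`student` modules, the hidden size, and the loss weighting are all assumptions introduced here for illustration, not details from the paper.

```python
# Illustrative sketch only; the paper gives no implementation details.
# Assumes PyTorch, a shared hidden_dim, and toy stand-in encoders.
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_dim, vocab = 64, 100

# Hypothetical stand-ins: in practice the teacher is a frozen, pre-trained LLM
# and the student is a smaller model being distilled.
teacher = nn.Linear(vocab, hidden_dim)
student = nn.Linear(vocab, hidden_dim)

# Generator: maps student features to pseudo-representations intended to
# bridge the gap toward the teacher's latent space.
generator = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, hidden_dim))

# Discriminator: distinguishes teacher features from generated pseudo-features,
# pushing the two latent distributions to align.
discriminator = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                              nn.Linear(hidden_dim, 1))

opt_g = torch.optim.Adam(list(student.parameters()) + list(generator.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

x = torch.randn(8, vocab)                 # toy input batch

with torch.no_grad():
    t_feat = teacher(x)                   # teacher latent features (frozen)

s_feat = student(x)                       # student latent features
pseudo = generator(s_feat)                # pseudo-representations

# Discriminator step: real = teacher features, fake = pseudo-representations.
d_real = discriminator(t_feat)
d_fake = discriminator(pseudo.detach())
d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
          F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Student/generator step: fool the discriminator plus a simple
# feature-matching distillation term (the 0.1 weight is an arbitrary choice).
d_fake = discriminator(pseudo)
adv_loss = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
kd_loss = F.mse_loss(pseudo, t_feat)
g_loss = kd_loss + 0.1 * adv_loss
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In this reading, the adversarial term encourages distribution-level alignment between student-derived pseudo-representations and teacher features, while the feature-matching term anchors individual examples; the actual losses and architecture in the study may differ.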
Recommended Citation
Wang, Man and Zeng, David, "Using Generative Adversarial Network to Improve Knowledge Distillation on Large Language Models: A Design Science Approach" (2025). MWAIS 2025 Proceedings. 11.
https://aisel.aisnet.org/mwais2025/11