KD4MT: A Survey of Knowledge Distillation for Machine Translation
arXiv:2602.15845v1 Abstract: Knowledge Distillation (KD) as a research area has gained a lot of traction in recent years as a compression tool to address challenges related to ever-larger models in NLP. Remarkably, Machine Translation (MT) offers a much more nuanced take on this narrative: in MT, KD also functions as a general-purpose knowledge transfer mechanism that shapes supervision and translation quality as well as efficiency. This survey synthesizes KD for MT (KD4MT) across 105 papers (through October 1, 2025). We begin by introducing both MT and KD for non-experts, followed by an overview of the standard KD approaches relevant to MT applications. Subsequently, we categorize advances in the KD4MT literature based on (i) their methodological contributions and (ii) their practical applications. Our qualitative and quantitative analyses identify common trends in the field and highlight key research gaps as well as the absence of unified evaluation practice for KD methods in MT. We further provide practical guidelines for selecting a KD method in concrete settings and highlight potential risks associated with the application of KD to MT such as increased hallucination and bias amplification. Finally, we discuss the role of LLMs in re-shaping the KD4MT field. To support further research, we complement our survey with a publicly available database summarizing the main characteristics of the surveyed KD methods and a glossary of key terms.
Executive Summary
The article 'KD4MT: A Survey of Knowledge Distillation for Machine Translation' provides a comprehensive overview of Knowledge Distillation (KD) techniques specifically applied to Machine Translation (MT). The survey covers 105 papers up to October 1, 2025, and explores how KD functions not only as a model compression tool but also as a mechanism for enhancing translation quality and efficiency. The authors categorize methodological contributions and practical applications, identify trends, research gaps, and risks such as hallucination and bias amplification, and discuss the impact of Large Language Models (LLMs) on the field. The survey is supplemented by a publicly available database and glossary to support further research.
Key Points
- KD in MT serves both as a compression tool and a knowledge transfer mechanism.
- The survey categorizes KD4MT advances into methodological contributions and practical applications.
- It identifies trends, research gaps, and risks associated with KD in MT.
- It discusses the impact of LLMs on the KD4MT field.
- It provides practical guidelines for selecting KD methods and highlights the need for unified evaluation practices.
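For non-expert readers, the "standard KD approaches" the survey reviews typically build on a word-level objective: the student model is trained to match the teacher's temperature-softened output distribution at each decoding step via a KL-divergence term. The sketch below is a minimal, pure-Python illustration over a toy vocabulary; it is not code from the survey, and the function and variable names are ours.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over a toy vocabulary.
    # Higher T flattens the distribution, exposing the teacher's
    # "dark knowledge" about near-miss tokens.
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    # Word-level KD at a single decoding step:
    # KL(teacher || student) between softened distributions.
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy 4-token vocabulary: loss is zero when the student matches
# the teacher exactly, and positive when the distributions differ.
teacher = [2.0, 1.0, 0.1, -1.0]
aligned = kd_loss([2.0, 1.0, 0.1, -1.0], teacher)  # ~0.0
shifted = kd_loss([0.0, 2.0, 1.0, 0.5], teacher)   # > 0
```

In practice this term is summed over all target positions and usually interpolated with the ordinary cross-entropy loss on the reference translation; sequence-level variants instead train the student on the teacher's decoded outputs.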
Merits
Comprehensive Coverage
The survey thoroughly reviews 105 papers, providing a broad and detailed overview of KD techniques in MT.
Practical Guidelines
Offers practical guidelines for selecting KD methods, which is valuable for researchers and practitioners.
Publicly Available Resources
Complements the survey with a publicly available database and glossary, supporting further research.
Demerits
Lack of Unified Evaluation
Highlights the absence of a unified evaluation practice for KD methods in MT, which could lead to inconsistencies in research outcomes.
Potential Risks
Identifies risks such as increased hallucination and bias amplification, but does not provide detailed mitigation strategies.
Limited Practical Guidance on LLMs
While discussing the impact of LLMs, the survey does not delve deeply into the immediate practical implications for current KD4MT practices.
Expert Commentary
The survey 'KD4MT: A Survey of Knowledge Distillation for Machine Translation' is a timely and valuable contribution to the field of NLP, particularly in the domain of machine translation. The comprehensive review of 105 papers provides a solid foundation for understanding the current state of KD techniques in MT. The authors' categorization of methodological contributions and practical applications is insightful and helps to identify key trends and research gaps. The discussion on the impact of LLMs is particularly relevant, given the rapid advancements in this area. However, the survey could benefit from a more detailed exploration of mitigation strategies for the identified risks, such as hallucination and bias amplification. Additionally, while the survey highlights the absence of a unified evaluation practice, it does not propose specific steps towards establishing such a framework. Overall, this survey is a crucial resource for researchers and practitioners in the field, and it sets the stage for future research and development in KD4MT.
Recommendations
- Develop a unified evaluation framework for KD methods in MT to ensure consistency and comparability across research studies.
- Conduct further research on mitigation strategies for risks such as hallucination and bias amplification in KD methods.