KD4MT: A Survey of Knowledge Distillation for Machine Translation
arXiv:2602.15845v1 Abstract: Knowledge Distillation (KD) as a research area has gained a lot of traction in recent years as a compression tool to address challenges related to ever-larger models in NLP. Remarkably, Machine Translation (MT) offers a much more nuanced take on this narrative: in MT, KD also functions as a general-purpose knowledge transfer mechanism that shapes supervision and translation quality as well as efficiency. This survey synthesizes KD for MT (KD4MT) across 105 papers (through October 1, 2025). We begin by introducing both MT and KD for non-experts, followed by an overview of the standard KD approaches relevant to MT applications. Subsequently, we categorize advances in the KD4MT literature based on (i) their methodological contributions and (ii) their practical applications. Our qualitative and quantitative analyses identify common trends in the field and highlight key research gaps as well as the absence of unified evaluation practice for KD methods in MT. We further provide practical guidelines for selecting a KD method in concrete settings and highlight potential risks associated with the application of KD to MT such as increased hallucination and bias amplification. Finally, we discuss the role of LLMs in re-shaping the KD4MT field. To support further research, we complement our survey with a publicly available database summarizing the main characteristics of the surveyed KD methods and a glossary of key terms.
Executive Summary
The article 'KD4MT: A Survey of Knowledge Distillation for Machine Translation' provides a comprehensive overview of Knowledge Distillation (KD) techniques specifically applied to Machine Translation (MT). The survey covers 105 papers up to October 1, 2025, and explores how KD functions not only as a model compression tool but also as a mechanism for enhancing translation quality and efficiency. The authors categorize methodological contributions and practical applications, identify trends, research gaps, and risks such as hallucination and bias amplification, and discuss the impact of Large Language Models (LLMs) on the field. The survey is supplemented by a publicly available database and glossary to support further research.
Key Points
- KD in MT serves both as a compression tool and a knowledge transfer mechanism.
- The survey categorizes KD4MT advances into methodological contributions and practical applications.
- It identifies trends, research gaps, and risks associated with KD in MT.
- It discusses the impact of LLMs on the KD4MT field.
- It provides practical guidelines for selecting KD methods and highlights the need for unified evaluation practices.
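For non-expert readers, the "standard KD approaches" the survey reviews typically build on a word-level objective: the student model is trained to match the teacher's temperature-softened output distribution at each decoding step via a KL-divergence term. The sketch below is a minimal, pure-Python illustration over a toy vocabulary; it is not code from the survey, and the function and variable names are ours.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over a toy vocabulary.
    # Higher T flattens the distribution, exposing the teacher's
    # "dark knowledge" about near-miss tokens.
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    # Word-level KD at a single decoding step:
    # KL(teacher || student) between softened distributions.
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy 4-token vocabulary: loss is zero when the student matches
# the teacher exactly, and positive when the distributions differ.
teacher = [2.0, 1.0, 0.1, -1.0]
aligned = kd_loss([2.0, 1.0, 0.1, -1.0], teacher)  # ~0.0
shifted = kd_loss([0.0, 2.0, 1.0, 0.5], teacher)   # > 0
```

In practice this term is summed over all target positions and usually interpolated with the ordinary cross-entropy loss on the reference translation; sequence-level variants instead train the student on the teacher's decoded outputs.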
Merits
Comprehensive Coverage
The survey thoroughly reviews 105 papers, providing a broad and detailed overview of KD techniques in MT.
Practical Guidelines
Offers practical guidelines for selecting KD methods, which is valuable for researchers and practitioners.
Publicly Available Resources
Complements the survey with a publicly available database and glossary, supporting further research.
Demerits
Lack of Unified Evaluation
Highlights the absence of a unified evaluation practice for KD methods in MT, which could lead to inconsistencies in research outcomes.
Potential Risks
Identifies risks such as increased hallucination and bias amplification, but does not provide detailed mitigation strategies.
Limited Practical Guidance on LLMs
While discussing the impact of LLMs, the survey does not delve deeply into the immediate practical implications for current KD4MT practices.
Expert Commentary
The survey 'KD4MT: A Survey of Knowledge Distillation for Machine Translation' is a timely and valuable contribution to the field of NLP, particularly in the domain of machine translation. The comprehensive review of 105 papers provides a solid foundation for understanding the current state of KD techniques in MT. The authors' categorization of methodological contributions and practical applications is insightful and helps to identify key trends and research gaps. The discussion on the impact of LLMs is particularly relevant, given the rapid advancements in this area. However, the survey could benefit from a more detailed exploration of mitigation strategies for the identified risks, such as hallucination and bias amplification. Additionally, while the survey highlights the absence of a unified evaluation practice, it does not propose specific steps towards establishing such a framework. Overall, this survey is a crucial resource for researchers and practitioners in the field, and it sets the stage for future research and development in KD4MT.
Recommendations
- Develop a unified evaluation framework for KD methods in MT to ensure consistency and comparability across research studies.
- Conduct further research on mitigation strategies for risks such as hallucination and bias amplification in KD methods.