GATES: Self-Distillation under Privileged Context with Consensus Gating
arXiv:2602.20574v1 Announce Type: cross Abstract: We study self-distillation in settings where supervision is unreliable: there are no ground truth labels, verifiable rewards, or external graders to evaluate answers. We focus on document-grounded question answering with asymmetric context, where a single model serves as both tutor (with access to a relevant source document during training) and student (answering from the question alone at test time). Rather than assuming tutor correctness, we derive supervision online from tutor consensus by sampling multiple document-grounded reasoning traces and using agreement to gate learning. Conditioned on this reliability signal, we distill knowledge through full tutor reasoning trajectories (not just final answers), providing a dense and stable learning signal. Empirically, this consensus-gated trajectory distillation substantially improves transfer to the document-free student. Held-out in-domain accuracy under asymmetric evaluation improves from 46.0% to 62.0%, and average (maj@8) accuracy on public document-free math benchmarks improves from 20.2% to 35.4%.
Executive Summary
This article presents an approach to self-distillation for document-grounded question answering in which supervision is derived online from tutor consensus rather than from labels, verifiable rewards, or external graders. A single model acts as both tutor (with access to the source document during training) and student (answering from the question alone at test time). By sampling multiple document-grounded reasoning traces and using their agreement to gate learning, the method conditions distillation on a reliability signal and distills full tutor reasoning trajectories rather than final answers alone. This consensus-gated trajectory distillation substantially improves transfer to the document-free student, raising held-out in-domain accuracy from 46.0% to 62.0% and average maj@8 accuracy on public document-free math benchmarks from 20.2% to 35.4%.
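The gating step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `consensus_gate`, the threshold value, and the `(trace, answer)` representation are all assumptions.

```python
from collections import Counter

def consensus_gate(traces, threshold=0.75):
    """Gate tutor supervision by answer agreement.

    traces: list of (reasoning_trace, final_answer) pairs sampled
    from the document-conditioned tutor for one question.
    Returns the full reasoning trajectories that reach the majority
    answer, or an empty list if agreement is below the threshold
    (i.e., the example is skipped as unreliable).
    """
    answers = [answer for _, answer in traces]
    majority, count = Counter(answers).most_common(1)[0]
    agreement = count / len(traces)
    if agreement < threshold:
        return []  # consensus too weak: do not learn from this example
    # keep whole trajectories (not just answers) as distillation targets
    return [trace for trace, answer in traces if answer == majority]
```

For example, with four sampled traces where three agree on the same answer, agreement is 0.75: the three agreeing trajectories pass a 0.7 gate, while a stricter 0.9 gate rejects the example entirely.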
Key Points
- ▸ Self-distillation in unreliable supervision settings using consensus gating
- ▸ Derived supervision from tutor consensus improves learning signal reliability
- ▸ Conditioning distillation on reliability signal leads to improved transfer performance
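Conditioned on the gate, distillation trains the document-free student on full tutor trajectories. A minimal sketch of how such training pairs might be assembled follows; the prompt/target formatting and the helper name `build_distillation_examples` are illustrative assumptions, not the paper's exact scheme.

```python
def build_distillation_examples(question, gated_traces):
    """Pair the document-free student prompt with full tutor
    trajectories (reasoning plus final answer), not answers alone.

    question: the question text, with no source document attached.
    gated_traces: list of (reasoning_trace, final_answer) pairs that
    survived consensus gating.
    """
    examples = []
    for trace, answer in gated_traces:
        # the student sees only the question at training and test time
        prompt = f"Question: {question}\nAnswer:"
        # the target is the entire reasoning trajectory, giving a
        # denser learning signal than the final answer by itself
        target = f"{trace}\nFinal answer: {answer}"
        examples.append({"prompt": prompt, "target": target})
    return examples
```

Each resulting pair can then be used for ordinary supervised fine-tuning of the student, with the loss applied to the target trajectory.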
Merits
Strengths of Consensus Gating
Consensus gating mitigates unreliable supervision without assuming tutor correctness: agreement among independently sampled, document-grounded reasoning traces serves as an online reliability signal, and learning proceeds only when that signal is strong, which yields improved transfer to the document-free student.
Demerits
Limitation of Asymmetric Context
The study assumes an asymmetric-context setup in which a single model serves as both tutor (with access to the source document) and student (without it); this single-model, document-grounded setting may not generalize to more complex or dynamic real-world scenarios.
Expert Commentary
The article makes a meaningful contribution to self-distillation under unreliable supervision. Conditioning distillation on an online consensus signal sidesteps the need for labels, verifiable rewards, or external graders, and distilling full reasoning trajectories provides a denser, more stable signal than answer-only distillation. However, the reliance on a single model in an asymmetric-context setup may limit generalizability to more complex or dynamic real-world scenarios. Future work should extend the technique to more general settings and explore its application across NLP tasks.
Recommendations
- ✓ Future research should investigate the extension of consensus gating to more general settings, such as multi-model or multi-task scenarios.
- ✓ The technique's potential applications in other NLP tasks, such as text classification or sentiment analysis, should be explored.