Don't Ignore the Tail: Decoupling Top-K Probabilities for Efficient Language Model Distillation
arXiv:2602.20816v1 (Announce Type: new)
Abstract: The core learning signal used in language model distillation is the standard Kullback-Leibler (KL) divergence between the student and teacher …
Sayantan Dasgupta, Trevor Cohn, Timothy Baldwin
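For reference, the baseline objective the truncated abstract refers to is the token-level KL divergence between the teacher's and student's output distributions. Below is a minimal PyTorch sketch of that standard distillation loss only; the function name, temperature value, and tensor shapes are illustrative assumptions, and this is not the paper's proposed top-K decoupling method.

    import torch
    import torch.nn.functional as F

    def kd_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 2.0) -> torch.Tensor:
        """Standard distillation loss: KL divergence between the
        temperature-softened teacher and student distributions,
        computed over the full vocabulary.

        Both logits tensors are assumed to have shape
        (batch, vocab_size)."""
        t = temperature
        log_p_student = F.log_softmax(student_logits / t, dim=-1)
        log_p_teacher = F.log_softmax(teacher_logits / t, dim=-1)
        # F.kl_div expects log-probabilities for the input and, with
        # log_target=True, for the target as well. The t**2 factor is
        # the usual Hinton-style scaling that keeps gradient magnitudes
        # comparable across temperatures.
        return F.kl_div(log_p_student, log_p_teacher,
                        reduction="batchmean", log_target=True) * t * t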