Multi-Aspect Knowledge Distillation for Language Model with Low-rank Factorization
arXiv:2604.03110v1 Announce Type: new Abstract: Knowledge distillation is an effective technique for compressing pre-trained language models. However, existing methods focus only on the knowledge distribution …
Zihe Liu, Yulong Mao, Jinan Xu, Xinrui Peng, Kaiyu Huang
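For background on the technique the title names: a minimal sketch of the standard temperature-scaled knowledge-distillation loss (in the style of Hinton et al.), written in plain Python with no framework dependencies. This is generic KD between a teacher and a student distribution, not the paper's specific multi-aspect or low-rank method, and the function names are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the
    # distribution, exposing more of the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on the softened distributions, scaled by
    # T^2 so the gradient magnitude is comparable to the hard-label loss.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

Identical teacher and student logits give a loss of zero; any mismatch yields a positive loss, which is what the student minimizes during distillation.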