Academic

Academic · 1 min

ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns

arXiv:2602.15521v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) effectively scales model capacity while preserving computational efficiency through sparse expert activation. However, training high-quality MoEs from scratch …

Ziyu Zhao, Tong Zhu, Zhi Zhang, Tiantian Fan, Jinluan Yang, Kun Kuang, Zhongyu Wei, Fei Wu, Yu Cheng
Academic · 1 min

jina-embeddings-v5-text: Task-Targeted Embedding Distillation

arXiv:2602.15547v1 Announce Type: new Abstract: Text embedding models are widely used for semantic similarity tasks, including information retrieval, clustering, and classification. General-purpose models are typically …

Mohammad Kalim Akram, Saba Sturua, Nastia Havriushenko, Quentin Herreros, Michael Günther, Maximilian Werk, Han Xiao
Academic · 1 min

STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens

arXiv:2602.15620v1 Announce Type: new Abstract: Reinforcement Learning (RL) has significantly improved large language model reasoning, but existing RL fine-tuning methods rely heavily on heuristic techniques …

Shiqi Liu, Zeyu He, Guojian Zhan, Letian Tao, Zhilong Zheng, Jiang Wu, Yinuo Wang, Yang Guan, Kehua Sheng, Bo Zhang, Keqiang Li, Jingliang Duan, Shengbo Eben Li
Academic · 1 min

Rethinking Metrics for Lexical Semantic Change Detection

arXiv:2602.15716v1 Announce Type: new Abstract: Lexical semantic change detection (LSCD) increasingly relies on contextualised language model embeddings, yet most approaches still quantify change using a …

Roksana Goworek, Haim Dubossarsky