Skip to main content

All Articles

Articles

Academic · 1 min

What Language is This? Ask Your Tokenizer

arXiv:2602.17655v1 Announce Type: new Abstract: Language Identification (LID) is an important component of many multilingual natural language processing pipelines, where it facilitates corpus curation, training …

Clara Meister, Ahmetcan Yavuz, Pietro Lesci, Tiago Pimentel
4 views
Academic · 1 min

Sink-Aware Pruning for Diffusion Language Models

arXiv:2602.17664v1 Announce Type: new Abstract: Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning. Existing pruning heuristics largely inherited …

Aidar Myrzakhan, Tianyi Li, Bowei Guo, Shengkun Tang, Zhiqiang Shen
4 views
Academic · 1 min

Hybrid-Gym: Training Coding Agents to Generalize Across Tasks

arXiv:2602.16819v1 Announce Type: cross Abstract: When assessing the quality of coding agents, predominant benchmarks focus on solving single issues on GitHub, such as SWE-Bench. In …

Yiqing Xie, Emmy Liu, Gaokai Zhang, Nachiket Kotalwar, Shubham Gandhi, Sathwik Acharya, Xingyao Wang, Carolyn Rose, Graham Neubig, Daniel Fried
8 views