Category

Academic

Academic · 1 min

OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration

arXiv:2604.02349v1 Announce Type: cross Abstract: Preference-based reinforcement learning (PbRL) can help avoid sophisticated reward designs and align better with human intentions, showing great promise in …

Yiqin Yang, Hao Hu, Yihuan Mao, Jin Zhang, Chengjie Wu, Yuhua Jiang, Xu Yang, Runpeng Xie, Yi Fan, Bo Liu, Yang Gao, Bo Xu, Chongjie Zhang
6 views
Academic · 1 min

Haiku to Opus in Just 10 bits: LLMs Unlock Massive Compression Gains

arXiv:2604.02343v1 Announce Type: cross Abstract: We study the compression of LLM-generated text across lossless and lossy regimes, characterizing a compression-compute frontier where more compression is …

Roy Rinberg, Annabelle Michael Carrell, Simon Henniger, Nicholas Carlini, Keri Warr
36 views
Academic · 1 min

LLM Reasoning with Process Rewards for Outcome-Guided Steps

arXiv:2604.02341v1 Announce Type: cross Abstract: Mathematical reasoning in large language models has improved substantially with reinforcement learning using verifiable rewards, where final answers can be …

Mohammad Rezaei, Jens Lehmann, Sahar Vahdati
6 views
Academic · 1 min

Automatic Textbook Formalization

arXiv:2604.03071v1 Announce Type: new Abstract: We present a case study where an automatic AI system formalizes a textbook with more than 500 pages of graduate-level …

Fabian Gloeckle, Ahmad Rammal, Charles Arnal, Remi Munos, Vivien Cabannes, Gabriel Synnaeve, Amaury Hayat
5 views
Academic · 1 min

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

arXiv:2604.03016v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) are evolving from passive observers into active agents, solving problems through Visual Expansion (invoking visual …

Qianshan Wei, Yishan Yang, Siyi Wang, Jinglin Chen, Binyu Wang, Jiaming Wang, Shuang Chen, Zechen Li, Yang Shi, Yuqi Tang, Weining Wang, Yi Yu, Chaoyou Fu, Qi Li, Yi-Fan Zhang
5 views