Tag: cs.MM

#cs.MM

Academic · 1 min

LightThinker++: From Reasoning Compression to Memory Management

arXiv:2604.03679v1 Announce Type: new Abstract: Large language models (LLMs) excel at complex reasoning, yet their efficiency is limited by the surging cognitive overhead of long …

Yuqi Zhu, Jintian Zhang, Zhenjie Wan, Yujie Luo, Shuofei Qiao, Zhengke Gui, Da Zheng, Lei Liang, Huajun Chen, Ningyu Zhang
18 views
Academic · 1 min

VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs

arXiv:2603.08936v1 Announce Type: cross Abstract: Speech Large Language Models (LLMs) show great promise for speech emotion recognition (SER) via generative interfaces. However, shifting from closed-set …

Hezhao Zhang, Huang-Cheng Chou, Shrikanth Narayanan, Thomas Hain
29 views
Academic · 1 min

VDCook:DIY video data cook your MLLMs

arXiv:2603.05539v1 Announce Type: cross Abstract: We introduce VDCook: a self-evolving video data operating system, a configurable video data construction platform for researchers and vertical domain …

Chengwei Wu
22 views
Academic · 1 min

Human-Data Interaction, Exploration, and Visualization in the AI Era: Challenges and Opportunities

arXiv:2603.05542v1 Announce Type: cross Abstract: The rapid advancement of AI is transforming human-centered systems, with profound implications for human-AI interaction, human-data interaction, and visual analytics. …

Jean-Daniel Fekete, Yifan Hu, Dominik Moritz, Arnab Nandi, Senjuti Basu Roy, Eugene Wu, Nikos Bikakis, George Papastefanatos, Panos K. Chrysanthis, Guoliang Li, Lingyun Yu
45 views
Academic · 1 min

OmniGAIA: Towards Native Omni-Modal AI Agents

arXiv:2602.22897v1 Announce Type: new Abstract: Human intelligence naturally intertwines omni-modal perception -- spanning vision, audio, and language -- with complex reasoning and tool usage to …

Xiaoxi Li, Wenxiang Jiao, Jiarui Jin, Shijian Wang, Guanting Dong, Jiajie Jin, Hao Wang, Yinuo Wang, Ji-Rong Wen, Yuan Lu, Zhicheng Dou
27 views