OmniGAIA: Towards Native Omni-Modal AI Agents
arXiv:2602.22897v1 Announce Type: new Abstract: Human intelligence naturally intertwines omni-modal perception -- spanning vision, audio, and language -- with complex reasoning and tool usage to …
Tag: cs.MM
arXiv:2602.17871v1 Announce Type: cross Abstract: Vision-language models (VLMs) have made substantial progress across a wide range of visual question answering benchmarks, spanning visual reasoning, document …
arXiv:2602.15082v1 Announce Type: cross Abstract: Neural audio compression models have recently achieved extreme compression rates, enabling efficient latent generative modeling. Conversely, latent generative models have …
arXiv:2602.16197v1 Announce Type: new Abstract: Multimodal systems are vulnerable to partial or complete loss of input channels at deployment, which undermines reliability in real-world settings. …
arXiv:2602.15707v1 Announce Type: cross Abstract: Real-time conversational assistants for procedural tasks often depend on video input, which can be computationally expensive and compromise user privacy. …