Tag: cs.SD

Academic · 1 min

Do Audio-Visual Large Language Models Really See and Hear?

arXiv:2604.02605v1. Abstract: Audio-Visual Large Language Models (AVLLMs) are emerging as unified interfaces to multimodal perception. We present the first mechanistic interpretability study …

Ramaneswaran Selvakumar, Kaousheik Jayakumar, S Sakshi, Sreyan Ghosh, Ruohan Gao, Dinesh Manocha
Academic · 1 min

Abjad-Kids: An Arabic Speech Classification Dataset for Primary Education

arXiv:2603.20255v1. Abstract: Speech-based AI educational applications have gained significant interest in recent years, particularly for children. However, children's speech research remains limited …

Abdul Aziz Snoubara, Baraa Al_Maradni, Haya Al_Naal, Malek Al_Madrmani, Roaa Jdini, Seedra Zarzour, Khloud Al Jallad
Academic · 1 min

PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark

arXiv:2603.14456v1. Abstract: Persian poses unique audio understanding challenges through its classical poetry, traditional music, and pervasive code-switching, none captured by existing …

Mohammad Javad Ranjbar Kalahroodi, Mohammad Amini, Parmis Bathayan, Heshaam Faili, Azadeh Shakery