Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs
arXiv:2602.23136v1. Abstract: Multimodal LLMs can process speech and images, but they cannot hear a speaker's voice or see an object's texture. We …
Jayadev Billa