When More Is Less: A Systematic Analysis of Spatial and Commonsense Information for Visual Spatial …
arXiv:2602.21619v1 Announce Type: new

Abstract: Visual spatial reasoning (VSR) remains challenging for modern vision-language models (VLMs) despite advances in multimodal architectures. A common strategy is …