Understanding the Fine-Grained Knowledge Capabilities of Vision-Language Models
arXiv:2602.17871v1 (announce type: cross)
Abstract: Vision-language models (VLMs) have made substantial progress across a wide range of visual question answering benchmarks, spanning visual reasoning, document …
Dhruba Ghosh, Yuhui Zhang, Ludwig Schmidt