Where Vision Becomes Text: Locating the OCR Routing Bottleneck in Vision-Language Models
arXiv:2602.22918v1. Abstract: Vision-language models (VLMs) can read text from images, but where does this optical character recognition (OCR) information enter the language …
Jonathan Steinberg, Oren Gal