Index Light, Reason Deep: Deferred Visual Ingestion for Visual-Dense Document Question Answering
arXiv:2602.14162v1 (Announce Type: new)
Abstract: Existing multimodal document question answering methods universally adopt a supply-side ingestion strategy: running a Vision-Language Model (VLM) on every page …
Tao Xu