Lang2Act: Fine-Grained Visual Reasoning through Self-Emergent Linguistic Toolchains
arXiv:2602.13235v1. Abstract: Visual Retrieval-Augmented Generation (VRAG) enhances Vision-Language Models (VLMs) by incorporating external visual documents to address a given query. Existing VRAG …
Yuqi Xiong, Chunyi Peng, Zhipeng Xu, Zhenghao Liu, Zulong Chen, Yukun Yan, Shuo Wang, Yu Gu, Ge Yu