Nodes Are Early, Edges Are Late: Probing Diagram Representations in Large Vision-Language Models
arXiv:2603.02865v1 Announce Type: new Abstract: Large vision-language models (LVLMs) demonstrate strong performance on diagram understanding benchmarks, yet they still struggle with understanding relationships between elements, …
Haruto Yoshida, Keito Kudo, Yoichi Aoki, Ryota Tanaka, Itsumi Saito, Keisuke Sakaguchi, Kentaro Inui
6 views