FUTURE-VLA: Forecasting Unified Trajectories Under Real-time Execution
arXiv:2602.15882v1 Announce Type: cross Abstract: General vision-language models increasingly support unified spatiotemporal reasoning over long video streams, yet deploying such capabilities on robots remains constrained …
Jingjing Fan, Yushan Liu, Shoujie Li, Botao Ren, Siyuan Li, Xiao-Ping Zhang, Wenbo Ding, Zhidong Deng
7 views