CoMaTrack: Competitive Multi-Agent Game-Theoretic Tracking with Vision-Language-Action Models
arXiv:2603.22846v1 (announce type: new)
Abstract: Embodied Visual Tracking (EVT), a core dynamic task in embodied intelligence, requires an agent to precisely follow a language-specified target. …
Youzhi Liu, Li Gao, Liu Liu, Mingyang Lv, Yang Cai