TPRU: Advancing Temporal and Procedural Understanding in Large Multimodal Models
arXiv:2602.18884v1 Announce Type: new
Abstract: Multimodal Large Language Models (MLLMs), particularly smaller, deployable variants, exhibit a critical deficiency in understanding temporal and procedural visual data, …
Zhenkun Gao, Xuhong Wang, Xin Tan, Yuan Xie