NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning


Ishaan Rawal, Shubh Gupta, Yihan Hu, Wei Zhan

arXiv:2602.21172v1 — Abstract: Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs face two expensive requirements: (1) massive dataset collection, and (2) dense reasoning annotations. In this work, we address both challenges with NoRD (**No** **R**easoning for **D**riving). Compared to existing VLAs, NoRD achieves competitive performance while being fine-tuned on less than 60% of the data and no reasoning annotations, resulting in 3× fewer tokens. We identify that standard Group Relative Policy Optimization (GRPO) fails to yield significant improvements when applied to policies trained on such small, reasoning-free datasets. We show that this limitation stems from difficulty bias, which disproportionately penalizes reward signals from scenarios that produce high-variance rollouts within GRPO. NoRD overcomes this by incorporating Dr. GRPO, a recent algorithm designed to mitigate difficulty bias in LLMs. As a result, NoRD achieves competitive performance on Waymo and NAVSIM with a fraction of the training data and no reasoning overhead, enabling more efficient autonomous systems.

Executive Summary

The paper proposes NoRD, a vision-language-action (VLA) model that achieves competitive performance on autonomous driving benchmarks while being fine-tuned on less than 60% of the data used by existing VLAs and requiring no reasoning annotations. The authors show that standard Group Relative Policy Optimization (GRPO) underperforms in this low-data, reasoning-free regime because of difficulty bias, and that substituting Dr. GRPO, which mitigates that bias, closes the gap. The result points toward autonomous systems that are substantially cheaper to train and annotate.

Key Points

  • NoRD is a data-efficient vision-language-action model that achieves competitive performance in autonomous driving tasks.
  • NoRD is fine-tuned on less than 60% of the data used by comparable models and requires no reasoning annotations, resulting in 3× fewer training tokens.
  • The authors trace GRPO's failure on small, reasoning-free datasets to difficulty bias, and address it by adopting Dr. GRPO.
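The difficulty-bias mechanism the abstract describes can be made concrete with a small sketch. Standard GRPO mean-centers each group of rollout rewards and divides by the group's standard deviation, which shrinks the learning signal precisely in hard, high-variance scenarios; Dr. GRPO drops the standard-deviation division. The function names below are illustrative, not taken from the paper's code:

```python
from statistics import mean, pstdev

def grpo_advantages(rewards):
    """Standard GRPO advantage: mean-center, then divide by the group's
    reward standard deviation. High-variance (hard) scenario groups are
    divided by a large sigma, shrinking their raw reward signal."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

def dr_grpo_advantages(rewards):
    """Dr. GRPO advantage: mean-center only, so hard groups keep the
    full magnitude of their reward differences."""
    mu = mean(rewards)
    return [r - mu for r in rewards]

easy = [1.0, 1.0, 1.0, 0.9]   # rollouts nearly identical (easy scenario)
hard = [1.0, 0.0, 0.0, 1.0]   # rollouts disagree wildly (hard scenario)

print(grpo_advantages(easy))     # ≈ [0.58, 0.58, 0.58, -1.73]
print(grpo_advantages(hard))     # ≈ [1.0, -1.0, -1.0, 1.0]
print(dr_grpo_advantages(hard))  # [0.5, -0.5, -0.5, 0.5]
```

Note how standard GRPO amplifies the easy group's trivial 0.1 reward gap to the same scale as the hard group's signal, while Dr. GRPO keeps advantages proportional to the raw reward differences.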

Merits

Strength in Efficiency

NoRD matches the performance of existing VLAs while cutting dataset requirements by more than 40% and eliminating reasoning annotations entirely. This efficiency matters in practice, where data collection and dense annotation are the most time-consuming and costly parts of training driving policies.

Demerits

Limitation in Generalizability

The approach may not generalize to tasks or domains where difficulty bias is not the dominant failure mode of GRPO. Further research is needed to characterize NoRD's limitations in such settings.

Expert Commentary

Mitigating difficulty bias in GRPO is the paper's central technical contribution, and it is what allows NoRD to match existing VLAs with far less data and no reasoning annotations. Two caveats follow. First, the gains hinge on difficulty bias being the dominant failure mode; where it is not, the benefit of Dr. GRPO is unclear. Second, driving policies that become cheaper to train also become easier to deploy, which raises policy questions around safety validation, security, and liability that deserve attention alongside the technical work.

Recommendations

  • Further research is needed to understand the limitations of NoRD in scenarios where difficulty bias is not a significant issue.
  • Policymakers should consider the implications of NoRD on the development of autonomous systems, including issues related to safety, security, and liability.
