
A Survey of Reasoning in Autonomous Driving Systems: Open Challenges and Emerging Paradigms

arXiv:2603.11093v1 Announce Type: new Abstract: The development of high-level autonomous driving (AD) is shifting from perception-centric limitations to a more fundamental bottleneck, namely, a deficit in robust and generalizable reasoning. Although current AD systems manage structured environments, they consistently falter in long-tail scenarios and complex social interactions that require human-like judgment. Meanwhile, the advent of large language and multimodal models (LLMs and MLLMs) presents a transformative opportunity to integrate a powerful cognitive engine into AD systems, moving beyond pattern matching toward genuine comprehension. However, a systematic framework to guide this integration is critically lacking. To bridge this gap, we provide a comprehensive review of this emerging field and argue that reasoning should be elevated from a modular component to the system's cognitive core. Specifically, we first propose a novel Cognitive Hierarchy to decompose the monolithic driving task according to its cognitive and interactive complexity. Building on this, we further derive and systematize seven core reasoning challenges, such as the responsiveness-reasoning trade-off and social-game reasoning. Furthermore, we conduct a dual-perspective review of the state-of-the-art, analyzing both system-centric approaches to architecting intelligent agents and evaluation-centric practices for their validation. Our analysis reveals a clear trend toward holistic and interpretable "glass-box" agents. In conclusion, we identify a fundamental and unresolved tension between the high-latency, deliberative nature of LLM-based reasoning and the millisecond-scale, safety-critical demands of vehicle control. For future work, a primary objective is to bridge the symbolic-to-physical gap by developing verifiable neuro-symbolic architectures, robust reasoning under uncertainty, and scalable models for implicit social negotiation.

Executive Summary

This article addresses a critical shift in autonomous driving (AD) from perception-centric limitations to a fundamental bottleneck in robust reasoning. While current systems excel in structured environments, they falter in complex, long-tail scenarios requiring human-like judgment. The emergence of LLMs and MLLMs presents a transformative opportunity to integrate cognitive engines into AD systems, moving beyond pattern matching to genuine comprehension. The authors propose a novel Cognitive Hierarchy to decompose driving tasks by cognitive complexity and identify seven core reasoning challenges, offering a dual-perspective review of system-centric and evaluation-centric practices. The analysis highlights a growing trend toward interpretable 'glass-box' agents and identifies an unresolved tension between the high-latency, deliberative nature of LLM-based reasoning and the millisecond-scale demands of safety-critical vehicle control. The work calls for bridging the symbolic-to-physical gap via verifiable neuro-symbolic architectures and robust reasoning under uncertainty.

Key Points

  • Shift from perception to reasoning as the central bottleneck in AD
  • Introduction of Cognitive Hierarchy to structure complexity in AD tasks
  • Identification of seven core reasoning challenges impacting generalizability and adaptability
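To make the hierarchy idea concrete: the survey's actual levels are not reproduced here, but a cognitive hierarchy of the kind described could be sketched as an ordered set of task classes, with a routing rule that escalates a scene to a more deliberative solver as its interactive complexity grows. The level names and routing heuristics below are hypothetical illustrations, not the paper's taxonomy.

```python
from enum import IntEnum

# Illustrative sketch only: level names and routing rules are assumptions,
# not the survey's actual Cognitive Hierarchy. The point is the pattern:
# order driving subtasks by cognitive/interactive complexity so a system
# can dispatch each scene to the cheapest sufficient solver.

class CognitiveLevel(IntEnum):
    REACTIVE = 0      # e.g. lane keeping, emergency braking
    TACTICAL = 1      # e.g. lane changes, gap selection
    INTERACTIVE = 2   # e.g. negotiating right-of-way with other road users
    DELIBERATIVE = 3  # e.g. long-tail scenes needing commonsense reasoning

def required_level(scene: dict) -> CognitiveLevel:
    """Toy routing rule: escalate as the scene gets more novel or social."""
    if scene.get("novel", False):
        return CognitiveLevel.DELIBERATIVE
    if scene.get("agents_negotiating", 0) > 0:
        return CognitiveLevel.INTERACTIVE
    if scene.get("maneuver_pending", False):
        return CognitiveLevel.TACTICAL
    return CognitiveLevel.REACTIVE

# A routine highway scene stays cheap; an unfamiliar scene escalates.
assert required_level({"maneuver_pending": True}) == CognitiveLevel.TACTICAL
assert required_level({"novel": True}) == CognitiveLevel.DELIBERATIVE
```

Such a dispatcher is one way the responsiveness-reasoning trade-off the authors name could be managed: most scenes never pay the cost of the slow reasoning layer.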

Merits

Comprehensive Framework

The authors provide a novel and structured framework (Cognitive Hierarchy) that offers clarity on how to integrate reasoning into AD systems at a foundational level.

Interdisciplinary Relevance

The article effectively bridges AI, cognitive science, and systems engineering by proposing a unified approach to reasoning in autonomous systems.

Forward-Looking Recommendations

The identification of unresolved tensions (e.g., latency vs. safety) and suggested solutions (neuro-symbolic architectures) position the work as a catalyst for future research directions.

Demerits

Ambiguity in Implementation

While the Cognitive Hierarchy and reasoning challenges are well-articulated, the article lacks concrete examples or case studies illustrating how these concepts translate into real-world AD implementations.

Evaluation Gap

The review of state-of-the-art practices is descriptive; a comparative evaluation of specific systems or metrics to validate the proposed framework is absent.

Expert Commentary

The article represents a significant contribution to the discourse on autonomous driving by elevating reasoning from a modular component to a central cognitive pillar. Historically, AD systems have been designed as reactive architectures, optimized for speed and deterministic outcomes. The authors’ shift toward embedding reasoning as a core cognitive engine marks a paradigm shift, particularly in acknowledging the necessity of human-like judgment in complex social interactions. Their proposal to integrate LLMs/MLLMs as cognitive engines is timely, given the rapid advances in multimodal modeling. However, the article’s greatest strength—its conceptual rigor—also presents a practical hurdle: translating abstract cognitive frameworks into quantifiable, performance-oriented metrics for vehicle control remains an open problem. The tension between LLM-based reasoning latency and millisecond safety constraints is not merely a technical issue; it is a legal and ethical dilemma requiring interdisciplinary adjudication. Future work must move beyond theoretical constructs to include empirical validation through hybrid architectures that balance interpretability with real-time performance. Without such bridging efforts, the promise of LLMs in AD may remain confined to academic discourse.

Recommendations

  • Develop standardized benchmarks for evaluating reasoning capabilities in AD systems under uncertain and complex scenarios.
  • Encourage interdisciplinary working groups to integrate legal, ethical, and engineering perspectives in designing verifiable neuro-symbolic architectures.
