TaCarla: A comprehensive benchmarking dataset for end-to-end autonomous driving
arXiv:2602.23499v1 Announce Type: cross Abstract: Collecting a high-quality dataset is a critical task that demands meticulous attention to detail, as overlooking certain aspects can render the entire dataset unusable. Autonomous driving remains a prominent area of research, requiring further exploration to enhance the perception and planning performance of vehicles. However, existing datasets are often incomplete. For instance, datasets that include perception information generally lack planning data, while planning datasets typically consist of extensive driving sequences where the ego vehicle predominantly drives forward, offering limited behavioral diversity. In addition, many real-world datasets make it difficult to evaluate models, especially for planning tasks, since they lack a proper closed-loop evaluation setup. The CARLA Leaderboard 2.0 challenge, which provides a diverse set of scenarios to address the long-tail problem in autonomous driving, has emerged as a valuable alternative platform for developing perception and planning models in both open-loop and closed-loop evaluation setups. Nevertheless, existing datasets collected on this platform present certain limitations. Some datasets appear to be tailored primarily to a limited, specific sensor configuration. To support end-to-end autonomous driving research, we have collected a new dataset comprising over 2.85 million frames using the CARLA simulation environment for the diverse Leaderboard 2.0 challenge scenarios. Our dataset is designed not only for planning tasks but also supports dynamic object detection, lane divider detection, centerline detection, traffic light recognition, prediction tasks and visual language action models. Furthermore, we demonstrate its versatility by training various models using our dataset. Moreover, we also provide numerical rarity scores to understand how rarely the current state occurs in the dataset.
Executive Summary
This article presents the TaCarla dataset, a comprehensive benchmarking dataset for end-to-end autonomous driving. The dataset was collected using the CARLA simulation environment and comprises over 2.85 million frames across diverse Leaderboard 2.0 challenge scenarios. TaCarla is designed to support various autonomous driving tasks, including planning, dynamic object detection, and visual language action models. The dataset's versatility is demonstrated through the training of multiple models. Additionally, numerical rarity scores are provided to assess the rarity of certain states in the dataset. This contribution addresses the need for high-quality datasets in autonomous driving research and provides a valuable resource for the development of perception and planning models.
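The abstract does not say how the rarity scores are computed. As a minimal sketch of one plausible reading, the Python below scores each frame by the negative log of how often its state occurs in the dataset, so that rarer states receive higher scores. The function name `rarity_scores`, the discretized state tuples, and the negative-log-frequency formula are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def rarity_scores(states):
    """Return one score per frame: -log(relative frequency of that frame's state).

    Assumption: each frame has already been reduced to a hashable, discretized
    state. How TaCarla actually defines a "state" is not given in the abstract.
    """
    counts = Counter(states)
    total = len(states)
    return [-math.log(counts[s] / total) for s in states]


# Hypothetical per-frame states: (maneuver, traffic-light state).
frames = [
    ("forward", "green"),
    ("forward", "green"),
    ("forward", "green"),
    ("left_turn", "red"),
]

scores = rarity_scores(frames)
for state, score in zip(frames, scores):
    print(state, round(score, 3))
```

Under this assumed definition, the single left-turn frame scores higher than the three forward-driving frames, which matches the stated purpose of indicating how rarely the current state occurs.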
Key Points
- ▸ TaCarla is a comprehensive benchmarking dataset for end-to-end autonomous driving
- ▸ The dataset was collected using the CARLA simulation environment
- ▸ TaCarla comprises over 2.85 million frames across diverse scenarios
- ▸ The dataset supports planning, dynamic object detection, lane divider detection, centerline detection, traffic light recognition, prediction tasks, and visual language action models
Merits
Comprehensive Coverage
TaCarla covers a wide range of scenarios and tasks, making it a valuable resource for autonomous driving research
High-Quality Data
With over 2.85 million frames across diverse Leaderboard 2.0 scenarios, the dataset offers a large and varied benchmark for evaluating perception and planning models
Versatility
TaCarla can be used to train models for various tasks, including planning, object detection, and visual language action models
Demerits
Limited Real-World Data
The dataset was collected entirely in the CARLA simulation environment, so it may not accurately represent real-world driving scenarios
Dependence on Simulation
The dataset's quality and relevance may depend on the accuracy of the CARLA simulation environment
Expert Commentary
The TaCarla dataset is a significant contribution to autonomous driving research. Its coverage of scenarios and tasks, combined with its scale of over 2.85 million frames, makes it a strong candidate benchmark for evaluating perception and planning models. However, its limitations, chiefly its dependence on simulation, should be carefully considered. As autonomous driving continues to evolve, high-quality datasets like TaCarla will be essential for advancing our understanding of perception and planning in autonomous vehicles.
Recommendations
- ✓ Researchers and developers should use TaCarla to develop and evaluate perception and planning models for autonomous vehicles
- ✓ Future research should focus on creating high-quality datasets that can accurately represent real-world driving scenarios