RelBench v2: A Large-Scale Benchmark and Repository for Relational Data

arXiv:2602.12606v1 Announce Type: new Abstract: Relational deep learning (RDL) has emerged as a powerful paradigm for learning directly on relational databases by modeling entities and their relationships across multiple interconnected tables. As this paradigm evolves toward larger models and relational foundation models, scalable and realistic benchmarks are essential for enabling systematic evaluation and progress. In this paper, we introduce RelBench v2, a major expansion of the RelBench benchmark for RDL. RelBench v2 adds four large-scale relational datasets spanning scholarly publications, enterprise resource planning, consumer platforms, and clinical records, increasing the benchmark to 11 datasets comprising over 22 million rows across 29 tables. We further introduce autocomplete tasks, a new class of predictive objectives that require models to infer missing attribute values directly within relational tables while respecting temporal constraints, expanding beyond traditional forecasting tasks constructed via SQL queries. In addition, RelBench v2 expands beyond its native datasets by integrating external benchmarks and evaluation frameworks: we translate event streams from the Temporal Graph Benchmark into relational schemas for unified relational-temporal evaluation, interface with ReDeLEx to provide uniform access to 70+ real-world databases suitable for pretraining, and incorporate 4DBInfer datasets and tasks to broaden multi-table prediction coverage. Experimental results demonstrate that RDL models consistently outperform single-table baselines across autocomplete, forecasting, and recommendation tasks, highlighting the importance of modeling relational structure explicitly.

Executive Summary

RelBench v2 is a major expansion of the RelBench benchmark for relational deep learning (RDL), serving as both a benchmark and a repository for relational data. It adds four new large-scale datasets across diverse domains, bringing the benchmark to 11 datasets with over 22 million rows across 29 tables. The release introduces autocomplete tasks, which require models to infer missing attribute values within relational tables while respecting temporal constraints, and integrates external benchmarks and evaluation frameworks to support pretraining and multi-table prediction. Experimental results show that RDL models consistently outperform single-table baselines across autocomplete, forecasting, and recommendation tasks, underscoring the importance of modeling relational structure explicitly.

Key Points

  • RelBench v2 adds four new large-scale datasets, bringing the benchmark to 11 datasets with over 22 million rows across 29 tables.
  • New autocomplete tasks require models to infer missing attribute values within relational tables while respecting temporal constraints.
  • External resources (the Temporal Graph Benchmark, ReDeLEx, and 4DBInfer) are integrated to broaden pretraining and multi-table prediction coverage.
  • RDL models consistently outperform single-table baselines across autocomplete, forecasting, and recommendation tasks.
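The advantage of RDL over single-table baselines comes down to signal that only exists across table joins. A minimal, hypothetical illustration in Python (the table and column names are invented for this sketch and do not come from RelBench):

```python
# Hypothetical toy tables; columns are illustrative, not RelBench schemas.
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 50.0},
    {"order_id": 11, "customer_id": 1, "amount": 30.0},
    {"order_id": 12, "customer_id": 2, "amount": 20.0},
]

def single_table_features(customer):
    # A single-table baseline only sees columns on the customer row itself.
    return {"region": customer["region"]}

def relational_features(customer, orders):
    # A relational model can also aggregate rows linked via the foreign key.
    linked = [o["amount"] for o in orders if o["customer_id"] == customer["customer_id"]]
    return {
        "region": customer["region"],
        "num_orders": len(linked),
        "total_spend": sum(linked),
    }

feats = relational_features(customers[0], orders)
# feats now also carries order-history signal invisible to the baseline.
```

The same cross-table aggregation is what RDL models learn implicitly by message passing over the schema's foreign-key links.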

Merits

Comprehensive Benchmark

RelBench v2 significantly expands the scope and scale of the benchmark, providing a more robust and realistic evaluation framework for RDL models.

Innovative Task Design

The introduction of autocomplete tasks adds a new dimension to predictive objectives: models must infer missing attribute values directly within relational tables under temporal constraints, rather than relying solely on forecasting targets constructed via SQL queries.
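One way to read the autocomplete objective: pick a target cell in a table, hide it, and let the model predict it using only rows that precede the target row in time. A hedged sketch of such task construction (the table, column names, and cutoff logic are assumptions for illustration, not the paper's exact protocol):

```python
# Hypothetical reviews table; columns are invented for illustration.
reviews = [
    {"review_id": 1, "timestamp": 100, "rating": 4},
    {"review_id": 2, "timestamp": 200, "rating": 5},
    {"review_id": 3, "timestamp": 300, "rating": None},
]

def make_autocomplete_example(rows, target_idx, target_col):
    """Build (context, masked_row, label) for predicting one hidden cell.

    Temporal constraint: the context may only contain rows strictly
    earlier than the target row, so no future information leaks in.
    """
    target = rows[target_idx]
    context = [r for r in rows if r["timestamp"] < target["timestamp"]]
    label = target[target_col]  # ground truth for the hidden cell
    masked = {k: v for k, v in target.items() if k != target_col}
    return context, masked, label

context, masked, label = make_autocomplete_example(reviews, 1, "rating")
# context holds only rows before timestamp 200; the "rating" column is
# stripped from the target row, and the model must reconstruct it.
```

The temporal filter is the essential part: without it, an autocomplete task would let models peek at attribute values recorded after the prediction time.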

Integration of External Resources

By integrating external benchmarks and evaluation frameworks (relational translations of the Temporal Graph Benchmark, uniform access to 70+ real-world databases via ReDeLEx, and the datasets and tasks of 4DBInfer), RelBench v2 extends well beyond its native datasets, making it a versatile resource for both pretraining and multi-table prediction.
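Translating an event stream such as those in the Temporal Graph Benchmark into a relational schema can be pictured as splitting timestamped edges into entity tables plus a foreign-keyed event table. A simplified sketch under assumed names (not the paper's actual conversion code):

```python
# Hypothetical event stream: (source, destination, timestamp) triples.
events = [("u1", "i1", 100), ("u2", "i1", 150), ("u1", "i2", 200)]

def events_to_relational(events):
    """Split an edge stream into two entity tables and one event table."""
    sources = sorted({s for s, _, _ in events})
    dests = sorted({d for _, d, _ in events})
    src_table = [{"src_id": s} for s in sources]
    dst_table = [{"dst_id": d} for d in dests]
    # The event table references both entity tables via foreign keys and
    # keeps the timestamp so temporal train/val/test splits stay possible.
    event_table = [
        {"src_id": s, "dst_id": d, "timestamp": t} for s, d, t in events
    ]
    return src_table, dst_table, event_table

src, dst, ev = events_to_relational(events)
```

Once in this form, the same RDL pipeline that handles native relational datasets can consume temporal-graph data without special-casing edge streams.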

Demerits

Complexity and Scalability

The increased complexity and scale of the benchmark may pose challenges for smaller research groups or institutions with limited computational resources.

Data Diversity

While the benchmark covers a wide range of domains, there may still be gaps in certain specialized areas, limiting its applicability in some niche applications.

Expert Commentary

RelBench v2 marks a substantial step forward for evaluating relational deep learning. The new large-scale datasets and the autocomplete task family make the benchmark both more realistic and more demanding, while the integrations with the Temporal Graph Benchmark, ReDeLEx, and 4DBInfer turn it into a broad repository rather than a single benchmark suite. The main caveats are practical: the scale and complexity may strain smaller research groups with limited computational resources, and coverage of specialized domains remains incomplete. As RDL models move toward foundation-model scale and real deployments, data privacy and model interpretability will become increasingly important concerns. Overall, RelBench v2 sets a new standard for evaluating relational deep learning models and is well positioned to drive progress in the field.

Recommendations

  • Researchers should explore methods to enhance the interpretability of RDL models to ensure their practical applicability.
  • Policymakers should establish robust data governance frameworks to address privacy and security concerns associated with large-scale relational datasets.
