Bench-MFG: A Benchmark Suite for Learning in Stationary Mean Field Games
arXiv:2602.12517v1
Abstract
The intersection of Mean Field Games (MFGs) and Reinforcement Learning (RL) has fostered a growing family of algorithms designed to solve large-scale multi-agent systems. However, the field currently lacks a standardized evaluation protocol, forcing researchers to rely on bespoke, isolated, and often simplistic environments. This fragmentation makes it difficult to assess the robustness, generalization, and failure modes of emerging methods. To address this gap, we propose a comprehensive benchmark suite for MFGs (Bench-MFG), focusing on the discrete-time, discrete-space, stationary setting for the sake of clarity. We introduce a taxonomy of problem classes, ranging from no-interaction and monotone games to potential and dynamics-coupled games, and provide prototypical environments for each. Furthermore, we propose MF-Garnets, a method for generating random MFG instances to facilitate rigorous statistical testing. We benchmark a variety of learning algorithms across these environments, including a novel black-box approach (MF-PSO) for exploitability minimization. Based on our extensive empirical results, we propose guidelines to standardize future experimental comparisons. Code available at https://github.com/lorenzomagnino/Bench-MFG.
Executive Summary
The paper introduces Bench-MFG, a comprehensive benchmark suite for evaluating learning algorithms in stationary Mean Field Games (MFGs). It addresses the absence of a standardized evaluation protocol in the field by proposing a taxonomy of problem classes with prototypical environments for each, restricted for clarity to the discrete-time, discrete-space, stationary setting. The study also introduces MF-Garnets for generating random MFG instances and benchmarks a range of algorithms, including a novel black-box approach to exploitability minimization (MF-PSO). The authors propose guidelines for future experimental comparisons and release their code for further research.
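Exploitability is the yardstick used throughout this line of work: it measures how much a single representative agent can gain by deviating from its policy while the population's mean field is held fixed, and it vanishes exactly at a Nash equilibrium. Once the mean field has been frozen into the transition kernel and reward, computing it reduces to standard MDP calculations, as in the following minimal sketch (array shapes, function names, and the discounted criterion are illustrative assumptions, not the Bench-MFG API):

```python
# Minimal sketch of the exploitability metric for a finite, stationary MFG.
# Assumptions (not the Bench-MFG API): the mean field mu is already frozen
# into P[a, s, s'] and r[s, a]; pi[s, a] is a stochastic policy; mu0 is the
# initial state distribution; gamma is a discount factor.
import numpy as np

def frozen_mdp_value(P, r, pi, gamma):
    """Value of policy pi in the MDP obtained by freezing the mean field."""
    r_pi = (pi * r).sum(axis=1)              # expected one-step reward per state
    P_pi = np.einsum("sa,asz->sz", pi, P)    # induced state-to-state kernel
    S = r_pi.shape[0]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def best_response_value(P, r, gamma, iters=2000):
    """Optimal value in the frozen MDP, via value iteration."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        V = (r + gamma * np.einsum("asz,z->sa", P, V)).max(axis=1)
    return V

def exploitability(P, r, pi, mu0, gamma=0.99):
    """phi = <mu0, V* - V^pi> against the frozen mean field; when that mean
    field is the one pi itself induces, phi = 0 characterizes a Nash
    equilibrium."""
    return mu0 @ (best_response_value(P, r, gamma) - frozen_mdp_value(P, r, pi, gamma))
```

Benchmark curves in this setting typically report this quantity over learning iterations, so that different algorithms can be compared on a common scale.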
Key Points
- Bench-MFG, a benchmark suite to standardize the evaluation of MFG algorithms.
- A taxonomy of problem classes, with a prototypical environment for each.
- MF-Garnets, a generator of random MFG instances for statistical testing (see the sketch after this list).
- Benchmarks of a range of learning algorithms, including the new black-box MF-PSO.
- Guidelines to standardize future experimental comparisons.
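The exact MF-Garnet construction is specified in the paper; the sketch below is only a plausible reading, following the classic Garnet recipe for random MDPs (sparse random transitions with branching factor b) and adding a simple crowd-averse mean-field coupling through the reward. The function name, signature, and coupling form are assumptions for illustration:

```python
# Hedged sketch of a Garnet-style random MFG generator (illustrative; the
# paper's MF-Garnet construction may differ). Each (action, state) pair gets
# b random reachable successors with Dirichlet-random probabilities, and the
# reward is coupled to the mean field mu through a crowd-aversion term.
import numpy as np

def random_garnet_mfg(S, A, b, coupling=1.0, seed=None):
    rng = np.random.default_rng(seed)
    P = np.zeros((A, S, S))
    for a in range(A):
        for s in range(S):
            succ = rng.choice(S, size=b, replace=False)  # b reachable successors
            P[a, s, succ] = rng.dirichlet(np.ones(b))    # random branch weights
    r_base = rng.normal(size=(S, A))                     # mean-field-free reward

    def reward(mu):
        # Crowd-averse coupling: a state pays less the more crowded it is.
        return r_base - coupling * mu[:, None]

    return P, reward
```

Drawing many such instances and aggregating exploitability statistics across them is what turns a single anecdotal demonstration into the "rigorous statistical testing" the abstract calls for.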
Merits
Comprehensive Benchmark Suite
Bench-MFG provides a standardized and rigorous framework for evaluating MFG algorithms, addressing a significant gap in the field.
Taxonomy and Prototypical Environments
The proposed taxonomy, spanning no-interaction, monotone, potential, and dynamics-coupled games, covers the main structural classes of stationary MFGs, and the accompanying prototypical environments support a structured assessment of algorithm performance.
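To make the taxonomy concrete: the monotone class is usually defined via the Lasry-Lions monotonicity condition, which guarantees uniqueness of the equilibrium. The snippet below numerically checks that condition for the textbook crowd-averse reward r(s, mu) = -mu(s); it is a minimal illustration of the standard definition, not code from the suite:

```python
# Numerical check of Lasry-Lions monotonicity for a crowd-averse reward:
# a reward r(., mu) is monotone if, for all distributions mu and nu,
#   sum_s (r(s, mu) - r(s, nu)) * (mu(s) - nu(s)) <= 0.
# For r(s, mu) = -mu(s) the left-hand side is -||mu - nu||^2, so the
# condition holds with strict inequality whenever mu != nu.
import numpy as np

rng = np.random.default_rng(0)
S = 10
for _ in range(1000):
    mu, nu = rng.dirichlet(np.ones(S)), rng.dirichlet(np.ones(S))
    lhs = np.dot(-mu - (-nu), mu - nu)   # = -||mu - nu||^2
    assert lhs <= 1e-12
```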
Novel Algorithm Introduction
The introduction of MF-PSO offers a new approach to exploitability minimization, contributing to the methodological diversity in the field.
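The abstract describes MF-PSO only as a black-box approach to exploitability minimization, so its precise design must be taken from the paper itself. The sketch below shows one natural instantiation consistent with that description: a generic global-best particle-swarm search over softmax policy logits, scored by exploitability. The hyperparameters, the softmax parameterization, and the reuse of the exploitability() helper from the earlier sketch are all illustrative assumptions, not the authors' algorithm:

```python
# Hedged sketch of a PSO-style black-box search over policy logits, in the
# spirit of (but not identical to) the paper's MF-PSO. A standard global-best
# PSO minimizes an arbitrary fitness function over R^dim.
import numpy as np

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def pso_minimize(fitness, dim, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=None):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_particles, dim))          # positions (policy logits)
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, pbest_f.min()

# Illustrative fitness: map logits to a policy, recompute the mean field that
# policy induces, freeze it into the reward, and score by exploitability.
# (induced_mu is a hypothetical helper: the stationary distribution of the
# dynamics under the candidate policy.)
# fitness = lambda th: exploitability(
#     P, reward(induced_mu(softmax(th.reshape(S, A)))),
#     softmax(th.reshape(S, A)), mu0)
```

A derivative-free search is a natural fit here, since the best-response operator inside the exploitability objective is piecewise-defined and non-differentiable in the policy.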
Demerits
Limited Scope
The focus on discrete-time, discrete-space, stationary settings leaves continuous-time, continuous-space, and non-stationary MFGs outside the suite's scope.
Empirical Focus
The study relies heavily on empirical results, which may not fully capture the theoretical nuances of MFG algorithms.
Expert Commentary
The introduction of Bench-MFG represents a significant step toward standardized evaluation protocols for Mean Field Games. The suite addresses a critical gap: without shared environments and metrics, the robustness, generalization, and failure modes of competing algorithms cannot be compared meaningfully. The taxonomy and prototypical environments span the main structural classes of stationary MFGs, while MF-Garnets makes it possible to draw random problem instances, moving evaluation from anecdotal demonstrations toward statistical testing. MF-PSO, a black-box approach to exploitability minimization, adds methodological diversity to a field dominated by fixed-point, fictitious-play, and mirror-descent schemes.

The limitations are real but clearly scoped. The restriction to discrete-time, discrete-space, stationary settings excludes continuous and non-stationary MFGs, and the emphasis on empirical results leaves the theoretical properties of the benchmarked algorithms largely unexamined. Even so, Bench-MFG gives researchers and practitioners a rigorous, reproducible basis for comparison, and the proposed experimental guidelines are a plausible candidate for a community standard. Overall, the work is a meaningful advance for Mean Field Games and large-scale multi-agent learning.
Recommendations
- Expand the benchmark suite to include continuous-time and continuous-space settings to broaden its applicability.
- Incorporate theoretical analyses to complement the empirical results, providing a more comprehensive understanding of algorithm performance.