Characterizing MARL for Energy Control: A Multi-KPI Benchmark on the CityLearn Environment
arXiv:2602.19223v1 (Announce Type: new)
Abstract: The optimization of urban energy systems is crucial for the advancement of sustainable and resilient smart cities, which are becoming increasingly complex with multiple decision-making units. To address scalability and coordination concerns, Multi-Agent Reinforcement Learning (MARL) is a promising solution. This paper addresses the imperative need for comprehensive and reliable benchmarking of MARL algorithms on energy management tasks. CityLearn is used as a case study environment because it realistically simulates urban energy systems, incorporates multiple storage systems, and utilizes renewable energy sources. By doing so, our work sets a new standard for evaluation, conducting a comparative study across multiple key performance indicators (KPIs). This approach illuminates the key strengths and weaknesses of various algorithms, moving beyond traditional KPI averaging which often masks critical insights. Our experiments utilize widely accepted baselines such as Proximal Policy Optimization (PPO) and Soft Actor Critic (SAC), and encompass diverse training schemes including Decentralized Training with Decentralized Execution (DTDE) and Centralized Training with Decentralized Execution (CTDE) approaches and different neural network architectures. Our work also proposes novel KPIs that tackle real world implementation challenges such as individual building contribution and battery storage lifetime. Our findings show that DTDE consistently outperforms CTDE in both average and worst-case performance. Additionally, temporal dependency learning improved control on memory dependent KPIs such as ramping and battery usage, contributing to more sustainable battery operation. Results also reveal robustness to agent or resource removal, highlighting both the resilience and decentralizability of the learned policies.
Executive Summary
This study contributes to the development of sustainable and resilient smart cities by proposing a comprehensive benchmark for Multi-Agent Reinforcement Learning (MARL) algorithms in energy management tasks. The authors utilize the CityLearn environment, a realistic simulation of urban energy systems, to evaluate the performance of various MARL algorithms across multiple key performance indicators (KPIs). The study highlights the strengths and weaknesses of different algorithms, including the consistently better performance of Decentralized Training with Decentralized Execution (DTDE) and the improved control achieved through temporal dependency learning. The findings also demonstrate the robustness of the learned policies to agent or resource removal, showcasing the resilience and decentralizability of MARL in energy management. This work sets a new standard for evaluation and has significant implications for the development of sustainable smart cities.
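The paper's central methodological point is that averaging KPIs into one scalar can mask an algorithm that fails badly on a single objective. A minimal sketch of that idea, using purely illustrative scores (the algorithm names match the paper's baselines, but the numbers are invented for demonstration, not taken from its results):

```python
import statistics

# Hypothetical normalized KPI scores (lower is better). Values are
# illustrative only -- NOT results reported in the paper.
kpi_scores = {
    "PPO-DTDE": {"electricity_cost": 0.82, "carbon_emissions": 0.85,
                 "ramping": 0.70, "battery_wear": 0.75},
    "SAC-CTDE": {"electricity_cost": 0.70, "carbon_emissions": 0.72,
                 "ramping": 0.98, "battery_wear": 0.74},
}

def averaged_score(kpis):
    """Single scalar obtained by averaging all KPIs (masks trade-offs)."""
    return statistics.mean(kpis.values())

def per_kpi_report(scores):
    """Report the mean alongside the worst-case KPI per algorithm."""
    return {algo: {"mean": averaged_score(kpis), "worst": max(kpis.values())}
            for algo, kpis in scores.items()}

report = per_kpi_report(kpi_scores)
# The two means are nearly identical (0.780 vs. 0.785), yet the worst-case
# KPIs differ sharply (0.85 vs. 0.98): a plain average hides that one
# controller ramps the grid far more aggressively.
```

This is the kind of distinction the study's multi-KPI benchmark is designed to surface, which is why it reports worst-case as well as average performance.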
Key Points
- ▸ The study proposes a multi-KPI benchmark for MARL algorithms in energy management tasks.
- ▸ The CityLearn environment is used as a realistic simulation of urban energy systems for evaluation.
- ▸ Decentralized Training with Decentralized Execution (DTDE) consistently outperforms Centralized Training with Decentralized Execution (CTDE).
- ▸ Temporal dependency learning improves control on memory-dependent KPIs such as ramping and battery usage.
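The DTDE scheme highlighted above means each building agent learns from its own local observations and rewards, with no centralized critic over the joint state. A toy, self-contained sketch of that structure (a tabular bandit-style learner with an invented solar-surplus signal, far simpler than the paper's PPO/SAC training, but structurally DTDE):

```python
import random

random.seed(0)

class BuildingAgent:
    """Independent learner: Q-values keyed by a local observation
    (solar surplus yes/no) over two toy storage actions."""
    def __init__(self):
        self.q = {s: {"charge": 0.0, "discharge": 0.0} for s in (True, False)}
        self.lr = 0.1

    def act(self, surplus, epsilon=0.1):
        if random.random() < epsilon:
            return random.choice(["charge", "discharge"])
        return max(self.q[surplus], key=self.q[surplus].get)

    def update(self, surplus, action, reward):
        # DTDE: each agent updates only from its OWN observation and reward.
        self.q[surplus][action] += self.lr * (reward - self.q[surplus][action])

def local_reward(action, surplus):
    # Toy rule: charge storage when solar surplus exists, else discharge.
    return 1.0 if action == ("charge" if surplus else "discharge") else -1.0

agents = [BuildingAgent() for _ in range(3)]
for step in range(500):
    surplus = step % 2 == 0  # alternating toy solar signal
    for agent in agents:
        a = agent.act(surplus)
        agent.update(surplus, a, local_reward(a, surplus))
```

After training, every agent independently greedily charges under surplus and discharges otherwise. A CTDE variant would differ only during training, by conditioning a shared critic on all agents' observations; execution would remain decentralized in both schemes.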
Merits
Comprehensive Evaluation
The study proposes a multi-KPI benchmark, providing a more comprehensive evaluation of MARL algorithms in energy management tasks.
Realistic Simulation
The use of the CityLearn environment provides a realistic simulation of urban energy systems, enhancing the validity of the study's findings.
Improved Control
Temporal dependency learning improves control on memory-dependent KPIs, contributing to more sustainable battery operation.
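Ramping is memory-dependent because it penalizes hour-to-hour swings in consumption, not consumption itself. A hedged sketch of a ramping-style KPI in the spirit of CityLearn's definition (the sum of absolute step-to-step changes in net electricity consumption; the two profiles below are invented for illustration):

```python
def ramping(net_consumption):
    """Sum of |E_t - E_(t-1)| over a consumption time series (kWh)."""
    return sum(abs(b - a) for a, b in zip(net_consumption, net_consumption[1:]))

spiky  = [2.0, 6.0, 1.0, 7.0, 2.0]   # aggressive charge/discharge cycling
smooth = [2.0, 3.5, 4.0, 4.5, 4.0]   # temporally aware, smoother control

# Both profiles draw the same total energy (18 kWh), but their ramping
# differs by more than 6x (20.0 vs. 3.0) -- the quantity a policy with
# temporal dependency learning (e.g. a recurrent architecture) can reduce.
```

Smoother profiles also mean fewer deep battery cycles, which is why the paper links temporal dependency learning to more sustainable battery operation.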
Demerits
Limited Generalizability
Findings obtained on CityLearn's particular building, storage, and tariff configurations may not transfer directly to other energy management tasks, so the reported algorithm rankings should be re-validated before being applied elsewhere.
Overreliance on a Single Environment
Because all experiments run in a single simulator, the results are tied to CityLearn's modeling assumptions; a benchmark spanning several environments would provide stronger evidence for the conclusions, notably the DTDE-over-CTDE ranking.
Expert Commentary
This study is a significant contribution to the field of energy management in smart cities, highlighting the potential of MARL algorithms for decentralized decision-making. The use of the CityLearn environment provides a realistic simulation of urban energy systems, enhancing the validity of the findings. However, reliance on a single environment may limit generalizability, and further research is needed to develop broader benchmarks. The study's emphasis on multi-KPI evaluation underscores the need for robust, standardized benchmarks before such controllers are deployed at scale.
Recommendations
- ✓ Future research should focus on developing more comprehensive benchmarks for MARL algorithms in energy management tasks, incorporating diverse environments and tasks.
- ✓ Policymakers should consider incorporating decentralized decision-making mechanisms into energy management programs, given the demonstrated resilience of decentralized policies to agent or resource removal.