[Re] FairDICE: A Gap Between Theory And Practice
arXiv:2603.03454v1 Announce Type: new Abstract: Offline Reinforcement Learning (RL) is an emerging field of RL in which policies are learned solely from demonstrations. Within offline RL, some environments involve balancing multiple objectives, but existing multi-objective offline RL algorithms do not provide an efficient way to find a fair compromise. FairDICE (see arXiv:2506.08062v2) seeks to fill this gap by adapting OptiDICE (an offline RL algorithm) to automatically learn weights for multiple objectives to e.g. incentivise fairness among objectives. As this would be a valuable contribution, this replication study examines the replicability of claims made regarding FairDICE. We find that many theoretical claims hold, but an error in the code reduces FairDICE to standard behaviour cloning in continuous environments, and many important hyperparameters were originally underspecified. After rectifying this, we show in experiments extending the original paper that FairDICE can scale to complex environments and high-dimensional rewards, though it can be reliant on (online) hyperparameter tuning. We conclude that FairDICE is a theoretically interesting method, but the experimental justification requires significant revision.
Executive Summary
The article examines the replicability of FairDICE, a multi-objective offline Reinforcement Learning algorithm. While its theoretical claims hold, an error in the code reduces FairDICE to standard behaviour cloning in continuous environments. After rectifying the error, experiments show that FairDICE can scale to complex environments, though it can rely on online hyperparameter tuning. The study concludes that FairDICE is theoretically interesting, but its experimental justification requires revision.
Key Points
- ▸ FairDICE's code error reduces it to standard behaviour cloning in continuous environments
- ▸ Theoretical claims of FairDICE hold, but experimental justification is flawed
- ▸ FairDICE can scale to complex environments with proper hyperparameter tuning
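The reported code error can be illustrated with a minimal, hypothetical sketch (not taken from the FairDICE codebase): DICE-style methods typically extract a policy by weighted behaviour cloning, where each dataset action is weighted by an estimated stationary distribution ratio. If a bug leaves those weights effectively constant, the objective collapses to plain behaviour cloning. The function and values below are illustrative assumptions.

```python
import numpy as np

def weighted_bc_loss(log_probs, weights):
    """DICE-style weighted behaviour-cloning loss: -E[w(s, a) * log pi(a|s)]."""
    return -np.mean(weights * log_probs)

# Hypothetical log-probabilities of dataset actions under the learned policy.
log_probs = np.array([-0.5, -1.2, -0.3])

# With informative distribution ratios, high-ratio transitions dominate the loss.
ratios = np.array([2.0, 0.1, 1.5])
print(weighted_bc_loss(log_probs, ratios))   # weighted objective

# If a bug leaves the weights at a constant (e.g. all ones), the learned ratios
# are ignored and the objective is exactly standard behaviour cloning.
print(weighted_bc_loss(log_probs, np.ones(3)))  # plain BC objective
```

Under this reading, the continuous-environment bug does not crash training; it silently trains a different, weaker algorithm, which is why it could go unnoticed.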
Merits
Theoretical Foundation
FairDICE's adaptation of OptiDICE provides a solid theoretical basis for multi-objective offline RL
Scalability
FairDICE can handle complex environments and high-dimensional rewards with proper tuning
Demerits
Code Error
The error in FairDICE's code reduces it to standard behaviour cloning in continuous environments, nullifying its multi-objective weighting there
Hyperparameter Tuning
FairDICE's reliance on online hyperparameter tuning can be a significant limitation
Expert Commentary
The study's findings highlight the importance of rigorous code review and validation in RL research. While FairDICE's theoretical foundation is sound, its experimental justification needs significant revision before its empirical claims can be trusted. The method remains relevant to fairness-aware and multi-objective optimisation, but its reliance on online hyperparameter tuning, which sits uneasily with a purely offline setting, must be weighed carefully in future research and development.
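To make the fairness angle concrete, here is a hedged sketch (not from the FairDICE paper) of why learned, fairness-incentivising weightings differ from a fixed linear scalarisation of objectives: a concave welfare function such as the Nash social welfare prefers balanced per-objective returns, whereas uniform linear weights are indifferent to imbalance. The function names and return values are illustrative assumptions.

```python
import numpy as np

def linear_welfare(returns, prefs):
    """Fixed linear scalarisation: requires hand-picked preference weights."""
    return float(np.dot(prefs, returns))

def nash_welfare(returns):
    """Nash social welfare (sum of log returns): favours balanced outcomes."""
    return float(np.sum(np.log(returns)))

balanced = np.array([5.0, 5.0])   # both objectives served equally
lopsided = np.array([9.0, 1.0])   # one objective sacrificed for the other

# Uniform linear weights cannot distinguish the two outcomes...
print(linear_welfare(balanced, [0.5, 0.5]), linear_welfare(lopsided, [0.5, 0.5]))

# ...while the concave welfare strictly prefers the balanced one.
print(nash_welfare(balanced), nash_welfare(lopsided))
```

This is the kind of compromise the abstract alludes to when it says FairDICE learns weights "to e.g. incentivise fairness among objectives", rather than asking the practitioner to fix a preference vector in advance.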
Recommendations
- ✓ Conduct thorough code reviews and testing to ensure the accuracy of RL algorithms
- ✓ Develop more efficient and effective methods for hyperparameter tuning in offline RL
- ✓ Explore applications of FairDICE in real-world domains, with careful consideration of its limitations