[Re] FairDICE: A Gap Between Theory And Practice
arXiv:2603.03454v1 Announce Type: new Abstract: Offline Reinforcement Learning (RL) is an emerging field of RL in which policies are learned solely from demonstrations. Within offline RL, some environments involve balancing multiple objectives, but existing multi-objective offline RL algorithms do not provide an efficient way to find a fair compromise. FairDICE (see arXiv:2506.08062v2) seeks to fill this gap by adapting OptiDICE (an offline RL algorithm) to automatically learn weights for multiple objectives to e.g. incentivise fairness among objectives. As this would be a valuable contribution, this replication study examines the replicability of claims made regarding FairDICE. We find that many theoretical claims hold, but an error in the code reduces FairDICE to standard behaviour cloning in continuous environments, and many important hyperparameters were originally underspecified. After rectifying this, we show in experiments extending the original paper that FairDICE can scale to complex environments and high-dimensional rewards, though it can be reliant on (online) hyperparameter tuning. We conclude that FairDICE is a theoretically interesting method, but the experimental justification requires significant revision.
Executive Summary
The article examines the replicability of FairDICE, a multi-objective offline Reinforcement Learning algorithm. While its theoretical claims hold, an error in the code reduces FairDICE to standard behaviour cloning in continuous environments. After rectifying the error, experiments show that FairDICE can scale to complex environments, though it can rely on online hyperparameter tuning. The study concludes that FairDICE is theoretically interesting, but its experimental justification requires revision.
Key Points
- ▸ FairDICE's code error reduces it to standard behaviour cloning in continuous environments
- ▸ Theoretical claims of FairDICE hold, but experimental justification is flawed
- ▸ FairDICE can scale to complex environments with proper hyperparameter tuning
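The reported code error can be illustrated with a minimal, hypothetical sketch (not taken from the FairDICE codebase): DICE-style methods typically extract a policy by weighted behaviour cloning, where each dataset action is weighted by an estimated stationary distribution ratio. If a bug leaves those weights effectively constant, the objective collapses to plain behaviour cloning. The function and values below are illustrative assumptions.

```python
import numpy as np

def weighted_bc_loss(log_probs, weights):
    """DICE-style weighted behaviour-cloning loss: -E[w(s, a) * log pi(a|s)]."""
    return -np.mean(weights * log_probs)

# Hypothetical log-probabilities of dataset actions under the learned policy.
log_probs = np.array([-0.5, -1.2, -0.3])

# With informative distribution ratios, high-ratio transitions dominate the loss.
ratios = np.array([2.0, 0.1, 1.5])
print(weighted_bc_loss(log_probs, ratios))   # weighted objective

# If a bug leaves the weights at a constant (e.g. all ones), the learned ratios
# are ignored and the objective is exactly standard behaviour cloning.
print(weighted_bc_loss(log_probs, np.ones(3)))  # plain BC objective
```

Under this reading, the continuous-environment bug does not crash training; it silently trains a different, weaker algorithm, which is why it could go unnoticed.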
Merits
Theoretical Foundation
FairDICE's adaptation of OptiDICE provides a solid theoretical basis for multi-objective offline RL
Scalability
FairDICE can handle complex environments and high-dimensional rewards with proper tuning
Demerits
Code Error
The error in FairDICE's code reduces it to standard behaviour cloning in continuous environments, nullifying its multi-objective weighting there
Hyperparameter Tuning
FairDICE's reliance on online hyperparameter tuning can be a significant limitation
Expert Commentary
The study's findings highlight the importance of rigorous code review and validation in RL research. While FairDICE's theoretical foundation is sound, its experimental justification needs significant revision before its empirical claims can be trusted. The method remains relevant to fairness-aware and multi-objective optimisation, but its reliance on online hyperparameter tuning, which sits uneasily with a purely offline setting, must be weighed carefully in future research and development.
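To make the fairness angle concrete, here is a hedged sketch (not from the FairDICE paper) of why learned, fairness-incentivising weightings differ from a fixed linear scalarisation of objectives: a concave welfare function such as the Nash social welfare prefers balanced per-objective returns, whereas uniform linear weights are indifferent to imbalance. The function names and return values are illustrative assumptions.

```python
import numpy as np

def linear_welfare(returns, prefs):
    """Fixed linear scalarisation: requires hand-picked preference weights."""
    return float(np.dot(prefs, returns))

def nash_welfare(returns):
    """Nash social welfare (sum of log returns): favours balanced outcomes."""
    return float(np.sum(np.log(returns)))

balanced = np.array([5.0, 5.0])   # both objectives served equally
lopsided = np.array([9.0, 1.0])   # one objective sacrificed for the other

# Uniform linear weights cannot distinguish the two outcomes...
print(linear_welfare(balanced, [0.5, 0.5]), linear_welfare(lopsided, [0.5, 0.5]))

# ...while the concave welfare strictly prefers the balanced one.
print(nash_welfare(balanced), nash_welfare(lopsided))
```

This is the kind of compromise the abstract alludes to when it says FairDICE learns weights "to e.g. incentivise fairness among objectives", rather than asking the practitioner to fix a preference vector in advance.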
Recommendations
- ✓ Conduct thorough code reviews and testing to ensure the accuracy of RL algorithms
- ✓ Develop more efficient and effective methods for hyperparameter tuning in offline RL
- ✓ Explore applications of FairDICE in real-world domains, with careful consideration of its limitations