Rooted Absorbed Prefix Trajectory Balance with Submodular Replay for GFlowNet Training
arXiv:2603.00454v1 Abstract: Generative Flow Networks (GFlowNets) enable fine-tuning large language models to approximate reward-proportional posteriors, but they remain prone to mode collapse, manifesting as prefix collapse and length bias. We attribute this to two factors: (i) weak credit assignment to early prefixes, and (ii) biased replay that induces a shifted, non-representative training flow distribution. We propose Rooted absorbed prefix Trajectory Balance (RapTB), an objective that anchors subtrajectory supervision at the root and propagates terminal rewards to intermediate prefixes via absorbed suffix-based backups, providing dense prefix-level learning signals. To mitigate replay-induced distribution shift, we further introduce SubM, a submodular replay refresh strategy that promotes both high reward and diversity. Empirically, on tasks such as molecule generation with LLMs using SMILES strings, RapTB combined with SubM consistently improves optimization performance and molecular diversity while preserving high validity.
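For context, RapTB builds on the standard Trajectory Balance (TB) objective for GFlowNets (Malkin et al., 2022), which matches the product of forward-policy probabilities along a complete trajectory against the terminal reward. For autoregressive sequence generation each state has a unique parent, so the backward-policy product is 1 and the loss simplifies to the form below; this is background for the reader, not the RapTB objective itself, which the abstract describes only at a high level.

```latex
% Standard Trajectory Balance loss for one complete trajectory
% tau = (s_0 -> s_1 -> ... -> s_n = x); for token-by-token sequence
% generation the backward policy is deterministic (P_B = 1).
\mathcal{L}_{\mathrm{TB}}(\tau)
  = \left( \log Z_\theta
         + \sum_{t=0}^{n-1} \log P_F(s_{t+1} \mid s_t; \theta)
         - \log R(x) \right)^{2}
```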
Executive Summary
The article proposes Rooted Absorbed Prefix Trajectory Balance (RapTB) and Submodular Replay (SubM) to address mode collapse in Generative Flow Networks (GFlowNets). RapTB provides dense prefix-level learning signals by anchoring subtrajectory supervision at the root and propagating terminal rewards to intermediate prefixes via absorbed suffix-based backups. SubM mitigates replay-induced distribution shift by refreshing the replay buffer toward samples that are both high-reward and diverse. Together, the two components improve optimization performance and molecular diversity on molecule generation tasks, demonstrating the potential of GFlowNet fine-tuning of large language models to approximate reward-proportional posteriors.
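The abstract stops short of spelling out the RapTB loss, but the described mechanism, root-anchored supervision with terminal rewards backed up to intermediate prefixes through their sampled suffixes, suggests a per-prefix residual of the following shape. The sketch below is a speculative illustration in PyTorch-style Python: the function name, tensor shapes, and squared-residual form are assumptions for exposition, not the authors' implementation.

```python
import torch

def rooted_prefix_tb_loss(log_pf_steps, log_reward, log_z):
    """Hypothetical sketch of a root-anchored prefix balance loss.

    log_pf_steps: (T,) per-token forward log-probs of one sampled trajectory
    log_reward:   scalar log R(x) for the terminal (absorbed) sequence
    log_z:        learned log-partition estimate at the root

    Each prefix s_0..s_t is treated as if absorbed and credited with the
    terminal reward backed up through its sampled suffix, yielding a dense,
    prefix-level residual rather than one loss per full trajectory.
    """
    cum_log_pf = torch.cumsum(log_pf_steps, dim=0)  # log P_F(s_0 -> s_t) per prefix
    residuals = log_z + cum_log_pf - log_reward     # root-anchored residual for each t
    return (residuals ** 2).mean()
```

Compared with plain TB, which produces a single residual per trajectory, this shape supplies one learning signal per prefix length, which is consistent with the abstract's claim of stronger credit assignment to early prefixes.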
Key Points
- ▸ Introduction of Rooted Absorbed Prefix Trajectory Balance (RapTB) to address mode collapse in GFlowNets
- ▸ Proposal of Submodular Replay (SubM) to mitigate replay-induced distribution shift (see the greedy selection sketch after this list)
- ▸ Empirical evaluation on molecule generation tasks using SMILES strings
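The abstract characterizes SubM only as a submodular refresh promoting reward and diversity. A common instantiation of that recipe is greedy selection under a facility-location objective; the sketch below illustrates that pattern. The function name, the mixing weight `lam`, and the use of a precomputed similarity matrix are assumptions, not details from the paper.

```python
import numpy as np

def submodular_refresh(candidates, rewards, sim, k, lam=0.5):
    """Hypothetical sketch of a submodular replay-buffer refresh.

    Greedily selects k items to maximize a reward term plus a
    facility-location diversity term:
        F(S) = lam * sum_{i in S} rewards[i]
             + (1 - lam) * sum_j max_{i in S} sim[j, i]
    F is monotone submodular, so greedy selection enjoys the standard
    (1 - 1/e) approximation guarantee.

    candidates: list of buffer items (e.g., SMILES strings)
    rewards:    (n,) array of item rewards
    sim:        (n, n) pairwise similarity matrix
    """
    n = len(candidates)
    selected, covered = [], np.zeros(n)
    for _ in range(min(k, n)):
        best, best_gain = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            # Marginal gain of adding item i to the current selection.
            diversity_gain = np.maximum(covered, sim[:, i]).sum() - covered.sum()
            gain = lam * rewards[i] + (1 - lam) * diversity_gain
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
        covered = np.maximum(covered, sim[:, best])
    return [candidates[i] for i in selected]
```

For SMILES strings, `sim` might plausibly be Tanimoto similarity over Morgan fingerprints, though the paper's exact diversity measure and objective may differ.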
Merits
Improved Optimization Performance
RapTB combined with SubM improves optimization performance on molecule generation tasks.
Enhanced Molecular Diversity
The approach improves molecular diversity while preserving high validity.
Demerits
Complexity of Implementation
The proposed approach may require significant computational resources and expertise to implement.
Expert Commentary
The proposed approach demonstrates a nuanced understanding of the challenges in training GFlowNets. By addressing mode collapse and replay-induced distribution shift, the authors provide a valuable contribution to the field. The empirical evaluation on molecule generation tasks highlights the potential of RapTB and SubM. However, further research is necessary to fully explore the implications and limitations of this approach. The interplay between RapTB, SubM, and the underlying GFlowNet architecture warrants additional investigation to optimize performance and robustness.
Recommendations
- ✓ Further evaluation of RapTB and SubM on diverse tasks and datasets
- ✓ Investigation of the scalability and computational requirements of the proposed approach