
Rooted Absorbed Prefix Trajectory Balance with Submodular Replay for GFlowNet Training

Xi Wang, Wenbo Lu, Shengjie Wang

arXiv:2603.00454v1 Announce Type: new Abstract: Generative Flow Networks (GFlowNets) enable fine-tuning large language models to approximate reward-proportional posteriors, but they remain prone to mode collapse, manifesting as prefix collapse and length bias. We attribute this to two factors: (i) weak credit assignment to early prefixes, and (ii) biased replay that induces a shifted, non-representative training flow distribution. We propose Rooted Absorbed Prefix Trajectory Balance (RapTB), an objective that anchors subtrajectory supervision at the root and propagates terminal rewards to intermediate prefixes via absorbed suffix-based backups, providing dense prefix-level learning signals. To mitigate replay-induced distribution shift, we further introduce SubM, a submodular replay refresh strategy that promotes both high reward and diversity. Empirically, on tasks such as molecule generation with LLMs using SMILES strings, RapTB combined with SubM consistently improves optimization performance and molecular diversity while preserving high validity.

Executive Summary

The article proposes Rooted Absorbed Prefix Trajectory Balance (RapTB) and Submodular Replay (SubM) to address mode collapse in Generative Flow Networks (GFlowNets). RapTB provides dense prefix-level learning signals by anchoring subtrajectory supervision at the root and propagating terminal rewards to intermediate prefixes via absorbed suffix-based backups. SubM mitigates replay-induced distribution shift by refreshing the replay buffer toward samples that are both high-reward and diverse. Together, the two components improve optimization performance and molecular diversity on molecule-generation tasks while preserving validity, and they show promise for fine-tuning large language models to approximate reward-proportional posteriors.
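To make the prefix-level supervision concrete, here is a minimal sketch of how a rooted, prefix-anchored trajectory-balance residual might be computed. This is not the authors' implementation: the function name, the per-step log-probability inputs, and the `log_reward_backups` array (the terminal log-reward propagated back to each prefix via an absorbed suffix) are all illustrative assumptions based on the abstract's description.

```python
import math

def raptb_loss(log_Z, log_pf, log_pb, log_reward_backups):
    """Mean squared rooted trajectory-balance residual over all prefixes.

    log_pf / log_pb: per-step forward/backward log-probabilities along one
    sampled trajectory (lists of equal length).
    log_reward_backups[k-1]: hypothetical backed-up log-reward for the prefix
    ending at step k, i.e. the terminal reward propagated to that prefix
    through an absorbed suffix, as the abstract describes.
    """
    total, n = 0.0, len(log_pf)
    for k in range(1, n + 1):
        # Rooted TB residual for the prefix s_0 .. s_k:
        # (log Z + forward log-flow of the prefix) should match
        # (backed-up log-reward + backward log-flow of the prefix).
        resid = log_Z + sum(log_pf[:k]) \
            - (log_reward_backups[k - 1] + sum(log_pb[:k]))
        total += resid ** 2
    return total / n
```

Averaging the residual over every rooted prefix, rather than scoring only the full trajectory, is what supplies the dense early-prefix learning signal the paper attributes to RapTB.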

Key Points

  • Introduction of Rooted Absorbed Prefix Trajectory Balance (RapTB) to address mode collapse in GFlowNets
  • Proposal of Submodular Replay (SubM) to mitigate replay-induced distribution shift
  • Empirical evaluation on molecule generation tasks using SMILES strings
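The submodular replay refresh can be illustrated with a standard greedy selection under a monotone submodular objective. The abstract does not specify SubM's objective, so the reward-plus-feature-coverage form below, the `lam` trade-off weight, and the use of token n-grams of a SMILES string as features are all assumptions chosen for the sketch.

```python
def submodular_refresh(candidates, k, lam=1.0):
    """Greedily pick k replay items under a reward + diversity objective.

    candidates: list of (reward, features) pairs, where features is a
    collection of hashable descriptors (e.g. n-grams of a SMILES string).
    The objective  sum(selected rewards) + lam * |covered features|  is
    monotone submodular, so greedy selection enjoys the classic
    (1 - 1/e) approximation guarantee.
    Returns the indices of the selected candidates.
    """
    selected, covered = [], set()
    remaining = list(range(len(candidates)))
    for _ in range(min(k, len(candidates))):
        def gain(i):
            reward, feats = candidates[i]
            # Marginal gain: the item's reward plus the new features it covers.
            return reward + lam * len(set(feats) - covered)
        best = max(remaining, key=gain)
        selected.append(best)
        covered |= set(candidates[best][1])
        remaining.remove(best)
    return selected
```

Because already-covered features contribute no marginal gain, near-duplicate high-reward samples are passed over in favor of novel ones, which is the diversity-promoting behavior the abstract credits to SubM.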

Merits

Improved Optimization Performance

RapTB and SubM demonstrate improved optimization performance in molecule generation tasks

Enhanced Molecular Diversity

The approach preserves high validity while improving molecular diversity

Demerits

Complexity of Implementation

The proposed approach may require significant computational resources and expertise to implement

Expert Commentary

The proposed approach demonstrates a nuanced understanding of the challenges in training GFlowNets. By addressing mode collapse and replay-induced distribution shift, the authors provide a valuable contribution to the field. The empirical evaluation on molecule generation tasks highlights the potential of RapTB and SubM. However, further research is necessary to fully explore the implications and limitations of this approach. The interplay between RapTB, SubM, and the underlying GFlowNet architecture warrants additional investigation to optimize performance and robustness.

Recommendations

  • Further evaluation of RapTB and SubM on diverse tasks and datasets
  • Investigation of the scalability and computational requirements of the proposed approach
