Graph-GRPO: Training Graph Flow Models with Reinforcement Learning
arXiv:2603.10395v1 Announce Type: new Abstract: Graph generation is a fundamental task with broad applications, such as drug discovery. Recently, discrete flow matching-based graph generation, a.k.a. graph flow model (GFM), has emerged due to its superior performance and flexible sampling. However, effectively aligning GFMs with complex human preferences or task-specific objectives remains a significant challenge. In this paper, we propose Graph-GRPO, an online reinforcement learning (RL) framework for training GFMs under verifiable rewards. Our method makes two key contributions: (1) We derive an analytical expression for the transition probability of GFMs, replacing the Monte Carlo sampling and enabling fully differentiable rollouts for RL training; (2) We propose a refinement strategy that randomly perturbs specific nodes and edges in a graph, and regenerates them, allowing for localized exploration and self-improvement of generation quality. Extensive experiments on both synthetic and real datasets demonstrate the effectiveness of Graph-GRPO. With only 50 denoising steps, our method achieves 95.0% and 97.5% Valid-Unique-Novelty scores on the planar and tree datasets, respectively. Moreover, Graph-GRPO achieves state-of-the-art performance on the molecular optimization tasks, outperforming graph-based and fragment-based RL methods as well as classic genetic algorithms.
Executive Summary
This article proposes Graph-GRPO, a novel online reinforcement learning framework for training graph flow models (GFMs) under verifiable rewards. The approach derives an analytical expression for GFM transition probabilities, enabling fully differentiable rollouts, and pairs it with a refinement strategy that supports localized exploration. Graph-GRPO achieves state-of-the-art performance on synthetic and real-world datasets, including molecular optimization tasks, and generates high-quality graphs efficiently: with only 50 denoising steps, it reaches 95.0% and 97.5% Valid-Unique-Novelty scores on the planar and tree datasets, respectively. These contributions have significant implications for applications such as drug discovery and highlight the potential of reinforcement learning in graph generation tasks.
Key Points
- ▸ Graph-GRPO proposes an online reinforcement learning framework for training GFMs under verifiable rewards.
- ▸ The framework leverages analytical expressions for transition probabilities to enable fully differentiable rollouts.
- ▸ Graph-GRPO achieves state-of-the-art performance on synthetic and real-world datasets, including molecular optimization tasks.
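The "GRPO" in the name refers to group-relative policy optimization, in which each rollout's verifiable reward is standardized against its group rather than a learned value critic. A minimal sketch of that group-relative advantage computation (assuming the standard GRPO formulation; the paper's exact loss is not reproduced here):

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages in the GRPO style: standardize each
    rollout's reward against the group's mean and standard deviation,
    removing the need for a learned value critic."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# A group of verifiable rewards (e.g. validity scores) for 4 rollouts
adv = grpo_advantages([1.0, 0.0, 1.0, 0.5])
```

Rollouts scoring above the group mean receive positive advantages and are reinforced; below-mean rollouts are penalized.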
Merits
Strength in Mathematical Derivation
The article presents an elegant derivation of an analytical expression for the transition probability of GFMs, which replaces Monte Carlo sampling of transitions and makes RL rollouts fully differentiable. This contribution has significant implications for the field, enabling more efficient and accurate reward-based training of GFMs.
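To make the idea concrete, here is a hedged sketch of a closed-form per-position transition for a masked discrete flow model on a linear noise schedule: a still-masked position jumps to token v with probability (dt / (1 - t)) · p_theta(v | x_t) and stays masked otherwise, while decoded positions are absorbing. The paper's exact expression for graphs may differ; this illustrates why an analytic transition needs no sampling.

```python
def masked_fm_transition(probs, x_t, t, dt, mask_id):
    """Closed-form transition p(x_{t+dt} = v | x_t) per position for a
    masked discrete flow model (illustrative; not the paper's exact
    formula). probs[i] is the model distribution p_theta(v | x_t) at
    position i; x_t[i] is the current token; mask_id is the mask token."""
    rate = dt / (1.0 - t)  # unmasking rate under a linear schedule
    rows = []
    for p, x in zip(probs, x_t):
        if x != mask_id:                 # already decoded: keep the value
            row = [0.0] * len(p)
            row[x] = 1.0
        else:
            # renormalize model probabilities over real (non-mask) tokens
            z = sum(pv for v, pv in enumerate(p) if v != mask_id)
            row = [rate * pv / z if v != mask_id else 1.0 - rate
                   for v, pv in enumerate(p)]
        rows.append(row)
    # Each row is an analytic function of the model probabilities, so in
    # an autograd framework the whole rollout remains differentiable.
    return rows

rows = masked_fm_transition(
    probs=[[0.2, 0.5, 0.3], [0.1, 0.1, 0.8]],
    x_t=[2, 1], t=0.5, dt=0.1, mask_id=2)
```

In the example, position 0 is masked (token 2), so it unmasks with total probability dt / (1 - t) = 0.2 split across real tokens, while position 1 keeps its decoded value with probability 1.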
Effective Graph Generation
Graph-GRPO generates high-quality graphs efficiently: with only 50 denoising steps, it achieves 95.0% and 97.5% Valid-Unique-Novelty scores on the planar and tree datasets, respectively. This is a significant achievement, especially in the context of complex graph generation tasks.
Demerits
Limited Exploration
The article's focus on localized exploration through refinement strategies may limit the method's ability to explore a broader range of graph structures. This could be a concern in applications where diverse and innovative graph structures are required.
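For context, the refinement strategy under discussion (randomly perturbing specific nodes and edges, then regenerating them) can be sketched as a re-masking step. The data layout, masking fraction, and function names below are illustrative assumptions, not the paper's exact procedure:

```python
import random

def refine(node_labels, edge_labels, mask_token, frac=0.2, seed=0):
    """Hedged sketch of a refinement step: re-mask a random subset of
    node and edge labels so the flow model can regenerate only those
    entries, giving localized exploration around an existing graph."""
    rng = random.Random(seed)
    nodes = list(node_labels)
    edges = dict(edge_labels)
    for i in rng.sample(range(len(nodes)), max(1, int(frac * len(nodes)))):
        nodes[i] = mask_token            # node label to be regenerated
    keys = list(edges)
    for key in rng.sample(keys, max(1, int(frac * len(keys)))):
        edges[key] = mask_token          # edge label to be regenerated
    return nodes, edges                  # pass back through the denoiser

nodes, edges = refine(
    ["C", "N", "O", "C", "C"],
    {(0, 1): "single", (1, 2): "double", (2, 3): "single"},
    mask_token="?")
```

Because only the masked entries are resampled, exploration stays local to the current graph, which is precisely the trade-off the critique above points at: efficient self-improvement, but little pressure toward globally different structures.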
Expert Commentary
The article's contributions are significant, and the proposed framework shows great promise for generating high-quality graphs efficiently. However, the limits of localized exploration, along with open questions around transfer learning and the explainability of generated graph structures, are areas that require attention in future research. The findings highlight the potential of reinforcement learning in graph generation tasks and carry significant implications for downstream applications such as drug discovery.
Recommendations
- ✓ Future research should investigate the applicability of transfer learning in graph generation tasks and develop techniques to understand and visualize the generated graph structures.
- ✓ The study's findings should be replicated and extended to other graph generation domains, such as social network analysis, to further validate the effectiveness of Graph-GRPO beyond the molecular optimization tasks already covered.