Discrete Flow Matching Policy Optimization
arXiv:2604.06491v1 Announce Type: new Abstract: We introduce Discrete flow Matching policy Optimization (DoMinO), a unified framework for Reinforcement Learning (RL) fine-tuning Discrete Flow Matching (DFM) …