One-Step Flow Policy: Self-Distillation for Fast Visuomotor Policies
arXiv:2603.12480v1 Announce Type: cross Abstract: Generative flow and diffusion models provide the continuous, multimodal action distributions needed for high-precision robotic policies. However, their reliance on iterative sampling introduces severe inference latency, degrading control frequency and harming performance in time-sensitive manipulation. To address this problem, we propose the One-Step Flow Policy (OFP), a from-scratch self-distillation framework for high-fidelity, single-step action generation without a pre-trained teacher. OFP unifies a self-consistency loss to enforce coherent transport across time intervals, and a self-guided regularization to sharpen predictions toward high-density expert modes. In addition, a warm-start mechanism leverages temporal action correlations to minimize the generative transport distance. Evaluations across 56 diverse simulated manipulation tasks demonstrate that a one-step OFP achieves state-of-the-art results, outperforming 100-step diffusion and flow policies while accelerating action generation by over $100\times$. We further integrate OFP into the $\pi_{0.5}$ model on RoboTwin 2.0, where one-step OFP surpasses the original 10-step policy. These results establish OFP as a practical, scalable solution for highly accurate and low-latency robot control.
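The abstract does not give the form of the self-consistency loss. As a rough illustration only, one well-known way to "enforce coherent transport across time intervals" comes from shortcut-style flow models: a single jump of size 2d is trained to agree with two consecutive jumps of size d. The notation below is an assumption, not the paper's, and OFP's actual loss may differ.

```latex
% Assumed notation (not from the source):
%   s_\theta(x_t, t, d \mid o)  network's predicted average velocity for a jump of size d
%   o                           observation; \operatorname{sg} is stop-gradient
\begin{align}
x_{t+d} &= x_t + d\, s_\theta(x_t, t, d \mid o) \\
s_{\text{target}} &= \tfrac{1}{2}\, s_\theta(x_t, t, d \mid o)
                   + \tfrac{1}{2}\, s_\theta(x_{t+d}, t+d, d \mid o) \\
\mathcal{L}_{\text{sc}} &= \mathbb{E}\,\big\| s_\theta(x_t, t, 2d \mid o)
                   - \operatorname{sg}\!\left(s_{\text{target}}\right) \big\|^2
\end{align}
```

Because the target is built from the network's own predictions, no pre-trained teacher is required, which is consistent with the "from-scratch self-distillation" description above.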
Executive Summary
The article proposes the One-Step Flow Policy (OFP), a novel self-distillation framework for high-fidelity, single-step action generation in robotic policies. OFP leverages a self-consistency loss, self-guided regularization, and a warm-start mechanism to achieve state-of-the-art results in 56 diverse simulated manipulation tasks, outperforming 100-step diffusion and flow policies while accelerating action generation by over 100 times. The authors demonstrate the scalability of OFP by integrating it into the π0.5 model on RoboTwin 2.0, where one-step OFP surpasses the original 10-step policy. The proposed method establishes OFP as a practical, scalable solution for highly accurate and low-latency robot control.
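To make the latency argument concrete, here is a minimal sketch of why step count matters. It is not the authors' code: `toy_velocity` is a hypothetical stand-in for a learned velocity network, and the straight-line flow is an idealized assumption. The point is only that each Euler step costs one network evaluation, so a 100-step sampler needs 100 forward passes where a one-step sampler needs one.

```python
import random

def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    Points along the straight line from x to target, scaled so that
    integrating from time t to 1 lands on target.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]

def sample_action(x0, target, num_steps):
    """Euler-integrate the flow from t=0 to t=1.

    Returns the generated action and the number of velocity evaluations.
    """
    x = list(x0)
    dt = 1.0 / num_steps
    evaluations = 0
    for i in range(num_steps):
        t = i * dt
        v = toy_velocity(x, t, target)
        evaluations += 1  # one "network" call per integration step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, evaluations

random.seed(0)
target = [0.3, -0.7]  # toy expert action (assumed values)
noise = [random.gauss(0.0, 1.0) for _ in target]

multi_step_action, multi_step_evals = sample_action(noise, target, num_steps=100)
one_step_action, one_step_evals = sample_action(noise, target, num_steps=1)
print(multi_step_evals, one_step_evals)  # prints "100 1"
```

In this idealized case the transport path is perfectly straight, so the single Euler step reaches the same action as the 100-step integration. Real learned flows are curved, which is why one-step generation needs dedicated training objectives such as those summarized above.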
Key Points
- ▸ The One-Step Flow Policy (OFP) is a novel self-distillation framework for high-fidelity, single-step action generation in robotic policies.
- ▸ OFP leverages a self-consistency loss, self-guided regularization, and a warm-start mechanism to achieve state-of-the-art results.
- ▸ The authors evaluate OFP on 56 diverse simulated manipulation tasks and demonstrate its scalability by integrating it into the π0.5 model on RoboTwin 2.0, where one-step OFP surpasses the original 10-step policy.
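The warm-start idea mentioned in the key points can be illustrated with a toy comparison. This is a sketch under stated assumptions, not the paper's mechanism: the abstract does not say how the warm start is constructed, so the code assumes it is the previous action plus small Gaussian noise, and `expert_action` is a hypothetical smooth trajectory. Because consecutive actions are correlated, the warm start begins much closer to the target than a standard Gaussian sample does.

```python
import math
import random

def distance(a, b):
    """Euclidean distance between two action vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def expert_action(step, dim=7):
    """Hypothetical smooth expert trajectory: slowly varying sinusoid per joint."""
    return [math.sin(0.05 * step + j) for j in range(dim)]

random.seed(0)
trials = 1000
cold_total = 0.0
warm_total = 0.0
for step in range(1, trials + 1):
    target = expert_action(step)
    previous = expert_action(step - 1)
    # Cold start: the usual standard Gaussian source sample.
    cold_start = [random.gauss(0.0, 1.0) for _ in target]
    # Warm start (assumed form): previous action plus small noise.
    warm_start = [p + random.gauss(0.0, 0.1) for p in previous]
    cold_total += distance(cold_start, target)
    warm_total += distance(warm_start, target)

mean_cold = cold_total / trials
mean_warm = warm_total / trials
print(f"mean transport distance, cold start: {mean_cold:.3f}")
print(f"mean transport distance, warm start: {mean_warm:.3f}")
```

A shorter transport distance leaves less for a single generation step to cover, which is one plausible reading of why a warm start would help one-step policies in particular.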
Merits
Strength
The proposed method achieves state-of-the-art results in simulated manipulation tasks, outperforming 100-step diffusion and flow policies while accelerating action generation by over 100 times.
Demerits
Limitation
The reported evaluation centers on simulated manipulation tasks, and the abstract describes no real-world experiments validating the method's scalability and robustness.
Expert Commentary
The article makes a notable contribution to robotic control: a from-scratch self-distillation framework whose one-step policy outperforms 100-step diffusion and flow baselines on simulated manipulation tasks. The π0.5 integration suggests the method scales to larger models, but real-world experiments are still needed to validate its robustness and generality in diverse scenarios. If the results transfer to physical robots, OFP could enable fast, accurate, low-latency action generation in time-sensitive settings. Further research should establish how well the one-step policy holds up outside simulation.
Recommendations
- ✓ Future research should focus on conducting real-world experiments to validate the scalability and robustness of the proposed method.
- ✓ The authors should explore the application of the proposed method in various domains, such as healthcare, manufacturing, and logistics, to demonstrate its practical implications.