Enhancing sample efficiency in reinforcement-learning-based flow control: replacing the critic with an adaptive reduced-order model
arXiv:2604.04986v1
Abstract: Model-free deep reinforcement learning (DRL) methods suffer from poor sample efficiency. To overcome this limitation, this work introduces an adaptive reduced-order-model (ROM)-based reinforcement learning framework for active flow control. In contrast to conventional actor-critic architectures, the proposed approach leverages a ROM to estimate the gradient information required for controller optimization. The design of the ROM structure incorporates physical insights: the ROM integrates a linear dynamical system and a neural ordinary differential equation (NODE) for estimating the nonlinearity in the flow. The parameters of the linear component are identified via operator inference, while the NODE is trained in a data-driven manner using gradient-based optimization. During controller-environment interactions, the ROM is continuously updated with newly collected data, enabling adaptive refinement of the model. The controller is then optimized through differentiable simulation of the ROM. The proposed ROM-based DRL framework is validated on two canonical flow control problems: Blasius boundary layer flow and flow past a square cylinder. For the Blasius boundary layer, the proposed method effectively reduces to a single-episode system identification and controller optimization process, yet it yields controllers that outperform traditional linear designs and achieve performance comparable to DRL approaches with minimal data. For the flow past a square cylinder, the proposed method achieves superior drag reduction with significantly fewer exploration data compared with DRL approaches. The work addresses a key component of model-free DRL control algorithms and lays the foundation for designing more sample-efficient DRL-based active flow controllers.
Executive Summary
This paper presents a reinforcement learning (RL) framework for active flow control that addresses the sample inefficiency of model-free deep RL methods. By replacing the traditional actor-critic architecture with an adaptive reduced-order model (ROM) that incorporates physical insights and a neural ordinary differential equation (NODE), the authors demonstrate significant improvements in data efficiency. The approach identifies the linear dynamics via operator inference and trains the NODE to capture the remaining nonlinearities, with the ROM continuously refined as new interaction data arrive. Validation on two canonical flow control problems (the Blasius boundary layer and flow past a square cylinder) shows performance competitive with conventional DRL methods while requiring substantially less data. The work bridges physics-informed modeling and machine learning, offering a path toward sample-efficient flow control in settings where data acquisition is costly.
Key Points
- Introduces an adaptive ROM-based RL framework that replaces the critic with a physics-informed reduced-order model, combining linear dynamics and a NODE to estimate the gradients used for controller optimization.
- Demonstrates superior sample efficiency over model-free DRL, achieving comparable performance with minimal data on two canonical flow control benchmarks (Blasius boundary layer and square cylinder flow).
- Validates the framework through differentiable simulation of the ROM, enabling continuous model adaptation and gradient-based controller optimization without a conventional actor-critic architecture.
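The interaction loop these points describe (interact with the environment, refit the ROM on all data collected so far, re-optimize the controller on the refit model) can be sketched on a toy scalar system. Everything below is an illustrative stand-in, not the authors' implementation: the scalar "environment" replaces the flow solver, and a pole-placement step replaces the paper's gradient-based controller update.

```python
import numpy as np

# Toy sketch of the adaptive loop: interact, refit the ROM on accumulated
# data, then re-derive the controller from the refit model.
# The scalar plant dz/dt = a*z + b*u stands in for the flow environment;
# all names and numbers here are illustrative, not from the paper.
a_true, b_true, dt = -0.5, 1.0, 0.05
rng = np.random.default_rng(1)
data = []                        # accumulated (z, u, dz/dt) samples
k = 0.0                          # feedback gain, u = -k*z

for episode in range(3):
    z = 1.0
    for _ in range(50):          # interact with the environment
        u = -k * z + 0.1 * rng.standard_normal()   # exploration noise
        dz = a_true * z + b_true * u
        data.append((z, u, dz))
        z += dt * dz
    # refit the linear ROM by least squares on all collected data
    Z, U, dZ = (np.array(c) for c in zip(*data))
    (a_hat, b_hat), *_ = np.linalg.lstsq(np.column_stack([Z, U]), dZ,
                                         rcond=None)
    # re-derive the controller on the ROM (here: place the closed-loop
    # pole at -2, i.e. a_hat - b_hat*k = -2; the paper instead uses
    # gradient-based optimization through the differentiable ROM)
    k = (a_hat + 2.0) / b_hat
```

Because the toy data are noise-free, the refit recovers the true coefficients; in the paper's setting, the residual the linear model cannot explain is handled by the NODE component.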
Merits
Innovative Hybrid Modeling
The integration of operator inference for linear dynamics and NODEs for nonlinearities represents a sophisticated hybrid approach that combines physical constraints with data-driven learning, enhancing interpretability and generalization.
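The operator-inference step for the linear part of such a hybrid ROM amounts to a least-squares fit of the operators to snapshot data. A minimal sketch, using synthetic noise-free data and illustrative dimensions (not the paper's setup):

```python
import numpy as np

# Operator inference sketch: given reduced states Z, inputs U, and time
# derivatives dZ, recover A and B in dz/dt ≈ A z + B u by least squares.
# All data here are synthetic; dimensions are illustrative.
rng = np.random.default_rng(0)
r, m, T = 4, 1, 200                      # reduced dim, input dim, snapshots
A_true = -np.eye(r) + 0.1 * rng.standard_normal((r, r))
B_true = rng.standard_normal((r, m))

Z = rng.standard_normal((r, T))          # reduced-state snapshots
U = rng.standard_normal((m, T))          # control inputs
dZ = A_true @ Z + B_true @ U             # noise-free time derivatives

# Solve min_{[A B]} || [A B] [Z; U] - dZ ||_F via lstsq.
D = np.vstack([Z, U])                    # (r+m, T) stacked data matrix
O, *_ = np.linalg.lstsq(D.T, dZ.T, rcond=None)
A_hat, B_hat = O.T[:, :r], O.T[:, r:]
```

With exactly linear data the operators are recovered to machine precision; in the paper's framework, the nonlinear residual left over after this fit is what the NODE is trained to capture.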
Significant Sample Efficiency
By replacing the critic with an adaptive ROM, the framework substantially reduces the exploration data required compared to model-free DRL, addressing a core bottleneck in practical RL applications.
End-to-End Differentiability
The use of differentiable simulation for controller optimization ensures seamless gradient flow, enabling efficient and stable training while maintaining physical consistency.
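Differentiating a ROM rollout with respect to controller parameters can be illustrated with a linear ROM and a static feedback gain. The adjoint (reverse-mode) recursion below is a generic sketch of the idea under those assumptions; the matrices, Euler discretization, and plain gradient descent are illustrative, not the paper's model or optimizer.

```python
import numpy as np

# Sketch of gradient-based controller optimization through a differentiable
# ROM rollout. ROM: z' = A z + B u, explicit Euler; controller: u = -K z.
# The gradient of the rollout cost w.r.t. K is computed by the adjoint
# (reverse-mode) recursion. Matrices here are illustrative.
rng = np.random.default_rng(2)
r, dt, N = 3, 0.05, 100
A = -np.eye(r) + 0.3 * rng.standard_normal((r, r))   # stable linear part
B = rng.standard_normal((r, 1))

def cost_and_grad(K):
    M = np.eye(r) + dt * (A - B @ K)     # closed-loop Euler step matrix
    zs = [np.ones(r)]
    for _ in range(N):                   # forward rollout
        zs.append(M @ zs[-1])
    J = sum(z @ z for z in zs)           # quadratic state cost
    g = 2.0 * zs[-1]                     # adjoint g_k = dJ/dz_k at k = N
    dJdM = np.zeros((r, r))
    for k in range(N - 1, -1, -1):       # backward pass
        dJdM += np.outer(g, zs[k])       # accumulate g_{k+1} z_k^T
        g = 2.0 * zs[k] + M.T @ g        # propagate adjoint one step back
    return J, -dt * (B.T @ dJdM)         # chain rule through dM/dK = -dt*B

K = np.zeros((1, r))
J0, _ = cost_and_grad(K)
for _ in range(200):                     # plain gradient descent on the gain
    J, dK = cost_and_grad(K)
    K -= 1e-3 * dK
```

The same pattern (forward rollout, backward adjoint sweep, gradient step on controller parameters) is what automatic differentiation performs implicitly when the ROM, including its NODE component, is implemented in a differentiable framework.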
Demerits
Reduced Flexibility in Complex Systems
The reliance on ROMs may limit the framework's applicability to highly nonlinear or chaotic systems where reduced-order models struggle to capture essential dynamics without excessive complexity.
Computational Overhead of NODEs
Training NODEs in a data-driven manner can be computationally intensive, particularly for high-dimensional systems, potentially offsetting some of the sample efficiency gains.
Validation Scope
While validated on canonical problems, the framework's performance in real-world, high-stakes applications (e.g., industrial flow control) remains untested, raising questions about scalability and robustness.
Expert Commentary
This work represents a significant advancement in bridging the gap between model-based and model-free control strategies. By replacing the critic with an adaptive ROM, the authors not only address the sample inefficiency endemic to traditional DRL but also introduce a framework that is inherently more interpretable and physically consistent. The integration of operator inference for linear dynamics and NODEs for nonlinearities is particularly elegant, as it leverages the strengths of both physics-based and data-driven methods. However, the reliance on reduced-order models may pose challenges in systems with strong nonlinearities or chaotic behavior, where the fidelity of the ROM could become a limiting factor. The validation on canonical problems is compelling, but real-world deployment will require further testing in more complex, high-dimensional systems. This paper sets a new benchmark for sample-efficient RL in flow control and should inspire further research into hybrid modeling approaches across engineering disciplines.
Recommendations
- Conduct additional validation on high-fidelity, industrial-scale flow control problems to assess scalability and robustness in complex environments.
- Explore adaptive mechanisms for NODE training that balance computational efficiency with model accuracy, particularly in systems with time-varying dynamics.
- Develop standardized benchmark suites for flow control that include diverse, real-world scenarios to facilitate fair comparison with existing methods.
Sources
Original: arXiv - cs.LG