Actor-Critic Pretraining for Proximal Policy Optimization

arXiv:2602.23804v1 Abstract: Reinforcement learning (RL) actor-critic algorithms enable autonomous learning but often require a large number of environment interactions, which limits their applicability in robotics. Leveraging expert data can reduce the number of required environment interactions. A common approach is actor pretraining, where the actor network is initialized via behavioral cloning on expert demonstrations and subsequently fine-tuned with RL. In contrast, the initialization of the critic network has received little attention, despite its central role in policy optimization. This paper proposes a pretraining approach for actor-critic algorithms like Proximal Policy Optimization (PPO) that uses expert demonstrations to initialize both networks. The actor is pretrained via behavioral cloning, while the critic is pretrained using returns obtained from rollouts of the pretrained policy. The approach is evaluated on 15 simulated robotic manipulation and locomotion tasks. Experimental results show that actor-critic pretraining improves sample efficiency by 86.1% on average compared to no pretraining and by 30.9% compared to actor-only pretraining.

Executive Summary

This article proposes a pretraining approach for actor-critic algorithms, specifically Proximal Policy Optimization (PPO), using expert demonstrations to initialize both actor and critic networks. The actor is pretrained via behavioral cloning, while the critic is pretrained using returns obtained from rollouts of the pretrained policy. Experimental results show significant improvements in sample efficiency, with an average improvement of 86.1% compared to no pretraining and 30.9% compared to actor-only pretraining.
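Per the abstract, the critic's pretraining target is the return observed in rollouts of the behavior-cloned policy. In standard notation (the discount factor $\gamma$ and squared-error loss are assumptions; the paper's exact choices are not given here), the critic $V_\phi$ is fit by regression onto the discounted return:

$$
G_t = \sum_{k=0}^{T-t} \gamma^k \, r_{t+k}, \qquad
\min_\phi \; \mathbb{E}_t \left[ \big( V_\phi(s_t) - G_t \big)^2 \right]
$$

This gives PPO a value estimate that is already calibrated to the pretrained policy's behavior, rather than starting advantage estimation from a randomly initialized critic.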

Key Points

  • Actor-critic pretraining approach for PPO using expert demonstrations
  • Initialization of both actor and critic networks
  • Sample-efficiency gains of 86.1% on average over no pretraining and 30.9% over actor-only pretraining, across 15 simulated tasks
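The two pretraining steps can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: linear least-squares regression stands in for supervised training of the actor and critic networks, and the demonstration data, reward function, and discount factor are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert demonstrations: (state, action) pairs.
states = rng.normal(size=(200, 4))
expert_actions = states @ np.array([[0.5], [-0.2], [0.1], [0.3]])

# Step 1: actor pretraining via behavioral cloning.
# Least squares stands in for gradient-based supervised training.
actor_w, *_ = np.linalg.lstsq(states, expert_actions, rcond=None)

# Step 2: roll out the pretrained actor, compute discounted returns,
# and fit the critic to them (again, regression stands in for value-net training).
gamma = 0.99  # assumed discount factor
T = 50
traj_states = rng.normal(size=(T, 4))
rewards = -np.abs(traj_states @ actor_w).ravel()  # toy reward: small actions preferred
returns = np.zeros(T)
g = 0.0
for t in reversed(range(T)):
    g = rewards[t] + gamma * g  # G_t = r_t + gamma * G_{t+1}
    returns[t] = g

critic_w, *_ = np.linalg.lstsq(traj_states, returns, rcond=None)

# actor_w and critic_w would then initialize PPO's actor and critic
# before RL fine-tuning begins.
bc_error = np.mean((states @ actor_w - expert_actions) ** 2)
```

The key design point mirrored here is that the critic's regression targets come from rollouts of the *pretrained* policy, so its value estimates match the policy PPO actually starts from.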

Merits

Improved Sample Efficiency

The proposed approach substantially reduces the number of environment interactions required for training: on average, 86.1% fewer than training without pretraining and 30.9% fewer than actor-only pretraining.

Demerits

Limited Evaluation

The approach is evaluated only on 15 simulated manipulation and locomotion tasks; performance on real hardware or in other domains is not demonstrated, so the results may not generalize to all scenarios.

Expert Commentary

The proposed actor-critic pretraining approach for PPO demonstrates significant potential for improving sample efficiency in reinforcement learning tasks. By initializing both actor and critic networks using expert demonstrations, the approach can reduce the number of environment interactions required for training, making it more practical for real-world applications. However, further evaluation is needed to fully understand the limitations and potential applications of this approach.

Recommendations

  • Further evaluation of the approach on a wider range of tasks and scenarios
  • Investigation of the potential applications of the approach in real-world robotics and autonomous systems
