Abstract
Learning from a small number of samples with reinforcement learning (RL) is challenging in many tasks, especially in real-world applications such as robotics. Meta-Reinforcement Learning (meta-RL) has been proposed to address this problem by generalizing to new tasks through experience from previous similar tasks. However, these approaches generally perform meta-optimization by applying direct policy search methods to validation samples from adapted policies, and thus require large amounts of on-policy samples during meta-training. To address this, we propose a novel algorithm called Supervised Meta-Reinforcement Learning with Trajectory Optimization (SMRL-TO), which integrates Model-Agnostic Meta-Learning (MAML) and iLQR-based trajectory optimization. Our approach provides online supervision for validation samples through iLQR-based trajectory optimization and embeds simple imitation learning into the meta-optimization in place of policy gradient steps. This is a bi-level optimization that computes several gradient updates in each meta-iteration, consisting of off-policy reinforcement learning in the inner loop and online imitation learning in the outer loop. SMRL-TO achieves significant improvements in sample efficiency without human-provided demonstrations, owing to the effective supervision from iLQR-based trajectory optimization. In this paper, we describe how to use iLQR-based trajectory optimization to obtain labeled data and then how to leverage them to assist the training of the meta-learner. Through a series of robotic manipulation tasks, we further show that, compared with previous methods, the proposed approach can substantially improve sample efficiency and achieve better asymptotic performance.
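The bi-level structure described in the abstract can be sketched on a toy problem. The sketch below is an illustrative assumption, not the paper's implementation: it uses a linear-quadratic system (where iLQR reduces to LQR, so a Riccati iteration stands in for the trajectory optimizer), a one-step cost-gradient update as a crude stand-in for the off-policy RL inner loop, and a first-order behavior-cloning meta-update as the outer loop. All names (`lqr_gain`, `inner_adapt`, `theta`) and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear system x' = A x + B u with quadratic cost (illustrative only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[0.1]])

def lqr_gain(A, B, Q, R, iters=200):
    """Riccati iteration -> feedback gain K (expert acts as u* = -K x).
    On a linear system iLQR reduces to LQR, so this stands in for the
    iLQR-based trajectory optimizer that labels validation samples."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

K_star = lqr_gain(A, B, Q, R)

def inner_adapt(theta, x, alpha=0.5):
    """Inner loop: one gradient step on the one-step quadratic cost
    (a crude stand-in for the paper's off-policy RL update)."""
    u = x @ theta.T                       # linear policy u = theta x
    x_next = x @ A.T + u @ B.T
    grad_u = u @ R + x_next @ Q @ B       # d cost / d u, per sample (up to scale)
    grad_theta = grad_u.T @ x / len(x)
    return theta - alpha * grad_theta

theta, beta = np.zeros((1, 2)), 0.1       # meta-parameters, outer step size
for _ in range(200):
    # Inner loop: adapt the policy on "training" states.
    theta_adapted = inner_adapt(theta, rng.normal(size=(256, 2)))

    # Outer loop: imitation (behavior cloning) loss on validation states,
    # labeled online by the trajectory optimizer; first-order meta-update.
    x_val = rng.normal(size=(256, 2))
    u_star = x_val @ (-K_star).T
    grad_out = (x_val @ theta_adapted.T - u_star).T @ x_val / len(x_val)
    theta = theta - beta * grad_out

# Evaluate: adapt once on fresh data, compare to the optimizer's actions.
theta_adapted = inner_adapt(theta, rng.normal(size=(256, 2)))
x_test = rng.normal(size=(256, 2))
bc_loss = np.mean((x_test @ theta_adapted.T - x_test @ (-K_star).T) ** 2)
```

After meta-training, one inner-loop adaptation brings the policy close to the trajectory optimizer's actions, mirroring the supervision signal the abstract describes, without any human demonstrations.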
Original language | English |
---|---|
Pages (from-to) | 681-691 |
Number of pages | 11 |
Journal | IEEE Transactions on Cognitive and Developmental Systems |
Volume | 16 |
Issue number | 2 |
Early online date | 15 Jun 2023 |
DOIs | |
Publication status | Published online - 15 Jun 2023 |
Bibliographical note
Publisher Copyright: IEEE
Keywords
- Task analysis
- Trajectory optimization
- Robots
- Heuristic algorithms
- Training
- Complexity theory
- Dynamical systems
- Reinforcement learning
- meta learning
- iLQR
- trajectory optimization
- robotic manipulation