Supervised Meta-Reinforcement Learning with Trajectory Optimization for Manipulation Tasks

Lei Wang, Yunzhou Zhang, Delong Zhu, Sonya Coleman, Dermot Kerr

Research output: Contribution to journal › Article › peer-review


Abstract

Learning from small amounts of samples with reinforcement learning (RL) is challenging in many tasks, especially in real-world applications such as robotics. Meta-Reinforcement Learning (meta-RL) has been proposed to address this problem by generalizing to new tasks through experience from previous similar tasks. However, these approaches generally perform meta-optimization by applying direct policy search methods to validation samples from adapted policies, thus requiring large amounts of on-policy samples during meta-training. To this end, we propose a novel algorithm called Supervised Meta-Reinforcement Learning with Trajectory Optimization (SMRL-TO) that integrates Model-Agnostic Meta-Learning (MAML) and iLQR-based trajectory optimization. Our approach provides online supervision for validation samples through iLQR-based trajectory optimization and embeds simple imitation learning into the meta-optimization rather than policy gradient steps. This constitutes a bi-level optimization that computes several gradient updates in each meta-iteration, consisting of off-policy reinforcement learning in the inner loop and online imitation learning in the outer loop. SMRL-TO achieves significant improvements in sample efficiency without human-provided demonstrations, owing to the effective supervision from iLQR-based trajectory optimization. In this paper, we describe how to use iLQR-based trajectory optimization to obtain labeled data and then how to leverage these data to assist the training of the meta-learner. Through a series of robotic manipulation tasks, we further show that, compared with previous methods, the proposed approach can substantially improve sample efficiency and achieve better asymptotic performance.
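The bi-level structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the scalar linear-quadratic task, the finite-difference gradients, the first-order (FOMAML-style) meta-update, and the closed-form Riccati "oracle" standing in for the full iLQR trajectory optimizer are all simplifying assumptions made here for illustration.

```python
import numpy as np

def lqr_gain(a, b, q=1.0, r=0.1, iters=200):
    # Scalar discrete-time Riccati iteration: a stand-in for the paper's
    # iLQR-based trajectory optimizer, labeling states with near-optimal
    # actions u* = -k x.
    p = q
    for _ in range(iters):
        k = (b * p * a) / (r + b * p * b)
        p = q + a * p * (a - b * k)
    return k

def rollout_cost(theta, a, b, x0=1.0, horizon=20, q=1.0, r=0.1):
    # Surrogate inner-loop RL objective: quadratic rollout cost of the
    # linear policy u = -theta x on dynamics x' = a x + b u.
    x, c = x0, 0.0
    for _ in range(horizon):
        u = -theta * x
        c += q * x**2 + r * u**2
        x = a * x + b * u
    return c

def grad(f, theta, eps=1e-5):
    # Central finite-difference gradient (keeps the sketch dependency-free).
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

def meta_train(tasks, theta0=0.0, inner_lr=1e-3, outer_lr=0.3,
               inner_steps=3, meta_iters=40):
    theta = theta0
    xs = np.linspace(-1.0, 1.0, 8)  # "validation" states
    for _ in range(meta_iters):
        meta_g = 0.0
        for a, b in tasks:
            # Inner loop: a few policy-improvement steps per task
            # (stand-in for the paper's off-policy RL updates).
            th = theta
            for _ in range(inner_steps):
                th -= inner_lr * grad(lambda t: rollout_cost(t, a, b), th)
            # Outer loop: imitation loss against trajectory-optimizer
            # labels on validation states (first-order meta-gradient).
            k = lqr_gain(a, b)
            imitation = lambda t: float(np.mean(((-t * xs) - (-k * xs)) ** 2))
            meta_g += grad(imitation, th)
        theta -= outer_lr * meta_g / len(tasks)
    return theta
```

Note how the outer loop never needs on-policy policy-gradient estimates: the trajectory-optimizer labels turn meta-optimization into supervised imitation, which is the sample-efficiency argument the abstract makes.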
Original language: English
Pages (from-to): 681-691
Number of pages: 11
Journal: IEEE Transactions on Cognitive and Developmental Systems
Volume: 16
Issue number: 2
Early online date: 15 Jun 2023
DOIs
Publication status: Published online - 15 Jun 2023

Bibliographical note

Publisher Copyright:
IEEE

Keywords

  • Task analysis
  • Trajectory optimization
  • Robots
  • Heuristic algorithms
  • Training
  • Complexity theory
  • Dynamical systems
  • Reinforcement learning
  • meta learning
  • iLQR
  • trajectory optimization
  • robotic manipulation
