Open-ended continuous reinforcement learning for mobile robots

  • Paresh Dhakan

Student thesis: Doctoral Thesis


Creating an intelligent agent capable of open-ended learning and long-term autonomy remains an active research area. Reinforcement learning, in which an agent learns by interacting with its environment, is well suited to agent autonomy, and its extensions, such as motivated reinforcement learning and goal-oriented agent architectures, focus on meta-cognitive aspects such as ‘what to learn’, enabling autonomous multitask learning. The other essential aspect is the cognitive question of ‘how to learn’, the focus of lifelong learning architectures. Combining these aspects yields a comprehensive agent architecture that endows an agent with the ability to learn new skills with minimal human intervention. The first contribution of this thesis is an agent architecture consisting of a task generation module, a knowledge management module and a learning module, providing the agent with open-ended, continuous and autonomous learning capabilities. Further, this thesis contributes to each module of this architecture as follows.

In reinforcement learning, the agent learns by interacting with its environment, guided by a reward. However, in many dynamic environments it is unknown upfront what tasks the agent will need to learn, and research has acknowledged the benefits of generating subtasks to direct learning. Depending on how a subtask is considered accomplished, it can be of an achievement, maintenance, approach or avoidance type. The second contribution of this thesis, related to the learning module of the proposed architecture, is a reward design based on these different task types. For a continuously learning agent, tasks direct what the agent learns, yet task design typically requires external intervention, hindering the agent’s autonomy. The third contribution of this thesis, related to the task generation module of the proposed architecture, is a mechanism to generate tasks at different levels of complexity. This enables the agent to learn simpler, more primitive tasks first, followed by more difficult compound tasks.
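The distinction between achievement, maintenance, approach and avoidance tasks can be illustrated with a minimal reward sketch. The function below is a hypothetical example only; its signature, thresholds and reward magnitudes are illustrative assumptions, not the thesis’s actual reward design.

```python
# Hypothetical sketch of a task-type-based reward. The exact scheme used in
# the thesis is not reproduced here; this only illustrates the four task types.

def reward(task_type, achieved, in_goal_region, distance, prev_distance):
    """Return a scalar reward whose shape depends on the task type."""
    if task_type == "achievement":   # reach a goal state once
        return 1.0 if achieved else 0.0
    if task_type == "maintenance":   # remain inside a desired region
        return 1.0 if in_goal_region else -1.0
    if task_type == "approach":      # reward reducing distance to a target
        return prev_distance - distance
    if task_type == "avoidance":     # reward increasing distance from a hazard
        return distance - prev_distance
    raise ValueError(f"unknown task type: {task_type}")
```

Under this sketch, approach and avoidance are dense, distance-shaped rewards, while achievement and maintenance are sparse, event-based rewards.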

Furthermore, a key characteristic of continuous learning is that the agent should be able to use its existing knowledge to solve future tasks. Compound tasks can be either a sequential or a concurrent combination of primitive tasks. The fourth contribution of this thesis, related to the knowledge management module of the proposed architecture, is a compositionality technique whereby the agent can combine its primitive skills for disjoint tasks to solve a compound task that is a concurrent combination of those tasks. Finally, the fifth contribution of this thesis is a set of metrics to measure task difficulty, the agent’s competency for a particular skill, and agent performance on tasks of different types. A mobile robot is used for all experiments to show how the agent generates new tasks, learns solutions to those tasks and combines the resulting skills to accomplish compound tasks, thus demonstrating the autonomous behaviour of continuous learning in an open-ended way.
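One common family of compositionality techniques combines the value functions of primitive skills at decision time. The sketch below, which averages tabular Q-values across skills and acts greedily on the result, is an assumed illustration of that general idea, not the specific mechanism contributed by the thesis.

```python
# Illustrative concurrent composition of primitive skills: act greedily on
# the average of the skills' Q-values. This averaging scheme is an assumption
# for illustration; the thesis's own compositionality technique may differ.

def composed_action(q_tables, state, actions):
    """Pick the action maximising the mean Q-value over all primitive skills.

    q_tables: list of dicts mapping (state, action) -> Q-value,
              one per primitive skill.
    """
    def combined_q(action):
        return sum(q[(state, action)] for q in q_tables) / len(q_tables)
    return max(actions, key=combined_q)
```

For disjoint tasks, each primitive Q-table scores actions with respect to its own objective, so the averaged value favours actions that serve both objectives concurrently.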
Date of Award: May 2022
Original language: English
Supervisors: Philip Vance (Supervisor), Nazmul Siddique (Supervisor), Inaki Rano (Supervisor) & Kathryn Kasmarik (Supervisor)


  • Lifelong learning
  • Curriculum learning
  • Agent architecture
  • Task-independent reward design
  • Self-generation of tasks
  • Concurrent composition of skills
