Abstract: We propose a novel method for developing robust action policies using an automated curriculum that seeks to improve task generalization and reduce policy brittleness. Our Reward-guided Curriculum (RgC) self-reflectively chooses what to train on in order to maximize rewards over a task domain; it is a single-policy meta-learning approach designed to augment the training of existing architectures. Experiments on multiple retro Sonic the Hedgehog video games and classic control tasks show notable improvements in the task generalization and robustness of policies trained with RgC. RgC yields a more than 15% improvement over existing baselines on held-out levels in the video game tasks and boosts robustness to changes in dynamics on the control tasks by more than 10%.
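The core idea in the abstract, a curriculum that decides what to train on next based on observed rewards, can be illustrated with a minimal, hypothetical sketch. The task names, inverse-reward softmax weighting, and temperature parameter below are illustrative assumptions for exposition only, not the paper's actual RgC algorithm:

```python
import math
import random

def select_task(avg_rewards, temperature=1.0):
    """Sample a training task, favoring tasks where average reward is low.

    avg_rewards: dict mapping task name -> recent mean episode reward.
    Lower-reward tasks get higher sampling probability, so the curriculum
    concentrates training where the policy is currently weakest.
    """
    tasks = list(avg_rewards)
    # Negate rewards so low-reward tasks score high, then softmax.
    scores = [-avg_rewards[t] / temperature for t in tasks]
    m = max(scores)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    return random.choices(tasks, weights=weights, k=1)[0]

# Toy curriculum loop: a task's reward slowly improves when trained on.
random.seed(0)
rewards = {"level-1": 0.9, "level-2": 0.2, "level-3": 0.5}
counts = {t: 0 for t in rewards}
for _ in range(1000):
    task = select_task(rewards)
    counts[task] += 1
    rewards[task] = min(1.0, rewards[task] + 0.0005)  # simulated training gain
```

Under these assumptions the weakest task ("level-2") is sampled most often, which captures the abstract's notion of self-reflectively allocating training effort across a task domain.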
Recommended citation: Mysore, S., Platt, R., & Saenko, K. (2018). “Reward-guided Curriculum for Robust Reinforcement Learning.”