Abstract: We propose a novel method for developing robust action policies using an automated curriculum that seeks to improve task generalization and reduce policy brittleness by self-reflectively choosing what to train on in order to maximize rewards over a task domain. Our Reward-guided Curriculum (RgC) is a single-policy meta-learning approach designed to augment the training of existing architectures. Experiments on multiple video games and classical control tasks indicate notable improvements in the task generalization and robustness of policies trained with RgC.
Recommended citation: Mysore, S., Platt, R., Saenko, K. (2018). “Reward-guided Curriculum for Learning Robust Action Policies”, Workshop on Multi-task and Lifelong Reinforcement Learning at ICML 2019, Long Beach, CA.