Abstract: Conventional reinforcement learning rarely considers how variations in the environment affect the policy learned by the agent. In this paper, we explore how changes in the environment affect policy generalization. We observe experimentally that, for some tasks, certain environmental settings result in more robust policies that generalize well to future variants of the environment. We propose a novel method that exploits this observation to develop robust actor policies, by automatically constructing a sampling curriculum over environment settings to use in training - thus allowing agents to choose what they experience. Ours is a model-free, single-policy approach, and our experiments demonstrate that its performance is on par with the best policies found by an exhaustive grid search, at a significantly lower computational cost.
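The core idea - sampling environment settings in proportion to how much training on them improves reward - can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the class name `RewardGuidedCurriculum`, the softmax sampling rule, and the score-update step size are all assumptions made for the example.

```python
import math
import random

class RewardGuidedCurriculum:
    """Hypothetical sketch: favor environment settings whose recent
    training episodes produced the largest reward improvements."""

    def __init__(self, settings, temperature=1.0):
        self.settings = list(settings)
        self.temperature = temperature
        # Running estimate of reward improvement per setting.
        self.scores = {s: 0.0 for s in self.settings}

    def probabilities(self):
        # Softmax over per-setting scores: higher recent improvement
        # means a higher chance of being selected for training.
        exps = [math.exp(self.scores[s] / self.temperature)
                for s in self.settings]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self):
        # Choose the next environment setting to train on.
        return random.choices(self.settings,
                              weights=self.probabilities(), k=1)[0]

    def update(self, setting, reward_delta, step=0.3):
        # Exponential moving average of the observed reward improvement.
        self.scores[setting] += step * (reward_delta - self.scores[setting])

# Usage: settings here are illustrative scalar environment parameters.
curriculum = RewardGuidedCurriculum(settings=[0.5, 1.0, 2.0])
curriculum.update(1.0, reward_delta=5.0)  # training on setting 1.0 helped
next_setting = curriculum.sample()
```

Compared with an exhaustive grid search over environment settings, a single sampler like this trains one policy while still concentrating experience on the settings that appear to produce the most robust behavior.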
Recommended citation: Mysore, S., Platt, R., Saenko, K. (2018). “Reward-guided Curriculum for Learning Robust Action Policies”.