Title: Model-Based Reinforcement Learning with Predictability Maximization
Intelligence is often associated with the ability to optimize the environment to maximize one’s objectives (e.g., survival). In particular, the ability to predict the future conditioned on one’s own actions enables intelligent agents to efficiently evaluate possible futures and choose the best one to realize. Such model-based reinforcement learning (RL) algorithms have recently shown promising results for sample-efficient learning in robotics and gaming RL environments. However, standard model-based approaches naively try to predict everything about the world, including noise that is neither predictable nor controllable. In this talk, I will share my recent work on temporal difference models (TDM) and dynamics-aware discovery of skills (DADS), and discuss how goal-conditioned Q-learning and empowerment (the ability to predictably change the world) relate to model-based RL and can learn abstracted Markov Decision Processes (MDPs) in which predictability is inherently maximized. I’ll show that such approaches enable successful model-based planning in difficult environments where classic model-based planners fail, while significantly outperforming model-free approaches in terms of sample efficiency. I’ll end with a discussion of how reachability and empowerment/mutual information connect to each other, and of potential directions for future research.
Shixiang (Shane) Gu is a Research Scientist at Google Brain, where he mainly works on research problems in deep learning, reinforcement learning, robotics, and probabilistic machine learning. His recent research focuses on scalable RL methods that could solve difficult continuous control problems in the real world, and it has been covered by the Google Research blog and MIT Technology Review. He completed his PhD in Machine Learning at the University of Cambridge and the Max Planck Institute for Intelligent Systems in Tübingen, where he was co-supervised by Richard E. Turner, Zoubin Ghahramani, and Bernhard Schölkopf. During his PhD, he also interned and collaborated closely with Sergey Levine/Ilya Sutskever at UC Berkeley/Google Brain and Timothy Lillicrap at DeepMind. He holds a B.A.Sc. in Engineering Science from the University of Toronto, where he did his thesis with Geoffrey Hinton on distributed training of neural networks using evolutionary algorithms. He is a Japan-born Chinese Canadian. Having lived in Japan, China, Canada, the US, the UK, and Germany, he goes by multiple names: Shane Gu, Shixiang Gu, 顾世翔, 顧世翔(ぐう せいしょう).
Date: November 13, 2019 (Wed) 15:00 - 16:30