AI Seminar: Probabilistic Inference in Reinforcement Learning Done Right

Event Speaker
Jean Tarbouriech
Event Speaker Description
Research Scientist
Google DeepMind, London
Event Type
Artificial Intelligence
Event Location
BEXL 320 and Zoom
Event Description
‘RL as Inference’ is a popular but flawed perspective. In this talk, we empower it with a principled Bayesian treatment that yields efficient exploration. We first clarify how control and statistical inference, the two facets of RL, can be distilled into a single quantity: PΓ*, the posterior probability of each state-action pair being visited by the optimal policy. Previous approaches approximate PΓ* arbitrarily poorly and consequently perform badly on challenging problems. We prove that PΓ* can be used to generate a policy that explores efficiently, as measured by regret, although computing it exactly is intractable. We therefore derive a new variational Bayesian approximation that yields a tractable convex optimization problem, and we establish that the resulting policy also explores efficiently. We call our approach VAPOR and show that it has strong connections to Thompson sampling, K-learning, and maximum-entropy exploration. We conclude with experiments demonstrating the performance advantage of a deep RL version of VAPOR.
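As background for the talk, the object PΓ* is a posterior over state-action visitations, and the variational step reduces planning to a convex program over the occupancy measure. The sketch below is not the VAPOR objective itself, just an illustration of the underlying idea on a hypothetical two-state MDP: the classical dual linear program that optimizes directly over the discounted state-action occupancy measure λ(s, a), from which a policy is read off.

```python
# Illustrative sketch (toy MDP of our own making, not VAPOR): planning as a
# convex program over the state-action occupancy measure lambda(s, a), the
# kind of visitation distribution the posterior PΓ* is defined over.
import numpy as np
from scipy.optimize import linprog

gamma, mu = 0.9, np.array([1.0, 0.0])   # discount factor, initial distribution
S, A = 2, 2
P = np.zeros((S, A, S))                  # transition kernel P[s, a, s']
r = np.zeros((S, A))                     # reward r[s, a]
P[0, 0] = [1, 0]                         # s0, a0: stay (reward 0)
P[0, 1] = [0, 1]                         # s0, a1: move to s1
P[1, 0] = [0, 1]; r[1, 0] = 1.0          # s1, a0: stay, reward 1
P[1, 1] = [1, 0]                         # s1, a1: move back to s0

# Flow constraints:
# sum_a lam(s', a) = (1 - gamma) * mu(s') + gamma * sum_{s,a} P[s,a,s'] lam(s,a)
A_eq = np.zeros((S, S * A))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[sp, s * A + a] = float(s == sp) - gamma * P[s, a, sp]
b_eq = (1 - gamma) * mu

# linprog minimizes, so negate the expected-reward objective.
res = linprog(c=-r.ravel(), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (S * A))
lam = res.x.reshape(S, A)
policy = lam.argmax(axis=1)              # decode a policy from the visitations
print(policy, round(-res.fun, 3))        # -> [1 0] 0.9
```

The optimal occupancy measure concentrates on "go to s1, then stay", and the recovered policy takes a1 in s0 and a0 in s1. VAPOR's variational problem adds an uncertainty-aware term to such an objective while remaining convex.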

Speaker Biography

Jean Tarbouriech is a Research Scientist at Google DeepMind, London. His main research interest is reinforcement learning, with a focus on efficient exploration to improve RL agents and large language models. He completed his PhD jointly at Inria Lille and Meta AI Paris.