AI Seminar: Investigating Latent State and Uncertainty Representations in Reinforcement Learning

Event Speaker
Anurag Koul
Postdoctoral Researcher, Microsoft Research (New York)
Event Type
Artificial Intelligence
Date
Event Location
KEC 1001
Event Description

Learning latent space representations of high-dimensional world states has been at the core of the recent rapid growth in reinforcement learning (RL). At the same time, RL algorithms have suffered from ignoring the uncertainty in the estimates produced by model-free and model-based methods. In our work, we investigate each of these aspects independently.

Firstly, we studied the explainability of policies learned over latent representations. In particular, we focused on control policies represented as recurrent neural networks (RNNs), which are difficult to explain, understand, and analyze because of their continuous-valued memory vectors and observation features. We introduced a new technique, Quantized Bottleneck Insertion, for learning finite representations of these vectors and features. This let us extract a finite-state machine representation of a policy, which we show improves its interpretability.
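As a rough illustration of the idea (a minimal sketch, not the talk's exact implementation), a quantized bottleneck can be viewed as an autoencoder whose latent activations are rounded to a small discrete set, with a straight-through gradient estimator so training stays differentiable. The module structure and dimensions below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TernaryQuantize(torch.autograd.Function):
    """Round activations in [-1, 1] to {-1, 0, +1}; straight-through gradient."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # pass gradients through the rounding unchanged

class QBN(nn.Module):
    """Quantized bottleneck network (illustrative): an autoencoder whose
    latent code is discretized, so each continuous vector maps to one of
    finitely many symbols."""
    def __init__(self, in_dim: int, bottleneck_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 64), nn.Tanh(),
            nn.Linear(64, bottleneck_dim), nn.Tanh(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 64), nn.Tanh(),
            nn.Linear(64, in_dim),
        )

    def forward(self, x):
        code = TernaryQuantize.apply(self.encoder(x))
        return self.decoder(code), code

# Train the QBN to reconstruct hidden states collected from the RNN policy;
# once inserted, each memory vector collapses to one of 3**bottleneck_dim
# symbols, from which a finite-state machine can be extracted.
qbn = QBN(in_dim=128, bottleneck_dim=8)
hidden = torch.randn(32, 128)  # stand-in for a batch of RNN memory vectors
recon, code = qbn(hidden)
loss = nn.functional.mse_loss(recon, hidden)
loss.backward()
```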

Secondly, we studied model-based reinforcement learning for continuous action spaces based on tree-based planning over learned latent dynamics. By including look-ahead search during decision-time planning, we demonstrate improved sample efficiency and performance on a majority of challenging continuous-control benchmarks compared to state-of-the-art methods.
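The sketch below gives a feel for decision-time look-ahead over a learned latent model. It uses simple random-shooting search rather than the talk's specific tree-based planner, and all model components (encode, dynamics, reward, value) are assumed to be pretrained networks with the interfaces shown:

```python
import torch

def lookahead_plan(encode, dynamics, reward, value, obs, action_dim,
                   num_candidates=64, horizon=5):
    """Decision-time planning sketch: roll candidate action sequences
    through a learned latent dynamics model and return the first action
    of the best-scoring sequence. Interfaces are illustrative assumptions."""
    # Encode the observation once, then replicate for each candidate sequence.
    z = encode(obs).expand(num_candidates, -1)
    actions = torch.rand(num_candidates, horizon, action_dim) * 2 - 1
    returns = torch.zeros(num_candidates)
    for t in range(horizon):
        returns += reward(z, actions[:, t])   # accumulate predicted rewards
        z = dynamics(z, actions[:, t])        # advance the latent state
    returns += value(z)                       # bootstrap with a learned value
    return actions[returns.argmax(), 0]
```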

Thirdly, we studied policy evaluation over offline historical data and highlight the need to couple confidence values with estimated policy values in order to capture uncertainty. Toward this, we created a benchmark for studying confidence estimation by offline reinforcement learning (ORL) methods. The benchmark is derived by adding sets of policy comparison queries to existing ORL datasets and comes with a set of evaluation metrics. In addition, we present an empirical evaluation of a class of model-based baselines on the benchmark. These baselines learn ensembles of dynamics models, which are used in various ways to produce simulations for answering queries with confidence values. While our results suggest advantages for certain baseline variations, there appears to be significant room for improvement in future work.
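As a hedged sketch of how such a baseline might answer a policy comparison query, each ensemble member can simulate both policies and vote, with the agreement rate serving as the confidence value. The rollout interface here is purely illustrative, not the benchmark's actual API:

```python
def answer_policy_query(ensemble, policy_a, policy_b, start_states,
                        horizon=50):
    """Ensemble-based confidence sketch: each learned dynamics model
    simulates both policies; the fraction of models preferring policy_a
    serves as the confidence for the comparison query. `ensemble` is
    assumed to be a list of models exposing a hypothetical
    .rollout(policy, states, horizon) method that returns estimated returns."""
    votes = []
    for model in ensemble:
        ret_a = model.rollout(policy_a, start_states, horizon).mean()
        ret_b = model.rollout(policy_b, start_states, horizon).mean()
        votes.append(float(ret_a > ret_b))
    return sum(votes) / len(votes)  # agreement rate across the ensemble
```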

Speaker Biography

Anurag Koul is a Postdoctoral Researcher at Microsoft Research (New York). He recently graduated with a Ph.D. from Oregon State University under the supervision of Prof. Alan Fern. His research revolves around reinforcement learning (RL), where he has worked on the explainability of RL agents, planning with latent-space models, and understanding uncertainty in offline reinforcement learning.