A tight excess risk bound via a new complexity based on a unification of PAC-Bayesian, normalized maximum likelihood, and Rademacher complexities

Event Speaker
Nishant Mehta
Assistant professor, Department of Computer Science, University of Victoria, Canada
Event Type
Colloquium
Date
Event Location
LINC 200
Event Description

I will talk about a new notion of complexity, “COMP”, that interpolates between and generalizes some existing complexity notions from statistical learning theory. I will first derive COMP as a generalization of the Shtarkov sum, the normalizer of the normalized maximum likelihood (NML) distribution. When the NML distribution exists, the logarithm of the Shtarkov sum is precisely the minimax regret for individual-sequence prediction under log loss. Next, via a PAC-Bayesian analysis, I will show how COMP can be used to obtain tight upper bounds on the excess risk of randomized estimators (which include generalized Bayesian estimators); this excess risk bound is stated in terms of COMP itself. Under a certain specialization, further upper bounding COMP leads to a standard PAC-Bayesian excess risk bound whose right-hand side is the information complexity (essentially the empirical excess risk plus an additional complexity term involving the KL divergence from the posterior distribution to the prior distribution). Under a different specialization, corresponding to the special case of empirical risk minimization with VC-type classes and with "large classes" (whose empirical L2 entropy exhibits polynomial growth), we will see how COMP can be upper bounded in a way that yields optimal rates of convergence of the excess risk for such classes. Along the way, we will see connections to Rademacher complexity; in particular, we will recover bounds based on local Rademacher complexity while completely avoiding complicated local Rademacher complexity-based arguments. This is joint work with Peter Grünwald of CWI (in Amsterdam) and Leiden University.
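To make the objects mentioned above concrete, here is a schematic sketch in standard textbook notation (not taken from the talk itself; constants, conditions, and the exact form of the bound are simplified, and the symbols below are illustrative assumptions rather than the speaker's notation). For a class $\mathcal{F}$ of distributions over length-$n$ sequences (discrete case), the Shtarkov sum, the NML distribution, and the minimax individual-sequence regret under log loss are

\[
S_n(\mathcal{F}) = \sum_{x^n} \sup_{f \in \mathcal{F}} f(x^n),
\qquad
p_{\mathrm{NML}}(x^n) = \frac{\sup_{f \in \mathcal{F}} f(x^n)}{S_n(\mathcal{F})},
\qquad
\mathrm{REG}_n(\mathcal{F}) = \log S_n(\mathcal{F}),
\]

and a standard PAC-Bayesian information-complexity bound of the kind referred to in the abstract has the schematic form

\[
\mathbb{E}_{f \sim \hat{\Pi}}\bigl[R(f) - R(f^*)\bigr]
\;\lesssim\;
\mathbb{E}_{f \sim \hat{\Pi}}\bigl[\hat{R}_n(f) - \hat{R}_n(f^*)\bigr]
\;+\;
\frac{\mathrm{KL}(\hat{\Pi}\,\|\,\Pi_0)}{\eta\, n},
\]

where $R$ and $\hat{R}_n$ denote the population and empirical risks, $f^*$ is a risk minimizer, $\Pi_0$ is the prior, $\hat{\Pi}$ is the (generalized) posterior, and $\eta$ is a learning-rate parameter.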

Speaker Biography

Nishant Mehta is an assistant professor in the Department of Computer Science at the University of Victoria. His research is in statistical learning theory and, more recently, online learning. The unifying theme of his research is identifying how the field of machine learning can build learning systems that learn well from much less data than is currently required. Central to this theme is a unification of what makes learning easy or hard across different learning paradigms. He has studied properties common to online learning and statistical learning that capture when it is possible to learn up to a fixed amount of error using much less data. He has also made contributions to the problem of learning multiple tasks at the same time, and he has developed rigorous error guarantees for learning sparse features from data for use in prediction tasks. Two primary focuses of his current research are the generalization properties of deep learning methods in non-worst-case settings and better, principled algorithms for lifelong learning when tasks exhibit different types of similarity.

Nishant was previously a postdoctoral researcher at Centrum Wiskunde & Informatica with Peter Grünwald, where he worked on developing PAC-Bayesian risk bounds for heavy-tailed situations as well as on the present work. Prior to that, he was a postdoctoral research fellow at the Australian National University and NICTA with Bob Williamson, where he worked on problems related to learning at faster rates in statistical and online learning. He completed his Ph.D. in Computer Science at Georgia Tech, advised by Alexander Gray.