AI Seminar: From Pixels to Measurements: Understanding the Dynamic World

Event Speaker
Adam Harley
Event Speaker Description
Postdoctoral Scholar
Stanford University
Event Type
Artificial Intelligence
Date
Event Location
KEC 1001 and Zoom
Event Description

Join the Zoom seminar

In computer vision, “video understanding” typically concerns summarization: tracking the main objects or describing the main actions. While progress here has been impressive, many practical applications require extracting much more fine-grained information. For example, biologists are highly interested in tracking specific key points of organisms in long video recordings. Algorithms for such tasks require the generality and precision of low-level vision methods (e.g., optical flow), but also benefit from knowledge about the 3D physical world (e.g., things continue to exist while they are occluded). In this talk, I will present our progress on this crucial space of problems. Our central contribution is to widen the window of “temporal context” used for inference: instead of tracking entities from one frame to the next, we inspect dozens of frames simultaneously and return an answer that makes sense for the full clip. I will discuss the methods and datasets that we have created to drive progress along these lines, and highlight natural science applications of the work. Finally, I will introduce our ongoing effort to produce a “foundation model” of motion, aiming to deliver reliable arbitrary-granularity tracking for the huge variety of real-world situations where this is required.
 

Speaker Biography

Adam Harley is a postdoctoral scholar at Stanford University, working with Leonidas Guibas. He received a Ph.D. in robotics from Carnegie Mellon University, where he worked with Katerina Fragkiadaki. He received his M.S. in computer science from Toronto Metropolitan University, working with Kosta Derpanis. Adam is a recipient of the NSERC PGS-D scholarship and the Toronto Metropolitan University Gold Medal. His research interests lie in computer vision and machine learning, particularly for 3D understanding and fine-grained tracking.