
In the 1980s and 1990s, computer vision architectures were built on sparse samples of points. In the 2000s, dense models became popular for visual recognition because heuristically defined sparse models do not cover all the important parts of an image. However, with deep learning and end-to-end training, this need not remain the case, and sparse models may still offer significant advantages: they save unnecessary computation and are more flexible. In this talk, I will present the deep point cloud convolutional backbones that we have developed over the past few years, including results on point cloud segmentation tasks, as well as recent applications to interaction modeling among objects, point cloud completion, and world models for robot manipulation tasks. Point cloud approaches can also work well as 2D image recognition backbones. I will introduce our work AutoFocusFormer, which uses point cloud backbones and decoders for 2D image recognition, with a novel module that enables end-to-end learning of adaptive downsampling for dense prediction tasks such as segmentation. This is very helpful for detecting tiny, faraway objects in the scene that would have been decimated by conventional grid downsampling approaches.
Fuxin Li is currently an associate professor in the School of Electrical Engineering and Computer Science at Oregon State University. He has held research positions at Apple Inc., the University of Bonn, and the Georgia Institute of Technology. He obtained his Ph.D. from the Institute of Automation, Chinese Academy of Sciences, in 2009. He has won an NSF CAREER award and an Amazon Research Award, among other accolades. He is a program chair of CVPR 2025. He has published more than 90 papers in computer vision, machine learning, and their applications. His main research interests are point cloud deep networks, human understanding of deep learning, video object segmentation, multi-target tracking, and uncertainty estimation in deep learning.