Efficient AI Computing

AI Faculty Research Area

Efficient AI Computing studies how to reduce the computational, memory, and energy costs of modern AI models while preserving accuracy and usability. Research in this area develops efficient model representations, training and inference optimizations, and cross-layer techniques spanning algorithms, runtimes, and system software for large-scale and resource-constrained AI deployment.

Key directions include model compression and quantization, efficient attention mechanisms (e.g., linearization and KV cache optimization), and runtime-aware inference techniques that reduce latency and increase throughput for large language and vision models. This work also explores software-system co-design, in which model structures and execution strategies are jointly optimized to use computing resources more effectively across edge, cloud, and HPC environments.
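
As one concrete illustration of the quantization direction named above, the following minimal sketch shows symmetric per-tensor 8-bit quantization of a small weight vector. It is not drawn from any specific project in this research area; the function names (`quantize_int8`, `dequantize`) and the sample weights are hypothetical and chosen only for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to the int8 range [-127, 127].

    Illustrative helper (hypothetical name), not a library API.
    """
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]


# Made-up sample weights, for illustration only.
weights = [0.50, -1.27, 0.03, 0.89]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest step bounds the error by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored as one 8-bit integer plus a single shared scale, roughly a 4x reduction relative to 32-bit floats, at the cost of a rounding error of at most half a quantization step per value. Practical schemes often use per-channel or per-group scales to reduce that error further.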

By enabling scalable, cost-effective, and sustainable AI, this research supports the deployment of foundation models and emerging AI applications in real-world settings.