In this talk, we will discuss the challenges of running ultra-low-latency Large Language Model (LLM) inference at scale. We will cover the unique challenges of LLM inference, such as large model sizes and KV caching. We will also discuss the challenges of scaling LLM inference to handle large volumes of requests, including hardware requirements, efficient scale-up, and new routing architectures. Finally, we will present some of our recent work on addressing these challenges, including our development of inference infrastructure at Union.
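To give a sense of why KV caching becomes a scaling bottleneck, below is a minimal back-of-the-envelope sketch (not from the talk; the model configuration is an assumption chosen purely for illustration). Each generated token stores a key and a value vector per layer, so cache memory grows linearly with sequence length and batch size.

```python
# Rough KV-cache sizing sketch. All parameters below are illustrative
# assumptions (a 70B-class model with grouped-query attention), not
# figures taken from the talk.
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    # Factor of 2 covers keys and values; 2 bytes/element assumes fp16/bf16.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

size = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                      seq_len=4096, batch_size=32)
print(f"KV cache: {size / 2**30:.1f} GiB")  # ~40 GiB at a modest batch size
```

Even with grouped-query attention shrinking the number of KV heads, the cache can rival or exceed the model weights in memory footprint at long contexts and realistic batch sizes, which is what pushes serving stacks toward careful batching, cache-aware routing, and scale-out strategies.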