June 17, 2024

SCALABLE SOLUTIONS FOR RUNNING LARGE LANGUAGE MODELS

Topic: Systems and Networking

Jiaxin Cao

Lepton AI

TYPE: Videos

YEAR: 2024

Open-source large language models such as Llama and Mixtral call for deployment strategies that are both efficient and cost-effective. We will first explore adaptive workload management, which optimizes infrastructure use as demand varies. Next, we will delve into LLM caching techniques, including sticky routing and prompt caching, which improve response times and system utilization. We will also discuss strategies for relieving system pressure during traffic spikes. Together, these strategies aim to improve the scalability and efficiency of AI platforms in the era of advanced LLMs.
