Faster Than Fast: Networking and Communication Optimizations for Llama 3

The network and collective communication stack plays a pivotal role in extracting the best performance from large GenAI clusters. In this talk, we will go in depth on the network and communication library tuning that helped achieve optimal performance for GenAI models such as Llama 3, covering optimizations from both the training and the model-serving perspectives. We'll dig into how we mitigated the impact of network latency through novel collective algorithms and network routing enhancements, as well as the steps taken to reduce the impact of compute overlap on communication time. We'll also share our perspective on the challenges that remain in scaling these models further while still achieving optimal compute and network efficiency.
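For readers unfamiliar with compute/communication overlap, the sketch below is an illustrative example only (not drawn from the talk): it shows the common pattern of launching a collective asynchronously with `torch.distributed` so that independent compute can run while the all-reduce is in flight. Tensor names and sizes are hypothetical.

```python
# Minimal sketch of compute/communication overlap with async NCCL collectives.
# Assumes the script is launched with torchrun and CUDA GPUs are available.
import os
import torch
import torch.distributed as dist


def allreduce_overlap_demo():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    grad = torch.randn(64 * 1024 * 1024, device="cuda")  # tensor to reduce (hypothetical size)
    other = torch.randn(8192, 8192, device="cuda")        # independent workload

    # Launch the collective asynchronously; NCCL runs it on its own stream,
    # so unrelated compute on the default stream can proceed concurrently.
    work = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True)

    # Independent matmul overlaps with the in-flight all-reduce.
    out = other @ other

    # Block only when the reduced tensor is actually needed.
    work.wait()
    return grad, out


if __name__ == "__main__":
    allreduce_overlap_demo()
    dist.destroy_process_group()
```

Note that overlapping compute in this way is a trade-off: concurrent kernels can contend for SMs and memory bandwidth with the communication kernels, which is the kind of interference the abstract refers to when it mentions reducing the impact of compute overlap on communication time.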

