Faster Than Fast: Networking and Communication Optimizations for Llama 3

The network and collective communication stack plays a pivotal role in extracting the best performance from large GenAI clusters. In this talk, we will go in depth on the network and communication library tuning that helped achieve optimal performance for GenAI models such as Llama 3, covering optimizations from both the training and the model-serving perspectives. We'll dig into how we mitigated the impact of network latency through novel collective algorithms and network routing enhancements, as well as the steps taken to reduce the impact of compute overlap on communication time. We'll also share our perspective on the challenges that remain in scaling these models further while still achieving optimal compute and network efficiency.
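For readers unfamiliar with compute/communication overlap, the sketch below is an illustrative example only (not drawn from the talk): it shows the common pattern of launching a collective asynchronously with `torch.distributed` so that independent compute can run while the all-reduce is in flight. Tensor names and sizes are hypothetical.

```python
# Minimal sketch of compute/communication overlap with async NCCL collectives.
# Assumes the script is launched with torchrun and CUDA GPUs are available.
import os
import torch
import torch.distributed as dist


def allreduce_overlap_demo():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    grad = torch.randn(64 * 1024 * 1024, device="cuda")  # tensor to reduce (hypothetical size)
    other = torch.randn(8192, 8192, device="cuda")        # independent workload

    # Launch the collective asynchronously; NCCL runs it on its own stream,
    # so unrelated compute on the default stream can proceed concurrently.
    work = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True)

    # Independent matmul overlaps with the in-flight all-reduce.
    out = other @ other

    # Block only when the reduced tensor is actually needed.
    work.wait()
    return grad, out


if __name__ == "__main__":
    allreduce_overlap_demo()
    dist.destroy_process_group()
```

Note that overlapping compute in this way is a trade-off: concurrent kernels can contend for SMs and memory bandwidth with the communication kernels, which is the kind of interference the abstract refers to when it mentions reducing the impact of compute overlap on communication time.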

