This session takes an in-depth look at the world of multi-node training for complex NLU models such as BERT. Sharan describes the challenges of tuning for speed and accuracy at the scale needed to bring training times down from weeks to minutes. Drawing on real-world experience running models on as many as 1,500 GPUs with reduced-precision techniques, he explores the impact of different optimizers, strategies for reducing communication time, and improvements to per-GPU performance.
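For context, here is a minimal sketch of the kind of setup the session discusses: reduced-precision, data-parallel training across many GPUs. It uses PyTorch's DistributedDataParallel and automatic mixed precision; the tiny model, random data, and hyperparameters are placeholders for illustration, not the speaker's actual BERT training stack.

```python
# Minimal sketch: mixed-precision, multi-GPU data-parallel training loop.
# Launch with: torchrun --nproc_per_node=<gpus_per_node> --nnodes=<nodes> train.py
# The small model and random batches below are stand-ins for a real BERT workload.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")           # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])        # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Sequential(                      # placeholder for a BERT-style model
        torch.nn.Linear(1024, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 2),
    ).to("cuda")
    model = DDP(model, device_ids=[local_rank])       # all-reduces gradients across GPUs

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()              # loss scaling for fp16 stability

    for step in range(100):
        inputs = torch.randn(32, 1024, device="cuda")
        labels = torch.randint(0, 2, (32,), device="cuda")

        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():               # run the forward pass in reduced precision
            loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        scaler.scale(loss).backward()                 # DDP overlaps gradient communication with backward
        scaler.step(optimizer)
        scaler.update()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Scaling a loop like this to hundreds or thousands of GPUs is where the topics in the talk come in: optimizer choice for large batches, reducing communication time, and squeezing more out of each GPU.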