This session includes an in-depth look at the world of multinode training for complex NLU models such as BERT. Sharan describes the challenges of tuning for speed and accuracy at the scale needed to bring training times down from weeks to minutes. Drawing from real-world experience running models on as many as 1,500 GPUs with reduced precision techniques, he explores the impact of different optimizers, strategies to reduce communication time, and improvements to per-GPU performance.
- WATCH NOW
- 2024 EVENTS
- PAST EVENTS
- 2023
- 2022
- February
- RTC @Scale 2022
- March
- Systems @Scale Spring 2022
- April
- Product @Scale Spring 2022
- May
- Data @Scale Spring 2022
- June
- Systems @Scale Summer 2022
- Networking @Scale Summer 2022
- August
- Reliability @Scale Summer 2022
- September
- AI @Scale 2022
- November
- Networking @Scale Fall 2022
- Video @Scale Fall 2022
- December
- Systems @Scale Winter 2022
- 2021
- 2020
- 2019
- 2018
- 2017
- 2016
- 2015
- Blog & Video Archive
- Speaker Submissions