@Scale 2019: Multinode: Natural language understanding at scale

This session takes an in-depth look at multinode training for complex NLU models such as BERT. Sharan describes the challenges of tuning for both speed and accuracy at the scale needed to bring training times down from weeks to minutes. Drawing on real-world experience running models on as many as 1,500 GPUs with reduced-precision techniques, he explores the impact of different optimizers, strategies for reducing communication time, and improvements to per-GPU performance.
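The abstract does not specify a framework or implementation details, but as a rough illustration of two of the techniques it names, data-parallel multinode training and reduced-precision compute, here is a minimal PyTorch sketch. The model, batch shapes, and hyperparameters are placeholders, and the `torchrun`-based launch is an assumption, not the setup used in the talk.

```python
# Sketch only: data-parallel training with mixed precision.
# Assumes launch via `torchrun --nproc_per_node=<gpus> train.py`,
# which sets the LOCAL_RANK / rendezvous environment variables.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # One process per GPU; NCCL is the usual backend for GPU training.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model standing in for something like BERT.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    # DDP overlaps gradient all-reduce with the backward pass,
    # one of the standard ways to hide communication time.
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()  # loss scaling for fp16 stability

    for step in range(10):
        inputs = torch.randn(32, 1024, device=local_rank)
        targets = torch.randn(32, 1024, device=local_rank)
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():  # reduced-precision compute where safe
            loss = torch.nn.functional.mse_loss(model(inputs), targets)
        scaler.scale(loss).backward()  # gradients averaged across all ranks
        scaler.step(optimizer)
        scaler.update()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

At the 1,500-GPU scale the talk describes, this basic recipe is typically combined with further tuning (optimizer choice, gradient compression or bucketing, and kernel-level per-GPU optimizations), which is the subject of the session.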
