The Gen-AI boom of 2023 set off a surge in demand for high-performance, low-latency, lossless AI networks to support large-scale model training. In response, Meta set out to develop scalable AI networks, with a focus on Distributed Switch Fabric (DSF). DSF’s modular architecture is designed to optimize load balancing and congestion control, ensuring high performance for both intra- and inter-cluster traffic. This talk explores the challenges and innovations surrounding DSF and discusses future directions, including the creation of mega clusters by interconnecting DSF and non-DSF regions, as well as the exploration of alternative switching technologies.