Facebook’s centrally controlled wide-area network connects data centers serving a few billion users. The backbone design faced several challenges: growing demand, quality of service, provisioning, operating at scale, and slow convergence. These challenges led to a holistic, centrally controlled network solution called Express Backbone (EBB). This talk will focus on the operational challenges we faced on EBB and how we met them.
We will describe the high-level design of EBB and recent updates to it, and then dive into specific reliability challenges and how we achieved operational reliability through software sharding and replication aligned with the network design.
Furthermore, we will talk about how we solved some of the challenges specific to software-defined networks that involve traditional network activities, e.g., network maintenance and outages. To conclude, we will summarize takeaways and realizations, and describe how a multi-disciplinary team (network production engineers and software engineers) operates Facebook’s software-defined network together.
Speaker
Palak Mehta, Facebook