How do you build and operate one of the largest global networks at scale? At Meta, we believe it starts with automation. In place of traditional network scripts and manual checklists, we use Workflows across our network. A Workflow is a composition of steps that together perform a desired network operation. Teams solve complex network problems by designing, building, and executing Workflows on our platforms.
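To make "a composition of steps" concrete, here is a minimal sketch in Python. It is not Meta's framework: the `Workflow` class, the step functions, and the device name are all hypothetical, and each step is assumed to be a plain function that takes a context dictionary and returns an updated one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types: a step receives the shared context and returns a new one.
Context = Dict[str, object]
Step = Callable[[Context], Context]


@dataclass
class Workflow:
    """An ordered composition of steps (illustrative, not a real API)."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self, ctx: Context) -> Context:
        # Run the steps in order, threading the context through each one.
        for step in self.steps:
            ctx = step(ctx)
        return ctx


def drain_device(ctx: Context) -> Context:
    # Stand-in for moving traffic off the device.
    return {**ctx, "drained": True}


def push_config(ctx: Context) -> Context:
    # Refuse to touch a device that is still carrying traffic.
    if not ctx.get("drained"):
        raise RuntimeError("refusing to push config to an undrained device")
    return {**ctx, "config_version": "v2"}


def undrain_device(ctx: Context) -> Context:
    # Stand-in for returning the device to service.
    return {**ctx, "drained": False}


upgrade = Workflow("upgrade", [drain_device, push_config, undrain_device])
result = upgrade.run({"device": "rsw001"})
```

Keeping each step a small function with one input and one output is what lets steps be reused and reordered across Workflows; a real platform would add retries, logging, and rollback around the same basic shape.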
Whether it’s network deployment, operations, or lifecycle management, tens of thousands of Workflows are executed each day. Executing these Workflows across an ever-growing fleet of network devices, while also providing a robust platform on which engineers and network operators can develop and scale their Workflows, poses a unique set of challenges.
Achieving scalability in this space requires more than just load balancing. Internally, our systems scale by enforcing strict resource constraints and offering a secure multi-tenant environment, built on Linux cgroups, POSIX signals, and processes. A torrent-based package management system decouples users’ business logic from the core framework logic, so users are free to build, deploy, and scale their Workflows independently of the framework and of other users. It’s this modularity and decoupling that allows us to scale Workflows reliably from tens to hundreds to tens of thousands!
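As a rough illustration of process-level isolation with resource limits and signals, the sketch below runs a piece of user code in its own process, caps its CPU time and address space, and stops it with SIGTERM, then SIGKILL, if it overruns a wall-clock deadline. It rests on several assumptions: it uses POSIX rlimits from Python's `resource` module as a stand-in for cgroups (which need privileged setup), the function name `run_isolated` and its parameters are hypothetical, and it only runs on POSIX systems. It does not describe how Meta's platform is actually built.

```python
import os
import resource
import signal
import subprocess
import sys


def _set_soft_limit(kind: int, value: int) -> None:
    # Lower only the soft limit, never exceeding the existing hard limit.
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, hard))


def run_isolated(argv, cpu_seconds, memory_bytes, wall_timeout, grace=2.0):
    """Run argv in its own session with rlimits; return its exit code.

    Hypothetical helper: rlimits stand in for cgroups in this sketch.
    A negative return value means the child was killed by that signal.
    """
    def apply_limits():
        # Runs in the child between fork and exec.
        _set_soft_limit(resource.RLIMIT_CPU, cpu_seconds)
        _set_soft_limit(resource.RLIMIT_AS, memory_bytes)

    # start_new_session=True makes the child a process-group leader, so a
    # signal sent to the group also reaches anything the child spawns.
    proc = subprocess.Popen(argv, preexec_fn=apply_limits,
                            start_new_session=True)
    try:
        return proc.wait(timeout=wall_timeout)
    except subprocess.TimeoutExpired:
        pass

    # Ask politely first, then force termination after the grace period.
    for sig, wait_for in ((signal.SIGTERM, grace), (signal.SIGKILL, None)):
        try:
            os.killpg(proc.pid, sig)
        except ProcessLookupError:
            return proc.wait()
        try:
            return proc.wait(timeout=wait_for)
        except subprocess.TimeoutExpired:
            continue
    return proc.wait()


# Usage: a well-behaved job and one that overruns its deadline.
ok = run_isolated([sys.executable, "-c", "pass"],
                  cpu_seconds=5, memory_bytes=1 << 30, wall_timeout=10)
slow = run_isolated([sys.executable, "-c", "import time; time.sleep(30)"],
                    cpu_seconds=5, memory_bytes=1 << 30, wall_timeout=0.5)
```

Running each tenant's code in a separate process group is what makes this kind of enforcement possible: limits apply per process, and one misbehaving Workflow can be signaled and reaped without touching the framework or other users' jobs.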