Managing demand-driven infrastructure cost as a product grows is essential to a company’s expansion. Demand and efficiency management is key to enabling services to scale reliably and efficiently. COVID-19 caused unexpected growth in user traffic and a shortage in hardware supply, making demand and efficiency management more critical than ever.
Multi-tenant services face unique challenges in demand and efficiency management, both in attributing cost to tenants (especially for interweaving social graph services) and in holding all users accountable.
Meta has tens of large multi-tenant services, each used by hundreds of Meta teams. Taking the Meta web tier as an example, there are thousands of code changes and feature rollouts every day. Managing demand and efficiency at this scale and development pace is challenging. To tackle this, we built a one-stop shop that covers the end-to-end management flow, including (1) quota management/enforcement and admission control as a safety net to manage overall demand; (2) pre-production and production regression detection to prevent adding new cost; and (3) an optimization framework to reduce existing cost. In this presentation, we will introduce this toolkit, explain how it supports web demand and efficiency management, and describe how we scale the tooling and process to manage demand across Meta's large multi-tenant services.
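To make item (1) concrete, here is a minimal sketch of per-tenant quota enforcement acting as an admission-control check. It is an illustration under stated assumptions, not a description of Meta's tooling: the class name `QuotaAdmissionController`, the tenant names, the abstract "cost units", and the fixed accounting window are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class QuotaAdmissionController:
    """Hypothetical per-tenant quota check for a shared (multi-tenant) service."""

    # tenant name -> cost units the tenant may consume per accounting window
    quotas: dict[str, float]
    # tenant name -> cost units consumed so far in the current window
    usage: dict[str, float] = field(default_factory=dict)

    def admit(self, tenant: str, cost: float) -> bool:
        """Admit the request only if it keeps the tenant within its quota."""
        limit = self.quotas.get(tenant, 0.0)  # tenants without a quota get none
        used = self.usage.get(tenant, 0.0)
        if used + cost > limit:
            return False  # reject: the request would exceed the tenant's quota
        self.usage[tenant] = used + cost  # attribute the cost to the tenant
        return True

    def reset_window(self) -> None:
        """Start a new accounting window by clearing recorded usage."""
        self.usage.clear()


controller = QuotaAdmissionController(quotas={"feed": 10.0, "ads": 5.0})
first = controller.admit("feed", 6.0)    # within quota
second = controller.admit("feed", 5.0)   # would bring usage to 11.0, over 10.0
```

In this sketch, attribution and enforcement share one code path: every admitted request is charged to a named tenant, so a tenant that outgrows its quota is throttled without affecting the others.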