Measuring the Reliability of Live Video Infrastructure
Serving live video with high reliability is challenging, not only because improvements must be deployed on top of a distributed system, but also because it is hard to define measurements that capture the reliability gaps that matter to users. Facebook’s Live platform spans the ingest endpoints where creators upload their streams, the services that transcode streams and generate multiple video renditions, the services that handle delivery on the egress stack, and finally the CDN endpoints that serve broadcasts to millions of viewers. All of these pipelines have to work in concert under real-time guarantees, and even a single failure can be severe for our users. In this talk we’ll show how our thinking about key reliability metrics evolved over time, and how we derive actionable insights to make Facebook Live as reliable as possible from the user’s perspective.