Systems @Scale Fall 2021
Blaming in a Blameless World

Attributing reliability in a microservice architecture has been solved in very different ways across the industry, largely because companies catalog their services differently. Our hypothesis at Lyft was that service catalogs can become stale, whereas ownership derived from an on-call rotation is significantly more reliable for attribution. We’d like to share our journey combining Envoy, PagerDuty, and an organizational hierarchy to identify reliability concerns across Lyft through standardized SLOs and Director-level rollups.
