Reliability @Scale Summer 2022

Virtual 10:00am - 2:30pm


Reliability @Scale is a technical conference for engineers who are passionate about building and understanding highly resilient and reliable systems at massive scale. Whether the topic is novel design decisions or outages that impact billions of people, providing reliable experiences for systems at this scale presents unique technical challenges. The @Scale community focuses on bringing people together to discuss these challenges and collaborate on the development of new solutions.

Reliability @Scale will be hosted virtually. Joining us are speakers from Akamai, Fastly, Google, Meta, and Roblox. The event will be held on August 31, 2022, with talks themed around large-scale outages, incident response and learnings, and measuring reliability at scale.


Our Pledge
In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to make participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

Our Standards
Examples of behavior that contributes to creating a positive environment include:

  • Using welcoming and inclusive language
  • Being respectful of differing viewpoints and experiences
  • Gracefully accepting constructive criticism
  • Focusing on what is best for the community
  • Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

  • The use of sexualized language or imagery and unwelcome sexual attention or advances
  • Trolling, insulting/derogatory comments, and personal or political attacks
  • Public or private harassment
  • Publishing others’ private information, such as a physical or electronic address, without explicit permission
  • Other conduct which could reasonably be considered inappropriate in a professional setting

Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.

This Code of Conduct applies within all project spaces, and it also applies when an individual is representing the project or its community in public spaces. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.

This Code of Conduct also applies outside the project spaces when there is a reasonable belief that an individual's behavior may have a negative impact on the project or its community.

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project’s leadership.

This Code of Conduct is adapted from the Contributor Covenant, version 1.4.

For answers to common questions about this code of conduct, see the Contributor Covenant FAQ.

10:00am - 11:30am

Production Incident Presentations from Meta, Roblox, Google, Akamai, Fastly, and more!

11:30am - 12:00pm

Live Panel

All Speakers + Moderated by Anca Agape

12:00pm - 12:20pm


12:20pm - 12:40pm

Improving Reliability @ Meta: By Analyzing Historical Events That Led to SLO Violations

Learn about the culture of tracking Service Level Indicators (SLIs) and Service Level Objectives (SLOs) at Instagram specifically and at Meta in general, the tools that we use, and how teams' SLI/SLO workflows can be improved by annotating SLO violations and analyzing them later. In the talk we will briefly cover the history of SLI/SLO tracking at Meta, then discuss how the Instagram team used data annotations to tackle some of the reliability issues they had and how we're expanding this approach to the whole company.

12:40pm - 1:00pm

Service Degradation at Scale: Creating Instant Capacity

We will discuss the factors that made us realize service degradation is necessary for our infrastructure and the challenges we faced while implementing it at scale. We will also cover how we are changing our fault tolerance strategy to use service degradation instead of provisioning extra buffer.

1:00pm - 1:20pm

Shrinking the Impact of Production Incidents

1:20pm - 1:40pm

Reliably Changing Configuration @ Scale

1:40pm - 2:00pm

Meta's SEV Culture: How Today's SEVs Create Tomorrow's Reliability

Would you believe us if we said the more SEVs we have, the more reliable we are? In this talk we'll cover the reasons why we love SEVs at Meta and how our culture around SEVs has allowed us to build reliable services at scale. We'll start by exploring research from other industries on how incident culture shapes their reliability. We'll then share how we've applied these lessons to our own culture. Along the way we'll give a peek at our SEV tool, offer some insight into our SEV review process, and describe how we encourage a "culture of SEVs" from the very first day an engineer arrives at Meta.

2:00pm - 2:30pm

Live Q&A Session

All Speakers + Moderated by Christian Monzon
