RTC @Scale 2022

RTC @Scale Live Panel – RTC in the Metaverse with Sriram Srinivasan, Mike Arcuri, Paul Boustead, and Cullen Jennings.

Summer Systems @Scale 2022

Cache Made Consistent – Cache invalidation might no longer be a hard thing in Computer Science

Cache invalidation is considered one of the hardest things in Computer Science. We, at Meta, operate some of the world’s largest cache deployments (e.g. Memcache and TAO), serving more than one quadrillion queries a day. We have ...
Summer Systems @Scale 2022

Leveraging Data in Motion in a Cloud-first World

Apache Kafka has emerged as the de facto standard event streaming platform in enterprise architectures. Many business applications are moving away from data-at-rest to an event-driven architecture so that they can leverage the ...
Summer Systems @Scale 2022

Introducing Zelos – Zookeeper API leveraging Delos

In this presentation we will introduce Zelos. Zelos provides the exact same semantics as ZooKeeper but is built using Delos. ZooKeeper forms the foundation of Meta’s infrastructure stack and we have been using it for over a decade. ...
Summer Systems @Scale 2022

Hosting Open Source Relational Databases at Scale on Microsoft Azure

Hosting managed relational database services in the cloud with the availability and reliability guarantees demanded by mission-critical workloads, and doing it at scale, presents a set of interesting challenges. This talk will walk ...
Summer Systems @Scale 2022

How Meta Keeps its Large-scale Infrastructure Hardware Up and Running

Internet services like Facebook, Instagram, and WhatsApp rely on large-scale infrastructure to support the various compute, storage, and AI workloads. With the support of data and ML techniques, we can scale our infrastructure ...
Summer Systems @Scale 2022

DADI @Scale: Deploying Containers at Scale in Alibaba

Alibaba Cloud offers a comprehensive suite of elastic computing services that are based on container technology. Alibaba Group is one of the key customers of Alibaba Cloud and all of the major applications across its large and diverse ...
Summer Systems @Scale 2022

Scaling End to End Reliability Tracking Across Large Scale, Multiplexed Products and Services

This talk introduces a new user experience-focused reliability measurement that exposes end-to-end reliability guarantees across the vertical service stack used by Meta’s family of Apps. The talk discusses the difference between the ...
Summer Systems @Scale 2022

Don’t Ship the Org Chart: Rebuilding Istio for User Maintainability

While the cry of “breaking apart the monolith” can be heard throughout the industry, the Istio service mesh took a different tack, and consolidated its control plane microservices into one binary. How did we get here? In ...
Summer Systems @Scale 2022

Configuration Safety at Scale with Ads

The Configerator repository provides Meta developers with a way to make changes easily and quickly to production services. By default, it pushes changes to all services at Meta in a matter of seconds, and doesn’t have the traditional ...
Summer Systems @Scale 2022

Getting from Schemaless Ingest to Fast SQL at Rockset

Rockset provides low-latency SQL access to schemaless data that is ingested in real-time. Immediate access to dynamically structured data is very powerful, enabling rapid development and iteration for products built on top, but it ...
Summer Systems @Scale 2022

The Ent Framework: Meta’s Object-Relational Mapping

When you think about Meta’s family of apps, what comes to mind? Maybe the over 6 thousand photos and videos created per second on Instagram, the 5 trillion photos on Facebook, or the 60 million group posts loaded each second. It’s ...
Summer Systems @Scale 2022

Infra Cloud Service Platform (ICSP)

Building and operating a service is challenging and complex. At scale, service owners need to consider a number of responsibilities including how they develop, deploy, scale and monitor their service in production. Each of these ...
Summer Systems @Scale 2022

Lessons Learned from Scaling Infrastructure as Code

You adopted an infrastructure as code tool like Terraform. What started as one person writing some configuration and deploying new infrastructure scales to everyone in the company writing their own infrastructure configuration and ...
Summer Systems @Scale 2022

Global Capacity Management at Meta

Meta currently operates more than 15 data center regions around the world. This rapidly expanding global datacenter footprint poses new challenges for service owners and for our infrastructure management systems. In this talk, we will ...
Products @Scale Spring 2022

Keynote | Ime Archibong

Ime Archibong, head of New Product Experimentation (NPE) at Meta and a 12-year Meta veteran, will talk about 0-1 innovation at scale. He’ll discuss the value of experimentation as an approach, and demystify how real breakthroughs happen.
Products @Scale Spring 2022

Building a Cross-platform Runtime for AR Experiences | Nikita Lutsenko and Paul Wu

There are many tools that Creators can use to build novel AR experiences. However, not many of these tools can deliver a wide-ranging set of capabilities and creative assets to billions of devices with both quality and speed. In this ...
Products @Scale Spring 2022

Challenges and opportunities for building crowdsourced mapping services for autonomous driving at scale | Ruchi Bhargava

NVIDIA Map aggregates data from millions of NVIDIA DRIVE Hyperion consumer and survey data-collection vehicles for safe, reliable, and up-to-date global high-def map coverage. The platform supports automated driving functionality from ...
Products @Scale Spring 2022

Scaling Messenger Product Development | Joshua Evenson

As a mobile app grows in users, features, and contributing engineers, there are often tradeoffs between the performance of the app and the velocity of feature growth. Messenger’s users have high performance expectations, so ...
Products @Scale Spring 2022

ML Algorithms for Trust and Safety @ YouTube | Emre Sargin

In this talk, I’ll provide an overview of how we use ML algorithms to detect policy-violating content on YouTube across all entity types (videos, comments, livestreams, engagements, etc.) and keep our community safe. ML ...
Products @Scale Spring 2022

Building private products at WhatsApp | Aleksander Bello

An overview of how WhatsApp thinks of privacy in the messaging world. We’ll go through some of our general principles, concrete product use cases, and challenges that come with privacy at scale.
Products @Scale Spring 2022

Scaling ML workflows for real-time moderation challenges at Twitch | Lukas Tencer, Lena Evans, Shiming Ren

Trust & Safety at Twitch is uniquely challenging, as the vast majority of content and chat interactions unfold in real time, across a wide variety of communities with different needs, cultures, and audiences. Mitigating and ...
Products @Scale Spring 2022

Live Panel: The Good, the Bad and the Glory of Building Products @Scale

Participants: Sara Wong (Meta), Xiao Li (Meta), Boulos Harb (Level), Daniel Jacobson (Google)
Products @Scale Spring 2022

Keynote – Building Products at Scale | Vijaye Raji

Building successful products is hard. Building successful products at scale? Ridiculously hard! It takes strong vision, deep dedication, consistent execution, with a healthy sprinkle of unorthodox methods. This talk shares a few ...
Products @Scale Spring 2022

Building visually stunning products at Scale at Instagram | Steph Rhee and Laycee Berkas

We’ll talk about two 0-1 products in the creator space: Subscriptions and Music Releases on IG. We’ll walk through how we built the early stages of these as visually stunning products, as well as the unique set of challenges our teams ...
Products @Scale Spring 2022

The Evolution of Facebook’s Mobile App Architecture | Dustin Shahidehpour

In 2007, Facebook released their first iOS App. It was written in HTML, and it was supported by a single engineer. Since then, the Facebook iOS App has grown into a native ‘platform’ which supports more than 100 products, and hundreds ...
Products @Scale Spring 2022

Mobile Development @ Scale | Chad Landis

At Capital One, building beautiful, rich, and performant mobile applications for iOS and Android is essential to providing a best-in-class experience for our customers and delivering on our mission to change banking for good. However, ...
Products @Scale Spring 2022

Live Panel: Cross Platform Product Development @Scale

Participants: Jason Grandelli (Meta), Dan Schafer (Meta), Denise Noyes (Meta), Kevin Galligan (TouchLab), Vishnu Nath (Microsoft)
Networking @Scale Summer 2022

The Future With QUIC | Jana Iyengar

We’ve all heard much about QUIC in the past few years, and a lot has been made of its performance benefits for HTTP/3. For some of us however, HTTP/3 was always just the beginning, just the vehicle for us to get QUIC out into the ...
Networking @Scale Summer 2022

Quick Cache DSR | Matt Joras and Yair Gottdenker

In a typical CDN architecture the caching tier is fronted by a load-balancing tier; response content flows from the cache to the requester through the load-balancer. With this architecture extra I/O, CPU cycles and intra-cluster ...
Networking @Scale Summer 2022

Improving Transfer Times in the Backbone Network Using QUIC Jump Start | Joseph Beshay

Transfers in high-BDP links incur a startup delay for congestion control to probe the bandwidth of the underlying link. The impact of this delay is inversely proportional to the size of the transfer since small transfers may repeatedly ...
Networking @Scale Summer 2022

LIVE Q&A | Moderated by Bharat Parekh

LIVE Q&A featuring Jana Iyengar, Matt Joras, Yair Gottdenker & Joseph Beshay
Networking @Scale Summer 2022

Layer Four and Three Quarters: Fantastic Quirks and Where to Find Them | Lucas Pardue

Nestled between transport protocols (TCP, UDP, QUIC) and application protocols (HTTP, etc.) is a layer few are familiar with. Layer 4¾ sits hiding in plain sight, often only being glimpsed during curious events that raise its ...
Networking @Scale Summer 2022

The Challenges of 0-RTT in IETF QUIC | Ian Swett

A key feature of HTTP/3 over QUIC is the ability to send a request in the first flight with the ClientHello. 0-RTT in IETF QUIC is notably more complex than gQUIC, with multiple packet number spaces and a limit on the amplification ...
Networking @Scale Summer 2022

Tackling DC Congestion and Bursts | Balasubramanian Madhavan and Abhishek Dhamija

A talk about two specific DC transport tuning initiatives: (a) handling sustained congestion in the network and (b) tackling bursts in the network. Covers the motivation, implementation overview, wins, and lessons learnt for both initiatives.
Networking @Scale Summer 2022

NetEdit: Fine-grained Network Tuning at Scale | Prashanth Kannan and Prankur Gupta

We will share the design, implementation, and production experience of a BPF-based platform used to tune the network transport across millions of servers at Meta.
Networking @Scale Summer 2022

LIVE Q&A | Moderated by Neil Spring

LIVE Q&A featuring Prashanth Kannan, Balasubramanian Madhavan, Abhishek Dhamija, Prankur Gupta & Kumar Saurabh Arora
Networking @Scale Summer 2022

NATless IPv6/IPv4 Address Translation | Keerti Lakshminarayan and Alok Tiagi

We will demonstrate a performant and novel approach to performing NAT, that uses a unique transition mechanism utilizing a new flag introduced to the seccomp() system call, to intercept egress connect calls to opportunistically use a ...
Networking @Scale Summer 2022

Network Entitlement: From Hose-based Approval to Host-based Admission | Guanqing Yan and Manikandan Somasundaram

The Wide Area Network (WAN) connects many datacenter (DC) regions and hundreds of Points of Presence (POPs) of Meta. The WAN resource is shared by several high network demand services at Meta. The network must be built for peak demand ...
Networking @Scale Summer 2022

LIVE Q&A | Moderated by Ying Zhang

LIVE Q&A featuring Keerti Lakshminarayan, Alok Tiagi, Guanqing Yan, Manikandan Somasundaram & Jitu Padhye
Data @Scale Spring 2022

Automated Model Update & Evaluation

This talk breaks down stage-by-stage requirements and challenges for online prediction and fully automated, on-demand continual learning. We’ll also discuss key design decisions a company might face when building or adopting a machine ...
Data @Scale Spring 2022

Real-Time Data Processing for ML Feature Engineering

At Meta, we have developed multiple real-time data processing systems such as Puma, Stylus, and Turbine (SIGMOD ’16 and ICDE ’20). As Meta grows, the need for real-time data has grown well beyond traditional data ...
Data @Scale Spring 2022

Scalable Data Transportation & Ingestion with MemQ

Machine learning is at the heart of Pinterest and is powered by large-scale ML training log collection. To solve the problem of cost-efficient data ingestion and transportation at Pinterest, we developed MemQ, a PubSub system that ...
Data @Scale Spring 2022

ML Monitoring & Observability @Meta Scale

ML generates significant value for Meta’s infrastructure, tools, products, and users. It drives a varied set of insights, from end-user products such as recommendations and feeds on Facebook and Instagram, to infrastructure insights ...
Data @Scale Spring 2022

Enabling Machine Learning through Real-Time Data Processing using Rockset

Data infrastructure has evolved in the last 15 years from Hadoop’s batch system, to streaming systems like Spark and Kafka, and now to real-time systems like Rockset and ClickHouse. Automatic decision making based on massive data ...
Data @Scale Spring 2022

TorchData and TorchArrow: Data Preprocessing for ML at Production Scale

The problem of deep learning and building large scale systems for production is not just one of model training, but data preprocessing as well. At production scale, just the data loading and processing part of the system can cause ...
Data @Scale Spring 2022

Making Data Quality an integral part of developing Machine Learning and Data Products

“Machine Learning models are only as good as the data that was used to train them. Datasets are often plagued with problems such as quality, discoverability, and undesirable social biases. As data and modeling tools are becoming ...
Data @Scale Spring 2022

Minimize Risks and Accelerate MLOps with Model Performance Monitoring and Explainability

We’re truly living under the rule of algorithms: our day-to-day activities, from news consumption and job search to mortgage financing, are increasingly being decided by algorithms. Most of these algorithms are AI-based and are ...
Systems @Scale Spring 2022

Q&A | Moderated by Francois Richard. Featuring Yuri Grinshteyn, Jie Huang, Christopher Bunn, Osama Abuelsorour, Amr Mahdi, Jason Flinn & Arushi Aggarwal

Systems @Scale Spring 2022

Owl | Arushi Aggarwal & Jason Flinn

We will describe Owl, a new system for high-fanout distribution of large data objects to hosts in Meta’s private cloud. Owl distributes over 700 petabytes of data per day to millions of client processes. It has improved download ...
Systems @Scale Spring 2022

Southpaw: Token-based service load balancing, scaling and QoS system | Osama Abuelsorour & Amr Mahdi

Southpaw is a load balancing, scaling, and QoS management system for compute-heavy inferencing services. It takes the approach of abstracting service capabilities into tokens and worklanes, where clients are granted tokens that give ...
Systems @Scale Spring 2022

Vacuum Testing for Resiliency: Verifying Disaster Recovery in Complex | Jie Huang & Christopher Bunn

Engineers at Meta run thousands of services across millions of machines, and those services all have similar needs that can’t be managed by hand: configuration, deployment, monitoring, routing, orchestration, security. To solve the ...
Systems @Scale Spring 2022

Shrinking the Impact of Production Incidents | Yuri Grinshteyn

Shrinking Production Incidents details an organized approach for reducing the overall impact of production outages. Attendees can expect to learn how to prioritize reliability-related engineering tasks based on incident postmortem ...
Systems @Scale Spring 2022

Highly Available and Strongly Consistent Storage Service Using Chain Replication | Kumar Mrinal & Binbin Lu

In this talk, we present Dumbo – a simple, reliable, highly available, low dependency object storage system ...
Systems @Scale Spring 2022

ACS: De-Identified Authentication at Scale | Shiv Kushwah & Haozhi Xiong

Privacy is core to Meta engineering culture, and one of our fundamental principles is data minimization. We strive to collect and create the minimum amount of data required to provide service. One critical space we’ve identified across ...
Systems @Scale Spring 2022

The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward | Ivan Santa Maria Filho

Cosmos is the exabyte-scale big data platform at Microsoft, and SCOPE is its main analytics engine. SCOPE and Cosmos support ETL pipelines, decision support systems, and machine learning pipelines. Applications range from simple ...
Systems @Scale Spring 2022

Scaling Data Ingestion for ML Training at Meta | Aarti Basant

AI models drive several Meta products such as News Feed, Ads, IG Reels, and language translation, to name a few. Our ranking models consume massive datasets to continuously improve user experience on our platform. In this talk, we discuss our ...
Systems @Scale Spring 2022

Cassandra@Scale: A Deep Dive into Apache Cassandra 4.0 | Dinesh Joshi

After almost two years in the making, Apache Cassandra 4.0 is here. With a focus on performance and stability, it is full of interesting features. This talk takes you through a tour of the new features and performance improvements. From ...
Systems @Scale Spring 2022

Manifold: Storage Platform Consolidation | Jacob Lacouture

Since 2016, we’ve built, deployed, and scaled a new BLOB storage platform at Meta, called Manifold. Manifold builds on existing BLOB storage infrastructure, but provides a richer, higher-level, general purpose API, and thereby enables ...
Systems @Scale Spring 2022

Transparent Memory Offloading @Meta | Niket Agarwal, Dan Schatzberg, Johannes Weiner

The unrelenting growth of the memory needs of emerging data center applications, along with ever-increasing cost and volatility of DRAM prices, has led to DRAM being a major infrastructure expense. Alternative technologies, such as ...
RTC @Scale 2022

Real-time Communication for Today and Future Experiences – Maher Saba

RTC @Scale 2022

Holographic Video Calling – Nitin Garg

During COVID, the importance of video calling grew as people were stuck and separated from their family and friends. But the 2D experience falls short of making you feel present in the same ...
RTC @Scale 2022

Spatial Communications at Scale in Virtual Environments – Paul Boustead

Talking with a group of friends face-to-face can be very engaging, with fast-paced turn-taking and overlapping conversations. You can even have such a ...
RTC @Scale 2022

RTC3 – Justin Uberti

The real-time communications industry has evolved rapidly since the release of Skype in 2003, and saw unprecedented growth during the COVID-19 pandemic. This talk will look at the trends of the last 20 years ...
RTC @Scale 2022

RTC @Scale, Future RTC Experiences – Live Q&A

RTC @Scale, Future RTC Experiences Session – Live Q&A with Nitin Garg, Paul Boustead, Justin Uberti, and Rahul Gowda
RTC @Scale 2022

Developing Machine Learning Based Speech Enhancement Models for Teams and Skype – Ross Cutler

Microsoft Teams and Skype are used daily by hundreds of millions of users, and their usage has increased significantly since the ...
RTC @Scale 2022

Can AI Disrupt Speech Compression? – Jan Skoglund

AI and deep learning have radically advanced many speech and audio processing applications. For example, we have all experienced improvements in speech recognition and synthesis in ...
RTC @Scale 2022

RTC @Scale, Audio ML – Live Q&A

RTC @Scale, Audio ML Session – Live Q&A with Ross Cutler and Jan Skoglund
RTC @Scale 2022

RTC @Scale Live Panel – RTC in the Metaverse

RTC @Scale Live Panel with Sriram Srinivasan, Mike Arcuri, Paul Boustead, and Cullen Jennings.
RTC @Scale 2022

AV1 Encoder for RTC – Marco Paniconi

In this presentation we discuss the various features and techniques that make the libaom AV1 encoder suitable for RTC applications: from encoding tool selection to reducing complexity of the ...
RTC @Scale 2022

AV1 for RTC: Current and Future – Zoe Liu

In this talk, we will mainly focus on the state-of-the-art AV1 software encoding capability for its deployment in RTC use cases, taking our Aurora1 AV1 as an instantiation. RTC in essence ...
RTC @Scale 2022

RTC @Scale, Video – Live Q&A

RTC @Scale, Video Session – Live Q&A with Marco Paniconi and Zoe Liu.
RTC @Scale 2022

Making Meta RTC Audio More Resilient – Andy Yang

The users of Meta RTC products experience a very diverse set of network conditions, some of which may be far from perfect. In this presentation, we are going to cover the following ...
RTC @Scale 2022

Private Calling at WhatsApp – Xi Deng

WhatsApp’s mission is to connect the world privately by designing a product that’s simple and private. Privacy and security are in our DNA. In this presentation, we are going to talk ...
RTC @Scale 2022

Group Call End-to-End Encryption and the Challenges of Encrypting Large Calls – Abo-Talib Mahfoodh

Meta helps billions of users connect daily by providing real-time communication services. Group call is one of these services where ...
RTC @Scale 2022

RTC @Scale, Resilience and Encryption – Live Q&A

RTC @Scale, Resilience and Encryption Session – Live Q&A with Andy Yang, Xi Deng, and Abo-Talib Mahfoodh
Systems @Scale Winter 2021

Week 1 Q&A | Moderated by Ahmad Mamdouh Abdou. Featuring Dávid Bartók, Filip Klepo, Jared Casper, Antonio Davoli & Leandro Silva

Systems @Scale Winter 2021

Software and Hardware Remediations At Meta | Antonio Davoli & Leandro Silva

Efficient software and hardware failure remediations are the foundations for sustaining high fleet availability at large-scale environments such as Meta. In this talk, we will describe the general architecture that we use to maximize ...
Systems @Scale Winter 2021

SLICK: Driving SLO Culture At Meta | Dávid Bartók & Filip Klepo

SLIs (Service Level Indicators) and SLOs (Service Level Objectives) are industry-standard concepts to measure the long-term reliability of systems. In this presentation, we are going to talk about SLICK, the central SLO tracking ...
Systems @Scale Winter 2021

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | Jared Casper

In this talk we present how we trained a 530B parameter language model on a DGX SuperPOD with over 3,000 A100 GPUs and a high-speed InfiniBand interconnect, and how we can scale to even larger models. We explore three types of ...
Systems @Scale Winter 2021

LogDevice At Scale | Miroslav Crnic & Nick Sukhanov

Meta uses a strongly consistent distributed log storage system to broadcast updates in graphs, deliver signals to ML training pipelines, and collect data for analytics. All of these cases require the underlying log system to be highly ...
Systems @Scale Winter 2021

Scheduled Deletions At Scale | Sneha Padgalwar

At Meta, a large part of our data is ephemeral in nature, such as Instagram or Meta Stories, which need to be deleted after a specific time regardless of the action taken by the user. This is sometimes referred to as Time to Live (TTL). ...
Video @Scale 2021

The Importance of Audio Today

Live Panel – “The Importance of Audio Today”. Recently, Audio has been front and center with the emergence of new audio-only products and experiences and many new audio-focused investments to enhance video viewing. ...
Video @Scale 2021

Highly-Efficient SVT-AV1-based Solutions for VOD Applications

This presentation will highlight the latest improvements of the VOD-targeted high-latency Constant Rate Factor (CRF) and Variable Bit Rate (VBR) modes of the SVT-AV1 encoder. It will first present the latest SVT-AV1 cycles-quality ...
Video @Scale 2021

Scaling Your Encoding Backend Using Eve

In this talk, we will discuss the current state in terms of bitrate/quality and complexity of Two Orioles’ Eve video encoder for the VP9 & AV1 video codecs. VP9 provides meaningful quality improvements over H.264 with a mature ...
Video @Scale 2021

Video Quality Assessment of User Generated Contents

The video quality of User Generated Content (UGC) is extremely difficult to wrangle due to its high diversity of content and quality. UGC brings new challenges to how we have traditionally measured and assessed video quality. Most ...
