TOPIC: Data, Systems and Networking

Networking @Scale Summer 2022

JUNE 01, 2022 @ 10:00 AM PDT - 12:30 PM PDT
JUNE 02, 2022 @ 10:00 AM PDT - 12:30 PM PDT
Designed for engineers who build and manage large-scale networks. Networking solutions are critical for building applications and services that serve billions of people around the world. Building and operating such large-scale networks often presents complex engineering challenges to solve.


Top-level summary:
Building and operating large-scale networks that host applications serving billions of people worldwide often presents complex engineering challenges. At the recently held Networking@Scale 2022 virtual conference, hosted by Meta on June 1 and 2, 2022, engineers from Cloudflare, Fastly, Google, Microsoft Azure, Netflix, and Meta presented talks and engaged in live panel discussions with the audience around these challenges.

The conference was held virtually and saw a great turnout of attendees from industry and academia alike. This summer edition of Networking@Scale was themed around Transport Innovation: more specifically, how to move data across the network efficiently and quickly, addressing congestion, performance, reliability, and extensibility through innovations in the transport layer. The conference spanned two days and focused on transport protocols such as QUIC, TCP, and RDMA.

Day 1 focused on the value proposition of QUIC in the Internet architecture, innovations in its use, and use-case studies demonstrating the high performance and lower latency achieved with QUIC at the CDN, edge, and backbone layers.

Day 2 pivoted to networking challenges in datacenters (DCs) and WANs, and to how innovations in TCP and other protocols (e.g., RoCE) help tackle them.

The Q&A sessions saw great engagement between the audience and presenters, who discussed topics such as QUIC's agility and QUIC/HTTP/3 adoption on the web, in both browsers and servers. On the TCP side, discussions covered BPF-based tuning vs. in-kernel changes, deploying changes at scale, and RoCE security and congestion management.

Recordings of the presentations are below. If you are interested in future events, please visit the @Scale website, follow the @Scale Facebook page, or join the Networking@Scale attendees Facebook group.


Event times below are displayed in PT.

June 1

10:00 AM - 10:10 AM
Opening Remarks
Speaker Omar Baldonado, Meta
10:10 AM - 10:30 AM
The Future With QUIC

We've all heard much about QUIC in the past few years, and a lot has been made of its performance benefits for HTTP/3. For some of us, however, HTTP/3 was always just the beginning, just the vehicle for us to get QUIC out into the world. This talk will go beyond these immediate benefits of QUIC and present my view on our somewhat anticipated sleight of hand. The talk will discuss QUIC's long-term value proposition for the Internet's architecture, including some recent projects and a broad sketch of where it can go.

Speaker Jana Iyengar, Fastly
10:30 AM - 10:40 AM
Quick Cache DSR

In a typical CDN architecture, the caching tier is fronted by a load-balancing tier; response content flows from the cache to the requester through the load balancer. With this architecture, extra I/O, CPU cycles, and intra-cluster network bandwidth are spent streaming the content through multiple hops. We present a solution that uses QUIC's properties to implement a form of Direct Server Return (DSR) from the caching layer directly to the client. This form of DSR obviates most intra-cluster communication when serving cached content. In this talk, we go over the technical challenges of implementing QUIC cache DSR, its security properties, the expected performance improvements, and future applications.

Speaker Matt Joras, Meta
Speaker Yair Gottdenker, Meta
10:40 AM - 11:00 AM
Improving Transfer Times in the Backbone Network Using QUIC Jump Start

Transfers over high-BDP (bandwidth-delay product) links incur a startup delay while congestion control probes the bandwidth of the underlying link. The relative impact of this delay grows as transfers get smaller, since small transfers may spend their entire transfer time probing for the available bandwidth and never reach or use it. While this probing is necessary for links with rapidly changing capacity, it can be avoided on more predictable links such as backbone links. Existing TCP approaches are either limited to specific pairs of endpoints or require intermediate proxies. In this presentation, we share the approach we've developed for QUIC deployments in Meta's backbone network. We use a modified congestion controller that tracks the average congestion control state of connections using each backbone path. This state is then used to “jumpstart” new connections across the same path, significantly reducing the startup delay. Combined with QUIC 0-RTT, this offers significant savings over existing TCP-based approaches for transfers close to the path BDP in size.

Speaker Joseph Beshay, Meta
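To make the startup cost concrete, here is a minimal Python sketch under idealized assumptions: slow start doubles the congestion window every RTT with no loss, and a fresh connection starts from 10 packets of 1200 bytes. It estimates how many round trips a new connection needs to reach the BDP of a hypothetical 10 Gbps, 50 ms path, then models the per-path state idea from the abstract with a hypothetical `PathStateCache` that keeps an exponentially weighted moving average (EWMA) of congestion windows per backbone path. The EWMA, its weight, the path ID, and the cap are illustrative choices, not details of Meta's implementation.

```python
import math
from dataclasses import dataclass, field


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product of a path, in bytes."""
    return bandwidth_bps * rtt_s / 8


def slow_start_rounds(initial_cwnd_bytes: float, target_bytes: float) -> int:
    """RTTs an idealized slow start (cwnd doubles each RTT, no loss)
    needs to grow from initial_cwnd_bytes to at least target_bytes."""
    if initial_cwnd_bytes >= target_bytes:
        return 0
    return math.ceil(math.log2(target_bytes / initial_cwnd_bytes))


@dataclass
class PathStateCache:
    """Hypothetical per-path store of averaged congestion windows."""
    alpha: float = 0.125  # EWMA weight for new samples (assumed value)
    averages: dict = field(default_factory=dict)

    def record(self, path_id: str, cwnd_bytes: float) -> None:
        """Fold a finished connection's congestion window into the path average."""
        prev = self.averages.get(path_id)
        if prev is None:
            self.averages[path_id] = cwnd_bytes
        else:
            self.averages[path_id] = (1 - self.alpha) * prev + self.alpha * cwnd_bytes

    def initial_cwnd(self, path_id: str, default_bytes: float, cap_bytes: float) -> float:
        """Jump-start a new connection from the path average, bounded below
        by the default initial window and above by cap_bytes."""
        avg = self.averages.get(path_id)
        if avg is None:
            return default_bytes  # no history for this path: normal start
        return min(max(avg, default_bytes), cap_bytes)


if __name__ == "__main__":
    bdp = bdp_bytes(10e9, 0.050)           # hypothetical 10 Gbps, 50 ms path
    default_iw = 10 * 1200                 # 10 packets of 1200 bytes
    cache = PathStateCache()
    cache.record("dc-a->dc-b", 1_000_000)  # hypothetical path ID and sample
    jump_iw = cache.initial_cwnd("dc-a->dc-b", default_iw, bdp)
    print(slow_start_rounds(default_iw, bdp), slow_start_rounds(jump_iw, bdp))
```

Under these assumptions, a connection starting from 12 KB needs 13 RTTs (about 650 ms at a 50 ms RTT) to reach the 62.5 MB BDP, whereas one seeded with a 1 MB cached window needs 6. Capping the seeded window at the BDP keeps a stale or inflated average from overshooting the path.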
11:00 AM - 11:20 AM
Live Q&A

LIVE Q&A featuring Jana Iyengar, Matt Joras, Yair Gottdenker & Joseph Beshay

Speaker Bharat Parekh, Meta
11:20 AM - 11:50 AM
Layer Four and Three Quarters: Fantastic Quirks and Where to Find Them

Nestled between transport protocols (TCP, UDP, QUIC) and application protocols (HTTP, etc.) is a layer few are familiar with. Layer 4¾ hides in plain sight, often glimpsed only during curious events that raise its prominence, such as edge cases that emerge at deployment scale or under diverse usage. In this talk, we'll look at the Cloudflare Protocols team's view of the Internet edge, explore some of the fantastic cases we've seen, and consider what they might mean for future developments in Layer 4, Layer 7, and the eponymous in-between.

Speaker Lucas Pardue, Cloudflare
11:50 AM - 12:10 PM
The Challenges of 0-RTT in IETF QUIC

A key feature of HTTP/3 over QUIC is the ability to send a request in the first flight, along with the ClientHello. 0-RTT in IETF QUIC is notably more complex than in gQUIC, with multiple packet number spaces and a limit on the amplification factor. This talk walks through some issues we hit, and the tooling we used to identify and debug them, before 0-RTT became a performance win for applications.

Speaker Ian Swett, Google
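The amplification limit mentioned in this abstract comes from RFC 9000: until a server has validated the client's address, it must not send more than three times the bytes it has received from that address, and clients pad datagrams carrying Initial packets to at least 1200 bytes. A minimal Python sketch of that accounting follows; `ServerPathState` and its methods are hypothetical names, and real QUIC stacks track this alongside much more connection state.

```python
from dataclasses import dataclass

AMPLIFICATION_FACTOR = 3      # RFC 9000, Section 8.1
MIN_INITIAL_DATAGRAM = 1200   # RFC 9000, Section 14.1 (client Initial padding)


@dataclass
class ServerPathState:
    """Hypothetical per-address bookkeeping for the anti-amplification limit."""
    bytes_received: int = 0
    bytes_sent: int = 0
    address_validated: bool = False

    def on_datagram_received(self, size: int) -> None:
        self.bytes_received += size

    def send_allowance(self) -> float:
        """Bytes the server may still send to this address."""
        if self.address_validated:
            return float("inf")  # limit no longer applies once validated
        return AMPLIFICATION_FACTOR * self.bytes_received - self.bytes_sent

    def try_send(self, size: int) -> bool:
        """Record a datagram as sent only if it fits within the allowance."""
        if size > self.send_allowance():
            return False
        self.bytes_sent += size
        return True


if __name__ == "__main__":
    server = ServerPathState()
    server.on_datagram_received(MIN_INITIAL_DATAGRAM)  # one padded client Initial
    sent = 0
    while server.try_send(1200):  # hypothetical 1200-byte server datagrams
        sent += 1
    print(sent, server.bytes_sent)  # 3 3600
```

With only one 1200-byte client Initial received, the server can send 3600 bytes, so a larger first flight (for example, a long certificate chain) stalls until more client bytes arrive or the address is validated.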
12:10 PM - 12:30 PM
Live Q&A

LIVE Q&A featuring Lucas Pardue & Ian Swett

Speaker Luca Niccolini, Meta

June 2

10:00 AM - 10:20 AM
Tackling DC Congestion and Bursts

This talk presents two specific DC transport tuning initiatives: (a) handling sustained congestion in the network and (b) tackling bursts in the network. For both, it covers the motivation, an implementation overview, the wins, and the lessons learned.

Speaker Balasubramanian Madhavan, Meta
Speaker Abhishek Dhamija, Meta
10:20 AM - 10:40 AM
NetEdit: Fine-grained Network Tuning at Scale

We will share the design, implementation, and production experience of a BPF-based platform used to tune the network transport across millions of servers at Meta.

Speaker Prashanth Kannan, Meta
Speaker Prankur Gupta, Meta
10:40 AM - 11:00 AM
Live Q&A

LIVE Q&A featuring Prashanth Kannan, Balasubramanian Madhavan, Abhishek Dhamija, Prankur Gupta & Kumar Saurabh Arora

Speaker Neil Spring, Meta
11:00 AM - 11:20 AM
NATless IPv6/IPv4 Address Translation

We will demonstrate a novel, performant approach to NAT that uses a unique transition mechanism built on a new flag introduced to the seccomp() system call. The mechanism intercepts egress connect() calls and opportunistically uses a transition IPv4 address when possible, sparing applications the pain of dealing with an unreachable end host while they still live in an IPv6-only environment.

Speaker Keerti Lakshminarayan, Netflix
Speaker Alok Tiagi, Netflix
11:20 AM - 11:35 AM
Network Entitlement: From Hose-based Approval to Host-based Admission

Meta's Wide Area Network (WAN) connects many datacenter (DC) regions and hundreds of Points of Presence (POPs). The WAN is shared by several services with high network demand. The network must be built for peak demand and must also account for failure scenarios to reduce the impact on Meta products. However, building a resilient network that is over-provisioned for every service's peak demand at our current growth rates is practically infeasible due to fiber sourcing, deployment constraints, and cost. This talk presents Meta's production traffic classification and WAN Entitlement solution, which our services currently use to share the network safely and efficiently. The Network Entitlement framework aims to provide a simple, stable, and operations-friendly abstraction of the network for sharing the backbone. The framework has two key parts: (1) a hose-based entitlement granting system that establishes an agile contract while achieving network efficiency and meeting long-term SLO guarantees, and (2) a flexible, large-scale, distributed host-based traffic admission system that enforces the contract on production traffic.

Speaker Guanqing Yan, Meta
Speaker Manikandan Somasundaram, Meta
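The split between a hose-based grant and host-based enforcement can be illustrated with a small Python sketch. It assumes, purely for illustration, that a service's aggregate (hose) entitlement is divided evenly across its hosts and that each host enforces its share with a token bucket, tagging traffic over the share as out-of-contract rather than dropping it. `per_host_rate`, `TokenBucket`, `classify`, and the marking policy are all hypothetical; the abstract does not say how Meta's admission system divides or enforces the contract.

```python
from dataclasses import dataclass, field


def per_host_rate(hose_gbps: float, num_hosts: int) -> float:
    """Evenly split a service's hose entitlement (Gbps) into a per-host rate in bytes/s."""
    return hose_gbps * 1e9 / 8 / num_hosts


@dataclass
class TokenBucket:
    """Per-host admission check for the host's share of the hose."""
    rate_bytes_per_s: float
    burst_bytes: float
    tokens: float = field(init=False, default=0.0)
    last_ts: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = self.burst_bytes  # start with a full bucket

    def admit(self, size: int, now: float) -> bool:
        """Refill tokens for the elapsed time, then spend them if the packet fits."""
        elapsed = max(0.0, now - self.last_ts)
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.rate_bytes_per_s)
        self.last_ts = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


def classify(bucket: TokenBucket, size: int, now: float) -> str:
    """Hypothetical policy: traffic within the share is in-contract, excess is downgraded."""
    return "in-contract" if bucket.admit(size, now) else "out-of-contract"


if __name__ == "__main__":
    rate = per_host_rate(8.0, 100)  # hypothetical 8 Gbps hose spread over 100 hosts
    bucket = TokenBucket(rate_bytes_per_s=rate, burst_bytes=1500)
    print(rate, classify(bucket, 1500, 0.0), classify(bucket, 1500, 0.0))
```

Here the grant lives at the service level (one hose number), while enforcement is local to each host, so no central component sits on the data path; an even split is the simplest possible division and a real system would likely weight hosts by demand.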
11:35 AM - 12:10 PM
RDMA @Scale


Speaker Jitu Padhye, Microsoft Azure
12:10 PM - 12:30 PM
Live Q&A

LIVE Q&A featuring Keerti Lakshminarayan, Alok Tiagi, Guanqing Yan, Manikandan Somasundaram & Jitu Padhye


Omar Baldonado
Omar supports the teams developing, deploying, and operating Meta's global data center networks. This...

Jana Iyengar
Jana Iyengar is the Product Lead for Infrastructure Services at Fastly, where he is...

Matt Joras
Matt Joras is a Software Engineer at Meta where he primarily works on their...

Yair Gottdenker
Yair Gottdenker is a Production Engineer at Meta. He has over 14 years of...

Joseph Beshay
Joseph is a Research Scientist at Meta. He is part of the Traffic Protocols...

Bharat Parekh
Bharat is a Software Engineering Manager in the Traffic Infrastructure group at Meta. He...

Lucas Pardue
Lucas is a Senior Software Engineer on the Protocols teams at Cloudflare, and co-chair...

Ian Swett
Ian Swett is the Manager of Google Cloud Networking's Protocols and Web Performance teams....

Luca Niccolini
Luca is a Software Engineer working on network protocols, improving application performance at scale....

Balasubramanian Madhavan
I am Bala Madhavan. I am a networking enthusiast. In the past, I have...

Abhishek Dhamija
I am a Production Engineer working on the Host Networking team at Meta. I work...

Prashanth Kannan
Prashanth Kannan is a software engineer working in the host network team at Meta...

Prankur Gupta
Prankur Gupta is a Software Engineer for Meta Platforms, Inc. working towards unifying all...

Neil Spring
Neil is a Research Scientist at Meta, working on tools to measure and improve...

Keerti Lakshminarayan
Keerti is a software engineer currently working in the Network Platform at Netflix.

Alok Tiagi
Alok is a software engineer currently working in the Network Platform at Netflix.

Guanqing Yan
I joined Meta as a software engineer in 2017. My focus is on building...

Manikandan Somasundaram
I am a Software Engineer at Meta. Over the many years at Meta, I...

Jitu Padhye, Microsoft Azure
Jitendra Padhye received his PhD from UMass Amherst in 2000. He has been at...