TOPIC: Data, Systems and Networking

Networking @Scale Summer 2022

JUNE 01, 2022 @ 10:00 AM - JUNE 02, 2022 @ 12:30 PM PT
Designed for engineers who build and manage large-scale networks. Networking solutions are critical for building applications and services that serve billions of people around the world, and building and operating such large-scale networks often presents complex engineering challenges.

ABOUT EVENT

Top-level summary:
Building and operating large-scale networks that host applications serving billions of people worldwide often presents complex engineering challenges. At the recently held Networking@Scale 2022 virtual conference, hosted by Meta on June 1 and 2, 2022, engineers from Cloudflare, Fastly, Google, Microsoft Azure, Netflix, and Meta presented talks and engaged in live panel discussions with the audience about these challenges.

The conference was held virtually and saw a great turnout of attendees from both industry and academia. This summer edition of Networking@Scale was themed around Transport Innovation: specifically, how to move data across the network quickly and efficiently, addressing congestion, performance, reliability, and extensibility through innovations in the transport layer. The conference spanned two days and focused on transport protocols such as QUIC, TCP, and RDMA.

Day 1 focused on QUIC's value proposition and the innovations it enables in the Internet architecture, along with use-case studies demonstrating the high performance and low latency achieved with QUIC at the CDN, Edge, and Backbone layers.

Day 2 pivoted to networking challenges in datacenter (DC) networks and WANs, and to how innovations in TCP and other protocols (e.g., RoCE) help tackle them.

The Q&A sessions saw great engagement between the audience and presenters, covering topics such as QUIC's agility and QUIC/HTTP/3 adoption on the web, in both browsers and servers. On the TCP and RDMA side, discussions covered BPF-based tuning vs. in-kernel changes, deploying changes at scale, and RoCE security and congestion management.

Recordings of the presentations are below. If you are interested in future events, please visit the @Scale website, follow the @Scale Facebook page, or join the Networking@Scale attendees Facebook group.

EVENT AGENDA

Event times below are displayed in PT.

June 1

June 2

10:00 AM - 10:10 AM
Opening Remarks
SPEAKER Omar Baldonado, Meta
10:10 AM - 10:30 AM
The Future With QUIC

We've all heard a lot about QUIC in the past few years, and much has been made of its performance benefits for HTTP/3. For some of us, however, HTTP/3 was always just the beginning: the vehicle for getting QUIC out into the world. This talk will go beyond these immediate benefits of QUIC and present my view of our somewhat anticipated sleight of hand. It will discuss QUIC's long-term value proposition for the Internet's architecture, including some recent projects and a broad sketch of where it can go.

SPEAKER Jana Iyengar, Fastly
10:30 AM - 10:40 AM
Quick Cache DSR

In a typical CDN architecture, the caching tier is fronted by a load-balancing tier, and response content flows from the cache to the requester through the load balancer. With this architecture, extra I/O, CPU cycles, and intra-cluster network bandwidth are spent streaming the content through multiple hops. We present a solution that uses QUIC's properties to implement a form of Direct Server Return (DSR) from the caching layer directly to the client. This form of DSR obviates the need for most intra-cluster communication when serving cached content. In this talk, we go over the technical challenges of implementing QUIC cache DSR, its security properties, the expected performance improvements, and future applications.

SPEAKER Matt Joras, Meta
SPEAKER Yair Gottdenker, Meta
10:40 AM - 11:00 AM
Improving Transfer Times in the Backbone Network Using QUIC Jump Start

Transfers over high-BDP (bandwidth-delay product) links incur a startup delay while congestion control probes the bandwidth of the underlying link. The impact of this delay grows as the transfer size shrinks, since small transfers may repeatedly spend all of their transfer time probing for the available bandwidth and never reach or use it. While this probing is necessary on links with rapidly changing capacity, it can be avoided on more predictable links such as backbone links. Existing TCP approaches are either limited to specific pairs of endpoints or require intermediate proxies. In this presentation, we share the approach we've developed for QUIC deployments in Meta's backbone network. We use a modified congestion controller that tracks the average congestion control state of connections using each backbone path. This state is then used to “jumpstart” new connections across the same path, significantly reducing the startup delay. Coupled with QUIC 0-RTT, this offers significant savings over existing TCP-based approaches for transfers close in size to the path BDP.

SPEAKER Joseph Beshay, Meta
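A minimal sketch of the per-path “jumpstart” idea described in the abstract above. It assumes, for illustration only, that the averaged congestion state is a single exponentially weighted moving average (EWMA) of the congestion window per backbone path, and that a new connection starts from that average clamped between a default initial window and the path BDP. The names (`JumpStartCache`, `report`, `initial_cwnd`, `bdp_bytes`) and constants are hypothetical; this is not the congestion controller in Meta's QUIC implementation (mvfst).

```python
from dataclasses import dataclass, field

# Hypothetical tuning values chosen for illustration, not taken from the talk.
DEFAULT_INITIAL_CWND = 10 * 1200  # 10 packets of 1200 bytes
EWMA_WEIGHT = 0.25                # weight of each newly reported sample


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return int(bandwidth_bps * rtt_s / 8)


@dataclass
class JumpStartCache:
    """Hypothetical per-path store of averaged congestion-window state."""
    max_cwnd_bytes: int                            # safety cap, e.g. the path BDP
    avg_cwnd: dict = field(default_factory=dict)   # path id -> smoothed cwnd (bytes)

    def report(self, path_id: str, cwnd_bytes: int) -> None:
        """Fold one connection's congestion window into the path's running average."""
        prev = self.avg_cwnd.get(path_id)
        if prev is None:
            self.avg_cwnd[path_id] = float(cwnd_bytes)
        else:
            self.avg_cwnd[path_id] = prev + EWMA_WEIGHT * (cwnd_bytes - prev)

    def initial_cwnd(self, path_id: str) -> int:
        """Starting window for a new connection on this path."""
        avg = self.avg_cwnd.get(path_id)
        if avg is None:
            return DEFAULT_INITIAL_CWND  # no history: fall back to normal slow start
        # Jump-start from the path average, clamped between the default and the cap.
        return int(min(max(avg, DEFAULT_INITIAL_CWND), self.max_cwnd_bytes))


if __name__ == "__main__":
    cache = JumpStartCache(max_cwnd_bytes=bdp_bytes(10e9, 0.05))  # 10 Gbit/s, 50 ms RTT
    print(cache.initial_cwnd("dc1->dc2"))  # 12000 (no history yet)
    cache.report("dc1->dc2", 4_000_000)
    cache.report("dc1->dc2", 8_000_000)
    print(cache.initial_cwnd("dc1->dc2"))  # 5000000 (jump-started from the average)
```

Capping the starting window at the path BDP in this sketch lets a new connection put up to one path's worth of data in flight on its first round trip, instead of spending several round trips growing its window, which is where the abstract locates the largest savings.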
11:00 AM - 11:20 AM
Live Q&A

LIVE Q&A featuring Jana Iyengar, Matt Joras, Yair Gottdenker & Joseph Beshay

SPEAKER Bharat Parekh, Meta
11:20 AM - 11:50 AM
Layer Four and Three Quarters: Fantastic Quirks and Where to Find Them

Nestled between transport protocols (TCP, UDP, QUIC) and application protocols (HTTP, etc.) is a layer few are familiar with. Layer 4¾ hides in plain sight, often glimpsed only during curious events that raise its prominence, such as edge cases that surface at deployment scale or under diverse usage. In this talk, we'll look at the Cloudflare Protocols team's view of the Internet edge, explore some of the fantastic cases we've seen, and consider what they might mean for future developments in Layer 4, Layer 7, and the eponymous in-between.

SPEAKER Lucas Pardue, Cloudflare
11:50 AM - 12:10 PM
The Challenges of 0-RTT in IETF QUIC

A key feature of HTTP/3 over QUIC is the ability to send a request in the first flight, alongside the ClientHello. 0-RTT in IETF QUIC is notably more complex than in gQUIC, with multiple packet number spaces and a limit on the amplification factor. This talk walks through some issues we hit, and the tooling we used to identify and debug them, before 0-RTT became a performance win for applications.

SPEAKER Ian Swett, Google
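The amplification limit mentioned in the abstract above comes from RFC 9000 (Section 8): until a server has validated the client's address, it must not send more than three times the number of bytes it has received from that address. The Python sketch below tracks that budget for one connection; the class and method names are hypothetical, and it is not the code of any particular QUIC stack.

```python
from typing import Optional

# RFC 9000, Section 8: before address validation a server may send at most
# three times the bytes it has received from the client's address.
AMPLIFICATION_FACTOR = 3


class AmplificationBudget:
    """Hypothetical per-connection tracker for the anti-amplification limit."""

    def __init__(self) -> None:
        self.bytes_received = 0
        self.bytes_sent = 0
        self.address_validated = False

    def on_datagram_received(self, size: int) -> None:
        self.bytes_received += size

    def on_address_validated(self) -> None:
        # For example, after the server successfully processes a Handshake packet.
        self.address_validated = True

    def sendable_bytes(self) -> Optional[int]:
        """Remaining send budget, or None once the limit no longer applies."""
        if self.address_validated:
            return None
        return max(0, AMPLIFICATION_FACTOR * self.bytes_received - self.bytes_sent)

    def try_send(self, size: int) -> bool:
        """Record a send of `size` bytes if the budget allows it."""
        budget = self.sendable_bytes()
        if budget is not None and size > budget:
            return False  # blocked until more client bytes arrive or validation
        self.bytes_sent += size
        return True


if __name__ == "__main__":
    conn = AmplificationBudget()
    conn.on_datagram_received(1200)  # client Initial, padded to 1200 bytes
    print(conn.sendable_bytes())     # 3600
    print(conn.try_send(3000))       # True
    print(conn.try_send(1000))       # False: only 600 bytes of budget remain
    conn.on_address_validated()
    print(conn.try_send(1000))       # True
```

Because a client must pad datagrams carrying Initial packets to at least 1,200 bytes, a server in this model starts with a 3,600-byte budget. A first flight larger than that (for example, a large certificate chain plus responses to 0-RTT requests) stalls until more client bytes arrive or the address is validated.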
12:10 PM - 12:30 PM
Live Q&A

LIVE Q&A featuring Lucas Pardue & Ian Swett

SPEAKER Luca Niccolini, Meta

June 2

10:00 AM - 10:20 AM
Tackling DC Congestion and Bursts

This talk covers two specific DC transport tuning initiatives: (a) handling sustained congestion in the network and (b) tackling bursts in the network. For both initiatives, it covers the motivation, an implementation overview, the wins, and the lessons learned.

SPEAKER Balasubramanian Madhavan, Meta
SPEAKER Abhishek Dhamija, Meta
10:20 AM - 10:40 AM
NetEdit: Fine-grained Network Tuning at Scale

We will share the design, implementation, and production experience of the BPF-based platform used to tune the network transport across millions of servers at Meta.

SPEAKER Prashanth Kannan, Meta
SPEAKER Prankur Gupta, Meta
10:40 AM - 11:00 AM
Live Q&A

LIVE Q&A featuring Prashanth Kannan, Balasubramanian Madhavan, Abhishek Dhamija, Prankur Gupta & Kumar Saurabh Arora

SPEAKER Neil Spring, Meta
11:00 AM - 11:20 AM
NATless IPv6/IPv4 Address Translation

We will demonstrate a novel, performant approach to address translation that avoids NAT. It uses a transition mechanism built on a new flag introduced to the seccomp() system call to intercept egress connect calls and opportunistically use a transition IPv4 address when possible. This saves applications the pain of dealing with an end host not being reachable, while the hosts still live in an IPv6-only environment.

SPEAKER Keerti Lakshminarayan, Netflix
SPEAKER Alok Tiagi, Netflix
11:20 AM - 11:35 AM
Network Entitlement: From Hose-based Approval to Host-based Admission

The Wide Area Network (WAN) connects Meta's many datacenter (DC) regions and hundreds of Points of Presence (POPs). WAN capacity is shared by several services with high network demand. The network must be built for peak demand while also accounting for failure scenarios, to reduce the impact on Meta products. However, building a resilient network that is over-provisioned for every service's peak demand at our current growth rates is practically infeasible due to fiber sourcing, deployment constraints, and cost. This talk presents Meta's production traffic classification and WAN Entitlement solution, which our services currently use to share the network safely and efficiently. The Network Entitlement framework aims to provide a simple, stable, and operations-friendly abstraction of the network for sharing the backbone. It has two key parts: (1) a hose-based entitlement granting system that establishes an agile contract while achieving network efficiency and meeting long-term SLO guarantees, and (2) a flexible, large-scale, distributed host-based traffic admission system that enforces the contract on production traffic.

SPEAKER Guanqing Yan, Meta
SPEAKER Manikandan Somasundaram, Meta
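A hose-style entitlement bounds each endpoint's aggregate traffic rather than each source/destination pair. The Python sketch below illustrates that general abstraction only, not Meta's entitlement or admission system: for one service, it sums a demand matrix per region and reports any region whose total egress or ingress exceeds its entitlement. The function name, region labels, and Gbit/s units are hypothetical.

```python
from collections import defaultdict


def hose_violations(demand, egress_limit, ingress_limit):
    """Check one service's demand against hypothetical hose entitlements.

    demand: {(src_region, dst_region): rate_gbps}
    egress_limit / ingress_limit: {region: rate_gbps}; a missing region means 0.
    Returns a list of (direction, region, total_gbps) for every violated hose.
    """
    egress = defaultdict(float)
    ingress = defaultdict(float)
    for (src, dst), rate in demand.items():
        egress[src] += rate   # aggregate traffic leaving each region
        ingress[dst] += rate  # aggregate traffic entering each region

    violations = []
    for direction, totals, limits in (("egress", egress, egress_limit),
                                      ("ingress", ingress, ingress_limit)):
        for region in sorted(totals):
            if totals[region] > limits.get(region, 0.0):
                violations.append((direction, region, totals[region]))
    return violations


if __name__ == "__main__":
    demand = {("A", "B"): 40, ("A", "C"): 30, ("B", "C"): 20}
    print(hose_violations(demand,
                          egress_limit={"A": 80, "B": 10},
                          ingress_limit={"B": 50, "C": 40}))
    # [('egress', 'B', 20.0), ('ingress', 'C', 50.0)]
```

Because only per-region totals are checked, any mix of destinations that stays within them is admissible in this model, which is one way to read the abstract's description of the contract as agile.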
11:35 AM - 12:10 PM
RDMA @Scale

Coming Soon!

SPEAKER Jitu Padhye, Microsoft Azure
12:10 PM - 12:30 PM
Live Q&A

LIVE Q&A featuring Keerti Lakshminarayan, Alok Tiagi, Guanqing Yan, Manikandan Somasundaram & Jitu Padhye


SPEAKERS AND MODERATORS

Omar supports the teams developing, deploying, and operating Meta's global data center networks. This includes the overall topologies/control stack, the...

Omar Baldonado

Meta

Jana Iyengar is the Product Lead for Infrastructure Services at Fastly, where he is responsible for the core hardware, software,...

Jana Iyengar

Fastly

Matt Joras is a Software Engineer at Meta, where he primarily works on Meta's QUIC implementation, mvfst. He is also...

Matt Joras

Meta

Yair Gottdenker is a Production Engineer at Meta. He has over 14 years of experience in the CDN and Traffic space...

Yair Gottdenker

Meta

Joseph is a Research Scientist at Meta. He is part of the Traffic Protocols team working on Meta’s QUIC implementation...

Joseph Beshay

Meta

Bharat is a Software Engineering Manager in the Traffic Infrastructure group at Meta. He supports the Traffic Protocols Team which...

Bharat Parekh

Meta

Lucas is a Senior Software Engineer on the Protocols teams at Cloudflare, and co-chair of the IETF QUIC Working Group....

Lucas Pardue

Cloudflare

Ian Swett is the Manager of Google Cloud Networking's Protocols and Web Performance teams. Ian was heavily involved in the...

Ian Swett

Google

Luca is a Software Engineer working on network protocols, improving application performance at scale. Most recently involved with the implementation...

Luca Niccolini

Meta

I am Bala Madhavan. I am a networking enthusiast. In the past, I have worked on building L4 & L7...

Balasubramanian Madhavan

Meta

I am a Production Engineer working on the Host Networking team at Meta. I work on challenges dealing with scaling, performance...

Abhishek Dhamija

Meta

Prashanth Kannan is a software engineer working in the host network team at Meta Platforms, Inc. Prior to Meta, he...

Prashanth Kannan

Meta

Prankur Gupta is a Software Engineer at Meta Platforms, Inc., working towards unifying all host-level packet and flow editing...

Prankur Gupta

Meta

Neil is a Research Scientist at Meta, working on tools to measure and improve service performance on the data center...

Neil Spring

Meta

Keerti is a software engineer currently working in the Network Platform at Netflix.

Keerti Lakshminarayan

Netflix

Alok is a software engineer currently working in the Network Platform at Netflix.

Alok Tiagi

Netflix

I joined Meta as a software engineer in 2017. My focus is on building systems to support network simulation, planning...

Guanqing Yan

Meta

I am a Software Engineer at Meta. Over the many years at Meta, I have worked on several things related...

Manikandan Somasundaram

Meta

Jitendra Padhye received his PhD from UMass Amherst in 2000. He has been at Microsoft since 2002. He has worked...

Jitu Padhye

Microsoft Azure
