Networking @Scale

Hotel Commonwealth 8:30am - 6:30pm

Event Completed

Networking @Scale is an invitation-only technical conference for engineers who build and manage large-scale networks.

Networking solutions are critical for building applications and services that serve millions, and sometimes billions, of people around the world. At this scale, there are always complex engineering challenges to solve. We’ll spend the day sharing experiences in improving reliability, security and performance in large-scale networks and collaborating on the development of new solutions.

Networking @Scale will be held at Hotel Commonwealth in Boston, Massachusetts on Tuesday, November 12th beginning at 8:30 AM ET. Be sure to also stick around for Happy Hour in the evening.


@Scale brings thousands of engineers together throughout the year to discuss complex engineering challenges and to work on the development of new solutions. We're committed to providing a safe and welcoming environment — one that encourages collaboration and sparks innovation.

Every @Scale event participant has the right to enjoy his or her experience without fear of harassment, discrimination, or condescension. The @Scale code of conduct outlines the behavior that we support and don't support at @Scale events and conferences. We expect participants to follow these rules at all @Scale event venues, online communities, and event-related social activities. These guidelines will keep the @Scale community a safe and enjoyable one for everyone.

Be welcoming. Everyone is welcome at @Scale events, inclusive of (but not limited to) gender, gender identity or expression, sexual orientation, body size, differing abilities, ethnicity, national origin, language, religion, political beliefs, socioeconomic status, age, color and neurodiversity. We have a zero-tolerance policy for discrimination.

Choose your words carefully. Treat one another with respect and in a professional manner. We're here to collaborate. Conflict is not part of the equation.

Know where the line is, and don't cross it. Harassment, threats, or intimidation of any kind will not be tolerated. This includes verbal, physical, sexual (such as sexualized imagery on clothing, presentations, in print, or onscreen), written, or any other form of aggression (whether outright, subtle, or micro). Behavior that is offensive, as determined by @Scale organizers, security staff, or conference management, will not be tolerated. Participants who are asked to stop a behavior or an action are expected to comply immediately or will be asked to leave.

Don't be afraid to call out bad behavior. If you're the target of harmful or offensive behavior, or if you witness someone else being harassed, threatened, or intimidated, don't look away. Tell an @Scale staff member, a security staff member, or a conference organizer immediately. Please notify our event staff, security staff, or conference organizers of any harmful or offensive behavior that you've experienced or witnessed in any form, whether in person or online.

We at @Scale want our events to be safe for everyone, and we have a zero-tolerance policy for violations of our code of conduct. @Scale conference organizers will investigate any allegation of problematic behavior, and we will respond accordingly. We reserve the right to take any follow-up actions we determine are needed. These include being warned, being refused admittance, being ejected from the conference with no refund, and being banned from future @Scale events.

Agenda
8:30am - 10:00am

Registration & Breakfast

8:30am - 10:00am

Women in Tech Breakfast & Panel Discussion

10:00am - 10:05am

Welcome

10:05am - 10:30am

Keynote - Network Reliability: Where we have been and where we are going

Networking grew up as a mostly best-effort service but has evolved into one of the foundational elements of modern cloud-based computer systems. Networking solutions are critical for building applications and services that serve billions of people around the world. Today’s networks are expected to be highly reliable. This talk is a retrospective look at how networks, networking technologies and network professionals have evolved over the last several years. More importantly, the talk touches on areas we need to focus on in order to advance network reliability an inch closer to the mythical 100% reliable system.
10:30am - 11:00am

All the Bits, Everywhere, All of the Time: Challenges in Building and Automating a Reliable Global Network

Large global WANs have unique reliability and capacity delivery requirements. They typically connect to the Internet, which means they use distributed routing protocols. They are typically much sparser and more irregular than large cluster networks, and can have significantly poorer reachability depending on where in the world they are. Yet we depend on these networks to reach our customers. We need to build and maintain these networks at an extremely high level of reliability while growing their capacity at hitherto unseen speeds and doing it more cheaply than ever before. These needs are often directly in conflict. In this talk, Ashok will go over some of his experiences in building and automating Google’s network backbone. He will cover:
-- The perceived and real reliability differences between SDN and on-box routed networks.
-- The importance of network automation and programmatic network management to capacity delivery as well as reliability.
-- The risks introduced by these management paradigms, and how they can be mitigated.
-- The importance of defining and measuring network SLOs, and tracking network health and capacity availability over time against these SLOs.
-- Some of the hard problems in global WAN availability today, such as global routes, BGP and MPLS, and where we could go from here in the search for a truly 6-nines network.
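To give a sense of what a "6-nines" target implies, the arithmetic can be sketched as follows. This is only an illustration, not the method described in the talk: the function names, the 365-day measurement period, and the pass/fail check are all assumptions made for the example.

```python
# Hypothetical helper names; a 365-day year is assumed as the SLO period.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_budget_seconds(nines: int, period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Allowed downtime in a period for an availability target of `nines` nines.

    For example, 3 nines means 99.9% availability, i.e. 0.1% unavailability.
    """
    if nines < 1:
        raise ValueError("nines must be >= 1")
    unavailability = 10.0 ** (-nines)
    return period_seconds * unavailability


def meets_slo(period_seconds: float, down_seconds: float, nines: int) -> bool:
    """True if the measured downtime fits within the budget for the target."""
    return down_seconds <= downtime_budget_seconds(nines, period_seconds)


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(f"{n} nines: {downtime_budget_seconds(n):.3f} s of downtime per year")
```

Under these assumptions, a 6-nines target leaves roughly 31.5 seconds of downtime per year, which is why tracking health against the SLO over time matters as much as defining it.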
11:00am - 11:30am

Detecting Unusually-Routed ASes

The routes used in the Internet's interdomain routing system are a rich information source that could be exploited to answer a wide range of questions. However, analyzing routes is difficult, because the fundamental object of study is a set of paths. In this talk we will present new analysis tools -- metrics and methods -- for analyzing AS paths, and apply them to study interdomain routing in the Internet over a recent 13-year period. Using these tools we will try to present a quantitative understanding of changes in Internet routing at the micro level (of individual ASes) as well as at the macro level (of the set of all ASes). More specifically, we will show that at the micro level, our tools can identify clusters of ASes that have the most unusual routing at each time (interestingly, such clusters often correspond to sets of jointly-owned ASes). We will also show that analysis of individual ASes can expose business and engineering strategies of the organizations owning the ASes. These strategies are often related to content delivery or service replication. At the macro level, we will show that ASes with the most unusual routing define discernible and interpretable phases of the Internet's evolution. Furthermore, we will discuss how our tools can be used to provide a quantitative measure of the "flattening" of the Internet.
11:30am - 12:00pm

Anycast Content Delivery at Akamai

Akamai is well known as a DNS-based CDN. Instead of building a few dozen very large POPs, Akamai tries to serve content from a few thousand small POPs very close to the end users and uses DNS to direct end users to the POP that is best for them. This generally yields better performance and scale. However, there are some unique cases where the alternative, anycast-based content delivery, is a better option. Igor will present Akamai's "hybrid anycast" architecture that allows Akamai to serve traffic from thousands of edge deployments but over anycast addresses announced from dozens of POPs. He'll discuss advantages of this architecture as well as hurdles and experiences.
12:00pm - 12:45pm

Lunch

12:45pm - 1:15pm

Self Organizing Mesh Access (SOMA)

SOMA focuses on an enterprise-level Wi-Fi mesh network optimized for providing connectivity in unconnected and underserved markets. By lowering the total cost of ownership (TCO), simplifying connectivity installations and reducing operational overhead, Facebook’s goal is to help ISPs all over the world in expanding their footprint. We currently have several successful mesh deployments in Africa with over 200 mesh APs that demonstrate this Facebook technology very effectively for public Wi-Fi use cases.
1:15pm - 1:45pm

Security Performance Management

Security threats arising from supply chains pose a serious and growing danger, but traditional risk management techniques are largely subjective, often ambiguous, and scale poorly. We have developed an objective set of metrics that describe security performance of organizations, using a variety of external observations (including compromised systems, endpoint telemetry, file sharing activity, and server configurations, among others); we compute daily updates to these metrics for hundreds of thousands of organizations worldwide. In this session, we will discuss some of the key challenges in collecting, storing, and processing cybersecurity observations on a global scale. These data also provide a unique perspective into widespread security events and trends; as an example, we will present an analysis of the attack surface introduced by recent vulnerabilities and use this to gain insight into the effectiveness of security controls across various industries and localities.
1:45pm - 2:15pm

Enforcing Encryption @Scale

At Facebook, we run a global infrastructure that supports thousands of services, with many new ones spinning up daily. We take protecting our network traffic very seriously, so we must have a sustainable way to enforce our security policies transparently and globally. One of the requirements is that all traffic that crosses "unsafe" network links must be encrypted with TLS 1.2 or above using secure modern ciphers and robust key management. This talk describes the infrastructure we built for enforcing the "encrypt all" policy on the end-hosts. We discuss alternatives and tradeoffs and how we use BPF programs. We also go over some of the numerous challenges we faced when realizing this plan. Additionally, we talk about one of our solutions, Transparent TLS (TTLS), that we've built for services that either could not enable TLS natively or could not upgrade to a newer version of TLS easily.
2:15pm - 2:35pm

Break

2:35pm - 3:05pm

Improving QUIC CPU Performance

QUIC is a new internet transport that forms the foundation of HTTP/3 at the IETF. The 2017 SIGCOMM paper on QUIC estimated it constituted 7% of public internet traffic, making the CPU efficiency of QUIC extremely important. However, as of 2017, QUIC consumed over 2x the CPU of HTTPS over TCP. Learn how the QUIC and YouTube teams massively reduced QUIC CPU consumption, reaching parity with TCP in some cases.
3:05pm - 3:35pm

Using SmartNICs to Offload Connection Encryption in the Data Center

In an age where ensuring data privacy is becoming more essential than ever, encryption within the datacenter is becoming a reality. However, this incurs a significant CPU cost. This talk will explain how SmartNICs can be used to offload TLS encryption without compromising the host TCP stack, and how the NIC can keep all the necessary state of a socket-based mechanism while dealing with the myriad exception cases, such as packet drops, out-of-order packets and host-side packet mangling. We will then demonstrate the benefits to be gained from this type of offload in a variety of cases. Finally, we will look at the possibilities of applying this type of technology to emerging protocols such as QUIC and the benefits of integrating encryption and congestion control mechanisms to ensure optimal performance.
3:35pm - 4:05pm

Performance Tools and Techniques to Improve Envoy Scalability

As Envoy scales with traffic growth, service complexity, and processor count, we need an increasing array of tools to achieve our performance goals. We need tools to help visualize latency, throughput, memory, CPU load, and thread contention. Some of these tools already exist, such as kcachegrind and Google’s performance benchmarking library. Others needed to be built, such as a new OSS L7 load tester based on the Envoy networking stack that is capable of driving HTTP/2 traffic through proxies. In this talk, we’ll discuss these tools and how we’ve applied them to find and fix bottlenecks in Envoy, and to help us make decisions about how to improve the system and its usage.
4:05pm - 4:25pm

Break

4:25pm - 4:55pm

Adaptive Cache Networks with Optimality Guarantees

The problem of optimally placing content over a network of caches arises in many networking applications. Given the content demand, described by content requests and the paths they follow, we wish to determine the content placement that maximizes the expected caching gain, i.e., the reduction of routing costs due to intermediate caching. The offline version of this problem is NP-hard. To make matters worse, in most cases, both the demand and the network topology may be a priori unknown; hence, distributed, adaptive content placement algorithms that yield constant approximation guarantees are desired. We show that path replication, an algorithm encountered often in both networking literature and in practice, can be arbitrarily suboptimal when combined with traditional cache eviction policies, like LRU, LFU, or FIFO. We propose a distributed, adaptive algorithm that provably constructs a probabilistic content placement within a 1−1/e factor of the optimal, in expectation.
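The notion of caching gain in the abstract above can be made concrete with a small sketch. This is a simplified illustration under stated assumptions, not the speakers' formulation or algorithm: each request is assumed to carry an item, a path from the requester to an origin server that always holds the item, and a request rate; the names `Request`, `path_cost`, and `caching_gain` are hypothetical; cache capacities and the placement algorithm itself are left out.

```python
from typing import Dict, List, Set, Tuple

# Assumed request model: (item, path from requester to origin server, request rate).
Request = Tuple[str, List[str], float]
Weights = Dict[Tuple[str, str], float]   # cost of each directed edge on a path
Placement = Dict[str, Set[str]]          # node -> items cached at that node


def path_cost(path: List[str], weights: Weights, hops: int) -> float:
    """Cost of traversing the first `hops` edges of `path`."""
    return sum(weights[(path[k], path[k + 1])] for k in range(hops))


def caching_gain(requests: List[Request], weights: Weights, placement: Placement) -> float:
    """Rate-weighted reduction in routing cost relative to no intermediate caching."""
    gain = 0.0
    for item, path, rate in requests:
        full_cost = path_cost(path, weights, len(path) - 1)
        # The request is served by the first node on the path that caches the
        # item; the origin server at the end of the path is the fallback.
        hit_index = len(path) - 1
        for k, node in enumerate(path):
            if item in placement.get(node, set()):
                hit_index = k
                break
        gain += rate * (full_cost - path_cost(path, weights, hit_index))
    return gain
```

For instance, with a single request for item "a" at rate 2 along the path u, v, s and edge costs 1 and 10, caching "a" at v avoids the expensive edge and gives a gain of 20, while an empty placement gives a gain of 0.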
4:55pm - 5:25pm

Building Stadia's Edge Compute Platform

Building an edge platform to support Stadia (Google's gaming platform) has presented a number of challenges. To ensure the best performance for users on a product of Stadia's scope, we've had to scale Google's edge platform and build new networking, compute, and storage services. This talk will explore some of the challenges we've faced scaling Google's stack both up and down to support the reach and performance requirements of a new gaming platform.
5:25pm - 5:30pm

Closing Remarks

5:30pm - 6:30pm

Happy Hour
