Event times below are displayed in PT.
Systems @Scale Tel Aviv: Building Distributed Systems is an invitation-only technical conference for engineers who manage large-scale information systems serving millions of people.
As our systems continue to scale, the problem of understanding whether they’re behaving as desired gets progressively harder. As a community, we have developed tools, techniques, and approaches that can be applied to observing the state of these complex distributed systems with the goal of understanding system availability, reliability, performance, and efficiency.
We’ll spend the day covering a wide range of topics exploring these challenges and collaborating on the development of new solutions.
Running services such as Facebook requires a highly reliable, scalable, and efficient data center infrastructure.
Learn more about the constant innovation of technology pushing the boundaries of physical infrastructure, allowing Facebook to scale to serve and connect billions of people around the planet.
Unit tests are part of our day-to-day work; some of us even practice TDD. But we don't have a good measure of the quality of those tests. Tests are supposed to prove the correctness of the code, and together with CI you also get regression protection for free. But the big question we don't address is: what is the quality of the tests themselves?
If you are using Chaos Monkey then you are already familiar with the concept: inject failures into your system and check the system's robustness, as well as the quality of your monitoring and alerts. Mutation Testing adapts the 'Chaos Monkey' methodology to the world of unit tests: inject bugs into your code to see whether the test suite catches them. In other words, create mutations of the tested code and validate that your tests can identify the mutations and kill them.
Mutation Testing is not a new idea, but it was long considered too theoretical and remained an academic exercise. Nowadays, with faster CPUs and better tools, it is resurfacing as a practical quality technique.
The ability to prefetch data is a key lever in improving FBLite responsiveness. It gives the perception of instant data availability served from local cache.
However, excessive prefetching can waste data on content the user never consumes, and can cause performance regressions. This talk will explore the technical challenges we face when serving cached content to FBLite users, and how we balance data usage and resources while maximizing prefetching.
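One way to picture this balance is as a budgeted selection problem. The following sketch is purely illustrative (the candidate scores, sizes, and byte budget are assumptions, not FBLite's actual policy): rank prefetch candidates by predicted usefulness per byte and stop when the data budget is spent.

```python
def plan_prefetch(candidates, byte_budget):
    """candidates: list of (item_id, p_view, size_bytes) tuples.
    Greedily pick items with the best expected-view-per-byte ratio
    until the per-session data budget is exhausted."""
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    plan, used = [], 0
    for item_id, p_view, size in ranked:
        if used + size <= byte_budget:
            plan.append(item_id)
            used += size
    return plan

candidates = [
    ("story_a", 0.9, 40_000),   # very likely to be viewed, cheap
    ("video_b", 0.6, 900_000),  # likely, but expensive to fetch
    ("story_c", 0.1, 35_000),   # unlikely to be viewed
]
# The expensive video is skipped even though it is likely to be viewed,
# because it would blow the data budget on its own.
print(plan_prefetch(candidates, byte_budget=100_000))
```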
When we build systems, our designs and tradeoffs reflect the different scales of the system: the speed of disks, the latency of the network. They reflect the constraints and abilities of the underlying technologies.
But as technology advances, some of these assumptions become invalid. We are no longer running on the physical machines that RDBMS systems were designed for; SSDs changed pretty much everything in the storage world, yet our software was designed for magnetic disks; and with NVRAM, O/S design is way off.
This talk will show how changes in hardware technologies affect the design rationale of various systems, highlighting the importance of understanding and rethinking that rationale, and will explore new designs that arise from it.
Over the course of the last year, Go became the main programming language for developing services in Facebook Connectivity. Some of these services have a complicated data model with tens of types and relations.
At Facebook we like to think about our data model in graph concepts, and we've had a good experience with this model internally. The lack of a proper graph-based ORM for Go led us to write one and open-source it.
In this talk I’ll share the journey of taking this concept from idea to implementation, and will deep dive into some of the challenges and the technical decisions.
At Facebook we run huge Java services; this applies both to the size of a single process and to the scale of our server fleet.
Facebook Lite is one of the dominant Java services within Facebook, serving hundreds of millions of users every month. The architecture of Facebook Lite is unique: it offloads the client's typical work (data retrieval, business logic, layout calculation, etc.) to the server, which has caused it to evolve into a memory-bound service.
This architecture provides clear advantages to Facebook Lite users and developers, but it also makes it harder for service owners to keep the service healthy and safe from memory regressions. For instance, even a memory regression of 1% has significant stability and cost implications for our production systems, and therefore should be detected and blocked as soon as possible.
In this session we will go through the evolution of the Facebook Lite service from a point in time in which it was occasionally suffering from massive memory regressions that put it at risk, through building a scalable and advanced memory analysis infrastructure, to providing high granularity memory visibility to developers and enabling them to push our service to its efficiency limits with massive memory wins.
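To make the 1% sensitivity concrete, a regression gate of this kind boils down to comparing a candidate build's memory footprint against a baseline. This is an illustrative sketch under assumed inputs, not Facebook's actual tooling; the function name and byte figures are hypothetical.

```python
REGRESSION_THRESHOLD = 0.01  # 1% -- the sensitivity mentioned above

def check_memory_regression(baseline_bytes, candidate_bytes,
                            threshold=REGRESSION_THRESHOLD):
    """Return (regressed, relative_change) for a candidate build's
    memory footprint versus the previous (baseline) build."""
    change = (candidate_bytes - baseline_bytes) / baseline_bytes
    return change > threshold, change

# A build that grows per-process memory from 200 MB to 204 MB (+2%)
# would be flagged and blocked before reaching production.
regressed, change = check_memory_regression(200_000_000, 204_000_000)
print(regressed, f"{change:.1%}")
```

In practice the hard part is not the comparison itself but attributing the growth with high granularity, which is what the memory analysis infrastructure described above provides.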
At Singular, we combine data pulled periodically from 2500+ sources with streamed data that we receive in real time. Joining these data sets, we encountered a few unique challenges: the periodic data pulled from our different sources changes frequently, affecting our real-time data retroactively; and periodic and real-time data arrive at different times yet must always stay aligned and matched.
In this session, we’ll share some of the tricks we use to keep the data aligned at scale, including separating frequently and infrequently changed data to streamline alignment, detecting changes in the data using consistent hashing, and storing data so that changes can be applied efficiently with our bz2 inline-block edit optimization.
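The change-detection idea can be sketched simply: store a content digest per record, and on each periodic pull recompute digests so that only records whose digest changed need re-alignment. This is a hedged illustration of the general technique; the record layout and function names are assumptions, not Singular's implementation.

```python
import hashlib
import json

def record_hash(record: dict) -> str:
    """Stable digest of a record's content (keys sorted for determinism)."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def detect_changes(stored_hashes: dict, fresh_records: dict) -> list:
    """Compare each freshly pulled record's digest with the stored one;
    only records whose digest differs need to be realigned."""
    return [key for key, rec in fresh_records.items()
            if stored_hashes.get(key) != record_hash(rec)]

# A source retroactively revises yesterday's install count:
stored = {"row1": record_hash({"installs": 10, "cost": 5.0})}
fresh = {"row1": {"installs": 12, "cost": 5.0}}
print(detect_changes(stored, fresh))  # only the changed row is reprocessed
```

Skipping unchanged records keeps the expensive realignment work proportional to what actually changed rather than to the full pull.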
Keeping all of your code in a single repository has huge benefits, but comes with equally huge obstacles. In this session I’ll talk about the challenges Facebook has faced with its massive codebase, and how we’re radically extending our source control system to enable our entire ecosystem of developer tools to remain fast in the face of tremendous growth.
I’ll briefly introduce the concept of a monorepo, give a rough sense of our repository's scale, talk about the problems it causes in development (slow source control, slow builds, complex test infrastructure, difficulties maintaining release quality, etc.), and then cover a few source control innovations we’ve made to tackle these challenges.
At Forter, we’re on a mission to build the foundations for a more credible internet by blocking fraudsters and abusers on e-commerce platforms. To achieve that, we need to make millions of high-risk, low-latency decisions per day while processing billions of events.
We’re doing all of this with a very lean and mean R&D team. We had to invent many solutions from the ground up, and we’ll share some of our insights with you.
Monitoring metrics for any significant movements is key to detecting problems with systems and products. This talk provides an overview of our detection and alerting framework: the scale at which we monitor timeseries, the different detection algorithms we offer (rule-based and ML-based), and the ability to auto-slice data along multiple dimensions to identify deeper issues.
Deriving signal without being inundated with noise is crucial at our scale, and we have built tools to empower teams to maintain high signal-to-noise ratio.
To cater to our future scale needs, we are currently focused on automatic monitoring: proactively logging and monitoring the right metrics for different artifacts, proactively analyzing any flagged events and hopefully predicting potential critical incidents.