Systems @Scale Tel Aviv

Avigdor Event Space 9:00am - 6:00pm

Event Completed

Systems @Scale Tel Aviv: Building Distributed Systems is an invitation-only technical conference for engineers who manage large-scale information systems serving millions of people.

As our systems continue to scale, the problem of understanding whether they’re behaving as desired gets progressively harder. As a community, we have developed tools, techniques, and approaches that can be applied to observing the state of these complex distributed systems with the goal of understanding system availability, reliability, performance, and efficiency.

We’ll spend the day covering a wide range of topics exploring these challenges and collaborating on the development of new solutions.


@Scale brings thousands of engineers together throughout the year to discuss complex engineering challenges and to work on the development of new solutions. We're committed to providing a safe and welcoming environment — one that encourages collaboration and sparks innovation.

Every @Scale event participant has the right to enjoy his or her experience without fear of harassment, discrimination, or condescension. The @Scale code of conduct outlines the behavior that we support and don't support at @Scale events and conferences. We expect participants to follow these rules at all @Scale event venues, online communities, and event-related social activities. These guidelines will keep the @Scale community a safe and enjoyable one for everyone.

Be welcoming. Everyone is welcome at @Scale events, inclusive of (but not limited to) gender, gender identity or expression, sexual orientation, body size, differing abilities, ethnicity, national origin, language, religion, political beliefs, socioeconomic status, age, color and neurodiversity. We have a zero-tolerance policy for discrimination.

Choose your words carefully. Treat one another with respect and in a professional manner. We're here to collaborate. Conflict is not part of the equation.

Know where the line is, and don't cross it. Harassment, threats, or intimidation of any kind will not be tolerated. This includes verbal, physical, sexual (such as sexualized imagery on clothing, presentations, in print, or onscreen), written, or any other form of aggression (whether outright, subtle, or micro). Behavior that is offensive, as determined by @Scale organizers, security staff, or conference management, will not be tolerated. Participants who are asked to stop a behavior or an action are expected to comply immediately or will be asked to leave.

Don't be afraid to call out bad behavior. If you're the target of harmful or offensive behavior, or if you witness someone else being harassed, threatened, or intimidated, don't look away. Tell an @Scale staff member, a security staff member, or a conference organizer immediately. Please notify our event staff, security staff, or conference organizers of any harmful or offensive behavior that you've experienced or witnessed in any form, whether in person or online.

We at @Scale want our events to be safe for everyone, and we have a zero-tolerance policy for violations of our code of conduct. @Scale conference organizers will investigate any allegation of problematic behavior, and we will respond accordingly. We reserve the right to take any follow-up actions we determine are needed. These include being warned, being refused admittance, being ejected from the conference with no refund, and being banned from future @Scale events.

Agenda
9:00am - 10:00am

Registration & Breakfast

10:00am - 10:05am

Welcome

10:05am - 10:30am

Scaling Facebook’s Data Center Infrastructure

Running services such as Facebook requires highly reliable, scalable, and efficient data center infrastructure. Learn how constant technological innovation is pushing the boundaries of physical infrastructure, allowing Facebook to scale to serve and connect billions of people around the planet.
10:30am - 11:00am

Kill the mutants – cause it is about time to test your tests

Unit tests are part of our day-to-day work, and some of us even practice TDD. But we don't have a good measure of the quality of our tests. Tests are supposed to prove the correctness of the code, and together with CI, you also get regression testing for free. The big question we don't address is: how good are the tests themselves? If you are using Chaos Monkey, then you are already familiar with the concept: inject failures into your system and check its robustness, as well as the quality of your monitoring and alerts. Mutation testing adapts the Chaos Monkey methodology to the world of unit tests: inject bugs into your code to see whether the test suite catches them. In other words, create mutations of the tested code and validate that your tests can identify the mutations and kill them. Mutation testing is not a new idea, but it was long considered too theoretical, an academic exercise. Now that CPUs are faster and tools are better, it is re-emerging as a practical quality technique.
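To make the idea concrete, here is a minimal hand-rolled sketch of a single mutation, assuming a hypothetical `is_adult` check; real mutation testing tools generate and run such mutants automatically, and nothing here is taken from the talk itself. The mutant swaps `>=` for `>`, and a test suite "kills" it only if at least one test fails against the mutated code.

```python
import operator


def make_is_adult(compare):
    """Build an age check; `compare` stands in for the operator a tool would mutate."""
    def is_adult(age):
        return compare(age, 18)
    return is_adult


original = make_is_adult(operator.ge)  # age >= 18
mutant = make_is_adult(operator.gt)    # age > 18, a boundary mutation


def weak_suite(is_adult):
    """Returns True when every check passes; never probes the boundary value."""
    return is_adult(30) is True and is_adult(5) is False


def strong_suite(is_adult):
    """Same checks plus the boundary case age == 18."""
    return weak_suite(is_adult) and is_adult(18) is True


def mutant_killed(suite, mutated_fn):
    """A mutant is killed when the suite fails against the mutated code."""
    return not suite(mutated_fn)


weak_kills = mutant_killed(weak_suite, mutant)      # mutant survives the weak suite
strong_kills = mutant_killed(strong_suite, mutant)  # boundary test kills the mutant
```

Both suites pass against the original code, so line coverage alone would not tell them apart; only the surviving mutant reveals that the weak suite never exercises the boundary.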
11:00am - 11:30am

Managing Tradeoffs for Data Prefetching

The ability to prefetch data is a key lever in improving FBLite responsiveness. It gives the perception of instant data availability because content is served from a local cache. However, excessive prefetching can waste data on content the user never views and can cause performance regressions. This talk will explore the technical challenges we face when serving cached content to FBLite users, and how we balance data usage and resources while maximizing prefetching.
11:30am - 12:00pm

Break

12:00pm - 12:30pm

The world changed. Did our designs?

When we build systems, our designs and tradeoffs reflect the different scales of the system, such as the speed of disks and the latency of the network. They reflect the constraints and abilities of the underlying technologies. But as technology advances, some of these assumptions have become invalid. We are no longer running on the physical machines for which RDBMS systems were designed. SSDs changed pretty much everything in the storage world, but our software was designed for magnetic disks. And NVRAM? Operating system design is way off. This talk will show how changes in hardware technologies impact the design rationale of various systems, highlight the importance of understanding and rethinking that rationale, and explore new designs that arise from the new rationale.
12:30pm - 1:00pm

The journey for a new ORM in Go

Over the course of the last year, Go became the main programming language for developing services in Facebook Connectivity. Some of these services have a complicated data model with tens of types and relations. At Facebook, we like to think about our data model in graph concepts, and we've had a good experience with this model internally. The lack of a proper graph-based ORM for Go led us to write one and open-source it. In this talk, I’ll share the journey of taking this concept from idea to implementation and dive deep into some of the challenges and technical decisions.
1:00pm - 2:00pm

Lunch

2:00pm - 2:30pm

Memory Analysis @Scale

At Facebook, we run huge Java services, both in the size of a single process and in the scale of our server fleet. Facebook Lite is one of the dominant Java services within Facebook, serving hundreds of millions of users every month. The architecture of Facebook Lite is unique: it offloads the client’s typical work (data retrieval, business logic, layout calculation, etc.) to the server, which has caused it to evolve into a memory-bound service. This architecture provides clear advantages to Facebook Lite users and developers, but it also makes it difficult for service owners to keep the service healthy and safe from memory regressions. For instance, even a memory regression of 1% has significant stability and cost implications for our production system, so regressions should be detected and blocked as soon as possible. In this session, we will go through the evolution of the Facebook Lite service, from a time when it occasionally suffered from massive memory regressions that put it at risk, through building a scalable and advanced memory analysis infrastructure, to providing high-granularity memory visibility to developers and enabling them to push our service to its efficiency limits with massive memory wins.
2:30pm - 3:00pm

The Challenge to Align Data Points @Scale

At Singular, we combine data pulled periodically from 2500+ sources with streamlined data that we receive in real time. Joining these data sets, we encountered a few unique challenges: frequent changes in the periodically pulled data retroactively affect our real-time data, and periodic and real-time data arrive at different times yet must always be aligned and matched. In this session, we’ll share some of the tricks we use to keep the data aligned @ scale, including separating frequently and infrequently changed data to streamline alignment, detecting changes in the data using consistent hashing, and storing data so that changes can be applied efficiently with our bz2 inline-block edit optimization.
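As a rough illustration of hash-based change detection, the sketch below compares a stable content digest of each freshly pulled record against the digest stored from the previous pull. This is an assumption about the general approach, not Singular's implementation: it uses a plain SHA-256 content hash rather than whatever the talk means by consistent hashing, and the record fields and `campaign-*` keys are hypothetical.

```python
import hashlib
import json


def record_digest(record):
    """Return a stable SHA-256 digest of a JSON-serializable record."""
    # Sorting keys makes the digest independent of field order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_keys(previous_digests, fresh_records):
    """Return the keys whose records are new or differ from the stored digest."""
    return sorted(
        key
        for key, record in fresh_records.items()
        if previous_digests.get(key) != record_digest(record)
    )


# Digests saved from an earlier pull (hypothetical data).
stored = {
    "campaign-1": record_digest({"cost": 10.0, "clicks": 5}),
    "campaign-2": record_digest({"cost": 3.5, "clicks": 1}),
}

# A later pull of the same source.
fresh = {
    "campaign-1": {"clicks": 5, "cost": 10.0},  # unchanged, field order differs
    "campaign-2": {"cost": 4.0, "clicks": 1},   # cost was revised retroactively
    "campaign-3": {"cost": 1.0, "clicks": 0},   # new record
}

changed = changed_keys(stored, fresh)
```

Only the records listed in `changed` would need to be realigned with the real-time data; storing digests instead of full records keeps the comparison cheap.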
3:00pm - 3:30pm

Monorepos: Moving Fast in a Huge Repository

Keeping all of your code in a single repository has huge benefits, but comes with equally huge obstacles. In this session, I’ll talk about the challenges Facebook has faced with its massive codebase, and how we’re radically extending our source control system to enable our entire ecosystem of developer tools to remain fast in the face of tremendous growth. I’ll briefly introduce the concept of a monorepo, give a rough sense of our repository scale, describe the problems it causes in development (slow source control, slow builds, complex test infrastructure, difficulties maintaining release quality, etc.), and then cover a few source control innovations we’ve made to tackle these challenges.
3:30pm - 4:00pm

Operating low-latency fraud prevention systems at scale

At Forter, we’re on a mission to build the foundations for a more credible internet by blocking fraudsters and abusers on e-commerce platforms. To achieve that, we need to make millions of high-risk, low-latency decisions per day while processing billions of events. We’re doing all of this with a very lean and mean R&D team. We had to invent many solutions from the ground up, and we’ll share some of our insights with you.
4:00pm - 4:30pm

Detection & Alerting at FB: Detecting significant metric movements @ Scale

Monitoring metrics for any significant movements is key to detecting problems with systems and products. This talk provides an overview of our detection and alerting framework: the number of time series we monitor, the different detection algorithms we offer (rule-based and ML-based), and the ability to auto-slice data along multiple dimensions to identify deeper issues. Deriving signal without being inundated with noise is crucial at our scale, and we have built tools to empower teams to maintain a high signal-to-noise ratio. To cater to our future scale needs, we are currently focused on automatic monitoring: proactively logging and monitoring the right metrics for different artifacts, proactively analyzing any flagged events, and, we hope, predicting potential critical incidents.
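As one simple example of what a rule-based detector can look like, the sketch below flags a new data point when it lies more than a fixed number of standard deviations from the recent history. This is a generic z-score rule chosen for illustration, not the framework described in the talk; the function name, the threshold of 3.0, and the sample values are all assumptions.

```python
from statistics import mean, stdev


def is_significant(history, latest, threshold=3.0):
    """Flag `latest` if it deviates from `history` by more than `threshold` sigmas."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is a movement.
        return latest != history[0]
    return abs(latest - mean(history)) / spread > threshold


# Hypothetical recent values of a metric, e.g. requests per second.
recent = [100, 102, 98, 101, 99]

normal_point = is_significant(recent, 101)  # within normal variation
spike_point = is_significant(recent, 120)   # far outside normal variation
```

A fixed threshold like this is easy to reason about but tends to be noisy on seasonal or trending metrics, which is one reason a framework would offer several detection algorithms rather than a single rule.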
4:30pm - 4:45pm

Closing Remarks

4:45pm - 6:00pm

Networking Happy Hour
