Event times below are displayed in PT.
Data @Scale is an invitation-only technical conference for engineers working on large-scale storage systems and analytics. Building services that serve millions or even billions of people presents a set of complex, and often unprecedented, engineering challenges. The @Scale community is focused on bringing people together to openly discuss these challenges and collaborate on the development of new solutions.
For 2017, we’ll also be looking at how Big Data is transforming machine learning, even as new machine learning techniques are leading to an evolution in infrastructure, hardware engineering and data center design.
Join experts from Facebook, Google, LinkedIn, Microsoft, Pinterest, Uber and Yandex to openly discuss these challenges and collaborate on the development of new solutions.
This talk will address recent insights in distributed systems design for training machine learning models at scale.
Developing applications is hard; developing globally distributed applications with data at planet scale that are fast, scalable, elastic, always available, and yet simple is even harder. Yet it is a fundamental prerequisite for reaching people globally in our modern world. In this talk, I will describe the next generation of globally distributed databases at Microsoft that can run on millions of nodes across hundreds of data centers, handling up to trillions of data objects, 24/7, all backed by industry-leading comprehensive SLAs.
This session walks through the development of ClickHouse and how an iterative approach to data storage organization produced a system that can ingest clickstream data in real time, generate interactive reports on non-aggregated data, process 100 billion rows per second on HDDs, and scale linearly; it supports a SQL dialect and is open source.
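One reason a system like this can scan so many rows for analytical reports is column-oriented storage. The sketch below is not ClickHouse code, just a toy Python illustration of the idea; the table layout and data are invented.

```python
# Row-oriented clickstream events: every field of every row is materialized.
rows = [
    {"user_id": 1, "url": "/a", "duration_ms": 120},
    {"user_id": 2, "url": "/b", "duration_ms": 340},
    {"user_id": 1, "url": "/a", "duration_ms": 95},
]

# Column-oriented layout: one contiguous array per column.
columns = {
    "user_id": [1, 2, 1],
    "url": ["/a", "/b", "/a"],
    "duration_ms": [120, 340, 95],
}

# An aggregate like AVG(duration_ms) touches only one column's array,
# instead of deserializing every field of every row.
avg_duration = sum(columns["duration_ms"]) / len(columns["duration_ms"])
print(avg_duration)  # 185.0
```

On disk, the columnar arrays also compress far better than interleaved rows, which is part of why HDD-backed scans can stay fast.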
This talk will cover the evolution of storage and serving at scale as Pinterest has grown. We started with a Python Django application backed by Memcached and un-sharded MySQL. To make both persistent storage and caching horizontally scalable and enable future growth, we sharded MySQL and introduced Twemproxy (the Memcached proxy from Twitter) into our stack. As the organization scaled, we introduced a graph store as a service (Zen) and a key-value store as a service (UMS) to boost developer velocity and enable rapid product innovation. To make Pinterest even more relevant and more real-time, the team then built a machine learning serving platform on C++, Folly, FB Thrift, and RocksDB, taking on the challenge of efficiently serving feeds, with complicated machine-learned ranking models and a huge number of features scattered across many data sets, at very low latency to deliver delightful experiences.
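The core of a sharded-MySQL setup like the one described above is a deterministic routing function from an entity ID to a shard. This is a hypothetical sketch, not Pinterest's actual scheme; the shard count and host naming are invented.

```python
NUM_SHARDS = 4096  # fixed number of logical shards, mapped onto physical hosts

def shard_for(user_id: int) -> int:
    """Route a user's data to a logical shard by hashing the ID."""
    return user_id % NUM_SHARDS

def connection_string(user_id: int) -> str:
    """Map the logical shard to a (hypothetical) host and database name."""
    shard = shard_for(user_id)
    host = f"mysql-{shard // 256:03d}.example.com"  # 256 logical shards per host
    return f"{host}/db{shard}"

print(shard_for(123456))            # 576
print(connection_string(123456))    # mysql-002.example.com/db576
```

Fixing the number of logical shards up front means rebalancing later only moves whole logical shards between hosts, without rehashing any keys.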
Cadence is an open source solution for building and running microservices that expose asynchronous, long-running operations in a scalable and resilient way. It borrows many ideas from the AWS Simple Workflow service. It is written in Go and relies on Cassandra for storage.
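One building block of such systems is automatically retrying the individual activities that make up a long-running operation. The following is a toy Python sketch of that idea, not the Cadence API; all names here are hypothetical.

```python
import time

def run_with_retries(activity, max_attempts=3, backoff_s=0.0):
    """Retry a failing activity, as a durable workflow engine would,
    instead of surfacing transient failures to the caller."""
    for attempt in range(1, max_attempts + 1):
        try:
            return activity(attempt)
        except RuntimeError:
            if attempt == max_attempts:
                raise  # exhausted the retry budget
            time.sleep(backoff_s)

def flaky_activity(attempt):
    # Simulates a transient failure: fails on the first attempt only.
    if attempt < 2:
        raise RuntimeError("transient failure")
    return "done"

print(run_with_retries(flaky_activity))  # done
```

A real workflow engine additionally persists each activity's result, so a crashed worker can resume from the last completed step rather than from the beginning.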
LinkedIn is a deeply data-driven organization where metrics measurement and experimentation play a crucial role in every product decision. Drawing on our experience at LinkedIn, we built UMP and XLNT, platforms for metrics computation and experimentation, respectively. Over the last few years, these platforms have allowed us to perform measurement and experimentation efficiently at scale while preserving trust in data.
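As a concrete example of the kind of computation an experimentation platform automates at scale, here is a two-proportion z-test comparing conversion rates between a treatment and a control group. The counts are made up for illustration; this is not XLNT's implementation.

```python
from math import sqrt

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for the difference between two conversion rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Treatment: 1,200 of 10,000 converted; control: 1,000 of 10,000.
z = two_proportion_z(1200, 10_000, 1000, 10_000)
print(round(z, 2))  # 4.52 — |z| > 1.96, significant at the 5% level
```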
Spanner is a globally-distributed data management system that backs hundreds of mission-critical services at Google. Spanner is built on ideas from both the systems and database communities. Initially, Spanner focused on the systems aspects such as scalability, automatic sharding, fault tolerance, consistent replication, external consistency, and wide-area distribution. More recently, we have been working on turning Spanner into a SQL DBMS. In this talk, we describe distributed query execution in the presence of resharding, query restarts upon transient failures, range extraction that drives query routing and index seeks, and the improved blockwise-columnar storage format. We touch upon migrating Spanner to the common SQL dialect shared with other systems at Google. The talk is based on a paper published at SIGMOD'17.
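The "range extraction that drives query routing" mentioned above can be illustrated with a small sketch: derive key ranges from a query predicate, then contact only the shards whose ranges can hold matching rows. The shard map and predicate below are invented for the example; this is not Spanner's implementation.

```python
# Each shard owns a half-open primary-key range [lo, hi).
SHARDS = [("s0", 0, 1000), ("s1", 1000, 2000), ("s2", 2000, 3000)]

def shards_for_range(lo: int, hi: int):
    """Return the shards whose key range overlaps the extracted range [lo, hi)."""
    return [name for name, s_lo, s_hi in SHARDS if s_lo < hi and lo < s_hi]

# WHERE key BETWEEN 1500 AND 2200  ->  extracted range [1500, 2201)
print(shards_for_range(1500, 2201))  # ['s1', 's2']
```

Pruning shards this way lets a distributed query touch a small fraction of the fleet, and the same extracted ranges can drive index seeks within each shard.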
The incipient end of Moore’s Law will drive a paradigm shift in our systems and architectures. Ever increasing computational needs, driven in part by big data and machine learning, will force our systems to become more heterogeneous. We must change how we think about large-scale systems, with new stacks and new interfaces. I will describe some of Microsoft’s efforts along these lines, one of which is large-scale deployments of programmable hardware in our cloud, including both the hardware and the resource management interfaces.
Facebook uses diverse storage systems and compute tools spread across data centers worldwide, together forming a global system for data storage and processing. A bulk data movement service that "just works" underpins this global system. Bulk data movement supports use cases such as commissioning / decommissioning, rebalancing, disaster readiness, replication for local availability, and secure backups. While serving these needs, the system must be a "good citizen" to storage systems and the global network. This talk describes Facebook's system for bulk data movement across storage systems worldwide.