
Data @Scale 2017

JUNE 08, 2017 @ 10:00 AM - JUNE 08, 2017 @ 06:00 PM PT
Designed for engineers who are interested in building, operating, and using data systems at scale. Data already enables companies to build products with user empathy, find new market opportunities, understand trends, make better decisions, and ensure that their services and systems stay healthy.

ABOUT EVENT

Data @Scale is an invitation-only technical conference for engineers working on large-scale storage systems and analytics. Building services that serve millions or even billions of people presents a set of complex, and often unprecedented, engineering challenges. The @Scale community is focused on bringing people together to openly discuss these challenges and collaborate on the development of new solutions.

For 2017, we’ll also be looking at how Big Data is transforming machine learning, even as new machine learning techniques are leading to an evolution in infrastructure, hardware engineering and data center design.

Join experts from Facebook, Google, LinkedIn, Microsoft, Pinterest, Uber and Yandex to openly discuss these challenges and collaborate on the development of new solutions.

EVENT AGENDA

Event times below are displayed in PT.

June 8

08:30 AM - 10:00 AM
Registration and Breakfast
08:15 AM - 09:45 AM
Women in Engineering Breakfast & Panel
SPEAKER Fiona Fung,Facebook
SPEAKER Hyma Murthy,Google
SPEAKER Kim Hazelwood,Facebook
SPEAKER Vidhya Srinivasan,Amazon
10:00 AM - 10:05 AM
Welcome
10:05 AM - 10:45 AM
Accelerating Machine Learning for Computer Vision

This talk will address recent insights in distributed systems design for training machine learning models at scale.

SPEAKER Pieter Noordhuis,Facebook
10:45 AM - 11:25 AM
Next Generation of Globally-Distributed Databases in Azure

Developing applications is hard; developing globally-distributed applications with data at planet scale that are fast, scalable, elastic, always available, and yet simple is even harder. Yet it is a fundamental prerequisite for reaching people globally in our modern world. In this talk, I will describe the next generation of globally-distributed databases at Microsoft that can run on millions of nodes across hundreds of data centers, handling up to trillions of data objects, 24/7, all backed by industry-leading comprehensive SLAs.

SPEAKER Rimma Nehme,Microsoft
11:25 AM - 12:05 PM
Yandex ClickHouse - A DBMS for Interactive Analytics at Scale

This session walks through the development of ClickHouse and how an iterative approach to data storage organization resulted in a system that can ingest clickstream data in real time, generate interactive reports on non-aggregated data, process 100 billion rows per second on HDDs, scale linearly, and support a dialect of SQL, and that is open source.

SPEAKER Alexey Milovidov,Yandex
12:05 PM - 12:55 PM
Lunch
12:55 PM - 01:35 PM
Evolution of Storage and Serving at Pinterest

This talk covers the evolution of storage and serving at scale as Pinterest has grown. We started with a Python Django application with Memcached on unsharded MySQL. To make both persistent storage and caching horizontally scalable and enable future growth, we sharded MySQL and introduced Twemproxy (a Memcached proxy from Twitter) into our stack. As the organization scaled, we introduced a graph store as a service (Zen) and a KV store as a service (UMS) to boost developer velocity and enable rapid product innovation. To make Pinterest even more relevant and more real-time, the team is now building a machine learning serving platform with C++, Folly, FB Thrift, and RocksDB. The platform addresses new challenges in efficiently serving feeds with complicated machine-learned ranking models and a huge number of features scattered across many data sets, at very low latency, to deliver delightful experiences.

SPEAKER Yongsheng Wu,Pinterest
01:35 PM - 02:15 PM
Cadence - Microservice Architecture Beyond Request/Reply

Cadence is an open source solution for building and running microservices that expose asynchronous, long-running operations in a scalable and resilient way. It borrows many ideas from the AWS Simple Workflow service. It is written in Go and relies on Cassandra for storage.

SPEAKER Maxim Fateev,Uber
02:15 PM - 02:55 PM
How Reporting and Experimentation Fuel Product Innovation at LinkedIn

LinkedIn is a deeply data-driven organization where metrics measurement and experimentation play a crucial role in every product decision. In response to our experience at LinkedIn, we built UMP and XLNT, platforms for metrics computation and experimentation, respectively. Over the last few years, these platforms have allowed us to perform measurement and experimentation very efficiently at scale, while preserving trust in data.

SPEAKER Kapil Surlaker,LinkedIn
02:55 PM - 03:35 PM
Spanner's SQL Evolution

Spanner is a globally-distributed data management system that backs hundreds of mission-critical services at Google. Spanner is built on ideas from both the systems and database communities. Initially, Spanner focused on the systems aspects such as scalability, automatic sharding, fault tolerance, consistent replication, external consistency, and wide-area distribution. More recently, we have been working on turning Spanner into a SQL DBMS. In this talk, we describe distributed query execution in the presence of resharding, query restarts upon transient failures, range extraction that drives query routing and index seeks, and the improved blockwise-columnar storage format. We touch upon migrating Spanner to the common SQL dialect shared with other systems at Google. The talk is based on a paper published at SIGMOD'17.

SPEAKER Sergey Melnik,Google
03:35 PM - 03:50 PM
Break
03:50 PM - 04:30 PM
Architectures for the New Era of Cloud Specialization

The incipient end of Moore’s Law will drive a paradigm shift in our systems and architectures. Ever increasing computational needs, driven in part by big data and machine learning, will force our systems to become more heterogeneous. We must change how we think about large-scale systems, with new stacks and new interfaces. I will describe some of Microsoft’s efforts along these lines, one of which is large-scale deployments of programmable hardware in our cloud, including both the hardware and the resource management interfaces.

SPEAKER Doug Burger,Microsoft
04:30 PM - 05:00 PM
Bulk Data Movement Serving Facebook's Global Data Storage and Processing

Facebook uses diverse storage systems and compute tools spread across data centers worldwide, together forming a global system for data storage and processing. A bulk data movement service that "just works" helps this global system. Bulk data movement supports use cases such as commissioning / decommissioning, rebalancing, disaster readiness, replication for local availability, and secure backups. While serving these needs, the system must be a "good citizen" to storage systems and the global network. This talk describes Facebook's system for bulk data movement across storage systems worldwide.

SPEAKER Steve Stroiney,Facebook
05:00 PM - 05:05 PM
Closing Remarks
05:05 PM - 06:00 PM
Happy Hour

SPEAKERS AND MODERATORS

Fiona Fung

Facebook

Hyma Murthy

Google

Kim Hazelwood

Facebook

Vidhya Srinivasan

Amazon

Pieter Noordhuis

Facebook

Rimma Nehme

Microsoft

Alexey Milovidov

Yandex

Yongsheng Wu

Pinterest

Maxim Fateev

Uber

Kapil Surlaker

LinkedIn

Sergey Melnik

Google

Doug Burger

Microsoft

Steve Stroiney

Facebook
