Data @Scale 2017

Bell Harbor International Conference Center 10:00am - 6:00pm

Event Completed

Data @Scale is an invitation-only technical conference for engineers working on large-scale storage systems and analytics. Building services that serve millions or even billions of people presents a set of complex, and often unprecedented, engineering challenges. The @Scale community is focused on bringing people together to openly discuss these challenges and collaborate on the development of new solutions.

For 2017, we’ll also be looking at how Big Data is transforming machine learning, even as new machine learning techniques are leading to an evolution in infrastructure, hardware engineering and data center design.

Join experts from Facebook, Google, LinkedIn, Microsoft, Pinterest, Uber and Yandex to openly discuss these challenges and collaborate on the development of new solutions.

Read More Read Less

@Scale brings thousands of engineers together throughout the year to discuss complex engineering challenges and to work on the development of new solutions. We're committed to providing a safe and welcoming environment — one that encourages collaboration and sparks innovation.

Every @Scale event participant has the right to enjoy his or her experience without fear of harassment, discrimination, or condescension. The @Scale code of conduct outlines the behavior that we support and don't support at @Scale events and conferences. We expect participants to follow these rules at all @Scale event venues, online communities, and event-related social activities. These guidelines will keep the @Scale community a safe and enjoyable one for everyone.

Be welcoming. Everyone is welcome at @Scale events, inclusive of (but not limited to) gender, gender identity or expression, sexual orientation, body size, differing abilities, ethnicity, national origin, language, religion, political beliefs, socioeconomic status, age, color and neurodiversity. We have a zero-tolerance policy for discrimination.

Choose your words carefully. Treat one another with respect and in a professional manner. We're here to collaborate. Conflict is not part of the equation.

Know where the line is, and don't cross it. Harassment, threats, or intimidation of any kind will not be tolerated. This includes verbal, physical, sexual (such as sexualized imagery on clothing, presentations, in print, or onscreen), written, or any other form of aggression (whether outright, subtle, or micro). Behavior that is offensive, as determined by @Scale organizers, security staff, or conference management, will not be tolerated. Participants who are asked to stop a behavior or an action are expected to comply immediately or will be asked to leave.

Don't be afraid to call out bad behavior. If you're the target of harmful or offensive behavior, or if you witness someone else being harassed, threatened, or intimidated, don't look away. Tell an @Scale staff member, a security staff member, or a conference organizer immediately. Please notify our event staff, security staff, or conference organizers of any harmful or offensive behavior that you've experienced or witnessed in any form, whether in person or online.

We at @Scale want our events to be safe for everyone, and we have a zero-tolerance policy for violations of our code of conduct. @Scale conference organizers will investigate any allegation of problematic behavior, and we will respond accordingly. We reserve the right to take any follow-up actions we determine are needed. These include being warned, being refused admittance, being ejected from the conference with no refund, and being banned from future @Scale events.

Event Completed
8:30am - 10:00am

Registration and Breakfast

8:15am - 9:45am

Women in Engineering Breakfast & Panel

10:00am - 10:05am


10:05am - 10:45am

Accelerating Machine Learning for Computer Vision

This talk will address recent insights in distributed systems design for training machine learning models at scale.
10:45am - 11:25am

Next Generation of Globally-Distributed Databases in Azure

Developing applications is hard, developing globally-distributed applications with data at planet-scale that are fast, scalable, elastic, always available and yet simple - is even harder. Yet it is a fundamental pre-requisite in reaching people globally in our modern world. In this talk, I will describe the next generation of globally- distributed databases at Microsoft that can run on millions of nodes across hundreds of data centers, handling up to trillions of data objects, 24/7 – all backed by industry-leading comprehensive SLAs.
11:25am - 12:05pm

Yandex Clickhouse - A DBMS for Interactive Analytics at Scale

This session walks through the development of ClickHouse, and how an iterative approach to data storage organization resulted in a system that: can ingest clickstream data in realtime, generate interactive reports on non-aggregated data, process 100 billion rows per second on HDDs, scales linearly, supports the SQL language dialect and is open source.
12:05pm - 12:55pm


12:55pm - 1:35pm

Evolution of Storage and Serving at Pinterest

This talk will cover the evolution of storage and serving at scale as Pinterest grows. We started with a Python Django application with Memcached on un-sharded MySQL. We had to have MySQL sharded and introduce Twemproxy (Memcached Proxy from Twitter) into our stack to make both persistent storage and caching horizontally scalable to enable future growth of Pinterest. As we scale the organization, we introduced graph store as a service (Zen) and KV store as a service (UMS) to boost developer velocity to enable rapid product innovations. As the journey goes on, to make Pinterest even more relevant, even more real-time, the team takes on the journey to build a machine learning serving platform with C++, Folly, FB Thrift and RocksDB to address new challenges on how to efficiently serve feeds with complicated machine learned ranking models, huge number of features scattered across many data sets with very low latency to deliver delightful experiences.
1:35pm - 2:15pm

Cadence - Microservice Architecture Beyond Request/Reply

Cadence is an open source solution for building and running microservices that expose asynchronous, long-running operations a scalable and resilient way. It borrows a lot of ideas from AWS Simple Workflow service. It is written in Go and relies on Cassandra for storage.
2:15pm - 2:55pm

How Reporting and Experimentation Fuel Product Innovation at LinkedIn

LinkedIn is a deeply data driven organization where metrics measurement and experimentation plays a crucial role in every product decision. In response to our experience at LinkedIn, we built UMP and XLNT, platforms for metrics computation and experimentation, respectively. Over the last few years, these platforms have allowed us to perform measurement and experimentation very efficiently at scale, while preserving trust in data.
2:55pm - 3:35pm

Spanner's SQL Evolution

Spanner is a globally-distributed data management system that backs hundreds of mission-critical services at Google. Spanner is built on ideas from both the systems and database communities. Initially, Spanner focused on the systems aspects such as scalability, automatic sharding, fault tolerance, consistent replication, external consistency, and wide-area distribution. More recently, we have been working on turning Spanner into a SQL DBMS. In this talk, we describe distributed query execution in the presence of resharding, query restarts upon transient failures, range extraction that drives query routing and index seeks, and the improved blockwise-columnar storage format. We touch upon migrating Spanner to the common SQL dialect shared with other systems at Google. The talk is based on a paper published at SIGMOD'17.
3:35pm - 3:50pm


3:50pm - 4:30pm

Architectures for the New Era of Cloud Specialization

The incipient end of Moore’s Law will drive a paradigm shift in our systems and architectures. Ever increasing computational needs, driven in part by big data and machine learning, will force our systems to become more heterogeneous. We must change how we think about large-scale systems, with new stacks and new interfaces. I will describe some of Microsoft’s efforts along these lines, one of which is large-scale deployments of programmable hardware in our cloud, including both the hardware and the resource management interfaces.
4:30pm - 5:00pm

Bulk Data Movement Serving Facebook's Global Data Storage and Processing

Facebook uses diverse storage systems and compute tools spread across data centers worldwide, together forming a global system for data storage and processing. A bulk data movement service that "just works" helps this global system. Bulk data movement supports use cases such as commissioning / decommissioning, rebalancing, disaster readiness, replication for local availability, and secure backups. While serving these needs, the system must be a "good citizen" to storage systems and the global network. This talk describes Facebook's system for bulk data movement across storage systems worldwide.
5:00pm - 5:05pm

Closing Remarks

5:05pm - 6:00pm

Happy Hour

Join the @Scale Mailing List and Get the Latest News & Event Info

Code of Conduct

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy