Data @Scale 2024

May 22, 2024

Data @Scale is a technical conference for engineers who are interested in building, operating, and using data systems at scale. Companies across the industry use data and underlying infrastructure to build products with user empathy, find new market opportunities, understand trends, make better decisions, and ensure that their products and systems stay healthy. Generative AI is evolving the landscape of data infrastructure at extreme scale, imposing unique and complex engineering problems that many of us are excited to solve.

On May 22, speakers from Amazon Web Services, Google, Meta, MotherDuck, Razorpay and Snowflake will delve into the fresh challenges and opportunities posed by Generative AI for data infrastructure. The virtual conference will feature keynote presentations, tech talks, Q&A sessions, and a fireside chat.

EVENT AGENDA

Event times below are displayed in PT.

May 22

09:00 AM - 09:05 AM

Opening Remarks

Speaker Maor Kleider, Meta

Speaker Jelena Pješivac-Grbović, Meta

09:05 AM - 09:25 AM

The AI-First Data Infrastructure

Barak Yagour is a Director of Engineering at Meta, responsible for Data Infrastructure. Over the last seven years at Meta, Barak has built a world-class team that’s now responsible for some of the largest data systems on the planet. The Data Infra teams are focused on evolving and scaling our Analytics, Monitoring and Observability, ML/AI Infrastructure, as well as our Responsibility and Privacy Infra efforts. These systems and tools allow everyone at Meta to make better data-driven decisions and do this faster. The work spans an extremely broad range of problems, touching all levels of the software stack and a wide range of systems and platforms. Prior to joining Meta, Barak held leadership roles in small startups and in major corporations, tackling complex problems in the space of Cyber, Messaging, Networking, Virtualization and Machine Learning.

Speaker Barak Yagour, Meta

09:25 AM - 09:55 AM

Navigating Data's Next Great Shift

Data is the foundational building block of GenAI. As we navigate this transition, we face exciting new opportunities as well as long-standing challenges in the data space. In this conversation, we’ll explore these opportunities as well as lessons we can draw from the past. We’ll cover some of the things that need to be built, thoughts on ensuring that we’re building for the good of everyone, and the need to push this space forward together as a community.

Speaker DJ Patil, General Partner, GreatPoint Ventures and Former U.S. Chief Data Scientist

Speaker Daniel Francisco, Meta

09:55 AM - 10:15 AM

Composable Data Management Systems

In this talk, we will present how Meta has evolved the traditional monolithic way of developing data management systems into a composable architecture that promotes reusability and improves engineering efficiency. We will discuss the new reference architecture and present the Velox open source project, highlighting its impact at Meta and the open source community. We will also discuss related projects and future work in this space.

Speaker Pedro Pedreira, Meta

Speaker Amit Purohit, Meta

ADDITIONAL RESOURCES

COMPOSABLE DATA MANAGEMENT AT META

10:15 AM - 10:35 AM

Demystifying the Data Stack of the Largest and Fastest Growing Payment Gateway in India: Razorpay

Razorpay stands as India's fastest-growing payment aggregator, processing over $150 billion USD annually in Total Payment Volume, with a global footprint across various regions. This is possible because our data stack is built to handle challenges in functionality, scale, availability, risk, security, and compliance, all with cost efficiency.

Intrigued? Wanna know how? Join my session as I share the secret sauce of our stack.

You will also hear about a few tough, unsolved problems that we intend to take head-on in the coming months!

Speaker Murali Brahmadesam, Razorpay

10:35 AM - 10:40 AM

Break

10:40 AM - 11:00 AM

Scaling Meta’s Infra with GenAI: Journey to Faster and Smarter Incident Response

With billions of active users, Meta's incident response process is critical to maintaining our reliability commitments. In this talk, we explore the challenges we face due to the complexity and scale of our operations and how we are leveraging AI to revolutionize onboarding responders and root cause analysis. Join us to learn about our journey, the lessons we've learned, and our vision for the future of incident response.

Speaker Diana Hsu, Meta

Speaker Mohamed Farrag, Meta

11:00 AM - 11:20 AM

A Case Study in Bridging Production Software and Data Practices for LLM Model Training Using Snowflake

We present a case study detailing the use of Snowflake, a cloud-based data platform, in various stages of the LLM data pipeline, from initial annotation to LLM model productionization. The system we have built brings production software and data practices to the field of LLM model training. We describe how every step in the system is built, including data annotation, filtering, global deduplication, decontamination, and tokenization. We show how the data engineering capabilities of a cloud warehouse like Snowflake can be used to enhance data exploration, LLM data ablations, and experimentation. A key aspect of LLM productionization that we cover is data lineage tracking, facilitated by output cards at each stage of the data pipelines, ensuring transparency and traceability throughout the LLM model development lifecycle.
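As a rough illustration of the idea of a staged pipeline that emits an output card per stage, here is a minimal Python sketch. It is not the speakers' system and does not use Snowflake; the `OutputCard` fields, the stage functions, the word-count threshold, and the benchmark snippet are all hypothetical, and annotation is omitted. Deduplication here is a simple exact match after normalization, which is only a stand-in for whatever global deduplication the talk describes.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class OutputCard:
    """Hypothetical lineage record written after each stage."""
    stage: str
    rows_in: int
    rows_out: int
    digest: str  # short hash of the stage output, for traceability


def _digest(rows):
    # Hash the repr of every row so any change in output changes the digest.
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode("utf-8"))
    return h.hexdigest()[:12]


def run_stage(name, fn, rows, cards):
    # Run one stage and append its output card to the lineage list.
    out = fn(rows)
    cards.append(OutputCard(name, len(rows), len(out), _digest(out)))
    return out


def filter_short(docs, min_words=3):
    # Assumed quality filter: drop documents with fewer than min_words words.
    return [d for d in docs if len(d.split()) >= min_words]


def dedup(docs):
    # Exact-match dedup on lowercased, whitespace-normalized text.
    seen, out = set(), []
    for d in docs:
        key = " ".join(d.lower().split())
        if key not in seen:
            seen.add(key)
            out.append(d)
    return out


# Hypothetical evaluation snippet that must not leak into training data.
BENCHMARK_SNIPPETS = ("what is the capital of france",)


def decontaminate(docs):
    return [d for d in docs
            if not any(s in d.lower() for s in BENCHMARK_SNIPPETS)]


def tokenize(docs):
    # Placeholder whitespace tokenizer; a real pipeline would use a trained one.
    return [d.lower().split() for d in docs]


def run_pipeline(docs):
    cards = []
    rows = docs
    for name, fn in [("filter", filter_short), ("dedup", dedup),
                     ("decontaminate", decontaminate), ("tokenize", tokenize)]:
        rows = run_stage(name, fn, rows, cards)
    return rows, cards
```

Calling `run_pipeline(docs)` returns the tokenized documents together with one card per stage, so row counts and output digests can be compared across runs.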

Speaker Nathan Wiegand, Snowflake

Speaker Kelvin So, Snowflake

11:20 AM - 11:45 AM

Data @Scale Live Q&A Session #1

Moderator Manju Anand, Meta

Speaker Pedro Pedreira, Meta

Speaker Amit Purohit, Meta

Speaker Murali Brahmadesam, Razorpay

Speaker Diana Hsu, Meta

Speaker Mohamed Farrag, Meta

Speaker Nathan Wiegand, Snowflake

Speaker Kelvin So, Snowflake

11:45 AM - 12:05 PM

Lunch Break

12:05 PM - 12:25 PM

Large-Scale Data Graph: Scale and Optimize Privacy & Security in Offline Data Systems

Meta operates large-scale offline data systems across Data Warehouse, Stream Data Processing, and Monitoring & Observability. Privacy, security, and other cross-cutting management and governance concerns are critical in the data infrastructure. This talk explains how we map out the offline data systems using a unified metadata representation of assets, link them through their relationships and the flow of information into a giant graph, which we call “Data Graph”, and apply it to privacy and security.
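To make the graph idea concrete, the following Python sketch models assets as nodes with a shared metadata shape and data flows as directed edges, then propagates a sensitivity label downstream. This is an illustrative assumption, not Meta's Data Graph: the class name, the metadata fields, the asset IDs, and the label-propagation rule are all hypothetical.

```python
from collections import defaultdict, deque


class DataGraph:
    """Toy asset graph: nodes carry uniform metadata, edges are data flows."""

    def __init__(self):
        self.meta = {}                 # asset_id -> {"kind": str, "labels": set}
        self.edges = defaultdict(set)  # asset_id -> set of downstream asset_ids

    def add_asset(self, asset_id, kind, labels=()):
        self.meta[asset_id] = {"kind": kind, "labels": set(labels)}

    def add_flow(self, src, dst):
        self.edges[src].add(dst)

    def downstream(self, start):
        # Breadth-first search over flow edges; the seen set makes cycles safe.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in self.edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def propagate_labels(self):
        # Assumed rule: every asset inherits the labels of all upstream assets.
        inherited = {a: set(m["labels"]) for a, m in self.meta.items()}
        for asset, m in self.meta.items():
            if m["labels"]:
                for d in self.downstream(asset):
                    inherited.setdefault(d, set()).update(m["labels"])
        return inherited
```

With a table labeled as sensitive that feeds a stream, which in turn feeds a dashboard, `propagate_labels()` reports the label on all three assets, which is the kind of question a privacy or security check might ask of such a graph.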

Speaker Can Lin, Meta

Speaker David Taieb, Meta

Featured Article

LARGE-SCALE DATA GRAPH: SCALE OFFLINE PRIVACY & SECURITY

12:25 PM - 12:40 PM

Taking Flight with Interactive Analytics

Modern data processing systems were designed at a time when hardware looked very different than it does today. And as the early hype around big data settled, it became clear that most analytics only use a subset of the data that is available.

Speaker Frances Perry, MotherDuck

12:40 PM - 01:00 PM

Smarter, Faster Data Analytics with Generative AI and Machine Learning

Data volume continues to increase, and so does the need for a large number of data workers to derive insights from data. This puts pressure on data teams and requires scalable data infrastructure that can be programmed easily, even by low-code developers, and operated seamlessly at scale without active intervention by data operators. This talk discusses the key challenges and new innovations using GenAI and ML that help address them, helping enterprises accelerate their business through efficient data analysis.

Speaker Santosh Chandrachood, AWS

01:00 PM - 01:20 PM

Beam Up Your GenAI Usage: Usability, Efficiency, Reliability with Apache Beam

Apache Beam, a data processing framework, has seen a surge in adoption for GenAI use cases. This growth can be attributed to our focus on building a user-friendly, efficient, and reliable platform. Beam ML empowers GenAI practitioners to dedicate their expertise to core tasks by removing technical hurdles and streamlining workflows. This approach has unlocked exponential growth, attracting industry leaders like Spotify and Google, alongside numerous mid-sized and smaller businesses, to leverage Beam ML for their production workloads. Join this talk to delve into the key learnings from Beam's ML journey and discover how you can accelerate your own GenAI initiatives.

Speaker Ahmet Altay, Google

01:20 PM - 01:25 PM

Break

01:25 PM - 01:50 PM

Data @Scale Live Q&A Session #2

Moderator Manju Anand, Meta

Speaker Can Lin, Meta

Speaker David Taieb, Meta

Speaker Frances Perry, MotherDuck

Speaker Ahmet Altay, Google

01:50 PM - 02:20 PM

Fireside Chat: Evolution of AI-First Data Infrastructure

The fireside chat will focus on the key trends that have emerged over the past year and a half in data infrastructure, delve into the biggest challenges in building a world-class, AI-first data infrastructure, and discuss how Meta approaches problem selection for research.

Moderator Manju Anand, Meta

Speaker Aparna Ramani, Meta

Speaker Delia David, Meta

02:20 PM - 02:25 PM

Closing Remarks

Speaker Maor Kleider, Meta

Speaker Jelena Pješivac-Grbović, Meta

SPEAKERS AND MODERATORS

Maor Kleider is a Director of Product Management at Meta, supporting the product team...

Maor Kleider

Meta

Dr. Jelena Pješivac-Grbović is an engineering director in Data Infrastructure at Meta. Her teams...

Jelena Pješivac-Grbović

Meta

Barak Yagour is the Vice President of Engineering at Meta, leading the Data Infrastructure...
