TOPIC: Data, Systems and Networking

Data @Scale 2024

May 22, 2024

Data @Scale is a technical conference for engineers who are interested in building, operating, and using data systems at scale. Companies across the industry use data and underlying infrastructure to build products with user empathy, find new market opportunities, understand trends, make better decisions, and ensure that their products and systems stay healthy. Generative AI is reshaping data infrastructure at extreme scale, posing unique and complex engineering problems that many of us are excited to solve.

On May 22, speakers from Amazon Web Services, Google, Meta, MotherDuck, Razorpay and Snowflake will delve into the fresh challenges and opportunities posed by Generative AI for data infrastructure. The virtual conference will feature keynote presentations, tech talks, Q&A sessions, and a fireside chat.

Register today and check back for upcoming speaker and agenda announcements!


EVENT AGENDA

Event times below are displayed in PT.

May 22

09:00 AM - 09:05 AM
Opening Remarks
Speaker Maor Kleider, Meta
Speaker Jelena Pješivac-Grbović, Meta
09:05 AM - 09:25 AM
The AI-First Data Infrastructure

Barak Yagour is a Director of Engineering at Meta, responsible for Data Infrastructure. Over the last seven years at Meta, Barak has built a world-class team that’s now responsible for some of the largest data systems on the planet. The Data Infra teams are focused on evolving and scaling our Analytics, Monitoring and Observability, ML/AI Infrastructure, as well as our Responsibility and Privacy Infra efforts. These systems and tools allow everyone at Meta to make better data-driven decisions, and to make them faster. The work spans an extremely broad range of problems, touching all levels of the software stack and a wide range of systems and platforms. Prior to joining Meta, Barak held leadership roles in small startups and in major corporations, tackling complex problems in the space of Cyber, Messaging, Networking, Virtualization and Machine Learning.

Speaker Barak Yagour, Meta
09:25 AM - 09:55 AM
Navigating Data's Next Great Shift

Data is the foundational building block of GenAI. As we navigate this transition, we face exciting new opportunities as well as long-standing challenges in the data space. In this conversation we’ll explore these opportunities as well as lessons we can draw from the past. We’ll cover some of the things that need to be built, thoughts on ensuring that we’re building for the good of everyone, and the need to push this space forward together as a community.

Speaker DJ Patil, General Partner, GreatPoint Ventures and Former U.S. Chief Data Scientist
Speaker Daniel Francisco, Meta
09:55 AM - 10:15 AM
Composable Data Management Systems

In this talk, we will present how Meta has evolved the traditional monolithic way of developing data management systems into a composable architecture that promotes reusability and improves engineering efficiency. We will discuss the new reference architecture and present the Velox open source project, highlighting its impact at Meta and in the open source community. We will also discuss related projects and future work in this space.

Speaker Pedro Pedreira, Meta
Speaker Amit Purohit, Meta
ADDITIONAL RESOURCES
COMPOSABLE DATA MANAGEMENT AT META
10:15 AM - 10:35 AM
Demystifying the Data Stack of the Largest and Fastest Growing Payment Gateway in India: Razorpay

Razorpay stands as India's fastest-growing payment aggregator, processing over $150 billion (USD) annually in Total Payment Volume, with a global footprint across various regions. This is possible because our data stack is built to handle challenges in functionality, scale, availability, risk, security, and compliance, all with cost efficiency.

Intrigued? Want to know how? Join my session as I share the secret sauce of our stack.

You will also hear about a few unsolved, tough problems that we intend to take head-on in the coming months!

Speaker Murali Brahmadesam, Razorpay
10:35 AM - 10:40 AM
Break
10:40 AM - 11:00 AM
Scaling Meta’s Infra with GenAI: Journey to Faster and Smarter Incident Response

With billions of active users, Meta's incident response process is critical to maintaining our reliability commitments. In this talk, we explore the challenges we face due to the complexity and scale of our operations and how we are leveraging AI to revolutionize responder onboarding and root cause analysis. Join us to learn about our journey, the lessons we've learned, and our vision for the future of incident response.

Speaker Diana Hsu, Meta
Speaker Mohamed Farrag, Meta
11:00 AM - 11:20 AM
A Case Study in Bridging Production Software and Data Practices for LLM Model Training Using Snowflake

We present a case study detailing the use of Snowflake, a cloud-based data platform, in various stages of the LLM data pipeline, from initial annotation to LLM productionization. The system we have built brings production software and data practices to the field of LLM training. We describe how every step in the system is built, including data annotation, filtering, global deduplication, decontamination, and tokenization. We show how the data engineering capabilities of a cloud warehouse like Snowflake can be used to enhance data exploration, LLM data ablations, and experimentation. A key aspect of LLM productionization that we cover is data lineage tracking, facilitated by output cards at each stage of the data pipelines, ensuring transparency and traceability throughout the LLM development lifecycle.

Speaker Nathan Wiegand, Snowflake
Speaker Kelvin So, Snowflake
11:20 AM - 11:45 AM
Data @Scale Live Q&A Session #1
Moderator Manju Anand, Meta
Speaker Pedro Pedreira, Meta
Speaker Amit Purohit, Meta
Speaker Murali Brahmadesam, Razorpay
Speaker Diana Hsu, Meta
Speaker Mohamed Farrag, Meta
Speaker Nathan Wiegand, Snowflake
Speaker Kelvin So, Snowflake
11:45 AM - 12:05 PM
Lunch Break
12:05 PM - 12:25 PM
Large-Scale Data Graph: Scale and Optimize Privacy & Security in Offline Data Systems

Meta operates large-scale offline data systems across Data Warehouse, Stream Data Processing, and Monitoring & Observability. Privacy, security, and other cross-cutting management and governance concerns are critical in the data infrastructure. This talk explains how we map out the offline data systems using a unified metadata representation of assets, link them through their relationships and the flow of information into a giant graph, which we call “Data Graph”, and apply it to privacy and security.

Speaker Can Lin, Meta
Speaker David Taieb, Meta
Featured Article
LARGE-SCALE DATA GRAPH: SCALE OFFLINE PRIVACY & SECURITY
12:25 PM - 12:40 PM
Taking Flight with Interactive Analytics

Modern data processing systems were designed at a time when hardware looked very different than it does today. And as the early hype around big data settled, it became clear that most analytics only use a subset of the data that is available.

Speaker Frances Perry, MotherDuck
12:40 PM - 01:00 PM
Smarter, Faster Data Analytics with Generative AI and Machine Learning

Data volume continues to increase, and so does the need for a large number of data workers to derive insights from data. This puts pressure on data teams and requires scalable data infrastructure that can be programmed easily, even by low-code developers, and operated seamlessly at scale without active intervention by data operators. This talk discusses the key challenges and new innovations using GenAI and ML that help address them, helping enterprises accelerate their business through efficient data analysis.

Speaker Santosh Chandrachood, AWS
01:00 PM - 01:20 PM
Beam Up Your GenAI Usage: Usability, Efficiency, Reliability with Apache Beam

Apache Beam, a data processing framework, has seen a surge in adoption for GenAI use cases. This growth can be attributed to our focus on building a user-friendly, efficient, and reliable platform. Beam ML empowers GenAI practitioners to dedicate their expertise to core tasks by removing technical hurdles and streamlining workflows. This approach has unlocked exponential growth, attracting industry leaders like Spotify and Google, alongside numerous mid-sized and smaller businesses, to leverage Beam ML for their production workloads. Join this talk to delve into the key learnings from Beam's ML journey and discover how you can accelerate your own GenAI initiatives.

Speaker Ahmet Altay, Google
01:20 PM - 01:25 PM
Break
01:25 PM - 01:50 PM
Data @Scale Live Q&A Session #2
Moderator Manju Anand, Meta
Speaker Can Lin, Meta
Speaker David Taieb, Meta
Speaker Frances Perry, MotherDuck
Speaker Ahmet Altay, Google
01:50 PM - 02:20 PM
Fireside Chat: Evolution of AI-First Data Infrastructure

The fireside chat will focus on the key trends that have emerged over the past year and a half in data infrastructure, delve into the biggest challenges in building a world-class, AI-first data infrastructure, and discuss how Meta approaches problem selection for research.

Moderator Manju Anand, Meta
Speaker Aparna Ramani, Meta
Speaker Delia David, Meta
02:20 PM - 02:25 PM
Closing Remarks
Speaker Maor Kleider, Meta
Speaker Jelena Pješivac-Grbović, Meta

SPEAKERS AND MODERATORS

Maor Kleider is a Director of Product Management at Meta, supporting the product team...

Maor Kleider

Meta

Dr. Jelena Pješivac-Grbović is an engineering director in Data Infrastructure at Meta. Her teams...

Jelena Pješivac-Grbović

Meta

Barak Yagour is a Director of Engineering at Meta, responsible for Data Infrastructure. Over...

Barak Yagour

Meta

DJ Patil is an entrepreneur, investor, scientist, and leader in public policy. He has...

DJ Patil

General Partner, GreatPoint Ventures and Former U.S. Chief Data Scientist

Daniel is a Director of Product at Meta. He works on AI, Data, and...

Daniel Francisco

Meta

Pedro Pedreira is a Software Engineer at Meta. During his 11-year tenure, he has...

Pedro Pedreira

Meta

I have been with Meta for over a year supporting the Compute Engines for...

Amit Purohit

Meta

Murali Brahmadesam, Razorpay's CTO and Head of Engineering, has over two decades of experience...

Murali Brahmadesam

Razorpay

Diana is a Product Manager at Infra currently focusing on how to leverage AI...

Diana Hsu

Meta

Mohamed is a software engineer who has been working on the Data Infrastructure team for the past 3...

Mohamed Farrag

Meta

Nathan is a Principal Software Engineer at Snowflake, where he focuses on Cortex and...

Nathan Wiegand

Snowflake

Kelvin So is a Principal Software Engineer at Snowflake, where he is one of...

Kelvin So

Snowflake

Manju Anand is the engineering manager for the Data Infra Storage Engine team that is responsible...

Manju Anand

Meta

Can Lin is a software engineer in the AI & Data Infrastructure Responsibility area...

Can Lin

Meta

David Taieb is a Software Engineer on the Data Infra Team at Meta, working on...

David Taieb

Meta

Frances Perry is an engineering manager at MotherDuck, the serverless analytics platform and data...

Frances Perry

MotherDuck

Santosh Chandrachood has been with AWS for the last 7+ years and helped build, launch,...

Santosh Chandrachood

AWS

Ahmet Altay is an Apache Beam committer, a member of the Apache Software Foundation, and Engineering...

Ahmet Altay

Google

Aparna Ramani is VP of Engineering at Meta, responsible for Data, Developer and AI...

Aparna Ramani

Meta

Delia David is the Data Infrastructure tech lead for AI. Over the last 10+ years,...

Delia David

Meta

UPCOMING EVENT   July 31, 2024 AI @Scale

AI Infra @Scale 2024

Meta's Engineering and Infrastructure teams are excited to host AI Infra @Scale, a one-day virtual event featuring a range of speakers from Meta who will unveil the latest AI infrastructure investments and innovations powering Meta's...
UPCOMING EVENT   August 7, 2024 Product @Scale

Product @Scale 2024

Product @Scale conferences are designed for technologists who work on solving complex product problems at scale. The @Scale community focuses on bringing forward people's experiences in creating innovative solutions to large-scale products serving millions or...
UPCOMING EVENT   September 4-5, 2024 (2 day event) Networking @Scale

Networking @Scale 2024

Networking @Scale is a technical conference for engineers that build and manage large-scale networks. Meta’s Networking Infrastructure team is excited to host Networking @Scale, a two-day virtual event featuring a range of speakers from Meta...
UPCOMING EVENT   October 9, 2024 Reliability @Scale

Reliability @Scale 2024

Reliability @Scale is a technical conference for engineers who are passionate about building and understanding highly resilient and reliable systems and products at massive scale. Whether it’s novel design decisions, or outages that impact billions...
UPCOMING EVENT   October 23, 2024 Mobile @Scale

Mobile @Scale 2024

Mobile @Scale is a technical conference designed for the engineers, product managers, and engineering leaders building mobile experiences at significant scale (millions to billions of daily users). Mobile @Scale provides a rare opportunity to gather...
UPCOMING EVENT   November 20, 2024 Video @Scale

Video @Scale 2024

Video @Scale 2024 is a technical conference designed for engineers that develop or manage large-scale video systems serving millions of people. The development of large-scale video systems includes complex, unprecedented engineering challenges. The @Scale community...
PAST EVENT   March 20, 2024 @ 9am PT - 3pm PT RTC @Scale

RTC @Scale 2024

RTC @Scale is for engineers who develop and manage large-scale real-time communication (RTC) systems serving millions of people. The operations of large-scale RTC systems have always involved complex engineering challenges which continue to attract attention...
PAST EVENT   June 12, 2024 Systems @Scale

Systems @Scale 2024

Systems @Scale 2024 is a technical conference intended for engineers that build and manage large-scale distributed systems serving millions or billions of users. The development and operation of such systems often introduces complex, unprecedented engineering...
