EVENT AGENDA

Event times below are displayed in PT.

May 22

09:00 AM - 09:05 AM
Opening Remarks
Speaker Maor Kleider,Meta
Speaker Jelena Pješivac-Grbović,Meta
09:05 AM - 09:25 AM
The AI-First Data Infrastructure

Barak Yagour is a Director of Engineering at Meta, responsible for Data Infrastructure. Over the last seven years at Meta, Barak has built a world-class team that’s now responsible for some of the largest data systems on the planet. The Data Infra teams are focused on evolving and scaling our Analytics, Monitoring and Observability, ML/AI Infrastructure, as well as our Responsibility and Privacy Infra efforts. These systems and tools allow everyone at Meta to make better data-driven decisions, and to make them faster. The teams deal with an extremely broad range of problems, touching all levels of the software stack and a wide range of systems and platforms. Prior to joining Meta, Barak held leadership roles in small startups and in major corporations, tackling complex problems in the space of Cyber, Messaging, Networking, Virtualization and Machine Learning.

Speaker Barak Yagour,Meta
09:25 AM - 09:55 AM
Navigating Data's Next Great Shift

Data is the foundational building block of GenAI. As we navigate this transition, we face exciting new opportunities as well as long-standing challenges in the data space. In this conversation we’ll explore these opportunities as well as lessons we can draw from the past. We’ll cover some of the things that need to be built, thoughts on ensuring that we’re building for the good of everyone, and the need to push this space forward together as a community.

Speaker DJ Patil,General Partner, GreatPoint Ventures and Former U.S. Chief Data Scientist
Speaker Daniel Francisco,Meta
09:55 AM - 10:15 AM
Composable Data Management Systems

In this talk, we will present how Meta has evolved the traditional monolithic way of developing data management systems into a composable architecture that promotes reusability and improves engineering efficiency. We will discuss the new reference architecture and present the Velox open source project, highlighting its impact at Meta and in the open source community. We will also discuss related projects and future work in this space.

Speaker Pedro Pedreira,Meta
Speaker Amit Purohit,Meta
10:15 AM - 10:35 AM
Demystifying the Data Stack of the Largest and Fastest Growing Payment Gateway in India: Razorpay

Razorpay stands as India's fastest-growing payment aggregator, processing over US$150 billion annually in Total Payment Volume, with a global footprint across various regions. This is possible because our data stack is built with the ability to handle challenges in functionality, scale, availability, risk, security, and compliance, all with cost efficiency.

Intrigued? Wanna know how? Join my session as I share the secret sauce of our stack.

You will also hear about a few unsolved, tough problems that we intend to take head-on in the coming months!

Speaker Murali Brahmadesam,Razorpay
10:35 AM - 10:40 AM
Break
10:40 AM - 11:00 AM
Scaling Meta’s Infra with GenAI: Journey to Faster and Smarter Incident Response

With billions of active users, Meta's incident response process is critical to maintaining our reliability commitments. In this talk, we explore the challenges we face due to the complexity and scale of our operations and how we are leveraging AI to revolutionize onboarding responders and root cause analysis. Join us to learn about our journey, the lessons we've learned, and our vision for the future of incident response.

Speaker Diana Hsu,Meta
Speaker Mohamed Farrag,Meta
11:00 AM - 11:20 AM
A Case Study in Bridging Production Software and Data Practices for LLM Model Training Using Snowflake

We present a case study detailing the use of Snowflake, a cloud-based data platform, in various stages of the LLM data pipeline, from initial annotation to LLM model productionization. The system we have built brings production software and data practices to the field of LLM model training. We describe how every step in the system is built, including data annotation, filtering, global deduplication, decontamination and tokenization. We show how the data engineering capabilities of a cloud warehouse like Snowflake can be used to enhance data exploration, LLM data ablations and experimentation. A key aspect of LLM productionization that we cover involves incorporating data lineage tracking, facilitated by output cards at each stage of the data pipelines, ensuring transparency and traceability throughout the LLM model development lifecycle.

Speaker Nathan Wiegand,Snowflake
Speaker Kelvin So,Snowflake
11:20 AM - 11:45 AM
Data @Scale Live Q&A Session #1
Moderator Manju Anand,Meta
Speaker Pedro Pedreira,Meta
Speaker Amit Purohit,Meta
Speaker Murali Brahmadesam,Razorpay
Speaker Diana Hsu,Meta
Speaker Mohamed Farrag,Meta
Speaker Nathan Wiegand,Snowflake
Speaker Kelvin So,Snowflake
11:45 AM - 12:05 PM
Lunch Break
12:05 PM - 12:25 PM
Large-Scale Data Graph: Scale and Optimize Privacy & Security in Offline Data Systems

Meta operates large-scale offline data systems across Data Warehouse, Stream Data Processing and Monitoring & Observability. Privacy, security, and other cross-cutting management and governance concerns are critical in the data infrastructure. This talk explains how we map out the offline data systems using a unified metadata representation of assets, link them through their relationships and the flow of information into a giant graph, which we call “Data Graph”, and apply it to privacy and security.

Speaker Can Lin,Meta
Speaker David Taieb,Meta
12:25 PM - 12:40 PM
Taking Flight with Interactive Analytics

Modern data processing systems were designed at a time when hardware looked very different than it does today. And as the early hype around big data settled, it became clear that most analytics only use a subset of the data that is available.

Speaker Frances Perry,MotherDuck
12:40 PM - 01:00 PM
Smarter, Faster Data Analytics with Generative AI and Machine Learning

Data volume continues to increase, and so does the need for a large number of data workers to derive insights from data. This puts pressure on data teams and requires scalable data infrastructure that can be programmed easily, even by low-code developers, and operated seamlessly at scale without active intervention by data operators. This talk discusses the key challenges and covers new innovations using GenAI and ML that help address them, helping enterprises accelerate their business through efficient data analysis.

Speaker Santosh Chandrachood,AWS
01:00 PM - 01:20 PM
Beam Up Your GenAI Usage: Usability, Efficiency, Reliability with Apache Beam

Apache Beam, a data processing framework, has seen a surge in adoption for GenAI use cases. This growth can be attributed to our focus on building a user-friendly, efficient, and reliable platform. Beam ML empowers GenAI practitioners to dedicate their expertise to core tasks by removing technical hurdles and streamlining workflows. This approach has unlocked exponential growth, attracting industry leaders like Spotify and Google, alongside numerous mid-sized and smaller businesses, to leverage Beam ML for their production workloads. Join this talk to delve into the key learnings from Beam's ML journey and discover how you can accelerate your own GenAI initiatives.

Speaker Ahmet Altay,Google
01:20 PM - 01:25 PM
Break
01:25 PM - 01:50 PM
Data @Scale Live Q&A Session #2
Moderator Manju Anand,Meta
Speaker Can Lin,Meta
Speaker David Taieb,Meta
Speaker Frances Perry,MotherDuck
Speaker Ahmet Altay,Google
01:50 PM - 02:20 PM
Fireside Chat

The fireside chat will focus on the key trends that have emerged over the past year and a half in data infrastructure, delve into the biggest challenges in building a world-class, AI-first data infrastructure, and discuss how Meta approaches problem selection for research.

Moderator Manju Anand,Meta
Speaker Aparna Ramani,Meta
Speaker Delia David,Meta
02:20 PM - 02:25 PM
Closing Remarks
Speaker Maor Kleider,Meta
Speaker Jelena Pješivac-Grbović,Meta
