EVENT AGENDA
Event times below are displayed in PT.
Data @Scale is a technical conference for engineers who are interested in building, operating, and using data systems at scale. Companies across the industry use data and underlying infrastructure to build products with user empathy, find new market opportunities, understand trends, make better decisions, and ensure that their products and systems stay healthy. Generative AI is evolving the landscape of data infrastructure at extreme scale, imposing unique and complex engineering problems that many of us are excited to solve.
On May 22, speakers from Amazon Web Services, Google, Meta, MotherDuck, Razorpay and Snowflake will delve into the fresh challenges and opportunities posed by Generative AI for data infrastructure. The virtual conference will feature keynote presentations, tech talks, Q&A sessions, and a fireside chat.
Register today and check back for upcoming speaker and agenda announcements!
Barak Yagour is a Director of Engineering at Meta, responsible for Data Infrastructure. Over the last seven years at Meta, Barak has built a world-class team that’s now responsible for some of the largest data systems on the planet. The Data Infra teams are focused on evolving and scaling our Analytics, Monitoring and Observability, ML/AI Infrastructure, as well as our Responsibility and Privacy Infra efforts. These systems and tools allow everyone at Meta to make better data-driven decisions, and to make them faster. The work covers an extremely broad range of problems, touching all levels of the software stack and a wide range of systems and platforms. Prior to joining Meta, Barak held leadership roles in small startups and in major corporations, tackling complex problems in the space of Cyber, Messaging, Networking, Virtualization and Machine Learning.
Data is the foundational building block of GenAI. As we navigate this transition, we face exciting new opportunities as well as long-standing challenges in the data space. In this conversation we’ll explore these opportunities as well as lessons we can draw from the past. We’ll cover some of the things that need to be built, thoughts on ensuring that we’re building for the good of everyone, and the need to push this space forward together as a community.
In this talk, we will present how Meta has evolved the traditional monolithic way of developing data management systems into a composable architecture that promotes reusability and improves engineering efficiency. We will discuss the new reference architecture and present the Velox open source project, highlighting its impact at Meta and the open source community. We will also discuss related projects and future work in this space.
Razorpay stands as India's fastest-growing payment aggregator, processing over $150 billion USD annually in Total Payment Volume, with a global footprint across various regions. This is possible because our data stack is built to handle challenges in functionality, scale, availability, risk, security, and compliance, all while remaining cost-efficient.
Intrigued? Wanna know how? Join my session as I share the secret sauce of our stack.
You will also hear about a few unsolved, tough problems that we intend to take head-on in the coming months!
With billions of active users, Meta's incident response process is critical to maintaining our reliability commitments. In this talk, we explore the challenges we face due to the complexity and scale of our operations and how we are leveraging AI to revolutionize onboarding responders and root cause analysis. Join us to learn about our journey, the lessons we've learned, and our vision for the future of incident response.
We present a case study detailing the use of Snowflake, a cloud-based data platform, in various stages of the LLM data pipeline, from initial annotation to LLM productionization. The system we have built brings production software and data practices to the field of LLM training. We describe how every step in the system is built, including data annotation, filtering, global deduplication, decontamination, and tokenization. We show how the data engineering capabilities of a cloud warehouse like Snowflake can be used to enhance data exploration and LLM data ablations and experimentation. A key aspect of LLM productionization that we cover is data lineage tracking, facilitated by output cards at each stage of the data pipelines, ensuring transparency and traceability throughout the LLM development lifecycle.
Meta operates large-scale offline data systems across Data Warehouse, Stream Data Processing and Monitoring & Observability. Privacy, security, and other cross-cutting management and governance concerns are critical in the data infrastructure. This talk explains how we map out the offline data systems using a unified metadata representation of assets, link them through their relationships and the flow of information into a giant graph, which we call the “Data Graph”, and apply it to privacy and security.
Modern data processing systems were designed at a time when hardware looked very different than it does today. And as the early hype around big data settled, it became clear that most analytics only use a subset of the data that is available.
Data volume continues to increase, and so does the need for a large number of data workers to derive insights from data. This puts pressure on data teams and requires scalable data infrastructure that can be programmed easily, even by low-code developers, and operated seamlessly at scale without active intervention from data operators. This talk discusses the key challenges and new innovations using GenAI and ML that help address them, helping enterprises accelerate their business through efficient data analysis.
Apache Beam, a data processing framework, has seen a surge in adoption for GenAI use cases. This growth can be attributed to our focus on building a user-friendly, efficient, and reliable platform. Beam ML empowers GenAI practitioners to dedicate their expertise to core tasks by removing technical hurdles and streamlining workflows. This approach has unlocked exponential growth, attracting industry leaders like Spotify and Google, alongside numerous mid-sized and smaller businesses, to leverage Beam ML for their production workloads. Join this talk to delve into the key learnings from Beam's ML journey and discover how you can accelerate your own GenAI initiatives.
The fireside chat will focus on the key trends that have emerged over the past year and a half in data infrastructure, delve into the biggest challenges in building a world-class, AI-first data infrastructure, and discuss how Meta approaches problem selection for research.
Maor Kleider is a Director of Product Management at Meta, supporting the product team...
Dr. Jelena Pješivac-Grbović is an engineering director in Data Infrastructure at Meta. Her teams...
Barak Yagour is a Director of Engineering at Meta, responsible for Data Infrastructure. Over...
DJ Patil is an entrepreneur, investor, scientist, and leader in public policy. He has...
Daniel is a Director of Product at Meta. He works on AI, Data, and...
Pedro Pedreira is a Software Engineer at Meta. During his 11-year tenure, he has...
I have been with Meta for over a year supporting the Compute Engines for...
Murali Brahmadesam, Razorpay's CTO and Head of Engineering, has over two decades of experience...
Diana is a Product Manager at Infra currently focusing on how to leverage AI...
Mohamed is a software engineer working on the Data Infrastructure team for the past 3...
Nathan is a Principal Software Engineer at Snowflake, where he focuses on Cortex and...
Kelvin So is a Principal Software Engineer at Snowflake, where he is one of...
Manju Anand is an engineering manager for the Data Infra Storage Engine team that is responsible...
Can Lin is a software engineer in the AI & Data Infrastructure Responsibility area...
David Taieb is a Software Engineer on the Data Infra Team at Meta, working on...
Frances Perry is an engineering manager at MotherDuck, the serverless analytics platform and data...
Santosh Chandrachood has been with AWS for the last 7+ years and helped build, launch,...
Ahmet Altay is an Apache Beam committer, member of Apache Software Foundation, and Engineering...
Aparna Ramani is VP of Engineering at Meta, responsible for Data, Developer and AI...
Delia has been a Software Engineer at Meta for the past 13 years. She...