EVENT AGENDA
Event times below are displayed in PT.
Meta’s Engineering and Infrastructure teams are excited to return for the second year in a row to host AI Infra @Scale. This year’s event is open to a limited number of in-person attendees at Meta HQ in Menlo Park on Wednesday, July 31, and will also be livestreamed for virtual attendees on Wednesday, August 7.
The challenges of scaling AI mean rethinking every layer of the infra stack, from data centers and silicon all the way up to software systems. Attendees can expect a deeper look at Meta’s large GPU cluster work and insights into how Meta builds, trains, and serves its most advanced models, including the recently open-sourced Llama 3 models. You will hear about Meta’s commitment to open source across the infrastructure stack, from hardware designs (Grand Teton and Open Rack) to software (PyTorch), as well as from other industry voices on the importance of open innovation. There will also be more details on Meta’s work to build its own silicon for some of its most distinctive workloads. This event will highlight the full-stack challenges of scaling AI, both now and in the future.
Registration is now closed for in-person attendance. Registration for the August 7 virtual program will remain open through the event.
This talk discusses the diversity, volume, and freshness of data required for GenAI, as well as the need to extract and prepare data differently based on its type, including interleaved data and multi-step trajectories for learning agentic behaviors. The talk also presents some of the investments we have made to improve researcher productivity.
Large-scale training requires substantial investment across the infrastructure stack. In this talk, we delve into some of the data center, network, and software investments that enabled the development of our Llama 3 models.
Optimizing and scaling LLM inference is crucial for enabling large-scale product applications at reasonable cost. This presentation will introduce key parallelism techniques that help scale model sizes and context windows, which in turn influence inference system designs. Additionally, we will discuss practical challenges in deploying these complex serving paradigms, from our internal cloud to data centers of heterogeneous hardware, including the multi-faceted trade-offs required under large-scale, dynamic real-world loads.
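As a rough illustration of one such parallelism technique, the sketch below simulates tensor parallelism for a single linear layer on one device. It is an assumed minimal example, not Meta’s serving implementation; in a real system each weight shard would live on a different GPU, with an all-gather collective replacing the concatenation.

```python
import torch
import torch.nn as nn

# Minimal sketch of tensor parallelism (column-parallel linear), simulated
# on one device with two weight shards. Hypothetical example: in a real
# serving system each shard lives on its own GPU, and torch.cat would be
# an all-gather collective across ranks.

hidden, out_features, world_size = 512, 2048, 2
x = torch.randn(1, hidden)

full = nn.Linear(hidden, out_features, bias=False)

# Each "rank" holds out_features // world_size rows of the weight matrix
# and computes its slice of the output independently.
shards = full.weight.chunk(world_size, dim=0)
partial_outputs = [x @ w.T for w in shards]

# Reassemble the full activation (the all-gather step).
y_parallel = torch.cat(partial_outputs, dim=-1)
assert torch.allclose(y_parallel, full(x), atol=1e-5)
```

Splitting along the output dimension this way keeps each device’s weight footprint at 1/world_size of the full matrix, which is what lets model sizes grow beyond a single device’s memory.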
In recent years, we’ve entered an AI summer, characterized by soaring investments, insatiable demand for compute power, and widespread enthusiasm for AI-driven technologies such as ChatGPT, GitHub Copilot, and Midjourney. As we stand on the brink of the next wave of AI advancements, featuring AI agents, copilots, and AI-powered process automation, success hinges on developing safe, efficient, and highly capable AI components. In this talk, we will explore this next wave and how open innovation in models, datasets, libraries, and research serves as a critical cornerstone of its progress, providing the foundation necessary to achieve these ambitious goals and propel AI forward.
In this talk, we will go through PyTorch advancements for Large Language Models (LLMs), developments that enhance every aspect of the LLM lifecycle. This includes our newest features and tools for large-scale training, memory-efficient fine-tuning, and on-device LLM capabilities.
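To make “memory-efficient fine-tuning” concrete, here is a minimal LoRA-style adapter in PyTorch. The class and hyperparameters are a hypothetical illustration, not the API of any specific PyTorch library: the idea is to freeze the pretrained weight and train only a low-rank update.

```python
import torch
import torch.nn as nn

# Minimal LoRA-style adapter (hypothetical example, not a specific
# library's API). The pretrained weight W is frozen; only the low-rank
# factors A and B are trained, so gradients and optimizer state scale
# with r * (in + out) instead of in * out.

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained weight
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + scale * (x A^T) B^T  -- the low-rank update
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(1024, 1024, bias=False), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 16,384 trainable params vs. 1,048,576 for full fine-tuning
```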
In this talk, we will discuss fine-tuning LLMs and deploying them for local inference. First, we will cover the importance of memory-efficient fine-tuning and a couple of common architectural and algorithmic techniques that enable fine-tuning on consumer-grade hardware. The second half of the talk will cover the challenges of deploying such large models on-device and some of the techniques, such as quantization, that make deployment possible.
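As a small illustration of the quantization idea, the sketch below applies PyTorch’s built-in post-training dynamic quantization to a toy model. Real on-device LLM deployments typically use more aggressive schemes (for example, lower-bit weight-only quantization), but the memory trade-off is the same in spirit.

```python
import torch
import torch.nn as nn

# Post-training dynamic quantization on a toy model (illustrative only):
# Linear weights are stored as int8 and activations are quantized on the
# fly at inference time, shrinking the weight footprint roughly 4x vs fp32.

model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),
)

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# fp32 weight footprint of the original model, for comparison.
fp32_mib = sum(p.numel() * p.element_size() for p in model.parameters()) / 2**20
print(f"fp32 parameters: {fp32_mib:.0f} MiB")  # ~128 MiB

x = torch.randn(1, 4096)
print(quantized(x).shape)  # torch.Size([1, 4096])
```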
MTIA is Meta's in-house ML accelerator program, and the second-generation chip is now serving in data centers. This talk describes the co-design process of building custom silicon, the PyTorch software ecosystem, and model architectures for Meta's key applications.
We show how MTIA achieves the performance, efficiency, and developer experience to successfully launch models into production. We highlight several co-design examples where we utilize special silicon features to accelerate our models. Finally, we describe future directions for MTIA.
This talk introduces the MTIA next-generation accelerator, now landed in silicon, and covers Meta-specific optimizations that accelerate Meta workloads, performance gains over software and GPU solutions, and the future silicon roadmap.
Details coming soon!
Aparna Ramani is VP of Engineering at Meta, responsible for Data, Developer and AI...
Joe Spisak is Product Director and Head of Open Source in Meta’s Generative AI...
Delia has been a Software Engineer at Meta for the past 13 years. She...
I work on AI. My focus is on ensuring that our data center, network,...
Ye (Charlotte) Qi is a Production Engineer at Meta.
Hagay Lupesko is an engineering lead at Databricks, where he focuses on making generative...
Software Engineer on the PyTorch Core team, working on distributed training; author of torchtitan and Tensor Parallel...
Kimish has worked on enabling PyTorch on Meta's family of apps, primarily focusing on...
Evan is a software engineer on the PyTorch Domains team at Meta. He currently...
Joel Coburn is a software engineer on the AI and Systems Co-Design team at...
Tech lead on the Infra Silicon architecture team, leading ML/video accelerator architecture and functional/performance modeling,...
Jack is a Software Engineer at Meta.
Michael is a Software Engineer at Meta.
Chip Huyen works to accelerate data analytics on GPUs at Voltron Data. She also...
Chris Lattner is a co-founder and the CEO of Modular, which is building an...