EVENT AGENDA
Event times below are displayed in PT.
Meta’s Engineering and Infrastructure teams are excited to return for the second year in a row to host AI Infra @Scale. This year’s event is open to a limited number of in-person attendees at Meta HQ in Menlo Park on Wednesday, July 31, and will also be livestreamed for virtual attendees on Wednesday, August 7.
The challenges of scaling AI mean rethinking every layer of the infra stack, from data centers and silicon all the way up to software systems. Attendees can expect a deeper look at Meta’s large GPU cluster work and insights into how Meta builds, trains, and serves its most advanced models, including the recently open-sourced Llama 3 models. You will hear about Meta’s commitment to open source across the infrastructure stack, from hardware designs (Grand Teton and Open Rack) to software (PyTorch), as well as from other industry voices on the importance of open innovation. There will also be more details on Meta’s work to build its own silicon for some of its most distinctive workloads. This event will highlight the full-stack challenges of scaling AI, both now and in the future.
Registration is now closed for in-person attendance. Registration for the August 7 virtual program will remain open through the event.
This talk discusses the diversity, volume, and freshness of data required for GenAI, as well as the need to extract and prepare data differently based on its type, including interleaved data and multi-step trajectories for learning agentic behaviors. The talk also presents some of the investments we have made to improve researcher productivity.
Large-scale training requires substantial investment across the infrastructure stack. In this talk, we delve into some of the data center, network, and software investments that enabled the development of our Llama 3 models.
Optimizing and scaling LLM inference is crucial for enabling large-scale product applications at reasonable cost. This presentation will introduce key parallelism techniques that help scale model sizes and context windows, which in turn influence inference system designs. Additionally, we will discuss practical challenges in deploying these complex serving paradigms, from our internal cloud to data centers of heterogeneous hardware, including the multi-faceted trade-offs required under large-scale, dynamic real-world loads.
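As a rough illustration of one such parallelism technique, the sketch below simulates tensor parallelism for a single linear layer on one device. It is an assumed minimal example, not Meta’s serving implementation; in a real system each weight shard would live on a different GPU, with an all-gather collective replacing the concatenation.

```python
import torch
import torch.nn as nn

# Minimal sketch of tensor parallelism (column-parallel linear), simulated
# on one device with two weight shards. Hypothetical example: in a real
# serving system each shard lives on its own GPU, and torch.cat would be
# an all-gather collective across ranks.

hidden, out_features, world_size = 512, 2048, 2
x = torch.randn(1, hidden)

full = nn.Linear(hidden, out_features, bias=False)

# Each "rank" holds out_features // world_size rows of the weight matrix
# and computes its slice of the output independently.
shards = full.weight.chunk(world_size, dim=0)
partial_outputs = [x @ w.T for w in shards]

# Reassemble the full activation (the all-gather step).
y_parallel = torch.cat(partial_outputs, dim=-1)
assert torch.allclose(y_parallel, full(x), atol=1e-5)
```

Splitting along the output dimension this way keeps each device’s weight footprint at 1/world_size of the full matrix, which is what lets model sizes grow beyond a single device’s memory.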
In recent years, we’ve entered an AI summer, characterized by soaring investments, insatiable demand for compute power, and widespread enthusiasm for AI-driven technologies such as ChatGPT, GitHub Copilot, and Midjourney. As we stand on the brink of the next wave of AI advancements, featuring AI agents, copilots, and AI-powered process automation, success hinges on developing safe, efficient, and highly capable AI components. In this talk, we will explore this next wave and how open innovation in models, datasets, libraries, and research serves as a critical cornerstone of its progress, providing the foundation necessary to achieve these ambitious goals and propel AI forward.
In this talk, we will go through PyTorch advancements for Large Language Models (LLMs), developments that enhance every aspect of the LLM lifecycle. This includes our newest features and tools for large-scale training, memory-efficient fine-tuning, and on-device LLM capabilities.
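To make “memory-efficient fine-tuning” concrete, here is a minimal LoRA-style adapter in PyTorch. The class and hyperparameters are a hypothetical illustration, not the API of any specific PyTorch library: the idea is to freeze the pretrained weight and train only a low-rank update.

```python
import torch
import torch.nn as nn

# Minimal LoRA-style adapter (hypothetical example, not a specific
# library's API). The pretrained weight W is frozen; only the low-rank
# factors A and B are trained, so gradients and optimizer state scale
# with r * (in + out) instead of in * out.

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained weight
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + scale * (x A^T) B^T  -- the low-rank update
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(1024, 1024, bias=False), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 16,384 trainable params vs. 1,048,576 for full fine-tuning
```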
In this talk, we will discuss fine-tuning LLMs and deploying them for local inference. First, we will cover the importance of memory-efficient fine-tuning and a couple of common architectural and algorithmic techniques that enable fine-tuning on consumer-grade hardware. The second half of the talk will cover the challenges of deploying such large models on-device and some of the techniques, such as quantization, that make deployment possible.
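As a small illustration of the quantization idea, the sketch below applies PyTorch’s built-in post-training dynamic quantization to a toy model. Real on-device LLM deployments typically use more aggressive schemes (for example, lower-bit weight-only quantization), but the memory trade-off is the same in spirit.

```python
import torch
import torch.nn as nn

# Post-training dynamic quantization on a toy model (illustrative only):
# Linear weights are stored as int8 and activations are quantized on the
# fly at inference time, shrinking the weight footprint roughly 4x vs fp32.

model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),
)

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# fp32 weight footprint of the original model, for comparison.
fp32_mib = sum(p.numel() * p.element_size() for p in model.parameters()) / 2**20
print(f"fp32 parameters: {fp32_mib:.0f} MiB")  # ~128 MiB

x = torch.randn(1, 4096)
print(quantized(x).shape)  # torch.Size([1, 4096])
```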
MTIA is Meta's in-house ML accelerator program, and the second-generation chip is now serving in data centers. This talk describes the co-design process of building custom silicon, the PyTorch software ecosystem, and model architectures for Meta's key applications.
We show how MTIA achieves the performance, efficiency, and developer experience to successfully launch models into production. We highlight several co-design examples where we utilize special silicon features to accelerate our models. Finally, we describe future directions for MTIA.
This talk introduces the MTIA next-generation accelerator, now landed in silicon, and covers Meta-specific optimizations that accelerate Meta workloads, performance gains over software and GPU solutions, and the future silicon roadmap.
Details coming soon!
Aparna Ramani is VP of Engineering at Meta, responsible for Data, Developer and AI...
Joe Spisak is Product Director and Head of Open Source in Meta’s Generative AI...
Delia has been a Software Engineer at Meta for the past 13 years. She...
I work on AI. My focus is on ensuring that our data center, network,...
Ye (Charlotte) Qi is a Production Engineer at Meta.
Hagay Lupesko is an engineering lead at Databricks, where he focuses on making generative...
Software Engineer on the PyTorch Core team, working on distributed training; author of torchtitan and Tensor Parallel...
Kimish has worked on enabling PyTorch on Meta's family of apps, primarily focusing on...
Evan is a software engineer on the PyTorch Domains team at Meta. He currently...
Joel Coburn is a software engineer on the AI and Systems Co-Design team at...
Tech lead on the Infra Silicon architecture team, leading ML/video accelerator architecture and functional/performance modeling,...
Jack is a Software Engineer at Meta.
Michael is a Software Engineer at Meta.
Chip Huyen works to accelerate data analytics on GPUs at Voltron Data. She also...
Chris Lattner is a co-founder and the CEO of Modular, which is building an...