Event times below are displayed in PT.
Meta's Engineering and Infrastructure teams are excited to host AI Infra @Scale, a one-day virtual event featuring a range of speakers from Meta who will unveil the latest AI infrastructure investments and innovations powering Meta's products and services. Join us as we share how Meta is building the next generation of AI infrastructure to scale the technologies behind its products and services, today and in the future. We’ll also discuss how these advances in AI will benefit the broader community. Register today, and check back for upcoming speaker and agenda announcements.
AI Infra @Scale 2023 will be hosted virtually and will feature a range of speakers from Meta’s engineering and infrastructure teams. Meta's Head of Infrastructure, Santosh Janardhan, will deliver the opening and closing remarks and guide us through six technical presentations on some of Meta's latest AI infrastructure investments. We’re also thrilled to host a fireside chat with a panel of Meta AI infrastructure leaders, who will discuss “The Future of AI Infra: The Opportunities and Challenges That Await Us On Our Journey.” The event will take place via webcast on May 18, 2023.
Meta faces a challenging and exciting future as it expands beyond its current capabilities in the social space. Making our platform open to as many cultures, languages and perspectives as possible is a significant challenge that requires compute-intensive, large-scale AI models. The complexities of adding a virtual reality metaverse further increase the challenge, requiring much larger models with more modalities and parameters.
Meta anticipated these challenges and has built a dedicated, high-performance, state-of-the-art cluster, the AI Research SuperCluster (RSC), to accelerate AI research. We present the architectural choices behind the cluster, which is composed of 16K GPUs, high-performance storage and a non-blocking InfiniBand network. We will also discuss some of the lessons learned and how they have been applied across Meta more broadly.
Finally, we reflect on the impact the RSC has had on our research projects, and provide some insight into future directions.
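To make "non-blocking" more concrete: in a folded-Clos (fat-tree) fabric built from identical k-port switches, every switch below the top tier splits its ports evenly between downlinks and uplinks, so traffic never has less capacity going up than coming in. The sketch below is a generic back-of-the-envelope model, not a description of RSC's actual topology. It computes how many endpoints such a fabric can connect for a given number of switch tiers. The switch radix of 64 and the assumption of one network endpoint per GPU are hypothetical values chosen for illustration.

```python
def max_endpoints(radix: int, tiers: int) -> int:
    """Largest number of endpoints a non-blocking (1:1) folded-Clos fabric of
    identical radix-port switches can connect with the given number of tiers.

    Every tier below the top uses radix/2 ports down and radix/2 ports up, and
    the top tier uses all of its ports down. That yields 2 * (radix/2) ** tiers:
    radix for one switch, radix**2 / 2 for two tiers, radix**3 / 4 for three.
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number >= 2")
    if tiers < 1:
        raise ValueError("tiers must be >= 1")
    return 2 * (radix // 2) ** tiers


def min_tiers(radix: int, endpoints: int) -> int:
    """Fewest switch tiers needed to connect `endpoints` hosts with no oversubscription."""
    tiers = 1
    while max_endpoints(radix, tiers) < endpoints:
        tiers += 1
    return tiers


def is_non_blocking(downlink_ports: int, uplink_ports: int) -> bool:
    """A switch is non-blocking (1:1) when its uplinks at least match its
    downlinks, assuming every port runs at the same speed."""
    return uplink_ports >= downlink_ports


if __name__ == "__main__":
    RADIX = 64       # hypothetical switch port count
    GPUS = 16_000    # roughly "16K GPUs", assuming one network endpoint per GPU
    for t in (1, 2, 3):
        print(f"{t} tier(s): up to {max_endpoints(RADIX, t):,} endpoints")
    print("tiers needed:", min_tiers(RADIX, GPUS))
    print("32 down / 32 up non-blocking?", is_non_blocking(32, 32))
    print("48 down / 16 up non-blocking?", is_non_blocking(48, 16))
```

Under these assumptions, two tiers of 64-port switches top out at 2,048 endpoints, so a cluster of roughly 16,000 endpoints needs a third tier (up to 65,536 endpoints). A leaf with 48 downlinks and only 16 uplinks would be 3:1 oversubscribed rather than non-blocking.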
Building AI capacity is essential to the future of our company, and supporting AI workloads at scale requires a different approach than scaling to support our regular online services. Our new data center design will support the next generation of AI systems. We are building more flexibility into the design so that we can pivot as the AI landscape shifts.
The new design will have fewer but denser racks to support large scale AI clusters, allowing us to have a smaller footprint while serving the same capacity as our previous data center designs. This design was created with efficiency at the forefront. Each data center going forward will be optimized for water and energy usage depending on the site/region, and will continue to incorporate sustainable features to ensure efficient facilities. We anticipate this design will also be faster and cheaper to build.
What makes PyTorch beloved also makes it hard to compile. After almost five years, we finally cracked the technologies that make it possible to compile any PyTorch model, resulting in a step-function change in PyTorch’s approach to execution efficiency. We call it PyTorch 2.0.
PyTorch 2.0 delivers significant performance improvements across a wide variety of models, often with just a one-line code change. This talk focuses on the two critical technologies underlying PyTorch 2.0: TorchDynamo and TorchInductor.
PyTorch 2.0 was released in March 2023. But don't mistake it for the end of the story. The first release of PyTorch 2.0 marks the beginning of a roadmap for improving PyTorch execution efficiency through compiled mode.
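The one-line change in PyTorch 2.0 is wrapping a model or function in `torch.compile`. The sketch below assumes PyTorch 2.x and, to make the split between the two technologies visible, passes a small custom backend in place of the default TorchInductor. TorchDynamo captures the Python code as an FX graph and hands it to the backend; this toy backend (`inspecting_backend`, a hypothetical name) just records the graph and runs it unchanged. `torch.compile` and its `backend` argument are real PyTorch APIs; the model shape and backend are illustrative only.

```python
import torch
import torch.nn as nn
from torch.fx import GraphModule

# FX graphs that TorchDynamo hands to our backend, recorded for inspection.
captured_graphs: list = []


def inspecting_backend(gm: GraphModule, example_inputs):
    """Hypothetical stand-in for TorchInductor, for illustration only.

    TorchDynamo calls a backend with each captured FX graph plus example
    inputs, and the backend must return a callable. Here we record the graph
    and return its unmodified forward, so no optimization happens.
    """
    captured_graphs.append(gm)
    return gm.forward


model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
x = torch.randn(8, 16)

# The typical one-line change, which uses the default TorchInductor backend:
#     model = torch.compile(model)
compiled_model = torch.compile(model, backend=inspecting_backend)

with torch.no_grad():
    expected = model(x)           # plain eager execution
    actual = compiled_model(x)    # first call triggers TorchDynamo graph capture

print("outputs match:", torch.allclose(expected, actual))
print("graphs captured:", len(captured_graphs))
print(captured_graphs[0].code)    # Python source regenerated from the captured graph
```

Dropping the `backend` argument gives the default path, where TorchInductor, rather than this pass-through function, turns the captured graph into optimized kernels.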
Meta has traditionally relied on CPU-based servers to run AI workloads, but the growing compute and memory requirements of these models have pushed the company toward specialized solutions such as GPUs and other hardware accelerators. This talk describes the company's effort to build its first silicon designed for its internal AI workloads and systems. It covers the accelerator architecture and platform design, as well as the software stack for enabling and optimizing workloads. It also touches on upcoming challenges and the evolving requirements that future designs will need to accommodate.
This presentation will introduce MSVP (Meta's Scalable Video Processor), a first-generation, server-grade video processing hardware accelerator and the first of its kind developed at Meta. We will describe the motivation behind it, its architecture, and some of the novel algorithms in the video encoder and other video processing blocks that achieve high video quality. We will also describe how these hardware accelerators are used in Meta’s data centers to process and transcode billions of videos every day, delivering premium video quality to end users while saving power.
At Meta, we have built on existing research published by FAIR to develop our own AI-assisted code authoring tools. The freedom to experiment with the model, combined with the ability to train on first-party code, has enabled us to deliver tooling with a measurable impact on developer productivity.
This panel discussion will focus on “The Future of AI Infra: The Opportunities and Challenges That Await Us On Our Journey.” Moderated by Irina Kofman, head of XAI and responsible for cross-company AI efforts at Meta, the panel features leaders from across Meta's infrastructure organization who will discuss the challenges and opportunities they see in building world-class, custom infrastructure designed specifically for AI.
Santosh Janardhan is the head of infrastructure at Meta, where he supports the teams...
Scott Jeschonek is a Technical Program Manager at Meta, overseeing the Research SuperCluster. Scott...
Kalyan Saladi joined Meta in 2015. He works on the AI Research Super-Cluster as...
Alan Duong is Global Director of Data Centers Engineering team at Meta Platforms, where...
Dr. Peng Wu is the engineering manager of the PyTorch Compiler team at Meta....
Roman Levenstein is leading the development of the compiler and SW stacks for Meta's...
Amin Firoozshahian is a member of the ASIC architecture team, working on architecture definition...
Joel Coburn is a software engineer on the AI System Co-Design team at Meta...
Olivia Wu is a design lead for the AI System Co-Design team at Meta...
Harikrishna Reddy is a Technical Lead in the Infra Silicon Team at Meta, leading...
Dr. Ioannis Katsavounidis is part of the Video Infrastructure team, leading technical efforts in...
Michael Bolin is a software engineer who has spent the past decade working in...
Irina is the head of XAI, where she is responsible for the cross-company AI...
Dr. Alexis B. Björlin is Vice President of Infrastructure at Meta, responsible for shaping...
Aparna Ramani is VP of Engineering at Meta, responsible for Data, Developer and AI...
Kim Hazelwood is an engineering leader whose expertise lies at the intersection of artificial...
As Vice President for Data Center Strategy, Rachel Peterson oversees Meta’s global infrastructure expansion...