TOPIC: Machine Learning and AI

AI Infra @Scale

MAY 18, 2023 @ 9:00 AM PDT - 11:20 AM PDT

Meta’s Engineering and Infrastructure teams are excited to host AI Infra @Scale, a one-day virtual event featuring a range of speakers from Meta who will unveil the latest AI infrastructure investments and innovations powering Meta’s products and services. Join us as we share how Meta is building the next generation of AI infrastructure to scale these technologies today and in the future. We’ll also discuss how these advances in AI will positively impact the broader community. Register today and check back for upcoming speaker and agenda announcements.

RSVPS CLOSED

ABOUT EVENT

AI Infra @Scale 2023 will be hosted virtually and feature a range of speakers from Meta’s engineering and infrastructure teams. Meta’s Head of Infrastructure, Santosh Janardhan, will deliver opening and closing remarks, while also guiding us through six exciting technical presentations on some of Meta’s latest AI infrastructure investments. Additionally, we’re thrilled to host a fireside chat with a panel of Meta AI infrastructure leaders as they discuss “The Future of AI Infra: The Opportunities and Challenges That Await Us On Our Journey.” The event will take place via webcast on May 18, 2023.

EVENT AGENDA

Event times below are displayed in PT.

May 18

09:00 AM - 09:05 AM

Opening Remarks

Speaker: Santosh Janardhan, Meta

ADDITIONAL RESOURCES

Reimagining Our Infrastructure for the AI Age

09:05 AM - 09:25 AM

Meta's Research SuperCluster (RSC): Accelerating AI Research at Scale

Meta faces a challenging and exciting future as it expands beyond its current capabilities in the social space. Keeping our platform open to as many diverse cultures, languages, and perspectives as possible is a significant challenge that requires intensive, large-scale AI models. The complexities of adding a virtual reality Metaverse further widen the challenge space, requiring much larger models with more modalities and parameters.

Meta anticipated these challenges and has built a dedicated, high-performance, state-of-the-art cluster to accelerate AI research. We present the architectural choices that went into building the cluster, which comprises 16K GPUs, high-performance storage, and a non-blocking InfiniBand network. We will also discuss some of the lessons learned and how they have been applied across Meta more broadly.

Finally, we reflect on the impact the RSC has had on our research projects, and provide some insight into future directions.

Speaker: Scott Jeschonek, Meta AI

Speaker: Kalyan Saladi, Meta AI

ADDITIONAL RESOURCES

Pursuing groundbreaking scale and accelerating research using Meta’s Research SuperCluster

09:25 AM - 09:40 AM

Next-Generation Data Center Design

Building AI capacity is essential to the future of our company, and supporting AI workloads at scale requires a different approach than scaling to support our regular online services. Our new data center design will support the next generation of AI systems. We are building an increased level of flexibility into our design, which will allow us to pivot in response to shifts and changes in the AI space.

The new design will have fewer but denser racks to support large-scale AI clusters, allowing us to have a smaller footprint while serving the same capacity as our previous data center designs. This design was created with efficiency at the forefront. Going forward, each data center will be optimized for water and energy usage according to its site and region, and will continue to incorporate sustainable features to ensure efficient facilities. We anticipate this design will also be faster and cheaper to build.

Speaker: Alan Duong, Meta

09:40 AM - 09:55 AM

PyTorch 2.0

What makes PyTorch beloved also makes it harder to compile. After almost five years, we finally cracked the technologies needed to compile any PyTorch model, resulting in a step-function change in PyTorch’s approach to execution efficiency. We call it PyTorch 2.0.

PyTorch 2.0 delivers significant performance improvements across a wide variety of models, often with just a one-line change. This talk focuses on the two critical technologies underlying PyTorch 2.0: TorchDynamo and TorchInductor.

PyTorch 2.0 was released in March, but do not mistake it for the end of the story. The first release of PyTorch 2.0 marks the beginning of a roadmap for improving PyTorch execution efficiency via compiled mode.
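The "one-line change" mentioned in this abstract most likely refers to wrapping an existing model with `torch.compile`, the public entry point to compiled mode in PyTorch 2.0. The following is a minimal sketch under that assumption, not material from the talk: the toy model and the helper names `build_model` and `compile_model` are hypothetical and exist only for illustration.

```python
import torch
import torch.nn as nn


def build_model() -> nn.Module:
    # Hypothetical toy model for illustration; not taken from the talk.
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))


def compile_model(model: nn.Module, backend: str = "inductor") -> nn.Module:
    # The one-line change: wrap an existing eager-mode model.
    # TorchDynamo captures the Python-level graph; the backend
    # (TorchInductor by default) generates optimized code for it.
    return torch.compile(model, backend=backend)


if __name__ == "__main__":
    model = build_model()
    compiled = compile_model(model)  # compilation happens lazily on first call
    x = torch.randn(2, 8)
    print(compiled(x).shape)
```

The rest of the training or inference code is unchanged, since the compiled module is called exactly like the original. Passing `backend="eager"` runs only the TorchDynamo graph capture step, which can be handy for debugging on machines without a compiler toolchain.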

Speaker: Peng Wu, Meta

09:55 AM - 10:15 AM

MTIA: Meta's First Generation of AI Accelerators

Meta has traditionally relied on CPU-based servers for running AI workloads, but the increasing compute and memory requirements of these models have pushed the company towards specialized solutions such as GPUs or other hardware accelerators. This talk describes the company's effort to build its first silicon designed for its internal AI workloads and systems. It covers the accelerator architecture and platform design, as well as the software stack for enabling and optimizing workloads. It also touches on the upcoming challenges and evolving requirements that will need to be accommodated moving forward.

Speaker: Roman Levenstein, Meta

Speaker: Amin Firoozshahian, Meta

Speaker: Joel Coburn, Meta

Speaker: Olivia Wu, Meta

ADDITIONAL RESOURCES

MTIA v1: Meta’s first-generation AI inference accelerator

10:15 AM - 10:30 AM

Break

10:30 AM - 10:40 AM

MSVP: Meta's Scalable Video Processor

This presentation will introduce MSVP (Meta's Scalable Video Processor), the first-generation, server-grade video processing hardware accelerator of its kind developed at Meta. We will describe the motivation behind it, its architecture, and some of the novel algorithms used in the video encoder and other video processing blocks to achieve high video quality. We will also describe how the hardware accelerators are used in Meta’s data centers to process and transcode billions of videos every day and provide premium video quality to end users, while saving power.

Speaker: Harikrishna Reddy, Meta

Speaker: Ioannis Katsavounidis, Meta

ADDITIONAL RESOURCES

MSVP: Meta’s first ASIC for video transcoding

10:40 AM - 10:50 AM

Gen AI-Assisted Code Authoring at Meta

At Meta, we have built upon existing research published by FAIR to develop our own AI-assisted code authoring tools. The freedom to experiment with the model, combined with the ability to train on first-party code, has enabled us to deliver tooling that has had a measurable impact on developer productivity.

Speaker: Michael Bolin, Meta

10:50 AM - 11:15 AM

Fireside Chat: The Future of AI Infra: The Opportunities and Challenges That Await Us On Our Journey

This panel discussion will focus on “The Future of AI Infra: The Opportunities and Challenges That Await Us On Our Journey.” Moderated by Irina Kofman, head of XAI and responsible for cross-company AI efforts at Meta, the panel features leaders from across Meta's infrastructure organization, who will discuss the challenges and opportunities they see in building world-class, custom infrastructure designed specifically for AI.

Speaker: Irina Kofman, Meta AI

Speaker: Alexis Björlin, Meta

Speaker: Aparna Ramani, Meta

Speaker: Kim Hazelwood, Meta AI

Speaker: Rachel Peterson, Meta

11:15 AM - 11:20 AM

Closing Remarks

Speaker: Santosh Janardhan, Meta

SPEAKERS AND MODERATORS

Santosh Janardhan, Meta

Santosh Janardhan is the head of infrastructure at Meta, where he supports the teams...

Scott Jeschonek, Meta AI

Scott Jeschonek is a Technical Program Manager at Meta, overseeing the Research SuperCluster. Scott...

Kalyan Saladi, Meta AI

Kalyan Saladi joined Meta in 2015. He works on the AI Research Super-Cluster as...

Alan Duong, Meta

Alan Duong is Global Director of Data Centers Engineering team at Meta Platforms, where...

Peng Wu, Meta

Dr. Peng Wu is the engineering manager of the PyTorch Compiler team at Meta....

Roman Levenstein, Meta

Roman Levenstein is leading the development of the compiler and SW stacks for Meta's...

Amin Firoozshahian, Meta

Amin Firoozshahian is a member of the ASIC architecture team, working on architecture definition...

Joel Coburn, Meta

Joel Coburn is a software engineer on the AI and Systems Co-Design team at...

Olivia Wu, Meta

Olivia Wu is a design lead for the AI System Co-Design team at Meta...

Harikrishna Reddy, Meta

Harikrishna Reddy is a Technical Lead in the Infra Silicon Team at Meta, leading...

Ioannis Katsavounidis, Meta

Dr. Ioannis Katsavounidis is part of the Video Infrastructure team, leading technical efforts in...

Michael Bolin, Meta

Michael Bolin is a software engineer who has spent the past decade working in...

Irina Kofman, Meta AI

Irina is the head of XAI, where she is responsible for the cross-company AI...

Alexis Björlin, Meta

Dr. Alexis B. Björlin is Vice President of Infrastructure at Meta, responsible for shaping...

Aparna Ramani, Meta

Aparna Ramani is VP of Engineering at Meta, responsible for Data, Developer and AI...

Kim Hazelwood, Meta AI

Kim Hazelwood is an engineering leader whose expertise lies at the intersection of artificial...

Rachel Peterson, Meta

As Vice President for Data Center Strategy, Rachel Peterson oversees Meta’s global infrastructure expansion...
