AI @Scale 2022

Virtual 9:00am - 2:20pm


AI @Scale is a technical conference for engineers who are interested in solving scalability problems in machine learning. This is an exciting area that brings many challenges at the nexus of distributed systems, hardware design, and machine learning technique innovations. The @Scale community focuses on bringing people together to discuss these challenges and collaborate on the development of new solutions.

AI @Scale will be hosted virtually. Joining us are speakers from AWS, Microsoft, Salesforce, Fiddler, Outerbounds, Unity, Cruise and Meta. The event will be held on September 28, 2022, with talks themed around AI systems implementation at scale.

9:00am - 9:20am

Supporting AI holistically Across the Infra stack

In this talk, we delve into how we have designed Meta’s data centers, network, hardware, storage and software systems to support the needs of existing and emerging AI workloads.
9:20am - 9:40am

The Laws of ML at Scale

Scaling machine learning models from the laptop to production is non-linear. Different rules and laws apply at scale. Ketan Umare (CEO, TSC) will cover the three laws for ML at scale. These laws will be grounded in anecdotes that the team encountered while working with some of the top companies using ML in production. The talk will cover the following topics:
– Machine learning products are resource intensive. Often the cost of ML projects balloons, and access to infrastructure is scarce without a predefined ROI.
– ML at scale is a team sport. Production teams need a common platform to collaborate efficiently. Small inefficiencies in collaboration can slow down an entire organization.
– If things can go wrong, at scale they will go wrong. We need a safety net: versioning and reproducibility are important in delivering robust ML products at scale. To deliver these robust products, code and data alone are not enough; it is essential to capture the infrastructure and configuration as well.
9:40am - 10:00am

Human Friendly, Production Ready Data Science Stack

There is a pressing need for tools and workflows that meet data scientists where they are. This is also a serious business need: How to enable an organization of data scientists, who are not software engineers by training, to build and deploy end-to-end machine learning workflows and applications independently? In this talk, we discuss the problem space and the approach we took to solving it at Netflix. We wanted to provide the best possible user experience for data scientists, allowing them to focus on parts they like (modeling using their favorite off-the-shelf libraries) while providing robust built-in solutions for the foundational infrastructure: data, compute, orchestration, and versioning.
10:00am - 10:20am

ML Infrastructure for Autonomous Vehicles @ Cruise

In this talk I will present the machine learning infrastructure that supports large-scale data preparation and training at Cruise, and how it enables model improvements in an effective, reliable, continuous, and automatic fashion.
10:20am - 10:40am

OPT-175B: LLM Development Lifecycle & Challenges

Coming Soon
10:40am - 11:00am

Customizable Computer Vision Expands Data Access Without Compromising Privacy

Computer vision has made huge strides recently, helped by large-scale labeled datasets. However, these datasets came with no guarantees or analysis of diversity. Additionally, privacy concerns may limit the ability to collect more data. These problems are particularly acute in human-centric computer vision for AR/VR applications. An emerging alternative to real-world data that alleviates some of these issues is synthetic data. However, creating synthetic data generators is incredibly challenging, which prevents researchers from exploring the usefulness of synthetic data. To promote research into the use of synthetic data, we release a set of data generators for computer vision. We found that pre-training a network on synthetic data and fine-tuning it on real-world target data results in models that outperform models trained on the real data alone. Furthermore, we find remarkable gains when limited real-world data is available. These freely available data generators should enable a wide range of research into the emerging field of simulation-to-real transfer learning for computer vision.
11:00am - 11:30am

Live Q&A Session

Live Q&A session with all speakers
11:30am - 11:40am


11:40am - 12:00pm

AI @Scale at Microsoft

Microsoft’s AI @Scale encompasses multiple dimensions. It gives customers access to state-of-the-art large-scale AI models, training and inferencing optimization tools, and supercomputing resources. It also provides a cross-platform AI runtime that enables customers to infuse AI into every aspect of their platforms and products, across cloud and edge. In this talk, I will give an overview of Microsoft AI @Scale. I will focus on the roles that AI software plays, and discuss the challenges we are facing and the areas for potential collaboration between AI software and hardware.
12:00pm - 12:20pm

At-scale Training with PyTorch and Amazon SageMaker

Coming Soon
12:20pm - 12:40pm

The Future of Inference for Complex AI Applications

Inference of single-model applications has, in recent years, become a multi-stage process, combining online Feature Stores with specialized model-hosting runtimes like ONNX and Triton. Less simplistic AI applications such as those in chatbots, mixed-mode recommenders, and search engines fold "candidate retrieval" steps into the mix. As these applications get more sophisticated, they typically now require all of feature-hydration, semantic encoding/embedding, candidate selection, and candidate re-ranking (not to mention any pre/post-processing or "format massaging" in between any of these steps). While for batch and streaming applications, composable DAG engines and DSLs have been developed to allow these applications to become arbitrarily deep, in the on-demand/realtime world, DAG engines which are open source / open-specification are few and far between, even though they are sorely needed.
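
As a rough illustration of the stages named in this abstract, the sketch below wires feature hydration, query encoding, candidate selection, and re-ranking into a small dependency graph and runs them in topological order. It is not taken from the talk: every stage function, the toy catalog, and the scoring rule are hypothetical placeholders, and a real on-demand DAG engine would also handle concurrency, timeouts, and remote model runtimes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stages: each reads from and writes to a shared context dict.
def hydrate_features(ctx):
    # Stand-in for an online feature store lookup.
    ctx["features"] = {"user_id": ctx["request"]["user_id"], "recent_clicks": 3}

def encode_query(ctx):
    # Toy "embedding": [character count, word count].
    query = ctx["request"]["query"]
    ctx["embedding"] = [len(query), query.count(" ") + 1]

def select_candidates(ctx):
    # Toy retrieval: longer queries pull more items from a fixed catalog.
    catalog = ["doc_a", "doc_bb", "doc_ccc"]
    ctx["candidates"] = catalog[: ctx["embedding"][1] + 1]

def rerank(ctx):
    # Toy scoring rule that combines a candidate property with a user feature.
    boost = ctx["features"]["recent_clicks"]
    ctx["ranked"] = sorted(ctx["candidates"], key=lambda doc: -(len(doc) + boost))

STAGES = {
    "hydrate_features": hydrate_features,
    "encode_query": encode_query,
    "select_candidates": select_candidates,
    "rerank": rerank,
}

# Each stage maps to the set of stages that must finish before it runs.
DEPENDS_ON = {
    "hydrate_features": set(),
    "encode_query": set(),
    "select_candidates": {"encode_query"},
    "rerank": {"hydrate_features", "select_candidates"},
}

def run_pipeline(request):
    """Run every stage once, in an order that respects DEPENDS_ON."""
    ctx = {"request": request}
    for name in TopologicalSorter(DEPENDS_ON).static_order():
        STAGES[name](ctx)
    return ctx["ranked"]

if __name__ == "__main__":
    print(run_pipeline({"user_id": 1, "query": "red shoes"}))
```

Declaring the dependencies as data rather than hard-coding the call order is what lets such a pipeline grow deeper without rewriting the runner; adding a post-processing stage only needs a new entry in each dictionary.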
12:40pm - 1:00pm

What makes PyTorch beloved makes it hard to compile

Coming Soon
1:00pm - 1:20pm

Model Performance Monitoring - A Practitioner's Perspective

Artificial Intelligence is increasingly playing an integral role in determining our day-to-day experiences. Moreover, with the proliferation of AI-based solutions in areas such as hiring, lending, criminal justice, healthcare, and education, the resulting personal and professional implications of AI are far-reaching. The dominant role played by AI models in these domains has led to a growing concern regarding potential performance drift and bias in these models, and to demand for model transparency and interpretability. Model monitoring has become a prerequisite for building trust in and adoption of AI systems in high-stakes domains requiring reliability and safety, such as healthcare and automated transportation, and in critical industrial applications with significant economic implications, such as predictive maintenance, exploration of natural resources, and climate change modeling. In this talk, we will discuss how Fiddler.AI is building Model Performance Management to address this problem by continuously monitoring AI algorithms for performance and bias issues and reporting actionable insights with explanations to the entire organization. For more information, follow us on Twitter @fiddlerlabs.
1:20pm - 1:50pm

Live Q&A Session

Live Q&A session with all speakers
1:50pm - 2:20pm

Fireside Chat

Presented by: Yann LeCun & Ludovic Hauduc
