AI @Scale 2020

NOVEMBER 23, 2020 @ 9:00 AM PST - 11:00 AM PST

NOVEMBER 24, 2020 @ 9:00 AM PST - 3:13 PM PST

Designed for engineers interested in solving machine learning scaling problems.

RSVPS CLOSED

AGENDA SPEAKERS

ABOUT EVENT

November 23-24: AI @Scale is an invitation-only technical conference designed for engineers interested in solving machine learning scaling problems. Topics to be discussed at this year’s virtual conference include: latency improvements for training workflows; how to improve the resiliency of ML training; improving the developer experience for ML engineers; and scaling, sustaining model quality, machine efficiency, among others. The AI @Scale community focuses on bringing people together to discuss these challenges, share ideas and collaborate.

EVENT AGENDA

Event times below are displayed in PT.

November 23

November 24

09:00 AM - 09:20 AM

Azure Cognitive Services @Scale

WATCH NOW

Azure Cognitive Services sits at the core of many essential products and services at Microsoft for internal and external workloads. Anand’s talk describes the hardware and software infrastructure that supports Ai services at global scale. Azure Cognitive Services workloads are extremely diverse: services require many different types of models in practice. This diversity has implications at all layers in the system stack. In addition, the computational requirements are also intense, leveraging both GPUs and CPUs for real-time inference. Addressing these and other emerging challenges continues to require diverse efforts that span algorithms, software, and hardware design. In this talk, Anand also walks through some of the challenges, including data privacy, deep customization, and bias correction, and discusses solutions they have built to tackle these challenges.

Speaker Anand Raman,Microsoft

09:20 AM - 09:35 AM

High Performance Observability Across the ML Lifecycle

WATCH NOW

The scale and breadth of ML applications have increased dramatically thanks to scalable model-training and serving technologies. Builders of enterprise ML systems often have to contend with both real-time inference and massive amounts of data, prompting increasing investment in tools for MLOps and ML Observability. Data logging is a critical component of a robust ML pipeline, as it provides essential insights into the system’s health and performance. However, performant logging and monitoring for ML systems has proven ineffective within existing DevOps and data sampling approaches. Alessya will discuss the WhyLabs solution to this problem: using statistical fingerprinting and data profiling to scale to TB-sized data with an open-source data logging library, whylogs. She will present the WhyLabs Observability platform that runs on top of whylogs, providing out-of-the-box monitoring and anomaly detection to proactively address data-related failures across the entire ML lifecycle.

Speaker Alessya Labzhinova,WhyLabs

09:35 AM - 09:50 AM

F3: Next-generation Feature Framework at Facebook

WATCH NOW

We will discuss the next generation feature framework in development at Facebook. This new framework enables efficient experimentation in building machine learning features to semantically model behaviors and intent of users, and leverages compiler technology to unify batch and streaming processing of these features in an expressive language. It also automatically optimizes underlying data pipelines and applies privacy enforcement at scale.

Speaker David Chung,Facebook

Speaker Qiao Yang,Facebook

09:50 AM - 10:40 AM

Live Panel / Audience Q&A Featuring Monday's Speakers

10:40 AM - 11:00 AM

Live Panel / Audience Q&A: Women in Engineering

Speaker Anubhuti Manohar,Microsoft

Speaker Fang Wang,Google

Speaker ,

09:00 AM - 09:20 AM

Netflix's Human-Centric Approach to ML Infrastructure

WATCH NOW

Netflix's unique culture affords it's data scientists extraordinary freedom of choice in ML tools and libraries. At the same time, they are responsible for building, deploying, and operating complex ML workflows autonomously without the need to be significantly experienced with systems or data engineering.
Metaflow, our ML framework (now open-source at metaflow.org), provides them with delightful abstractions to manage their project's lifecycle end-to-end, leveraging the strengths of the cloud: elastic compute and high-throughput storage.

In this talk, we present our human-centric design principles that enable the autonomy our users enjoy.

09:20 AM - 09:35 AM

Large Scale Machine Learning Using SQL in BigQuery

Google BigQuery is a petabyte-scale serverless cloud data warehouse that enables scalable machine learning using SQL. In this talk, we take a look at how enabling data analysts and other SQL users to perform machine learning tasks can accelerate business decision-making and intelligence. We also present challenges in democratizing ML in large scale data warehouses such as BigQuery. We describe how a combination of general purpose SQL query engine and dedicated machine learning infrastructure can create a robust infrastructure for performing machine learning tasks.

Speaker Amir Hormati,Google

09:35 AM - 09:50 AM

Mastercook: Large scale concurrent model development in ads ranking

WATCH NOW

We will discuss a novel model development process and tools we introduced to ads ranking machine learning teams, where a single model can be concurrently developed by dozens of engineers, whose changes to the model are centralized collected, combined, tested and launched.

Speaker Alexander Petrov,Facebook

Speaker Yifei Zhang,Facebook

09:35 AM - 09:50 AM

Large Scale Machine Learning Using SQL in BigQuery

WATCH NOW

Speaker Hossein Ahmadi,Google

09:50 AM - 10:05 AM

Flyte: Making MLOps and DataOps a reality

WATCH NOW

Flyte is the backbone for large-scale Machine Learning and Data Processing (ETL) pipelines at Lyft. It is used across business critical applications ranging from ETA, Pricing, Mapping, Autonomous etc. At its core it is a Kubernetes native workflow engine that executes 1M+ pipelines and 40M+ containers per month.

Flyte abstracts complex infrastructure management from its users and provides a declarative fabric to connect disparate compute technologies. This increases productivity and thus product velocity by enabling them to focus on business logic. Flyte has made it possible to build higher-level platforms at Lyft, further reducing the barriers to entry for non-infrastructure engineers.

The talk will focus on:

Motivation and tenets for building Flyte, and parts of the Data Stack tackled by it.
Architecture of Flyte and its specification language to orchestrate compute and manage data flow across disparate systems like Spark, Flink, Tensorflow, Hive etc.
Use-cases where Flyte can be leveraged
Extensibility of the Flyte and the burgeoning ecosystem.

Speaker Ketan Umare,Union.AI

10:05 AM - 04:32 PM

Live Panel / Audience Q&A Featuring Tuesday's Speakers

SPEAKERS AND MODERATORS

Anand Raman

Microsoft

Alessya Labzhinova

WhyLabs

David Chung

Facebook

Qiao Yang

Facebook

Anubhuti Manohar

Microsoft

Fang Wang

Google

Amir Hormati

Google

Alexander Petrov

Facebook

Yifei Zhang

Facebook

Hossein Ahmadi

Google

Ketan Umare is the TSC Chair for Flyte (incubating under LF AI & Data).... read more

Ketan Umare

Union.AI

UPCOMING EVENT November 20-21, 2024 | Video @Scale

Video @Scale 2024

Video @Scale 2024 is a technical conference designed for engineers that develop or manage large-scale video systems serving millions of people. The development of large-scale video systems includes complex, unprecedented engineering challenges. The @Scale community...

PAST EVENT March 20, 2024 @ 9am PT - 3pm PT | RTC @Scale

RTC @Scale 2024

RTC @Scale is for engineers who develop and manage large-scale real-time communication (RTC) systems serving millions of people. The operations of large-scale RTC systems have always involved complex engineering challenges which continue to attract attention...

Past EVENT May 22, 2024 | Data @Scale

Data @Scale 2024

Data @Scale is a technical conference for engineers who are interested in building, operating, and using data systems at scale. Companies across the industry use data and underlying infrastructure to build products with user empathy,...

Past EVENT June 12, 2024 | Systems @Scale

Systems @Scale 2024

Systems @Scale 2024 is a technical conference intended for engineers that build and manage large-scale distributed systems serving millions or billions of users. The development and operation of such systems often introduces complex, unprecedented engineering...

Past EVENT JULY 31, 2024 @ 2:30 PM PDT - 7:00 PM PDT - IN PERSON EVENT | AUGUST 7, 2024 @ 2:30 PM PDT - 5:30 PM PDT - VIRTUAL PROGRAM | AI Infra @Scale

AI Infra @Scale 2024

Meta’s Engineering and Infrastructure teams are excited to return for the second year in a row to host AI Infra @Scale on July 31. This year’s event is open to a limited number of in-person...

Past EVENT August 14, 2024 | Product @Scale

Product @Scale 2024

Product @Scale conferences are designed for technologists who work on solving complex product problems at scale. The @Scale community focuses on bringing forward people's experiences in creating innovative solutions to large-scale products serving millions or...

Past EVENT September 11, 2024 | Santa Clara Convention Center | Networking @Scale

Networking @Scale 2024

Meta’s Networking team invites you to Networking@scale on September 11th. This year’s event is an in-person event hosted at the Santa Clara Convention center and will also be live streamed for virtual attendees. Registration is...

Past EVENT October 9, 2024 | Reliability @Scale

Reliability @Scale 2024

In the digital age, where systems operate at unprecedented scales, the importance of robust configuration management cannot be overstated. This year’s Reliability @Scale will focus on a central theme of "Move Safely", emphasizing the critical...

Past EVENT October 23, 2024 | Mobile @Scale

Mobile @Scale 2024

Mobile @Scale is a technical conference designed for the engineers, product managers, and engineering leaders building mobile experiences at significant scale (millions to billions of daily users). Mobile @Scale provides a rare opportunity to gather...

FIND @SCALE TOPICS

Data, Systems and Networking Dev Tools and Ops, Privacy, Sustainability and Performance Fighting Abuse and Security Mobile, Video and Web

AI @Scale 2020

ABOUT EVENT

EVENT AGENDA

November 23

November 24

SPEAKERS AND MODERATORS

Anand Raman

Alessya Labzhinova

David Chung

Qiao Yang

Anubhuti Manohar

Fang Wang

Amir Hormati

Alexander Petrov

Yifei Zhang

Hossein Ahmadi

Ketan Umare

Video @Scale 2024

RTC @Scale 2024

Data @Scale 2024

Systems @Scale 2024

AI Infra @Scale 2024

Product @Scale 2024

Networking @Scale 2024

Reliability @Scale 2024

Mobile @Scale 2024

FIND @SCALE TOPICS

EXPLORE OTHER SERIES

MACHINE LEARNING @scale