Video @Scale 2024

November 20-21, 2024

Video @Scale 2024 is a technical conference designed for engineers who develop or manage large-scale video systems serving millions of people. Developing large-scale video systems involves complex, unprecedented engineering challenges. The @Scale community focuses on bringing forward people's experiences in creating innovative solutions in the video engineering domain.

Video @Scale 2024 will be hosted virtually on November 20 & 21. Joining us are speakers from AWS, Boston University, Captions, Meta, Momento, and Netflix.

EVENT AGENDA

Event times below are displayed in PT.

Day 1 - Nov 20

Day 2 - Nov 21

09:00 AM - 09:05 AM

Opening Remarks

Speaker Diana Chen, Meta

09:05 AM - 09:35 AM

Fireside Chat

Revolutionizing Video Creation and Editing: Insights from Engineering Leaders on the Present and Future of Generative Video Models

Hear video generation model experts together on the same panel for the first time as they discuss the future of generative AI video models.

Moderator Abhinav Kapoor, Meta

Speaker Peter Vajda, Meta

Speaker Amit Jain, Luma AI

09:35 AM - 09:55 AM

Movie Gen Video: State-of-the-art Video Generation

Image, video, and audio generation are fundamental building blocks for Generative AI research and real-world applications. In this talk, I'll present Movie Gen, a set of foundational models for video generation, editing, personalization, and audio generation. Movie Gen models are among the world's most advanced media generation models, with state-of-the-art results compared to industry solutions. I'll focus on text-to-video generation and share the key insights that enabled this step change in quality. Movie Gen produces HD-quality videos of up to 16 seconds in length and has been used by movie producers in Hollywood.

Speaker Ishan Misra, Meta

Featured Article

Movie Gen: A Cast of Media-Generation Foundation Models

09:55 AM - 10:15 AM

How Captions Enables Anyone To Tell Their Story With Video

At Captions, we believe that anyone can become a video creator, regardless of their experience. This mission presents both product and technical challenges as we bridge the gap between cutting-edge video generation capabilities and a user-friendly experience. In this presentation, we'll take a deep dive into Captions and our underlying technical systems. Lastly, we'll demonstrate how we're evolving these systems to empower users to create videos that resonate globally.

Speaker Corbyn Salisbury, Captions

10:15 AM - 10:20 AM

Break

10:20 AM - 10:40 AM

Meta's AI Translations Inference

In this talk, we will show how we implemented a media processing pipeline to perform media inference (autodub / lipsync) at Meta scale.

We will focus on the challenges we faced from a media processing and scaling point of view, including inference latency and scheduling, voice isolation, media timing/alignment, alternate tracks delivery, instrumentation, and model evaluation.

Speaker Sravan Rekula, Meta

Speaker Jordi Cenzano, Meta

Speaker Amisha Jaiswal, Meta

10:40 AM - 11:00 AM

Building Netflix's New Video Pipeline

An efficient and flexible video processing pipeline is critical for enabling innovation and for supporting both our streaming service and our studio partners, and it is essential to Netflix's continued success. Over the past few years, we have been rebuilding this pipeline on our next-generation microservice-based platform. In this talk, we will share our journey and learnings with the community.

Speaker Liwei Guo, Netflix

11:00 AM - 11:20 AM

On-device Video Playback Upsampling

Various video upsampling technologies are adaptively applied to video playback on mobile clients. Playback quality is constrained when video is streamed to end users, mostly because of the device's network bandwidth or low quality in the original content. With these advanced upsampling technologies applied during on-device playback, streamed video quality improves and, at the same time, users save cellular/Wi-Fi data by playing video at a lower bitrate.
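
The abstract does not say which upsampling methods are used, so the following is only a minimal sketch of the general idea, using classic bilinear interpolation as a stand-in (production systems may well use learned super-resolution instead). The function name `upsample_bilinear` and the tiny grayscale frame are hypothetical, chosen for illustration.

```python
def upsample_bilinear(frame, scale):
    """Upscale a non-empty 2D grayscale frame (list of rows) by an integer factor.

    Bilinear interpolation is an illustrative assumption here, not the
    method described in the talk.
    """
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    src_h, src_w = len(frame), len(frame[0])
    dst_h, dst_w = src_h * scale, src_w * scale
    out = []
    for y in range(dst_h):
        # Map the output pixel centre back into source coordinates,
        # clamping so that border pixels reuse the nearest source row.
        sy = min(max((y + 0.5) / scale - 0.5, 0.0), src_h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        fy = sy - y0
        row = []
        for x in range(dst_w):
            sx = min(max((x + 0.5) / scale - 0.5, 0.0), src_w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            fx = sx - x0
            # Blend horizontally on the two neighbouring rows, then vertically.
            top = frame[y0][x0] * (1 - fx) + frame[y0][x1] * fx
            bottom = frame[y1][x0] * (1 - fx) + frame[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# A 2x2 "low-bitrate" frame upscaled to 4x4 on the client.
low_res = [[0, 100], [100, 200]]
high_res = upsample_bilinear(low_res, 2)
for row in high_res:
    print(row)
```

The point of the sketch is the trade-off the abstract describes: the client downloads the smaller frame and spends local compute to reconstruct a larger one for display.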

Speaker Wen Li, Meta

Featured Article

On-Device Video Playback Upsampling

11:20 AM - 11:45 AM

Live Q&A Session

Moderator Diana Chen, Meta

Speaker Ishan Misra, Meta

Speaker Corbyn Salisbury, Captions

Speaker Sravan Rekula, Meta

Speaker Jordi Cenzano, Meta

Speaker Amisha Jaiswal, Meta

Speaker Wen Li, Meta

Speaker Liwei Guo, Netflix

Day 2 - Nov 21

09:00 AM - 09:05 AM

Opening Remarks

Speaker Abhinav Kapoor, Meta

09:05 AM - 09:25 AM

SAM 2: Segment Anything in Images & Videos

We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos. We build a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date. Our model is a simple transformer architecture with streaming memory for real-time video processing. SAM 2 trained on our data provides strong performance across a wide range of tasks. In video segmentation, we observe better accuracy, using 3x fewer interactions than prior approaches. In image segmentation, our model is more accurate and 6x faster than the Segment Anything Model (SAM). We believe that our data, model, and insights will serve as a significant milestone for video segmentation and related perception tasks.

Speaker Chay Ryali, Meta

09:25 AM - 09:45 AM

Challenges in generated video evaluation

In the past couple of years, the landscape of image and video generation has transformed dramatically. Despite this progress, rigorous and holistic evaluation of generative models continues to lag. This is primarily due to the multi-faceted and highly subjective nature of the task: a generated image or video should be evaluated not just on overall visual quality and aesthetics, but also on its alignment with the input prompt, its originality, whether it avoids propagating stereotypical biases, and several other factors.

In this talk, I’ll give an overview of current metrics, their shortcomings, and the rapid progress in the research community to improve the rigor in evaluation.

Speaker Deepti Ghadiyaram, Boston University

09:45 AM - 10:05 AM

Efficient Segment Anything

EfficientSAM: small but mighty.

Speaker Bilge Soran, Meta

Speaker Yunyang Xiong, Meta

Featured Article

Pioneering AI Efficiency: Efficient Segmentation Models

10:05 AM - 10:10 AM

Break

10:10 AM - 10:30 AM

Metrics Driven Obsession with Viewer Experience

Viewer experience is a complex outcome of many competing dimensions, from encoding quality to the network performance of the video pipeline. Metrics such as Zero Buffer Rate (ZBR) illuminate the impact of each component in the end-to-end pipeline on the viewer experience. This talk will explore the essential elements of operational excellence in video infrastructure, working backwards from the viewer’s perspective into the video pipeline.
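
The abstract names ZBR without defining it. As a rough sketch, assuming ZBR means the share of playback sessions that completed with no rebuffering events at all, it could be computed as follows. The `PlaybackSession` record and its fields are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class PlaybackSession:
    session_id: str
    rebuffer_events: int  # stalls after playback started (hypothetical field)


def zero_buffer_rate(sessions):
    """Fraction of sessions with no rebuffering (assumed definition of ZBR)."""
    if not sessions:
        raise ValueError("need at least one session")
    clean = sum(1 for s in sessions if s.rebuffer_events == 0)
    return clean / len(sessions)


sessions = [
    PlaybackSession("a", 0),
    PlaybackSession("b", 2),
    PlaybackSession("c", 0),
    PlaybackSession("d", 0),
]
zbr = zero_buffer_rate(sessions)
print(f"ZBR: {zbr:.0%}")  # 3 of 4 sessions had no rebuffering
```

Under this definition the metric is per session rather than per second of playback, so one stall counts the same as ten; slicing it by CDN, encoding ladder, or device type is one way to attribute regressions to a pipeline component.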

Speaker Khawaja Shams, Momento

10:30 AM - 10:50 AM

Using Performance signals to improve Video Recommendations

Presentation information coming soon!

Speaker Zhi Long Tan, Meta

Speaker Wen Zhang, Meta

10:50 AM - 11:10 AM

Rethinking the Video Transcoding Pipeline: How Pre-Filtering Holds the Key to Efficiency and Quality

While video input pre-filtering is not a new concept, it has evolved significantly in recent years. Originally designed to reduce noise and artifacts in low-quality video, pre-filtering techniques have now been adapted for use even on pristine video content. This talk will explore the importance of video input pre-filtering and the specific advantages it can offer in modern video production and delivery workflows.

Key benefits of advanced input pre-filtering include reduced file sizes without sacrificing perceived video quality, as well as mitigation of common video quality issues like softness and deblocking artifacts. However, implementing a production-ready, broadcast-grade pre-filtering solution requires careful consideration of several critical factors.

This talk will dive into the core pillars of the newly released video input pre-filter in AWS Elemental's media processing solutions. It will explain how this advanced pre-filtering technology can help video providers deliver high-quality, efficiently encoded video for a wide range of applications, from over-the-top streaming to broadcast television. Attendees will come away with a deep understanding of the evolving role of pre-filtering in the video technology landscape and practical insights they can apply to their own workflows.

Speaker Ramzi Khsib, AWS

11:10 AM - 11:35 AM

Live Q&A Session

Moderator Abhinav Kapoor, Meta

Speaker Chay Ryali, Meta

Speaker Deepti Ghadiyaram, Boston University

Speaker Bilge Soran, Meta

Speaker Yunyang Xiong, Meta

Speaker Khawaja Shams, Momento

Speaker Zhi Long Tan, Meta

Speaker Wen Zhang, Meta

Speaker Ramzi Khsib, AWS

SPEAKERS AND MODERATORS

Engineering Manager at Meta, supporting teams in Video Infrastructure that provides client-side media editing,...

Diana Chen

Meta

Abhinav is part of the Video Infra leadership team at Meta, focusing on scaling...

Abhinav Kapoor

Meta

Peter Vajda joined Meta in 2014 as a Research Scientist. He currently directs the...

Peter Vajda

Meta

Amit Jain is the CEO and Founder of Luma AI, which he founded in...

Amit Jain

Luma AI

Ishan Misra is a Research Scientist in the GenAI group at Meta where he...

Ishan Misra

Meta

As the first Backend Engineer Manager at Captions, I’m proud to have led the...

Corbyn Salisbury

Captions

Sravan is an Engineering Manager in video infra and supports large scale video ingestion and...

Sravan Rekula

Meta

Jordi Cenzano is an engineer specializing in broadcast and online media. He is currently...

Jordi Cenzano

Meta

Amisha is a Software Engineer at Meta, where she currently contributes to the video...

Amisha Jaiswal

Meta

Liwei Guo is a Staff Software Engineer in the Encoding Technologies team at Netflix....

Liwei Guo

Netflix

Wen Li is a software engineer in the iOS Video Playback team at Meta....

Wen Li

Meta

Chay Ryali is a Research Engineer at AI@Meta (FAIR), developing multimodal foundation models with...

Chay Ryali

Meta

Deepti is an Assistant Professor in the Dept. of Computer Science at Boston University....

Deepti Ghadiyaram

Boston University

Bilge Soran holds a PhD from the University of Washington, where she focused on...

Bilge Soran

Meta

I am a research scientist at Meta. I have been working on foundation model...

Yunyang Xiong

Meta

Khawaja is the CEO @ Momento, won the NASA Early Career Medal for his...

Khawaja Shams

Momento

Zhi is a Software Engineering Manager at Meta.

Zhi Long Tan

Meta

Wen began his career as a data scientist in the finance domain, spending 3...

Wen Zhang

Meta

Ramzi Khsib is a Principal Software Development Engineer with AWS Elemental's Research & Development...

Ramzi Khsib

AWS

LATEST NOTES

Video @Scale

11/20/2024

Movie Gen: A Cast of Media-Generation Foundation Models

Humans communicate using a rich variety of digital media—text, images, videos, audio. Movie Gen is a cast of media-generation foundation...

Video @Scale

11/20/2024

On-Device Video Playback Upsampling

In today’s digital age, videos are a major source of entertainment and information. Like many of us, I enjoy swiping...

Video @Scale

11/21/2024

Pioneering AI Efficiency: Efficient Segmentation Models

In the rapidly evolving landscape of artificial intelligence, the quest for models that are both powerful and efficient is paramount....

Past EVENT November 20-21, 2024 | Mobile, Video and Web

Video @Scale 2024

Past EVENT March 20, 2024 @ 9am PT - 3pm PT | Mobile, Video and Web

RTC @Scale 2024

RTC @Scale is for engineers who develop and manage large-scale real-time communication (RTC) systems serving millions of people. The operations of large-scale RTC systems have always involved complex engineering challenges which continue to attract attention...

Past EVENT May 22, 2024 | Data, Machine Learning and AI

Data @Scale 2024

Data @Scale is a technical conference for engineers who are interested in building, operating, and using data systems at scale. Companies across the industry use data and underlying infrastructure to build products with user empathy,...

Past EVENT June 12, 2024 | Systems and Networking

Systems @Scale 2024

Systems @Scale 2024 is a technical conference intended for engineers that build and manage large-scale distributed systems serving millions or billions of users. The development and operation of such systems often introduces complex, unprecedented engineering...

Past EVENT JULY 31, 2024 @ 2:30 PM PDT - 7:00 PM PDT - IN PERSON EVENT | AUGUST 7, 2024 @ 2:30 PM PDT - 5:30 PM PDT - VIRTUAL PROGRAM | Data, Machine Learning and AI

AI Infra @Scale 2024

Meta’s Engineering and Infrastructure teams are excited to return for the second year in a row to host AI Infra @Scale on July 31. This year’s event is open to a limited number of in-person...

Past EVENT August 14, 2024 | Mobile, Video and Web

Product @Scale 2024

Product @Scale conferences are designed for technologists who work on solving complex product problems at scale. The @Scale community focuses on bringing forward people's experiences in creating innovative solutions to large-scale products serving millions or...

Past EVENT September 11, 2024 | Santa Clara Convention Center | Systems and Networking

Networking @Scale 2024

Meta’s Networking team invites you to Networking @Scale on September 11th. This year’s event is an in-person event hosted at the Santa Clara Convention Center and will also be live streamed for virtual attendees. Registration is...

Past EVENT October 9, 2024 | Systems and Networking

Reliability @Scale 2024

In the digital age, where systems operate at unprecedented scales, the importance of robust configuration management cannot be overstated. This year’s Reliability @Scale will focus on a central theme of "Move Safely", emphasizing the critical...

Past EVENT October 23, 2024 | Mobile, Video and Web

Mobile @Scale 2024

Mobile @Scale is a technical conference designed for the engineers, product managers, and engineering leaders building mobile experiences at significant scale (millions to billions of daily users). Mobile @Scale provides a rare opportunity to gather...