EVENT AGENDA
Event times below are displayed in PT.
Video @Scale 2024 is a technical conference designed for engineers who develop or manage large-scale video systems serving millions of people. Building video systems at this scale poses complex, often unprecedented engineering challenges. The @Scale community focuses on sharing practitioners' experiences and the innovative solutions they have built in the video engineering domain.
Video @Scale 2024 will be hosted virtually on November 20 & 21. Joining us are speakers from AWS, Boston University, Captions, Meta, Momento, and Netflix.
Register today and check back for upcoming speaker and agenda announcements!
Revolutionizing Video Creation and Editing: Insights from Engineering Leaders on the Present and Future of Generative Video Models
For the first time, hear leading experts on generative video models together on one panel as they discuss the future of generative AI video models.
Image, video, and audio generation are fundamental building blocks for generative AI research and real-world applications. In this talk, I'll present Movie Gen, a set of foundation models for video generation, editing, personalization, and audio generation. Movie Gen models are among the world's most advanced media generation models, with state-of-the-art results compared to industry solutions. I'll focus my talk on text-to-video generation and share key insights that enabled this step change in quality. Movie Gen produces HD-quality videos of up to 16 seconds in length and has been used by movie producers in Hollywood.
At Captions, we believe that anyone can become a video creator, regardless of their experience. This mission presents both product and technical challenges as we bridge the gap between cutting-edge video generation capabilities and a user-friendly experience. In this presentation, we'll take a deep dive into Captions and our underlying technical systems. Finally, we'll demonstrate how we're evolving these systems to empower users to create videos that resonate globally.
In this talk we will show how we implemented a media processing pipeline to perform media inference (autodub/lipsync) at Meta scale.
We will focus on the challenges we faced from a media processing and scaling point of view, such as inference latency and scheduling, voice isolation, media timing and alignment, alternate-track delivery, instrumentation, and model evaluation.
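As a rough illustration of the timing and alignment challenge, here is a minimal, hypothetical sketch (the Segment type and realign helper are illustrative, not Meta's internal pipeline): dubbed speech rarely matches the source duration exactly, so each dubbed clip has to be pinned back onto the original timeline without overlapping the next line.

```python
# Hypothetical sketch of an autodub pipeline's alignment step.
from dataclasses import dataclass

@dataclass
class Segment:
    start_ms: int    # position of the segment on the source timeline
    end_ms: int
    audio_path: str  # isolated-voice clip handed to the dub model

def realign(segments: list[Segment], dubbed_durations_ms: list[int]) -> list[Segment]:
    """Pin dubbed clips back onto the original timeline.

    Each clip keeps its source start time; its end is clamped to the next
    segment's start so clips never overlap (a simple alignment policy).
    """
    aligned = []
    for i, (seg, dub_ms) in enumerate(zip(segments, dubbed_durations_ms)):
        end = seg.start_ms + dub_ms
        if i + 1 < len(segments):
            end = min(end, segments[i + 1].start_ms)  # no overlap with next line
        aligned.append(Segment(seg.start_ms, end, seg.audio_path))
    return aligned
```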
An efficient and flexible video processing pipeline is critical for enabling innovation and supporting both our streaming service and studio partners, and it is essential to Netflix's continued success. Over the past few years, we have been rebuilding this pipeline on our next-generation microservice-based platform. In this talk, we will share our journey and learnings with the community.
Various video upsampling technologies are adaptively applied to video playback on mobile clients. Playback quality is often constrained when video is streamed to end users, mostly due to device network bandwidth limits or low quality in the original content. With these advanced on-device upsampling technologies, streamed video quality improves, and at the same time users save cellular/Wi-Fi data because a lower-bitrate rendition can be played.
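To make the bitrate/quality trade concrete, here is a hedged sketch of the rendition-selection idea (the ladder values and the pick_rendition helper are hypothetical, not a production client's logic): when the device can upsample, the client can accept a lower rung of the ladder and save bandwidth.

```python
# Hypothetical bitrate-ladder selection with on-device upsampling.
RENDITIONS = [  # (height_px, bitrate_kbps), highest first; illustrative values
    (1080, 4500),
    (720, 2500),
    (540, 1200),
    (360, 600),
]

def pick_rendition(bandwidth_kbps: float, can_upsample: bool,
                   target_height: int = 1080) -> tuple[int, int]:
    """Pick the highest rung that fits the bandwidth budget.

    If the client can upsample on device, rungs down to half the target
    height are acceptable, trading bitrate for on-device reconstruction.
    """
    min_height = target_height // 2 if can_upsample else target_height
    affordable = [(h, b) for h, b in RENDITIONS
                  if b <= bandwidth_kbps * 0.8]  # keep 20% headroom for jitter
    for h, b in affordable:
        if h >= min_height:
            return (h, b)
    return RENDITIONS[-1]  # fall back to the lowest rung

print(pick_rendition(2000, can_upsample=True))   # -> (540, 1200)
print(pick_rendition(2000, can_upsample=False))  # -> (360, 600) fallback
```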
We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos. We build a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date. Our model is a simple transformer architecture with streaming memory for real-time video processing. SAM 2 trained on our data provides strong performance across a wide range of tasks. In video segmentation, we observe better accuracy, using 3x fewer interactions than prior approaches. In image segmentation, our model is more accurate and 6x faster than the Segment Anything Model (SAM). We believe that our data, model, and insights will serve as a significant milestone for video segmentation and related perception tasks.
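For readers who want to try it, the publicly released sam2 repo (facebookresearch/sam2) exposes a video predictor along the lines below; the config and checkpoint paths are placeholders for whichever model size you download, and the click coordinates are made up for illustration.

```python
# Promptable video segmentation with the released SAM 2 video predictor.
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor("sam2_hiera_l.yaml",       # model config
                                       "sam2_hiera_large.pt")     # checkpoint

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(video_path="clip_frames/")  # JPEG frame dir

    # A single positive click on frame 0 is enough to start tracking an object.
    predictor.add_new_points_or_box(
        state, frame_idx=0, obj_id=1,
        points=[[420, 260]],  # (x, y) pixel coordinate of the click
        labels=[1],           # 1 = foreground click, 0 = background
    )

    # Streaming memory propagates the masklet through the rest of the video.
    for frame_idx, obj_ids, masks in predictor.propagate_in_video(state):
        pass  # masks: one mask-logit tensor per tracked object on this frame
```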
In the past couple of years, the landscape of image and video generation has transformed dramatically. Despite this phenomenal progress, rigorous and holistic evaluation of generative models continues to lag behind. This is primarily due to the multi-faceted and highly subjective nature of the task: a generated image or video should be evaluated not just on overall visual quality and aesthetics, but also on its alignment to the input prompt, its originality, its avoidance of stereotypical biases, and several other factors.
In this talk, I’ll give an overview of current metrics, their shortcomings, and the rapid progress in the research community to improve the rigor in evaluation.
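As one concrete example of a current metric and its limits, a CLIPScore-style prompt-alignment check can be computed in a few lines with the Hugging Face transformers CLIP classes; note that it measures text-image alignment only, not aesthetics, originality, or bias.

```python
# CLIP-based prompt alignment: cosine similarity between the CLIP embeddings
# of a generated image and its text prompt (one common automatic metric).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def prompt_alignment(image_path: str, prompt: str) -> float:
    inputs = processor(text=[prompt], images=Image.open(image_path),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())  # in [-1, 1]; higher = better aligned
```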
Viewer experience is a complex outcome of many competing dimensions, from encoding quality to the network performance of the video pipeline. Metrics like Zero Buffer Rate (ZBR) illuminate the impact of individual components in the end-to-end pipeline on the viewer experience. This talk will explore the essential elements of operational excellence in video infrastructure, working backwards from the viewer's perspective into the video pipeline.
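The precise definition of ZBR varies by organization; as a hypothetical sketch, a zero-buffer rate can be computed from playback telemetry as the share of sessions that never stalled (the PlaybackSession type below is illustrative).

```python
# Hypothetical ZBR-style computation over playback session telemetry.
from dataclasses import dataclass

@dataclass
class PlaybackSession:
    session_id: str
    rebuffer_events: int  # stalls after initial startup

def zero_buffer_rate(sessions: list[PlaybackSession]) -> float:
    """Fraction of sessions with no rebuffering; 1.0 means nobody stalled."""
    if not sessions:
        return 1.0
    clean = sum(1 for s in sessions if s.rebuffer_events == 0)
    return clean / len(sessions)

# Example: 3 of 4 sessions played without a stall -> ZBR = 0.75
sessions = [PlaybackSession("a", 0), PlaybackSession("b", 2),
            PlaybackSession("c", 0), PlaybackSession("d", 0)]
print(zero_buffer_rate(sessions))
```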
While video input pre-filtering is not a new concept, it has evolved significantly in recent years. Originally designed to reduce noise and artifacts in low-quality video, pre-filtering techniques have now been adapted for use even on pristine video content. This talk will explore the importance of video input pre-filtering and the specific advantages it can offer in modern video production and delivery workflows.
Key benefits of advanced input pre-filtering include reduced file sizes without sacrificing perceived video quality, as well as mitigation of common video quality issues such as softness and blocking artifacts. However, implementing a production-ready, broadcast-grade pre-filtering solution requires careful consideration of several critical factors.
This talk will dive into the core pillars of the newly released video input pre-filter in AWS Elemental's media processing solutions. It will explain how this advanced pre-filtering technology can help video providers deliver high-quality, efficiently encoded video for a wide range of applications, from over-the-top streaming to broadcast television. Attendees will come away with a deep understanding of the evolving role of pre-filtering in the video technology landscape and practical insights they can apply to their own workflows.
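As a rough stand-in for what an input pre-filter does (not AWS Elemental's implementation), a denoising stage can be placed ahead of the encoder with ffmpeg's hqdn3d filter, so the encoder spends fewer bits on noise; the filter strengths and file names below are placeholder values to tune per title.

```python
# Illustrative pre-filter-then-encode stage using ffmpeg via subprocess.
import subprocess

def prefilter_and_encode(src: str, dst: str) -> None:
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-vf", "hqdn3d=3:2:6:4",  # spatial/temporal denoise (luma:chroma)
        "-c:v", "libx264", "-crf", "22", "-preset", "slow",
        "-c:a", "copy",
        dst,
    ], check=True)

prefilter_and_encode("mezzanine.mov", "filtered_1080p.mp4")
```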
Engineering Manager at Meta, supporting teams in Video Infrastructure that provides client-side media editing,...
Abhinav is part of the Video Infra leadership team at Meta, focusing on scaling...
Peter Vajda joined Meta in 2014 as a Research Scientist. He currently directs the...
Amit Jain is the CEO and Founder of Luma AI, which he founded in...
Ishan Misra is a Research Scientist in the GenAI group at Meta where he...
As the first Backend Engineer Manager at Captions, I’m proud to have led the...
Sravan is an Engineering Manager in video infra and supports large scale video ingestion and...
Jordi Cenzano is an engineer specializing in broadcast and online media. He is currently...
Amisha is a Software Engineer at Meta, where she currently contributes to the video...
Liwei Guo is a Staff Software Engineer in the Encoding Technologies team at Netflix....
Wen Li is a software engineer in the iOS Video Playback team at Meta....
Chay Ryali is a Research Engineer at AI@Meta (FAIR), developing multimodal foundation models with...
Deepti is an Assistant Professor in the Dept. of Computer Science at Boston University....
Bilge Soran holds a PhD from the University of Washington, where she focused on...
I am a research scientist at Meta. I have been working on foundation model...
Khawaja is the CEO of Momento, and won the NASA Early Career Medal for his...
Zhi is a Software Engineering Manager at Meta.
Wen began his career as a data scientist in the finance domain, spending 3...
Ramzi Khsib is a Principal Software Development Engineer with AWS Elemental's Research & Development...