EVENT AGENDA
Event times below are displayed in PT.
Video @Scale is an invitation-only technical conference for engineers who develop or manage large-scale video systems serving millions of people. Developing large-scale video systems involves complex, unprecedented engineering challenges. The @Scale community focuses on bringing people together to discuss these challenges and collaborate on the development of new solutions.
Hear from Paresh Rajwat, the Vice President and Head of Product for Facebook Audio, Video, and Music.
Serving live video with high reliability is challenging, not only from the perspective of deploying improvements on top of a distributed system but also from the perspective of defining the right measurements to capture the reliability gaps that matter to users. Facebook's Live platform spans the ingest endpoints where creators upload their streams, the services in charge of transcoding and generating multiple video renditions, and the delivery services on the egress stack and CDN endpoints that serve broadcasts to millions of viewers. All of these pipelines have to work in orchestration with real-time guarantees, and even a single failure can be severe for our users. In this talk, Petar will show how we evolved our thinking about key reliability metrics over time, and how we derive actionable insights to make Facebook Live as reliable as possible from the user's perspective.
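To make the idea of a user-centric reliability metric concrete, here is a minimal Python sketch assuming a hypothetical per-session event schema; it is illustrative only, not Facebook Live's actual metric definition:

```python
from dataclasses import dataclass

@dataclass
class PlaybackSession:
    """One viewer's session on a live broadcast (hypothetical schema)."""
    watch_time_s: float   # seconds of video actually rendered
    stall_time_s: float   # seconds spent stalled / rebuffering
    fatal_error: bool     # session ended in an unrecoverable error

def broadcast_reliability(sessions: list[PlaybackSession],
                          max_stall_ratio: float = 0.01) -> float:
    """Share of sessions that were 'good' from the viewer's perspective:
    no fatal error and stall time under a small fraction of watch time."""
    if not sessions:
        return 1.0
    good = sum(
        1 for s in sessions
        if not s.fatal_error
        and s.stall_time_s <= max_stall_ratio * max(s.watch_time_s, 1.0)
    )
    return good / len(sessions)

# Three sessions; the second stalled for 10% of its watch time.
sessions = [PlaybackSession(600, 1.0, False),
            PlaybackSession(300, 30.0, False),
            PlaybackSession(120, 0.0, False)]
print(f"reliability: {broadcast_reliability(sessions):.2f}")  # 0.67
```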
Reducing end-to-end streaming latency is critical for HTTP-based live video streaming. There are currently two new technologies in this domain: Low-Latency HTTP Live Streaming (LL-HLS) and Low-Latency Dynamic Adaptive Streaming over HTTP (LL-DASH). Several streaming players (Apple's AVPlayer, Google's Shaka Player, HLS.js, DASH.js, etc.) and streaming encoding and packaging tools (Apple's HTTP Live Streaming Tools, FFmpeg, GPAC, etc.) have recently added support for these formats. There are also several live LL-HLS and LL-DASH reference/demo streams available, showcasing the capabilities of these technologies and helping the implementation community adopt them. However, how well such systems perform in real-world deployment scenarios is not well known. In particular, little is known about the performance of such low-latency systems when delivering to mobile devices connected over rapidly changing and often highly loaded wireless cellular networks. In this talk, Yuriy will review results obtained in this direction by the Brightcove Research team. Our study consists of a series of live LL-HLS and LL-DASH streaming experiments. In each experiment, we use the same video content, encoder, encoding profiles, and identical network conditions emulated by replaying traces of existing cellular networks (Verizon 4G LTE, T-Mobile 4G LTE, etc.). Several key performance metrics are captured and reported for each experiment: the average streaming bitrate, the overall amount of media data downloaded, the intensity of data requests, streaming latency, playback speed variations, buffering, stream switching statistics, etc. These results are then analyzed and used to characterize typical limits of LL-HLS- and LL-DASH-based systems and the differences between them.
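As an illustration of how such metrics can be derived, here is a small Python sketch that summarizes one playback run from hypothetical per-segment download records; the log format and numbers are assumptions, not Brightcove's actual test harness:

```python
from statistics import mean

# Hypothetical per-segment records from one low-latency playback run:
# (segment_duration_s, bytes_downloaded, download_time_s, latency_s_at_request)
segments = [
    (2.0, 1_200_000, 0.9, 3.1),
    (2.0, 1_500_000, 1.1, 3.0),
    (2.0,   900_000, 2.3, 3.4),
]

def summarize(run):
    total_media_s = sum(d for d, *_ in run)
    total_bytes = sum(b for _, b, *_ in run)
    avg_bitrate_kbps = total_bytes * 8 / total_media_s / 1000  # avg streaming bitrate
    req_per_s = len(run) / total_media_s                       # request intensity
    # Crude buffering proxy: downloads slower than real time drain the buffer.
    buffer_drain_s = sum(max(0.0, t - d) for d, _, t, _ in run)
    avg_latency_s = mean(lat for *_, lat in run)
    return avg_bitrate_kbps, req_per_s, buffer_drain_s, avg_latency_s

kbps, rps, drain, lat = summarize(segments)
print(f"{kbps:.0f} kbps, {rps:.2f} req/s, {drain:.1f} s drained, {lat:.1f} s latency")
```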
Two years ago, iStreamPlanet set out to build a cloud-native software transcoder with the reliability and feature set to support some of the highest-profile live channels and events in the world. Our goals included: 4+ nines of uptime; as little C/C++ code as possible; the ability to run thousands of live channels without human oversight; support for advanced features such as SCTE-35 signaling and segmentation, hitless merging of redundant video sources, and Dolby Digital Plus surround sound encoding; and the ability to update the software on running channels without customer impact. We'll explain how we leveraged the power of Go, Kubernetes, and a mix of commercial and OSS components to make that vision a reality. The result powers the OTT distribution of over 1,000 live linear TV channels running 24/7, and has been used to deliver the streams for March Madness 2021, the Tokyo Olympic Games, the UEFA Champions League, and more.
Understanding video content has long been a focus for video-sharing platforms; it is one of the most important driving forces for growth in distribution, discovery, user experience, and monetization. Instream video understanding is the technology area where we analyze and utilize finer-granularity video signals in the spatial and temporal domains. These fine-grained spatial and temporal signals can be used directly in consumer-facing products or as inputs to downstream models and pipelines. For example, in the spatial domain we identify the salient regions inside each frame, which enables a system to automatically reframe a horizontal (landscape) video into a vertical (portrait) one. In the temporal domain, we compute a highlight score for each frame, which enables us to identify the highlight moments inside a video and create a video trailer.
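As a rough illustration of how such spatial and temporal signals might be consumed, the sketch below shows saliency-driven cropping for landscape-to-portrait reframing and thresholding of per-frame highlight scores; the data formats, models, and thresholds are assumptions, not the production system:

```python
import numpy as np

def crop_window_from_saliency(saliency: np.ndarray, out_aspect: float = 9 / 16):
    """Landscape-to-portrait reframing: slide a window with the target aspect
    ratio across a per-pixel saliency map and keep the column range that
    captures the most saliency mass."""
    h, w = saliency.shape
    crop_w = int(h * out_aspect)                  # keep full height, crop width
    col_mass = saliency.sum(axis=0)               # saliency per column
    window_sums = np.convolve(col_mass, np.ones(crop_w), mode="valid")
    x0 = int(window_sums.argmax())
    return x0, x0 + crop_w                        # left/right crop bounds

def highlight_segments(scores, threshold=0.8, min_len=30):
    """Turn per-frame highlight scores into (start, end) frame ranges that
    could seed a trailer: contiguous runs above a threshold, long enough."""
    segments, start = [], None
    for i, s in enumerate(list(scores) + [0.0]):  # sentinel closes the last run
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments
```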
Malicious synthetic media, both deepfakes and cheapfakes, are rising in prevalence and importance. End users are rapidly losing trust in media, and their ability to tell authentic media from inauthentic content has greatly diminished. This talk will cover the dangers of synthetic media in the video ecosystem, approaches to combating malicious synthetic media, and an overview of how media provenance techniques, as developed by the Coalition for Content Provenance and Authenticity, can be used to provide media trust signals to end users.
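A minimal sketch of the integrity half of a provenance check is shown below, assuming a hypothetical JSON manifest that records a hash of the asset; real provenance manifests (e.g., C2PA) additionally carry cryptographic signatures and an edit history:

```python
import hashlib
import json

def media_hash(path: str) -> str:
    """SHA-256 of the media bytes, as referenced by a provenance manifest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def check_manifest(media_path: str, manifest_path: str) -> bool:
    """Compare the hash recorded in a (hypothetical) JSON manifest with the
    media as delivered. A mismatch means the asset changed after the manifest
    was produced; a match is a trust signal only if the manifest's signature
    also verifies, which this sketch omits."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    return manifest.get("asset_sha256") == media_hash(media_path)
```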
A/B testing on video isn't just about tweaking recommendations or picking the perfect thumbnail. Every aspect of video benefits from rapid experimentation, including the infrastructure: streaming algorithms, codecs, bitrates, caching strategies, and network congestion control algorithms. Join us for a behind-the-scenes tour from practitioners on how experimentation helps us learn what users want. We'll share experiment results that confounded us, offer insight into infrastructure that enables experimentation on everything, and bust common myths that cause people to think experimentation isn't for them. If you care about making video better, there'll be something for you!
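As a toy illustration of session-level analysis for such infrastructure experiments, the sketch below compares the rate of sessions with any rebuffering between control and treatment using a two-proportion z-test; all numbers are made up:

```python
from math import sqrt

def two_proportion_ztest(x_c, n_c, x_t, n_t):
    """z-statistic and absolute lift for a binary session metric, e.g.
    'session had at least one rebuffer', control vs. treatment."""
    p_c, p_t = x_c / n_c, x_t / n_t
    p_pool = (x_c + x_t) / (n_c + n_t)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    return (p_t - p_c) / se, p_t - p_c

# Made-up numbers: a new congestion-control variant in the treatment group.
z, lift = two_proportion_ztest(x_c=5_200, n_c=100_000, x_t=4_700, n_t=100_000)
print(f"z = {z:.2f}, lift = {lift:+.4f}")  # |z| well above 2 suggests a real effect
```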
Facebook and other user-generated content (UGC) platforms encode videos at "billion scale" and deliver them worldwide to a variety of devices (mobile, laptop, TV) across different networks. The popularity of UGC videos varies widely, from millions of views for a viral video to relatively few views for a privately shared video. Improving video compression is critical both to reducing data-usage costs for end users and infrastructure providers and to providing a good quality of experience on poor-bandwidth networks. At the same time, at billion scale we need to take into account the computational complexity of the encoder as well as device support for the codec. In this presentation, we show how ASIC-leveraged hybrid RDX plays a central role in maximizing compression efficiency for video delivery while meeting the constraints on codec support and available compute. We also present an optimization framework that allows us to quantify the compute impact of changes in device support for advanced codecs and in video watch-time distribution, and we point out ASIC improvements that can further improve overall compute efficiency.
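As a toy illustration of the trade-off such an optimization framework navigates, the sketch below greedily assigns a codec family per video based on predicted watch time, device reach, and an encode-compute budget; the codec names are real, but all numbers and the policy itself are illustrative assumptions, not Facebook's framework:

```python
# Toy planner: pick an encoding family per video to maximize delivery savings
# under an encode-CPU budget. All efficiency/cost/reach numbers are made up.
CODECS = {
    # name: (relative bitrate vs. H.264, relative encode cost, device reach)
    "h264": (1.00, 1.0, 1.00),
    "vp9":  (0.70, 4.0, 0.90),
    "av1":  (0.55, 16.0, 0.50),
}

def plan_encodings(videos, cpu_budget):
    """videos: list of (video_id, predicted_watch_time).
    Spend encode CPU where predicted watch time, and hence delivery savings,
    is highest; fall back to H.264 when the budget is exhausted."""
    plan = {}
    for vid, watch in sorted(videos, key=lambda v: -v[1]):
        best, best_savings = "h264", 0.0
        for name, (bitrate, cost, reach) in CODECS.items():
            savings = watch * reach * (1.0 - bitrate)  # proxy for bytes saved
            if cost <= cpu_budget and savings > best_savings:
                best, best_savings = name, savings
        cpu_budget -= CODECS[best][1]
        plan[vid] = best
    return plan

print(plan_encodings([("viral", 1_000_000), ("private", 10)], cpu_budget=20.0))
```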
Like the rest of the video world, Facebook video has grown significantly year over year. While we celebrate the growth, we are also concerned about the resource consumption needed to support it, which became worse during COVID. Seeing the gap between increased storage demand and supply, video infra worked with Facebook's capacity team to invent new methods to bend the curve. We establish a source + MVE (minimum variable encoding) storage policy for every Facebook video. Then, through a video lifecycle management system, we ensure that a video's storage consumption is proportional to its popularity: for example, we purge unused encodings and reduce the source from two permanent copies to one if the video becomes "cold." With this technology, 70 percent of Facebook video's projected storage growth for 2021 has been successfully suppressed. We are exploring ways to further improve the performance and accuracy of the storage management system with a cost-benefit model.
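A simplified sketch of a popularity-driven lifecycle rule of this kind is shown below; the thresholds, field names, and the exact contents of the MVE set are hypothetical:

```python
from dataclasses import dataclass, field

# The minimum encoding set kept for every video (contents are hypothetical).
MVE = {"h264_360p", "h264_720p"}

@dataclass
class StoredVideo:
    video_id: str
    views_last_30d: int
    source_copies: int                                        # permanent source replicas
    encodings: dict[str, int] = field(default_factory=dict)   # rendition -> bytes

def apply_lifecycle_policy(v: StoredVideo, cold_threshold: int = 10) -> StoredVideo:
    """Keep storage roughly proportional to popularity: a cold video keeps
    only one source copy plus the minimum encoding set."""
    if v.views_last_30d < cold_threshold:
        v.encodings = {k: b for k, b in v.encodings.items() if k in MVE}
        v.source_copies = min(v.source_copies, 1)
    return v
```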
AVQT, short for Advanced Video Quality Tool, is a macOS-based command-line tool that estimates the perceptual quality of compressed videos, which may contain video coding and scaling artifacts. Built on the AVFoundation framework, AVQT supports a wide range of video formats, codecs, resolutions, and frame rates in both the SDR and HDR domains, which results in easy and efficient workflows; for example, there is no requirement to decode to a raw pixel format. AVQT uses Metal to achieve high processing speeds by offloading heavy pixel-level computation to the GPU, typically analyzing videos faster than real-time frame rates. In this talk, we'll cover the key attributes of AVQT that make it useful across applications. We'll also demo how to use the tool and how to interpret the resulting scores correctly.
Video quality of user-generated content (UGC) is extremely difficult to wrangle because of the high diversity of content and quality. UGC brings new challenges to how we traditionally measure and assess video quality, and most videos uploaded to YouTube and other video-sharing platforms are UGC. To facilitate and encourage research in UGC compression and quality assessment, in 2019 we released a large-scale UGC dataset (YT-UGC) containing representative UGC raw videos along with their ground-truth Mean Opinion Score (MOS), Differential MOS (DMOS), and content labels. In parallel, we have been investigating a number of efforts to analyze and optimize UGC video quality. Recently, we built a novel deep-learning-based framework to understand the importance of content, technical quality, and compression level for perceptual quality. In this talk, we will walk through our video analysis framework, present our latest DNN-based video quality metric, YouVQ, and share some results.
In this talk, we will discuss the current state of Two Orioles' Eve video encoder for the VP9 and AV1 video codecs in terms of bitrate/quality and complexity. VP9 provides meaningful quality improvements over H.264 with a mature ecosystem and good target platform support. AV1, on the other hand, provides far superior quality, but with an ecosystem that is still being actively built out. For both codecs, Eve provides a wide range of speed presets trading off bitrate/quality against computational complexity; the fastest of these currently approach x264 preset=medium / x265 preset=superfast in complexity, but with a better bitrate vs. quality trade-off. Such speed presets allow encoding of high-volume UGC without compression being a cost barrier, enabling broad deployment of modern video codecs at scale.
This presentation will highlight the latest improvements to the VOD-targeted high-latency Constant Rate Factor (CRF) and Variable Bit Rate (VBR) modes of the SVT-AV1 encoder. It will first present the latest SVT-AV1 cycles-quality tradeoffs when CRF mode is applied to shot-based, convex-hull-optimized encoding for a wide range of VOD applications. It will then demonstrate the performance of VBR encoding with a bitrate-resolution ladder developed through offline statistical analysis of the parameter values generated via convex-hull CRF-mode encoding of a large dataset. It concludes with a highly efficient approach to reducing the cycles cost of the Dynamic Optimizer (DO) encoding framework: the fastest SVT-AV1 preset generates the convex-hull encoding parameter values, which are then used to produce the final bitstreams with the desired encoder.
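The convex-hull selection at the core of this workflow can be sketched as follows: each shot's candidate encodes are placed in the rate-quality plane, only the points on the upper convex hull are kept, and their parameters can then be re-encoded with the slower target encoder. The sketch below covers only hull selection, with made-up numbers; encoder invocation is omitted:

```python
def upper_convex_hull(points):
    """points: (bitrate_kbps, quality) pairs for one shot's candidate encodes.
    Returns the subset on the upper convex hull of the rate-quality plane,
    assuming quality is non-decreasing in bitrate across candidates."""
    pts = sorted(set(points))                      # by bitrate, then quality
    hull = []
    for x, y in pts:
        # Pop the last point while it lies on or below the chord to the new
        # point (cross product >= 0 means a left or straight turn).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull

# Made-up per-shot encodes; (2000, 84.0) falls below the hull and is dropped.
shot_encodes = [(800, 72.0), (1200, 80.0), (1600, 83.5), (2000, 84.0),
                (2400, 88.0), (3200, 89.0), (4800, 89.5)]
print(upper_convex_hull(shot_encodes))
```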