Video @Scale 2022

Virtual 9:00am - 3:20pm


Video @Scale is a technical conference for engineers who build large-scale video systems. Attendees come together to discuss unprecedented engineering challenges, learn about new technology, and collaborate on the development of new solutions.

This year’s Video @Scale will be hosted virtually. Joining us are speakers from Akamai, Arti, BrightCove, Caffeine, Meta, Runway, and Twitch. The event will take place on November 3rd, 2022, with talks themed around Interactive, Immersive, and Intelligent video at scale.

9:00am - 9:05am

Welcome Remarks

9:05am - 9:25am

Building Real Time AR Experiences at Scale on Constrained Devices

Augmented Reality is going to change the way humans interact. Various companies have started to build the foundational infrastructure and tools to create the AR ecosystem and AR experiences on mobile devices. But to provide similar, or even more computationally intensive, immersive AR experiences on thin clients like AR glasses, one needs to take a step back, understand the strict power and thermal limitations, and design an architecture that offloads compute to a beefier server in a privacy- and context-aware, latency-sensitive, and scalable fashion. Shipping camera frames to a server for computation raises many challenges, from operating real-time transport at scale to leveraging GPUs at scale for ML and render operations, in both calling (like Augmented Calling) and non-calling scenarios. Camera frames (RGB and possibly depth) are the prime driving force for Augmented Reality, and processing this video data at scale is a necessity for scaling AR experiences in the future. This talk will focus on some of the work Meta has done in this domain, and on how the industry as a whole needs to come together to solve these challenges in order to build the future of high-fidelity, low-latency, immersive AR experiences.
9:25am - 9:45am

Bringing Interactivity to Videos

Traditionally, viewers consumed video in a passive, ‘lean back’ environment. Video consumption on social media, however, is an interactive, ‘lean forward’ experience with rich engagement between creators and their audience. Creators are looking for more ways to connect, engage, and interact with their audience through new video experiences. Unfortunately, existing video specifications don’t provide a standardized mechanism to support a diversity of interactive experiences. In this talk, we’ll present a generic end-to-end framework for interactive video experiences. Our solution enables creators and broadcasters to simply add interactive components (e.g. ads, stickers, polls, images, chapter markers) into the video timeline and define how the audience can interact with them. During playback, viewers can interact with the video at the predefined points in the timeline. We will also cover how AR and AI technologies can be applied to interactive components, and discuss the different use cases the framework could power.
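The idea of anchoring interactive components to the video timeline can be sketched in a few lines. This is an illustrative model only, not the framework presented in the talk; the component kinds and field names are hypothetical.

```python
# Illustrative sketch of an interactive-video timeline, not the talk's actual
# framework. Each component occupies a window on the playback timeline; during
# playback, the player asks which components are active at the current position.
from dataclasses import dataclass

@dataclass
class Component:
    kind: str        # hypothetical kinds: "poll", "sticker", "chapter_marker", ...
    start_s: float   # timeline position (seconds) where the component appears
    end_s: float     # timeline position where it disappears

def active_components(timeline, position_s):
    """Return the components a viewer can interact with at a playback position."""
    return [c for c in timeline if c.start_s <= position_s < c.end_s]

timeline = [
    Component("chapter_marker", 0.0, 0.5),
    Component("poll", 10.0, 25.0),
    Component("sticker", 20.0, 30.0),
]

# At 22 s into playback, the poll and the sticker overlap.
assert [c.kind for c in active_components(timeline, 22.0)] == ["poll", "sticker"]
```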
9:45am - 10:05am

Democratizing AR: Building an AR platform for everyone

Arti.AR is building a cloud-based AR platform for live video. This talk outlines the benefits of adopting AR at scale and the challenges we faced during the platform development.
10:05am - 10:25am


Featuring: Pranav Saxena, Yurong Jiang, & Ben Hazan
10:25am - 10:45am

Live Media Over QUIC

Twitch has been working on Warp, a new live streaming protocol utilizing QUIC. This talk outlines the benefits of QUIC and why it will replace TCP. I'll cover some of the emerging approaches for transferring media over QUIC such as Warp, Meta's RUSH, and RTP over QUIC.
10:45am - 11:05am

Lessons Learned: Low Latency Ingest

Over the past six months, Caffeine has reimplemented its ingest gateway, both to address long-standing historical behaviors, and to provide a platform for future service enhancements. This presentation touches on a number of high-level challenges encountered during this development, and dives deep on one of the more baffling roadblocks we uncovered.
11:05am - 11:25am

Delivering Reliable Live Streaming Over Unreliable Backbone Networks

Sometimes we need to deliver a highly reliable live streaming experience over a network that is not designed for that. In our case we implemented a flexible multi-path strategy that allowed us to fix (almost) all problems (buffering and disconnections) caused by unavoidable network events.
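One common building block of such a multi-path strategy is packet duplication with receiver-side de-duplication. The sketch below is a generic illustration under that assumption, not the speakers' implementation: each packet is sent over two independent paths, and the receiver keeps the first copy of each sequence number, so a loss on one path is masked by the copy arriving on the other.

```python
# Generic illustration of multi-path redundancy, not the implementation
# described in the talk: the receiver de-duplicates by sequence number,
# so a packet lost on one path survives via the other path's copy.
def receive(deliveries):
    """deliveries: iterable of (path_id, seq, payload) in arrival order."""
    seen = set()
    out = []
    for _path, seq, payload in deliveries:
        if seq in seen:
            continue            # duplicate copy from the redundant path
        seen.add(seq)
        out.append((seq, payload))
    return sorted(out)          # restore sequence order for the player

# Path A lost seq 2; the copy that arrived via path B fills the gap.
arrivals = [("A", 1, b"x"), ("B", 1, b"x"), ("B", 2, b"y"),
            ("A", 3, b"z"), ("B", 3, b"z")]
assert [s for s, _ in receive(arrivals)] == [1, 2, 3]
```

The trade-off is doubled backbone bandwidth in exchange for near-zero rebuffering when a single path fails; more flexible schemes duplicate only key frames or switch paths on loss signals.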
11:25am - 11:45am


Featuring: Luke Curley, Adam Roach, & Jordi Cenzano
11:45am - 12:35pm


12:35pm - 12:55pm

Gaze-Driven Video Delivery: Science Fiction or Viable Link to the Metaverse?

It is well known that, due to the uneven distribution of cones in the human retina, we have sharp vision only in the central (foveal) region. The angular span of this region is tiny, just about 1 degree^2. In comparison, the angular span of a TV set watched from 4x screen heights is over 250 degrees^2. This observation implies that using eye tracking for video compression offers enormous potential. If the encoder can instantaneously know which spot (a 1 degree^2 patch) is visible, only information in that spot needs to be encoded and transmitted. Up to 2 orders of magnitude savings in bandwidth may be attainable!

This idea has been known at least since Bernd Girod's paper "Eye movements and coding of video sequences," published in 1988. Many additional works have followed, proposing various implementations of gaze-based video coding systems. Whole classes of compression techniques, called foveated video coding or region-of-interest (ROI)-based video coding, have appeared, motivated by this application. However, most early attempts to build complete systems based on this idea were unsuccessful. The key reason was the long network delays observed in the 1990s and 2000s, the years when this idea was studied most extensively. But things have changed since.

In this talk, I will first briefly survey some basic principles (retinal eccentricity, eye movement types, and related statistics) and some key previous studies and results. I will then derive an equation explaining the relationship between network delay and the bandwidth savings achievable by gaze tracking. Finally, I will turn to modern mobile wireless networks, 5G and Wi-Fi 6 / 802.11ax, and discuss the delays currently achievable in direct links to user devices, in device-to-device communication in the same cell (or over the same Wi-Fi access network), and in data transmissions involving 5G core networks.
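The delay-versus-savings trade-off can be illustrated with a back-of-envelope model (this is an assumption of mine for illustration, not the equation derived in the talk): the encoder must cover every point the gaze could reach during one network round trip, so the encoded "spot" grows with delay, and the savings shrink accordingly.

```python
# Back-of-envelope model only, NOT the equation derived in the talk.
# Assumption: the gaze can move at up to eye_speed_deg_s in any direction,
# so the encoder must send a square region whose side grows with delay.
def bandwidth_savings(delay_s, eye_speed_deg_s=30.0,
                      fovea_deg=1.0, display_deg2=250.0):
    # Side (degrees) of the region the gaze can reach before the update arrives.
    side = fovea_deg + 2.0 * eye_speed_deg_s * delay_s
    spot_deg2 = min(side * side, display_deg2)   # can't exceed the full display
    return display_deg2 / spot_deg2

# Savings collapse quickly as delay grows:
print(round(bandwidth_savings(0.010), 1))  # 10 ms round trip -> prints 97.7
print(round(bandwidth_savings(0.100), 1))  # 100 ms round trip -> prints 5.1
```

Under these assumed numbers, a ~10 ms link preserves close to the full two-orders-of-magnitude gain, while a ~100 ms link leaves only a ~5x saving, which is why low-delay 5G and Wi-Fi 6 links matter for this application.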
12:55pm - 1:15pm

Cloud Streaming in Metaverse

Cloud streaming is an important tool for making the Metaverse better. Using cloud streaming, we can quickly extend the reach of the Metaverse to 2D surfaces and to a variety of devices. Cloud streaming can also help 3D environments by enabling massively social, immersive, and rich experiences on lightweight devices that are limited in compute, thermal headroom, and power.
1:15pm - 1:35pm

Building a Professional Video Editor on the Cloud, Powered by Machine Learning

In this talk, I'll discuss some of the challenges we faced building Runway, a professional video editor in the browser, focusing on Green Screen, our interactive video segmentation tool, and the general server-side architecture we've developed for low-latency ML inference on video with computer vision models.
1:35pm - 1:55pm


Featuring: Yuriy Reznik, Naizhi Li, & Anastasis Germanidis
1:55pm - 2:15pm

Approach to HDR and Tonemap on Android

At Meta, we invest deeply in ingesting and playing back media at the best quality for our users. This becomes especially challenging with the advancing camera capture capabilities of new devices and with products such as Reels that let users add special effects on top of these videos. As the HDR color space evolves on Android, with different OEMs supporting different HDR formats, Meta needs to correctly read these formats and apply the appropriate tonemap (conversion to SDR) so that such videos are not broken on upload and playback. Video Client Infra has solved the challenging problem of correctly tonemapping HDR videos of different formats on Android devices, at the frame level. This preserves media quality with minimal latency impact, and keeps these videos compatible with all the awesome effects loved by our creators. We also plan for HDR transcode and ingestion as the HDR format becomes standardized across OEMs.
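To make the tonemap step concrete, here is a generic illustration of one well-known HDR-to-SDR operator (an extended Reinhard curve). This is a textbook sketch under my own assumptions, not Meta's Android pipeline, which must additionally handle the per-OEM format detection described above.

```python
# Generic illustration of HDR-to-SDR tone mapping (extended Reinhard curve),
# NOT Meta's Android pipeline. Luminance at or above `white` maps to 1.0;
# midtones are compressed far less than highlights.
def reinhard_tonemap(l_hdr, white=4.0):
    """Map linear HDR luminance (>= 0) into the SDR range [0, 1]."""
    mapped = l_hdr * (1.0 + l_hdr / (white * white)) / (1.0 + l_hdr)
    return min(1.0, mapped)   # clamp values brighter than the chosen white point

assert reinhard_tonemap(0.0) == 0.0          # black stays black
assert abs(reinhard_tonemap(4.0) - 1.0) < 1e-9   # white point maps to SDR white
assert reinhard_tonemap(8.0) == 1.0          # brighter highlights clamp
```

Applying an operator like this per frame, before effects are composited, is what keeps SDR output and creator effects consistent regardless of the source HDR format.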
2:15pm - 2:35pm

HDR at Instagram: The iOS Story

We have been working on gracefully supporting HDR within the Instagram iOS app since it was made popular by Apple in October 2020. Follow our journey and the challenges we faced from ingestion through playback as we adopt this format within our non-traditional media stack.
2:35pm - 2:55pm

Content Steering with MPEG DASH

Content distributors routinely use multiple concurrent CDNs to distribute their live and VOD content. For performance, contractual, and failover reasons, there are requirements to switch dynamically between these distribution channels at run time. A new specification being developed by the DASH Industry Forum provides a standardized means for a third-party steering service to switch a player between alternate content sources, both at start-up and dynamically while the stream is underway. This talk investigates the mechanics of this steering workflow, including manifest enhancements, player behavior, local steering for 3GPP EMBMS compatibility, steering the manifest itself, and steering ads separately from primary content. We’ll demo a working steering server and discuss compatibility and interop with HLS Content Steering.
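The player side of this workflow reduces to a simple operation: periodically fetch an ordered list of CDN "service locations" from the steering server and re-resolve the base URL against that order. The sketch below is hypothetical; the field names and CDN labels are my own illustrations, not the exact DASH-IF or HLS schema.

```python
# Hypothetical sketch of the player side of content steering. Field names
# ("priority", "ttl_s") and CDN labels are illustrative, not the exact
# DASH-IF or HLS Content Steering schema.
cdn_base_urls = {
    "cdn-a": "https://a.example.com/live/",   # hypothetical service locations
    "cdn-b": "https://b.example.com/live/",
}

def pick_base_url(steering_response, available=cdn_base_urls):
    """Walk the steering server's priority order; take the first usable CDN."""
    for location in steering_response["priority"]:
        if location in available:
            return available[location]
    raise RuntimeError("no usable service location")

# Steering server currently prefers cdn-b; refresh again after ttl_s seconds.
steering_response = {"ttl_s": 300, "priority": ["cdn-b", "cdn-a"]}
assert pick_base_url(steering_response) == "https://b.example.com/live/"
```

A TTL in the response bounds how stale the player's view can get, which is what makes mid-stream failover and load shifting possible without touching the manifest on every switch.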
2:55pm - 3:15pm


Featuring: Bhumi Sabarwal, Chris Ellsworth, & Will Law
3:15pm - 3:20pm

Closing Remarks
