EVENT AGENDA
Event times below are displayed in PT.
Video @Scale is a technical conference for engineers who build large-scale video systems, where engineers come together to discuss unprecedented engineering challenges, learn about new technology, and collaborate on developing new solutions.
This year’s Video @Scale will be hosted virtually. Joining us are speakers from Meta, Twitch, Akamai, Caffeine, Brightcove, and more. The event will take place on November 3rd, 2022, with talks themed around Interactive, Immersive, and Intelligent video at scale.
Presented by: Abhinav Kapoor & Venus Montes

Augmented Reality is going to change the way humans interact. Various companies have started to build the foundational infrastructure and tools for the AR ecosystem and AR experiences on mobile devices. But to provide similar, or even more computationally intensive, immersive AR experiences on thin clients like AR glasses, one needs to take a step back, understand their strict power and thermal limitations, and design an architecture that offloads compute to a more powerful server in a privacy- and context-aware, latency-sensitive, and scalable fashion. Shipping camera frames to a server for computation raises many challenges, ranging from operating real-time transport at scale to leveraging GPUs at scale for ML and rendering operations, in both calling (e.g., Augmented Calling) and non-calling scenarios. Camera frames (RGB and possibly depth) are the prime driving force for Augmented Reality, and processing this video data at scale is a necessity for scaling AR experiences in the future. This talk will focus on some of the work Meta has done in this domain and on how the industry as a whole needs to come together to solve these challenges in order to build the future of high-fidelity, low-latency immersive AR experiences.
 
Traditionally, viewers consumed video in a passive, ‘lean back’ environment. Video consumption on social media, however, is an interactive, ‘lean forward’ experience with rich engagement between creators and their audience. Creators are looking for more ways to connect, engage, and interact with their audience through new video experiences. Unfortunately, existing video specifications don’t provide a standardized mechanism to support diverse interactive experiences.
In this talk, we’ll present a generic end-to-end framework for interactive video experiences. Our solution lets creators and broadcasters add interactive components (e.g., ads, stickers, polls, images, and chapter markers) to the video timeline and define how the audience can interact with them. During playback, viewers can interact with these components at their predefined points on the timeline. We will also cover how AR and AI technologies can be applied to interactive components, and discuss the different use cases the framework could power.
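The abstract does not describe the framework's data model. As a minimal sketch, assuming each component is pinned to a half-open `[start_ms, end_ms)` span of the timeline, a player could look up which components to render at the current playback position as shown below. `InteractiveComponent`, `InteractiveTimeline`, and `active_at` are hypothetical names, not part of the presented framework.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class InteractiveComponent:
    """Hypothetical overlay pinned to a span of the video timeline."""
    kind: str       # e.g. "poll", "sticker", "chapter_marker"
    start_ms: int   # playback position (inclusive) where it appears
    end_ms: int     # playback position (exclusive) where it disappears
    payload: dict = field(default_factory=dict)


@dataclass
class InteractiveTimeline:
    components: list[InteractiveComponent] = field(default_factory=list)

    def add(self, component: InteractiveComponent) -> None:
        if component.end_ms <= component.start_ms:
            raise ValueError("component must have a positive duration")
        self.components.append(component)
        # Keep components ordered by start time so lookups can bisect.
        self.components.sort(key=lambda c: c.start_ms)

    def active_at(self, position_ms: int) -> list[InteractiveComponent]:
        # Only components starting at or before position_ms can be active;
        # of those, keep the ones that have not ended yet.
        upto = bisect_right([c.start_ms for c in self.components], position_ms)
        return [c for c in self.components[:upto] if position_ms < c.end_ms]


timeline = InteractiveTimeline()
timeline.add(InteractiveComponent("chapter_marker", 0, 1_000, {"title": "Intro"}))
timeline.add(InteractiveComponent("poll", 5_000, 15_000, {"question": "Which city next?"}))
print([c.kind for c in timeline.active_at(7_500)])
```

Rebuilding the list of start times on every lookup keeps the sketch short; a real player polling every frame would more likely keep a persistent interval index.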
 
Arti.AR is building a cloud-based AR platform for live video. This talk outlines the benefits of adopting AR at scale and the challenges we faced during the platform development.
 
Featuring Pranav Saxena, Yurong Jiang, & Ben Hazan
Moderated by Venus Montes

Twitch has been working on Warp, a new live streaming protocol built on QUIC. This talk outlines the benefits of QUIC and why it will replace TCP. I'll cover some of the emerging approaches for transferring media over QUIC, such as Warp, Meta's RUSH, and RTP over QUIC.
 
Over the past six months, Caffeine has reimplemented its ingest gateway, both to address long-standing historical behaviors and to provide a platform for future service enhancements. This presentation touches on a number of high-level challenges encountered during this development and dives deep into one of the more baffling roadblocks we uncovered.
 
Sometimes we need to deliver a highly reliable live streaming experience over a network that was not designed for it. In our case, we implemented a flexible multi-path strategy that allowed us to fix (almost) all of the problems (buffering and disconnections) caused by unavoidable network events.

Featuring Luke Curley, Adam Roach, Jordi Cenzano, & Thomas Higdon
Moderated by Abhinav Kapoor

It is well known that, due to the uneven distribution of cones in the human retina, we have sharp vision only in the central (foveal) region. The angular span of this region is tiny, only about 1 square degree. In comparison, the angular span of a TV set watched from a distance of 4 screen heights is over 250 square degrees. This observation implies that using eye tracking for video compression offers enormous potential. If the encoder could instantaneously know which spot (a 1-square-degree patch) the viewer is looking at, only the information in that spot would need to be encoded and transmitted. Bandwidth savings of up to 2 orders of magnitude may be attainable!
This idea has been known since at least Bernd Girod's 1988 paper, "Eye movements and coding of video sequences." Many subsequent works have proposed various implementations of gaze-based video coding systems. Special classes of compression techniques, called foveated video coding and region-of-interest (ROI)-based video coding, have even emerged, motivated by this application. However, most early attempts to build complete systems based on this idea were unsuccessful. The key reason was the long network delays of the 1990s and 2000s, the years when this idea was studied most extensively. But things have changed since then.
In this talk, I will first briefly survey some basic principles (retinal eccentricity, eye movement types, and related statistics) and some key previous studies and results. I will then derive an equation explaining the relationship between network delay and the bandwidth savings achievable with gaze tracking. Next, I will turn to modern mobile wireless networks, 5G and Wi-Fi 6 (802.11ax), and discuss the delays currently achievable in direct links to user devices, in device-to-device communication within the same cell (or over the same Wi-Fi access network), and in data transmissions involving 5G core networks.
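The bandwidth claim above rests on simple area arithmetic: if only the fixated patch must be sent, the best-case saving factor is roughly the screen's angular area divided by the foveal area. The sketch below works that out, assuming a 16:9 screen and approximating angular area as width angle times height angle. This is a rough illustration, not the speaker's derivation, and it ignores eye-movement dynamics and network delay, which is exactly what the talk's delay-versus-savings equation addresses.

```python
import math

FOVEA_DEG2 = 1.0        # approximate foveal span quoted in the abstract
TV_AT_4H_DEG2 = 250.0   # lower bound quoted for a TV viewed from 4 screen heights


def tv_span_deg2(distance_in_heights: float, aspect: float = 16 / 9) -> float:
    """Approximate angular area (square degrees) of a flat screen.

    Distance is measured in screen heights; area is approximated as
    horizontal angle times vertical angle (an assumption, not a true solid angle).
    """
    height_deg = 2 * math.degrees(math.atan(0.5 / distance_in_heights))
    width_deg = 2 * math.degrees(math.atan(0.5 * aspect / distance_in_heights))
    return width_deg * height_deg


ratio = TV_AT_4H_DEG2 / FOVEA_DEG2
print(f"best-case savings ~{ratio:.0f}x, i.e. {math.log10(ratio):.1f} orders of magnitude")
print(f"a 16:9 screen at 4 heights spans ~{tv_span_deg2(4):.0f} square degrees")
```

A 250x ratio is about 2.4 orders of magnitude, and the 16:9 geometry comes out at roughly 357 square degrees, consistent with the abstract's "over 250".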
 
Cloud streaming is an important tool for making the Metaverse better. With cloud streaming, we can quickly extend the Metaverse's reach to 2D surfaces and to a variety of devices. Cloud streaming can also help 3D environments by enabling massively social, immersive, and rich experiences on lightweight devices that are limited in compute, thermal headroom, and power.
 
In this talk, I'll discuss some of the challenges we faced building Runway, a professional video editor in the browser, focusing on Green Screen (our interactive video segmentation tool) and on the general server-side architecture we've developed for low-latency ML inference on video with computer vision models.
 
Featuring Yuriy Reznik, Naizhi Li, & Anastasis Germanidis
Moderated by Venus Montes

At Meta, we invest heavily in ingesting and playing back media at the best possible quality for our users.
This becomes especially challenging with advances in the camera capture capabilities of new devices, and with products such as Reels that let users add special effects on top of these videos.
As HDR color spaces evolve on Android, with different OEMs supporting different HDR formats, we need to correctly read these formats and apply the appropriate tone mapping (conversion to SDR) so that such videos are not broken on upload and playback.
Video Client Infra has solved the challenging problem of correctly tone mapping HDR videos of different formats on Android devices at the frame level. This preserves media quality with minimal latency impact and keeps these videos compatible with all the awesome effects loved by our creators.
We also plan to support HDR transcoding and ingestion as the HDR format becomes standardized across all OEMs.
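The abstract does not say how Video Client Infra tone maps frames. For readers unfamiliar with the step, here is a minimal per-pixel sketch under stated assumptions: the input is PQ-encoded (SMPTE ST 2084, as used by HDR10), only luminance is handled, the simple Reinhard global operator stands in for whatever curve a production pipeline uses, and SDR reference white is assumed to be 203 nits (the ITU-R BT.2408 value). Gamut conversion from BT.2020 to BT.709, HLG input, and dynamic metadata are all out of scope.

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_to_nits(code: float) -> float:
    """Decode a normalized PQ signal value in [0, 1] to luminance in cd/m^2 (nits)."""
    p = code ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def reinhard_to_sdr(nits: float, sdr_white_nits: float = 203.0, gamma: float = 2.4) -> float:
    """Map absolute luminance to a gamma-encoded SDR value in [0, 1).

    Uses the Reinhard global operator x / (1 + x); an illustrative stand-in,
    not the curve used in any particular production pipeline.
    """
    x = nits / sdr_white_nits      # luminance relative to the assumed SDR white
    linear = x / (1.0 + x)         # compress [0, inf) into [0, 1)
    return linear ** (1 / gamma)   # simple power-law encode for display


for code in (0.0, 0.5, 0.75, 1.0):
    nits = pq_to_nits(code)
    print(f"PQ {code:.2f} -> {nits:8.1f} nits -> SDR {reinhard_to_sdr(nits):.3f}")
```

Because the operator is applied independently to each value, it can run per frame with no temporal state; production tone mappers typically add per-scene or per-frame adaptation on top.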
 
We have been working on gracefully supporting HDR in the Instagram iOS app since Apple popularized the format in October 2020. Follow our journey and the challenges we faced, from ingestion through playback, as we adopt this format within our non-traditional media stack.
 
AV1 is the first-generation royalty-free video coding standard developed by the Alliance for Open Media, of which Meta is a founding member. Since its release in 2018, we have worked closely with the open source community to implement and optimize AV1 software decoders and encoders. Early in 2022, we believed AV1 was ready for delivery at scale for key VOD applications such as Facebook (FB) Reels and Instagram (IG) Reels. Since then, we have been delivering AV1-encoded FB/IG Reels videos to selected iPhone and Android devices. After the rollout, we observed notable engagement wins, playback quality improvements, and bitrate reductions with AV1.
In this talk, we will share our journey enabling AV1 end to end, from Meta servers to users' mobile screens around the world. First, we will cover AV1 production, including encoding configuration and ABR algorithms. Since the main delivery challenge lies on the decoder and client side, we will also share learnings from integrating the AV1 software decoder on both iOS and Android devices, along with its current state. Finally, we will present ongoing and future work.
 
Content distributors routinely use multiple concurrent CDNs to distribute their live and VOD content. For performance, contractual, and failover reasons, they need to switch dynamically between these distribution channels at run time. A new specification being developed by the DASH Industry Forum provides a standardized means for a third-party steering service to switch a player between alternate content sources, both at start-up and dynamically while the stream is underway. This talk investigates the mechanics of this steering workflow, including manifest enhancements, player behavior, local steering for 3GPP eMBMS compatibility, steering the manifest itself, and steering ads separately from primary content. We’ll demo a working steering server and discuss compatibility and interoperability with HLS Content Steering.
 
Featuring Bhumi Sabarwal, Chris Ellsworth, Ryan Lei, & Will Law
Moderated by Abhinav Kapoor

Presented by: Abhinav Kapoor & Venus Montes

Abhinav is part of the Video Infra leadership team at Meta, focusing on scaling...

Venus Montes is a Software Engineering Manager at Meta working on Video Infra. Her...

Pranav Saxena is a Staff Software Engineer/Technical Lead at Meta Reality Labs driving key...

Yurong is a software engineer on the Video Infra team at Meta. He’s primarily working on...

Ben Hazan is the VP of R&D at Arti.AR, building a cloud-based platform to...

Luke is a software engineer at Twitch primarily focused on video distribution. Twitch runs...

Jordi Cenzano is an engineer specializing in broadcast and online media. He is currently...

Thomas Higdon is a software engineer at Meta Platforms, Inc. in Cambridge, MA, USA....

Naizhi is a software engineer working in the real-time communication field, currently focusing on cloud...

Anastasis Germanidis is the co-founder/CTO at Runway, which is building next-generation content creation software...

I am an Android Software Engineer at Meta working in Video Client Infra. We work...

Chris is an iOS engineer on the Media Platform team at Instagram. His focus...

Dr. Ryan Lei is currently working as a video codec specialist and technical lead...

Will Law is Chief Architect within the Edge Technology Group at Akamai and a...
