In today’s digital age, videos are a major source of entertainment and information. Like many of us, I enjoy swiping through Instagram Reels and watching funny animal clips. It’s frustrating, however, when a low-resolution video displays on a high-resolution 4K screen—it looks grainy and unclear.
Similarly, many users face data limitations, as data caps are common in developing countries such as India. For instance, a 2GB monthly plan allows only around 40 minutes of 1080p streaming, and that doesn’t account for other data usage. This makes low-quality video streaming even more prevalent, as people try to save data. Today, we’ll discuss how Meta is tackling these challenges using modern video-upscaling technologies on mobile devices.
In this blog post, I’ll cover:
- Current video upsampling technologies: A look into the methods used to enhance video quality.
- Challenges of mobile deployment: Key obstacles in implementing these solutions on mobile, and how Meta addresses them.
- Adaptive bitrate (ABR) integration: How upsampling can work with ABR to help reduce data usage.
- Future directions: Potential improvements for upscaling technologies on edge devices, including mobile and AR/VR.
Video upsampling technologies
Two primary types of video upsampling technologies can enhance video quality:
- Mathematical algorithms: Algorithms such as bicubic and Lanczos are cost effective and fast, offering moderate quality improvements when upscaling videos. They are limited, however, in the levels of enhancement they can achieve.
- Deep-learning models: Deep-learning approaches, such as video super-resolution models, can significantly improve video quality, although they come with higher computational costs and latency. One such model, BasicVSR++, won the Video Super Resolution Challenge at CVPR 2021 and serves as a benchmark for deep-learning-based upscaling quality.
In evaluating these methods, we upscale a 540p video to 1080p and measure the quality using metrics like PSNR, SSIM, and VMAF, which compare the upscaled video with a native 1080p reference. While bicubic and Lanczos perform moderately well, deep-learning models like BasicVSR++ achieve significantly higher VMAF scores, with more than 13% improvement compared to Lanczos. (See Figure 1 below.)
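To make this evaluation setup concrete, here is a minimal sketch, assuming OpenCV and two hypothetical frame captures (frame_540p.png and a native frame_1080p.png reference), that upscales with bicubic and Lanczos and scores each result with PSNR; SSIM and VMAF would be computed analogously with tools such as scikit-image and libvmaf:

```python
# Minimal sketch: upscale a 540p frame to 1080p with bicubic and Lanczos,
# then score each result against a native 1080p reference using PSNR.
# File names are hypothetical placeholders.
import cv2
import numpy as np

def upscale(frame: np.ndarray, interpolation: int) -> np.ndarray:
    """Resize a 960x540 frame to 1920x1080 with the given OpenCV filter."""
    return cv2.resize(frame, (1920, 1080), interpolation=interpolation)

def psnr(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Peak signal-to-noise ratio in dB for 8-bit frames."""
    mse = np.mean((reference.astype(np.float64) - candidate.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

low_res = cv2.imread("frame_540p.png")     # decoded 540p frame
reference = cv2.imread("frame_1080p.png")  # native 1080p reference

for name, interp in [("bicubic", cv2.INTER_CUBIC), ("lanczos", cv2.INTER_LANCZOS4)]:
    print(name, f"{psnr(reference, upscale(low_res, interp)):.2f} dB")
```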
Integration into Meta’s applications
At Meta, enhancing video quality on platforms such as Facebook and Instagram is a priority. Video processing involves several stages (see Figure 2 below): pre-processing and encoding on the server, followed by decoding and rendering on the client. We can apply upsampling either on the server side (pre-processing) or on the client side (post-processing) during playback.
In typical video playback, the ABR algorithm selects the video resolution, which is then scaled to fit the viewport. Although bicubic is the default method in most OS graphics-rendering stacks, we aim to boost quality by integrating advanced upscaling techniques during client-side post-processing. (See Figure 3 below.)
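As a rough illustration of where that hook sits, here is a hypothetical sketch (the names are ours for illustration, not Meta’s production APIs) of a playback render step whose viewport scaler is pluggable, so an advanced upscaler can replace the default bicubic path:

```python
# Hypothetical sketch of client-side post-processing during playback:
# the decoded frame is scaled to the viewport by a pluggable scaler
# instead of the OS-default bicubic. All names are illustrative.
from typing import Callable, Tuple
import cv2
import numpy as np

Scaler = Callable[[np.ndarray, Tuple[int, int]], np.ndarray]

def bicubic_scaler(frame: np.ndarray, size_hw: Tuple[int, int]) -> np.ndarray:
    """Default path: the bicubic scaling most OS rendering stacks apply."""
    h, w = size_hw
    return cv2.resize(frame, (w, h), interpolation=cv2.INTER_CUBIC)

def render_frame(decoded: np.ndarray, viewport_hw: Tuple[int, int],
                 scaler: Scaler = bicubic_scaler) -> np.ndarray:
    """Post-process a decoded frame to viewport size before rendering."""
    if decoded.shape[:2] == viewport_hw:
        return decoded                   # already native resolution; skip scaling
    return scaler(decoded, viewport_hw)  # swap in an NN upscaler here
```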
Challenges of mobile deployment
Deploying deep-learning models for video upsampling on mobile devices is challenging due to:
- Limited device computation and memory: Mobile devices generally have fewer GPU flops and memory resources than workstations or data-center servers, and some devices may lack dedicated GPUs entirely. Resources for model inference often compete with UI tasks, which take priority.
- Latency sensitivity: Video playback requires low latency; at 30 fps, there are only 33 ms between frames. Since decoding alone can take 1 ms to 20 ms per frame, fitting upscaling into this narrow window is challenging. (A back-of-envelope budget check follows this list.)
- Fragmented mobile hardware and software: Device ecosystems vary widely, encompassing frameworks such as Apple’s CoreML and Android’s NNAPI as well as hardware from different vendors. This fragmentation introduces engineering complexity and can lead to inconsistent model performance.
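To see how tight that timing is, here is a back-of-envelope sketch, with illustrative rather than measured numbers, of the per-frame budget check a player might run:

```python
# Back-of-envelope latency budget: at 30 fps each frame has ~33 ms end to end.
# A sketch of how a player might decide whether an NN upscaler fits in the
# remaining budget after decode. Numbers are illustrative, not measured.
FPS = 30
FRAME_BUDGET_MS = 1000.0 / FPS  # ~33.3 ms per frame

def upscaler_fits(decode_ms: float, upscale_ms: float,
                  render_overhead_ms: float = 5.0) -> bool:
    """True if decode + NN upscale + render fits within one frame interval."""
    return decode_ms + upscale_ms + render_overhead_ms <= FRAME_BUDGET_MS

print(upscaler_fits(decode_ms=20.0, upscale_ms=10.0))  # False: 35 ms > 33.3 ms
print(upscaler_fits(decode_ms=5.0, upscale_ms=10.0))   # True: 20 ms fits
```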
Meta’s approach to solving these challenges
- Model optimization: We optimized the video-upscaling model by reducing complexity, pruning, quantizing weights from 32-bit to 8-bit, and leveraging hardware acceleration (e.g., Qualcomm’s DSP). In particular, our model performs “luma-only” upsampling: the network upscales only the luma plane, while the chroma planes are upscaled with traditional algorithms like bicubic or Lanczos, significantly reducing latency. (See the sketch after this list.)
- Adaptive upscaling strategy: Our adaptive framework selects different scalers based on device capabilities and content. More advanced upscaling techniques run on high-performance devices, while low-end devices may default to simpler algorithms to prevent issues like frame drops, battery drain, and overheating. The framework also considers the target resolution: when only minimal upscaling is needed, we opt for lower-cost scaling. (See Figure 4 below.)
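Here is a simplified sketch of the luma-only idea, assuming a run_model callable that wraps the deployed network (e.g., via ExecuTorch) and returns an upscaled 8-bit luma plane. This split works because human vision is more sensitive to luma detail than to chroma detail:

```python
# Sketch of "luma-only" upsampling: the NN upscales only the luma (Y) plane,
# while the chroma planes go through cheap bicubic scaling.
# run_model is an assumed wrapper around the deployed network.
import cv2
import numpy as np

def luma_only_upscale(bgr_frame: np.ndarray,
                      target_wh: tuple[int, int],  # (width, height), OpenCV order
                      run_model) -> np.ndarray:
    """Upscale luma with the NN and chroma with bicubic, then recombine."""
    ycrcb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2YCrCb)
    y, cr, cb = cv2.split(ycrcb)
    y_up = run_model(y)  # assumed: returns a uint8 luma plane at target size
    cr_up = cv2.resize(cr, target_wh, interpolation=cv2.INTER_CUBIC)
    cb_up = cv2.resize(cb, target_wh, interpolation=cv2.INTER_CUBIC)
    merged = cv2.merge([y_up, cr_up, cb_up])
    return cv2.cvtColor(merged, cv2.COLOR_YCrCb2BGR)
```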
Figure 5 below shows the benchmark results we obtained running on Samsung and iPhone devices. We implemented Lanczos in an OpenGL shader, and both BasicVSR++ and our own Meta deep-learning model in ExecuTorch. MetalFX is Apple’s built-in super-resolution solution. Both Meta’s model and Apple’s MetalFX finish processing in under 10 ms per frame, which leaves enough headroom to run upscaling in real time during video playback.
The images below show a visual comparison of the original low-resolution content and the post-upscaled output from our super-resolution video model. The upscaled version provides more details in the dog’s hair and the table textures. The text is also clearer.
Working with ABR (adaptive-bitrate streaming)
Mobile users often encounter data constraints, and costs can be higher on cellular networks than on Wi-Fi. Additionally, different types of content (e.g., gaming versus short-form videos) may have varying quality requirements.
By integrating video upscaling with ABR, we aim to reduce data usage without sacrificing playback quality. In practice, post-upscaling can enhance a lower-bitrate video to match or even exceed the quality of the next resolution level.
In Figure 6 below, the diagram shows video quality at different bitrates. The blue line represents the original video, while the red line indicates the post-upscaled video using our super-resolution video model. The arrows point to a dot on each line with similar quality scores, but the red dot sits at a lower bitrate than the blue one.
Our framework dynamically adjusts ABR quality scores based on device capabilities and health (see Figure 7 below). After a scaler is selected, it modifies the quality score of each encoding, allowing ABR to choose lower-bitrate encodings while maintaining high playback quality.
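The following sketch illustrates that flow with hypothetical names and thresholds (none of these values are Meta’s production numbers): pick a scaler from device signals, boost the effective quality score of encodings the neural scaler will enhance, and prefer the cheapest encoding that still clears the quality target:

```python
# Illustrative sketch of adaptive scaler selection plus ABR quality-score
# adjustment. All rules, names, and numbers are hypothetical.
from dataclasses import dataclass

@dataclass
class Encoding:
    height: int           # encoded resolution, e.g., 540 or 720
    bitrate_kbps: int
    quality_score: float  # VMAF-like score for the encoding as delivered

def select_scaler(has_npu: bool, battery_pct: int, thermally_throttled: bool) -> str:
    """Pick a scaler from device capability and health signals (hypothetical rules)."""
    if has_npu and battery_pct > 20 and not thermally_throttled:
        return "nn_super_resolution"
    return "bicubic"

def adjusted_score(enc: Encoding, scaler: str, viewport_height: int) -> float:
    """Raise the effective score of encodings the NN scaler will upscale."""
    if scaler == "nn_super_resolution" and enc.height < viewport_height:
        return enc.quality_score * 1.10  # hypothetical ~10% uplift from upscaling
    return enc.quality_score

ladder = [Encoding(540, 1200, 78.0), Encoding(720, 2400, 86.0)]
scaler = select_scaler(has_npu=True, battery_pct=80, thermally_throttled=False)

# Choose the cheapest encoding whose adjusted score clears a quality target.
TARGET_SCORE = 85.0
best = min(
    (e for e in ladder if adjusted_score(e, scaler, viewport_height=1080) >= TARGET_SCORE),
    key=lambda e: e.bitrate_kbps,
    default=max(ladder, key=lambda e: e.quality_score),
)
print(best)  # Encoding(height=540, bitrate_kbps=1200, quality_score=78.0)
```

With these illustrative numbers, the 540p encoding’s adjusted score (85.8) clears the target, so playback can drop from 2,400 kbps to 1,200 kbps without a perceived quality loss.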
Future directions
We have made significant progress in upscaling technologies on edge devices, but challenges remain. For instance, upscaling can introduce visual artifacts, so a result may not look perceptually better even when quality metrics improve. Future work includes further optimizing the model for edge devices, improving perceptual-quality detection, and managing artifacts in real time on mobile.
Thank you for your time, and we look forward to enhancing mobile-video quality together!