Videos & Articles

Video @Scale 2021

The Importance of Audio Today

Live Panel – “The Importance of Audio Today”. Recently, audio has been front and center with the emergence of new audio-only products and experiences and many new audio-focused investments to enhance video viewing. ...
Video @Scale 2021

Highly-Efficient SVT-AV1-based Solutions for VOD Applications

This presentation will highlight the latest improvements of the VOD-targeted high-latency Constant Rate Factor (CRF) and Variable Bit Rate (VBR) modes of the SVT-AV1 encoder. It will first present the latest SVT-AV1 cycles-quality ...
Video @Scale 2021

Scaling Your Encoding Backend Using Eve

In this talk, we will discuss the current state in terms of bitrate/quality and complexity of Two Orioles’ Eve video encoder for the VP9 & AV1 video codecs. VP9 provides meaningful quality improvements over H.264 with a mature ...
Video @Scale 2021

Video Quality Assessment of User Generated Contents

The video quality of user-generated content (UGC) is extremely difficult to wrangle because of its high diversity in content and quality. UGC brings new challenges to how we have traditionally measured and assessed video quality. Most ...
Video @Scale 2021

Measuring Video Quality Using AVQT

AVQT, short for Advanced Video Quality Tool, is a macOS-based command-line tool that estimates the perceptual video quality of compressed videos that might contain video coding and scaling artifacts. Utilizing the AVFoundation framework, ...
Video @Scale 2021

Optimizing Storage Efficiency for FB Video Processing

Like the rest of the video world, Facebook Video has grown significantly year over year. While we celebrate the growth rate, we are also concerned about the resource consumption needed to support that growth, which worsened during COVID. ...
Video @Scale 2021

ASIC-RDX and Compute-Compression Efficiency Optimization for UGC Video Processing

Facebook and user-generated content (UGC) platforms encode videos at “billion-scale” and deliver them worldwide to a variety of devices (Mobile/Laptop/TV) across different networks. The popularity of UGC videos can vary widely, ranging ...
Video @Scale 2021

Behind the Curtains: A/B Tests in Video Land

A/B testing on video isn’t just about tweaking recommendations or picking the perfect thumbnail. Every aspect of video benefits from rapid experimentation including the infrastructure – streaming algorithms, codecs, bitrates, caching ...
Video @Scale 2021

Media Provenance as a Prevention for Malicious Synthetic Media

Malicious synthetic media – both deepfakes and cheapfakes – are rising in prevalence and importance. End users are rapidly losing trust in media, and their ability to tell authentic media from inauthentic has greatly diminished. This ...
Video @Scale 2021

Smart Crop and Smart Preview via Video Understanding

Understanding video content has been a focus for video-sharing platforms. It is one of the most important driving forces for the growth in distribution, discovery, user experience and monetization. Instream video understanding is the ...
Video @Scale 2021

Highly Available Live Encoding Using Go and Kubernetes

Two years ago, iStreamPlanet set out to build a cloud-native software transcoder with the reliability and feature set to support some of the highest profile live channels and events in the world. Some of our goals included: 4+ 9’s of ...
Video @Scale 2021

Performance of Low-Latency DASH/CMAF and Low-Latency HLS Systems

Reducing end-to-end streaming latency is critical for HTTP-based live video streaming. There are currently two new technologies in this domain: Low-Latency HTTP Live Streaming (LL-HLS) and Low-Latency Dynamic Adaptive Streaming over ...
Video @Scale 2021

Measuring the Reliability of Live Video Infrastructure

Serving live videos with high reliability is challenging, not only from the perspective of deploying improvements on top of a distributed system but also from the perspective of defining correct measurements to capture reliability gaps ...
Video @Scale 2021


Hear from Paresh Rajwat, the Vice President and Head of Product for Facebook Audio, Video, and Music.
Systems @Scale Fall 2021

Week 3 Live Q&A – Systems @Scale Fall 2021 Edition

Week 3 Live Q&A – Systems @Scale Fall 2021 Edition with speakers discussing Challenges@Scale
Systems @Scale Fall 2021

Scaling Back-End Deployment @ Facebook

We present how Facebook’s unified Continuous Deployment (CD) system, Conveyor, powers safe and flexible service deployment across all services at Facebook. Conveyor enables service owners to build highly customized deployment ...
Systems @Scale Fall 2021

Lessons from Building a Large-scale, Multi-cloud Data Platform at Databricks

The cloud is becoming one of the most attractive ways for enterprises to store, analyze, and get value from their data, but building and operating a data platform in the cloud has a number of new challenges compared to traditional ...
Systems @Scale Fall 2021

Making Distributed Priority Queue Disaster Ready

Facebook Ordered Queue Service (FOQS) is a distributed priority queue service that powers hundreds of services and products across the Facebook stack. Facebook users have come to rely on its services to remain connected to their ...
Systems @Scale Fall 2021

Scaling Apache Kafka in the Cloud

Confluent, Inc. provides cloud-based data streaming platforms built on Apache Kafka. Running an open source product like Kafka on the public cloud offerings of Amazon, Google, and Microsoft presents an interesting array of challenges. This ...
Systems @Scale Fall 2021

Week 2 Live Q&A – Systems @Scale Fall 2021 Edition

Week 2 Live Q&A – Systems @Scale Fall 2021 Edition with speakers discussing Reliability & Testing
Systems @Scale Fall 2021

Power Loss Siren: Making Facebook Resilient to Power Outages

Power outages cause the majority of unplanned server downtime in Facebook data centers. During a power outage, thousands of servers can go offline simultaneously for several hours, which can lead to service degradations. At Facebook, ...
Systems @Scale Fall 2021

Scaling Testing at a Startup: Integration Testing GraphQL Services with Jest

BigSpring is a mobile first platform for lifelong skilling with measurable ROI. We use GraphQL to power our services. We would love to talk about how we use Jest to integration test our resolvers and other business logic built in our ...
Systems @Scale Fall 2021

Automatic Testing for Services at Scale

Developing at speed and scale across Facebook’s many services requires testing frameworks that help developers iterate on features quickly and with minimal friction, while helping to catch bugs early. Learn why we’ve built our own ...
Systems @Scale Fall 2021

Blaming in a Blameless World

Attribution of reliability in a microservice architecture can be solved, and has been solved, in very different ways due to how services are cataloged across the industry. Our hypothesis at Lyft was that service catalogs can become ...
Systems @Scale Fall 2021

Amazon Redshift Reinvented

In 2013, eight years ago, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully managed, petabyte-scale cloud data warehouse solution. Amazon Redshift made it simple and ...
Systems @Scale Fall 2021

Week 1 Live Q&A – Systems @Scale Fall 2021 Edition

Week 1 Live Q&A – Systems @Scale Fall 2021 Edition with speakers discussing Capacity & Efficiency
Systems @Scale Fall 2021

End-2-End Resource Accounting Leveraging Distributed Tracing

Transitive Resource Accounting (TRA) is a system that builds on top of Facebook’s distributed tracing platform, Canopy, with the goal of capturing end-2-end request cost metrics and attributing them back to the originating caller. This ...
Systems @Scale Fall 2021

Log Events @ Twitter: Challenges of Handling Billions of Events per Minute

At Twitter, hundreds of thousands of microservices emit important events triggered by user interactions on the platform. The Data Platform team has the requirement to aggregate these events by service type and generate consolidated ...
Systems @Scale Fall 2021

RAS: A Resource Allowance System for Perpetual Region-wide Resource Allocation

Facebook is undergoing a massive design shift in capacity management and service placement to scale the efficiency of our datacenter resources. At the core of this shift is the Resource Allowance System (RAS) that continuously ...
Systems @Scale Summer 2021

Resource Guarantees in a Multi-Tenant Infrastructure

The Facebook cloud supports a variety of workloads, including those which are CPU intensive, memory bound, I/O bound, latency sensitive, or a combination of these, on hardware that ranges from smaller single-socket servers to load ...
Systems @Scale Summer 2021

Platform Agnostic Observability System for AI Accelerators

Application specific hardware platforms play a crucial role in meeting the growing latency and compute demands of workloads like deep learning, content understanding and video encoding. However, it is challenging to operate these ...
Systems @Scale Summer 2021

Performance Wins with BPF: Getting Started

BPF (eBPF) tracing is the superpower that can analyze everything, helping you find performance wins, troubleshoot software, and more. But with many different front-ends and languages, and years of evolution, finding the right starting ...
Systems @Scale Summer 2021

Week 3 Live Q&A – Systems @Scale Summer 2021 Edition

Systems @Scale, Summer – Week 3 Live Q&A with speakers discussing Scalability
Systems @Scale Summer 2021

What is Behind Alibaba’s Double 11 Shopping Festival – The Architecture Evolution of Flink Stream-Batch Unification

In this talk, we share some of the most exciting achievements of Flink at Alibaba in recent years, including two main topics: one is the architecture evolution of stream-batch unification; the other is the recent efforts to improve ...
Systems @Scale Summer 2021

Consolidating Storage Backend at Scale with Tectonic Filesystem

Tectonic is Facebook’s exabyte-scale, datacenter-wide distributed filesystem. Prior to Tectonic, Facebook’s storage infrastructure consisted of a constellation of smaller, specialized storage systems. Blob storage was spread across ...
Systems @Scale Summer 2021

ARKDB: The Key-Value Engine for Alibaba Cloud Storage Services

Alibaba Cloud offers a comprehensive set of storage services, including Object Storage Service (OSS), File Storage Service (NAS) and NoSQL Tablestore with high durability, high availability, high scalability and strong consistency. All ...
Systems @Scale Summer 2021

Week 2 Live Q&A – Systems @Scale Summer 2021 Edition

Systems @Scale, Summer – Week 2 Live Q&A with speakers discussing Cluster Management, Capacity, and Provisioning @Scale
Systems @Scale Summer 2021

Dynamic Leasing of Spare Capacity to Improve Fleet Utilization with Optimus

Optimus, our spare capacity leasing system, coordinates capacity allocations on millions of machines to improve global capacity utilization and meet fast-growing business needs. Within Facebook’s infrastructure, spares are ...
Systems @Scale Summer 2021

Morcor – Co-location of Mixed Workloads at Uber

Uber’s infrastructure broadly supports three kinds of workloads: stateless microservices, big data (batch), and stateful services, each running on its own hardware silo. Morcor aims to reduce the cost of infrastructure through co-location of stateless ...
Systems @Scale Summer 2021

Managing a Million Kubernetes Clusters

Azure Kubernetes Service (AKS) manages Kubernetes clusters on behalf of customers. AKS stays agnostic to the customer workload and manages the accessibility, performance, and reliability of these clusters without requiring full ...
Systems @Scale Summer 2021

Defining Service Level Objectives for Exposure Notifications Servers

This session will share real-world lessons from reliability engineering work on the Exposure Notifications Server, a project from Google and Apple to help slow the spread of COVID-19. The work from Google SRE ...
Systems @Scale Summer 2021

Better Resilience Through Self Tuning

William previously worked at Netflix, and this presentation will highlight some of the strategies he used there. He has Netflix’s permission to discuss them at this conference. As companies grow and the number of ...
Systems @Scale Summer 2021

Avoiding Overload in Distributed Systems

At AWS, we build systems using a variety of complementary strategies for maintaining predictable, consistent performance in the face of overload. In this talk, we describe techniques such as implementing layers of protection, ...
Systems @Scale Summer 2021

Live Traffic Load Testing: Measuring and Validating Capacity at Facebook

Facebook is made up of hundreds of heterogeneous services in geographically distributed data center regions. To run reliably, every sub-system and service must be provisioned with sufficient capacity. However, understanding ...
Systems @Scale Summer 2021

Week 1 Live Q&A – Systems @Scale Summer 2021 Edition

Systems @Scale, Summer – Week 1 Live Q&A with speakers discussing Reliability.
Systems @Scale Spring 2021

Virtualizing Consensus in Delos for Rapid Upgrades and Happy Engineers

We will be hosting a talk about our work on Virtualizing Consensus In Delos For Rapid Upgrades And Happy Engineers during our virtual Systems @Scale event at 11am PT on Wednesday, March 17th, followed by a live Q&A session. Please ...
Systems @Scale Spring 2021

FlightTracker: Social graph consistency at scale

We will be hosting a talk about our work on FlightTracker: Social graph consistency at scale during our virtual Systems @Scale event at 11am PT on Wednesday, March 17th, followed by a live Q&A session. Please submit any questions ...
Systems @Scale Spring 2021

Systems @Scale Spring 2021: Week 3

Welcome to the third week of Systems@Scale – Spring 2021, Virtual Edition – featuring recorded sessions & Live Q&As with Maxim Fateev, Girish Joshi, and Dan Shiovitz.
Systems @Scale Spring 2021

Workflows@Facebook: Powering developer productivity and automation at Facebook scale

We will be hosting a talk about our work on Workflows@Facebook: Powering Developer Productivity And Automation At Facebook Scale during our virtual Systems @Scale event at 11am PT on Wednesday, March 10th, followed by a live Q&A ...
Systems @Scale Spring 2021

Week 2 Live Q&A – Systems @Scale Spring 2021 Edition

Welcome to the second week of Systems@Scale – Spring 2021, Virtual Edition – featuring recorded sessions & Live Q&As with Chidambaram Muthu, Dan Danaila, and Sazzala Reddy.
Systems @Scale Spring 2021

Optimizing video storage via Semantic Replication

We will host a talk about our work on Optimizing video storage via Semantic Replication during our virtual Systems @Scale event at 11am PT on Wednesday, March 3rd, followed by a live Q&A session. Please submit any questions you may ...
Systems @Scale Spring 2021

Week 1 Live Q&A – Systems @Scale Spring 2021 Edition

Welcome to the first week of Systems@Scale – Spring 2021, Virtual Edition – featuring recorded sessions and Live Q&A sessions with Akshay Nanavati, Niharika Devanathan, and Gerhard Lazu.
Performance @Scale NY

Performance @Scale 2020: Optimizing the Instagram Web Tier

Instagram is one of the largest Python deployments, supporting billions of people who use the service. As the system and its features have grown, so has our compute footprint. This was even more evident this year when global lockdowns ...
Performance @Scale NY

Performance @Scale 2020: Improving UX with resource prioritization

With today’s complex web content, the order in which resources are requested can have a dramatic impact on the resulting user experience. Patrick will explore the impact that ordering can have, discuss practical solutions to ...
Performance @Scale NY

Performance @Scale 2020: Scaling machine learning on graphs

Networks (graphs) of people’s social and content interactions are a rich source of data for machine learning algorithms. Traditional machine learning algorithms do not naturally take graph-structured data as input, so unsupervised ...
Performance @Scale NY

Performance @Scale 2020: Understanding emotion for happy users

In this talk, we will look at how to collect, analyze, and act on a few metrics that tell us more about how our users feel when using the site. We can trend how these metrics change over a user’s browsing experience, and we can improve ...
Performance @Scale NY

Performance @Scale 2020: Facebook’s developer infrastructure

In our first talk of the conference you’ll hear how Facebook is scaling its performance efforts across many apps that are growing and evolving rapidly. What are the newest directions Facebook is exploring to make apps fast and keep ...
Keeping the Lights On @Scale

Keeping the Lights on @Scale 2020: Panel discussion

This panel discussion will explore how our global engineering team immediately came together as one at the start of the pandemic. Through it all, Facebook wasn’t just “keeping the lights on.” Our team was ushering in the future of ...
Keeping the Lights On @Scale

Keeping the Lights on @Scale 2020: Remote Work @Scale

On March 6, Facebook closed its global offices and employees began working from home. It was the start of the largest remote work experiment ever created. This presentation will provide an inside look at how we enabled our engineers to ...
AI @Scale

AI @Scale: Flyte: Making MLOps and DataOps a reality

Flyte is the backbone for large-scale Machine Learning and Data Processing (ETL) pipelines at Lyft. It is used across business-critical applications such as ETA, Pricing, Mapping, and Autonomous. At its core, it is a Kubernetes ...
AI @Scale

AI @Scale 2020: Azure Cognitive Services @Scale

Azure Cognitive Services sits at the core of many essential products and services at Microsoft for internal and external workloads. Anand’s talk describes the hardware and software infrastructure that supports AI services at global ...
AI @Scale

AI @Scale 2020: Mastercook: Large scale concurrent model development in ads ranking

We will discuss a novel model development process and tools we introduced to ads ranking machine learning teams, where a single model can be concurrently developed by dozens of engineers, whose changes to the model are centralized ...
AI @Scale

AI @Scale 2020: Large Scale Machine Learning Using SQL in BigQuery

Google BigQuery is a petabyte-scale serverless cloud data warehouse that enables scalable machine learning using SQL. In this talk, we take a look at how enabling data analysts and other SQL users to perform machine learning tasks can ...
AI @Scale

AI @Scale 2020: Netflix’s Human-Centric Approach to ML Infrastructure

Netflix’s unique culture affords its data scientists extraordinary freedom of choice in ML tools and libraries. At the same time, they are responsible for building, deploying, and operating complex ML workflows ...
AI @Scale

AI @Scale 2020: F3: Next-generation Feature Framework at Facebook

We will discuss the next generation feature framework in development at Facebook. This new framework enables efficient experimentation in building machine learning features to semantically model behaviors and intent of users, and ...
AI @Scale

AI @Scale 2020: High Performance Observability Across the ML Lifecycle

The scale and breadth of ML applications have increased dramatically thanks to scalable model-training and serving technologies. Builders of enterprise ML systems often have to contend with both real-time inference and massive amounts ...
Systems @Scale Remote Edition — Summer 2020

Systems @Scale Summer 2020 Q&A

As part of the Systems @Scale event, engineers participated in a series of live Q&As about the engineering work presented in the technical talks. We’ve collected those questions and the engineers’ responses below. Asynchronous ...
