DECEMBER 13, 2023

AUTOMATED GPU SHARING AT SCALE

Xiao Zhang

Google

TOPIC: Data, Systems and Networking

@SCALE SERIES: Systems @Scale

TYPE: video

YEAR: 2023

TAGS:

General-purpose GPUs, with their powerful numerical computing capacity, are popular platforms for accelerating machine-learning workloads. However, GPU workloads often fail to keep the GPU pipeline fully occupied, resulting in low overall resource utilization. To address this inefficiency, we have designed and implemented GPU sharing to improve overall throughput and utilization at cluster level.

SUBSCRIBE TO @SCALE

TOPICS

Data, Systems and Networking Dev Tools and Ops, Privacy, Sustainability and Performance Fighting Abuse and Security Machine Learning and AI Mobile, Video and Web