Scalable Data Transportation & Ingestion with MemQ

Ambud Sharma

TOPIC: Data, Systems and Networking

@SCALE SERIES: Data @Scale

TYPE: video

YEAR: 2022

Machine learning is at the heart of Pinterest and is powered by large-scale collection of ML training logs. To solve the problem of cost-efficient data ingestion and transportation at Pinterest, we developed MemQ, a PubSub system that leverages pluggable cloud-native storage such as S3 through a decoupled, packet-based storage design. MemQ scales to GB/s of traffic with 90% higher cost efficiency than Apache Kafka, enabling us to ingest all of our ML training data, which powers offline training, near-real-time model quality validation, and ad hoc analysis.