September 02, 2016

No shard left behind: APIs for massive parallel efficiency

Topic: Systems and Networking

Jay Ayres

TripAdvisor

TYPE: Videos

YEAR: 2016

Apache Beam (incubating) is a unified batch and streaming data processing programming model that is efficient and portable. Beam evolved from a decade of system-building at Google, and Beam pipelines run today on both open source (Apache Flink, Apache Spark) and proprietary (Google Cloud Dataflow) runners. This talk will focus on I/O and connectors in Apache Beam, specifically its APIs for efficient, parallel, adaptive I/O. Google will discuss how these APIs enable a Beam data processing pipeline runner to dynamically rebalance work at runtime, to work around stragglers, and to automatically scale up and down cluster size as a job’s workload changes. Together these APIs and techniques enable Apache Beam runners to efficiently use computing resources without compromising on performance or correctness. Practical examples and a demonstration of Beam will be included.
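As a rough illustration of the dynamic rebalancing the abstract describes, the sketch below models one worker reading an offset range while a runner takes the unread tail away and hands it to another worker. It is a minimal sketch under stated assumptions, not Beam's actual API: `SimpleRangeTracker`, `try_claim`, and `split_at_fraction` are hypothetical names chosen for this example, and the fraction is interpreted relative to the tracker's current range.

```python
import threading
from typing import Optional, Tuple


class SimpleRangeTracker:
    """Hypothetical tracker for a half-open offset range [start, stop)."""

    def __init__(self, start: int, stop: int) -> None:
        self._lock = threading.Lock()
        self.start = start
        self.stop = stop
        self._last_claimed: Optional[int] = None

    def try_claim(self, offset: int) -> bool:
        """Called by the reader before processing an offset.

        Returns False once the offset falls outside the (possibly shrunk)
        range, which tells the reader to stop.
        """
        with self._lock:
            if offset >= self.stop:
                return False
            self._last_claimed = offset
            return True

    def split_at_fraction(self, fraction: float) -> Optional[Tuple[int, int]]:
        """Called by the runner to take the tail of the range from this worker.

        Returns the residual range (split, old_stop) for another worker, or
        None if the split point was already claimed or the residual is empty.
        """
        with self._lock:
            split = self.start + int(fraction * (self.stop - self.start))
            if self._last_claimed is None:
                first_unclaimed = self.start
            else:
                first_unclaimed = self._last_claimed + 1
            if split < first_unclaimed or split >= self.stop:
                return None
            residual = (split, self.stop)
            self.stop = split  # the original worker now owns only [start, split)
            return residual


if __name__ == "__main__":
    tracker = SimpleRangeTracker(0, 100)
    for offset in range(30):          # a slow worker has read 30 records so far
        tracker.try_claim(offset)
    print(tracker.split_at_fraction(0.5))  # tail handed to another worker
    print(tracker.try_claim(50))           # offset 50 no longer belongs here
```

The lock matters because the runner's split request and the reader's claim can arrive concurrently; whichever acquires the lock first decides who owns the contested offset, so no record is read twice or dropped.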
