Apache Beam (incubating) is a unified batch and streaming data processing programming model that is efficient and portable. Beam evolved from a decade of system-building at Google, and Beam pipelines run today on both open source (Apache Flink, Apache Spark) and proprietary (Google Cloud Dataflow) runners. This talk will focus on I/O and connectors in Apache Beam, specifically its APIs for efficient, parallel, adaptive I/O. Google will discuss how these APIs enable a Beam data processing pipeline runner to dynamically rebalance work at runtime, to work around stragglers, and to automatically scale up and down cluster size as a job’s workload changes. Together these APIs and techniques enable Apache Beam runners to efficiently use computing resources without compromising on performance or correctness. Practical examples and a demonstration of Beam will be included.
- WATCH NOW
- 2024 EVENTS
- PAST EVENTS
- 2023
- 2022
- February
- RTC @Scale 2022
- March
- Systems @Scale Spring 2022
- April
- Product @Scale Spring 2022
- May
- Data @Scale Spring 2022
- June
- Systems @Scale Summer 2022
- Networking @Scale Summer 2022
- August
- Reliability @Scale Summer 2022
- September
- AI @Scale 2022
- November
- Networking @Scale Fall 2022
- Video @Scale Fall 2022
- December
- Systems @Scale Winter 2022
- 2021
- 2020
- 2019
- 2018
- 2017
- 2016
- 2015
- Blog & Video Archive
- Speaker Submissions