AUGUST 31, 2016

Unifying big data workloads in Apache Spark

Jay Ayres

TripAdvisor

TOPIC: Data, Systems and Networking


TYPE: video

YEAR: 2016

TAGS: data

In contrast to previous big data systems, Apache Spark was designed to offer a unified engine across diverse workloads, such as SQL, streaming, and batch analytics. While this approach may seem counterintuitive, it has some key benefits. Most importantly, applications can combine workloads in ways that are not possible with specialized engines, and users benefit from a uniform management environment. The talk will cover how a unified engine enabled new types of applications based on Spark (such as interactive queries over streams), and how Databricks designed Spark’s APIs to enable efficient composition. It will also sketch the newest unified API in Spark, Structured Streaming, which lets the engine run batch SQL or DataFrame computations incrementally over a stream of data.
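The idea of running a batch computation incrementally over a stream can be illustrated with a small conceptual sketch. The code below is plain Python and does not use Spark or its APIs. The record shape `(key, value)` and the function names `batch_totals` and `incremental_totals` are assumptions made for illustration only. It shows one aggregation written once as a batch function and then reused to fold in micro-batches as they arrive, so that the running result after the last micro-batch matches the result of the batch computation over all the data.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record shape for this sketch: (key, value).
Event = Tuple[str, int]


def batch_totals(events: Iterable[Event]) -> Counter:
    """Batch view: sum values per key over a complete collection."""
    totals: Counter = Counter()
    for key, value in events:
        totals[key] += value
    return totals


def incremental_totals(micro_batches: Iterable[List[Event]]) -> Iterator[Counter]:
    """Streaming view: fold each micro-batch into running state.

    Yields a snapshot of the running totals after every micro-batch.
    """
    state: Counter = Counter()
    for batch in micro_batches:
        # Reuse the batch logic on the new data only, then merge the
        # partial sums into the state carried over from earlier batches.
        state.update(batch_totals(batch))
        yield Counter(state)  # copy, so each snapshot stays stable


if __name__ == "__main__":
    stream = [
        [("sql", 2), ("ml", 1)],
        [("sql", 3)],
        [("ml", 4), ("graph", 5)],
    ]
    snapshots = list(incremental_totals(stream))
    all_events = [event for batch in stream for event in batch]
    # The final incremental snapshot equals the one-shot batch result.
    assert snapshots[-1] == batch_totals(all_events)
    print(snapshots[-1])
```

This works here because summation can be merged from partial results. A real engine has to handle many cases this sketch ignores, such as late data, fault tolerance, and operators whose results cannot be merged so simply.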
