Unifying big data workloads in Apache Spark
In contrast to previous big data systems, Apache Spark was designed to offer a unified engine across diverse workloads, such as SQL, streaming, and batch analytics. While this approach may seem counterintuitive, it has some key benefits — most important, applications can combine workloads in ways that are not possible with specialized engines, and users benefit from a uniform management environment. The talk will cover how having a unified engine enabled new types of applications based on Spark (such as interactive queries over streams), and how Databricks designed Spark’s APIs to enable efficient composition. It will also sketch the newest unified API in Spark, Structured Streaming, which lets the engine run batch SQL or DataFrame computations incrementally over a stream of data.