@Scale 2018: Resource management at scale for SQL analytics
Apache Impala is a widely used open-source SQL interface built for large-scale data warehouses. Impala has been deployed in production by more than 800 enterprise customers as part of Cloudera Enterprise, managing warehouses up to 40 PB in size. The Hadoop Distributed File System (HDFS), cloud object stores, and scalable columnar storage engines make it cheap and easy to store large volumes of data in one place rather than spread across many silos. This data attracts queries, and soon enough, contention for resources arises among different queries, workloads, and organizations. Without resource management policies and enforcement, critical queries can’t run and users can’t query the data interactively. In this talk, Tim Armstrong, Software Engineer at Cloudera, discusses the challenges of making resource management work at scale for SQL analytics and how his team is tackling them in Apache Impala.