Apache Impala is a widely used open source SQL interface built for large-scale data warehouses. It runs in production at more than 800 enterprise customers as part of Cloudera Enterprise, managing warehouses of up to 40 PB. The Hadoop Distributed File System (HDFS), cloud object stores, and scalable columnar storage engines make it cheap and easy to store large volumes of data in one place rather than spread across many silos. That data attracts queries, and soon enough queries, workloads, and organizations are contending for the same resources. Without resource management policies and enforcement, critical queries can’t run and users can’t query the data interactively. In this talk, Tim Armstrong, Software Engineer at Cloudera, discusses the challenges of making resource management work at scale for SQL analytics and how his team is tackling them in Apache Impala.