Apache Hive Performance Tuning

Optimizing an Apache Hive data warehouse

You can tune your data warehouse infrastructure, components, and client connection parameters to improve the performance and relevance of business intelligence and other applications. Tuning Hive and the background components that support Hive Query Language (HiveQL) processing becomes particularly important as your workload and database volume increase.

Increasingly, enterprises want to run SQL workloads that return results faster than batch processing can provide. These enterprises often want data analytics applications to support interactive queries. Hive low-latency analytical processing (LLAP) can improve the performance of interactive queries. On the Hortonworks Data Platform (HDP), a Hive interactive query is one that meets low-latency, variably gauged benchmarks, to which Hive LLAP responds in 15 seconds or less. LLAP enables application development and IT infrastructure to run queries that return real-time or near-real-time results.

You can further enhance LLAP performance with real-time data by integrating the enterprise data warehouse (EDW) with the Druid business intelligence engine.

When you query large-scale EDW data sets, you have to meet service-level agreement (SLA) benchmarks or other performance expectations. How you tune your query processing environment depends on factors such as system resources, depth of data analysis, and query latency requirements. You must therefore become familiar with Hive warehouse processing, prepare for tuning, and configure LLAP with parameters that meet your performance needs.