Apache Hive Performance Tuning

Chapter 2. Interactive SQL Query with Apache Hive LLAP (Technical Preview)

Note

This feature is in technical preview and considered under development. Do not use this feature in your production systems. If you have questions regarding this feature, contact Support by logging a case on the Hortonworks Support Portal at https://support.hortonworks.com.

Interactive and Sub-Second SQL Queries

Many SQL workloads require fast response times because a person is waiting in real time for query output: for example, in a Business Intelligence tool or a web dashboard.

Apache Hive enables interactive and sub-second SQL through Low Latency Analytical Processing (LLAP), a new component introduced in Hive 2.0 that makes Hive faster by using persistent query infrastructure and optimized data caching. LLAP is 100% compatible with Hive SQL queries and data formats. Using LLAP gives you the benefit of interactive and sub-second SQL while keeping all your data in Apache Hadoop.

To use LLAP in Hortonworks Data Platform, you must perform the following actions:

  1. Enable LLAP.

  2. Size LLAP appropriately.

  3. Connect your clients to a dedicated HiveServer2 endpoint that is created when you enable LLAP.

Note

LLAP supports only SQL standard authorization. SQL GRANT and REVOKE statements are used for authorization. Storage-based authorization is not supported.
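
As a small illustration of the GRANT and REVOKE statements mentioned in the note above, the following sketch builds them as plain strings that could be pasted into a JDBC client. The helper names, the table name `sales`, and the user `alice` are hypothetical and do not come from this guide; the statements follow the general Hive SQL standard authorization form, so check them against your Hive version before relying on them.

```python
import re

# Hypothetical helpers: build SQL standard authorization statements as strings.
# Only simple identifiers are accepted, to avoid assembling malformed SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str) -> str:
    """Return name unchanged if it is a simple identifier, else raise."""
    if not _IDENT.match(name):
        raise ValueError(f"not a simple identifier: {name!r}")
    return name


def grant_stmt(privilege: str, table: str, user: str) -> str:
    """Build a GRANT statement for one privilege, one table, one user."""
    return (f"GRANT {_check(privilege).upper()} "
            f"ON TABLE {_check(table)} TO USER {_check(user)}")


def revoke_stmt(privilege: str, table: str, user: str) -> str:
    """Build the matching REVOKE statement."""
    return (f"REVOKE {_check(privilege).upper()} "
            f"ON TABLE {_check(table)} FROM USER {_check(user)}")


# "sales" and "alice" are placeholders, not names from this guide.
print(grant_stmt("select", "sales", "alice"))
print(revoke_stmt("select", "sales", "alice"))
```

The strings are only assembled here, not executed; running them still requires a connection to HiveServer2 Interactive with sufficient privileges.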

Note

Wire encryption is not supported with LLAP.

Enabling LLAP for Interactive Queries in Hive

After you install Apache Ambari 2.4, you must configure Apache Hive to run interactive queries:

  1. Select the Hive service in the Ambari dashboard.

  2. Click the Configs tab.

  3. In the Settings tab, locate the Interactive Query section:

    Figure 2.1. Settings tab


  4. Under Enable Interactive Query, move the slider.

    The Select HiveServer2 Interactive host dialog box opens:

    Figure 2.2. HiveServer2 Interactive dialog box


  5. Assign HiveServer2 Interactive to the host that you want to run it on.

    In most cases, you can keep the default setting.

  6. Click Select.

    The Settings tab opens again and displays additional configurations.

    Note

    Most configurations in the Settings panel apply to your LLAP cluster, except for the following: "Run as end user instead of Hive user" in the Security section, and Tez in the Optimization section.

  7. Modify settings as needed:

    • For simple clusters that contain only a default queue, the system tries to manage the queues, and creates an LLAP queue in YARN. In this case, the following two settings are configurable:

      • % of cluster capacity - Percentage of the cluster to be used for the interactive query system

      • Maximum Total Concurrent queries - Total concurrent queries that can execute for this interactive setup, which determines the number of Application Masters to be launched.

    • For more complex clusters, you must specify the queue that LLAP will use from the Interactive Query Queue menu.

      The system tries to occupy the entire queue, and sets up parameters for LLAP accordingly. You can still configure the Maximum Total Concurrent queries.

      Table 2.1. Interactive Query Properties

      Property                      Description
      Number of LLAP Daemons        The total number of LLAP daemons
      YARN Memory per Daemon        The YARN container size for each individual daemon
      In-Memory Cache per Daemon    A subset of the container size; the size of the cache, in megabytes
      Maximum CPUs per Daemon       The number of executors per daemon: for example, the number of fragments that can execute in parallel on a daemon


  8. If you make any changes to the configuration settings, click Save.
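
The relationships among the Table 2.1 properties can be sketched with a little arithmetic. The class below is a hypothetical helper, not part of Ambari or Hive. It checks only that the cache fits inside the YARN container and derives cluster-wide totals. Treating "container size minus cache" as the memory left for everything else is a simplifying assumption, and all numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlapSizing:
    """Hypothetical holder for the four Table 2.1 values."""
    num_daemons: int
    yarn_memory_per_daemon_mb: int
    cache_per_daemon_mb: int
    executors_per_daemon: int

    def validate(self) -> None:
        if self.num_daemons < 1 or self.executors_per_daemon < 1:
            raise ValueError("need at least one daemon and one executor")
        # The cache is described as a subset of the container size.
        if not 0 <= self.cache_per_daemon_mb <= self.yarn_memory_per_daemon_mb:
            raise ValueError("cache cannot exceed the YARN container size")

    @property
    def total_cache_mb(self) -> int:
        return self.num_daemons * self.cache_per_daemon_mb

    @property
    def total_executors(self) -> int:
        # Upper bound on fragments that can execute in parallel.
        return self.num_daemons * self.executors_per_daemon

    @property
    def non_cache_mb_per_daemon(self) -> int:
        # Assumption: whatever is not cache is available for other uses.
        return self.yarn_memory_per_daemon_mb - self.cache_per_daemon_mb


# Made-up example values, for illustration only.
sizing = LlapSizing(num_daemons=8, yarn_memory_per_daemon_mb=32768,
                    cache_per_daemon_mb=8192, executors_per_daemon=12)
sizing.validate()
print(sizing.total_cache_mb, sizing.total_executors,
      sizing.non_cache_mb_per_daemon)
```

A check like this can catch an inconsistent combination, such as a cache larger than its container, before you save the settings.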

Connecting Your Clients to a Dedicated HiveServer2 Endpoint

Hortonworks provides Hive JDBC drivers that enable you to connect to HiveServer2 so that you can query, analyze, and visualize data stored in the Hortonworks Data Platform.

From the Apache Ambari UI, in the Hive>Services dashboard, you can copy the second URL, the HiveServer2 Interactive JDBC URL, to paste into any JDBC client (such as a Business Intelligence tool or Beeline):

Figure 2.3. HiveServer2 Interactive JDBC URL
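
As a rough sketch of using the copied URL from Beeline, the following helper assembles a Beeline command line with the `-u` (URL), `-n` (user name), and `-e` (query) options. The function name, the host `hs2-interactive.example.com`, the port, and the user `alice` are placeholders; substitute the HiveServer2 Interactive JDBC URL that Ambari displays for your cluster.

```python
import shlex
from typing import List


def beeline_command(jdbc_url: str, user: str, query: str) -> List[str]:
    """Return an argv list that runs one query through Beeline."""
    if not jdbc_url.startswith("jdbc:hive2://"):
        raise ValueError("expected a jdbc:hive2:// URL")
    return ["beeline", "-u", jdbc_url, "-n", user, "-e", query]


# Placeholder URL: paste the HiveServer2 Interactive JDBC URL from Ambari.
url = "jdbc:hive2://hs2-interactive.example.com:10500/"
argv = beeline_command(url, "alice", "SELECT 1")

# Print a shell-safe form; pass argv to subprocess.run to execute it.
print(shlex.join(argv))
```

Building the command as a list keeps the URL intact as one argument, which matters because JDBC URLs often contain semicolons that a shell would otherwise interpret.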


Note

Hive CLI is not supported for LLAP.

After you enable interactive queries, links to HiveServer2 Interactive appear in the Hive Summary tab, as shown in the following figure:

Figure 2.4. HiveServer2 Interactive


From the Quick Links menu, shown in the following figure, you can open the HiveServer2 Interactive UI, which enables you to view executing queries, see a recent history of queries, and access the LLAP daemons:

Figure 2.5. Quick Links


Monitoring Interactive Query Performance

Select a link in the HiveServer2 Interactive UI to open a monitoring UI that shows you heap, system, and cache metrics for the selected node:

Figure 2.6. HiveServer2 Interactive UI


Viewing Metrics in Grafana

The Ambari Metrics System includes Grafana, with prebuilt dashboards for advanced visualization of cluster metrics. You can monitor the performance of the system through the Hive LLAP dashboards. The following dashboards are available:

Hive LLAP Heatmap: Shows all the nodes that are running LLAP daemons, with percentage summaries for available executors and cache. This dashboard enables you to identify the hotspots in the cluster in terms of executors and cache.

Hive LLAP Overview: Shows aggregated information across the cluster: for example, the total cache memory from all the nodes. This dashboard enables you to see whether your cluster is configured and running correctly. For example, you might have configured 10 nodes but see executors and cache accounted for on only 8 running nodes.

If you find an issue in this dashboard, you can open the LLAP Daemon dashboard to see which node is having the problem.

Hive LLAP Daemon: Shows metrics on the operating status of a specific Hive LLAP daemon.

Restarting HiveServer2 Interactive

After HiveServer2 Interactive starts, it shows up in the Hive Summary tab:

Figure 2.7. Hive Summary Tab


To restart the entire component, click the HiveServer2 Interactive link and select Started > Restart.

Figure 2.8. HiveServer2 Restart


Restarting LLAP

To restart LLAP (the Slider Application) without having to restart the Hive server, select Service Actions>Restart LLAP in the Apache Ambari interface:

Figure 2.9. LLAP Restart


This step is useful if, for example, you have added new UDFs and do not want to shut down HiveServer2.

LLAP on Your Cluster

After setup, LLAP is transparent to Apache Hive users and Business Intelligence tools. LLAP runs on YARN as a Slider Application. It can be monitored through the Resource Manager UI or by using Apache Slider and YARN command line tools. Running through Slider enables you to easily open your cluster, share resources with other applications, remove your cluster, and flexibly utilize your resources. For example, you could run a large LLAP cluster during the day for BI tools, and then reduce usage during nonbusiness hours to use the cluster resources for ETL processing.

Figure 2.10. LLAP on Your Cluster
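
As one way to use the YARN command line tools mentioned above, the following sketch parses text in the shape printed by `yarn application -list` and picks out applications whose name contains "llap". The column order, the name fragment, and the sample rows are assumptions for illustration; the sample is invented, not captured from a real cluster, so verify the output format on your own system.

```python
from typing import List, NamedTuple


class YarnApp(NamedTuple):
    app_id: str
    name: str
    app_type: str
    state: str


def parse_yarn_app_list(text: str) -> List[YarnApp]:
    """Parse tab-separated rows of `yarn application -list` style output.

    Assumes the column order Application-Id, Application-Name,
    Application-Type, User, Queue, State. Summary and header lines are
    skipped because they do not start with "application_".
    """
    apps = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) >= 6 and fields[0].startswith("application_"):
            apps.append(YarnApp(fields[0], fields[1], fields[2], fields[5]))
    return apps


def find_llap_apps(apps: List[YarnApp], fragment: str = "llap") -> List[YarnApp]:
    """Keep apps whose name contains the (assumed) LLAP name fragment."""
    return [a for a in apps if fragment in a.name.lower()]


# Invented sample in the assumed layout; not real cluster output.
SAMPLE = (
    "Application-Id\tApplication-Name\tApplication-Type\tUser\tQueue\tState\n"
    "application_1_0001\tllap0\torg-apache-slider\thive\tllap\tRUNNING\n"
    "application_1_0002\tetl-job\tTEZ\tetl\tdefault\tRUNNING\n"
)

for app in find_llap_apps(parse_yarn_app_list(SAMPLE)):
    print(app.app_id, app.name, app.state)
```

On a live cluster, the text would come from running the command yourself, for example with `subprocess.run(["yarn", "application", "-list"], capture_output=True, text=True)`.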


On your cluster, an extra HiveServer2 instance is installed that is dedicated to interactive queries and LLAP. You can see this HiveServer2 instance listed in the Hive Summary page:

Figure 2.11. Hive Summary


In the Resource Manager UI, you can see the LLAP daemons themselves through the Apache Slider YARN application that is running apps on the queue:

Figure 2.12. Resource Manager UI


The number of Apache Tez Application Masters equals the selected concurrency. If you selected a total concurrency of 5, you see 5 Tez Application Masters. The following example shows selecting a concurrency of 2:

Figure 2.13. Concurrency Setting


The Cluster Capacity slider is also important for understanding how LLAP behaves on your cluster. Selecting, for example, 60% of cluster capacity on a 10-node cluster does not mean that LLAP runs on 6 nodes. HDP runs LLAP on a whole number of full nodes (n), after accounting for the resources required by the Slider Application and the Tez Application Masters.
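
The effect can be illustrated with a rough calculation. The function below is a simplified sketch under stated assumptions, not the algorithm that Ambari or HDP actually uses. It assumes identical nodes, subtracts one Tez Application Master per concurrent query plus one Slider Application Master from the selected capacity, and rounds down to full nodes. All memory sizes are hypothetical.

```python
def estimate_llap_nodes(total_nodes: int, node_memory_mb: int,
                        capacity_pct: int, concurrency: int,
                        tez_am_mb: int, slider_am_mb: int) -> int:
    """Rough estimate of how many full nodes LLAP daemons can occupy."""
    # Memory available to the interactive query queue.
    queue_mb = total_nodes * node_memory_mb * capacity_pct // 100
    # One Tez Application Master per concurrent query, plus the Slider AM.
    overhead_mb = concurrency * tez_am_mb + slider_am_mb
    # Only full nodes count, so round down; never report a negative count.
    return max(0, (queue_mb - overhead_mb) // node_memory_mb)


# Hypothetical cluster: 10 nodes of 64 GB, 60% capacity, concurrency of 2.
nodes = estimate_llap_nodes(total_nodes=10, node_memory_mb=65536,
                            capacity_pct=60, concurrency=2,
                            tez_am_mb=4096, slider_am_mb=1024)
print(nodes)
```

With these made-up numbers the estimate is 5 full nodes rather than 6, because the Application Masters consume part of the 60% before any daemon is placed.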