Command Line Installation

Configure Hive and HiveServer2 for Tez

The hive-site.xml file in the HDP companion files includes the settings for Hive and HiveServer2 for Tez.

If you have already configured the hive-site.xml connection properties for your Hive metastore database, the only remaining task is to adjust the hive.tez.container.size and hive.tez.java.opts values as described in the following section. You can also use the HDP utility script described earlier in this guide to calculate these Tez memory configuration settings.
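
As a rough illustration of what explicit settings might look like, the sketch below sets both properties. The numbers are assumptions chosen for the example, not recommendations from this guide: a 2048 MB container, with the JVM heap set to about 80% of the container size, which is a common rule of thumb rather than a documented requirement. Use the values calculated by the HDP utility script for your own cluster.

```xml
<!-- Illustrative values only; replace with values calculated for your cluster -->
<property>
     <name>hive.tez.container.size</name>
     <!-- Assumed container size in MB -->
     <value>2048</value>
</property>

<property>
     <name>hive.tez.java.opts</name>
     <!-- Assumed heap of roughly 80% of the container size above -->
     <value>-Xmx1638m</value>
</property>
```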

Hive-on-Tez Configuration Parameters

In addition to the configurations generally recommended for Hive and HiveServer2, which are included in the hive-site.xml file in the HDP companion files, a multi-tenant use case requires only the following settings in the hive-site.xml configuration file to configure Hive for use with Tez.

Table 10.1. Hive Configuration Parameters

Configuration Parameter: hive.execution.engine
     Description: This setting determines whether Hive queries are executed using Tez or MapReduce.
     Default Value: If this value is set to "mr", Hive queries are executed using MapReduce. If this value is set to "tez", Hive queries are executed using Tez. All queries executed through HiveServer2 use the specified hive.execution.engine setting.

Configuration Parameter: hive.tez.container.size
     Description: The memory (in MB) to be used for Tez tasks.
     Default Value: -1 (not specified). If this is not specified, the memory settings from the MapReduce configurations (mapreduce.map.memory.mb) are used by default for map tasks.

Configuration Parameter: hive.tez.java.opts
     Description: Java command line options for Tez.
     Default Value: If this is not specified, the MapReduce java opts settings (mapreduce.map.java.opts) are used by default.

Configuration Parameter: hive.server2.tez.default.queues
     Description: A comma-separated list of queues configured for the cluster.
     Default Value: The default value is an empty string, which prevents execution of all queries. To enable query execution with Tez for HiveServer2, this parameter must be configured.

Configuration Parameter: hive.server2.tez.sessions.per.default.queue
     Description: The number of sessions for each queue named in hive.server2.tez.default.queues.
     Default Value: 1. Larger clusters might improve performance of HiveServer2 by increasing this number.

Configuration Parameter: hive.server2.tez.initialize.default.sessions
     Description: Enables a user to use HiveServer2 without enabling Tez for HiveServer2. Users might potentially want to run queries with Tez without a pool of sessions.
     Default Value: false

Configuration Parameter: hive.server2.enable.doAs
     Description: Required when the queue-related configurations above are used.
     Default Value: false


Examples of Hive-Related Configuration Properties:

<property>
     <name>hive.execution.engine</name>
     <value>tez</value>
</property>

<property>
     <name>hive.tez.container.size</name>
     <value>-1</value>
     <description>Memory in mb to be used for Tez tasks. If this is not specified (-1)
     then the memory settings for map tasks are used from mapreduce configuration</description>
</property>

<property>
     <name>hive.tez.java.opts</name>
     <value></value>
     <description>Java opts to be specified for Tez tasks. If this is not specified
     then java opts for map tasks are used from mapreduce configuration</description>
</property>

<property>
     <name>hive.server2.tez.default.queues</name>
     <value>default</value>
</property>

<property>
     <name>hive.server2.tez.sessions.per.default.queue</name>
     <value>1</value>
</property>

<property>
     <name>hive.server2.tez.initialize.default.sessions</name>
     <value>false</value>
</property>

<property>
     <name>hive.server2.enable.doAs</name>
     <value>false</value>
</property> 
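
For a multi-tenant setup, the queue-related properties above might be combined as in the following sketch. The queue names etl and reporting are hypothetical, and the session count of 2 is an arbitrary example; substitute queues that actually exist on your cluster. Setting hive.server2.tez.initialize.default.sessions to true here assumes that you want HiveServer2 to use a pool of Tez sessions for the listed queues.

```xml
<!-- Hypothetical queue names; use queues configured for your cluster -->
<property>
     <name>hive.server2.tez.default.queues</name>
     <value>etl,reporting</value>
</property>

<!-- Example value: two sessions for each queue listed above -->
<property>
     <name>hive.server2.tez.sessions.per.default.queue</name>
     <value>2</value>
</property>

<!-- Assumes a pool of Tez sessions is wanted for the queues above -->
<property>
     <name>hive.server2.tez.initialize.default.sessions</name>
     <value>true</value>
</property>

<!-- Required when the queue-related properties are used -->
<property>
     <name>hive.server2.enable.doAs</name>
     <value>false</value>
</property>
```
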
Note

Users running HiveServer2 in data analytic tools such as Tableau must reconnect to HiveServer2 after switching between the Tez and MapReduce execution engines.

You can retrieve a list of queues by executing the following command: hadoop queue -list.

Using Hive-on-Tez with Capacity Scheduler

You can use the tez.queue.name property to specify which queue is used for Hive-on-Tez jobs. You can also set this property in the Hive shell, or in a Hive script.
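
A minimal sketch of the property in configuration-file form is shown below. The queue name engineering is a hypothetical placeholder; use a queue defined in your Capacity Scheduler configuration.

```xml
<property>
     <name>tez.queue.name</name>
     <!-- Hypothetical queue name; replace with a queue on your cluster -->
     <value>engineering</value>
</property>
```

In the Hive shell or a Hive script, the equivalent setting would be a statement of the form set tez.queue.name=engineering; again with a placeholder queue name.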