Apache Hive Performance Tuning

Chapter 6. Optimizing the Hive Execution Engine

To maximize the data analytics capabilities of applications that query Hive, you might need to tune the Apache Tez execution engine. Tez is an advancement over earlier application frameworks for Hadoop data processing, such as MapReduce. The Tez framework is required for high-performance batch workloads and for all interactive applications.

Explain Plans

When you use Hive for interactive queries, you can generate explain plans. An explain plan shows the execution plan of a query by revealing the series of operations that occur when that query runs. By understanding the plan, you can decide whether to change how your application queries Hive.

For example, an explain plan might help you see why the query optimizer runs a query with a shuffle operation instead of a hash JOIN. With this knowledge, you might want to rewrite queries in the application so that they better align with user goals and the environment.
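As a rough illustration of this kind of check, the following Python sketch counts the join operators that appear in the text of an explain plan. The operator labels (`Map Join Operator` for a hash join, `Merge Join Operator` for a shuffle join) are assumptions about how a textual plan labels these operations, so verify them against the output of your own Hive version. The function name `count_join_operators` and the sample plan fragment are hypothetical and heavily abbreviated.

```python
from collections import Counter

# Assumed operator labels as they might appear in a textual explain plan.
# Verify these against the output of your own Hive version.
JOIN_LABELS = {
    "Map Join Operator": "hash (map-side) join",
    "Merge Join Operator": "shuffle (reduce-side) join",
}


def count_join_operators(plan_text):
    """Count join operators by kind in the text of an explain plan."""
    counts = Counter()
    for line in plan_text.splitlines():
        label = line.strip()
        if label in JOIN_LABELS:
            counts[JOIN_LABELS[label]] += 1
    return dict(counts)


# Hypothetical, heavily abbreviated plan fragment (not real query output).
SAMPLE_PLAN = """
Stage: Stage-1
  Tez
    Vertices:
      Map 1
        Map Operator Tree:
          TableScan
            Map Join Operator
      Reducer 2
        Reduce Operator Tree:
          Merge Join Operator
"""

join_counts = count_join_operators(SAMPLE_PLAN)
print(join_counts)
```

If a query that you expected to run as a hash join shows only shuffle joins in such a count, that is a hint to look more closely at the full plan before rewriting the query.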

Hive in HDP can generate two types of explain plans. A textual plan, such as information printed in a CLI query editor, displays the execution plan in descriptive lines. A graphical plan shows the execution plan as a flow diagram.

Tuning the Execution Engine Manually

If you encounter subpar performance of your Hive queries after debugging them with Tez View, then you might need to adjust Tez Service configuration properties.

Tune Tez Service Configuration Properties

About this Task

Important

Check and adjust the following property settings only if you think these execution engine properties degrade the performance of Hive LLAP queries.

Advanced users: If you want to add or configure a property that is not listed in the table below, open the Custom tez-site section of the Configs tab to enter or edit the custom property.

Steps

  1. In Ambari, open the Services > Tez > Configs tab.

  2. Use the following table as a reference checklist.

    Tip

    Ambari automatically customizes the value for the tez.am.resource.memory.mb property to suit your cluster profile. Generally, you should not change the default value of this property at this stage if you are not changing resources on the cluster.

  3. You can view the properties by either of these methods:

    Type each property name in the Filter field in the top right corner.
    Open the General, Advanced tez-env, and other sections, and scan the list of properties in each section.
  4. Click Save.

  5. If prompted to restart, restart the Tez Service.
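The numeric guidelines in Table 6.1 can also be checked programmatically before you enter values in Ambari. The following Python sketch is one possible way to do that. It covers only the properties for which the table states a minimum or maximum, it treats "4 GB maximum" as 4096 MB (an assumption), and the function name `check_settings` and the proposed values are hypothetical.

```python
# Guidelines copied from Table 6.1 as (kind, limit) pairs.
# Units follow the property names: MB, seconds, or milliseconds.
GUIDELINES = {
    # "4 GB maximum for most sites", assumed here to mean 4096 MB.
    "tez.am.resource.memory.mb": ("max", 4096),
    "tez.session.am.dag.submit.timeout.secs": ("min", 300),
    "tez.am.container.idle.release-timeout-min.millis": ("min", 20000),
    "tez.am.container.idle.release-timeout-max.millis": ("min", 40000),
}


def check_settings(settings):
    """Return a list of warnings for values outside the table's guidelines."""
    warnings = []
    for name, value in settings.items():
        if name not in GUIDELINES:
            continue  # the table gives no numeric limit for this property
        kind, limit = GUIDELINES[name]
        if kind == "min" and value < limit:
            warnings.append(f"{name}={value} is below the suggested minimum {limit}")
        elif kind == "max" and value > limit:
            warnings.append(f"{name}={value} is above the suggested maximum {limit}")
    return warnings


# Hypothetical proposed values, for illustration only.
proposed = {
    "tez.am.resource.memory.mb": 8192,
    "tez.session.am.dag.submit.timeout.secs": 300,
    "tez.am.container.idle.release-timeout-min.millis": 10000,
}
found = check_settings(proposed)
for warning in found:
    print(warning)
```

A warning from this check is only a prompt to reconsider a value. The guidelines are stated for most sites, so a value outside them can still be appropriate for your environment.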

Table 6.1. Settings for Execution Engine Properties

| Property | Setting Guideline If Manual Configuration Is Needed | Default Value in Ambari |
|---|---|---|
| tez.am.resource.memory.mb | 4 GB maximum for most sites | Depends on your environment |
| tez.session.am.dag.submit.timeout.secs | 300 minimum | 300 |
| tez.am.container.idle.release-timeout-min.millis | 20000 minimum | 10000 |
| tez.am.container.idle.release-timeout-max.millis | 40000 minimum | 20000 |
| tez.shuffle-vertex-manager.desired-task-input-size | Increase for large ETL jobs that run too long | No default value set |
| tez.min.partition.factor | Increase for more reducers; decrease for fewer reducers | 0.25 |
| tez.max.partition.factor | Increase for more reducers; decrease for fewer reducers | 2.0 |
| tez.shuffle-vertex-manager.min-task-parallelism | Set a value if reducer counts are too low, even if the tez.shuffle-vertex-manager.min-src-fraction property is already adjusted | No default value set |
| tez.shuffle-vertex-manager.min-src-fraction | Increase to start reducers later; decrease to start reducers sooner | 0.2 |
| tez.shuffle-vertex-manager.max-src-fraction | Increase to start reducers later; decrease to start reducers sooner | 0.4 |
| hive.vectorized.execution.enabled | true | 0.4 |
| hive.mapjoin.hybridgrace.hashtable | true for slower but safer processing; false for faster processing | false |
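To see why raising the two src-fraction properties starts reducers later, it can help to model how they interact. The following Python sketch assumes a simplified linear model: no reducers are scheduled until the completed fraction of source tasks reaches min-src-fraction, all reducers can be scheduled once it reaches max-src-fraction, and the share scales linearly in between. The function name `schedulable_reducer_fraction` is hypothetical, and the real scheduler takes other factors into account.

```python
def schedulable_reducer_fraction(completed_src_fraction,
                                 min_src_fraction=0.2,
                                 max_src_fraction=0.4):
    """Estimate the share of reducers that can be scheduled.

    Simplified linear model (an assumption, not the actual scheduler code).
    The defaults are the Ambari defaults listed in Table 6.1.
    """
    if not 0.0 <= min_src_fraction <= max_src_fraction <= 1.0:
        raise ValueError("expected 0 <= min <= max <= 1")
    if completed_src_fraction < min_src_fraction:
        return 0.0  # too few source tasks have finished
    if completed_src_fraction >= max_src_fraction:
        return 1.0  # every reducer can be scheduled
    span = max_src_fraction - min_src_fraction
    return (completed_src_fraction - min_src_fraction) / span


# With 30% of source tasks complete, compare the defaults with raised values.
with_defaults = schedulable_reducer_fraction(0.3)
with_raised = schedulable_reducer_fraction(0.3, 0.5, 0.8)
print(with_defaults, with_raised)
```

Under this model, the raised values keep every reducer waiting at a point where the default values would already allow about half of them to be scheduled.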