Performance Tuning Guide

Tuning for a Mix of Interactive and Batch Hive Queries

In general, tuning adjustments made for interactive queries do not adversely affect batch queries, so both types of queries can usually run well together on the same cluster. You can use Capacity Scheduler queues to divide cluster resources between batch and interactive queries. For example, you might set up a configuration that allocates 50% of the cluster capacity to a default queue for batch jobs and 25% to each of two queues for interactive Hive queries:

yarn.scheduler.capacity.root.queues=default,hive1,hive2
yarn.scheduler.capacity.root.default.capacity=50
yarn.scheduler.capacity.root.hive1.capacity=25
yarn.scheduler.capacity.root.hive2.capacity=25
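
When capacities are given as percentages, the Capacity Scheduler expects the capacities of the queues directly under a parent to add up to 100. As a rough sanity check before applying a change, the split above can be verified with a short script. This is only a sketch: the helper names `parse_properties` and `child_capacities` are hypothetical, and it assumes the settings are available as plain `key=value` text rather than as `capacity-scheduler.xml`.

```python
def parse_properties(text):
    """Parse 'key=value' lines into a dict, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def child_capacities(props, parent="root"):
    """Return {queue_name: capacity_percent} for the direct children of `parent`."""
    prefix = "yarn.scheduler.capacity." + parent
    queues = [q.strip() for q in props[prefix + ".queues"].split(",")]
    return {q: float(props[prefix + "." + q + ".capacity"]) for q in queues}


# The example configuration from this section, as plain text (an assumption
# of this sketch; a real cluster stores these values in capacity-scheduler.xml).
CONFIG = """
yarn.scheduler.capacity.root.queues=default,hive1,hive2
yarn.scheduler.capacity.root.default.capacity=50
yarn.scheduler.capacity.root.hive1.capacity=25
yarn.scheduler.capacity.root.hive2.capacity=25
"""

capacities = child_capacities(parse_properties(CONFIG))
total = sum(capacities.values())
print(capacities, "total:", total)
if total != 100:
    raise ValueError("queue capacities under root must sum to 100, got %s" % total)
```

The same check can be repeated for any parent queue that has its own child queues by passing a different `parent` path, for example `root.default`.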

The following settings allow the batch queue to expand to 100% of the cluster when the other queues are idle (at night, for example). The maximum-capacity of the default batch queue is set to 100%, and the user-limit-factor is increased from its default of 1 to 2 so that a single user can occupy up to twice the configured capacity of the queue (2 x 50% = 100% of the cluster).

yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.user-limit-factor=2
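
The interaction between the two settings can be illustrated with a little arithmetic. The sketch below is a simplification, not the scheduler's actual algorithm: it treats the ceiling for a single user as the queue's configured capacity multiplied by user-limit-factor, capped at maximum-capacity, and ignores other factors such as minimum-user-limit-percent and competing users. The function name `max_user_share` is hypothetical.

```python
def max_user_share(capacity, user_limit_factor, maximum_capacity):
    """Approximate upper bound (percent of cluster) one user can hold in a queue.

    capacity          -- configured queue capacity, in percent
    user_limit_factor -- multiple of `capacity` a single user may acquire
    maximum_capacity  -- hard ceiling for the whole queue, in percent
    """
    return min(capacity * user_limit_factor, maximum_capacity)


# Settings from this section: one batch user can grow to the whole cluster.
tuned = max_user_share(capacity=50, user_limit_factor=2, maximum_capacity=100)

# With the default user-limit-factor of 1, a single user stays at the queue's
# configured capacity even though the queue itself may expand to 100%.
default_factor = max_user_share(capacity=50, user_limit_factor=1, maximum_capacity=100)

print("user-limit-factor=2:", tuned)
print("user-limit-factor=1:", default_factor)
```

Under these assumptions, raising maximum-capacity alone is not enough for a single batch user to fill an idle cluster; the user-limit-factor must be raised as well.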