1.1. Create and Configure YARN Capacity Scheduler Queues

Capacity Scheduler queues can be used to allocate cluster resources among users and groups. These settings can be accessed from Ambari > YARN > Configs > Scheduler or in capacity-scheduler.xml. YARN must be restarted in order for queues to take effect.

To demonstrate how to set up Capacity Scheduler queues, let’s use the following simple configuration, which separates short-running and long-running queries into two queues.

  • hive1 -- this queue will be used for short-duration queries, and will be assigned 50% of cluster resources.

  • hive2 -- this queue will be used for longer-duration queries, and will be assigned 50% of cluster resources.

The following capacity-scheduler.xml settings would be used to implement this configuration:

yarn.scheduler.capacity.root.queues=hive1,hive2
yarn.scheduler.capacity.root.hive1.capacity=50
yarn.scheduler.capacity.root.hive2.capacity=50
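
In capacity-scheduler.xml itself, each setting is written as a property element with a name and a value. The following is a minimal sketch, assuming Python 3 is available, that renders the three settings above in that form and checks that the capacities of the queues under root add up to 100, as the Capacity Scheduler expects. The helper names to_capacity_scheduler_xml and child_capacity_total are hypothetical and are not part of Hadoop or Ambari.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Settings copied from the example above.
SETTINGS = {
    "yarn.scheduler.capacity.root.queues": "hive1,hive2",
    "yarn.scheduler.capacity.root.hive1.capacity": "50",
    "yarn.scheduler.capacity.root.hive2.capacity": "50",
}


def to_capacity_scheduler_xml(settings):
    """Render key/value settings as <configuration>/<property> XML (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    raw = ET.tostring(root, encoding="unicode")
    # Pretty-print only for readability; the content is unchanged.
    return minidom.parseString(raw).toprettyxml(indent="  ")


def child_capacity_total(settings, parent="root"):
    """Sum the capacity values of the queues listed directly under `parent`."""
    prefix = "yarn.scheduler.capacity." + parent
    children = [c.strip() for c in settings[prefix + ".queues"].split(",")]
    return sum(float(settings[prefix + "." + c + ".capacity"]) for c in children)


xml_text = to_capacity_scheduler_xml(SETTINGS)
print(xml_text)

# Sibling queue capacities are percentages of the parent and should total 100.
if child_capacity_total(SETTINGS) != 100:
    raise ValueError("capacities of queues under root do not add up to 100")
```

Running the same check after adding or resizing a queue is a cheap way to catch a capacity total that is not 100 before the configuration is saved.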

Let’s also set limits on usage for these queues and their users:

yarn.scheduler.capacity.root.hive1.maximum-capacity=50
yarn.scheduler.capacity.root.hive2.maximum-capacity=50
yarn.scheduler.capacity.root.hive1.user-limit-factor=1
yarn.scheduler.capacity.root.hive2.user-limit-factor=1

The value of “50” for maximum-capacity means that each queue is restricted to 50% of cluster resources (a hard limit), which here is the same as its configured capacity. If maximum-capacity were more than 50%, the queue could use more than its capacity when there are other idle resources in the cluster. However, any one user could still only use up to the configured queue capacity, because the default value of “1” for user-limit-factor means that any single user in the queue can occupy at most 1x the queue’s configured capacity. These settings prevent users in any one queue from monopolizing resources across all queues in a cluster.
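
The arithmetic behind these limits can be sketched in a few lines of Python. This is a simplified model, not the scheduler's actual algorithm: it treats capacity and maximum-capacity as percentages of total cluster resources, applies the per-user multiplier (set to 1 above) to the queue's configured capacity, and ignores other settings that also affect per-user shares. The 100 GB cluster size and the function name queue_limits are assumptions made for illustration.

```python
def queue_limits(cluster_gb, capacity_pct, max_capacity_pct, user_limit_factor):
    """Return (guaranteed, queue_ceiling, user_ceiling) in GB for one queue.

    Simplified illustration only; cluster_gb is a hypothetical cluster size.
    """
    # Share the queue is guaranteed when the cluster is busy.
    guaranteed = cluster_gb * capacity_pct / 100
    # Hard limit the queue can grow to when other resources are idle.
    queue_ceiling = cluster_gb * max_capacity_pct / 100
    # A single user gets at most factor x the configured capacity,
    # and never more than the queue's own hard limit.
    user_ceiling = min(guaranteed * user_limit_factor, queue_ceiling)
    return guaranteed, queue_ceiling, user_ceiling


# The configuration from this section: 50% capacity, 50% maximum, factor 1.
print(queue_limits(100, 50, 50, 1))   # (50.0, 50.0, 50.0)

# Raising maximum-capacity lets the queue borrow idle resources,
# but a single user is still held to the configured capacity.
print(queue_limits(100, 50, 80, 1))   # (50.0, 80.0, 50.0)
```

In the second call the queue as a whole may grow to 80 GB, yet one user stays capped at 50 GB until the per-user multiplier is raised as well.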

The preceding example represents a very basic introduction to queues. For more detailed information on allocating cluster resources using Capacity Scheduler queues, see Capacity Scheduler.