Capacity Scheduler queues can be used to allocate cluster resources among users and groups. These settings can be accessed from Ambari > YARN > Configs > Scheduler or in capacity-scheduler.xml. YARN must be restarted in order for queues to take effect.
To demonstrate how to set up Capacity Scheduler queues, let’s use the following simple configuration, which separates short- and long-running queries into two queues:
hive1 -- this queue will be used for short-duration queries, and will be assigned 50% of cluster resources.
hive2 -- this queue will be used for longer-duration queries, and will be assigned 50% of cluster resources.
The following capacity-scheduler.xml settings would be used to implement this configuration:
yarn.scheduler.capacity.root.queues=hive1,hive2
yarn.scheduler.capacity.root.hive1.capacity=50
yarn.scheduler.capacity.root.hive2.capacity=50
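The capacities of the queues listed under a parent are expected to add up to 100. The following is a minimal Python sketch of a sanity check for that, run against the three settings above. The helper names (parse_properties, child_capacity_totals) are hypothetical and written only for this illustration; they are not part of YARN or Ambari.

```python
PREFIX = "yarn.scheduler.capacity."

def parse_properties(text):
    """Parse key=value lines into a dict, skipping blank lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def child_capacity_totals(props):
    """For each parent that lists child queues, sum the children's capacity values."""
    totals = {}
    for key, value in props.items():
        if key.startswith(PREFIX) and key.endswith(".queues"):
            parent = key[len(PREFIX):-len(".queues")]
            children = [c.strip() for c in value.split(",") if c.strip()]
            totals[parent] = sum(
                float(props[f"{PREFIX}{parent}.{child}.capacity"])
                for child in children
            )
    return totals

# The settings from the example above.
settings = """
yarn.scheduler.capacity.root.queues=hive1,hive2
yarn.scheduler.capacity.root.hive1.capacity=50
yarn.scheduler.capacity.root.hive2.capacity=50
"""

totals = child_capacity_totals(parse_properties(settings))
for parent, total in totals.items():
    status = "OK" if abs(total - 100.0) < 1e-6 else "does not sum to 100"
    print(f"{parent}: {total:g}% ({status})")
```

For the hive1/hive2 example this reports that the queues under root total 100%. A check like this could be run before restarting YARN with a new configuration.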
Let’s also set limits on usage for these queues and their users:
yarn.scheduler.capacity.root.hive1.maximum-capacity=50
yarn.scheduler.capacity.root.hive2.maximum-capacity=50
yarn.scheduler.capacity.root.hive1.user-limit-factor=1
yarn.scheduler.capacity.root.hive2.user-limit-factor=1
The value of “50” for maximum-capacity means that each queue is restricted to 50% of cluster resources (hard limit), which in this example equals its configured capacity. If the maximum-capacity were more than 50%, the queue could use more than its configured capacity when there are other idle resources in the cluster. However, any one user can still only use up to the configured queue capacity. The default value of "1" for user-limit-factor means that any single user in the queue can occupy at most 1x the queue’s configured capacity. These settings prevent users in any one queue from monopolizing resources across all queues in a cluster.
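To make the arithmetic concrete, here is a small Python sketch of how these limits combine. It is a simplified model, not the scheduler's actual algorithm: it assumes a hypothetical cluster with 200 GB of memory and ignores other settings that can affect a user's share. The function name effective_limits and its parameter names are made up for this illustration.

```python
def effective_limits(cluster_gb, capacity_pct, max_capacity_pct, user_limit_factor):
    """Return (guaranteed, queue_max, per_user_max) in GB under a simplified model."""
    # Guaranteed share: the queue's configured capacity as a percentage of the cluster.
    guaranteed = cluster_gb * capacity_pct / 100.0
    # Hard ceiling: the most the queue can use, even if the rest of the cluster is idle.
    queue_max = cluster_gb * max_capacity_pct / 100.0
    # A single user gets at most (factor x configured capacity), never above the ceiling.
    per_user_max = min(guaranteed * user_limit_factor, queue_max)
    return guaranteed, queue_max, per_user_max

CLUSTER_GB = 200  # assumed cluster size, for illustration only

# The example above: capacity=50, maximum-capacity=50, per-user limit of 1.
as_configured = effective_limits(CLUSTER_GB, 50, 50, 1)

# Hypothetical variation: raise maximum-capacity to 80, keep the per-user limit at 1.
elastic_queue = effective_limits(CLUSTER_GB, 50, 80, 1)

print(as_configured)  # (100.0, 100.0, 100.0)
print(elastic_queue)  # (100.0, 160.0, 100.0)
```

In the second case the queue as a whole could grow to 160 GB when the other queue is idle, but one user would still be held to 100 GB, which matches the statement that any one user can only use up to the configured queue capacity.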
The preceding example represents a very basic introduction to queues. For more detailed information on allocating cluster resources using Capacity Scheduler queues, see Capacity Scheduler.