Managing Data Operating System

Configure CPU Scheduling and Isolation

You can configure CPU scheduling on an Ambari or non-Ambari cluster so that application containers are allocated to the nodes best able to supply the CPU resources they require.

Enable CPU scheduling and isolation on an Ambari cluster

To enable CPU scheduling on an Ambari cluster, select YARN > CONFIGS on the Ambari dashboard, then click CPU Scheduling and Isolation under CPU. Click Save, then restart all cluster components that require a restart.

Enable CPU scheduling on a non-Ambari cluster

  1. On the ResourceManager and NodeManager hosts, enable CPU scheduling in capacity-scheduler.xml by replacing the DefaultResourceCalculator portion of the <value> string with DominantResourceCalculator:

    Property: yarn.scheduler.capacity.resource-calculator

    Value: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator

    Example:

    <property>
     <name>yarn.scheduler.capacity.resource-calculator</name>
     <!-- <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value> -->
     <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>
  2. Set vcores in yarn-site.xml.

    In the /etc/hadoop/conf/yarn-site.xml file on the ResourceManager and NodeManager hosts, set the number of vcores to match the number of physical CPU cores on the NodeManager host by providing the number of physical cores as the <value>:

    Property: yarn.nodemanager.resource.cpu-vcores

    Value: <number_of_physical_cores>

    Example:

    <property>
     <name>yarn.nodemanager.resource.cpu-vcores</name>
     <value>16</value>
    </property>

Enable cgroups along with CPU scheduling. Cgroups provide the isolation mechanism for CPU processes. With strict cgroups enforcement activated, each CPU process receives only the resources it requests. Without cgroups, the DRF scheduler still attempts to balance the load, but nothing limits a process to the CPU it requested, so behavior may be unpredictable. For more information, see Enabling cgroups.
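
As a rough illustration of what cgroups with strict enforcement can look like in yarn-site.xml, the fragment below is a minimal sketch. The property names come from the Apache Hadoop NodeManager cgroups settings rather than from this page, so treat them as assumptions to verify against your distribution. A working setup also needs further settings (for example, the cgroups hierarchy and mount path) that are described in Enabling cgroups.

```xml
<!-- Sketch only: property names are assumed from the Apache Hadoop
     NodeManager cgroups settings; verify them for your release. -->

<!-- Run containers through the Linux container executor, which cgroups require. -->
<property>
 <name>yarn.nodemanager.container-executor.class</name>
 <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>

<!-- Use the cgroups-based resources handler for container CPU isolation. -->
<property>
 <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
 <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>

<!-- Strict enforcement: a container is limited to the CPU it requested,
     even when spare CPU is available on the node. -->
<property>
 <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
 <value>true</value>
</property>
```

Setting the last property to false would let containers use idle CPU beyond their request. This gives up the predictability that strict enforcement provides in exchange for higher utilization.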