YARN Resource Management

Using CPU Scheduling

MapReduce Jobs Only

If you primarily run MapReduce jobs on your cluster, you are unlikely to see much change in performance when you enable CPU scheduling. The dominant resource for MapReduce is memory, so the DRF (Dominant Resource Fairness) scheduler continues to balance MapReduce jobs in much the same way as the default resource calculator. When only a single resource is contended, DRF reduces to max-min fairness for that resource.
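To illustrate what "dominant resource" means here, the following minimal sketch computes a job's share of each cluster resource and reports the largest one. The function name, the resource keys, and all numbers are hypothetical and chosen only for illustration; this is not YARN's implementation.

```python
def dominant_share(usage, capacity):
    """Return (resource_name, share) for the resource with the largest usage/capacity ratio."""
    shares = {name: usage[name] / capacity[name] for name in capacity}
    name = max(shares, key=shares.get)
    return name, shares[name]


# Hypothetical cluster totals and the aggregate usage of a MapReduce workload.
cluster = {"memory_mb": 102400, "vcores": 100}
mapreduce_usage = {"memory_mb": 40960, "vcores": 10}

resource, share = dominant_share(mapreduce_usage, cluster)
print(resource, share)
```

With these assumed numbers the workload uses a larger fraction of cluster memory than of cluster vcores, so memory is its dominant resource. When every job is dominated by the same resource, ordering jobs by dominant share is the same as ordering them by their share of that one resource.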

Mixed Workloads

One example of a mixed workload is a cluster that runs both MapReduce and Storm on YARN. MapReduce is not CPU-constrained (MapReduce containers do not require much CPU). Storm on YARN is CPU-constrained: its containers require more CPU than memory. As you start adding Storm jobs along with MapReduce jobs, the DRF scheduler tries to balance memory and CPU resources, but you may start to see some degradation in performance. If you then add more CPU-intensive Storm jobs, individual jobs start to take longer to run as the cluster CPU resources are consumed.
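The balancing behavior described above can be sketched with a simplified DRF loop that always offers the next container to the job with the smallest dominant share. This is an illustrative model only, not the YARN scheduler: the function name, the per-container demands, and the cluster size are all assumptions, and a job is simply dropped from consideration once its next container no longer fits.

```python
def drf_allocate(capacity, demands):
    """Simplified DRF: grant one container at a time to the job with the lowest dominant share."""
    used = {r: 0 for r in capacity}
    allocated = {job: 0 for job in demands}
    active = set(demands)

    def dominant_share(job):
        # Largest fraction of any single cluster resource held by this job.
        return max(allocated[job] * demands[job][r] / capacity[r] for r in capacity)

    while active:
        # sorted() makes tie-breaking deterministic (alphabetical by job name).
        job = min(sorted(active), key=dominant_share)
        need = demands[job]
        if all(used[r] + need[r] <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need[r]
            allocated[job] += 1
        else:
            # The next container does not fit; stop considering this job.
            active.discard(job)
    return allocated, used


# Hypothetical cluster and per-container demands (memory-heavy vs. CPU-heavy).
capacity = {"vcores": 9, "memory_gb": 18}
demands = {
    "mapreduce": {"vcores": 1, "memory_gb": 4},
    "storm": {"vcores": 3, "memory_gb": 1},
}

allocated, used = drf_allocate(capacity, demands)
print(allocated, used)
```

With these assumed numbers, the memory-heavy job ends up with three containers and the CPU-heavy job with two. All nine vcores are consumed while 4 GB of memory remains unused, which mirrors the point above: once CPU-intensive jobs are added, CPU becomes the limiting resource.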

CGroups can be used along with CPU scheduling to help manage mixed workloads. CGroups provide isolation for CPU-intensive processes such as Storm on YARN, so that you can plan for and constrain the CPU-intensive Storm containers predictably.

You can also use node labels in conjunction with CPU scheduling and CGroups to restrict Storm on YARN jobs to a subset of cluster nodes.