Spark Guide

Configuring Cluster Dynamic Resource Allocation Manually

To configure a cluster to run Spark applications with dynamic resource allocation:

  1. Add the following properties to the spark-defaults.conf file associated with your Spark installation. (For general Spark applications, this file typically resides at $SPARK_HOME/conf/spark-defaults.conf.)

    • Set spark.dynamicAllocation.enabled to true

    • Set spark.shuffle.service.enabled to true

    (Optional) The following properties specify a starting point and range for the number of executors. Note that initialExecutors must be greater than or equal to minExecutors, and less than or equal to maxExecutors.

    • spark.dynamicAllocation.initialExecutors

    • spark.dynamicAllocation.minExecutors

    • spark.dynamicAllocation.maxExecutors

    For a description of each property, see Dynamic Resource Allocation Properties.
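    For example, a spark-defaults.conf fragment that enables dynamic allocation might look like the following. The executor counts shown are illustrative values, not recommendations; tune them for your workloads and cluster size. Note that they satisfy the constraint above (minExecutors <= initialExecutors <= maxExecutors).

        spark.dynamicAllocation.enabled          true
        spark.shuffle.service.enabled            true
        spark.dynamicAllocation.minExecutors     1
        spark.dynamicAllocation.initialExecutors 3
        spark.dynamicAllocation.maxExecutors     10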

  2. Start the shuffle service on each worker node in the cluster. (The shuffle service runs as an auxiliary service of the NodeManager.)

    1. In the yarn-site.xml file on each node, add spark_shuffle to yarn.nodemanager.aux-services, then set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService.
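       For example, on a node whose NodeManager already runs the MapReduce shuffle service, the resulting yarn-site.xml entries might look like this (keep any auxiliary services already present in the comma-separated list):

           <property>
             <name>yarn.nodemanager.aux-services</name>
             <value>mapreduce_shuffle,spark_shuffle</value>
           </property>
           <property>
             <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
             <value>org.apache.spark.network.yarn.YarnShuffleService</value>
           </property>

       The YarnShuffleService class must be on the NodeManager's classpath; depending on your distribution, this may mean copying the Spark YARN shuffle JAR into the NodeManager's library directory.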

    2. Review and, if necessary, edit spark.shuffle.service.* configuration settings. For more information, see the Apache Spark Shuffle Behavior documentation.

    3. Restart all NodeManagers in your cluster.
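
After the NodeManagers restart, one way to sanity-check the setup is to start a Spark shell on YARN without specifying --num-executors and watch executors being requested and released as jobs run. The --conf flags below are redundant if the properties are already set in spark-defaults.conf; they are shown for explicitness:

    spark-shell --master yarn \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.shuffle.service.enabled=true

While a job runs, the number of executors shown in the YARN ResourceManager UI should scale up toward spark.dynamicAllocation.maxExecutors; idle executors are released after spark.dynamicAllocation.executorIdleTimeout (60 seconds by default).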