Managing Data Operating System

YARN Client Mode Configuration

In YARN client mode, the Spark driver runs as a Java process within the submission client’s JVM on a gateway machine. The ApplicationMaster runs on the cluster, separate from the driver.

The Spark driver submits the application while it initializes the SparkContext. The ApplicationMaster acts as a proxy that handles requests such as resource allocation and container status on behalf of the Spark driver. The Spark executors run within Docker containers.

The following image provides an overview of the client mode configuration.

Because the Spark driver runs as a Java process and not within a YARN container, any driver-specific YARN configuration that selects the Docker runtime or a Docker image has no effect.

During application submission, you must specify --deploy-mode=client. In addition, you must specify the executors' container configuration using environment variables as follows:
spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker

spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=<spark executor’s docker-image>

spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=<any volume mounts needed by the spark application>
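
The settings above can be passed to spark-submit as --conf options. The following is a minimal sketch of how such a command line might be assembled, not a definitive procedure. The helper name build_spark_submit_args, the image name, the mount string, and the application file are all hypothetical placeholders. Only the spark-submit options --master, --deploy-mode, and --conf, and the three spark.executorEnv properties listed above, come from Spark and this section.

```python
import shlex


def build_spark_submit_args(docker_image, mounts, app_resource, master="yarn"):
    """Assemble a spark-submit argument list for YARN client mode.

    Only executor settings are emitted: in client mode the driver runs
    outside YARN, so driver-side Docker settings would have no effect.
    """
    executor_env = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": docker_image,
    }
    # Mounts are optional; add the property only when a value is supplied.
    if mounts:
        executor_env["YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS"] = mounts

    args = ["spark-submit", "--master", master, "--deploy-mode", "client"]
    for name, value in executor_env.items():
        args += ["--conf", f"spark.executorEnv.{name}={value}"]
    args.append(app_resource)
    return args


if __name__ == "__main__":
    cmd = build_spark_submit_args(
        docker_image="example/spark-executor:latest",  # hypothetical image name
        mounts="/etc/passwd:/etc/passwd:ro",  # hypothetical mount string
        app_resource="my_app.py",  # hypothetical application file
    )
    # Print a shell-quoted command line that can be reviewed before running.
    print(shlex.join(cmd))
```

Building the arguments as a list, rather than concatenating a string, keeps each property as a single argument even when a value contains characters the shell would otherwise interpret.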