Deployment of Containerized Spark
The Hadoop distributed cache mechanism ensures that the base Spark and Hadoop libraries and their related configuration, which are installed on the gateway hosts, are distributed automatically to all the Spark hosts in the cluster. YARN then automatically mounts these base libraries into the Docker containers in which the Spark executors run.
In addition, any binaries that the user explicitly includes at application submission time (--files, --jars, and other such files) are also made available through the distributed cache.
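As a rough illustration of such a submission, the sketch below assembles a spark-submit command line that requests the Docker runtime on YARN and ships extra files and JARs. The helper name build_spark_submit, the image name, and the file names are hypothetical and chosen only for this example. The YARN_CONTAINER_RUNTIME_* environment variables and the spark.executorEnv / spark.yarn.appMasterEnv prefixes come from upstream Apache Hadoop and Spark; whether a given cluster needs additional settings (for example, explicit mounts) depends on its configuration.

```python
import shlex


def build_spark_submit(app, image, files=(), jars=()):
    """Assemble a spark-submit argv for a Docker-on-YARN run (illustrative only)."""
    # Environment variables that ask YARN to launch containers with the Docker runtime.
    docker_env = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    for key, value in docker_env.items():
        # Apply the same runtime settings to the executors and the application master.
        args += ["--conf", f"spark.executorEnv.{key}={value}"]
        args += ["--conf", f"spark.yarn.appMasterEnv.{key}={value}"]
    # User-supplied files and JARs are passed as comma-separated lists.
    if files:
        args += ["--files", ",".join(files)]
    if jars:
        args += ["--jars", ",".join(jars)]
    args.append(app)
    return args


# Hypothetical image, file, and application names.
cmd = build_spark_submit(
    "app.py",
    image="example/spark-runtime:latest",
    files=["lookup.csv"],
    jars=["udfs.jar"],
)
print(shlex.join(cmd))
```

The base Spark and Hadoop libraries do not appear anywhere in the command. Only the user's own additions are listed, which matches the division described above: the platform distributes and mounts the base libraries, while the user names the extras.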
The following diagram outlines how containerized Spark is deployed on YARN: