Run Docker Containers on YARN

You can configure YARN to run Docker containers.

Docker containerization makes it easier to package and distribute applications, so you can focus on running and fine-tuning them while significantly reducing "time to deployment" and "time to insight." Docker containerization also provides isolation and enables you to run multiple versions of the same application side by side. For example, you can keep a stable production version of an application running while also evaluating test versions.

Background: the YARN ContainerExecutor

Since its inception, YARN has supported the ContainerExecutor abstraction. The ContainerExecutor is responsible for:

Localizing (downloading and setting up) the resources required for running the container on any given node.
Setting up the environment for the container to run (such as creating the directories for the container).
Managing the life cycle of the YARN container (launching, monitoring, and cleaning up the container).
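
The three responsibilities above can be pictured as a small template-method class. This is only a minimal sketch for illustration: ContainerExecutorSketch, RecordingExecutorSketch, and all method names are hypothetical and do not reproduce the real Apache Hadoop ContainerExecutor API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the three responsibilities listed above.
// This is NOT the real Hadoop ContainerExecutor class.
abstract class ContainerExecutorSketch {
    // 1. Localize: download and set up the resources the container needs.
    abstract void localizeResources(String containerId);

    // 2. Set up the environment, for example the container's directories.
    abstract void prepareEnvironment(String containerId);

    // 3. Life cycle: launch and monitor the container, then clean up.
    abstract void launchAndMonitor(String containerId);
    abstract void cleanup(String containerId);

    // Runs the steps in order; cleanup runs even if the launch step fails.
    final void runContainer(String containerId) {
        localizeResources(containerId);
        prepareEnvironment(containerId);
        try {
            launchAndMonitor(containerId);
        } finally {
            cleanup(containerId);
        }
    }
}

// Toy implementation that only records which step ran, in order.
class RecordingExecutorSketch extends ContainerExecutorSketch {
    final List<String> steps = new ArrayList<>();

    @Override void localizeResources(String id) { steps.add("localize " + id); }
    @Override void prepareEnvironment(String id) { steps.add("prepare " + id); }
    @Override void launchAndMonitor(String id) { steps.add("launch " + id); }
    @Override void cleanup(String id) { steps.add("cleanup " + id); }
}
```

Calling runContainer on a RecordingExecutorSketch fills its steps list in the order localize, prepare, launch, cleanup, which mirrors the order of the responsibilities described above.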

In the past, Apache Hadoop shipped with three ContainerExecutors: DefaultContainerExecutor, LinuxContainerExecutor, and WindowsSecureContainerExecutor. Each of these was created to address a specific need. DefaultContainerExecutor is meant for non-secure clusters where all YARN containers are launched as the same user as the NodeManager (providing no security). LinuxContainerExecutor is meant for secure clusters where tasks are launched and run as the user who submitted them. WindowsSecureContainerExecutor provides similar functionality, but on Windows.

The Experimental DockerContainerExecutor

As Docker grew in popularity, DockerContainerExecutor was added to the list of ContainerExecutors. DockerContainerExecutor was the first attempt to add support for Docker in YARN. It would allow users to run tasks as Docker containers. It added support in YARN for Docker commands to allow the NodeManager to launch, monitor and clean up Docker containers as it would for any other YARN container.

DockerContainerExecutor had a couple of limitations, some related to implementation and some architectural. The implementation limits included not allowing users to specify the image they wished to run (all users were required to use the same image).

However, the bigger architectural issue is that YARN allows only one ContainerExecutor per NodeManager. All tasks use the ContainerExecutor specified in the node's configuration. As a result, once a cluster was configured to use DockerContainerExecutor, users were unable to launch regular MapReduce, Tez, or Spark jobs. Additionally, implementing a new ContainerExecutor means that all of the benefits of the existing LinuxContainerExecutor (such as cgroups and traffic shaping) must be reimplemented in the new ContainerExecutor. As a result of these challenges, DockerContainerExecutor has been deprecated in favor of a newer abstraction, container runtimes, and will be removed in a future Apache Hadoop release.

Introducing Container Runtimes

To address these deficiencies, YARN added support for container runtimes in LinuxContainerExecutor. Container runtimes split the ContainerExecutor into two distinct pieces: the underlying framework required to carry out the executor's functions, and a runtime piece that can change depending on the type of container you wish to launch. This change solves the architectural problem by allowing regular YARN process containers to run alongside Docker containers. The life cycle of a Docker container is managed by YARN just as for any other container. The change also allows YARN to add support for other containerization technologies in the future.

Currently, two runtimes exist: the process-tree based runtime (DefaultLinuxContainerRuntime) and the new Docker runtime (DockerLinuxContainerRuntime). The process-tree based runtime launches containers the same way YARN always has, whereas the Docker runtime launches Docker containers. Interfaces exist that can be extended to add new container runtimes. Support for container runtimes, and specifically the DockerLinuxContainerRuntime, is being added through YARN-3611.
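
The split between a shared framework and a pluggable runtime can be sketched as follows. This is a simplified illustration, not the actual Hadoop implementation: every class and method name ending in Sketch is hypothetical. The environment variable names YARN_CONTAINER_RUNTIME_TYPE and YARN_CONTAINER_RUNTIME_DOCKER_IMAGE follow the names used in the Apache Hadoop Docker documentation, but the selection logic shown here is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical runtime interface; the real Hadoop interfaces differ.
interface ContainerRuntimeSketch {
    // Returns true if this runtime should launch a container with this environment.
    boolean handles(Map<String, String> env);

    // Returns a description of what was launched (stand-in for a real launch).
    String launch(String containerId, Map<String, String> env);
}

// Stand-in for the process-tree based runtime: the fallback for every container.
class ProcessTreeRuntimeSketch implements ContainerRuntimeSketch {
    @Override
    public boolean handles(Map<String, String> env) {
        return true;
    }

    @Override
    public String launch(String containerId, Map<String, String> env) {
        return "process:" + containerId;
    }
}

// Stand-in for the Docker runtime: chosen only when the container asks for it.
class DockerRuntimeSketch implements ContainerRuntimeSketch {
    @Override
    public boolean handles(Map<String, String> env) {
        return "docker".equals(env.get("YARN_CONTAINER_RUNTIME_TYPE"));
    }

    @Override
    public String launch(String containerId, Map<String, String> env) {
        String image = env.getOrDefault("YARN_CONTAINER_RUNTIME_DOCKER_IMAGE", "unset");
        return "docker[" + image + "]:" + containerId;
    }
}

// Stand-in for the executor framework: shared work happens once here,
// and only the launch step is delegated to the runtime chosen per container.
class DelegatingExecutorSketch {
    // Order matters: the more specific runtime is checked before the fallback.
    private final List<ContainerRuntimeSketch> runtimes =
            Arrays.asList(new DockerRuntimeSketch(), new ProcessTreeRuntimeSketch());

    String launch(String containerId, Map<String, String> env) {
        // Shared framework work (localization, cgroups, and so on) would go here.
        for (ContainerRuntimeSketch runtime : runtimes) {
            if (runtime.handles(env)) {
                return runtime.launch(containerId, env);
            }
        }
        throw new IllegalStateException("no runtime for container " + containerId);
    }
}
```

With this structure, a single DelegatingExecutorSketch can launch a container whose environment sets YARN_CONTAINER_RUNTIME_TYPE to docker through the Docker stand-in, and any other container through the process-tree stand-in, which is the side-by-side behavior that a single DockerContainerExecutor per NodeManager could not provide.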