Managing Data Operating System

Example of Running SparkR with a Docker Image

You can run a SparkR program in Docker containers by using a Docker image that includes the R binary and the R packages that the program requires.

The following example runs SparkR in YARN client mode, with the executors launched in Docker containers. It shows the required Docker configuration for the sparkR command and the Dockerfile used to build the image.

Docker Configuration

/usr/hdp/current/spark2-client/bin/sparkR --master yarn \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=spark-r-demo \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/passwd:/etc/passwd:ro
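Because the mounts value is easy to mistype, a local check before submitting the job can save a failed launch. The helper below is a minimal, hypothetical sketch (validate_docker_mounts is not part of HDP, Hadoop, or Spark). It assumes the value is a comma-separated list of source:destination:mode entries and, for simplicity, accepts only the modes ro and rw. It does not replace the cluster-side list of permitted mounts that an administrator configures (in Hadoop 3.1 and later, docker.allowed.ro-mounts and docker.allowed.rw-mounts in container-executor.cfg).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of HDP or Hadoop): checks a value
# intended for spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS.
# Assumed format: comma-separated source:destination:mode entries.
# This sketch accepts only the modes "ro" and "rw".
validate_docker_mounts() {
  local spec="$1" entry src dst mode extra
  local -a entries

  if [[ -z "$spec" ]]; then
    echo "mount spec is empty" >&2
    return 1
  fi

  # Split on commas into an array (no glob expansion of the entries).
  IFS=',' read -r -a entries <<< "$spec"

  for entry in "${entries[@]}"; do
    # Split each entry on colons; anything after the third field lands in "extra".
    IFS=':' read -r src dst mode extra <<< "$entry"
    if [[ -z "$src" || -z "$dst" || -n "$extra" ]]; then
      echo "malformed mount entry: '$entry'" >&2
      return 1
    fi
    if [[ "$mode" != "ro" && "$mode" != "rw" ]]; then
      echo "unsupported mode in mount entry: '$entry'" >&2
      return 1
    fi
  done
}

# Example: check the mount used in the sparkR command before launching it.
validate_docker_mounts "/etc/passwd:/etc/passwd:ro" && echo "mount spec OK"
```

If the check passes, the same string can be passed unchanged to the spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS setting.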

Dockerfile

FROM centos:7

# Install EPEL (which provides R), OpenJDK 8, and R with its build dependencies
RUN yum install -y epel-release && \
    yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel && \
    yum install -y R R-devel openssl-devel && \
    yum clean all

# Set the default CRAN mirror for R
RUN echo "r <- getOption('repos'); r['CRAN'] <- 'http://cran.us.r-project.org'; options(repos = r);" > ~/.Rprofile

# Install the R packages required by the SparkR program
RUN Rscript -e "install.packages(c('yhatr', 'ggplot2', 'plyr', 'reshape2', 'forecast', 'stringr', 'lubridate', 'randomForest', 'rpart', 'e1071', 'kknn'))"
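The image must be built and tagged with the name that YARN_CONTAINER_RUNTIME_DOCKER_IMAGE refers to (spark-r-demo in this example). A minimal sketch, assuming Docker is installed on the build host and the Dockerfile is in the current directory; the smoke test loads only a few of the installed packages as a spot check:

```shell
# Build the image from ./Dockerfile and tag it with the name used in the
# sparkR --conf settings.
docker build -t spark-r-demo .

# Smoke test: start R inside the image and load a few of the installed packages.
# library() stops with an error if a package is missing.
docker run --rm spark-r-demo \
  Rscript -e "for (p in c('ggplot2', 'forecast', 'randomForest')) library(p, character.only = TRUE)"
```

The Docker daemon on every NodeManager host that may run executors must be able to find this image, for example by pushing it to a registry that the cluster can pull from.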

Example Program