Command Line Installation
Also available as:
PDF
loading table of contents...

Installing Spark 2

When you install Spark 2, the following directories are created:

  • /usr/hdp/current/spark2-client for submitting Spark 2 jobs

  • /usr/hdp/current/spark2-history for launching Spark 2 master processes, such as the Spark 2 History Server

  • /usr/hdp/current/spark2-thriftserver for the Spark 2 Thrift Server

To install Spark 2:

  1. Search for Spark 2 in the HDP repo:

    • For RHEL or CentOS:

      yum search spark2

    • For SLES:

      zypper install spark2

    • For Ubuntu and Debian:

      apt-cache spark2

    This shows all the versions of Spark 2 available. For example:

    spark_<version>_<build>-master.noarch : Server for Spark 2 master
    spark_<version>_<build>-python.noarch : Python client for Spark 2
    spark_<version>_<build>-worker.noarch : Server for Spark 2 worker
    spark_<version>_<build>.noarch : Lightning-Fast Cluster Computing
  2. Install the version corresponding to the HDP version you currently have installed.

    • For RHEL or CentOS:

      yum install spark2_<version>-master spark_<version>-python

    • For SLES:

      zypper install spark2_<version>-master spark_<version>-python

    • For Ubuntu and Debian:

      apt-get install spark2_<version>-master spark_<version>-python

  3. Before you launch the Spark 2 Shell or Thrift Server, make sure that you set $JAVA_HOME:

    export JAVA_HOME=<path to JDK 1.8>

  4. Change owner of /var/log/spark2 to spark2:hadoop.

    chown spark2:hadoop /var/log/spark2