Non-Ambari Cluster Installation Guide

Installing Spark

Installing Spark creates the following directories:

  • /usr/hdp/current/spark-client for submitting Spark jobs

  • /usr/hdp/current/spark-history for launching Spark master processes, such as the Spark History Server

  • /usr/hdp/current/spark-thriftserver for the Spark Thrift Server
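Once the installation described below has finished, a quick check that these directories exist can confirm the packages landed where expected. The following is a minimal sketch, not part of the HDP tooling: the function name check_spark_dirs and its optional root-prefix argument are assumptions introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report which of the three Spark directories exist.
# The optional first argument is a root prefix (an assumption, useful for
# dry runs against a scratch directory); omit it on a real host.
check_spark_dirs() {
  root=${1:-}
  missing=0
  for d in spark-client spark-history spark-thriftserver; do
    # -d follows symbolic links, so a link to a directory counts as present
    if [ -d "$root/usr/hdp/current/$d" ]; then
      echo "ok: /usr/hdp/current/$d"
    else
      echo "missing: /usr/hdp/current/$d"
      missing=1
    fi
  done
  # Return 0 only when all three directories were found
  return "$missing"
}
```

Calling check_spark_dirs with no argument checks the real paths and returns a non-zero status if any directory is absent.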

To install Spark:

  1. Search for Spark in the HDP repo:

    • For RHEL or CentOS:

      yum search spark

    • For SLES:

      zypper search spark

    • For Ubuntu and Debian:

      apt-cache search spark

    The search lists the Spark packages available in the repository. For example:

    spark_2_3_4_0_3485-master.noarch : Server for Spark master
    spark_2_3_4_0_3485-python.noarch : Python client for Spark
    spark_2_3_4_0_3485-worker.noarch : Server for Spark worker
    spark_2_3_4_0_3485.noarch : Lightning-Fast Cluster Computing

  2. Install the Spark version that corresponds to the HDP version currently installed on the cluster.

    • For RHEL or CentOS:

      yum install spark_<version>-master spark_<version>-python

    • For SLES:

      zypper install spark_<version>-master spark_<version>-python

    • For Ubuntu and Debian:

      apt-get install spark_<version>-master spark_<version>-python

  3. Before you launch the Spark Shell or Thrift Server, make sure that you set $JAVA_HOME:

    export JAVA_HOME=<path to JDK 1.8>
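The steps above can be sketched as two small shell helpers. This is an illustrative sketch under stated assumptions, not an official script: the function names spark_version_token and java_home_ok are hypothetical, and the package-name format is taken only from the example search output shown in step 1.

```shell
#!/bin/sh
# Hypothetical helper: extract the <version> token from a package name such as
# "spark_2_3_4_0_3485-master.noarch" (format assumed from the example output).
spark_version_token() {
  name=${1#spark_}      # drop the leading "spark_"
  name=${name%%[-.]*}   # drop everything from the first "-" or "." onward
  printf '%s\n' "$name"
}

# Hypothetical helper: succeed only if JAVA_HOME is set and contains an
# executable bin/java. It does not verify the JDK version.
java_home_ok() {
  [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]
}

# Example: build the package names used in step 2 from one search result line.
version=$(spark_version_token "spark_2_3_4_0_3485-master.noarch")
echo "spark_${version}-master spark_${version}-python"
```

Running java_home_ok before launching the Spark Shell or Thrift Server gives an early failure if the export in step 3 was skipped or points at the wrong location.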