Apache Spark Component Guide

Chapter 2. Installing Spark

Before installing Spark, ensure that your cluster meets the following prerequisites:

  • HDP cluster stack version 2.6.0 or later

  • (Optional) Ambari version 2.5.0 or later

  • HDFS and YARN deployed on the cluster

You can choose to install Spark version 1, Spark version 2, or both. (To specify which version of Spark runs a job, see Specifying Which Version of Spark to Run.)
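
For example, on a cluster with both versions installed, you can point a job at Spark 2 by setting the SPARK_MAJOR_VERSION environment variable before submitting it. The following is a minimal sketch that assumes the typical HDP client paths; see Specifying Which Version of Spark to Run for details:

      # Select Spark 2 for this shell session; the variable is read by the
      # spark-submit, spark-shell, and other HDP client wrapper scripts.
      export SPARK_MAJOR_VERSION=2

      # Submit the bundled SparkPi example. The jar path below is the usual
      # HDP location for Spark 2 and may differ on your cluster.
      spark-submit --class org.apache.spark.examples.SparkPi \
          --master yarn \
          /usr/hdp/current/spark2-client/examples/jars/spark-examples*.jar 10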

Additionally, note the following requirements and recommendations for optional Spark services and features:

  • The Spark Thrift server requires Hive to be deployed on the cluster.

  • SparkR requires R binaries to be installed on all nodes.

  • SparkR is not currently supported on SLES.

  • Spark access through Livy requires the Livy server to be installed on the cluster (a sample REST call appears after this list).

    • For clusters managed by Ambari, see Installing Spark Using Ambari.

    • For clusters not managed by Ambari, see "Installing and Configuring Livy" in the Spark or Spark 2 chapter of the Command Line Installation Guide, depending on the version of Spark installed on your cluster.

  • PySpark and associated libraries require Python version 2.7 or later, or Python version 3.4 or later, to be installed on all nodes (a node-by-node check is sketched after this list).

  • For optimal performance with MLlib, consider installing the netlib-java library (see the notes after this list).
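
For the Livy option above, job submission happens over HTTP. The following is a minimal sketch: livy-server.example.com stands in for your Livy host, 8998 is Livy's default port, and the application jar must sit at a path the cluster can read (typically in HDFS):

      # Submit a batch job through Livy's REST API.
      curl -X POST \
           -H "Content-Type: application/json" \
           -d '{
                 "file": "/user/spark/spark-examples.jar",
                 "className": "org.apache.spark.examples.SparkPi"
               }' \
           http://livy-server.example.com:8998/batches

      # Poll the returned batch id (0 here) for its state and log output.
      curl http://livy-server.example.com:8998/batches/0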
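
To check the Python prerequisite on every node, one option is a simple SSH loop. Here hosts.txt is a placeholder for a file listing your node hostnames, and the stderr redirect is needed because Python 2 prints its version to stderr:

      # Print the Python version reported by each node in hosts.txt.
      # ssh -n keeps ssh from consuming the rest of the host list on stdin.
      while read -r node; do
          echo -n "$node: "
          ssh -n "$node" 'python --version' 2>&1
      done < hosts.txt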
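
Regarding the netlib-java recommendation: the library proxies to system BLAS/LAPACK implementations when they are present, and your Spark application must also declare netlib-java (published as com.github.fommil.netlib:all:1.1.2) as a dependency. The package names below are for RHEL/CentOS and vary by distribution:

      # Install system BLAS/LAPACK libraries for netlib-java's native
      # backends to bind to.
      sudo yum install -y blas lapack

      # Confirm that the shared libraries are visible to the dynamic loader.
      ldconfig -p | grep -E 'libblas|liblapack'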