Spark Guide

Chapter 2. Prerequisites

Before installing Spark, make sure your cluster meets the following prerequisites.

Table 2.1. Prerequisites for Running Spark

Prerequisite                Description
HDP Cluster Stack Version   • 2.4.2 or later
(Optional) Ambari Version   • 2.2.2 or later
Software dependencies       • Spark requires HDFS and YARN.
                            • PySpark requires Python to be installed on all nodes (a quick check is sketched after this table).
                            • (Optional) The Spark Thrift Server requires Hive to be deployed on your cluster.
                            • (Optional) For optimal performance with MLlib, consider installing the netlib-java library.
                            • SparkR (technical preview) requires R binaries to be installed on all nodes.

Note

When you upgrade your cluster to HDP 2.4.2, Spark is automatically upgraded to version 1.6.1.
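
To verify which Spark version is active after the upgrade, you can read the version exposed on the SparkContext. The following minimal sketch can be run in the pyspark shell or submitted with spark-submit; the application name is a placeholder.

from pyspark import SparkContext

sc = SparkContext(appName="version-check")
print(sc.version)  # should print 1.6.1 on HDP 2.4.2
sc.stop()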