Spark Guide

Chapter 4. Developing Spark Applications

Apache Spark is designed for fast application development and fast processing. Spark Core is the underlying execution engine; other services, such as Spark SQL, MLlib, and Spark Streaming, are built on top of Spark Core.

To run Spark applications, use the spark-submit script in the Spark bin directory to launch them on a cluster. Alternatively, to use the API interactively, you can launch an interactive shell for Scala (spark-shell), Python (pyspark), or R (sparkR). Note: Each interactive shell automatically creates a SparkContext in a variable called sc. A short example follows.
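As a minimal sketch of the interactive workflow, the following Scala snippet can be pasted into spark-shell. It relies only on the predefined sc variable; the sample data and variable names are illustrative, not part of any particular application.

// Paste into spark-shell, which predefines the SparkContext as sc.
// Distribute a local collection across the cluster as an RDD.
val nums = sc.parallelize(1 to 1000)

// Transformations such as map are lazy; the reduce action below
// triggers the actual computation on the cluster.
val total = nums.map(n => n * 2).reduce(_ + _)
println(s"Sum of doubled values: $total")

A standalone application would build the same logic around its own SparkContext, package it as a JAR, and launch it with spark-submit as described above.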

For more information about getting started with Spark, see the Apache Spark Quick Start. For more extensive information about application development, see the Apache Spark Programming Guide and Submitting Applications.

The remainder of this chapter contains basic coding examples. Subsequent chapters describe how to access a range of data sources and analytic capabilities.