1. Run the Spark Pi example

The Pi program tests compute-intensive tasks by calculating pi using an approximation method. The program “throws darts” at a circle: it generates random points in the unit square ((0,0) to (1,1)) and counts how many fall within the unit circle. The fraction of points that land inside the circle approximates pi/4, so multiplying that fraction by 4 gives an estimate of pi.
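The dart-throwing method described above can be sketched on a single machine without Spark. The following is a minimal illustration only, not the code shipped in the Spark examples jar; the function name `estimate_pi` and its `num_samples` and `seed` parameters are hypothetical names chosen for this sketch.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Estimate pi by sampling random points in the unit square.

    Hypothetical helper for illustration; not part of Spark.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        # Pick a point uniformly in the square from (0,0) to (1,1).
        x, y = rng.random(), rng.random()
        # Count the point if it falls within the unit circle.
        if x * x + y * y <= 1.0:
            inside += 1
    # The hit fraction approximates pi/4, so scale it by 4.
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print("Pi is roughly", estimate_pi(100_000, seed=42))
```

More samples generally give a closer estimate. The Spark example spreads this sampling work across executors, which is what makes it a convenient test of compute-intensive tasks.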

To run Spark Pi:

  1. Log on as a user with HDFS access--for example, your spark user (if you defined one) or hdfs. Navigate to a node with a Spark client and access the spark-client directory:

    su hdfs

    cd /usr/hdp/current/spark-client

  2. Submit the Spark Pi job:

    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn-cluster \
        --num-executors 3 \
        --driver-memory 512m \
        --executor-memory 512m \
        --executor-cores 1 \
        lib/spark-examples*.jar 10

    The job should complete without errors. It should produce output similar to the following:

    15/04/10 17:29:35 INFO Client:
            client token: N/A
            diagnostics: N/A
            ApplicationMaster host: N/A
            ApplicationMaster RPC port: 0
            queue: default
            start time: 1428686924325
            final status: SUCCEEDED
            tracking URL: http://blue1:8088/proxy/application_1428670545834_0009/
            user: hdfs

    To view job status in a browser, copy the tracking URL from the job output and open it in the browser.

  3. Check the job output, which should list the estimated value of pi. In the following example, output was directed to stdout:

    Log Type: stdout
    Log Upload Time: 22-Mar-2015 17:13:33
    Log Length: 23
    Pi is roughly 3.142532
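
If you want to check the result from a script rather than by eye, the estimate can be pulled out of the saved log text. This is a minimal sketch that assumes the log has already been saved as text and contains a line in the "Pi is roughly ..." form shown above; `parse_pi_estimate` is a hypothetical helper name, not part of Spark or YARN.

```python
import re


def parse_pi_estimate(log_text):
    """Return the value from a 'Pi is roughly <number>' line, or None.

    Hypothetical helper for illustration; not part of Spark or YARN.
    """
    match = re.search(r"Pi is roughly\s+([0-9]+(?:\.[0-9]+)?)", log_text)
    return float(match.group(1)) if match else None


# Sample text copied from the stdout log shown above.
sample = """Log Type: stdout
Log Upload Time: 22-Mar-2015 17:13:33
Log Length: 23
Pi is roughly 3.142532"""

estimate = parse_pi_estimate(sample)
print(estimate)
```

A script could then compare the parsed value against a tolerance around pi to decide whether the test run looks healthy.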