Tuning Apache Spark
Also available as:
PDF

Check Job Status

If a job takes longer than expected or does not finish successfully, check the following to understand more about where the job stalled or failed:

  • To list running applications by ID from the command line, use yarn application –list.

  • To see a description of a resilient distributed dataset (RDD) and its recursive dependencies (useful for understanding how jobs are executed) use toDebugString() on the RDD.

  • To check the query plan when using the DataFrame API, use DataFrame#explain().