6.2. Using Job Charts

If you are tracking a Hive or Pig query that has been broken down into multiple interdependent jobs, you can use the DAG/Charts screen to see a more complete picture. The DAG tab displays a Directed Acyclic Graph (DAG) for the set of interdependent jobs, and the Charts tab displays Timeline and Tasks information for the map and reduce tasks of each job in the set.

For example, consider a Pig script that runs the “wordcount” example.

From the job row overview description, we can see that the Pig script executed as three (3) interdependent jobs and required a total execution time of 99.67 seconds. This total is the execution time of each job plus the time spent submitting and launching each job.

Now click on the job. The DAG/Charts screen pops up.

The DAG relationship of three interdependent jobs is displayed. You can see the sequence in which each interdependent job was executed as well as other information, including the duration of each execution.
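As a rough illustration of what the DAG conveys, the sketch below derives an execution order from job dependencies. The job names and the linear dependency chain are hypothetical, chosen only for illustration; the actual shape of the graph depends on how the query was planned. It assumes Python 3.9 or later for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical job names and dependencies (not taken from the example above).
# Each key maps to the set of jobs that must finish before it can start.
dependencies = {
    "job_1": set(),
    "job_2": {"job_1"},
    "job_3": {"job_2"},
}

def execution_order(deps):
    """Return one valid order in which the interdependent jobs can run."""
    return list(TopologicalSorter(deps).static_order())

order = execution_order(dependencies)
print(order)
```

A job appears in the order only after every job it depends on, which is the same sequencing that the DAG tab shows graphically.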

Note

A Pig script that includes an “exec” call is broken into multiple scripts, each with its own set of interdependent jobs. As a result, the DAG shows only the jobs for the first of those scripts.

Click on the Charts tab to view the Job Timeline and Job Tasks View graphs. These graphs show timing information for each task executed as part of a job. The Y-axis of the Timeline graph shows the number of tasks executed, while the Y-axis of the Tasks graph shows the task runtime. Both graphs show the job timeline on the X-axis, and you can hover over the X-axis to see the absolute date and time in GMT.

These graphs represent the “wordcount” example from earlier. The job was submitted at 18:07:09 GMT and finished at 18:07:36 GMT, so the X-axis of both graphs runs from :09 seconds to :36 seconds. On the Timeline graph, a map task starts at :19 seconds and runs for 3 seconds, then a shuffle task runs for about 8 seconds, and finally a reduce task runs for 1 second. On the Tasks graph, you can see the map and reduce tasks (shown as circles) with their runtimes on the Y-axis (about 3 seconds and 9 seconds respectively). You can hover over each task circle to see details such as wait time and I/O. The size of each circle is based on the amount of I/O for the task.
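The timing figures above can be checked with a little arithmetic. The following is a minimal sketch, not part of the product: it assumes that the map, shuffle, and reduce phases run back to back, and it treats the approximate durations quoted above as exact.

```python
from datetime import datetime

FMT = "%H:%M:%S"
submitted = datetime.strptime("18:07:09", FMT)
finished = datetime.strptime("18:07:36", FMT)

# Width of the X-axis in seconds (from :09 to :36).
job_span = (finished - submitted).total_seconds()

# Phases read off the Timeline graph as (name, duration in seconds).
phases = [("map", 3), ("shuffle", 8), ("reduce", 1)]

def phase_windows(start_second, phases):
    """Lay the phases end to end (an assumption) and return
    (name, start, end) tuples in clock seconds."""
    windows = []
    cursor = start_second
    for name, duration in phases:
        windows.append((name, cursor, cursor + duration))
        cursor += duration
    return windows

# The map task starts at :19 seconds.
windows = phase_windows(19, phases)
for name, start, end in windows:
    print(f"{name}: :{start:02d} to :{end:02d}")
print("job span in seconds:", job_span)
```

Under these assumptions the shuffle and reduce phases together take 9 seconds, which matches the roughly 9-second reduce runtime shown on the Tasks graph, since a reduce task's runtime includes its shuffle phase.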

