3.2.9. MRv2 Troubleshooting Actions Checklist

  1. Understand the issue. Check to see if the issue exists with all MRv2 jobs. Determine when a particular job or script last worked successfully. Understand the actual and expected behavior and formulate a problem statement.

  2. Verify that all components related to MRv2 are running. Use the ps and jps commands to check whether the processes for dependent components are running. Ensure that all required ports are listening, are bound to a process, and accept connections (for example, that no firewall is blocking them).

  3. Look at the job details in the Resource Manager UI.

    1. Use the UI to navigate to the job attempt.

    2. Look at the log file for the failed attempt.

  4. Look at the Resource Manager and the Node Manager log files in the Resource Manager UI, or on the specific nodes.

  5. Use the yarn logs command to collect all of the logs of the Containers.

  6. Check the Job Configuration in the Resource Manager UI to make sure that all of the desired parameters were actually passed on to the job.

  7. Run the MRv2 pi job provided with the HDP examples to see if that job succeeds:

    • If it succeeds, check to see if there is a problem with the client or the data.

    • If it fails, there is probably a basic problem with the processes or the configuration.

  8. If the job is run through streaming or pipes, run a similar job to troubleshoot.

  9. If the job is started by one of the other HDP components, look at the component-specific guide.

  10. Identify the operating system and its version, and verify that they are supported.

  11. Search the Hortonworks Knowledge Base for a possible solution.

  12. If the issue is still not resolved, log a case in the Hortonworks Support Portal:

    1. Provide all of the information gathered in the preceding steps, along with the information in the “Checklist of Items to Collect” list in the following section.

    2. Tar the configuration files and the log files and attach them to the case.

    3. Inform Hortonworks if it is a Production, Development, or POC environment.
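
Steps 2, 5, and 7 of the checklist above can be sketched as a few shell helpers. This is a minimal sketch, not an official Hortonworks script: the function names are hypothetical, the daemon names in the grep pattern are the usual MRv2 ones, and the location of the examples jar varies by HDP version, so it is passed in as an argument rather than assumed. The yarn logs command returns aggregated logs only when log aggregation is enabled on the cluster.

```shell
#!/usr/bin/env bash
# Sketch of checklist steps 2, 5, and 7. All function names are hypothetical.

# Step 2: list the MRv2-related JVMs on this node (jps ships with the JDK).
list_mrv2_daemons() {
  jps -l 2>/dev/null | grep -E 'ResourceManager|NodeManager|JobHistoryServer'
}

# Step 2: succeed only if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp redirection and the coreutils timeout command.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Step 5: save the aggregated container logs of one application to a file.
collect_container_logs() {
  local app_id="$1" out_file="${2:-$1.log}"
  yarn logs -applicationId "$app_id" > "$out_file"
}

# Step 7: run the bundled pi example with 10 maps and 10 samples per map.
# The jar path differs between HDP versions, so the caller must supply it.
run_pi_job() {
  local jar="$1"
  if [ -z "$jar" ]; then
    echo "usage: run_pi_job <path-to-hadoop-mapreduce-examples-jar>" >&2
    return 2
  fi
  yarn jar "$jar" pi 10 10
}
```

For example, after sourcing the file, port_open rm-host 8088 checks the Resource Manager web port (rm-host is a placeholder, and 8088 is the default port, which your cluster may override), and collect_container_logs followed by an application ID writes that application's container logs to a file named after the ID.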

Checklist of Items to Collect

  1. Collect the most recent log files for all of the MRv2 daemons.

  2. Get copies of the following configuration files:

  3. Provide the number of Data Nodes in the cluster, as well as the total number of nodes.

  4. Use the yarn logs command to collect the log files of the Containers for all of the tasks.

  5. State how HDP was installed: with Ambari, or manually with RPM.

  6. Provide hardware specifications: CPU, memory, disk drives, number of network interfaces.
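
Several of the items above can be gathered from the command line. The following is a minimal sketch under stated assumptions: the function names and output file names are hypothetical, the hardware summary assumes a Linux node with /proc, /sys, and lsblk available, and the directories you bundle depend on how HDP was installed. The hdfs dfsadmin -report output lists the Data Nodes, and yarn node -list lists the running Node Managers.

```shell
#!/usr/bin/env bash
# Sketch for gathering the items to collect. Function names are hypothetical.

# Item 3: save the HDFS and YARN node reports into the given directory.
save_cluster_reports() {
  local out_dir="$1"
  hdfs dfsadmin -report > "$out_dir/dfsadmin-report.txt"
  yarn node -list > "$out_dir/yarn-node-list.txt"
}

# Item 6: print basic hardware facts for this node (Linux assumed).
hardware_summary() {
  echo "cpus: $(grep -c '^processor' /proc/cpuinfo)"
  grep MemTotal /proc/meminfo
  echo "disks:"
  lsblk -d -o NAME,SIZE,TYPE
  # The count includes the loopback interface.
  echo "network interfaces: $(ls /sys/class/net | wc -l)"
}

# Bundle configuration and log directories into one gzip-compressed tar
# archive. The first argument is the archive; the rest are paths to include.
bundle_for_support() {
  local archive="$1"
  shift
  tar -czf "$archive" "$@"
}
```

A typical call might be bundle_for_support case.tar.gz /etc/hadoop/conf /var/log/hadoop-yarn, where both directories are assumptions about a common layout and should be replaced with the actual configuration and log locations on your nodes.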