6. NameNode Manual Failover and Status Check

The hdfs haadmin command can be used to manually fail over a NameNode and to check NameNode status.

Running the hdfs haadmin command without any arguments returns a list of the available subcommands:

Usage: DFSHAAdmin [-ns <nameserviceId>]
    [-transitionToActive <serviceId>]
    [-transitionToStandby <serviceId>]
    [-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
    [-getServiceState <serviceId>]
    [-checkHealth <serviceId>]
    [-help <command>]

This section provides a description of each of these subcommands.

  • failover

    Initiates a failover between two NameNodes.

    Usage:

    hdfs haadmin -failover <serviceId of NameNode to become Standby> <serviceId of NameNode to become Active>

    This subcommand causes a failover from the first provided NameNode to the second. It can be used for testing or when hardware maintenance needs to be performed on the active machine.

    • If the first NameNode is in the Active state, an attempt will be made to gracefully transition it to the Standby state. If this fails, the fencing methods (as configured by dfs.ha.fencing.methods) will be attempted in order until one succeeds. Only after this process will the second NameNode be transitioned to the Active state. If the fencing methods fail, the second NameNode is not transitioned to Active state and an error is returned. For more information about configuring fencing methods, see Configure NameNode HA Cluster.

    • If the first NameNode is in the Standby state, the command runs and reports that the failover was successful, but no failover occurs and both NameNodes remain in their original states.

    You can use the /etc/hadoop/conf/hdfs-site.xml file to determine the serviceId of each NameNode. In a High Availability (HA) cluster, the serviceId is the NameNode ID, which is the last component of the dfs.namenode.rpc-address.[$nameservice ID].[$NameNode ID] property name. The following example shows two NameNodes with the serviceId values nn1 and nn2:

    <property>
      <name>dfs.namenode.rpc-address.myHAcluster.nn1</name>
      <value>63hdp20ambari252.example.com:8020</value>
    </property>

    <property>
      <name>dfs.namenode.rpc-address.myHAcluster.nn2</name>
      <value>63hdp20ambari251.example.com:8020</value>
    </property>
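
    As a rough sketch of reading the serviceId values out of the configuration file, the function below prints the last component of each dfs.namenode.rpc-address.* property name. The function name list_namenode_ids is hypothetical, and the sed pattern assumes that each <name> element sits on a single line, as in the properties shown above. It is not a general XML parser.

```shell
#!/bin/sh
# Hypothetical helper: print the NameNode serviceId taken from each
# dfs.namenode.rpc-address.<nameservice ID>.<NameNode ID> property name.
# Assumes one <name>...</name> element per line; not a full XML parser.
list_namenode_ids() {
    # Default path is the one named in this section; pass another file to override.
    conf="${1:-/etc/hadoop/conf/hdfs-site.xml}"
    sed -n 's|.*<name>dfs\.namenode\.rpc-address\.[^.<]*\.\([^.<]*\)</name>.*|\1|p' "$conf"
}
```

    Calling list_namenode_ids with no argument reads /etc/hadoop/conf/hdfs-site.xml and prints one serviceId per line.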

    Example:

    hdfs haadmin -failover nn1 nn2

    For the preceding example, assume that nn1 was Active and nn2 was Standby prior to running the failover command. After the command is executed, nn2 will be Active and nn1 will be Standby.

    If nn1 is instead Standby and nn2 is Active before the failover command is run, the command still runs and reports success, but no failover occurs and nn1 and nn2 remain in their original states.
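
    Because the command reports success even when no failover takes place, a wrapper can check the state before and after the call. The following is a minimal sketch under that assumption; the function name safe_failover and its return codes are hypothetical, and only the -getServiceState and -failover subcommands described in this section are used.

```shell
#!/bin/sh
# Hypothetical wrapper: fail over only when the first NameNode is Active,
# then confirm that the second NameNode actually became Active.
# Return codes (illustrative): 0 ok, 1 skipped, 2 state query failed,
# 3 failover command failed, 4 target not active afterwards.
safe_failover() {
    from="$1"
    to="$2"

    state=$(hdfs haadmin -getServiceState "$from") || return 2
    if [ "$state" != "active" ]; then
        echo "$from is '$state', not active; skipping failover" >&2
        return 1
    fi

    hdfs haadmin -failover "$from" "$to" || return 3

    # Verify the result instead of relying on the success message alone.
    new_state=$(hdfs haadmin -getServiceState "$to") || return 2
    [ "$new_state" = "active" ] || return 4
}
```

    For the example cluster above, this would be called as: safe_failover nn1 nn2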

  • getServiceState

    Determines whether a given NameNode is in an Active or Standby state.

    This subcommand connects to the referenced NameNode, determines its current state, and prints either "standby" or "active" to STDOUT. This subcommand might be used by cron jobs or monitoring scripts.

    Usage:

    hdfs haadmin -getServiceState <serviceId of NameNode>

    Example:

    hdfs haadmin -getServiceState nn1
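
    As an illustration of the monitoring use case, the sketch below queries each serviceId in turn and prints the first one that reports "active". The function name find_active_namenode is hypothetical; a NameNode that cannot be queried is simply skipped.

```shell
#!/bin/sh
# Hypothetical helper: print the serviceId of the first NameNode that
# reports "active"; return 1 if none of the given NameNodes does.
find_active_namenode() {
    for id in "$@"; do
        # Skip NameNodes whose state cannot be determined.
        state=$(hdfs haadmin -getServiceState "$id" 2>/dev/null) || continue
        if [ "$state" = "active" ]; then
            echo "$id"
            return 0
        fi
    done
    return 1
}
```

    A cron job or monitoring script could call find_active_namenode nn1 nn2 and raise an alert when the function returns a non-zero code.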

  • checkHealth

    Checks the health of a given NameNode.

    This subcommand connects to the referenced NameNode and checks its health. The NameNode is capable of performing diagnostics that include checking to see if internal services are running as expected. This command will return 0 if the NameNode is healthy; otherwise it will return a non-zero code.

    Note

    This subcommand is still being implemented and currently always returns success (0) unless the given NameNode is down.

    Usage:

    hdfs haadmin -checkHealth <serviceId of NameNode>

    Example:

    hdfs haadmin -checkHealth nn1
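
    Since the result is reported through the exit code, a script can branch on it directly. The following is a minimal sketch; the function name check_namenode_health and the message text are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report NameNode health based on the exit code of
# hdfs haadmin -checkHealth (0 means healthy, non-zero means not healthy).
check_namenode_health() {
    id="$1"
    if hdfs haadmin -checkHealth "$id"; then
        echo "$id: healthy"
    else
        rc=$?   # exit code of the -checkHealth call above
        echo "$id: unhealthy or unreachable (exit code $rc)"
        return "$rc"
    fi
}
```

    For example, check_namenode_health nn1 prints a one-line status and passes the exit code on to the caller.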

  • transitionToActive and transitionToStandby

    Transitions the state of the given NameNode to Active or Standby.

    Usage:

    hdfs haadmin -transitionToActive <serviceId of NameNode>
    hdfs haadmin -transitionToStandby <serviceId of NameNode>

    Note

    These commands do not attempt to perform any fencing, and therefore are not recommended. Instead, Hortonworks recommends using the -failover subcommand described above.

