Configuring Fault Tolerance
Also available as:
PDF
loading table of contents...

Administrative Commands

The subcommands of hdfs haadmin are extensively used for administering an HA cluster.

Running the hdfs haadmin command without any additional arguments will display the following usage information:

Usage: HAAdmin [-ns <nameserviceId>]
 [-transitionToActive <serviceId>]
 [-transitionToStandby <serviceId>]
 [-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
 [-getServiceState <serviceId>]
 [-checkHealth <serviceId>]
 [-help <command>

This section provides high-level uses of each of these subcommands.

  • transitionToActive and transitionToStandby: Transition the state of the given NameNode to Active or Standby.

    These subcommands cause a given NameNode to transition to the Active or Standby state, respectively. These commands do not attempt to perform any fencing, and thus should be used rarely. Instead, Hortonworks recommends using the following subcommand:

    hdfs haadmin -failover 
  • failover: Initiate a failover between two NameNodes.

    This subcommand causes a failover from the first provided NameNode to the second.

    • If the first NameNode is in the Standby state, this command transitions the second to the Active state without error.

    • If the first NameNode is in the Active state, an attempt will be made to gracefully transition it to the Standby state. If this fails, the fencing methods (as configured by dfs.ha.fencing.methods) will be attempted in order until one succeeds. Only after this process will the second NameNode be transitioned to the Active state. If the fencing methods fail, the second NameNode is not transitioned to Active state and an error is returned.

  • getServiceState: Determine whether the given NameNode is Active or Standby.

    This subcommand connects to the provided NameNode, determines its current state, and prints either "standby" or "active" to STDOUT appropriately. This subcommand might be used by cron jobs or monitoring scripts.

  • checkHealth: Check the health of the given NameNode.

    This subcommand connects to the NameNode to check its health. The NameNode is capable of performing some diagnostics that include checking if internal services are running as expected. This command will return 0 if the NameNode is healthy else it will return a non-zero code.

    Note
    Note

    This subcommand is in implementation phase and currently always returns success unless the given NameNode is down.