Configuring the NameNode HA Cluster
First, add High Availability configurations to your HDFS configuration files. Start by taking the HDFS configuration files from the original NameNode in your HDP cluster and using them as the base, adding the following properties to those files.
Add the following configuration options to your hdfs-site.xml file:

- dfs.nameservices

Choose an arbitrary but logical name (for example, mycluster) as the value for the dfs.nameservices option. This name is used both in configuration and as the authority component of absolute HDFS paths in the cluster.
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
  <description>Logical name for this new nameservice</description>
</property>
If you are also using HDFS Federation, this configuration setting should also include the list of other nameservices, HA or otherwise, as a comma-separated list.
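For example, in a federated cluster that also contains a second nameservice (the name mycluster2 below is hypothetical), the value becomes a comma-separated list:

```xml
<property>
  <name>dfs.nameservices</name>
  <value>mycluster,mycluster2</value>
  <description>Comma-separated list of all nameservices in the cluster</description>
</property>
```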
- dfs.ha.namenodes.[$nameservice ID]
Provide a list of comma-separated NameNode IDs. DataNodes use this property to determine all the NameNodes in the cluster.
For example, for the nameservice ID mycluster and individual NameNode IDs nn1, nn2, and nn3, the value of this property is:
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2,nn3</value>
  <description>Unique identifiers for each NameNode in the nameservice</description>
</property>

Note
The minimum number of NameNodes for HA is two, but you can configure more. You should not exceed five NameNodes due to communication overhead. Three NameNodes are recommended.
- dfs.namenode.rpc-address.[$nameservice ID].[$name node ID]
Use this property to specify the fully-qualified RPC address for each NameNode to listen on.
Continuing with the previous example, set the full address and IPC port of the NameNode process for each of the NameNode IDs above. Note that this results in three separate configuration options.
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>machine1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>machine2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn3</name>
  <value>machine3.example.com:8020</value>
</property>
- dfs.namenode.http-address.[$nameservice ID].[$name node ID]
Use this property to specify the fully-qualified address for each NameNode's HTTP server to listen on. For example:
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>machine1.example.com:9870</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>machine2.example.com:9870</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn3</name>
  <value>machine3.example.com:9870</value>
</property>

Note
If you have Hadoop security features enabled, set the https-address for each NameNode.
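As a sketch, the per-NameNode HTTPS addresses would use the dfs.namenode.https-address.[$nameservice ID].[$name node ID] properties. The port shown below (9871, the Hadoop 3 default HTTPS port) and the hostnames follow the earlier examples:

```xml
<property>
  <name>dfs.namenode.https-address.mycluster.nn1</name>
  <value>machine1.example.com:9871</value>
</property>
<property>
  <name>dfs.namenode.https-address.mycluster.nn2</name>
  <value>machine2.example.com:9871</value>
</property>
<property>
  <name>dfs.namenode.https-address.mycluster.nn3</name>
  <value>machine3.example.com:9871</value>
</property>
```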
- dfs.namenode.shared.edits.dir

Use this property to specify the URI that identifies a group of JournalNodes (JNs) where the NameNode will write/read edits.
Configure the addresses of the JNs that provide the shared edits storage. The Active NameNode writes to this shared storage and the Standby NameNode reads from this location to stay up-to-date with all the file system changes.
Although the URI contains several JournalNode addresses, you configure only one such URI for your cluster.
The URI should be of the form qjournal://host1:port1;host2:port2;host3:port3/journalId.
The Journal ID is a unique identifier for this nameservice, which allows a single set of JournalNodes to provide storage for multiple federated namesystems. You can reuse the nameservice ID for the journal identifier.
For example, if the JournalNodes for a cluster were running on node1.example.com, node2.example.com, and node3.example.com, and the nameservice ID were mycluster, you would use the following value for this setting:
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/mycluster</value>
</property>

Note
The default port for the JournalNode is 8485.
- dfs.client.failover.proxy.provider.[$nameservice ID]
This property specifies the Java class that HDFS clients use to contact the Active NameNode. The DFS client uses this class to determine which NameNode is currently Active and therefore serving client requests.
Use the ConfiguredFailoverProxyProvider implementation if you are not using a custom implementation.
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
- dfs.ha.fencing.methods

This property specifies a list of scripts or Java classes that will be used to fence the Active NameNode during a failover.
For maintaining system correctness, it is important to have only one NameNode in the Active state at any given time. When using the Quorum Journal Manager, only one NameNode will ever be allowed to write to the JournalNodes, so there is no potential for corrupting the file system metadata from a split-brain scenario. However, when a failover occurs, it is still possible that the previous Active NameNode could serve read requests to clients, which may be out of date until that NameNode shuts down when trying to write to the JournalNodes.
For this reason, it is still recommended to configure some fencing methods even when using the Quorum Journal Manager. To improve the availability of the system in the event the fencing mechanisms fail, it is advisable to configure a fencing method which is guaranteed to return success as the last fencing method in the list. Note that if you choose to use no actual fencing methods, you must still set some value for this setting, for example shell(/bin/true).
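For example, to try sshfence first and fall back to a method that is guaranteed to return success, list both entries on separate lines within the value (shell(/bin/true) always reports success):

```xml
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>
```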
The fencing methods used during a failover are configured as a carriage-return-separated list, and they are attempted in order until one indicates that fencing has succeeded. Two methods are packaged with Hadoop: shell and sshfence. For information on implementing a custom fencing method, see the org.apache.hadoop.ha.NodeFencer class.

sshfence: SSH to the Active NameNode and kill the process.
The sshfence option SSHes to the target node and uses fuser to kill the process listening on the service's TCP port. In order for this fencing option to work, it must be able to SSH to the target node without providing a passphrase. Ensure that you configure the dfs.ha.fencing.ssh.private-key-files option, which is a comma-separated list of SSH private key files.
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/exampleuser/.ssh/id_rsa</value>
</property>
Optionally, you can configure a non-standard username or port for the SSH connection, as well as a timeout in milliseconds, after which this fencing method is considered to have failed. Set them with the following properties:
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence([[username][:port]])</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>

shell: Run an arbitrary shell command to fence the Active NameNode.
The shell fencing method runs an arbitrary shell command:
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/path/to/my/script.sh arg1 arg2 ...)</value>
</property>
The string between '(' and ')' is passed directly to a bash shell and may not include any closing parentheses.
The shell command will be run with an environment set up to contain all of the current Hadoop configuration variables, with the '_' character replacing any '.' characters in the configuration keys. The configuration used has already had any NameNode-specific configurations promoted to their generic forms -- for example, dfs_namenode_rpc-address will contain the RPC address of the target node, even though the configuration may specify that variable with a NameNode-specific suffix.
Additionally, the following variables (referring to the target node to be fenced) are also available:
$target_host: Hostname of the node to be fenced
$target_port: IPC port of the node to be fenced
$target_address: The combination of $target_host and $target_port, as host:port
$target_nameserviceid: The nameservice ID of the NameNode to be fenced
$target_namenodeid: The NameNode ID of the NameNode to be fenced
These environment variables may also be used as substitutions in the shell command. For example:
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/path/to/my/script.sh --nameservice=$target_nameserviceid $target_host:$target_port)</value>
</property>
If the shell command returns an exit code of 0, the fencing is successful.

Note
This fencing method does not implement any timeout. If timeouts are necessary, they should be implemented in the shell script itself (for example, by forking a subshell to kill its parent in some number of seconds).
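As a sketch, a custom shell fencing method might use these environment variables as follows. The script path referenced earlier (/path/to/my/script.sh) is hypothetical, and this version only logs the target and reports success; a real script would take an out-of-band action (such as power-cycling the target host) and return non-zero if the target might still be alive.

```shell
# Hypothetical body of /path/to/my/script.sh, written as a function so the
# logic is easy to exercise; the fencing framework exports the target_*
# variables into the script's environment before invoking it.
fence_namenode() {
  # Fall back to the individual host/port variables if target_address is unset
  # (8020 here is only a placeholder default for illustration).
  local target="${target_address:-${target_host:-unknown}:${target_port:-8020}}"
  echo "Fencing NameNode ${target_namenodeid:-?} (nameservice ${target_nameserviceid:-?}) at ${target}"
  # A real fencing method would act here, for example via an out-of-band
  # management interface, and return non-zero on failure.
  return 0
}
```

Because this sketch always returns success, it would only be safe as the guaranteed-success last entry in the fencing method list.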
- fs.defaultFS

The default path prefix used by the Hadoop FS client. Optionally, you may now configure the default path for Hadoop clients to use the new HA-enabled logical URI. For example, if you use mycluster as the nameservice ID, it will be the value of the authority portion of all of your HDFS paths. Configure this property in the core-site.xml file:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
- dfs.journalnode.edits.dir

This is the absolute path on the JournalNode machines where the edits and other local state used by the JNs will be stored. You may only use a single path for this configuration. Redundancy for this data is provided either by running multiple separate JournalNodes or by configuring this directory on a locally-attached RAID array.
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/path/to/journal/node/local/data</value>
</property>

Note
NameNode and NameNode HA failure may occur if the hadoop.security.authorization property in the core-site.xml file is set to true without Kerberos enabled on a NameNode HA cluster. Therefore, you should only set this property to true when configuring HDP for Kerberos.