2. Instructions

To configure and use the HDFS NFS gateway, complete the following steps:

  1. Configure settings for HDFS NFS gateway:

    The NFS gateway uses the same configurations as the NameNode and DataNode. Configure the following properties based on your application's requirements:

    1. Edit the hdfs-site.xml file on your NFS gateway machine and modify the following property:

      <property>
        <name>dfs.namenode.accesstime.precision</name>
        <value>3600000</value>
        <description>The access time for HDFS file is precise up to this value. 
                     The default value is 1 hour. Setting a value of 0 disables
                     access times for HDFS.
        </description>
      </property>

      Note:

      If the export is mounted with access time updates allowed, make sure this property is not disabled in the configuration file. Only the NameNode needs to be restarted after this property is changed. If you have disabled access time updates by mounting with "noatime", you do not need to change this property or restart the NameNode.
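
      If you plan to disable access time updates on the client, a mount similar to the following avoids the need to change this property or restart the NameNode (a sketch based on the mount command in step 4; /hdfs_nfs is a hypothetical mount point):

      # noatime disables access time updates from this client, so
      # dfs.namenode.accesstime.precision can be left unchanged and the
      # NameNode does not need a restart
      mount -t nfs -o vers=3,proto=tcp,nolock,noatime $server:/ /hdfs_nfs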

    2. Add the following property to hdfs-site.xml:

      <property>    
          <name>dfs.nfs3.dump.dir</name>    
          <value>/tmp/.hdfs-nfs</value> 
      </property>

      Note:

      The NFS client often reorders writes, so sequential writes can arrive at the NFS gateway in random order. This directory is used to temporarily save out-of-order writes before they are written to HDFS. Make sure the directory has enough space. For example, if an application uploads 10 files of 100MB each, this directory should have at least 1GB of space in case a worst-case write reorder happens to every file.
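
      As a quick check, verify that the file system holding the dump directory has enough free space before starting the gateway (a sketch; the 1GB figure follows the example above):

      # Create the dump directory and check free space on its file system
      mkdir -p /tmp/.hdfs-nfs
      df -h /tmp/.hdfs-nfs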

    3. Update the following property in hdfs-site.xml:

      <property>    
          <name>dfs.nfs.exports.allowed.hosts</name>    
          <value>* rw</value> 
      </property>

      Note:

      By default, the export can be mounted by any client. You must update this property to control access. The value string contains the machine name and access privilege, separated by whitespace characters. The machine name can be in single-host, wildcard, or IPv4-network format. The access privilege uses rw or ro to specify read-write or read-only access to exports. If you do not specify an access privilege, the default machine access to exports is read-only. Separate machine entries with a semicolon (;). For example: 192.168.0.0/22 rw ; host*.example.com ; host1.test.org ro;.

      Restart the NFS gateway after this property is updated.
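
      For example, if the gateway was started with the hadoop-daemon.sh script (see "Start the NFS gateway service" below), a minimal restart looks like this:

      hadoop-daemon.sh stop nfs3
      hadoop-daemon.sh start nfs3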

    4. Optional - Customize log settings.

      Edit the log4j.properties file on the NFS gateway machine.

      To change the trace level, add the following:

      log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG

      To get more details of ONCRPC requests, add the following:

      log4j.logger.org.apache.hadoop.oncrpc=DEBUG
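
      For example, on a node where the Hadoop configuration lives in /etc/hadoop/conf (an assumed path; adjust it for your installation), both settings can be appended as follows:

      # Enable DEBUG logging for the NFS gateway and for ONCRPC requests
      # (configuration path is an assumption)
      echo 'log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG' >> /etc/hadoop/conf/log4j.properties
      echo 'log4j.logger.org.apache.hadoop.oncrpc=DEBUG' >> /etc/hadoop/conf/log4j.properties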

  2. Start the NFS gateway service.

    Three daemons are required to provide NFS service: rpcbind (or portmap), mountd, and nfsd. The NFS gateway process includes both nfsd and mountd. It shares the HDFS root "/" as the only export. We recommend using the portmap included in the NFS gateway package, as shown below:

    1. Stop nfs/rpcbind/portmap services provided by the platform:

      service nfs stop
      service rpcbind stop

    2. Start the included portmap package (needs root privileges):

      hadoop portmap

      OR

      hadoop-daemon.sh start portmap

    3. Start mountd and nfsd.

      No root privileges are required for this command. However, verify that the user starting the Hadoop cluster and the user starting the NFS gateway are the same.

      hadoop nfs3

      OR

      hadoop-daemon.sh start nfs3

      Note:

      If the NFS gateway is started with the hadoop-daemon.sh script, its log file can be found in the Hadoop log folder.
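
      For example, to follow the gateway log (a sketch; the exact file name depends on the user, host name, and log directory of your installation):

      # Assumes $HADOOP_LOG_DIR is set; the file name pattern follows
      # hadoop-daemon.sh defaults
      tail -f $HADOOP_LOG_DIR/hadoop-*-nfs3-*.log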

    4. Stop NFS gateway services.

      hadoop-daemon.sh stop nfs3
      hadoop-daemon.sh stop portmap

  3. Verify validity of NFS-related services.

    1. Execute the following command to verify that all the services are up and running:

      rpcinfo -p $nfs_server_ip

      You should see output similar to the following:

         program vers proto   port
          100005    1   tcp   4242  mountd
          100005    2   udp   4242  mountd
          100005    2   tcp   4242  mountd
          100000    2   tcp    111  portmapper
          100000    2   udp    111  portmapper
          100005    3   udp   4242  mountd
          100005    1   udp   4242  mountd
          100003    3   tcp   2049  nfs
          100005    3   tcp   4242  mountd

    2. Verify that the HDFS namespace is exported and can be mounted.

      showmount -e $nfs_server_ip                         

      You should see output similar to the following:

      Exports list on $nfs_server_ip :
      / (everyone)

  4. Mount the export “/”.

    Currently, only NFS v3 is supported, and TCP is the transport protocol. Users can mount the HDFS namespace as shown below:

    mount -t nfs -o vers=3,proto=tcp,nolock $server:/  $mount_point

    Users can then access HDFS as part of the local file system, except that hard links, symbolic links, and random writes are not supported in this release.

    Note:

    Because NLM is not supported, the mount option nolock is needed.

    User authentication and mapping:

    The NFS gateway in this release uses AUTH_UNIX-style authentication, which means that the login user on the client is the same user that NFS passes to HDFS. For example, if the current user on the NFS client is admin, the NFS gateway accesses HDFS as user admin when that user accesses the mounted directory. To access HDFS as the hdfs user, you must first switch the current user to hdfs on the client system before accessing the mounted directory.
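
    For example, to browse the mounted export as the hdfs user (a sketch, assuming a hypothetical mount point /hdfs_nfs and sudo rights on the client):

    # Access the mount as the hdfs user instead of the login user
    sudo -u hdfs ls /hdfs_nfs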

  5. Set up client machine users to interact with HDFS through NFS.

    The NFS gateway converts the UID to a user name, and HDFS uses the user name to check permissions.

    The system administrator must ensure that the user on the NFS client machine has the same name and UID as the user on the NFS gateway machine. This is usually not a problem if you use the same user management system, such as LDAP or NIS, to create and deploy users to the HDP nodes and to the client node.

    If the user is created manually, you might need to modify the UID on either the client or the NFS gateway host to make them the same:

    usermod -u 123 $myusername
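
    To confirm that the accounts match, compare the output of the id command on both machines; the UID and user name must be identical (a sketch; $myusername is the placeholder used above):

    # Run on both the NFS client and the NFS gateway host
    id $myusername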

    The following illustrates how the UID and user name are communicated between the NFS client, the NFS gateway, and the NameNode.