1. Add HDFS Repositories

The HDFS repository contains the access policies for the Hadoop cluster's HDFS. The Security Agent integrates with the NameNode service on the NameNode host. The agent enforces the policies configured in the HDP Security Administration Web UI and sends HDFS audit information to the portal, where it can be viewed and reported on from a central location.

[Warning]Warning

In Ambari-managed environments, additional configuration is required. Ensure that you carefully follow the steps outlined in Configure HDFS Agent to run in Ambari Environments.

 1.1. Add an HDFS Repository

Add HDFS repositories after the Hadoop environment is fully operational. During the initial setup of the repository, Hortonworks recommends testing the connection from the HDP Security Administration Web UI to the NameNode to ensure that the agent will be able to connect to the server after installation is complete.

 1.1.1. Create an HDFS Repository

Before installing the agent on the NameNode, create an HDFS Repository as follows:

  1. Sign in to the HDP Security Administration Web UI and click Policy Manager.

  2. Next to HDFS, click the + (plus symbol).

    The Create Repository page displays.

  3. Complete the Repository Details:

     

    Table 4.1. Policy Manager Repository Details

    Label           | Value                | Description
    Repository Name | $name                | Specify a unique name for the repository; you must use the same repository name in the agent installation properties. For example, clustername_hdfs.
    Description     | $description-of-repo | Enter a description of up to 150 characters.
    Active Status   | Enabled or Disabled  | Enable or disable policy enforcement for the repository.
    Repository Type | HDFS, Hive, or HBase | Select the type of repository; here, HDFS.
    User Name       | $user                | Specify a user name on the remote system with permission to establish the connection; for example, hdfs.
    Password        | $password            | Specify the password of the user account used for the connection.


  4. Complete the security settings for the Hadoop cluster. The settings must match the values specified in the core-site.xml file, as follows (a sketch for looking up these values appears after this procedure):

     

    Table 4.2. HDFS Repository Required Settings

    Label                                     | Value                   | Description
    fs.default.name                           | $hdfs-url               | The HDFS URL; must match the setting in the Hadoop core-site.xml file. For example, hdfs://sandbox.hortonworks.com:8020.
    hadoop.security.authorization             | true or false           | Specify the same setting found in core-site.xml.
    hadoop.security.authentication            | simple or kerberos      | Specify the type indicated in core-site.xml.
    hadoop.security.auth_to_local             | $usermapping            | Must match the setting in the core-site.xml file. For example: RULE:[2:$1@$0]([rn]m@.*)s/.*/yarn/ RULE:[2:$1@$0](jhs@.*)s/.*/mapred/ RULE:[2:$1@$0]([nd]n@.*)s/.*/hdfs/ RULE:[2:$1@$0](hm@.*)s/.*/hbase/ RULE:[2:$1@$0](rs@.*)s/.*/hbase/ DEFAULT
    dfs.datanode.kerberos.principal           | $dn-principal           | Specify the Kerberos DataNode principal name.
    dfs.namenode.kerberos.principal           | $nn-principal           | Specify the Kerberos NameNode principal name.
    dfs.secondary.namenode.kerberos.principal | $secondary-nn-principal | Specify the Kerberos Secondary NameNode principal name.
    Common Name For Certificate               | $cert-name              | Specify the common name of the certificate.


  5. Click Test Connection.

    If the server can connect to HDFS, a connection successful message displays. If the connection fails, see the troubleshooting appendix.

  6. After making a successful connection, click Save.
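
The core-site.xml values requested in step 4 can be read directly from the cluster configuration. The following is a minimal sketch, assuming the configuration lives in /etc/hadoop/conf and that the xmllint utility is available; adjust the path and property list for your environment:

    # Print the core-site.xml values that the Create Repository form asks for.
    # /etc/hadoop/conf is an assumed location; substitute your cluster's
    # configuration directory if it differs.
    CONF=/etc/hadoop/conf/core-site.xml
    for prop in fs.default.name hadoop.security.authorization \
                hadoop.security.authentication hadoop.security.auth_to_local; do
      # Extract the <value> of the <property> whose <name> matches $prop.
      echo "$prop = $(xmllint --xpath "string(//property[name='$prop']/value)" "$CONF")"
    done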

 1.1.2. Install the HDFS Agent on NameNode

Install the agent on the NameNode host as root (or with sudo privileges). In HA Hadoop clusters, you must also install an agent on the Secondary NameNode.

 1.1.2.1. Installation Set Up

Perform the following steps on the Hadoop NameNode host.

  1. Log on to the host as root.

  2. Create a temporary directory, such as /tmp/xasecure:

    mkdir /tmp/xasecure
  3. Move the package into the temporary directory along with the MySQL Connector JAR.

  4. Extract the contents:

    tar xvf $xasecureinstallation.tar
  5. Go to the directory where you extracted the installation files:

    cd /tmp/xasecure/xasecure-$name-$build-version
  6. Open the install.properties file for editing.

  7. Change the following parameters for your environment (a sketch for setting them non-interactively follows this procedure):

     

    Table 4.3. HDFS Agent Install Parameters

    Parameter                | Value                     | Description
    POLICY_MGR_URL           | $url                      | Specify the full URL used to access the Policy Manager Web UI. For example, http://pm-host:6080.
    MYSQL_CONNECTOR_JAR      | $path-to-mysql-connector  | Absolute path on the local host to the JDBC driver for MySQL, including the filename.[a] For example, /tmp/xasecure/
    REPOSITORY_NAME          | $Policy-Manager-Repo-Name | Name of the HDFS repository in the Policy Manager that this agent connects to after installation.
    XAAUDIT.DB.HOSTNAME      | $XAsecure-db-host         | Specify the host name of the MySQL database.
    XAAUDIT.DB.DATABASE_NAME | $auditdb                  | Specify the audit database name that matches the audit_db_name specified during the web application server installation.
    XAAUDIT.DB.USER_NAME     | $auditdbuser              | Specify the audit database user name that matches the audit_db_user specified during the web application server installation.
    XAAUDIT.DB.PASSWORD      | $auditdbupw               | Specify the audit database password that matches the audit_db_password specified during the web application server installation.

    [a] Download the JAR from here.


  8. Save the install.properties file.
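
As an alternative to editing install.properties by hand (step 7), the parameters can be set non-interactively. This is a minimal sketch; the values shown (pm-host, clustername_hdfs, xasecure-host) are placeholders for your environment:

    cd /tmp/xasecure/xasecure-$name-$build-version
    # Replace each placeholder value with your own before running.
    sed -i \
      -e 's|^POLICY_MGR_URL=.*|POLICY_MGR_URL=http://pm-host:6080|' \
      -e 's|^MYSQL_CONNECTOR_JAR=.*|MYSQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar|' \
      -e 's|^REPOSITORY_NAME=.*|REPOSITORY_NAME=clustername_hdfs|' \
      -e 's|^XAAUDIT.DB.HOSTNAME=.*|XAAUDIT.DB.HOSTNAME=xasecure-host|' \
      install.properties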

[Note]Note

If your environment is configured to use SSL, modify the properties following the instructions in Set Up SSL for HDFS Security Agent.

 1.1.2.1.1. Example HDFS Agent Installation Properties

The following is an example of the Hadoop Agent install.properties file with the MySQL database co-located on the XASecure host:

#
# Location of Policy Manager URL  
#
#
# Example:
# POLICY_MGR_URL=http://policymanager.xasecure.net:6080
#


POLICY_MGR_URL=http://xasecure-host:6080

#
# Location of mysql client library (please check the location of the jar file)
#
MYSQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar

#
# This is the repository name created within policy manager
#
# Example:
# REPOSITORY_NAME=hadoopdev
#

REPOSITORY_NAME=sandbox


#
# AUDIT DB Configuration
# 
#  This information should match with the one you specified during the PolicyManager Installation
# 
# Example:
# XAAUDIT.DB.HOSTNAME=localhost
# XAAUDIT.DB.DATABASE_NAME=xasecure
# XAAUDIT.DB.USER_NAME=xalogger
# XAAUDIT.DB.PASSWORD=
#
#

XAAUDIT.DB.HOSTNAME=xasecure-host
XAAUDIT.DB.DATABASE_NAME=xaaudit
XAAUDIT.DB.USER_NAME=xaaudit
XAAUDIT.DB.PASSWORD=password

#
# SSL Client Certificate Information
#
# Example:
# SSL_KEYSTORE_FILE_PATH=/etc/xasecure/conf/xasecure-hadoop-client.jks
# SSL_KEYSTORE_PASSWORD=clientdb01
# SSL_TRUSTSTORE_FILE_PATH=/etc/xasecure/conf/xasecure-truststore.jks
# SSL_TRUSTSTORE_PASSWORD=changeit
#
#
# IF YOU DO NOT DEFINE SSL parameters, the installation script will automatically generate necessary key(s) and assign appropriate values 
# ONLY If you want to assign manually, please uncomment the following variables and assign appropriate values. 

# SSL_KEYSTORE_FILE_PATH=
# SSL_KEYSTORE_PASSWORD=
# SSL_TRUSTSTORE_FILE_PATH=
# SSL_TRUSTSTORE_PASSWORD=
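
Before running the installer, you can confirm the effective (uncommented) settings with a quick check from the installation directory; a minimal sketch:

    # Show only the active (non-comment, non-blank) lines of the file.
    grep -v '^#' install.properties | grep -v '^$'
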
 1.1.2.2. Run the Agent Installation Script

After configuring the install.properties file, install the agent as root:

  1. Log on to the Linux system as root and go to the directory where you extracted the installation files:

    cd /tmp/xasecure/xasecure-$name-$build-version
  2. Run the agent installation script:

    # ./install.sh
 1.1.2.3. Verify that the Agent is Connected

Connected Agents display in the HDP Security Administration Web UI.

[Note]Note

Agents may not appear in the list until after the first event occurs in the repository.

To verify that the agent is connected to the server:

  1. Log in to the interface using the admin account.

  2. Click Audit > Agent.
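
If the agent does not yet appear in the list, you can trigger a first event by issuing any HDFS request on the cluster; for example, as a minimal sketch run as the hdfs user:

    # Any HDFS access generates an audit event that registers the agent.
    su -l hdfs -c "hadoop fs -ls /"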

 1.1.2.4. Configure HDFS Agent to run in Ambari Environments

On Hadoop clusters managed by Ambari, change the default HDFS settings to allow the agent to enforce policies and report auditing events. Additionally, because Ambari uses its own startup scripts to start and stop the NameNode server, you must modify the Hadoop configuration script so that a NameNode restart also starts the Security Agent.

To configure HDFS properties and NameNode startup scripts:

  1. Update HDFS properties from the Ambari Web Interface as follows:

    1. On the Dashboard, click HDFS.

      The HDFS Service page displays.

    2. Go to the Configs tab.

    3. In Filter, type dfs.permissions.enabled and press Enter.

      The results display. This property is located under Advanced.

    4. Expand Advanced, then change dfs.permissions.enabled to true.

    5. In Filter, type hadoop.security.authorization and press Enter.

      Under the already expanded Advanced option, the parameter displays.

    6. Change hadoop.security.authorization to true.

    7. Scroll to the bottom of the page and click Save.

      At the top of the page, a message displays indicating the services that need to be restarted.

      [Warning]Warning

      Do not restart the services until after you perform the next step.

  2. Change the Hadoop configuration script to start the Security Agent with the NameNode service:

    1. In the Ambari Administrator Portal, click HDFS and then NameNode.

      The NameNode Hosts page displays.

    2. Click Host Actions and choose Turn on Maintenance Mode.

      Wait for the cluster to enter maintenance mode.

    3. SSH to the NameNode as the root user.

    4. Open the hadoop-config.sh script for editing and go to the end of the file. For example:

      vi /usr/lib/hadoop/libexec/hadoop-config.sh
    5. At the end of the file paste the following statement:

      if [ -f  ${HADOOP_CONF_DIR}/xasecure-hadoop-env.sh ]
      then
              .  ${HADOOP_CONF_DIR}/xasecure-hadoop-env.sh
      fi

      This adds the Security Agent for Hadoop to the start script for Hadoop.

    6. Save the changes.

  3. In the Ambari Administrative Portal, click Services > Service Actions > Restart All.

    Wait for the services to completely restart.

  4. Click Services > Service Actions > Turn off Maintenance Mode.

    It may take several minutes for the process to complete. After confirming that all the services restarted as expected, perform a few simple HDFS commands, such as browsing the file system from Hue.
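
To confirm that the restarted NameNode picked up the Security Agent, you can check the running NameNode process for the agent libraries. This is a sketch, assuming the agent's jars carry xasecure in their names, as in the file names used above:

    # A non-empty result suggests the xasecure agent is on the NameNode classpath.
    ps -ef | grep '[N]ameNode' | grep -o 'xasecure[^ :]*' | sort -u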

 1.1.2.5. Restart NameNode in Non-Ambari Environments

The HDFS Agent is integrated with the NameNode service. Before your changes can take effect, you must restart the NameNode service.

  1. On the NameNode host machine, execute the following command:

    su -l hdfs -c "/usr/lib/hadoop/sbin/hadoop-daemon.sh stop namenode"

    Ensure that the NameNode Service stops completely.

  2. On the NameNode host machine, execute the following command:

    su -l hdfs -c "/usr/lib/hadoop/sbin/hadoop-daemon.sh start namenode"

    Ensure that the NameNode Service starts correctly.
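
To confirm the service state after each command, list the Java processes as the hdfs user; NameNode should be absent after the stop and present after the start. A minimal sketch:

    # jps (shipped with the JDK) lists running Java processes by class name.
    su -l hdfs -c "jps | grep NameNode"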

 1.1.3. Test HDFS Configuration

After completing the setup of the HDFS Repository and agent, perform a few simple tests to ensure that the agent is auditing and reporting events to the HDP Security Administration Web UI. By default, the repository allows all access and has auditing enabled.

  1. Log into the Hadoop cluster.

  2. Type the following command to display a list of items at the root folder of HDFS:

    hadoop fs -ls /
    Found 6 items
    drwxrwxrwx   - yarn   hadoop          0 2014-04-21 07:21 /app-logs
    drwxr-xr-x   - hdfs   hdfs            0 2014-04-21 07:23 /apps
    drwxr-xr-x   - mapred hdfs            0 2014-04-21 07:16 /mapred
    drwxr-xr-x   - hdfs   hdfs            0 2014-04-21 07:16 /mr-history
    drwxrwxrwx   - hdfs   hdfs            0 2014-06-17 15:05 /tmp
    drwxr-xr-x   - hdfs   hdfs            0 2014-04-22 07:21 /user
  3. Sign in to the Web UI and click Audit.

    The Big Data page displays a list of events for the configured Repositories.

  4. Click Search > Repository Type > HDFS.

    The list filters as you make selections.
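
To generate additional audit events to watch in the Web UI, run a few more HDFS operations. A minimal sketch using a scratch path (/tmp/xa-test is only an example):

    # Each command below should surface as an HDFS event under Audit.
    hadoop fs -mkdir /tmp/xa-test
    hadoop fs -put /etc/hosts /tmp/xa-test/
    hadoop fs -cat /tmp/xa-test/hosts
    hadoop fs -rm -r /tmp/xa-test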

