3. Enable SSL for HTTP Connections

HTTP traffic can be encrypted by enabling SSL across your Hadoop cluster.

 3.1. Set up WebHDFS/YARN with SSL (HTTPS)

This section explains how to set up HTTPS encryption for the Web interfaces.

 3.1.1. Install an SSL Certificate

You can use either a certificate from a Certificate Authority or a Self-Signed Certificate. Using a self-signed certificate requires some additional configuration on each host. Follow the instructions in the appropriate section to install a certificate.

 3.1.1.1. Use a Self-Signed Certificate

To set up SSL for Hadoop HDFS operations:

  1. Create HTTPS certificates and keystore/truststore files.

    1. For each host in the cluster, create a directory for storing the keystore and truststore, for example, SERVER_KEY_LOCATION. Also create a directory for storing the public certificates, for example, CLIENT_KEY_LOCATION.

      mkdir -p $SERVER_KEY_LOCATION ; mkdir -p $CLIENT_KEY_LOCATION

      For example:

      ssh host1.hwx.com "mkdir -p /etc/security/serverKeys ; mkdir -p /etc/security/clientKeys"

    2. For each host, create a keystore file.

      cd $SERVER_KEY_LOCATION ; keytool -genkey -alias $hostname -keyalg RSA -keysize 1024 -dname "CN=$hostname,OU=hw,O=hw,L=paloalto,ST=ca,C=us" -keypass $SERVER_KEYPASS_PASSWORD -keystore $KEYSTORE_FILE -storepass $SERVER_STOREPASS_PASSWORD
    3. For each host, export the certificate public key to a certificate file.

      cd $SERVER_KEY_LOCATION ; keytool -export -alias $hostname -keystore $KEYSTORE_FILE -rfc -file $CERTIFICATE_NAME -storepass $SERVER_STOREPASS_PASSWORD
    4. For each host, import the certificate into the truststore file.

      cd $SERVER_KEY_LOCATION ; keytool -import -noprompt -alias $hostname -file $CERTIFICATE_NAME -keystore $TRUSTSTORE_FILE -storepass $SERVER_TRUSTSTORE_PASSWORD
    5. Create a single truststore file containing the public keys from all certificates. Log in to host1 and import the certificate for host1.

      keytool -import -noprompt -alias $host -file $CERTIFICATE_NAME -keystore $ALL_JKS -storepass $CLIENT_TRUSTSTORE_PASSWORD
    6. Copy $ALL_JKS from host1 to other hosts, and repeat the above command. For example, for a 2-node cluster with host1 and host2:

      1. Create $ALL_JKS on host1.

        keytool -import -noprompt -alias $host -file $CERTIFICATE_NAME -keystore $ALL_JKS -storepass $CLIENT_TRUSTSTORE_PASSWORD
      2. Copy over $ALL_JKS from host1 to host2. $ALL_JKS already has the certificate entry of host1.

      3. Import certificate entry of host2 to $ALL_JKS using same command as before:

        keytool -import -noprompt -alias $host -file $CERTIFICATE_NAME -keystore $ALL_JKS -storepass $CLIENT_TRUSTSTORE_PASSWORD
      4. Copy over the updated $ALL_JKS from host2 to host1.

        Note

        Repeat these steps for each node in the cluster. When you are finished, the $ALL_JKS file on host1 will have the certificates of all nodes.

      5. Copy over the $ALL_JKS file from host1 to all the nodes.

    7. Validate the common truststore file on all hosts.

      keytool -list -v -keystore $ALL_JKS -storepass $CLIENT_TRUSTSTORE_PASSWORD
    8. Set permissions and ownership on the keys:

      chown -R $YARN_USER:hadoop $SERVER_KEY_LOCATION
      chown -R $YARN_USER:hadoop $CLIENT_KEY_LOCATION
      chmod 755 $SERVER_KEY_LOCATION
      chmod 755 $CLIENT_KEY_LOCATION
      chmod 440 $KEYSTORE_FILE
      chmod 440 $TRUSTSTORE_FILE
      chmod 440 $CERTIFICATE_NAME
      chmod 444 $ALL_JKS
      Note

      The complete path of $SERVER_KEY_LOCATION and $CLIENT_KEY_LOCATION, starting from the root directory /etc, must be owned by the $YARN_USER user and the hadoop group.
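
As an alternative to passing $ALL_JKS from host to host, the exported certificate files could first be gathered on one host and imported in a loop. The following is a minimal dry-run sketch under that assumption: it only prints the keytool commands so they can be reviewed before anything is run. The function name print_import_commands, the certificate directory, and the <host>.crt file naming are hypothetical and not part of the procedure above.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one "keytool -import" command per host.
# Nothing is executed; the printed commands are meant to be reviewed first.
# Assumption: each host's exported certificate was copied to
# <cert_dir>/<host>.crt on the machine where the truststore is built.

print_import_commands() {
    local all_jks="$1"    # path of the common truststore ($ALL_JKS)
    local cert_dir="$2"   # directory holding the gathered certificates
    shift 2
    local host
    for host in "$@"; do
        # The host name doubles as the alias, as in the steps above.
        # The password variable is printed literally, not expanded.
        printf 'keytool -import -noprompt -alias %s -file %s/%s.crt -keystore %s -storepass "$CLIENT_TRUSTSTORE_PASSWORD"\n' \
            "$host" "$cert_dir" "$host" "$all_jks"
    done
}

# Example with the two hosts used in this section.
print_import_commands /etc/security/clientKeys/all.jks /tmp/certs host1.hwx.com host2.hwx.com
```

After running the reviewed commands, the resulting $ALL_JKS still has to be copied to every node and validated with keytool -list, as described in the steps above.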

 3.1.1.2. Use a CA Signed Certificate
  1. Run the following commands to create a self-signed root CA and import the root CA into the client truststore:

    openssl genrsa -out $clusterCA.key 2048
    openssl req -x509 -new -key $clusterCA.key -days 300 -out $clusterCA.pem
    keytool -importcert -alias $clusterCA -file $clusterCA.pem -keystore $clustertruststore -storepass $clustertruststorekey
    Note

    Ensure that ssl-client.xml on every host is configured to use this $clustertruststore truststore.

  2. On each host, run the following command to create a certificate and a keystore for each server:

    keytool -genkeypair -alias `hostname -s` -keyalg RSA -keysize 1024 -dname "CN=`hostname -f`,OU=foo,O=corp" -keypass $hostkey -keystore $hostkeystore -storepass $hoststorekey -validity 300
  3. On each host, run the following command to export a certreq file from the host’s keystore:

    keytool -keystore $hostkeystore -alias `hostname -s` -certreq -file $host.cert -storepass $hoststorekey -keypass $hostkey
  4. On each host, sign the certreq file with the root CA:

    openssl x509 -req -CA $clusterCA.pem -CAkey $clusterCA.key -in $host.cert -out $host.signed -days 300 -CAcreateserial
  5. On each host, import both rootCA and the signed cert back in:

    keytool -keystore $hostkeystore -storepass $hoststorekey -alias $clusterCA -import -file $clusterCA.pem
    keytool -keystore $hostkeystore -storepass $hoststorekey -alias `hostname -s` -import -file $host.signed -keypass $hostkey

 3.1.2. Set Hadoop Properties to Enable HTTPS

To enable WebHDFS to listen for HTTP over SSL, configure SSL on the NameNode and all DataNodes by setting dfs.https.enable=true in the hdfs-site.xml file.

You can set up SSL in the following modes:

  • One-way SSL: Authenticates the server only. This mode requires only the keystore on the NameNode and each DataNode, as specified in the table below. The parameters are set in the ssl-server.xml file on the NameNode and each of the DataNodes.

  • Mutual authentication (2WAY SSL): Requires authentication of both the server and the client. To use mutual SSL, you must also set dfs.client.https.need-auth=true in the hdfs-site.xml file on the NameNode and each DataNode. 2WAY SSL can cause performance delays and is difficult to set up and maintain.

The truststore configuration is only needed when using a self-signed certificate or a certificate that is not in the JVM's truststore.

The following configuration properties need to be specified in ssl-server.xml and ssl-client.xml.

Table 5.1. Configuration Properties in ssl-server.xml

Property                         Default Value   Description
ssl.server.keystore.type         JKS             The type of the keystore. JKS (Java KeyStore) is the de facto standard in Java.
ssl.server.keystore.location     None            The location of the keystore file.
ssl.server.keystore.password     None            The password to open the keystore file.
ssl.server.truststore.type       JKS             The type of the truststore.
ssl.server.truststore.location   None            The location of the truststore file.
ssl.server.truststore.password   None            The password to open the truststore.

The following diagram shows an HTTP or REST client's interaction with the NameNode and the DataNodes over HTTPS.

Enable HTTPS by setting the following properties.

  1. Set the following properties in core-site.xml.

    hadoop.ssl.require.client.cert=false
    hadoop.ssl.hostname.verifier=DEFAULT
    hadoop.ssl.keystores.factory.class=org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory
    hadoop.ssl.server.conf=ssl-server.xml
    hadoop.ssl.client.conf=ssl-client.xml
  2. Set the following properties in ssl-server.xml.

    ssl.server.truststore.location=/etc/security/serverKeys/truststore.jks
    ssl.server.truststore.password=serverTrustStorePassword
    ssl.server.truststore.type=jks
    ssl.server.keystore.location=/etc/security/serverKeys/keystore.jks
    ssl.server.keystore.password=serverStorePassPassword
    ssl.server.keystore.type=jks
    ssl.server.keystore.keypassword=serverKeyPassPassword
  3. Set the following properties in ssl-client.xml.

    ssl.client.truststore.location=/etc/security/clientKeys/all.jks
    ssl.client.truststore.password=clientTrustStorePassword
    ssl.client.truststore.type=jks
  4. Set the following properties in hdfs-site.xml.

    dfs.http.policy=$Policy
    dfs.datanode.https.address=$DataNode-host:50475
    dfs.namenode.https-address=$NAMENODE-host:50470

    where $Policy is either:

    • HTTP_ONLY: Service is provided only on HTTP

    • HTTPS_ONLY: Service is provided only on HTTPS

    • HTTP_AND_HTTPS: Service is provided both on HTTP and HTTPS

  5. Set the following properties in mapred-site.xml:

    mapreduce.jobhistory.http.policy=HTTPS_ONLY
    mapreduce.jobhistory.webapp.https.address=<JHS>:<JHS_HTTPS_PORT> 
  6. Set the following properties in yarn-site.xml:

    yarn.http.policy=HTTPS_ONLY
    yarn.log.server.url=https://<JHS>:<JHS_HTTPS_PORT>/jobhistory/logs
    yarn.resourcemanager.webapp.https.address=<RM>:<RM_HTTPS_PORT> 
    yarn.nodemanager.webapp.https.address=0.0.0.0:<NM_HTTPS_PORT>
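
The steps above list properties in name=value shorthand, whereas the Hadoop *-site.xml and ssl-*.xml files store each setting as a <property> element inside the <configuration> root element. The following minimal sketch converts the shorthand into that form. The function name to_hadoop_properties is hypothetical, and the sketch assumes that values contain no characters that need XML escaping (such as & or <).

```shell
#!/usr/bin/env bash
# Sketch: turn name=value lines read from stdin into Hadoop <property> elements.
# Assumption: values need no XML escaping.

to_hadoop_properties() {
    local name value
    # Split on the first "=" only; the rest of the line becomes the value.
    while IFS='=' read -r name value; do
        [ -z "$name" ] && continue   # skip blank lines
        printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' \
            "$name" "$value"
    done
}

# Example: the ssl-client.xml settings from step 3 (password omitted here).
to_hadoop_properties <<'EOF'
ssl.client.truststore.location=/etc/security/clientKeys/all.jks
ssl.client.truststore.type=jks
EOF
```

The generated elements are meant to be pasted between the <configuration> and </configuration> tags of the file named in each step.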

 3.2. Configure Encryption during a Shuffle

Data securely loaded into HDFS is processed by Mappers and Reducers to derive meaningful business intelligence. Hadoop code moves data between Mappers and Reducers over the HTTP protocol in a step called the shuffle. In SSL parlance, the Reducer is the SSL client that initiates the connection to the Mapper to ask for data. Enabling HTTPS for encrypting shuffle traffic involves the following steps.

  • Enable Encrypted Shuffle by setting the following properties in mapred-site.xml:

    	
    <property>
        <name>hadoop.ssl.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>hadoop.ssl.require.client.cert</name>
        <value>false</value>
        <final>true</final>
    </property>
    <property>
        <name>hadoop.ssl.hostname.verifier</name>
        <value>DEFAULT</value>
        <final>true</final>
    </property>
    <property>
        <name>hadoop.ssl.keystores.factory.class</name>
        <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
        <final>true</final>
    </property>
    <property>
        <name>hadoop.ssl.server.conf</name>
        <value>ssl-server.xml</value>
        <final>true</final>
    </property>
    <property>
        <name>hadoop.ssl.client.conf</name>
        <value>ssl-client.xml</value>
        <final>true</final>
    </property>

    (The default buffer size is 65536.)

 3.3. Set up Oozie with SSL (HTTPS)

The default SSL configuration makes all Oozie URLs use HTTPS except for the JobTracker callback URLs. This simplifies the configuration because no changes are required outside of Oozie. Oozie inherently does not trust the callbacks; they are used only as hints.

Note

The related environment variables are explained in Environment Setup.

 3.3.1. Install the SSL Certificate

You can use either a certificate from a Certificate Authority or a Self-Signed Certificate. Using a self-signed certificate requires some additional configuration on each Oozie client machine.

Follow the instructions in the appropriate section to install a certificate.

 3.3.1.1. Create a Self-Signed Certificate

There are many ways to create a self-signed certificate; this is just one. This procedure uses the keytool program, which is included with your JRE. If it is not on your path, you should be able to find it in $JAVA_HOME/bin.

  1. As the Oozie user, run the following command:

    keytool -genkey -alias tomcat -keyalg RSA
    
  2. At the interactive prompt, answer the questions as follows:

    • For What is your first and last name? (i.e. "CN") enter the hostname of the machine where the Oozie Server is running.

    • For the keystore password and key password, enter the same password; by default, the Oozie keystore password is set to password.

    A keystore file is created named .keystore and is located in the Oozie user's home directory.

  3. Change the OOZIE_HTTPS_KEYSTORE_PASS environment variable to match the keystore password of the self-signed certificate.

  4. As the Oozie user, run the following command to export a certificate file from the keystore file:

    keytool -exportcert -alias tomcat -file path/to/where/I/want/my/certificate.cert
 3.3.1.2. Use a Certificate from a Certificate Authority

Consult a Certificate Authority to obtain a certificate. Use the certificate from the CA to set up Oozie with HTTPS.

As the Oozie user, run the following command to create a keystore file from your certificate:

keytool -import -alias tomcat -file path/to/certificate.cert

A keystore file is created named .keystore and is located in the Oozie user's home directory.

 3.3.2. Set up Oozie Server Secure Mode

This section describes how to configure Oozie to use HTTPS instead of HTTP.

  1. If Oozie server is running, stop Oozie.

  2. As the Oozie user, run the following command to configure Oozie to use HTTPS:

    oozie-setup.sh prepare-war -secure
  3. Start the Oozie server.

Note

To revert to HTTP, run the following command as the Oozie user:

oozie-setup.sh prepare-war

 3.3.3. Connect to Oozie Server using SSL (HTTPS)

On every Oozie client system, follow the instructions for the type of certificate used in your environment.

 3.3.3.1. Use a Self-Signed Certificate from Oozie Clients

When using a self-signed certificate, you must install the certificate on the client before the Oozie client can connect to the server.

  1. Install the certificate in the keychain:

    1. Copy or download the .cert file onto the client machine.

    2. Run the following command (as root) to import the certificate into the JRE's keystore:

      sudo keytool -import -alias tomcat -file path/to/certificate.cert -keystore $JRE_cacerts

      Where $JRE_cacerts is the path to the JRE's certs file. Its location may differ depending on the operating system, but it is typically called cacerts and located at $JAVA_HOME/lib/security/cacerts. It can be under a different directory in $JAVA_HOME. The default password is changeit.

      Java programs, including the Oozie client, can now connect to the Oozie Server using the self-signed certificate.

  2. In the connection strings, change HTTP to HTTPS. For example, replace http://oozie.server.hostname:11000/oozie with https://oozie.server.hostname:11443/oozie.

    Java does not automatically redirect HTTP addresses to HTTPS.

 3.3.3.2. Use a CA Certificate from Oozie Clients

In the connection strings, change HTTP to HTTPS. For example, replace http://oozie.server.hostname:11000/oozie with https://oozie.server.hostname:11443/oozie.

Java does not automatically redirect HTTP addresses to HTTPS.
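
Because the client is not redirected, every stored connection string has to be changed by hand or by script. The following minimal sketch performs the rewrite described above with sed. The function name to_oozie_https is hypothetical, and the ports 11000 and 11443 are taken from the example URLs in this section; adjust them if your Oozie server uses other ports.

```shell
#!/usr/bin/env bash
# Sketch: rewrite an Oozie HTTP URL to its HTTPS form.
# Assumption: HTTP port 11000 and HTTPS port 11443, as in the examples above.

to_oozie_https() {
    printf '%s\n' "$1" | sed \
        -e 's|^http://|https://|' \
        -e 's|:11000/|:11443/|' \
        -e 's|:11000$|:11443|'
}

# Prints the HTTPS form of the example URL from this section.
to_oozie_https http://oozie.server.hostname:11000/oozie
```

The result can then be passed to the Oozie command-line client with its -oozie option or exported in the OOZIE_URL environment variable.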

 3.3.4. Connect to Oozie Server using a Browser

Use https://oozie.server.hostname:11443/oozie. Most browsers should redirect you automatically if you use http://oozie.server.hostname:11000/oozie.

When using a Self-Signed Certificate, your browser warns you that it can't verify the certificate. Add the certificate as an exception.

 3.3.5. Configure Oozie HCatalogJob Properties

Integrate Oozie with HCatalog by adding the following property to the oozie-hcatalog job.properties file. For example, if you are using Ambari, set the property as:

hadoop.rpc.protection=privacy

Note

This property is in addition to any properties you must set for secure clusters.

 3.4. Set up HBase REST API with SSL

Perform the following tasks to enable SSL with the HBase REST API.

  1. Execute the following statement from the command line of the HBase Master server to create a keystore for HBase:

    keytool -genkey -alias hbase -keyalg RSA -keysize 1024 -keystore hbase.jks

  2. Add the following properties to the hbase-site.xml configuration file on each node in your HBase cluster:

    <property>
        <name>hbase.rest.ssl.enabled</name>
        <value>true</value>
    </property>
    
    <property>
        <name>hbase.rest.ssl.keystore.store</name>
        <value>/path/to/keystore</value>
    </property>
    
    <property>
        <name>hbase.rest.ssl.keystore.password</name>
        <value>keystore password</value>
    </property>
    
    <property>
        <name>hbase.rest.ssl.keystore.keypassword</name>
        <value>key password</value>
    </property>

  3. Restart all HBase nodes in the cluster.

Note

When using a self-signed certificate, administrators must manually add the certificate to the JVM truststore on all HBase clients.

 3.5. Set up WebHBase with SSL

  1. On the HBase Master, create a keystore for HBase:

    keytool -genkey -alias hbase -keyalg RSA -keysize 1024 -keystore hbase.jks
  2. Add the following properties to hbase-site.xml:

    <property>
       <name>hadoop.ssl.enabled</name>
       <value>true</value>
    </property>
    
    <property>
       <name>hbase.rest.ssl.enabled</name>
       <value>true</value>
    </property>
    
    <property>
       <name>hbase.rest.ssl.keystore.store</name>
       <value>/path/to/hbase.jks</value>
    </property>
    
    <property>
       <name>hbase.rest.ssl.keystore.password</name>
       <value>keystore password</value>
    </property>
    
    <property>
       <name>hbase.rest.ssl.keystore.keypassword</name>
       <value>key password</value>
    </property>
  3. Restart HBase.

    When using a self-signed certificate, you must add the certificate to the JVM truststore on the HBase clients.

 3.6. Set up HiveServer2 with SSL

When using HiveServer2 without Kerberos authentication, you can set up HTTP and JDBC to use an SSL certificate to secure communications.

Perform the following steps on the HiveServer2 host:

  1. Run the following command to create a keystore for HiveServer2:

    keytool -genkey -alias hbase -keyalg RSA -keysize 1024 -keystore hbase.jks
  2. Edit the hive-site.xml, set the following properties to enable SSL:

    <property>
      <name>hive.server2.use.SSL</name>
      <value>true</value>
      <description></description>
    </property>
     
    <property>
      <name>hive.server2.keystore.path</name>
      <value>$keystore-file-path</value>
      <description></description>
    </property>
    
    <property>
      <name>hive.server2.keystore.password</name>
      <value>$keystore-file-password</value>
      <description></description>
    </property>
  3. On the client side, specify the SSL settings for the Beeline or JDBC client as follows:

    jdbc:hive2://$host:$port/$database;ssl=true;sslTrustStore=$path-to-truststore;sslTrustStorePassword=$password 
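
The connection string contains semicolons, which a shell treats as command separators, so it must be quoted when passed on a command line. The following minimal sketch assembles the URL from its parts. The function name hive2_ssl_url and all example values (host name, port 10000, database, truststore path, and password) are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: build the HiveServer2 JDBC URL with the SSL settings shown above.
# All argument values in the example call are placeholders.

hive2_ssl_url() {
    local host="$1" port="$2" database="$3" truststore="$4" password="$5"
    printf 'jdbc:hive2://%s:%s/%s;ssl=true;sslTrustStore=%s;sslTrustStorePassword=%s\n' \
        "$host" "$port" "$database" "$truststore" "$password"
}

# Example call; prints the assembled URL.
hive2_ssl_url hs2.example.com 10000 default /etc/security/clientKeys/all.jks changeit
```

With Beeline, the URL could then be supplied as beeline -u "$(hive2_ssl_url ...)", keeping the double quotes so that the shell does not split the string at the semicolons.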

