Command Line Installation

Installing LLAP on a Secured Cluster

Prerequisites

  • Cluster is available and is already secured with Kerberos.

  • Slider and ZooKeeper are installed on the cluster.

  • The Hadoop directory is in the same location on each node so the native binary can be accessed, which supports secure I/O.

Important
  • You should have a method to execute commands and upload files on all nodes in the cluster from one "setup" or "jump" node. This can be set up with passwordless SSH (for example, with the pssh tool) or by using a FOR loop with a list of nodes.

  • On the "setup" node, add Java tools to the system path.

  • All passwords, user names, and paths in this section of the document are provided as examples unless otherwise noted. Change them for your deployment as appropriate.
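
The FOR-loop approach mentioned in the prerequisites can be sketched as a small shell helper. This is a minimal sketch, assuming a hypothetical node list file with one host name per line and passwordless SSH already configured from the "setup" node. The function name run_on_all and the DRY_RUN variable are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one command on every node listed in a file.
# Assumes passwordless SSH from the "setup" node to each listed host.
run_on_all() {
  local nodes_file="$1"; shift
  local node
  while IFS= read -r node; do
    # Skip blank lines and comment lines in the node list.
    case "$node" in ''|'#'*) continue ;; esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command instead of executing it.
      echo "ssh $node $*"
    else
      # -n keeps ssh from consuming the rest of the node list on stdin.
      ssh -n "$node" "$@"
    fi
  done < "$nodes_file"
}
```

For example, `DRY_RUN=1 run_on_all nodes.txt mkdir -p /etc/security/certs` prints one ssh command per node so that you can review the commands before running them.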

Installing LLAP on a Secured Cluster

Review the prerequisites for installing LLAP on a secured cluster before you begin.

  1. Ensure that user hive exists on each node, and configure the following:

    1. Create local directories that are similar to those set up for the yarn.nodemanager.local-dirs property:

      mkdir -p /grid/0/hadoop/llap/local 
      chown -R hive /grid/0/hadoop/llap
    2. On the "setup" node, ensure that user hive has access to its HDFS home directory:

      hadoop fs -mkdir -p /user/hive
      hadoop fs -chown -R hive /user/hive
  2. Set up keytabs.

    You can create the keytabs on the "setup" machine and distribute them to the cluster, or you can create them on each node. The following example shows how to perform this step on each node. Use kadmin.local if you are logged in as root; otherwise, use kadmin.

  3. On each node (specified by their fully qualified domain names), create the host and headless principals, and a keytab with each:

    kadmin.local -q 'addprinc -randkey hive@EXAMPLE.COM'
    kadmin.local -q "addprinc -randkey hive/<fqdn>@EXAMPLE.COM"
    kadmin.local -q 'cpw -pw hive hive'
    kadmin.local -q "xst -norandkey -k /etc/security/keytabs/hive.keytab hive/<fqdn>@EXAMPLE.COM"
    kadmin.local -q "xst -norandkey -k /etc/security/keytabs/hive.keytab hive@EXAMPLE.COM"
    chown hive /etc/security/keytabs/hive.keytab
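
When the same commands must be repeated on many nodes, it can help to generate them per host name and review them first. The following is a minimal sketch: print_keytab_cmds is a hypothetical helper that only prints text, it assumes that the headless hive principal already exists, and it appends a klist call to list the keytab entries afterward.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the per-node kadmin commands for review.
# It only prints text; nothing is sent to the KDC.
print_keytab_cmds() {
  local fqdn="$1"                  # for example, the output of: hostname -f
  local realm="${2:-EXAMPLE.COM}"  # example realm used in this section
  local keytab="/etc/security/keytabs/hive.keytab"
  cat <<EOF
kadmin.local -q "addprinc -randkey hive/${fqdn}@${realm}"
kadmin.local -q "xst -norandkey -k ${keytab} hive/${fqdn}@${realm}"
kadmin.local -q "xst -norandkey -k ${keytab} hive@${realm}"
chown hive ${keytab}
klist -kt ${keytab}
EOF
}
```

Calling `print_keytab_cmds "$(hostname -f)"` on a node prints the commands with that node's fully qualified domain name filled in.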
    
  4. On the "setup" node, create and install, as user hive, the headless keytab for Slider:

    kadmin.local -q "xst -norandkey -k hive.headless.keytab hive@EXAMPLE.COM"
    chown hive hive.headless.keytab
    kinit -kt /etc/security/keytabs/hive.keytab hive@EXAMPLE.COM
    slider install-keytab --keytab hive.headless.keytab --folder hive --overwrite
  5. If you want to use web UI SSL, set up the keystore for SSL.

    Note that a keystore is often already set up for other web UIs, such as HiveServer2. If the keystore is not already set up, perform the following steps:

    1. Create the Certificate Authority (CA).

      1. On the setup node, create the CA parameters:

        cat > /tmp/cainput << EOF
        US
        California
        Palo Alto
        Example Certificate Authority
        Certificate Authority
        example.com
        .
        EOF
        
      2. Create the CA:

        Note

        The JAVA_HOME must be set. The default Java truststore password must be changed.

        mkdir -p /etc/security/certs/
        openssl genrsa -out /etc/security/certs/ca.key 4096
        cat /tmp/cainput | openssl req -new -x509 -days 36525 -key /etc/security/certs/ca.key \
          -out /etc/security/certs/ca.crt
        echo 01 > /etc/security/certs/ca.srl
        echo 01 > /etc/security/certs/ca.ser
        keytool -importcert -noprompt -alias example-ca -keystore \
          $JAVA_HOME/jre/lib/security/cacerts -storepass changeit -file \
          /etc/security/certs/ca.crt
        rm /tmp/cainput
        
    2. Create the certificate.

      On the "setup" node, create the keystore parameters. In the following example, llap00 is the password specified for the new keystore:

      hostname -f > /tmp/keyinput
      hostname -d >> /tmp/keyinput
      cat >> /tmp/keyinput << EOF
      Example Corp
      Palo Alto
      CA
      US
      yes
      llap00
      llap00
      EOF
      
    3. Generate a keystore, a certificate request, and a certificate, and then import the certificate into the keystore:

      cat /tmp/keyinput | keytool -genkey -alias hive -keyalg RSA -keystore \
        /etc/security/certs/keystore.jks -keysize 4096 -validity 36525 -storepass llap00
      keytool -certreq -alias hive -keystore /etc/security/certs/keystore.jks \
        -storepass llap00 -file /etc/security/certs/server.csr
      openssl x509 -req -days 36525 -in /etc/security/certs/server.csr \
       -CA /etc/security/certs/ca.crt -CAkey /etc/security/certs/ca.key \
       -CAserial /etc/security/certs/ca.ser -out /etc/security/certs/server.crt
      keytool -import -alias hive -keystore /etc/security/certs/keystore.jks \
        -storepass llap00 -trustcacerts -file /etc/security/certs/server.crt
      chown hive:hadoop /etc/security/certs/keystore.jks /etc/security/certs/server.crt
      chmod 640 /etc/security/certs/keystore.jks /etc/security/certs/server.crt
      rm /tmp/keyinput
      
    4. Distribute the keystore and certificate to each node:

      1. On each node, create the directory:

        mkdir -p /etc/security/certs
        
      2. Upload the files from the "setup" node:

        scp … /etc/security/certs/* …@node:/etc/security/certs/
        
      3. Import the CA:

        chown hive:hadoop /etc/security/certs/*
        chmod 640 /etc/security/certs/keystore.jks /etc/security/certs/server.crt
        keytool -importcert -noprompt -alias example-ca -keystore \
          $JAVA_HOME/jre/lib/security/cacerts -storepass changeit -file \
          /etc/security/certs/ca.crt
        
    5. Configure LLAP and generate the package.

      Specify the following properties in the /etc/hive/conf/hive-site.xml file:

      Table 9.6. Properties to Set in hive-site.xml for Secured Clusters

      Property                              Value
      hive.llap.daemon.work.dirs            For example, /grid/0/hadoop/llap/local
      hive.llap.daemon.keytab.file          For example, /etc/security/keytabs/hive.keytab
      hive.llap.daemon.service.principal    For example, hive/_HOST@EXAMPLE.COM
      hive.llap.daemon.service.ssl          true
      hive.llap.zk.sm.principal             hive@EXAMPLE.COM
      hive.llap.zk.sm.keytab.file           /etc/security/keytabs/hive.keytab
      hive.llap.zk.sm.connectionString      ZooKeeper connection string: for example, <machine:port,machine:port, ...>
      hadoop.security.authentication        kerberos
      hadoop.security.authorization         true

      Following is an example of these properties set in the llap-daemon-site.xml file:

      <property>
           <name>hive.llap.daemon.work.dirs</name>
           <value>/grid/0/hadoop/llap/local</value>
      </property>
      
      <property>
           <name>hive.llap.daemon.keytab.file</name>
           <value>/etc/security/keytabs/hive.keytab</value>
      </property>
      
      <property>
           <name>hive.llap.daemon.service.principal</name>
           <value>hive/_HOST@EXAMPLE.COM</value>
      </property>
      
      <property>
           <name>hive.llap.daemon.service.ssl</name>
           <value>true</value>
      </property>
      
      <property>
           <name>hive.llap.zk.sm.principal</name>
           <value>hive@EXAMPLE.COM</value>
      </property>
      
      <property>
           <name>hive.llap.zk.sm.keytab.file</name>
           <value>/etc/security/keytabs/hive.keytab</value>
      </property>
      
      <property>
           <name>hive.llap.zk.sm.connectionString</name>
           <value>127.0.0.1:2181,128.0.0.1:2181,129.0.0.1:2181</value>
      </property>
      
      <property>
           <name>hadoop.security.authentication</name>
           <value>kerberos</value>
      </property>
      
      <property>
           <name>hadoop.security.authorization</name>
           <value>true</value>
      </property>
      

      Optionally, you can also use hive.llap.daemon.acl and hive.llap.management.acl to restrict access to the LLAP daemon protocols.

      The Hive user must have access to both.

    1. Specify the following properties in the ssl-server.xml file.

      Ensure that you perform this step before you create the LLAP package.

      Table 9.7. Properties to Set in ssl-server.xml for LLAP on Secured Clusters

      Property                           Value
      ssl.server.truststore.location     Path to Java truststore: for example, /jre/lib/security/cacerts
      ssl.server.keystore.location       /etc/security/certs/keystore.jks
      ssl.server.truststore.password     changeit (Note: This is the default password.)
      ssl.server.keystore.password       llap00
      ssl.server.keystore.keypassword    llap00

      Following is an example of these properties set in the ssl-server.xml file:

      <property>
           <name>ssl.server.truststore.location</name>
           <value>/jre/lib/security/cacerts</value>
      </property>
      
      <property>
           <name>ssl.server.keystore.location</name>
           <value>/etc/security/certs/keystore.jks</value>
      </property>
      
      <property>
           <name>ssl.server.truststore.password</name>
           <value>strong_password</value>
      </property>
      
      <property>
           <name>ssl.server.keystore.password</name>
           <value>llap00</value>
      </property>
      
      <property>
           <name>ssl.server.keystore.keypassword</name>
           <value>llap00</value>
      </property>
      

      Generate the LLAP package.

      Important

      Ensure that JAVA_HOME and HADOOP_HOME are set before generating the package. The Java home and the site.global.library_path property in the appConfig.json configuration file are set from these two environment variables. If you see problems such as a missing native library, check the appConfig.json configuration file.

      Make sure that LLAP package generation is done under user hive because some HDFS paths in the configuration are user-specific. You can modify the paths after package generation.

    2. To generate the LLAP package, run the following command, setting parameters as described in the LLAP Package Parameters table:

      hive --service llap --name <llap_svc_name> --instances <number_of_cluster_nodes> \
        --cache <cache_size>m --xmx <heap_size>m --size ((<cache_size>+<heap_size>)*1.05)m \
        --executors <number_of_cores> --loglevel <WARN|INFO> \
        --args " -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA -XX:-ResizePLAB"
      

      Table 9.8. LLAP Package Parameters

      Parameter

      Recommended Value (based on daemons using all available node resources)

      --instances <number_of_cluster_nodes>

      Set to the number of cluster nodes that you want to use for LLAP.

      --cache <cache_size>

      <YARN_maximum_container_size> - (<hive.tez.container.size> * <number_of_cores>)

      <hive.tez.container.size> is the setting for this property found in the hive-site.xml file. Depending on the size of the node, specify a minimum of 1-4 GB.

      --xmx <heap_size>

      For medium-sized nodes:

      <hive.tez.container.size> * <number_of_cores> * (0.8 to 0.95)

      Where <hive.tez.container.size> is the setting for this property found in the hive-site.xml file.

      Ensure that the setting for --xmx is 1 GB less than (<hive.tez.container.size> * <number_of_cores>).

      For smaller nodes:

      Use the same formula as for medium-sized nodes, but multiply by 0.8.

      --executors <number_of_cores>

      Set to the number of CPU cores available on nodes running NodeManager. Set this value even if CPU scheduling is enabled in YARN.

      Set the --loglevel parameter to INFO when you are troubleshooting or testing. The INFO option provides verbose output. In a production environment, set the --loglevel parameter to WARN, which only outputs a message to the logs if there is a warning or error. This makes the logs easier to read and reduces load on the node.

      Note

      The recommended values listed in the LLAP Package Parameters table represent a sample configuration. LLAP also can be configured to use a fraction of node resources.
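
The formulas in the LLAP Package Parameters table can be worked through with shell integer arithmetic. The following is a sketch under stated assumptions: all input values are hypothetical, sizes are in MB, the function name llap_sizing is illustrative, and the 1 GB rule for --xmx is interpreted as "at least 1 GB below <hive.tez.container.size> * <number_of_cores>".

```shell
#!/usr/bin/env bash
# Worked example of the sizing formulas, using integer arithmetic in MB.
# All input values are hypothetical; substitute the values for your nodes.
llap_sizing() {
  local yarn_max_mb="$1"   # YARN maximum container size
  local tez_mb="$2"        # hive.tez.container.size
  local cores="$3"         # number of CPU cores per node
  local pct="${4:-90}"     # heap fraction in percent (80 to 95)

  local exec_mb=$(( tez_mb * cores ))
  local cache_mb=$(( yarn_max_mb - exec_mb ))
  local xmx_mb=$(( exec_mb * pct / 100 ))
  # Assumption: keep the heap at least 1 GB below tez_mb * cores.
  if [ "$xmx_mb" -gt $(( exec_mb - 1024 )) ]; then
    xmx_mb=$(( exec_mb - 1024 ))
  fi
  # --size is (cache + heap) * 1.05, rounded down to a whole MB.
  local size_mb=$(( (cache_mb + xmx_mb) * 105 / 100 ))
  echo "--cache ${cache_mb}m --xmx ${xmx_mb}m --size ${size_mb}m --executors ${cores}"
}

llap_sizing 65536 4096 8
```

With these hypothetical inputs (a 64 GB maximum container, a 4 GB Tez container, and 8 cores), the function prints `--cache 32768m --xmx 29491m --size 65371m --executors 8`.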

    3. Specify the keytab settings in the slider-appmaster section of the appConfig.json configuration file if they have not already been specified:

      "components": {
          "slider-appmaster": {
              … existing settings …
              "slider.hdfs.keytab.dir": ".slider/keytabs/llap",
              "slider.am.login.keytab.name": "hive.headless.keytab",
              "slider.keytab.principal.name": "hive@EXAMPLE.COM"

Validating the Installation on a Secured Cluster

  1. Make sure that you are logged in as user hive.

  2. Verify that the following properties are set as follows in the hive-site.xml file that is being used by HiveServer2:

    • hive.execution.mode = llap

    • hive.llap.execution.mode = all

    • hive.llap.daemon.service.hosts = @<llap_service_name>
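
One way to check these settings without opening the file in an editor is to extract each value from the command line. This is a minimal sketch, assuming that each <name> and <value> element sits on its own line, as in the examples in this section. The helper name get_prop and the file path in the usage comment are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop-style XML file. Assumes <name> and <value> are on separate lines.
get_prop() {
  local file="$1" name="$2"
  awk -v n="$name" '
    index($0, "<name>" n "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, "")
      print; exit
    }
  ' "$file"
}

# Example usage (the path is an assumption; use the file that HiveServer2 reads):
# get_prop /etc/hive/conf/hive-site.xml hive.execution.mode
```

If a property is missing from the file, the function prints nothing, so an empty result indicates that the property still needs to be set.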

  3. From the hive user home directory, start the LLAP service:

    cd ~
    ./llap-slider-<date>/run.sh

    <date> is the date that you generated the LLAP package. To verify that you have the correct <date>, on the node where you generated the LLAP package, make sure you are in the hive user home directory and view the subdirectories:

    cd ~
    ls

    There is a subdirectory named llap-slider-<date>. This subdirectory contains the run.sh script you use to start the LLAP service.

  4. As user hive, run the Hive CLI and HiveServer2 to run test queries.

    If you are using the Hive CLI, you must first run kinit to obtain a Kerberos ticket.

  5. After running test queries, check the following:

    • Check the logs on YARN for the Slider application that is running LLAP.

      Look for changes that indicate that LLAP is processing the test queries.

    • Using the ResourceManager UI, monitor the Tez AM (session) to make sure that it does not launch new containers to run the query.