2.3. Adding Security Information to Configuration Files

To enable security on HDP 2, you must add security information to various configuration files.

Before you begin, set JSVC_HOME in hadoop-env.sh:

  • For RHEL/CentOS/Oracle Linux:

    export JSVC_HOME=/usr/libexec/bigtop-utils

  • For SLES and Ubuntu:

    export JSVC_HOME=/usr/lib/bigtop-utils
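
You can quickly confirm the path before continuing. This check assumes the bigtop-utils package places the jsvc binary directly under JSVC_HOME, which can vary by package version:

# Sanity check: the jsvc wrapper should exist where JSVC_HOME points.
ls -l $JSVC_HOME/jsvc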

 2.3.1. core-site.xml

To the core-site.xml file on every host in your cluster, you must add the following information:

 

Table 18.3. core-site.xml

hadoop.security.authentication
    Value: kerberos
    Description: Set the authentication type for the cluster. Valid values are: simple or kerberos.

hadoop.rpc.protection
    Value: authentication; integrity; privacy
    Description: This is an [OPTIONAL] setting. If not set, defaults to authentication.
        authentication = authentication only; the client and server mutually authenticate during connection setup.
        integrity = authentication and integrity; guarantees the integrity of data exchanged between client and server as well as authentication.
        privacy = authentication, integrity, and confidentiality; guarantees that data exchanged between client and server is encrypted and is not readable by a "man in the middle".

hadoop.security.authorization
    Value: true
    Description: Enable authorization for different protocols.

hadoop.security.auth_to_local
    Value: the mapping rules. For example:
        RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/
        RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/
        RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/
        RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/
        DEFAULT
    Description: The mapping from Kerberos principal names to local OS user names. See Creating Mappings Between Principals and UNIX Usernames for more information.


The XML for these entries:

<property>
        <name>hadoop.security.authentication</name>
        <value>kerberos</value>
        <description>Set the authentication type for the cluster.
        Valid values are: simple or kerberos.
        </description>
</property>

<property>
        <name>hadoop.security.authorization</name>
        <value>true</value>
        <description>Enable authorization for different protocols.
        </description>
</property>

<property>
        <name>hadoop.security.auth_to_local</name>
        <value>
        RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/
        RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/
        RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/
        RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/
        DEFAULT</value>
        <description>The mapping from Kerberos principal names
        to local OS user names.</description>
</property>
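
Before restarting any services, you can check how a given principal will be mapped by the rules above. Hadoop ships a small utility class that applies the auth_to_local rules from the local core-site.xml; the principal below is only an example:

hadoop org.apache.hadoop.security.HadoopKerberosName nn/host1.example.com@EXAMPLE.COM

If the rules are working, the output shows the mapping, for example: Name: nn/host1.example.com@EXAMPLE.COM to hdfs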

 2.3.2. hdfs-site.xml

To the hdfs-site.xml file on every host in your cluster, you must add the following information:

 

Table 18.4. hdfs-site.xml

dfs.permissions.enabled
    Value: true
    Description: If true, permission checking in HDFS is enabled. If false, permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.

dfs.permissions.supergroup
    Value: hdfs
    Description: The name of the group of super-users.

dfs.block.access.token.enable
    Value: true
    Description: If true, access tokens are used as capabilities for accessing DataNodes. If false, no access tokens are checked on accessing DataNodes.

dfs.namenode.kerberos.principal
    Value: nn/_HOST@EXAMPLE.COM
    Description: Kerberos principal name for the NameNode.

dfs.secondary.namenode.kerberos.principal
    Value: nn/_HOST@EXAMPLE.COM
    Description: Kerberos principal name for the secondary NameNode.

dfs.web.authentication.kerberos.principal
    Value: HTTP/_HOST@EXAMPLE.COM
    Description: The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. The HTTP Kerberos principal MUST start with 'HTTP/' per the Kerberos HTTP SPNEGO specification.

dfs.web.authentication.kerberos.keytab
    Value: /etc/security/keytabs/spnego.service.keytab
    Description: The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.

dfs.datanode.kerberos.principal
    Value: dn/_HOST@EXAMPLE.COM
    Description: The Kerberos principal that the DataNode runs as. "_HOST" is replaced by the real host name.

dfs.namenode.keytab.file
    Value: /etc/security/keytabs/nn.service.keytab
    Description: Combined keytab file containing the NameNode service and host principals.

dfs.secondary.namenode.keytab.file
    Value: /etc/security/keytabs/nn.service.keytab
    Description: Combined keytab file containing the NameNode service and host principals.

dfs.datanode.keytab.file
    Value: /etc/security/keytabs/dn.service.keytab
    Description: The filename of the keytab file for the DataNode.

dfs.https.port
    Value: 50470
    Description: The HTTPS port to which the NameNode binds.

dfs.namenode.https-address
    Value (example): ip-10-111-59-170.ec2.internal:50470
    Description: The HTTPS address to which the NameNode binds.

dfs.datanode.data.dir.perm
    Value: 750
    Description: The permissions that must be set on the dfs.data.dir directories. The DataNode will not come up if all existing dfs.data.dir directories do not have this setting. If the directories do not exist, they will be created with this permission.

dfs.cluster.administrators
    Value: hdfs
    Description: ACL for who can view the default servlets in HDFS.

dfs.namenode.kerberos.internal.spnego.principal
    Value: ${dfs.web.authentication.kerberos.principal}

dfs.secondary.namenode.kerberos.internal.spnego.principal
    Value: ${dfs.web.authentication.kerberos.principal}
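
Before restarting HDFS, it is worth confirming that each keytab referenced above exists and contains the expected principals. A quick check with the MIT Kerberos klist tool, using the keytab paths from the table:

# Each listing should include the service principal for this host,
# e.g. nn/<fqdn>@EXAMPLE.COM in nn.service.keytab.
klist -kt /etc/security/keytabs/nn.service.keytab
klist -kt /etc/security/keytabs/dn.service.keytab
klist -kt /etc/security/keytabs/spnego.service.keytab

A missing or misspelled principal here is a common cause of Kerberos login failures at daemon startup.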

The XML for these entries:

<property>
        <name>dfs.permissions.enabled</name>
        <value>true</value>
        <description>If "true", enable permission checking in
        HDFS. If "false", permission checking is turned
        off, but all other behavior is
        unchanged. Switching from one parameter value to the other does
        not change the mode, owner or group of files or
        directories.</description>
</property>

<property> 
        <name>dfs.permissions.supergroup</name> 
        <value>hdfs</value> 
        <description>The name of the group of
        super-users.</description> 
</property>   

<property>
        <name>dfs.namenode.handler.count</name>
        <value>100</value>
        <description>Added to grow the queue size so that more
        client connections are allowed</description>
</property>

<property> 
        <name>ipc.server.max.response.size</name> 
        <value>5242880</value> 
</property>   

<property> 
        <name>dfs.block.access.token.enable</name> 
        <value>true</value> 
        <description> If "true", access tokens are used as capabilities
        for accessing datanodes. If "false", no access tokens are checked on
        accessing datanodes. </description> 
</property>   

<property> 
        <name>dfs.namenode.kerberos.principal</name> 
        <value>nn/_HOST@EXAMPLE.COM</value> 
        <description> Kerberos principal name for the
        NameNode </description> 
</property>   

<property> 
        <name>dfs.secondary.namenode.kerberos.principal</name> 
        <value>nn/_HOST@EXAMPLE.COM</value>    
        <description>Kerberos principal name for the secondary NameNode.    
        </description>          
</property>      

<property>     
        <!--cluster variant -->    
        <name>dfs.secondary.http.address</name>    
        <value>ip-10-72-235-178.ec2.internal:50090</value>    
        <description>Address of secondary namenode web server</description>  
</property>    

<property>    
        <name>dfs.secondary.https.port</name>    
        <value>50490</value>    
        <description>The https port where secondary-namenode
        binds</description>  
</property>    

<property>    
        <name>dfs.web.authentication.kerberos.principal</name>    
        <value>HTTP/_HOST@EXAMPLE.COM</value>    
        <description> The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. 
        The HTTP Kerberos principal MUST start with 'HTTP/' per Kerberos HTTP
        SPNEGO specification.    
        </description>  
</property>    

<property>    
        <name>dfs.web.authentication.kerberos.keytab</name>    
        <value>/etc/security/keytabs/spnego.service.keytab</value>    
        <description>The Kerberos keytab file with the credentials for the HTTP
        Kerberos principal used by Hadoop-Auth in the HTTP endpoint.    
        </description>  
</property>    

<property>    
        <name>dfs.datanode.kerberos.principal</name>    
        <value>dn/_HOST@EXAMPLE.COM</value>  
        <description>        
        The Kerberos principal that the DataNode runs as. "_HOST" is replaced by the real
        host name.    
        </description>  
</property>    

<property>    
        <name>dfs.namenode.keytab.file</name>    
        <value>/etc/security/keytabs/nn.service.keytab</value>  
        <description>        
        Combined keytab file containing the namenode service and host
        principals.    
        </description>  
</property>    

<property>     
        <name>dfs.secondary.namenode.keytab.file</name>    
        <value>/etc/security/keytabs/nn.service.keytab</value>  
        <description>        
        Combined keytab file containing the namenode service and host
        principals.    
        </description>  
</property>    

<property>     
        <name>dfs.datanode.keytab.file</name>    
        <value>/etc/security/keytabs/dn.service.keytab</value>  
        <description>        
        The filename of the keytab file for the DataNode.    
        </description>  
</property>    

<property>    
        <name>dfs.https.port</name>    
        <value>50470</value>  
        <description>The https port where namenode
        binds</description>    
</property>    

<property>
        <name>dfs.namenode.https-address</name>
        <value>ip-10-111-59-170.ec2.internal:50470</value>
        <description>The https address to which the NameNode binds</description>
</property>

<property>    
        <name>dfs.datanode.data.dir.perm</name>    
        <value>750</value> 
        <description>The permissions that should be there on
        dfs.data.dir directories. The datanode will not come up if the
        permissions are different on existing dfs.data.dir directories. If
        the directories don't exist, they will be created with this
        permission.</description>  
</property>    

<property>
        <name>dfs.access.time.precision</name>
        <value>0</value>
        <description>The access time for an HDFS file is precise up to this
        value. The default value is 1 hour. Setting a value of 0
        disables access times for HDFS.
        </description>
</property>

<property>
        <name>dfs.cluster.administrators</name>
        <value>hdfs</value>
        <description>ACL for who can view the default
        servlets in HDFS</description>
</property>

<property>  
        <name>ipc.server.read.threadpool.size</name>  
        <value>5</value>  
        <description></description> 
</property>   

<property>  
        <name>dfs.namenode.kerberos.internal.spnego.principal</name>  
        <value>${dfs.web.authentication.kerberos.principal}</value> 
</property>   

<property>  
        <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>  
        <value>${dfs.web.authentication.kerberos.principal}</value> 
</property> 

In addition, you must set the user on all secure DataNodes:

export HADOOP_SECURE_DN_USER=hdfs
export HADOOP_SECURE_DN_PID_DIR=/grid/0/var/run/hadoop/$HADOOP_SECURE_DN_USER
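
These variables belong in hadoop-env.sh on the DataNode hosts. Because a secure DataNode binds privileged ports through jsvc, it must be started as root; jsvc then drops privileges to HADOOP_SECURE_DN_USER. A typical invocation looks like the following sketch; the script location is an assumption and depends on your installation layout:

# Run as root. jsvc binds the privileged ports, then drops
# privileges to $HADOOP_SECURE_DN_USER (hdfs in this example).
/usr/lib/hadoop/sbin/hadoop-daemon.sh start datanode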

 2.3.3. mapred-site.xml

To the mapred-site.xml file on every host in your cluster, you must add the following information:

 

Table 18.5. mapred-site.xml

mapreduce.jobtracker.kerberos.principal
    Value: jt/_HOST@EXAMPLE.COM
    Description: Kerberos principal name for the JobTracker.

mapreduce.tasktracker.kerberos.principal
    Value: tt/_HOST@EXAMPLE.COM
    Description: Kerberos principal name for the TaskTracker. "_HOST" is replaced by the host name of the task tracker.

hadoop.job.history.user.location
    Value: none
    Final: true

mapreduce.jobtracker.keytab.file
    Value: /etc/security/keytabs/jt.service.keytab
    Description: The keytab for the JobTracker principal.

mapreduce.tasktracker.keytab.file
    Value: /etc/security/keytabs/tt.service.keytab
    Description: The keytab for the TaskTracker principal.

mapreduce.jobtracker.staging.root.dir
    Value: /user
    Description: The path prefix for the location of the staging directories. The next level is always the user's name. It is a path in the default file system.

mapreduce.tasktracker.group
    Value: hadoop
    Description: The group that the task controller uses for accessing the task controller. The mapred user must be a member; other users should not be members (see the membership check after the XML listing below).

mapreduce.jobtracker.split.metainfo.maxsize
    Value: 50000000
    Description: If the size of the split metainfo file is larger than this value, the JobTracker will fail the job during initialization.
    Final: true

mapreduce.history.server.embedded
    Value: false
    Description: Whether the Job History server should be embedded within the JobTracker process.
    Final: true

mapreduce.history.server.http.address (cluster variant)
    Value (example): ip-10-111-59-170.ec2.internal:51111

mapreduce.jobhistory.kerberos.principal (cluster variant)
    Value: jt/_HOST@EXAMPLE.COM
    Description: Kerberos principal name for JobHistory. This must map to the same user as the JT user.
    Final: true

mapreduce.jobhistory.keytab.file (cluster variant)
    Value: /etc/security/keytabs/jt.service.keytab
    Description: The keytab for the JobHistory principal.

mapred.jobtracker.blacklist.fault-timeout-window
    Value (example): 180
    Description: 3-hour sliding window; the value is specified in minutes.

mapred.jobtracker.blacklist.fault-bucket-width
    Value (example): 15
    Description: 15-minute bucket size; the value is specified in minutes.

mapred.queue.names
    Value: default
    Description: Comma-separated list of queues configured for this JobTracker.

The XML for these entries:

<property>  
        <name>mapreduce.jobtracker.kerberos.principal</name>  
        <value>jt/_HOST@EXAMPLE.COM</value>  
        <description> JT
        user name key.  </description> 
</property>   

<property>  
        <name>mapreduce.tasktracker.kerberos.principal</name>   
        <value>tt/_HOST@EXAMPLE.COM</value>  
        <description>tt
        user name key. "_HOST" is replaced by the host name of the task tracker.   
        </description> 
</property>      

<property>    
        <name>hadoop.job.history.user.location</name>    
        <value>none</value>    
        <final>true</final>  
</property>      

<property>   
        <name>mapreduce.jobtracker.keytab.file</name>   
        <value>/etc/security/keytabs/jt.service.keytab</value>   
        <description>       
        The keytab for the jobtracker principal.   
        </description>   
</property>    

<property>   
        <name>mapreduce.tasktracker.keytab.file</name>   
        <value>/etc/security/keytabs/tt.service.keytab</value>    
        <description>The filename of the keytab for the task
        tracker</description>  
</property>    

<property>   
        <name>mapreduce.jobtracker.staging.root.dir</name>   
        <value>/user</value>  
        <description>The Path prefix for where the staging
        directories should be placed. The next level is always the user's name. It
        is a path in the default file system.</description>  
</property>    

<property>      
        <name>mapreduce.tasktracker.group</name>      
        <value>hadoop</value>      
        <description>The group that the task controller uses for accessing the task controller.
        The mapred user must be a member and users should *not* be
        members.</description>    
</property>    

<property>    
        <name>mapreduce.jobtracker.split.metainfo.maxsize</name>    
        <value>50000000</value>     
        <final>true</final>     
        <description>If the size of the split metainfo file is larger than
        this value, the JobTracker will fail the job during
        initialization.
        </description>
</property>  

<property>    
        <name>mapreduce.history.server.embedded</name>     
        <value>false</value>    
        <description>Should job history server be embedded within Job tracker process</description>    
        <final>true</final>  
</property>    

<property>    
        <name>mapreduce.history.server.http.address</name>     
        <!--cluster variant -->     
        <value>ip-10-111-59-170.ec2.internal:51111</value>    
        <description>Http address of the history server</description>    
        <final>true</final>  
</property>    

<property>    
        <name>mapreduce.jobhistory.kerberos.principal</name>     
        <!--cluster variant -->  
        <value>jt/_HOST@EXAMPLE.COM</value>    
        <description>Job history user name key. (must map to same user as JT user)</description>  
</property>    

<property>   
        <name>mapreduce.jobhistory.keytab.file</name>     
        <!--cluster variant -->   
        <value>/etc/security/keytabs/jt.service.keytab</value>   
        <description>The keytab for the job history server
        principal.</description>  
</property>   

<property>  
        <name>mapred.jobtracker.blacklist.fault-timeout-window</name>  
        <value>180</value>  
        <description>     3-hour
        sliding window (value is in minutes)  
        </description> 
</property>   

<property>  
        <name>mapred.jobtracker.blacklist.fault-bucket-width</name>  
        <value>15</value>  
        <description>    
        15-minute bucket size (value is in minutes)  
        </description> 
</property>   

<property>  
        <name>mapred.queue.names</name>  
        <value>default</value>
        <description>
        Comma separated list of queues configured for this jobtracker.</description>
</property>    
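
Because mapreduce.tasktracker.group requires that the mapred user belong to the configured group while ordinary users do not, a quick membership check on each TaskTracker host can catch misconfiguration early. A minimal sketch, assuming the group is hadoop as in Table 18.5:

# The mapred user must appear in the hadoop group; regular users must not.
id -Gn mapred | grep -w hadoop
getent group hadoop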
    

 2.3.4. hbase-site.xml

For HBase to run on a secured cluster, HBase must be able to authenticate itself to HDFS. To the hbase-site.xml file on your HBase server, you must add the following information. There are no default values; the following are examples only:

 

Table 18.6. hbase-site.xml

hbase.master.keytab.file
    Value: /etc/security/keytabs/hm.service.keytab
    Description: The keytab for the HMaster service principal.

hbase.master.kerberos.principal
    Value: hm/_HOST@EXAMPLE.COM
    Description: The Kerberos principal name that should be used to run the HMaster process. If _HOST is used as the hostname portion, it will be replaced with the actual hostname of the running instance.

hbase.regionserver.keytab.file
    Value: /etc/security/keytabs/rs.service.keytab
    Description: The keytab for the HRegionServer service principal.

hbase.regionserver.kerberos.principal
    Value: rs/_HOST@EXAMPLE.COM
    Description: The Kerberos principal name that should be used to run the HRegionServer process. If _HOST is used as the hostname portion, it will be replaced with the actual hostname of the running instance.

hbase.superuser
    Value: hbase
    Description: Comma-separated list of users or groups that are allowed full privileges, regardless of stored ACLs, across the cluster. Only used when HBase security is enabled.

hbase.coprocessor.region.classes
    Value: (empty)
    Description: Comma-separated list of coprocessors that are loaded by default on all tables. For any override coprocessor method, these classes will be called in order. After implementing your own coprocessor, just put it in HBase's classpath and add the fully qualified class name here. A coprocessor can also be loaded on demand by setting HTableDescriptor.

hbase.coprocessor.master.classes
    Value: (empty)
    Description: Comma-separated list of org.apache.hadoop.hbase.coprocessor.MasterObserver coprocessors that are loaded by default on the active HMaster process. For any implemented coprocessor methods, the listed classes will be called in order. After implementing your own MasterObserver, just put it in HBase's classpath and add the fully qualified class name here.

The XML for these entries:

  <property>    
        <name>hbase.master.keytab.file</name>    
        <value>/etc/security/keytabs/hm.service.keytab</value>    
        <description>Full path to the kerberos keytab file to use for logging
        in the configured HMaster server principal.    
        </description>  
</property>  

<property>    
        <name>hbase.master.kerberos.principal</name>    
        <value>hm/_HOST@EXAMPLE.COM</value>    
        <description>Ex. "hbase/_HOST@EXAMPLE.COM". 
        The kerberos principal name that
        should be used to run the HMaster process.  The
        principal name should be in
        the form: user/hostname@DOMAIN.  If "_HOST" is used
        as the hostname portion, it will be replaced with the actual hostname of the running    
        instance.    
        </description>  
</property>  

<property>    
        <name>hbase.regionserver.keytab.file</name>    
        <value>/etc/security/keytabs/rs.service.keytab</value>    
        <description>Full path to the kerberos keytab file to use for logging
        in the configured HRegionServer server principal.    
        </description>  
</property>  

<property>    
        <name>hbase.regionserver.kerberos.principal</name>    
        <value>rs/_HOST@EXAMPLE.COM</value>    
        <description>Ex. "hbase/_HOST@EXAMPLE.COM". 
        The kerberos principal name that
        should be used to run the HRegionServer process. The
        principal name should be in the form: 
        user/hostname@DOMAIN.  If _HOST
        is used as the hostname portion, it will be replaced 
        with the actual hostname of the running
        instance.  An entry for this principal must exist
        in the file specified in hbase.regionserver.keytab.file    
        </description>  
</property>     

<!--Additional configuration specific to HBase security -->
  
<property>    
        <name>hbase.superuser</name>    
        <value>hbase</value>    
        <description>List of users or groups (comma-separated), who are
        allowed full privileges, regardless of stored ACLs, across the cluster. Only
        used when HBase security is enabled.    
        </description>  
</property>    

<property>    
        <name>hbase.coprocessor.region.classes</name>    
        <value></value>    
        <description>A comma-separated list of Coprocessors that are loaded
        by default on all tables. For any override coprocessor method, these classes     
        will be called in order. After implementing your own Coprocessor, 
        just put it in HBase's classpath and add the fully qualified class name here. A
        coprocessor can also be loaded on demand by setting HTableDescriptor.    
        </description>  
</property>    

<property>    
        <name>hbase.coprocessor.master.classes</name>    
        <value></value>    
        <description>A comma-separated list of    
        org.apache.hadoop.hbase.coprocessor.MasterObserver coprocessors that
        are loaded by default on the active HMaster process. For any implemented    
        coprocessor methods, the listed classes will be called in order.
        After implementing your own MasterObserver, just put it in HBase's
        classpath and add the fully qualified class name here.    
        </description>  
</property> 
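
A straightforward way to validate the HBase keytabs and principals before starting the daemons is to log in with them manually. This sketch assumes the keytab path and realm shown above and a service principal of the form hm/<fqdn>:

# Obtain a ticket as the HMaster principal from its keytab; any
# error here would also break HMaster startup.
kinit -kt /etc/security/keytabs/hm.service.keytab hm/$(hostname -f)@EXAMPLE.COM
klist
kdestroy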

 2.3.5. hive-site.xml

Hive Metastore supports Kerberos authentication for Thrift clients only. HiveServer does not support Kerberos authentication for any clients. To the hive-site.xml file, you must add the following information:

 

Table 18.7. hive-site.xml

hive.metastore.sasl.enabled
    Value: true
    Description: If true, the Metastore Thrift interface will be secured with SASL and clients must authenticate with Kerberos.

hive.metastore.kerberos.keytab.file
    Value: /etc/security/keytabs/hive.service.keytab
    Description: The keytab for the Metastore Thrift service principal.

hive.metastore.kerberos.principal
    Value: hive/_HOST@EXAMPLE.COM
    Description: The service principal for the Metastore Thrift server. If _HOST is used as the hostname portion, it will be replaced with the actual hostname of the running instance.

hive.metastore.cache.pinobjtypes
    Value: Table,Database,Type,FieldSchema,Order
    Description: Comma-separated list of Metastore object types that should be pinned in the cache.

The XML for these entries:

<property>    
        <name>hive.metastore.sasl.enabled</name>    
        <value>true</value>    
        <description>If true, the metastore thrift interface will be secured with
        SASL.     
        Clients must authenticate with Kerberos.</description>  
</property>    

<property>    
        <name>hive.metastore.kerberos.keytab.file</name>    
        <value>/etc/security/keytabs/hive.service.keytab</value>    
        <description>The path to the Kerberos Keytab file containing the
        metastore thrift server's service principal.</description>  
</property>    

<property>    
        <name>hive.metastore.kerberos.principal</name>    
        <value>hive/_HOST@EXAMPLE.COM</value>    
        <description>The service principal for the metastore thrift server. The
        special string _HOST will be replaced automatically with the correct 
        hostname.</description>  
</property>    

<property>    
        <name>hive.metastore.cache.pinobjtypes</name>    
        <value>Table,Database,Type,FieldSchema,Order</value>    
        <description>List of comma separated metastore object types that should be pinned in
        the cache</description>  
</property>
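
Once the Metastore is restarted with these settings, a simple smoke test is to obtain a user ticket and run a metadata-only query through the Hive CLI; a SASL negotiation problem with the Metastore surfaces immediately. The user principal below is only an example:

# Authenticate as an ordinary user, then query the secured Metastore.
kinit someuser@EXAMPLE.COM
hive -e 'show databases;'
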
 2.3.6. oozie-site.xml

To the oozie-site.xml file, you must add the following information:

 

Table 18.8. oozie-site.xml

oozie.service.AuthorizationService.security.enabled
    Value: true
    Description: Specifies whether security (user name/admin role) is enabled or not. If disabled, any user can manage the Oozie system and manage any job.

oozie.service.HadoopAccessorService.kerberos.enabled
    Value: true
    Description: Indicates whether Oozie is configured to use Kerberos.

local.realm
    Value: EXAMPLE.COM
    Description: Kerberos realm used by Oozie and Hadoop. Using 'local.realm' to be aligned with Hadoop configuration.

oozie.service.HadoopAccessorService.keytab.file
    Value: /etc/security/keytabs/oozie.service.keytab
    Description: The keytab for the Oozie service principal.

oozie.service.HadoopAccessorService.kerberos.principal
    Value: oozie/_HOST@EXAMPLE.COM
    Description: Kerberos principal for the Oozie service.

oozie.authentication.type
    Value: kerberos

oozie.authentication.kerberos.principal
    Value: HTTP/_HOST@EXAMPLE.COM
    Description: Kerberos principal for the Oozie HTTP endpoint (SPNEGO).

oozie.authentication.kerberos.keytab
    Value: /etc/security/keytabs/spnego.service.keytab
    Description: Location of the keytab file with the credentials for the SPNEGO principal.

oozie.service.HadoopAccessorService.nameNode.whitelist
    Value: (empty)

oozie.authentication.kerberos.name.rules
    Value:
        RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/
        RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/
        RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/
        RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/
        DEFAULT
    Description: The mapping from Kerberos principal names to local OS user names. See Creating Mappings Between Principals and UNIX Usernames for more information.
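
After restarting Oozie, you can verify that SPNEGO authentication works end to end with the Oozie CLI, which negotiates Kerberos automatically once you hold a ticket. The host name below is an example:

kinit someuser@EXAMPLE.COM
oozie admin -oozie http://oozie-host.example.com:11000/oozie -status

A healthy, secured server responds with its system mode (normally: System mode: NORMAL).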

 2.3.7. webhcat-site.xml

To the webhcat-site.xml file, you must add the following information:

 

Table 18.9. webhcat-site.xml

templeton.kerberos.principal
    Value: HTTP/_HOST@EXAMPLE.COM

templeton.kerberos.keytab
    Value: /etc/security/keytabs/spnego.service.keytab

templeton.kerberos.secret
    Value: secret
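
To confirm that WebHCat accepts SPNEGO-authenticated requests, query its status endpoint with curl's negotiate support after obtaining a ticket. The host name is an example; 50111 is the default WebHCat port:

kinit someuser@EXAMPLE.COM
curl --negotiate -u : 'http://webhcat-host.example.com:50111/templeton/v1/status'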

