6. Migrate the HDP Configurations

Configurations and configuration file names have changed between HDP 1.3.2 (Hadoop 1.2.x) and HDP 2.1 (Hadoop 2.4). To upgrade to HDP 2.x, back up your current configuration files, download the new HDP 2.1 files, and compare. The following tables provide mapping information to make the comparison between releases easier.

To migrate the HDP Configurations

  1. Back up the following HDP 1.x configurations on all nodes in your clusters.

    • /etc/hadoop/conf

    • /etc/hbase/conf

    • /etc/hcatalog/conf (Note: With HDP 2.1, /etc/hcatalog/conf is divided into /etc/hive-hcatalog/conf and /etc/hive-webhcat. You cannot use /etc/hcatalog/conf in HDP 2.1.)

    • /etc/hive/conf

    • /etc/pig/conf

    • /etc/sqoop/conf

    • /etc/flume/conf

    • /etc/mahout/conf

    • /etc/oozie/conf
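
The backup in step 1 could be scripted along the following lines. This is a minimal sketch, not part of the HDP tooling: the function name backup_conf_dirs, the archive naming, and the destination directory are assumptions made for illustration. It archives whichever of the listed directories exist on the node and reports the ones it skipped.

```python
import tarfile
import time
from pathlib import Path

# Directories listed in step 1; adjust to what is installed on the node.
CONF_DIRS = [
    "/etc/hadoop/conf", "/etc/hbase/conf", "/etc/hcatalog/conf",
    "/etc/hive/conf", "/etc/pig/conf", "/etc/sqoop/conf",
    "/etc/flume/conf", "/etc/mahout/conf", "/etc/oozie/conf",
]


def backup_conf_dirs(conf_dirs, dest_dir):
    """Archive every existing directory into one timestamped tar.gz file.

    Returns the archive path and the list of directories that were skipped
    because they do not exist on this node. (Hypothetical helper.)
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"hdp1-conf-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"
    skipped = []
    # dereference=True stores the files behind symlinks, which matters when
    # a conf directory is itself a symlink.
    with tarfile.open(archive, "w:gz", dereference=True) as tar:
        for d in conf_dirs:
            p = Path(d)
            if p.is_dir():
                # Store the path relative to / inside the archive.
                tar.add(p, arcname=str(p).lstrip("/"))
            else:
                skipped.append(d)
    return archive, skipped
```

Run it on each node as a user that can read the configuration directories, for example backup_conf_dirs(CONF_DIRS, "/root/hdp1-backup"), where the destination path is a placeholder.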

  2. Edit /etc/hadoop/conf/core-site.xml and set hadoop.rpc.protection from none to authentication.

    Note

    Hadoop lets cluster administrators control the quality of protection in the configuration parameter “hadoop.rpc.protection” in core-site.xml. It is an optional parameter in HDP 2.2. If not present, the default QOP setting of “auth” is used, which implies “authentication only”.

    Valid values for this parameter are:

    • “authentication”: corresponds to “auth”

    • “integrity”: corresponds to “auth-int”

    • “privacy”: corresponds to “auth-conf”

    The default setting is authentication-only because integrity checks and encryption carry a performance cost.
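
As one way to make the change in step 2 repeatable, the sketch below sets hadoop.rpc.protection in the text of a core-site.xml file and rejects anything other than the three valid values. The helper name set_rpc_protection is hypothetical. Python's xml.etree.ElementTree drops comments and processing instructions when it re-serializes, so treat the output as a starting point and review it before deploying.

```python
import xml.etree.ElementTree as ET

# The three values the note lists as valid.
VALID_QOP = {"authentication", "integrity", "privacy"}


def set_rpc_protection(xml_text, value="authentication"):
    """Return core-site.xml text with hadoop.rpc.protection set to value."""
    if value not in VALID_QOP:
        raise ValueError(f"invalid hadoop.rpc.protection value: {value!r}")
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "hadoop.rpc.protection":
            val = prop.find("value")
            if val is None:
                val = ET.SubElement(prop, "value")
            val.text = value
            break
    else:
        # Property not present yet: append it.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = "hadoop.rpc.protection"
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Rejecting "none" up front mirrors the behavior described in this section, where an invalid value causes an error rather than falling back to the default.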

  3. Copy your /etc/hcatalog/conf configurations to /etc/hive-hcatalog/conf and /etc/hive-webhcat as appropriate.

  4. Copy log4j.properties from the hadoop config directory of the companion files to /etc/hadoop/conf. The file should have owners and permissions similar to other files in /etc/hadoop/conf.

  5. Download your HDP 2.x companion files (see "Download the Companion Files" in Chapter 1 of the Manual Install Guide) and migrate your HDP 1.x configuration.

  6. Copy these configurations to all nodes in your clusters.

    • /etc/hadoop/conf

    • /etc/hbase/conf

    • /etc/hcatalog/conf

    • /etc/hive/conf

    • /etc/pig/conf

    • /etc/sqoop/conf

    • /etc/flume/conf

    • /etc/mahout/conf

    • /etc/oozie/conf

    • /etc/zookeeper/conf

    Note

    Upgrading the repo using yum or zypper resets all configurations. Prepare to replace these configuration directories each time you perform a yum or zypper upgrade.
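
Because a package upgrade can silently reset these directories, it may help to compare a saved copy against the live directory afterwards. The sketch below does this with SHA-256 digests. The helper names dir_digests and changed_files are hypothetical, and both arguments are plain directory paths that you supply.

```python
import hashlib
from pathlib import Path


def dir_digests(conf_dir):
    """Map each file's path (relative to conf_dir) to its SHA-256 hex digest."""
    base = Path(conf_dir)
    return {
        str(p.relative_to(base)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def changed_files(backup_dir, live_dir):
    """List files whose content differs or that exist on only one side."""
    old = dir_digests(backup_dir)
    new = dir_digests(live_dir)
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

An empty result means the live directory still matches the saved copy. Any listed file is a candidate for being replaced from your migrated configuration.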

  7. Review the following HDP 1.3.2 Hadoop Core configurations and the new configurations or locations in HDP 2.x.

     

    Table 3.3. HDP 1.3.2 Hadoop Core Site (core-site.xml)

    HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.2 config | HDP 2.2 config file

    fs.default.name | core-site.xml | fs.defaultFS | core-site.xml
    fs.checkpoint.dir | core-site.xml | dfs.namenode.checkpoint.dir | hdfs-site.xml
    fs.checkpoint.edits.dir | core-site.xml | dfs.namenode.checkpoint.edits.dir | hdfs-site.xml
    fs.checkpoint.period | core-site.xml | dfs.namenode.checkpoint.period | hdfs-site.xml
    io.bytes.per.checksum | core-site.xml | dfs.bytes-per-checksum | hdfs-site.xml
    dfs.df.interval | hdfs-site.xml | fs.df.interval | core-site.xml
    hadoop.native.lib | core-site.xml | io.native.lib.available | core-site.xml
    hadoop.configured.node.mapping | -- | net.topology.configured.node.mapping | core-site.xml
    topology.node.switch.mapping.impl | core-site.xml | net.topology.node.switch.mapping.impl | core-site.xml
    topology.script.file.name | core-site.xml | net.topology.script.file.name | core-site.xml
    topology.script.number.args | core-site.xml | net.topology.script.number.args | core-site.xml


    Note

    The hadoop.rpc.protection configuration property in core-site.xml must specify authentication, integrity, and/or privacy. If no value is set, it defaults to authentication, but an invalid value such as "none" causes an error.
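
Several rows of Table 3.3 move a property out of core-site.xml as well as renaming it. The sketch below illustrates that routing on plain dictionaries. It covers only a subset of the table, and the names CORE_SITE_RENAMES and migrate_core_site are hypothetical. Properties that are not in the mapping are assumed to stay in core-site.xml unchanged.

```python
# Subset of Table 3.3: old name -> (new name, file that holds it in HDP 2.x).
CORE_SITE_RENAMES = {
    "fs.default.name": ("fs.defaultFS", "core-site.xml"),
    "fs.checkpoint.dir": ("dfs.namenode.checkpoint.dir", "hdfs-site.xml"),
    "fs.checkpoint.edits.dir": ("dfs.namenode.checkpoint.edits.dir", "hdfs-site.xml"),
    "fs.checkpoint.period": ("dfs.namenode.checkpoint.period", "hdfs-site.xml"),
    "io.bytes.per.checksum": ("dfs.bytes-per-checksum", "hdfs-site.xml"),
    "hadoop.native.lib": ("io.native.lib.available", "core-site.xml"),
    "topology.script.number.args": ("net.topology.script.number.args", "core-site.xml"),
}


def migrate_core_site(props):
    """Split HDP 1.x core-site properties into per-file dicts for HDP 2.x."""
    out = {"core-site.xml": {}, "hdfs-site.xml": {}}
    for name, value in props.items():
        # Unknown names keep their name and stay in core-site.xml (assumption).
        new_name, target = CORE_SITE_RENAMES.get(name, (name, "core-site.xml"))
        out[target][new_name] = value
    return out
```

For example, passing {"fs.default.name": "hdfs://nn:8020", "fs.checkpoint.dir": "/hadoop/snn"} (placeholder values) yields fs.defaultFS under core-site.xml and dfs.namenode.checkpoint.dir under hdfs-site.xml.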

  8. Review the following 1.3.2 HDFS site configurations and their new configurations and files in HDP 2.x.

     

    Table 3.4. HDP 1.3.2 Hadoop Core Site (hdfs-site.xml)

    HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.2 config | HDP 2.2 config file

    dfs.block.size | hdfs-site.xml | dfs.blocksize | hdfs-site.xml
    dfs.write.packet.size | hdfs-site.xml | dfs.client-write-packet-size | hdfs-site.xml
    dfs.https.client.keystore.resource | hdfs-site.xml | dfs.client.https.keystore.resource | hdfs-site.xml
    dfs.https.need.client.auth | hdfs-site.xml | dfs.client.https.need-auth | hdfs-site.xml
    dfs.read.prefetch.size | hdfs-site.xml | dfs.client.read.prefetch.size | hdfs-site.xml
    dfs.socket.timeout | hdfs-site.xml | dfs.client.socket-timeout | hdfs-site.xml
    dfs.balance.bandwidthPerSec | hdfs-site.xml | dfs.datanode.balance.bandwidthPerSec | hdfs-site.xml
    dfs.data.dir | hdfs-site.xml | dfs.datanode.data.dir | hdfs-site.xml
    dfs.datanode.max.xcievers | hdfs-site.xml | dfs.datanode.max.transfer.threads | hdfs-site.xml
    session.id | hdfs-site.xml | dfs.metrics.session-id | hdfs-site.xml
    dfs.access.time.precision | hdfs-site.xml | dfs.namenode.accesstime.precision | hdfs-site.xml
    dfs.backup.address | hdfs-site.xml | dfs.namenode.backup.address | hdfs-site.xml
    dfs.backup.http.address | hdfs-site.xml | dfs.namenode.backup.http-address | hdfs-site.xml
    fs.checkpoint.dir | hdfs-site.xml | dfs.namenode.checkpoint.dir | hdfs-site.xml
    fs.checkpoint.edits.dir | hdfs-site.xml | dfs.namenode.checkpoint.edits.dir | hdfs-site.xml
    fs.checkpoint.period | hdfs-site.xml | dfs.namenode.checkpoint.period | hdfs-site.xml
    dfs.name.edits.dir | hdfs-site.xml | dfs.namenode.edits.dir | hdfs-site.xml
    heartbeat.recheck.interval | hdfs-site.xml | dfs.namenode.heartbeat.recheck-interval | hdfs-site.xml
    dfs.http.address | hdfs-site.xml | dfs.namenode.http-address | hdfs-site.xml
    dfs.https.address | hdfs-site.xml | dfs.namenode.https-address | hdfs-site.xml
    dfs.max.objects | hdfs-site.xml | dfs.namenode.max.objects | hdfs-site.xml
    dfs.name.dir | hdfs-site.xml | dfs.namenode.name.dir | hdfs-site.xml
    dfs.name.dir.restore | hdfs-site.xml | dfs.namenode.name.dir.restore | hdfs-site.xml
    dfs.replication.considerLoad | hdfs-site.xml | dfs.namenode.replication.considerLoad | hdfs-site.xml
    dfs.replication.interval | hdfs-site.xml | dfs.namenode.replication.interval | hdfs-site.xml
    dfs.max-repl-streams | hdfs-site.xml | dfs.namenode.replication.max-streams | hdfs-site.xml
    dfs.replication.min | hdfs-site.xml | dfs.namenode.replication.min | hdfs-site.xml
    dfs.replication.pending.timeout.sec | hdfs-site.xml | dfs.namenode.replication.pending.timeout-sec | hdfs-site.xml
    dfs.safemode.extension | hdfs-site.xml | dfs.namenode.safemode.extension | hdfs-site.xml
    dfs.safemode.threshold.pct | hdfs-site.xml | dfs.namenode.safemode.threshold-pct | hdfs-site.xml
    dfs.secondary.http.address | hdfs-site.xml | dfs.namenode.secondary.http-address | hdfs-site.xml
    dfs.permissions | hdfs-site.xml | dfs.permissions.enabled | hdfs-site.xml
    dfs.permissions.supergroup | hdfs-site.xml | dfs.permissions.superusergroup | hdfs-site.xml
    dfs.df.interval | hdfs-site.xml | fs.df.interval | core-site.xml
    dfs.umaskmode | hdfs-site.xml | fs.permissions.umask-mode | hdfs-site.xml
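
To see which HDP 1.x names from Table 3.4 are still present in a migrated hdfs-site.xml, a read-only scan along these lines may be useful. It is a sketch under assumptions: HDFS_RENAMES holds only a few rows of the table, and find_deprecated is a hypothetical helper that reports names without changing the file.

```python
import xml.etree.ElementTree as ET

# A few rows from Table 3.4; extend with the rest of the table as needed.
HDFS_RENAMES = {
    "dfs.block.size": "dfs.blocksize",
    "dfs.data.dir": "dfs.datanode.data.dir",
    "dfs.name.dir": "dfs.namenode.name.dir",
    "dfs.http.address": "dfs.namenode.http-address",
    "dfs.datanode.max.xcievers": "dfs.datanode.max.transfer.threads",
    "dfs.permissions": "dfs.permissions.enabled",
}


def find_deprecated(xml_text, renames=HDFS_RENAMES):
    """Return (old_name, new_name) pairs for HDP 1.x names still in the file."""
    root = ET.fromstring(xml_text)
    names = [p.findtext("name", "").strip() for p in root.iter("property")]
    return [(n, renames[n]) for n in names if n in renames]
```

Reporting instead of rewriting keeps the decision with the administrator, which fits the review-and-compare approach of this step.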


  9. Review the following HDP 1.3.2 MapReduce Configs and their new HDP 2.x mappings.

     

    Table 3.5. HDP 1.3.2 Configs now in Capacity Scheduler for HDP 2.x (mapred-site.xml)

    HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.2 config | HDP 2.2 config file

    mapred.map.child.java.opts | mapred-site.xml | mapreduce.map.java.opts | mapred-site.xml
    mapred.job.map.memory.mb | mapred-site.xml | mapreduce.map.memory.mb | mapred-site.xml
    mapred.reduce.child.java.opts | mapred-site.xml | mapreduce.reduce.java.opts | mapred-site.xml
    mapred.job.reduce.memory.mb | mapred-site.xml | mapreduce.reduce.memory.mb | mapred-site.xml
    security.task.umbilical.protocol.acl | mapred-site.xml | security.job.task.protocol.acl | mapred-site.xml


  10. Review the following HDP 1.3.2 Configs and their new HDP 2.x Capacity Scheduler mappings.

     

    Table 3.6. HDP 1.3.2 Configs now in capacity scheduler for HDP 2.x (capacity-scheduler.xml)

    HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.2 config | HDP 2.2 config file

    mapred.queue.names | mapred-site.xml | yarn.scheduler.capacity.root.queues | capacity-scheduler.xml
    mapred.queue.default.acl-submit.job | mapred-queue-acls.xml | yarn.scheduler.capacity.root.default.acl_submit_jobs | capacity-scheduler.xml
    mapred.queue.default.acl.administer-jobs | mapred-queue-acls.xml | yarn.scheduler.capacity.root.default.acl_administer_jobs | capacity-scheduler.xml
    mapred.capacity-scheduler.queue.default.capacity | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.capacity | capacity-scheduler.xml
    mapred.capacity-scheduler.queue.default.user-limit-factor | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.user-limit-factor | capacity-scheduler.xml
    mapred.capacity-scheduler.queue.default.maximum-capacity | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.maximum-capacity | capacity-scheduler.xml
    mapred.queue.default.state | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.state | capacity-scheduler.xml
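
The capacity rows of Table 3.6 follow a regular pattern for the default queue. The sketch below applies the same pattern to other queue names. That generalization is an assumption, not something this guide states, and it treats every queue as a direct child of root. The helper name to_yarn_queue_key is hypothetical, and only the three attributes shown in the table are translated.

```python
import re

# Attributes that Table 3.6 maps for the default queue.
_ATTRIBUTES = {"capacity", "user-limit-factor", "maximum-capacity"}
_QUEUE_KEY = re.compile(r"^mapred\.capacity-scheduler\.queue\.([^.]+)\.(.+)$")


def to_yarn_queue_key(old_key):
    """Translate an HDP 1.x capacity-scheduler key, or return None if unknown."""
    m = _QUEUE_KEY.match(old_key)
    if m is None:
        return None
    queue, attribute = m.groups()
    if attribute not in _ATTRIBUTES:
        return None
    # Assumes the queue sits directly under the root queue.
    return f"yarn.scheduler.capacity.root.{queue}.{attribute}"
```

Returning None for anything outside the table keeps the helper from inventing mappings the guide does not document.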


  11. Compare the following HDP 1.3.2 configs in hadoop-env.sh with the new configs in HDP 2.x.

    Paths have changed in HDP 2.2 to /usr/hdp/current. You must remove lines such as:

    export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64

     

    Table 3.7. HDP 1.3.2 Configs and HDP 2.x for hadoop-env.sh

    HDP 1.3.2 config | HDP 2.2 config | Description

    JAVA_HOME | JAVA_HOME | Java implementation to use
    HADOOP_HOME_WARN_SUPPRESS | HADOOP_HOME_WARN_SUPPRESS | --
    HADOOP_CONF_DIR | HADOOP_CONF_DIR | Hadoop configuration directory
    not in hadoop-env.sh | HADOOP_HOME | --
    not in hadoop-env.sh | HADOOP_LIBEXEC_DIR | --
    HADOOP_NAMENODE_INIT_HEAPSIZE | HADOOP_NAMENODE_INIT_HEAPSIZE | --
    HADOOP_OPTS | HADOOP_OPTS | Extra Java runtime options; empty by default
    HADOOP_NAMENODE_OPTS | HADOOP_NAMENODE_OPTS | Command-specific options appended to HADOOP_OPTS
    HADOOP_JOBTRACKER_OPTS | not in hadoop-env.sh | Command-specific options appended to HADOOP_OPTS
    HADOOP_TASKTRACKER_OPTS | not in hadoop-env.sh | Command-specific options appended to HADOOP_OPTS
    HADOOP_DATANODE_OPTS | HADOOP_DATANODE_OPTS | Command-specific options appended to HADOOP_OPTS
    HADOOP_BALANCER_OPTS | HADOOP_BALANCER_OPTS | Command-specific options appended to HADOOP_OPTS
    HADOOP_SECONDARYNAMENODE_OPTS | HADOOP_SECONDARYNAMENODE_OPTS | Command-specific options appended to HADOOP_OPTS
    HADOOP_CLIENT_OPTS | HADOOP_CLIENT_OPTS | Applies to multiple commands (fs, dfs, fsck, distcp, etc.)
    HADOOP_SECURE_DN_USER | not in hadoop-env.sh | For secure datanodes, the user to run the datanode as
    HADOOP_SSH_OPTS | HADOOP_SSH_OPTS | Extra ssh options
    HADOOP_LOG_DIR | HADOOP_LOG_DIR | Directory where log files are stored
    HADOOP_SECURE_DN_LOG_DIR | HADOOP_SECURE_DN_LOG_DIR | Directory where log files are stored in the secure data environment
    HADOOP_PID_DIR | HADOOP_PID_DIR | Directory where pid files are stored; /tmp by default
    HADOOP_SECURE_DN_PID_DIR | HADOOP_SECURE_DN_PID_DIR | Directory where pid files are stored; /tmp by default
    HADOOP_IDENT_STRING | HADOOP_IDENT_STRING | String representing this instance of hadoop; $USER by default
    not in hadoop-env.sh | HADOOP_MAPRED_LOG_DIR | --
    not in hadoop-env.sh | HADOOP_MAPRED_PID_DIR | --
    not in hadoop-env.sh | JAVA_LIBRARY_PATH | --
    not in hadoop-env.sh | JSVC_HOME | For starting the datanode on a secure cluster


    Note

    Some of the configuration settings refer to the variable HADOOP_HOME. The value of HADOOP_HOME is automatically inferred from the location of the startup scripts. HADOOP_HOME is the parent directory of the bin directory that holds the Hadoop scripts. In many instances this is $HADOOP_INSTALL/hadoop.
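
Step 11 asks you to remove hadoop-env.sh lines that export JAVA_LIBRARY_PATH with the old /usr/lib/hadoop location. A minimal sketch of that cleanup follows. The function name drop_old_native_paths is hypothetical, and the pattern assumes the lines to remove look like the example in step 11, so check the removed lines before writing the result back.

```python
import re

# Matches lines like: export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/...
_OLD_JLP_EXPORT = re.compile(r"^\s*export\s+JAVA_LIBRARY_PATH=.*/usr/lib/hadoop")


def drop_old_native_paths(env_text):
    """Return (cleaned_text, removed_lines) for the text of a hadoop-env.sh."""
    kept, removed = [], []
    for line in env_text.splitlines():
        if _OLD_JLP_EXPORT.match(line):
            removed.append(line)
        else:
            kept.append(line)
    return "\n".join(kept) + "\n", removed
```

Returning the removed lines alongside the cleaned text makes it easy to log exactly what was taken out on each node.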

  12. Add the following properties to the yarn-site.xml file:

    <property>
     <name>yarn.resourcemanager.scheduler.class</name>
     <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
    
    <property>
     <name>yarn.resourcemanager.resource-tracker.address</name>
     <value>$resourcemanager.full.hostname:8025</value>
     <description>Enter your ResourceManager hostname.</description>
    </property>
    
    <property>
     <name>yarn.resourcemanager.scheduler.address</name>
     <value>$resourcemanager.full.hostname:8030</value>
     <description>Enter your ResourceManager hostname.</description>
    </property>
    
    <property>
     <name>yarn.resourcemanager.address</name>
     <value>$resourcemanager.full.hostname:8050</value>
     <description>Enter your ResourceManager hostname.</description>
    </property>
    
    <property>
     <name>yarn.resourcemanager.admin.address</name>
     <value>$resourcemanager.full.hostname:8141</value>
     <description>Enter your ResourceManager hostname.</description>
    </property>
    
    <property>
     <name>yarn.nodemanager.local-dirs</name>
     <value>/grid/hadoop/yarn/local,/grid1/hadoop/yarn/local</value>
     <description>Comma-separated list of paths. Use the list of directories from $YARN_LOCAL_DIR. For example, /grid/hadoop/yarn/local,/grid1/hadoop/yarn/local.</description>
    </property>
    
    <property>
     <name>yarn.nodemanager.log-dirs</name>
     <value>/grid/hadoop/yarn/log</value>
     <description>Use the list of directories from $YARN_LOCAL_LOG_DIR. For example, /grid/hadoop/yarn/log,/grid1/hadoop/yarn/log,/grid2/hadoop/yarn/log</description>
    </property>
    
    <property>
     <name>yarn.log.server.url</name>
     <value>http://$jobhistoryserver.full.hostname:19888/jobhistory/logs/</value>
     <description>URL for job history server</description>
    </property>
    
    <property>
     <name>yarn.resourcemanager.webapp.address</name>
     <value>$resourcemanager.full.hostname:8088</value>
     <description>Address of the ResourceManager web UI.</description>
    </property>
    
    <property>
     <name>yarn.nodemanager.admin-env</name>
     <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value>
     <description>Restrict the number of memory arenas to prevent 
        excessive VMEM use by the glibc arena allocator. 
        For example, MALLOC_ARENA_MAX=4</description>
    </property>
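
The values above contain placeholders such as $resourcemanager.full.hostname that must be replaced with real hostnames before the file is usable. One way to do that substitution is sketched below. The function name fill_hostnames is hypothetical and rm.example.com is a placeholder hostname. The sketch touches only placeholders ending in .full.hostname and fails loudly if one has no value supplied.

```python
import re

# Matches $resourcemanager.full.hostname, $jobhistoryserver.full.hostname, ...
_HOST_PLACEHOLDER = re.compile(r"\$[a-z]+\.full\.hostname")


def fill_hostnames(xml_text, hosts):
    """Replace every $<role>.full.hostname placeholder using the hosts dict."""
    def substitute(match):
        key = match.group(0)
        if key not in hosts:
            raise KeyError(f"no hostname supplied for {key}")
        return hosts[key]

    return _HOST_PLACEHOLDER.sub(substitute, xml_text)
```

For example, fill_hostnames(text, {"$resourcemanager.full.hostname": "rm.example.com", "$jobhistoryserver.full.hostname": "jhs.example.com"}) leaves other variables such as $MALLOC_ARENA_MAX untouched.
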
  15. Add the following properties to the mapred-site.xml file:

    <property>
     <name>mapreduce.jobhistory.address</name>
     <value>$jobhistoryserver.full.hostname:10020</value>
     <description>Enter your JobHistoryServer hostname.</description>
    </property>
    
    <property>
     <name>mapreduce.jobhistory.webapp.address</name>
     <value>$jobhistoryserver.full.hostname:19888</value>
     <description>Enter your JobHistoryServer hostname.</description>
    </property>
    
    <property>
     <name>mapreduce.shuffle.port</name>
     <value>13562</value>
    </property>
    
    <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
    </property>
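
After editing mapred-site.xml, a quick check that the four properties above are present can catch copy-and-paste mistakes. This is only a sketch: check_mapred_site is a hypothetical helper, and it verifies presence and the yarn framework value, not whether the hostnames or ports are right for your cluster.

```python
import xml.etree.ElementTree as ET

# The four properties added to mapred-site.xml in this step.
REQUIRED = (
    "mapreduce.jobhistory.address",
    "mapreduce.jobhistory.webapp.address",
    "mapreduce.shuffle.port",
    "mapreduce.framework.name",
)


def check_mapred_site(xml_text):
    """Return a list of problems found; an empty list means the check passed."""
    root = ET.fromstring(xml_text)
    props = {
        p.findtext("name", "").strip(): (p.findtext("value") or "").strip()
        for p in root.iter("property")
    }
    problems = [f"missing {name}" for name in REQUIRED if name not in props]
    if "mapreduce.framework.name" in props and props["mapreduce.framework.name"] != "yarn":
        problems.append("mapreduce.framework.name should be yarn")
    return problems
```
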
  16. For a secure cluster, add the following properties to mapred-site.xml:

    <property>
     <name>mapreduce.jobhistory.principal</name>
     <value>jhs/_PRINCIPAL@$REALM.ACME.COM</value>
     <description>Kerberos principal name for the MapReduce JobHistory Server.
     </description>
    </property>
    
    <property>
     <name>mapreduce.jobhistory.keytab</name>
     <value>/etc/security/keytabs/jhs.service.keytab</value>
     <description>Kerberos keytab file for the MapReduce JobHistory Server.</description>
    </property> 
  17. For a secure cluster, you must also update hadoop.security.auth_to_local in core-site.xml to include a rule for the mapreduce.jobhistory.principal value you set in the previous step:

    RULE:[2:$1@$0](PRINCIPAL@$REALM.ACME.COM)s/.*/mapred/

    where PRINCIPAL and REALM are the Kerberos principal and realm you specified in mapreduce.jobhistory.principal.
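
To make the shape of this rule concrete, the sketch below builds the rule string and then walks a principal through a simplified model of it. The model is an assumption made for illustration and is not Hadoop's implementation: [2:$1@$0] is taken to select two-component principals (name/host@REALM) and rewrite them as name@REALM, the parenthesized pattern filters that result, and s/.*/mapred/ replaces it with the local user mapred. Both function names and the example principals are hypothetical.

```python
import re


def jhs_auth_to_local_rule(principal, realm):
    """Build the rule text for a given short principal name and realm."""
    return f"RULE:[2:$1@$0]({principal}@{realm})s/.*/mapred/"


def apply_rule(full_principal, principal, realm):
    """Simplified model: return the mapped local user, or None if no match."""
    m = re.fullmatch(r"([^/@]+)/([^/@]+)@([^/@]+)", full_principal)
    if m is None:
        return None  # not a two-component principal
    formatted = f"{m.group(1)}@{m.group(3)}"  # corresponds to $1@$0
    # The filter is treated as a regular expression over the formatted name.
    if re.fullmatch(f"{principal}@{realm}", formatted) is None:
        return None
    # s/.*/mapred/ replaces the first match, which is the whole string.
    return re.sub(".*", "mapred", formatted, count=1)
```

With these definitions, a JobHistory Server principal such as jhs/host1.example.com@EXAMPLE.COM maps to mapred, while a principal with a different first component is left for other rules to handle.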

  18. Delete any remaining HDP 1.x properties in the mapred-site.xml file.

  19. Replace the default memory configuration settings in yarn-site.xml and mapred-site.xml with the YARN and MapReduce memory configuration settings you calculated previously.

