1. Meet Minimum System Requirements

To run the Hortonworks Data Platform, your system must meet minimum requirements.

 1.1. Hardware Recommendations

Although there is no single hardware requirement for installing HDP, there are some basic guidelines. You can see sample setups here: Suggested Hardware for a Typical Hadoop Cluster.

 1.2. Operating Systems Requirements

The following operating systems are supported:

  • 64-bit Red Hat Enterprise Linux (RHEL) 5 or 6

  • 64-bit CentOS 5 or 6

  • 64-bit Oracle Linux 5 or 6

  • 64-bit SUSE Linux Enterprise Server (SLES) 11, SP1
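
As a quick check of the 64-bit requirement, the following is a minimal sketch. The function name is hypothetical, and it assumes that `uname -m` reports `x86_64` (or `amd64`) on 64-bit Intel/AMD hosts; it does not verify the distribution or its version.

```shell
# Hypothetical helper: succeeds when the given machine string is 64-bit x86.
is_64bit_arch() {
    case "$1" in
        x86_64|amd64) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the current host.
if is_64bit_arch "$(uname -m)"; then
    echo "64-bit architecture detected"
else
    echo "This host does not report a 64-bit x86 architecture"
fi
```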

 1.3. Software Requirements

On each of your hosts:

  • yum [for RHEL or CentOS]

  • zypper [for SLES]

  • php_curl [for SLES]

  • rpm

  • scp

  • curl

  • wget

  • unzip

  • tar
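
One way to confirm that the tools listed above are present on a host is sketched below. The function name is hypothetical, and the check only covers items that are commands on the PATH; php_curl is a PHP extension rather than a command, so it is not covered here.

```shell
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools from the list above; on SLES, substitute zypper for yum.
missing=$(missing_tools yum rpm scp curl wget unzip tar)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools found"
fi
```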

 1.4. Metastore Database Requirements

If you are installing Hive and HCatalog or installing Oozie, you must install a database to store metadata information in the metastore. You can either use an existing database instance or install a new instance manually. HDP supports the following databases for the metastore:

  • Postgres 8.x, 9.x

  • MySQL 5.x

  • Oracle 11g r2

  • SQL Server 2012, 2014

The database administrator must create the following database users for Hive and/or Oozie:

  • For Hive, ensure that your database administrator creates hive_dbname, hive_dbuser, and hive_dbpasswd.

  • For Oozie, ensure that your database administrator creates oozie_dbname, oozie_dbuser, and oozie_dbpasswd.

Note

By default, Hive uses the Derby database for the metastore. However, Derby is not supported for production systems.
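
The database, user, and password that the administrator creates for each service can be sketched as follows. This example assumes PostgreSQL syntax and uses a hypothetical helper name; the values passed in are the placeholders named above and should be replaced with real ones.

```shell
# Hypothetical helper: print PostgreSQL statements that create a metastore
# database and a user with full privileges on it.
metastore_sql() {
    dbname="$1"; dbuser="$2"; dbpasswd="$3"
    echo "CREATE DATABASE $dbname;"
    echo "CREATE USER $dbuser WITH PASSWORD '$dbpasswd';"
    echo "GRANT ALL PRIVILEGES ON DATABASE $dbname TO $dbuser;"
}

# Placeholder values only; replace them before use.
metastore_sql hive_dbname hive_dbuser hive_dbpasswd
metastore_sql oozie_dbname oozie_dbuser oozie_dbpasswd
```

The printed statements could then be piped to `psql -U postgres` on the database host.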

 1.4.1. Installing and Configuring PostgreSQL

The following instructions explain how to install PostgreSQL as the metastore database. See your third-party documentation for instructions on how to install other supported databases.

To install a new instance of PostgreSQL:

  1. Connect to the host machine where you plan to deploy the PostgreSQL instance and, from a terminal window, type:

    • For RHEL and CentOS:

      yum install postgresql-server
    • For SLES:

      zypper install postgresql-server
    • For Ubuntu:

      apt-get install postgresql
  2. Start the instance. For RHEL and CentOS:

    /etc/init.d/postgresql start
    Note

    For some newer versions of PostgreSQL, you might need to initialize the database before the first start by running the following command:

    /etc/init.d/postgresql initdb
  3. Reconfigure PostgreSQL server:

    1. Edit the /var/lib/pgsql/data/postgresql.conf file and change the value of #listen_addresses = 'localhost' to the following:

      listen_addresses = '*'
    2. Edit the /var/lib/pgsql/data/postgresql.conf file and change the port setting #port = 5432 to the following:

      port = 5432
    3. Edit the /var/lib/pgsql/data/pg_hba.conf and add the following:

      host all all 0.0.0.0/0 trust
    4. Optional - If you are using PostgreSQL v9.1 or later, add the following to the /var/lib/pgsql/data/postgresql.conf file:

      standard_conforming_strings = off

  4. Create users for PostgreSQL server:

    echo "CREATE DATABASE $dbname;" | psql -U postgres
    echo "CREATE USER $user WITH PASSWORD '$passwd';" | psql -U postgres
    echo "GRANT ALL PRIVILEGES ON DATABASE $dbname TO $user;" | psql -U postgres 
    Note

    For access to Hive metastore, create hive_dbuser and for access to Oozie metastore, create oozie_dbuser.

  5. On the Hive Metastore host, install and set up the PostgreSQL JDBC connector.

    1. Install the connector package.

      RHEL/CentOS/Oracle Linux

      yum install postgresql-jdbc*

      SLES

      zypper install -y postgresql-jdbc
    2. Copy the connector .jar file to the Java share directory.

      cp /usr/share/pgsql/postgresql-*.jdbc3.jar /usr/share/java/postgresql-jdbc.jar
    3. Confirm that .jar is in the Java share directory.

      ls /usr/share/java/postgresql-jdbc.jar
    4. Change the access mode of the .jar file to 644.

      chmod 644 /usr/share/java/postgresql-jdbc.jar
  6. Load the Hive schema:

    psql -U $HIVEUSER -d $HIVEDATABASE
    \connect $HIVEDATABASE;
    \i hive-schema-0.13.0.postgres.sql;

    Note

    The Hive schema is located at /usr/lib/hive/scripts/metastore/upgrade/postgres.
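
The postgresql.conf changes in step 3 can also be applied non-interactively. The following is a minimal sketch, not a definitive procedure: the function name is hypothetical, and it assumes each setting appears once at the start of a line, either commented out or not. It is demonstrated on a scratch file rather than on /var/lib/pgsql/data/postgresql.conf.

```shell
# Hypothetical helper: set listen_addresses and port in the
# postgresql.conf-style file given as $1, whether or not the existing
# lines are commented out. Any trailing comment on those lines is dropped.
configure_pg_conf() {
    conf="$1"
    sed -e "s/^#*[[:space:]]*listen_addresses[[:space:]]*=.*/listen_addresses = '*'/" \
        -e "s/^#*[[:space:]]*port[[:space:]]*=.*/port = 5432/" \
        "$conf" > "$conf.tmp" \
        && cat "$conf.tmp" > "$conf" \
        && rm -f "$conf.tmp"
}

# Demonstrate on a scratch file rather than the live configuration.
scratch=$(mktemp)
printf "%s\n" "#listen_addresses = 'localhost'" "#port = 5432" > "$scratch"
configure_pg_conf "$scratch"
cat "$scratch"
rm -f "$scratch"
```

Writing the result back with `cat` rather than `mv` keeps the original file's owner and permissions. Restart PostgreSQL after changing the real file so that the new settings take effect.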

 1.4.2. Installing and Configuring MySQL

The following instructions explain how to install MySQL as the metastore database. See your third-party documentation for instructions on how to install other supported databases.

To install a new instance of MySQL:

  1. Connect to the host machine you plan to use for Hive and HCatalog.

  2. Install MySQL server. From a terminal window, type:

    For RHEL/CentOS/Oracle Linux:

    yum install mysql-server

    For SLES:

    zypper install mysql-server

    For Ubuntu:

    apt-get install mysql-server
  3. Start the instance.

    For RHEL/CentOS/Oracle Linux:

    /etc/init.d/mysqld start 

    For SLES:

    /etc/init.d/mysqld start

    For Ubuntu:

    /etc/init.d/mysql start

  4. Set the root user password using the following command format:

    mysqladmin -u root password $mysqlpassword

    For example, to set the password to "root":

    mysqladmin -u root password root
  5. Remove unnecessary information from log and STDOUT.

    mysqladmin -u root 2>&1 >/dev/null
  6. Now that the root password has been set, you can use the following command to log in to MySQL as root:

    mysql -u root -proot

    As root, create "dbuser" and grant it adequate privileges. This user provides access to the Hive metastore. Use the following series of commands (shown here with the returned responses) to create "dbuser" with password "dbuser".

    [root@c6402 /]# mysql -u root -proot
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 11
    Server version: 5.1.73 Source distribution
    
    Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    mysql> CREATE USER 'dbuser'@'localhost' IDENTIFIED BY 'dbuser';
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'dbuser'@'localhost';
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> CREATE USER 'dbuser'@'%' IDENTIFIED BY 'dbuser';
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'dbuser'@'%';
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> FLUSH PRIVILEGES;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'dbuser'@'localhost' WITH GRANT OPTION;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'dbuser'@'%' WITH GRANT OPTION;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql>
  7. Use the exit command to exit MySQL.

  8. You should now be able to reconnect to the database as "dbuser" using the following command:

    mysql -u dbuser -pdbuser

    After testing the "dbuser" login, use the exit command to exit MySQL.

  9. Install the MySQL connector JAR file.

    • For RHEL/CentOS/Oracle Linux:

      yum install mysql-connector-java*
    • For SLES:

      zypper install mysql-connector-java*
    • For Ubuntu:

      apt-get install mysql-connector-java* 
  10. Load the Hive database schema.

    mysql -u $HIVEUSER -p$HIVEPASSWORD $HIVEDATABASE < hive-schema-0.13.0.mysql.sql
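
The statements entered interactively in step 6 can also be generated by a script. The sketch below uses a hypothetical helper name and folds the separate GRANT statements from that session into one statement per host that includes WITH GRANT OPTION; treat it as an illustration rather than a required procedure.

```shell
# Hypothetical helper: print MySQL statements that create a user for both
# local and remote connections, mirroring the session in step 6.
mysql_user_sql() {
    user="$1"; passwd="$2"
    for host in localhost %; do
        echo "CREATE USER '$user'@'$host' IDENTIFIED BY '$passwd';"
        echo "GRANT ALL PRIVILEGES ON *.* TO '$user'@'$host' WITH GRANT OPTION;"
    done
    echo "FLUSH PRIVILEGES;"
}

# Example values from the text above; choose a stronger password in practice.
mysql_user_sql dbuser dbuser
```

The output could be piped to `mysql -u root -p` instead of being typed at the prompt.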

 1.4.3. Installing and Configuring Oracle

To set up Oracle for use with Hive:

  1. On the Hive Metastore host, install the appropriate JDBC .jar file.

    1. Download the Oracle JDBC (OJDBC) driver from http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html.

    2. Select Oracle Database 11g Release 2 - ojdbc6.jar.

    3. Copy the .jar file to the Java share directory.

      cp ojdbc6.jar /usr/share/java
    4. Make sure the .jar file has the appropriate permissions - 644.

  2. Create a user for Hive and grant it permissions.

    • Using the Oracle database admin utility:

      # sqlplus sys/root as sysdba
      CREATE USER $HIVEUSER IDENTIFIED BY $HIVEPASSWORD;
      GRANT SELECT_CATALOG_ROLE TO $HIVEUSER;
      GRANT CONNECT, RESOURCE TO $HIVEUSER;                            
      QUIT;
    • Where $HIVEUSER is the Hive user name and $HIVEPASSWORD is the Hive user password.

  3. Load the Hive database schema.

    sqlplus $HIVEUSER/$HIVEPASSWORD < hive-schema-0.13.0.oracle.sql

Note

The Hive schema is located at /usr/lib/hive/scripts/metastore/upgrade/oracle/.
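
Step 1 asks for mode 644 on the .jar file but does not show a command. A minimal sketch follows, with a hypothetical helper name; it is demonstrated on a scratch file, and in practice the argument would be /usr/share/java/ojdbc6.jar.

```shell
# Hypothetical helper: set mode 644 on a file and confirm the result.
ensure_mode_644() {
    chmod 644 "$1" || return 1
    # The first ten characters of `ls -l` output are the type and permission bits.
    perms=$(ls -l "$1" | cut -c1-10)
    [ "$perms" = "-rw-r--r--" ]
}

# Demonstrate on a scratch file.
scratch=$(mktemp)
ensure_mode_644 "$scratch" && echo "mode is 644"
rm -f "$scratch"
```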

 1.5. JDK Requirements

Your system must have the correct JDK installed on all the nodes of the cluster. HDP supports the following JDKs.

  • Oracle JDK 1.7 64-bit update 51 or higher

  • Oracle JDK 1.6 update 31 64-bit

    Note

    Deprecated as of HDP 2.1

  • OpenJDK 7 64-bit

 1.5.1. Oracle JDK 7 update 51

Use the following instructions to manually install JDK 7:

  1. Check the version. From a terminal window, type:

    java -version
  2. (Optional) Uninstall the Java package if the JDK version is less than 7.

    rpm -qa | grep java
    yum remove {java-1.*}
  3. (Optional) Verify that the default Java package is uninstalled.

    which java
  4. Navigate to the /usr/java folder. If this folder does not already exist, create the folder:

    mkdir -p /usr/java
    cd /usr/java
  5. Download the Oracle 64-bit JDK (jdk-7u51-linux-x64.tar.gz) from the Oracle download site. Open a web browser and navigate to http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html.

    Accept the license agreement and download the file labeled "jdk-7u51-linux-x64.tar.gz".

    Note

    The label on the download page is "jdk-7u51-linux-x64.tar.gz", but the actual name of the file is "jdk-7u51-linux-x64.gz".

  6. Copy the downloaded jdk-7u51-linux-x64.gz file to the /usr/java folder.

  7. Navigate to the /usr/java folder and extract the jdk-7u51-linux-x64.gz file.

    cd /usr/java
    tar zxvf jdk-7u51-linux-x64.gz

    The JDK files will be extracted into the /usr/java/jdk1.7.0_51 directory.

  8. Create a symbolic link (symlink) to the JDK.

    ln -s /usr/java/jdk1.7.0_51 /usr/java/default
  9. Set the JAVA_HOME and PATH environment variables.

    export JAVA_HOME=/usr/java/default
    export PATH=$JAVA_HOME/bin:$PATH
  10. Verify that Java is installed in your environment by running the following command:

    java -version

    You should see the following output:

    java version "1.7.0_51"
    Java(TM) SE Runtime Environment (build 1.7.0_51-b18)
    Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
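
The version check in steps 1 and 10 can be scripted. The following sketch uses hypothetical helper names and assumes the `1.x` version format shown above (for example, `java version "1.7.0_51"`); version strings in any other format are reported as unrecognized.

```shell
# Hypothetical helper: extract the minor number from a 1.x-style version
# line such as: java version "1.7.0_51"
java_minor_version() {
    echo "$1" | sed -n 's/.*version "1\.\([0-9][0-9]*\)\..*/\1/p'
}

# Hypothetical helper: succeed when the version line reports 1.7 or later.
is_jdk7_or_later() {
    minor=$(java_minor_version "$1")
    [ -n "$minor" ] && [ "$minor" -ge 7 ]
}

# java -version writes to standard error, so redirect it before reading.
if command -v java >/dev/null 2>&1; then
    line=$(java -version 2>&1 | head -n 1)
    if is_jdk7_or_later "$line"; then
        echo "JDK 7 or later found"
    else
        echo "JDK older than 7, or unrecognized version: $line"
    fi
else
    echo "java not found on PATH"
fi
```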

 1.5.2. Oracle JDK 1.6 update 31 (Deprecated)

Use the following instructions to manually install JDK 1.6 update 31:

  1. Check the version. From a terminal window, type:

    java -version
  2. Optional - Uninstall the Java package if the JDK version is less than v1.6 update 31.

    rpm -qa | grep java
    yum remove {java-1.*}
  3. Optional - Verify that the default Java package is uninstalled.

    which java
  4. Download the Oracle 64-bit JDK (jdk-6u31-linux-x64.bin) from the Oracle download site. Open a web browser and navigate to http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase6-419409.html.

    Accept the license agreement and download jdk-6u31-linux-x64.bin to a temporary directory ($JDK_download_directory).

  5. Create the installation directory, change to it, and run the installer from the download location.

    mkdir /usr/jdk1.6.0_31
    cd /usr/jdk1.6.0_31
    chmod u+x $JDK_download_directory/jdk-6u31-linux-x64.bin
    $JDK_download_directory/jdk-6u31-linux-x64.bin
    
  6. Create symbolic links (symlinks) to the JDK.

    mkdir /usr/java
    ln -s /usr/jdk1.6.0_31/jdk1.6.0_31 /usr/java/default
    ln -s /usr/java/default/bin/java /usr/bin/java
    
  7. Define JAVA_HOME and add the Java Virtual Machine and the Java compiler to your path.

    export JAVA_HOME=/usr/java/default
    export PATH=$JAVA_HOME/bin:$PATH
    
  8. Verify that Java is installed in your environment. Execute the following from the command-line console:

    java -version

    You should see the following output:

    java version "1.6.0_31"
    Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
    Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
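
Before relying on JAVA_HOME, it can help to confirm that the symlink from step 6 resolves to a working JDK. The sketch below uses a hypothetical helper name and checks only that the path is a symlink and that its target contains an executable bin/java.

```shell
# Hypothetical helper: succeed when $1 is a symlink whose target directory
# contains an executable bin/java.
check_java_link() {
    [ -L "$1" ] && [ -x "$1/bin/java" ]
}

if check_java_link /usr/java/default; then
    echo "/usr/java/default points to a usable JDK"
else
    echo "/usr/java/default is missing or does not contain bin/java"
fi
```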

 1.5.3. OpenJDK 7

Note

OpenJDK 7 on HDP 2.1 does not work if you are using SLES as your OS.

Use the following instructions to manually install OpenJDK 7:

  1. Check the version. From a terminal window, type:

    java -version
  2. (Optional) Uninstall the Java package if the JDK version is less than 7.

    rpm -qa | grep java
    yum remove {java-1.*}
  3. (Optional) Verify that the default Java package is uninstalled.

    which java
  4. Download and install the OpenJDK 7 RPMs. From the command line, run:

    yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel
  5. Create symbolic links (symlinks) to the JDK.

    mkdir /usr/java
    ln -s /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.51.x86_64 /usr/java/default
    ln -s /usr/java/default/bin/java /usr/bin/java
    
  6. Define JAVA_HOME and add the Java Virtual Machine and the Java compiler to your path.

    export JAVA_HOME=/usr/java/default
    export PATH=$JAVA_HOME/bin:$PATH
  7. Verify that Java is installed in your environment. Execute the following from the command-line console:

    java -version

    You should see output similar to the following:

    openjdk version "1.7.0"
    OpenJDK Runtime Environment (build 1.7.0)
    OpenJDK Client VM (build 20.6-b01, mixed mode)
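
The export commands in step 6 affect only the current shell session. One possible way to make them persistent is to write them to a profile script, sketched below. The helper name is hypothetical, and /etc/profile.d/java.sh is an assumed destination on RHEL-style systems that requires root to write; the example therefore writes to a scratch file instead.

```shell
# Hypothetical helper: write the JAVA_HOME and PATH exports to the file
# named by $1 so that login shells can source them.
write_java_profile() {
    cat > "$1" <<'EOF'
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
EOF
}

# Demonstrate with a scratch file instead of a system location.
scratch=$(mktemp)
write_java_profile "$scratch"
cat "$scratch"
rm -f "$scratch"
```

The quoted here-document delimiter keeps `$JAVA_HOME` and `$PATH` unexpanded, so they are evaluated when the profile script is sourced rather than when it is written.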

 1.6. Virtualization and Cloud Platforms

HDP is certified and supported when running on virtual or cloud platforms (for example, VMware vSphere or Amazon Web Services EC2) as long as the respective guest operating system (OS) is supported by HDP and any issues detected on these platforms are reproducible on the same supported OS installed on bare metal.

See Operating Systems Requirements for the list of supported operating systems for HDP.

