Chapter 15. Installing Apache Sqoop

This section describes installing and testing Apache Sqoop, a component that provides a mechanism for moving data between HDFS and external structured datastores.

Use the following instructions to deploy Apache Sqoop:

 1. Install the Sqoop RPMs

Prerequisites

  1. You must have at least core Hadoop on your system. See Configure the Remote Repositories for more information.

  2. Verify that the HDP repositories are available:

    yum list sqoop

    The output should list at least one Sqoop package similar to the following:

    sqoop.noarch <version>

    If yum responds with "Error: No matching package to list" as shown below, yum cannot locate a matching RPM. This can happen if the repository hosting the HDP RPMs is unavailable or has been disabled. Follow the instructions at Configure the Remote Repositories to configure either a public or a private repository before proceeding.

    Error: No matching package to list.
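If you run this check from a script across many nodes, a small shell function can interpret the yum list sqoop output for you. This is a minimal sketch, not an HDP-provided tool: the function name check_sqoop_repo is hypothetical, and it assumes that a matching package line begins with "sqoop." as in the sample output above.

```shell
# Hypothetical helper (not part of HDP): reads `yum list sqoop` output on
# stdin and reports whether the enabled repositories provide Sqoop.
# Returns 0 if a sqoop package line is found, 1 if yum reported no match,
# and 2 for any other output.
check_sqoop_repo() {
  _yum_out=$(cat)
  # Assumption: package lines start with "sqoop.", e.g. "sqoop.noarch <version>".
  if printf '%s\n' "$_yum_out" | grep -q '^sqoop\.'; then
    echo "ok: Sqoop package available"
    return 0
  fi
  if printf '%s\n' "$_yum_out" | grep -q 'No matching package to list'; then
    echo "error: no Sqoop package; configure the HDP repository first" >&2
    return 1
  fi
  echo "error: unexpected yum output" >&2
  return 2
}

# Usage on a RHEL/CentOS/Oracle Linux node (yum prints errors on stderr):
#   yum list sqoop 2>&1 | check_sqoop_repo
```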

Installation

On all nodes where you plan to use the Sqoop client, install the Sqoop package with your operating system's package manager:

  • For RHEL/CentOS/Oracle Linux:

    yum install sqoop

  • For SLES:

    zypper install sqoop

  • For Ubuntu:

    apt-get install sqoop
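To drive the installation from one script on nodes running different operating systems, you can select the command from the ID field of /etc/os-release. The following is a minimal sketch that assumes each node provides /etc/os-release and reports one of the ID values rhel, centos, ol (Oracle Linux), sles, or ubuntu; the function name sqoop_install_cmd is hypothetical.

```shell
# Hypothetical helper (not part of HDP): print the install command from this
# section for a given /etc/os-release ID value; fail for unknown IDs.
sqoop_install_cmd() {
  case "$1" in
    rhel|centos|ol) echo "yum install sqoop" ;;
    sles)           echo "zypper install sqoop" ;;
    ubuntu)         echo "apt-get install sqoop" ;;
    *)              echo "unsupported distribution: $1" >&2; return 1 ;;
  esac
}

# Usage (as root on each Sqoop client node):
#   . /etc/os-release
#   cmd=$(sqoop_install_cmd "$ID") && $cmd   # $cmd is left unquoted on purpose
```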

 2. Set Up the Sqoop Configuration

Use the following instructions to set up and edit the Sqoop deployment configuration files:

  1. Hortonworks recommends that you edit and source the bash script files included in the companion files (see "Download Companion Files"). Alternatively, you can copy their contents to your ~/.bash_profile file to set up these environment variables in your environment.

  2. Extract the Sqoop configuration files to a temporary directory. The files are located in the configuration_files/sqoop directory where you decompressed the companion files.

  3. Modify the configuration files.

    In the temporary directory, locate the Sqoop configuration files and modify the properties based on your environment.

    To find the properties to replace, search for TODO in the files.

    Also in sqoop-env.sh, make the following changes:

    • export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/current/hadoop-client}

    • export HBASE_HOME=${HBASE_HOME:-/usr/hdp/current/hbase-client}

    • export HIVE_HOME=${HIVE_HOME:-/usr/hdp/current/hive-server}

    • export ZOOCFGDIR=${ZOOCFGDIR:-/etc/zookeeper/conf}

  4. Copy all the configuration files from the temporary directory to the Sqoop configuration directory, such as /etc/sqoop/conf.
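The configuration steps above can be sketched as three shell functions. This is an illustration under stated assumptions, not a Hortonworks-provided tool: the function names are hypothetical, COMPANION_DIR is an assumed variable holding the directory where you decompressed the companion files, and apply_sqoop_env appends the exports to the end of sqoop-env.sh rather than editing existing lines. You still resolve each TODO property by hand; install_sqoop_conf refuses to copy while any TODO marker remains.

```shell
# Hypothetical helpers (not part of HDP) for the configuration steps above.

# Copy configuration_files/sqoop from the companion files directory ($1)
# into a new temporary directory and print that directory's path.
stage_sqoop_conf() {
  _stage=$(mktemp -d) || return 1
  cp -R "$1/configuration_files/sqoop/." "$_stage/" || return 1
  printf '%s\n' "$_stage"
}

# Append the sqoop-env.sh exports from this section to $1/sqoop-env.sh,
# skipping any line that is already present verbatim.
apply_sqoop_env() {
  for _line in \
    'export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/current/hadoop-client}' \
    'export HBASE_HOME=${HBASE_HOME:-/usr/hdp/current/hbase-client}' \
    'export HIVE_HOME=${HIVE_HOME:-/usr/hdp/current/hive-server}' \
    'export ZOOCFGDIR=${ZOOCFGDIR:-/etc/zookeeper/conf}'
  do
    grep -qxF "$_line" "$1/sqoop-env.sh" 2>/dev/null \
      || printf '%s\n' "$_line" >> "$1/sqoop-env.sh"
  done
}

# Copy the staged files ($1) into the Sqoop configuration directory ($2),
# but only after every TODO marker has been resolved.
install_sqoop_conf() {
  if grep -rq 'TODO' "$1"; then
    echo "unresolved TODO markers remain in $1:" >&2
    grep -rn 'TODO' "$1" >&2
    return 1
  fi
  mkdir -p "$2" && cp -R "$1/." "$2/"
}

# Usage (as root; COMPANION_DIR is an assumed variable):
#   tmp=$(stage_sqoop_conf "$COMPANION_DIR")
#   apply_sqoop_env "$tmp"
#   grep -rn TODO "$tmp"    # edit each reported property for your cluster
#   install_sqoop_conf "$tmp" /etc/sqoop/conf
```

Because each export uses the ${VAR:-default} form, a value already set and non-empty in the environment (for example, from the sourced companion scripts) takes precedence over the default path.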

 3. Validate the Installation

Run the following command. You should see the Sqoop version information displayed.

sqoop version | grep 'Sqoop [0-9].*'
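When validating many nodes from a script, you may want the version string itself rather than the whole matching line. The following minimal sketch assumes, like the grep pattern above, that sqoop version prints a line of the form "Sqoop <version>", and additionally assumes that this line starts at the beginning of a line. The function name is hypothetical.

```shell
# Hypothetical helper (not part of Sqoop): read `sqoop version` output on
# stdin and print the version from the first line that starts with
# "Sqoop <digit>"; return 1 if no such line is found.
sqoop_version_from_output() {
  _ver=$(sed -n 's/^Sqoop \([0-9][^ ]*\).*/\1/p' | head -n 1)
  [ -n "$_ver" ] || return 1
  printf '%s\n' "$_ver"
}

# Usage:
#   sqoop version 2>/dev/null | sqoop_version_from_output \
#     || echo "Sqoop did not report a version" >&2
```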

