Migrating data

Create a Sqoop import command

Using Apache Sqoop, you create a single import command that imports data from diverse data sources, such as a relational database, into Hive.

You enter the Sqoop import command on the command line of your Hive cluster to import data from a data source into HDFS and Hive. The import command typically includes the following information (a combined sketch appears after this list):
  • Database connection information: database URI, database name, and connection protocol, such as jdbc:mysql:
  • The data to import
  • Parallel processing directives for performant data transfer
  • Destination for imported data
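  For example, the following is a minimal sketch of a complete import command that combines these elements; the database bar, table EMPLOYEES, and Hive table mydb.employees are placeholder names for your own environment:
  sqoop import --connect jdbc:mysql://db.foo.com:3306/bar \
  --username <username> \
  -P \
  --table EMPLOYEES \
  --num-mappers 8 \
  --hive-import \
  --hive-table mydb.employees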
  1. Create an import command that specifies the Sqoop connection to the RDBMS.
    • To be prompted for the data source password when you run the command, use the -P option.
    • To specify a file where the password is stored, use the --password-file option.
    Password on command line:
    sqoop import --connect jdbc:mysql://db.foo.com:3306/bar \
    <data to import> \
    --username <username> \
    -P
    Specify password file:
    sqoop import --connect jdbc:mysql://db.foo.com:3306/bar \
    --table EMPLOYEES \
    --username <username> \
    --password-file ${user.home}/.password 
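    If you use a password file, restrict access to it. A minimal sketch, assuming the password is stored in a .password file in your home directory (adjust the path to your setup; depending on configuration, Sqoop may resolve the path against HDFS rather than the local filesystem):
    # "mysecret" is a placeholder; write it without a trailing newline and restrict the file permissions
    echo -n "mysecret" > ${HOME}/.password
    chmod 400 ${HOME}/.password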
  2. Specify the data to import in the command.
    • Import an entire table.
    • Import a subset of the columns.
    • Import a subset of the rows using a WHERE clause, or use a free-form query (see the sketch after these examples).
    Entire table:
    sqoop import \
    --connect jdbc:mysql://db.foo.com:3306/bar \
    --table EMPLOYEES
    Subset of columns:
    sqoop import \
    --connect jdbc:mysql://db.foo.com:3306/bar \
    --table EMPLOYEES \
    --columns "employee_id,first_name,last_name,job_title"
    Subset of rows (WHERE clause) to import the latest data:
    sqoop import \
    --connect jdbc:mysql://db.foo.com:3306/bar \
    --table EMPLOYEES \
    --where "start_date > '2018-01-01'"
  3. Optionally, specify write parallelism in the import command to run multiple map tasks in parallel:
    • Set mappers: If the source table has a primary key, explicitly set the number of mappers using --num-mappers.
    • Split by: If primary key values are not evenly distributed, provide a split key using --split-by.
    • Sequential: If the table has no primary key or suitable split key, import the data sequentially by specifying --num-mappers 1 or --autoreset-to-one-mapper (see the sketch after this list).
    • Set mappers:
      sqoop import --connect jdbc:mysql://db.foo.com:3306/bar \
       --table EMPLOYEES \
       --num-mappers 8
    • Split by:
      sqoop import --connect jdbc:mysql://db.foo.com:3306/bar \
       --table EMPLOYEES \
       --split-by dept_id
    • Setting mappers evenly splits the primary key range of the source table.
    • Split by evenly splits the data using the split key instead of a primary key.
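    For the sequential case, a minimal sketch that imports the table with a single mapper:
      sqoop import --connect jdbc:mysql://db.foo.com:3306/bar \
       --table EMPLOYEES \
       --num-mappers 1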
  4. Specify that the data is to be imported into Hive, using the Hive default delimiters, by adding the --hive-import option.
  5. Specify the Hive destination of the data.
    • If the table does not already exist in Hive, name the table using --hive-table <db>.<table_name> and use the --create-hive-table option.
    • If you want to insert the imported data into an existing Hive external table, name the table using --hive-table <db>.<table_name>. Do not use the --create-hive-table option.
    This command imports the MySQL EMPLOYEES table into a new Hive table named mydb.newtable, stored in the default HDFS warehouse location /user/hive/warehouse.
    sqoop import --connect jdbc:mysql://db.foo.com:3306/corp \
    --table EMPLOYEES \
    --hive-import \
    --create-hive-table \
    --hive-table mydb.newtable
    This command imports the MySQL EMPLOYEES table into an existing Hive external table.
    sqoop import --connect jdbc:mysql://db.foo.com:3306/corp \
    --table EMPLOYEES \
    --hive-import \
    --hive-table mydb.myexternaltable
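    After the import completes, you can spot-check the new table from Beeline, for example (the connection URL and table name are placeholders for your environment):
    beeline -u jdbc:hive2://localhost:10000 -e "SELECT COUNT(*) FROM mydb.newtable;"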