Chapter 6. Using Apache Sqoop

Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), and then to export the data back into the RDBMS once it has been transformed by MapReduce processing. Sqoop automates most of this process, relying on the database to describe the schema of the data to be imported. Sqoop uses MapReduce to perform the import and export, which provides parallel operation as well as fault tolerance.
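The import and export round trip described above can be sketched as a pair of Sqoop command lines. The following is a minimal illustration rather than a recommended setup: it only assembles the argument lists and prints them, without running Sqoop. The helper function names, the host db.example.com, the database sales, the tables orders and order_totals, the user etl_user, and the HDFS paths are all hypothetical placeholders. The flags themselves (--connect, --username, -P, --table, --target-dir, --num-mappers, --export-dir) are standard Sqoop arguments.

```python
import shlex


def sqoop_import_args(jdbc_url, username, table, target_dir, num_mappers=4):
    """Build the argument list for a `sqoop import` run (nothing is executed)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC connection string of the source RDBMS
        "--username", username,
        "-P",                               # prompt for the password on the console
        "--table", table,                   # Sqoop reads the schema from this table
        "--target-dir", target_dir,         # HDFS directory that receives the data
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


def sqoop_export_args(jdbc_url, username, table, export_dir):
    """Build the argument list for a `sqoop export` run (nothing is executed)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "-P",
        "--table", table,                   # destination table; it must already exist
        "--export-dir", export_dir,         # HDFS directory holding the transformed data
    ]


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    url = "jdbc:mysql://db.example.com/sales"
    print(shlex.join(sqoop_import_args(url, "etl_user", "orders", "/data/sales/orders")))
    print(shlex.join(sqoop_export_args(url, "etl_user", "order_totals", "/data/sales/order_totals")))
```

Each printed line is a command that could be pasted into a shell on a host where Sqoop and the matching JDBC driver are installed. The number of map tasks is the main control over how much of the transfer runs in parallel.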

This document includes information on:

For additional information, see the Sqoop User Guide.