Chapter 1. HBase Import Tools

HBase provides several methods for loading data into tables, including methods for loading data from a relational format into HBase's non-relational format.

The most straightforward methods are to use the TableOutputFormat class from a MapReduce job or to use the normal client APIs. However, these are not always the most efficient methods, because these APIs cannot perform bulk loading.

Bulk importing bypasses the HBase API and writes content, properly formatted as HBase data files (HFiles), directly to the file system. Analyzing HBase data with MapReduce requires custom coding.

Bulk loading uses less CPU and fewer network resources than the HBase API. ImportTsv is a custom MapReduce application that loads data in tab-separated value (TSV) format into HBase.
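
To make the TSV input concrete, the following Python sketch shows how one tab-separated line could map to a row key and a set of column values. It follows the shape of ImportTsv's importtsv.columns option, where HBASE_ROW_KEY marks the field used as the row key and the other entries name family:qualifier columns. The function name parse_tsv_line and the sample column names are hypothetical; this only illustrates the mapping and is not ImportTsv's actual implementation.

```python
ROW_KEY_MARKER = "HBASE_ROW_KEY"  # marker used in the importtsv.columns option


def parse_tsv_line(line, columns, separator="\t"):
    """Map one delimited line to (row_key, {column: value}).

    `columns` names one entry per field, in order, and must contain the
    row-key marker exactly once. Hypothetical helper, for illustration only.
    """
    if columns.count(ROW_KEY_MARKER) != 1:
        raise ValueError("columns must contain HBASE_ROW_KEY exactly once")
    fields = line.rstrip("\n").split(separator)
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")

    row_key = None
    cells = {}
    for name, value in zip(columns, fields):
        if name == ROW_KEY_MARKER:
            row_key = value      # this field becomes the row key
        else:
            cells[name] = value  # every other field becomes a cell value
    return row_key, cells


if __name__ == "__main__":
    # Sample column names (info:name, info:age) are assumptions.
    key, cells = parse_tsv_line(
        "row1\tAlice\t30\n", ["HBASE_ROW_KEY", "info:name", "info:age"]
    )
    print(key, cells)
```

A line whose field count does not match the column list is rejected here rather than silently truncated, which keeps malformed input visible.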

The following are typical use cases for bulk loading data into HBase:

  • HBase can act as an ETL data sink

  • HBase can be used as a data source

Bulk load workflows generate HFiles offline and have two distinct stages:

  • Use either ImportTsv or the import utilities, or write a custom application to generate HFiles from Hive/Pig.

  • Use completebulkload to load the generated HFiles into the HBase table.
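
The two stages can be sketched as a pair of command lines. The Python helpers below only build the argument lists and do not run anything. The importtsv.columns and importtsv.bulk.output options and the class names come from stock Apache HBase, but the class behind completebulkload may differ between HBase versions, so treat the names as assumptions to check against your installation. The table name, paths, and helper function names are hypothetical.

```python
import shlex

IMPORT_TSV_CLASS = "org.apache.hadoop.hbase.mapreduce.ImportTsv"
# Class behind completebulkload; the package may differ between HBase versions.
LOAD_HFILES_CLASS = "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles"


def import_tsv_command(table, input_dir, columns, hfile_dir):
    """Stage 1: have ImportTsv write HFiles to hfile_dir instead of the table."""
    return [
        "hbase", IMPORT_TSV_CLASS,
        "-Dimporttsv.columns=" + ",".join(columns),
        "-Dimporttsv.bulk.output=" + hfile_dir,
        table, input_dir,
    ]


def complete_bulk_load_command(table, hfile_dir):
    """Stage 2: hand the generated HFiles over to the table."""
    return ["hbase", LOAD_HFILES_CLASS, hfile_dir, table]


if __name__ == "__main__":
    # Hypothetical table name and paths.
    stages = [
        import_tsv_command("mytable", "/data/in",
                           ["HBASE_ROW_KEY", "cf:a"], "/tmp/hfiles"),
        complete_bulk_load_command("mytable", "/tmp/hfiles"),
    ]
    for cmd in stages:
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the commands as lists rather than strings avoids quoting problems if they are later passed to subprocess.run.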

Note

By default, the bulk loader class ImportTsv in HBase imports tab-separated files.
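
Because tab is the default separator, rows exported from a relational source need to be written as tab-separated text before ImportTsv can read them. The following Python sketch shows one way to do that. The function name rows_to_tsv and the sample rows are hypothetical, and the check that rejects embedded tabs and newlines is a design choice of this sketch, not a documented ImportTsv requirement.

```python
def rows_to_tsv(rows):
    """Serialize an iterable of row tuples to tab-separated text.

    Values are converted with str(). A value that contains a tab or newline
    would break the one-line-per-row layout, so it is rejected.
    """
    lines = []
    for row in rows:
        fields = [str(value) for value in row]
        for field in fields:
            if "\t" in field or "\n" in field:
                raise ValueError(f"field contains a tab or newline: {field!r}")
        lines.append("\t".join(fields))
    # One line per row, each terminated by a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical rows, for example fetched from a relational table.
    sample = [("row1", "Alice", 30), ("row2", "Bob", 25)]
    print(rows_to_tsv(sample), end="")
```

For input that uses a different delimiter, ImportTsv also accepts an importtsv.separator option, so converting to tabs is only one of the available approaches.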