Using Apache HBase to store and access data

HBase Hive integration example

A change to Hive in HDP 3.0 is that every table backed by a StorageHandler must be declared “external”; a StorageHandler can no longer create a non-external table. If the corresponding HBase table already exists when the Hive table is created, the Hive table follows the HDP 2.x semantics of an “external” table. If the corresponding HBase table does not exist when the Hive table is created, the Hive table follows the HDP 2.x semantics of a non-external table (for example, the HBase table is dropped when the Hive table is dropped).
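
The first case can be illustrated with a minimal sketch in which the HBase table already exists before the Hive table is created. The names hive_over_existing and existing_hbase_table are hypothetical, and the sketch assumes existing_hbase_table has a column family named cf1.

```sql
-- Hypothetical names; assumes the HBase table existing_hbase_table
-- (with column family cf1) was created beforehand, outside of Hive.
CREATE EXTERNAL TABLE hive_over_existing (key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "existing_hbase_table");

-- Because the HBase table pre-existed, the rule described above implies
-- that this drop removes only the Hive definition and leaves
-- existing_hbase_table in place.
DROP TABLE hive_over_existing;
```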

  1. From the Hive shell, create an HBase table:
    CREATE EXTERNAL TABLE hbase_hive_table (key int, value string) 
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    TBLPROPERTIES ("hbase.table.name" = "hbase_hive_table", "hbase.mapred.output.outputtable" = "hbase_hive_table");

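    The hbase.columns.mapping property is mandatory. The hbase.table.name property is optional; if it is omitted, the HBase table takes the same name as the Hive table. The hbase.mapred.output.outputtable property is also optional, but it is needed if you plan to insert data into the table.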

  2. From the HBase shell, access the hbase_hive_table:
    $ hbase shell
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Version: 0.20.3, r902334, Mon Jan 25 13:13:08 PST 2010
    hbase(main):001:0> list "hbase_hive_table"
    1 row(s) in 0.0530 seconds
    hbase(main):002:0> describe "hbase_hive_table"
    Table hbase_hive_table is ENABLED
    hbase_hive_table
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
    1 row(s) in 0.2860 seconds
    hbase(main):003:0> scan "hbase_hive_table"
    ROW                          COLUMN+CELL                                                                      
    0 row(s) in 0.0060 seconds
  3. Insert data into the HBase table through Hive by running an INSERT statement against hbase_hive_table.
  4. From the HBase shell, verify that the data was loaded:
    hbase(main):009:0> scan "hbase_hive_table"
    ROW                        COLUMN+CELL                                                                      
     98                          column=cf1:val, timestamp=1267737987733, value=val_98                            
    1 row(s) in 0.0110 seconds
  5. From Hive, query hbase_hive_table to view the inserted data:
    hive> select * from HBASE_HIVE_TABLE;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    98	val_98
    Time taken: 4.582 seconds
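
Step 3 does not show a statement for the insert. The following is a minimal sketch, not taken from the original procedure, of one way to write the single row that appears in the scan and query output above. The literal values are copied from that output; the use of INSERT INTO ... VALUES is an assumption and presumes a Hive version that supports it for tables backed by HBaseStorageHandler.

```sql
-- Hypothetical statement for step 3; values match the output shown in
-- steps 4 and 5 (row key 98, cf1:val = val_98).
-- The first Hive column maps to the HBase row key (:key) and the second
-- to cf1:val, as declared in hbase.columns.mapping.
INSERT INTO TABLE hbase_hive_table VALUES (98, 'val_98');
```

Writes through Hive go to the HBase table named by hbase.mapred.output.outputtable, which is why step 1 sets that property.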