Using Druid and Apache Hive
Also available as:
PDF

Transform Apache Hive Data to Druid

As a DBA, you can execute a Hive query to transform data in Hive to a datasource in Druid.

A DBA runs a Hive SQL command invoking the Druid storage handler, specifies the Druid segment granularity, and maps selected Hive columns to Druid column types.

  1. Put all the Hive data to undergo ETL in an external table.
  2. Run a CREATE TABLE AS SELECT statement to create a new Druid datasource.
    The following example pushes Hive data to Druid.
    CREATE TABLE ssb_druid_hive
    STORED BY 'org.apache.hadoop.hive.
    druid.DruidStorageHandler'
    TBLPROPERTIES (
    "druid.segment.granularity" = "MONTH",
    "druid.query.granularity" = "DAY")
    AS
    SELECT
    cast(d_year || '-' || d_monthnuminyear || '-' || d_daynuminmonth as timestamp) as `__time`,
    cast(c_city as string) c_city,
    cast(c_nation as string) c_nation,
    cast(c_region as string) c_region,
    cast(d_weeknuminyear as string) d_weeknuminyear,
    cast(d_year as string) d_year,
    cast(d_yearmonth as string) d_yearmonth,
    cast(d_yearmonthnum as string) d_yearmonthnum,
    cast(lo_discount as string) lo_discount,
    cast(lo_quantity as string) lo_quantity,
    cast(p_brand1 as string) p_brand1,
    cast(p_category as string) p_category,
    cast(p_mfgr as string) p_mfgr,
    cast(s_city as string) s_city,
    cast(s_nation as string) s_nation,
    cast(s_region as string) s_region,
    lo_revenue,
    lo_extendedprice * lo_discount discounted_price,
    lo_revenue - lo_supplycost net_revenue
    FROM
    ssb_10_flat_orc.customer, ssb_10_flat_orc.dates, ssb_10_flat_orc.lineorder,
    ssb_10_flat_orc.part, ssb_10_flat_orc.supplier
    where
    lo_orderdate = d_datekey and lo_partkey = p_partkey
    and lo_suppkey = s_suppkey and lo_custkey = c_custkey;