Integrating Apache Hive with Kafka, Spark, and BI
Also available as:
PDF

Query a SQL data source using the JdbcStorageHandler

Using the JdbcStorageHandler, you can connect Hive to a MySQL, PostgreSQL, Oracle, MS SQL, or Derby data source, create an external table to represent the data, and then query the table.

In this task you create an external table that uses the JdbcStorageHandler to connect to and read a local JDBC data source.

  1. Load data into a supported SQL database, such as MySQL, on a node in your cluster or familiarize yourself with existing data in the your database.
  2. Obtain credentials for accessing the database.
    • If you are accessing data on your network and have no security worries, you can use the user name and password that authorizes your to access Hive.
    • If you are accessing data on a remote network, create a JCEKS credential keystore on HDFS, and use credentials you specify in the process.
  3. Create an external table using the JdbcStorageHandler and table properties that specify the minimum information: database type, driver, database connection string, user name and password for querying hive, table name, and number of active connections to Hive.
    CREATE EXTERNAL TABLE mytable_jdbc(
      col1 string,
      col2 int,
      col3 double
    )
    STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
    TBLPROPERTIES (
      "hive.sql.database.type" = "MYSQL",
      "hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",
      "hive.sql.jdbc.url" = "jdbc:mysql://localhost/sample",
      "hive.sql.dbcp.username" = "hive",
      "hive.sql.dbcp.password" = "hive",
      "hive.sql.table" = "MYTABLE",
      "hive.sql.dbcp.maxActive" = "1"
    );
  4. Query the external table.
    SELECT * FROM mytable_jdbc WHERE col2 = 19;