Integrating Hive and Kafka

Apache Hive-Kafka integration

As a Hive user, you can connect to, analyze, and transform data in Kafka from Hive. You can offload data from Kafka to the Hive warehouse.

You connect to Kafka data from Hive by creating an external table that maps to a Kafka topic. The table definition references a Kafka storage handler, which makes the connection to Kafka. On the external table, Hive-Kafka integration supports ad hoc queries, such as questions about how data in the stream changed within a recent time interval. You can transform Kafka data in the following ways:
  • Perform data masking.
  • Join the stream with dimension tables or with any other stream.
  • Aggregate data.
  • Change the SerDe encoding of the original stream.
  • Create a persistent stream in a Kafka topic.
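
The external-table mapping and a time-bounded ad hoc query can be sketched in HiveQL as follows. This is a minimal sketch, not a definitive setup: the table name kafka_events, the columns, the topic name, and the broker address are hypothetical placeholders, and the sketch assumes the topic carries JSON records (the default SerDe). The storage handler class, the kafka.topic and kafka.bootstrap.servers table properties, and the __timestamp metadata column (milliseconds since the epoch) follow the upstream Apache Hive Kafka storage handler; confirm them against your Hive version.

```sql
-- Hypothetical table, columns, topic, and broker address.
-- Each column is expected to match a field in the JSON records of the topic.
CREATE EXTERNAL TABLE kafka_events (
  `user_name` STRING,
  `page`      STRING,
  `added`     INT,
  `deleted`   INT
)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES (
  "kafka.topic" = "events-topic",
  "kafka.bootstrap.servers" = "localhost:9092"
);

-- Ad hoc query: count the records that arrived in the last 10 minutes.
-- `__timestamp` is a metadata column exposed by the storage handler.
SELECT COUNT(*)
FROM kafka_events
WHERE `__timestamp` > 1000 * to_unix_timestamp(CURRENT_TIMESTAMP - INTERVAL '10' MINUTES);
```

Filtering on the record timestamp keeps the query bounded to a recent window instead of scanning the whole topic.
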
You can achieve exactly-once offloading of data by controlling its position in the stream.

The Hive-Kafka connector supports the following serialization and deserialization formats:
  • JsonSerDe (default)
  • OpenCSVSerde
  • AvroSerDe
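
As an illustration of selecting one of these formats, the following sketch switches a Kafka-backed table from the default JSON SerDe to Avro. It assumes an existing external table named kafka_events (a hypothetical name) and assumes that the connector reads the SerDe class from the kafka.serde.class table property, as the upstream Apache Hive Kafka storage handler does; verify the property name against your distribution before relying on it.

```sql
-- Hypothetical table name; the SerDe class is the standard Hive Avro SerDe.
ALTER TABLE kafka_events
  SET TBLPROPERTIES (
    "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe"
  );
```

The records in the topic must already be encoded in the chosen format; changing the property only changes how Hive interprets them.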