Developing Apache Spark Applications

HBase Data on Spark with Connectors

This section provides information on streaming HBase data into Spark using connectors.

Software connectors are architectural elements in the cluster that facilitate interaction between different Hadoop components. For real-time and near-real-time data analytics, connectors bridge the gap between the HBase key-value store and the complex relational SQL queries that Spark supports. Connectors let developers enrich applications and interactive tools with operations such as running complex SQL queries on top of an HBase table inside Spark and joining that table against DataFrames.

Important

The HDP bundle includes two different connectors that extract datasets out of HBase and stream them into Spark:

  • Hortonworks Spark-HBase Connector

  • RDD-Based Spark-HBase Connector: a connector from Apache HBase that uses resilient distributed datasets (RDDs)
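To make "SQL queries on top of an HBase table" concrete: the Hortonworks Spark-HBase Connector (SHC) maps an HBase table to a DataFrame schema through a JSON catalog that names the table, the row key, and the column family and qualifier behind each DataFrame column. The sketch below builds such a catalog with the Python standard library only. The helper `build_shc_catalog`, the table `contacts`, and its columns are hypothetical, and the catalog layout follows the format described in the SHC project documentation; verify it against the connector version shipped with your HDP release.

```python
import json


def build_shc_catalog(namespace, table, rowkey_col, columns):
    """Return a JSON catalog string mapping an HBase table to a DataFrame schema.

    Hypothetical helper. `columns` maps each DataFrame column name to a
    (column_family, qualifier, type) tuple. The row key is mapped using
    the special column family "rowkey", as in the SHC catalog format.
    """
    cols = {rowkey_col: {"cf": "rowkey", "col": "key", "type": "string"}}
    for name, (cf, qualifier, col_type) in columns.items():
        cols[name] = {"cf": cf, "col": qualifier, "type": col_type}
    catalog = {
        "table": {"namespace": namespace, "name": table},
        "rowkey": "key",
        "columns": cols,
    }
    return json.dumps(catalog, indent=2)


# Hypothetical HBase table "contacts" with one column family, "personal".
catalog = build_shc_catalog(
    namespace="default",
    table="contacts",
    rowkey_col="id",
    columns={
        "name": ("personal", "name", "string"),
        "city": ("personal", "city", "string"),
    },
)
print(catalog)
```

With SHC on the Spark classpath, a catalog like this is typically passed to the DataFrame reader, for example `spark.read.options(catalog=catalog).format("org.apache.spark.sql.execution.datasources.hbase").load()`. The result is an ordinary DataFrame, so it can be registered as a temporary view for SQL queries or joined with other DataFrames. Keeping the catalog in one generated string means the HBase-to-schema mapping is defined in a single place rather than repeated across jobs.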