Using Apache Storm to Move Data

Ingesting Data from Kafka

KafkaSpout reads from Kafka topics. To do so, it must connect to the Kafka broker, locate the topic from which it will read, and store consumer offset information (using the ZooKeeper root and consumer group ID). If a failure occurs, KafkaSpout can use the stored offset to resume reading messages from the point where the operation failed.

The storm-kafka component includes a core Storm spout and a fully transactional Trident spout. These spouts provide the following key features:

  • "Exactly-once" tuple processing with the Trident API

  • Dynamic discovery of Kafka brokers and partitions

You should use the Trident API unless your application requires sub-second latency.
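The wiring described above can be sketched as follows. This is a minimal illustration using the core (non-Trident) storm-kafka spout; the ZooKeeper connect string, topic name, offset root path, and consumer group ID are placeholder values that you would replace with your own.

```java
import org.apache.storm.kafka.BrokerHosts;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.TopologyBuilder;

public class KafkaIngestTopology {
    public static void main(String[] args) {
        // ZooKeeper connect string used to discover Kafka brokers
        // (placeholder host; replace with your ZooKeeper quorum)
        BrokerHosts hosts = new ZkHosts("zkhost:2181");

        // Topic to read, ZooKeeper root for storing consumer offsets,
        // and consumer group ID (all illustrative values)
        SpoutConfig spoutConfig =
                new SpoutConfig(hosts, "my-topic", "/kafka-offsets", "my-consumer-group");

        // Deserialize Kafka message payloads as strings
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
        // Downstream bolts would attach here, e.g.
        // builder.setBolt("parse", new ParseBolt()).shuffleGrouping("kafka-spout");
    }
}
```

Because the offsets are kept under the ZooKeeper root path keyed by the consumer group ID, restarting the topology with the same root and group ID lets the spout resume from the last committed offset rather than re-reading the topic from the beginning.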