Using Apache Storm

Chapter 1. Using Apache Storm

The exponential increase in data from real-time sources such as machine sensors creates a need for data processing systems that can ingest this data, process it, and respond in real time. A typical use case is an automated system that responds to sensor data by sending email to support staff or placing an advertisement on a consumer's smartphone. Apache Storm enables such data-driven, automated activity by providing a real-time, scalable, and distributed solution for streaming data.

Apache Storm can be used with any programming language and guarantees that data streams are processed without data loss.

Storm is datatype-agnostic; it processes data streams of any data type.

A complete introduction to the Storm API is beyond the scope of this documentation. However, the next section, Basic Storm Concepts, provides a brief overview of the most essential concepts and a link to the javadoc API documentation. For a more thorough discussion of Apache Storm concepts, see the Apache Storm documentation for your version of Storm.

Experienced Storm developers may want to skip to later sections for information about streaming data to Hive; ingesting data with the Apache Kafka spout; writing data to HDFS, HBase, and Kafka; and deploying Storm topologies.

The last section, RollingTopWords Topology, lists the source code for a sample application included with storm-starter.jar.