Chapter 2. Using Apache Storm

The exponential increase in realtime data from sources such as machine sensors creates a need for data processing systems that can ingest this data, process it, and respond in real time. In a typical use case, an automated system responds to machine sensor data by sending email to support staff or placing an advertisement on a consumer's smartphone. Apache Storm enables such data-driven, automated activity by providing a realtime, scalable, distributed solution for processing streaming data. Storm can be used with any programming language and guarantees that data streams are processed without data loss. Storm is also datatype-agnostic: it processes data streams of any data type.

A complete introduction to the Storm API is beyond the scope of this documentation. However, the next section, Basic Storm Concepts, provides a brief overview of the most essential concepts and a link to the Javadoc API. Experienced Storm developers may want to skip to the sections that follow it, Ingesting Data with the KafkaSpout Storm Connector and Writing Data to HDFS and HBase with Storm Connectors, which describe the connectors provided by Hortonworks for ingesting streaming data and writing it directly to HDFS and HBase. Managing Storm Topologies introduces the Storm GUI for managing the topologies on a cluster. Finally, Running the RollingTopWords Topology shows the source code for a sample application included in storm-starter.jar.

Tip

See the Storm documentation at the Storm incubator site for a more thorough discussion of Apache Storm concepts.