The exponential increase in real-time data from sources such as machine sensors creates a need for data processing systems that can ingest this data, process it, and respond in real time. A typical use case involves an automated system that responds to machine sensor data by sending email to support staff or placing an advertisement on a consumer's smartphone. Apache Storm enables such data-driven, automated activity by providing a real-time, scalable, and distributed solution for streaming data. Apache Storm can be used with any programming language and guarantees that data streams are processed without data loss. Storm is also datatype-agnostic; it processes data streams of any type.
A complete introduction to the Storm API is beyond the scope of this documentation. However, the next section, Basic Storm Concepts, provides a brief overview of the most essential concepts and a link to the Javadoc API. Experienced Storm developers may want to skip to the following sections, Ingesting Data with the KafkaSpout Storm Connector and Writing Data to HDFS and HBase with Storm Connectors, to learn about the group of connectors provided by Hortonworks that facilitate ingesting and writing streaming data directly to HDFS and HBase. Managing Storm Topologies introduces readers to using the Storm GUI to manage topologies for a cluster. Finally, Running the RollingTopWords Topology shows the source code for a sample application included with the storm-starter.jar.
Tip: See the Storm documentation at the Storm incubator site for a more thorough discussion of Apache Storm concepts.