Storm programming model

Storm is a distributed real-time computation system, and its programming model centers on the concept of a Topology. Unlike a batch job (for example, a MapReduce job), which finishes once its input has been processed, a Storm Topology runs indefinitely until explicitly killed, processing data streams as they arrive^[600-developer__big-data__Storm__storm-01.md].

Core Data Structures

The model relies on two primary data abstractions to handle information flow^[600-developer__big-data__Storm__storm-01.md]:

  • Tuple: The main data structure used to represent a single unit of data.
  • Stream: An unbounded sequence of tuples flowing through the system.
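
To make the two abstractions concrete, here is a minimal plain-Python sketch. It is not the Storm API: `StormTuple` and `sensor_stream` are hypothetical names, and the sketch assumes a tuple is a list of values addressed by field name and that a stream can be modeled as a generator that never finishes on its own.

```python
from dataclasses import dataclass
from itertools import count, islice
from typing import Any, Iterator, Sequence


@dataclass(frozen=True)
class StormTuple:
    """Hypothetical stand-in for a tuple: a list of values with field names."""
    fields: Sequence[str]
    values: Sequence[Any]

    def get(self, field: str) -> Any:
        # Look a value up by its field name.
        return self.values[self.fields.index(field)]


def sensor_stream() -> Iterator[StormTuple]:
    """Hypothetical stream: a sequence of tuples with no built-in end."""
    for i in count():
        yield StormTuple(fields=("sensor_id", "reading"), values=(i % 3, i * 10))


# A consumer can only ever inspect a finite prefix of such a stream.
first_four = list(islice(sensor_stream(), 4))
```

Because the generator never terminates, the consumer decides how much to read, which mirrors how a topology keeps processing tuples for as long as it runs.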

Topology Components

A Topology is a graph of computation: its nodes process streams, and its edges determine which node consumes the output of which other node. It is composed of two types of nodes^[600-developer__big-data__Storm__storm-01.md]:

  • Spouts: These act as the source of streams. They are responsible for reading data from external sources (such as Kafka, Flume, or databases) and emitting it into the topology as tuples^[600-developer__big-data__Storm__storm-01.md].
  • Bolts: These serve as the logical processing units. Bolts consume input tuples from Spouts or other Bolts, process the data (e.g., filtering, aggregation, or transformation), and may optionally emit new tuples to downstream components^[600-developer__big-data__Storm__storm-01.md].
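
The roles of the two node types can be sketched in plain Python as a small word-count pipeline. This is an illustration only and does not use Storm's classes: `sentence_spout`, `split_bolt`, and `count_bolt` are hypothetical names, tuples are modeled as one-element lists, and the input is a finite list so the example terminates, whereas a real topology keeps running.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def sentence_spout(lines: Iterable[str]) -> Iterator[List[str]]:
    """Hypothetical spout: reads an external source and emits one-field tuples."""
    for line in lines:
        yield [line]


def split_bolt(tuples: Iterable[List[str]]) -> Iterator[List[str]]:
    """Hypothetical bolt: transforms each sentence tuple into one tuple per word."""
    for (sentence,) in tuples:
        for word in sentence.split():
            yield [word.lower()]


def count_bolt(tuples: Iterable[List[str]]) -> Dict[str, int]:
    """Hypothetical terminal bolt: aggregates counts and emits nothing downstream."""
    counts: Counter = Counter()
    for (word,) in tuples:
        counts[word] += 1
    return dict(counts)


# Wire the graph: spout -> split bolt -> count bolt.
source = ["Storm processes streams", "streams of tuples"]
word_counts = count_bolt(split_bolt(sentence_spout(source)))
```

Here `split_bolt` consumes from a spout while `count_bolt` consumes from another bolt, covering both input cases described above.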

Data Flow and Grouping

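The movement of data between components is managed by Stream Grouping. A grouping defines how the tuples of a stream are distributed among the tasks of the consuming Bolt, whether the stream comes from a Spout or from another Bolt^[600-developer__big-data__Storm__storm-01.md]. Shuffle grouping, for example, spreads tuples randomly across tasks, while fields grouping sends tuples with the same value in a chosen field to the same task.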
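
Two commonly used groupings are shuffle grouping, which spreads tuples randomly across the consuming bolt's tasks, and fields grouping, which routes tuples with the same value in a chosen field to the same task. The sketch below shows the routing idea in plain Python. It is not Storm's implementation: the function names are hypothetical, and the choice of `zlib.crc32` as the hash is an assumption made only to keep the result stable across runs.

```python
import random
import zlib
from typing import List, Sequence


def shuffle_grouping(num_tasks: int, rng: random.Random) -> int:
    """Pick a target task at random, so load spreads roughly evenly."""
    return rng.randrange(num_tasks)


def fields_grouping(values: Sequence[str], key_index: int, num_tasks: int) -> int:
    """Pick a target task from a hash of the grouping field."""
    key = values[key_index].encode("utf-8")
    # Assumption of this sketch: crc32 as a deterministic hash of the key.
    return zlib.crc32(key) % num_tasks


num_tasks = 4
rng = random.Random(0)
words = [["storm"], ["bolt"], ["storm"], ["spout"], ["storm"]]

# Equal keys always map to the same task index under fields grouping.
fields_targets: List[int] = [fields_grouping(w, 0, num_tasks) for w in words]
# Shuffle grouping ignores the tuple contents entirely.
shuffle_targets: List[int] = [shuffle_grouping(num_tasks, rng) for _ in words]
```

Fields grouping matters for stateful bolts such as a word counter: every occurrence of a word must reach the same task for that task's count to be complete.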

Physical Execution

While the Topology defines the logical structure, the execution of a Storm Topology is distributed across a cluster^[600-developer__big-data__Storm__storm-01.md].

  • Workers: The topology runs across multiple worker processes, each a separate JVM, spread over the machines of the cluster to enable distributed processing^[600-developer__big-data__Storm__storm-01.md].
  • Tasks: A task is the most basic unit of execution, performing the actual data processing within a Spout or Bolt^[600-developer__big-data__Storm__storm-01.md].
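
The relationship between tasks and workers can be pictured with a small placement sketch in plain Python. The round-robin policy, the task names, and the counts below are all assumptions chosen for illustration; they are not a description of Storm's actual scheduler.

```python
from typing import Dict, List


def assign_tasks(task_ids: List[str], num_workers: int) -> Dict[int, List[str]]:
    """Spread tasks over workers round-robin (an assumed policy, not Storm's)."""
    placement: Dict[int, List[str]] = {w: [] for w in range(num_workers)}
    for i, task_id in enumerate(task_ids):
        placement[i % num_workers].append(task_id)
    return placement


# Hypothetical topology: a spout with 2 tasks and a bolt with 4 tasks, on 3 workers.
tasks = [f"spout-{i}" for i in range(2)] + [f"bolt-{i}" for i in range(4)]
placement = assign_tasks(tasks, num_workers=3)
```

The point of the sketch is only that one logical Spout or Bolt maps to several tasks, and that those tasks end up in different worker processes.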

Sources

^[600-developer__big-data__Storm__storm-01.md]