Storm parallelism and execution model

Storm parallelism defines how a distributed computation is broken down across a cluster. It determines the number of tasks and executor threads allocated to the components of a [[Topology]] (spouts and bolts) to process streams of data^[600-developer__big-data__Storm__storm-01.md].

Execution hierarchy

The Apache Storm execution model consists of four hierarchical levels that organize how data is processed: worker nodes (machines), worker processes (JVMs), executors (threads), and tasks.^[600-developer__big-data__Storm__storm-01.md]

Workers (Worker Process / JVMs)

A worker process is a JVM responsible for executing a subset of a topology.^[600-developer__big-data__Storm__storm-01.md] A topology runs in a distributed manner across one or more worker nodes, each of which can host one or more worker processes.^[600-developer__big-data__Storm__storm-01.md]

Executors

An executor is a single thread spawned by a worker process; it runs one or more tasks of the same component (spout or bolt).^[600-developer__big-data__Storm__storm-01.md]

Tasks

A task is the unit that performs the actual data processing.^[600-developer__big-data__Storm__storm-01.md] There is a distinction between the logical components (Spouts and Bolts) and their physical execution: the spout and bolt code defines the logic, while the execution is carried out by tasks running inside executors.^[600-developer__big-data__Storm__storm-01.md] By default, Storm runs one task per executor.
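The worker → executor → task hierarchy can be sketched as a small model. This is not Storm's API, only an illustration assuming even round-robin placement, which mirrors Storm's default even scheduling:

```python
# Illustrative model (NOT the Storm API): how tasks map onto
# executor threads, and executor threads onto worker JVMs.

def assign(num_workers, num_executors, num_tasks):
    """Return {worker: {executor: [task, ...]}} for an even round-robin layout."""
    layout = {w: {} for w in range(num_workers)}
    # Executors (threads) are spread evenly across worker processes (JVMs).
    for e in range(num_executors):
        layout[e % num_workers][e] = []
    # Tasks are spread evenly across executors; an executor may run several
    # tasks of the same component.
    for t in range(num_tasks):
        executor = t % num_executors
        layout[executor % num_workers][executor].append(t)
    return layout

layout = assign(num_workers=2, num_executors=4, num_tasks=8)
# Worker 0 hosts executors 0 and 2; worker 1 hosts executors 1 and 3;
# each executor thread runs two tasks.
```

With 2 workers, 4 executors, and 8 tasks, every executor thread multiplexes two tasks, which is exactly the logical-vs-physical split described above.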

Configuration

The parallelism of the system, and specifically the number of worker processes, can be configured via the storm.yaml configuration file or programmatically when the topology is submitted.^[600-developer__big-data__Storm__storm-01.md]
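As a sketch, the cluster-side and per-topology settings might look like the fragment below; the port list and worker count are illustrative values, not recommendations:

```yaml
# storm.yaml (cluster side): each port listed is one worker-process slot
# that the supervisor on this node can offer to topologies.
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703

# Per-topology default: how many worker processes the topology requests.
topology.workers: 2
```

Programmatically, the worker count corresponds to `Config.setNumWorkers(...)` in Storm's Java API, and per-component parallelism is passed as the parallelism hint to `TopologyBuilder.setSpout`/`setBolt`.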

  • [[Stream]]
  • [[Tuple]]
  • [[Spouts and Bolts]]

Sources

^[600-developer__big-data__Storm__storm-01.md]