Storm parallelism and execution model

Storm parallelism defines how a distributed computation is broken down across a cluster. It determines the number of tasks and executor threads allocated to the components of a [[Topology]] (spouts and bolts) to process streams of data^[600-developer__big-data__Storm__storm-01.md].

Execution hierarchy

The Apache Storm execution model consists of four hierarchical levels that organize how data is processed: worker nodes (machines), worker processes (JVMs), executors (threads), and tasks.^[600-developer__big-data__Storm__storm-01.md]

Workers (Worker Process / JVMs)

A worker process is a JVM responsible for executing a subset of a topology.^[600-developer__big-data__Storm__storm-01.md] A topology runs in a distributed manner across one or more worker nodes, each of which can host one or more worker processes.^[600-developer__big-data__Storm__storm-01.md]

Executors

An executor is a single thread spawned by a worker process; it runs one or more tasks of the same component (spout or bolt).^[600-developer__big-data__Storm__storm-01.md]

Tasks

A task is the unit that performs the actual data processing.^[600-developer__big-data__Storm__storm-01.md] There is a distinction between the logical components (Spouts and Bolts) and their physical execution: the spout and bolt code defines the logic, while the execution is carried out by tasks running inside executors.^[600-developer__big-data__Storm__storm-01.md] By default, Storm runs one task per executor.
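The worker → executor → task hierarchy can be sketched as a small model. This is not Storm's API, only an illustration assuming even round-robin placement, which mirrors Storm's default even scheduling:

```python
# Illustrative model (NOT the Storm API): how tasks map onto
# executor threads, and executor threads onto worker JVMs.

def assign(num_workers, num_executors, num_tasks):
    """Return {worker: {executor: [task, ...]}} for an even round-robin layout."""
    layout = {w: {} for w in range(num_workers)}
    # Executors (threads) are spread evenly across worker processes (JVMs).
    for e in range(num_executors):
        layout[e % num_workers][e] = []
    # Tasks are spread evenly across executors; an executor may run several
    # tasks of the same component.
    for t in range(num_tasks):
        executor = t % num_executors
        layout[executor % num_workers][executor].append(t)
    return layout

layout = assign(num_workers=2, num_executors=4, num_tasks=8)
# Worker 0 hosts executors 0 and 2; worker 1 hosts executors 1 and 3;
# each executor thread runs two tasks.
```

With 2 workers, 4 executors, and 8 tasks, every executor thread multiplexes two tasks, which is exactly the logical-vs-physical split described above.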

Configuration

The parallelism of the system, and specifically the number of worker processes, can be configured via the storm.yaml configuration file or programmatically when the topology is submitted.^[600-developer__big-data__Storm__storm-01.md]
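As a sketch, the cluster-side and per-topology settings might look like the fragment below; the port list and worker count are illustrative values, not recommendations:

```yaml
# storm.yaml (cluster side): each port listed is one worker-process slot
# that the supervisor on this node can offer to topologies.
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703

# Per-topology default: how many worker processes the topology requests.
topology.workers: 2
```

Programmatically, the worker count corresponds to `Config.setNumWorkers(...)` in Storm's Java API, and per-component parallelism is passed as the parallelism hint to `TopologyBuilder.setSpout`/`setBolt`.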

  • [[Stream]]
  • [[Tuple]]
  • [[Spouts and Bolts]]

Sources

^[600-developer__big-data__Storm__storm-01.md]