Storm parallelism and execution model¶
Storm parallelism defines how a distributed computation is broken down across a cluster. It determines the number of tasks and executor threads allocated to the components of a [[Topology]] (spouts and bolts) to process streams of data^[600-developer__big-data__Storm__storm-01.md].
Execution hierarchy¶
The Apache Storm execution model consists of four hierarchical levels that organize how data is processed: nodes (machines), worker processes, executors (threads), and tasks.^[600-developer__big-data__Storm__storm-01.md]
Workers (Worker Process / JVMs)¶
A Worker process is a JVM responsible for executing a subset of the topology.^[600-developer__big-data__Storm__storm-01.md] A topology runs in a distributed manner across one or more worker nodes.^[600-developer__big-data__Storm__storm-01.md]
Executors¶
An Executor is a single thread spawned by a worker process.^[600-developer__big-data__Storm__storm-01.md] It runs one or more tasks of the same component (spout or bolt), invoking them serially.
Tasks¶
A Task is the unit that performs the actual data processing.^[600-developer__big-data__Storm__storm-01.md] There is a distinction between the logical components (Spouts and Bolts) and their physical execution: while the code defines the logic, the execution is handled by tasks running within executors.^[600-developer__big-data__Storm__storm-01.md]
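The executor/task relationship can be illustrated with a minimal sketch in plain Java (this is not the Storm API, just a hypothetical model): a single "executor" thread of control hosts several "tasks" of the same logical bolt and routes each incoming tuple to one of them, here round-robin.

```java
import java.util.ArrayList;
import java.util.List;

public class ExecutorSketch {

    // A "task" is the unit that actually processes a tuple.
    interface Task {
        String process(String tuple);
    }

    // The "executor" loop: one thread of control that routes each
    // incoming tuple to one of its tasks (round-robin for illustration).
    static List<String> runExecutor(List<Task> tasks, List<String> tuples) {
        List<String> results = new ArrayList<>();
        for (int i = 0; i < tuples.size(); i++) {
            results.add(tasks.get(i % tasks.size()).process(tuples.get(i)));
        }
        return results;
    }

    public static void main(String[] args) {
        // Two tasks of the same logical bolt, hosted by one executor.
        List<Task> tasks = List.of(
            t -> "task-0:" + t,
            t -> "task-1:" + t
        );
        System.out.println(runExecutor(tasks, List.of("a", "b", "c", "d")));
        // → [task-0:a, task-1:b, task-0:c, task-1:d]
    }
}
```

The sketch shows why tasks are a unit of *logical* parallelism: both tasks run on one thread, so adding tasks alone does not add concurrency, while adding executors (threads) does.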
Configuration¶
The parallelism of the system, specifically the number of worker processes, can be configured via the `storm.yaml` configuration file or programmatically when submitting the topology.^[600-developer__big-data__Storm__storm-01.md]
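For example, a minimal `storm.yaml` sketch using the standard keys (port numbers and worker count are illustrative values, not recommendations):

```yaml
# Ports on each supervisor node; each port defines one worker slot,
# so this supervisor can host up to four worker processes.
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703

# Default number of worker processes a topology requests.
topology.workers: 2
```

Programmatically, the same worker count can be set per topology with `Config#setNumWorkers(int)` before submitting it.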
Related concepts¶
- [[Stream]]
- [[Tuple]]
- [[Spouts and Bolts]]
Sources¶
^[600-developer__big-data__Storm__storm-01.md]