# Storm Topology Deployment
Storm Topology Deployment refers to the process of submitting and running a defined graph of computation (a topology) onto a Storm cluster. The deployment mechanism determines whether the topology runs in a single local JVM for testing or across a distributed cluster for production processing.^[600-developer__big-data__Storm__storm-01.md]
## Deployment Modes
Apache Storm supports two primary modes for deploying topologies, distinguished by the execution environment and the command-line arguments used.
### Local Mode (LocalCluster)
Local mode is designed for development and testing purposes. In this mode, the topology is executed within a single local Java Virtual Machine (JVM).^[600-developer__big-data__Storm__storm-01.md] This simulates the cluster environment without requiring a distributed infrastructure, allowing for rapid iteration and debugging.
To submit a topology in local mode, the submission command typically includes a flag such as `-local`.^[600-developer__big-data__Storm__storm-01.md]
Example:

```shell
storm jar target/storm-starter-*.jar org.apache.storm.starter.ExclamationTopology -local
```
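Local mode can also be driven programmatically from a topology's `main` method. A minimal sketch, assuming the Storm 2.x Java API and a `builder` already populated via `TopologyBuilder`; the topology name is illustrative:

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;

Config conf = new Config();
conf.setDebug(true); // log each emitted tuple while testing

// LocalCluster is AutoCloseable in Storm 2.x: closing it tears down
// the simulated in-JVM cluster and the topology running on it.
try (LocalCluster cluster = new LocalCluster()) {
    cluster.submitTopology("test-topology", conf, builder.createTopology());
    Thread.sleep(10_000); // let the topology run briefly before shutdown
}
```

This is the pattern the storm-starter examples use to decide at runtime between local and cluster execution.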
### Remote/Cluster Mode
Cluster mode is intended for production environments. In this mode, the topology is submitted to a running Storm cluster (managed by Nimbus nodes) and executes across multiple worker processes distributed across the cluster.^[600-developer__big-data__Storm__storm-01.md]
To submit a topology to a remote cluster, the command generally requires the topology class followed by a user-defined name for the topology instance.^[600-developer__big-data__Storm__storm-01.md]
Example:

```shell
storm jar target/storm-starter-*.jar org.apache.storm.starter.RollingTopWords production-topology
```
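Inside the topology's `main` method, remote submission goes through `StormSubmitter` rather than `LocalCluster`. A hedged sketch under the same assumptions as above (a populated `builder`, Storm 2.x APIs):

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;

Config conf = new Config();
conf.setNumWorkers(4); // request four worker JVMs across the cluster

// The name passed here identifies the running topology instance,
// e.g. for a later `storm kill <name>`.
StormSubmitter.submitTopology("production-topology", conf, builder.createTopology());
```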
## Topology Components
A deployed topology consists of interconnected processing units that form the data flow graph.^[600-developer__big-data__Storm__storm-01.md]
- Spouts: These act as the source of the data streams.^[600-developer__big-data__Storm__storm-01.md]
- Bolts: These serve as the logical processing units, consuming input tuples and potentially emitting new ones.^[600-developer__big-data__Storm__storm-01.md]
- Stream Grouping: This defines how tuples are routed between components, i.e., how a stream is partitioned among the tasks of the consuming bolt.^[600-developer__big-data__Storm__storm-01.md]
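The wiring of these components can be sketched with Storm's `TopologyBuilder`. The `SentenceSpout`, `SplitBolt`, and `CountBolt` classes below are hypothetical user-defined implementations, not part of Storm itself:

```java
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

TopologyBuilder builder = new TopologyBuilder();

// Spout: source of the data stream (two parallel instances).
builder.setSpout("sentences", new SentenceSpout(), 2);

// Bolt: processing unit; shuffle grouping routes tuples randomly
// and evenly across the four SplitBolt tasks.
builder.setBolt("split", new SplitBolt(), 4)
       .shuffleGrouping("sentences");

// Fields grouping sends tuples with the same "word" value to the
// same CountBolt task, keeping per-word counts consistent.
builder.setBolt("count", new CountBolt(), 4)
       .fieldsGrouping("split", new Fields("word"));
```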
## Cluster Architecture
When deployed in remote mode, the topology relies on the Storm cluster architecture to manage execution:
- Nimbus: The master node responsible for distributing code and assigning tasks.^[600-developer__big-data__Storm__storm-01.md]
- Supervisor: Worker nodes that receive assignments from Nimbus and manage the worker processes on their machine.^[600-developer__big-data__Storm__storm-01.md]
- Worker Process: These are the actual JVMs that execute the specific topology tasks.^[600-developer__big-data__Storm__storm-01.md]
- Executor: A thread spawned within a worker process to run one or more tasks of a spout or bolt.^[600-developer__big-data__Storm__storm-01.md]
## Configuration
Deployment configurations often involve tuning the parallelism of the topology, which determines the number of worker processes and executors allocated to its components.^[600-developer__big-data__Storm__storm-01.md] Cluster-wide settings, such as the ZooKeeper servers and local directory locations, are defined in configuration files such as `storm.yaml`.^[600-developer__big-data__Storm__storm-01.md]
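A fragment of what such a `storm.yaml` might contain; the hostnames, paths, and port numbers here are illustrative placeholders, not values from the source:

```yaml
# Hypothetical storm.yaml excerpt -- all values are placeholders.
storm.zookeeper.servers:
  - "zk1.example.com"
  - "zk2.example.com"
storm.local.dir: "/var/storm"
nimbus.seeds: ["nimbus1.example.com"]
supervisor.slots.ports:   # one worker-process slot per listed port
  - 6700
  - 6701
```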
## Related Concepts
- [[Stream processing]]
- [[Distributed computing]]
- [[Big data]]