# Storm Topology Deployment
Storm Topology Deployment refers to the process of submitting and running a defined graph of computation (a topology) onto a Storm cluster. The deployment mechanism determines whether the topology runs in a single local JVM for testing or across a distributed cluster for production processing.^[600-developer__big-data__Storm__storm-01.md]
## Deployment Modes
Apache Storm supports two primary modes for deploying topologies, distinguished by the execution environment and the command-line arguments used.
### Local Mode (LocalCluster)
Local mode is designed for development and testing purposes. In this mode, the topology is executed within a single local Java Virtual Machine (JVM).^[600-developer__big-data__Storm__storm-01.md] This simulates the cluster environment without requiring a distributed infrastructure, allowing for rapid iteration and debugging.
To submit a topology in local mode, the submission command typically includes a flag such as `-local`.^[600-developer__big-data__Storm__storm-01.md]
Example:

```shell
storm jar target/storm-starter-*.jar org.apache.storm.starter.ExclamationTopology -local
```
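Local mode can also be driven programmatically from a topology's `main` method. A minimal sketch, assuming the Storm 2.x Java API and a `builder` already populated via `TopologyBuilder`; the topology name is illustrative:

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;

Config conf = new Config();
conf.setDebug(true); // log each emitted tuple while testing

// LocalCluster is AutoCloseable in Storm 2.x: closing it tears down
// the simulated in-JVM cluster and the topology running on it.
try (LocalCluster cluster = new LocalCluster()) {
    cluster.submitTopology("test-topology", conf, builder.createTopology());
    Thread.sleep(10_000); // let the topology run briefly before shutdown
}
```

This is the pattern the storm-starter examples use to decide at runtime between local and cluster execution.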
### Remote/Cluster Mode
Cluster mode is intended for production environments. In this mode, the topology is submitted to a running Storm cluster (managed by Nimbus nodes) and executes across multiple worker processes distributed across the cluster.^[600-developer__big-data__Storm__storm-01.md]
To submit a topology to a remote cluster, the command generally requires the topology class followed by a user-defined name for the topology instance.^[600-developer__big-data__Storm__storm-01.md]
Example:

```shell
storm jar target/storm-starter-*.jar org.apache.storm.starter.RollingTopWords production-topology
```
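Inside the topology's `main` method, remote submission goes through `StormSubmitter` rather than `LocalCluster`. A hedged sketch under the same assumptions as above (a populated `builder`, Storm 2.x APIs):

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;

Config conf = new Config();
conf.setNumWorkers(4); // request four worker JVMs across the cluster

// The name passed here identifies the running topology instance,
// e.g. for a later `storm kill <name>`.
StormSubmitter.submitTopology("production-topology", conf, builder.createTopology());
```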
## Topology Components
A deployed topology consists of interconnected processing units that form the data flow graph.^[600-developer__big-data__Storm__storm-01.md]
- Spouts: These act as the source of the data streams.^[600-developer__big-data__Storm__storm-01.md]
- Bolts: These serve as the logical processing units, consuming input tuples and potentially emitting new ones.^[600-developer__big-data__Storm__storm-01.md]
- Stream Grouping: This defines how tuples are routed between components, i.e., how a stream is partitioned among the tasks of the consuming bolt.^[600-developer__big-data__Storm__storm-01.md]
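The wiring of these components can be sketched with Storm's `TopologyBuilder`. The `SentenceSpout`, `SplitBolt`, and `CountBolt` classes below are hypothetical user-defined implementations, not part of Storm itself:

```java
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

TopologyBuilder builder = new TopologyBuilder();

// Spout: source of the data stream (two parallel instances).
builder.setSpout("sentences", new SentenceSpout(), 2);

// Bolt: processing unit; shuffle grouping routes tuples randomly
// and evenly across the four SplitBolt tasks.
builder.setBolt("split", new SplitBolt(), 4)
       .shuffleGrouping("sentences");

// Fields grouping sends tuples with the same "word" value to the
// same CountBolt task, keeping per-word counts consistent.
builder.setBolt("count", new CountBolt(), 4)
       .fieldsGrouping("split", new Fields("word"));
```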
## Cluster Architecture
When deployed in remote mode, the topology relies on the Storm cluster architecture to manage execution:
- Nimbus: The master node responsible for distributing code and assigning tasks.^[600-developer__big-data__Storm__storm-01.md]
- Supervisor: Worker nodes that receive assignments from Nimbus and manage the worker processes on their machine.^[600-developer__big-data__Storm__storm-01.md]
- Worker Process: These are the actual JVMs that execute the specific topology tasks.^[600-developer__big-data__Storm__storm-01.md]
- Executor: A thread spawned within a worker process to run one or more tasks of a spout or bolt.^[600-developer__big-data__Storm__storm-01.md]
## Configuration
Deployment configurations often involve tuning the parallelism of the topology, which determines the number of worker processes and executors allocated to its components.^[600-developer__big-data__Storm__storm-01.md] Cluster-wide settings, such as the ZooKeeper servers and local directory locations, are defined in configuration files such as `storm.yaml`.^[600-developer__big-data__Storm__storm-01.md]
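A fragment of what such a `storm.yaml` might contain; the hostnames, paths, and port numbers here are illustrative placeholders, not values from the source:

```yaml
# Hypothetical storm.yaml excerpt -- all values are placeholders.
storm.zookeeper.servers:
  - "zk1.example.com"
  - "zk2.example.com"
storm.local.dir: "/var/storm"
nimbus.seeds: ["nimbus1.example.com"]
supervisor.slots.ports:   # one worker-process slot per listed port
  - 6700
  - 6701
```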
## Related Concepts
- [[Stream processing]]
- [[Distributed computing]]
- [[Big data]]