Apache Spark components¶
Apache Spark is a unified analytics engine for large-scale data processing, centered around a core computing engine with various specialized components.^[600-developer__big-data__big-data.md]
Core Components¶
- Spark Core: The foundation of the platform, providing the basic computing engine: distributed task scheduling, memory management, and fault recovery^[600-developer__big-data__big-data.md].
- Spark SQL: A module for structured data processing, allowing users to query data using SQL or the DataFrame API^[600-developer__big-data__big-data.md].
- Spark Streaming: A component that enables scalable, high-throughput stream processing for real-time analytics^[600-developer__big-data__big-data.md].
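The Spark SQL module listed above can be sketched briefly. The snippet below is a minimal illustration, not from the source note: it assumes a Spark environment (the `spark-sql` dependency) is available, and the table and column names (`people`, `name`, `age`) are invented for the example. It shows the two query styles the note mentions: the DataFrame API and plain SQL.

```scala
// Minimal Spark SQL sketch; assumes spark-sql is on the classpath.
// All data and identifiers here are illustrative.
import org.apache.spark.sql.SparkSession

object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-sketch")
      .master("local[*]")   // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Build a small DataFrame and query it two ways.
    val people = Seq(("Ann", 34), ("Bob", 19)).toDF("name", "age")

    // 1. DataFrame API
    people.filter($"age" > 21).show()

    // 2. SQL over a temporary view
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 21").show()

    spark.stop()
  }
}
```

Both forms compile down to the same logical plan, so the choice between SQL and the DataFrame API is a matter of ergonomics rather than performance.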
Implementation and Context¶
The Spark framework is primarily implemented in the Scala programming language^[600-developer__big-data__big-data.md]. It is one of the two major frameworks in the Big Data ecosystem, often contrasted with the Hadoop framework (which includes HDFS and MapReduce)^[600-developer__big-data__big-data.md].
Related Concepts¶
- [[Big Data]]
- [[Hadoop]]
Sources¶
^[600-developer__big-data__big-data.md]