Hadoop Ecosystem

The Hadoop ecosystem refers to the collection of open-source software utilities and frameworks that facilitate the storage, processing, and management of large datasets.^[600-developer-big-data-big-data.md] It is built around the two primary concerns in big data: data storage and data computation.^[600-developer-big-data-big-data.md]

Core Components

The fundamental components of the Hadoop framework include:

  • HDFS: The distributed file system for storage.
  • MapReduce: The programming model for processing large datasets (see the sketch after this list).
  • HBase: A NoSQL database running on top of HDFS.^[600-developer-big-data-big-data.md]
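
As a minimal sketch of how HDFS and MapReduce fit together, the classic WordCount job below reads text from an HDFS input directory, maps each token to a (word, 1) pair, and reduces those pairs into per-word counts. The input and output paths passed on the command line are placeholders for real HDFS directories.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sum the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, a job like this is typically launched with `hadoop jar wordcount.jar WordCount /input /output`, where both paths live in HDFS.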

Data Analysis and Ingestion

The ecosystem provides specific engines for analyzing and collecting data:

  • Analysis Engines: [[Hive]] and [[Pig]] are used for data analysis (a query sketch follows this list).^[600-developer-big-data-big-data.md]
  • Data Collection Engines: [[Sqoop]] and [[Flume]] handle data ingestion.^[600-developer-big-data-big-data.md]
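
Hive exposes its SQL dialect (HiveQL) over JDBC through HiveServer2, so analysis queries can be issued from ordinary Java code. The sketch below is a minimal example assuming a HiveServer2 endpoint on the default port 10000; the database, credentials, and the table name `page_views` are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuerySketch {
  public static void main(String[] args) throws Exception {
    // JDBC driver shipped with the Hive JDBC jar.
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    // Assumed HiveServer2 endpoint and database; adjust for your cluster.
    String url = "jdbc:hive2://localhost:10000/default";

    try (Connection conn = DriverManager.getConnection(url, "hive", "");
         Statement stmt = conn.createStatement();
         // 'page_views' is a hypothetical table used only for illustration.
         ResultSet rs = stmt.executeQuery(
             "SELECT country, COUNT(*) AS views "
                 + "FROM page_views GROUP BY country")) {
      while (rs.next()) {
        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
      }
    }
  }
}
```

Behind the scenes, Hive compiles such a query into MapReduce (or another execution engine's) jobs over data stored in HDFS, which is what makes it an analysis engine rather than a database.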

Management and Workflow

To manage the infrastructure and data processing tasks, the ecosystem includes:

  • Web Management: [[Hue]] provides a web-based interface.
  • Workflow: [[Oozie]] is used for workflow scheduling and management (see the client sketch after this list).^[600-developer-big-data-big-data.md]
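
Oozie workflows are defined in XML, deployed to HDFS, and submitted to the Oozie server, which also offers a Java client API for submission and monitoring. The sketch below assumes a local Oozie server on the default port 11000; the HDFS application path, `nameNode`, and `jobTracker` values are placeholders for real cluster addresses.

```java
import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class OozieSubmitSketch {
  public static void main(String[] args) throws Exception {
    // Assumed Oozie server URL; 11000 is the default port.
    OozieClient client = new OozieClient("http://localhost:11000/oozie");

    // Job properties; all paths and hosts below are placeholders.
    Properties conf = client.createConfiguration();
    conf.setProperty(OozieClient.APP_PATH,
        "hdfs://namenode:8020/user/alice/wordcount-wf");
    conf.setProperty("nameNode", "hdfs://namenode:8020");
    conf.setProperty("jobTracker", "resourcemanager:8032");

    // Submit and start the workflow, then poll until it leaves RUNNING.
    String jobId = client.run(conf);
    while (client.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
      Thread.sleep(10_000);
    }
    System.out.println("Workflow " + jobId + " finished: "
        + client.getJobInfo(jobId).getStatus());
  }
}
```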

Sources

^[600-developer-big-data-big-data.md]

  • [[Big Data]]
  • [[Spark]]
  • [[HDFS]]
  • [[MapReduce]]
  • [[Hive]]