Hadoop ecosystem¶
The Hadoop ecosystem refers to the collection of open-source software utilities and frameworks that facilitate the storage, processing, and management of large datasets.^[600-developer-big-data-big-data.md] It is built around the two primary concerns in big data: data storage and data computation.^[600-developer-big-data-big-data.md]
Core Components¶
The fundamental components of the Hadoop framework include:
- HDFS: The Hadoop Distributed File System, which stores data across the cluster.
- MapReduce: The programming model for processing large data sets in parallel.
- HBase: A column-oriented NoSQL database running on top of HDFS.^[600-developer-big-data-big-data.md]
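The MapReduce model can be illustrated without a cluster at all: a map step emits key–value pairs, the framework groups them by key (the "shuffle"), and a reduce step aggregates each key's values. Below is a minimal word-count sketch in plain Python; the function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative, not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word in one input split.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group intermediate values by key, as the framework does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Aggregate all counts emitted for one word.
    return (key, sum(values))

def word_count(documents):
    # Run the three conceptual stages end to end.
    pairs = [p for doc in documents for p in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For example, `word_count(["big data", "big hadoop"])` returns `{"big": 2, "data": 1, "hadoop": 1}`.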
Data Analysis and Ingestion¶
The ecosystem provides specific engines for analyzing and collecting data:
- Analysis Engines: [[Hive]] (SQL-like queries via HiveQL) and [[Pig]] (dataflow scripts in Pig Latin) are used for data analysis.^[600-developer-big-data-big-data.md]
- Data Collection Engines: [[Sqoop]] (bulk transfer between Hadoop and relational databases) and [[Flume]] (collection of streaming log data) are used for data ingestion.^[600-developer-big-data-big-data.md]
Management and Workflow¶
To manage the infrastructure and data processing tasks, the ecosystem includes:
- Web Management: [[Hue]] provides a web-based interface for browsing HDFS and running queries.
- Workflow: [[Oozie]] is used for workflow scheduling and management.^[600-developer-big-data-big-data.md]
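Oozie workflows are defined as a directed graph of actions in an XML file. The fragment below is a minimal sketch of such a definition, assuming a single MapReduce action; the workflow name and the `${jobTracker}`, `${nameNode}`, `${inputDir}`, and `${outputDir}` parameters are placeholder values supplied at submission time.

```xml
<!-- Hypothetical workflow.xml: one MapReduce action, then success or failure. -->
<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="wordcount"/>
  <action name="wordcount">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>${inputDir}</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${outputDir}</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Word count action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Oozie walks the graph from `<start>` and follows each action's `<ok>` or `<error>` transition, which is how multi-step pipelines are chained and how failures are routed to a `<kill>` node.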
Sources¶
^[600-developer-big-data-big-data.md]
Related¶
- [[Big Data]]
- [[Spark]]
- [[HDFS]]
- [[MapReduce]]
- [[Hive]]