# HDFS (Hadoop Distributed File System)
HDFS (Hadoop Distributed File System) is a distributed, scalable, and fault-tolerant file system designed to store very large data sets across clusters of commodity hardware.
It is a core component of the [[Apache Hadoop]] framework, serving as the primary storage layer for vast amounts of data, while other components within the ecosystem handle data processing and analysis^[600-developer-big-data-big-data.md].
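To make the storage-layer role concrete, here is a minimal sketch of writing and reading a file through Hadoop's Java client API (`org.apache.hadoop.fs.FileSystem`); the NameNode URI and file path are hypothetical placeholders, and the Hadoop client libraries are assumed to be on the classpath:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHelloWorld {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; replace with your cluster's fs.defaultFS.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/user/demo/hello.txt"); // hypothetical path

            // Write a small file; HDFS splits large files into blocks behind the scenes.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("Hello, HDFS!".getBytes(StandardCharsets.UTF_8));
            }

            // Read it back through the same FileSystem abstraction.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
                System.out.println(reader.readLine());
            }
        }
    }
}
```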
## Key Characteristics
- Data Storage: HDFS is designed for storing massive volumes of data^[600-developer-big-data-big-data.md].
- Commodity Hardware: It runs on standard, low-cost hardware rather than specialized, expensive equipment; fault tolerance comes from replicating each block across several machines (see the sketch after this list)^[600-developer-big-data-big-data.md].
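Because individual commodity machines are expected to fail, HDFS keeps multiple copies of every block on different DataNodes (three by default). A minimal sketch of requesting an explicit replication factor and block size per file through the Java API; the path is a hypothetical placeholder, and the cluster is assumed reachable via the default configuration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/user/demo/replicated.txt"); // hypothetical path

            // Ask for 3 replicas and a 128 MB block size for this file.
            // Each block is copied to 3 DataNodes, so losing one cheap machine
            // does not lose data; this is what makes commodity hardware safe.
            short replication = 3;
            long blockSize = 128L * 1024 * 1024;
            try (FSDataOutputStream out =
                     fs.create(path, true, 4096, replication, blockSize)) {
                out.writeBytes("replicated across DataNodes\n");
            }

            FileStatus status = fs.getFileStatus(path);
            System.out.println("replication factor: " + status.getReplication());
        }
    }
}
```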
## Related Ecosystem Components
As the storage backbone of the Hadoop ecosystem, HDFS interacts with various tools for data processing, management, and analysis^[600-developer-big-data-big-data.md]:
- [[MapReduce]]: A programming model and processing engine for large-scale data computation; jobs read their input from and write their results to HDFS (see the sketch after this list)^[600-developer-big-data-big-data.md].
- [[HBase]]: A NoSQL database that runs on top of HDFS^[600-developer-big-data-big-data.md].
- Data Analysis Engines: Tools like [[Hive]] and [[Pig]] query and analyze data stored in HDFS^[600-developer-big-data-big-data.md].
- Data Ingestion: [[Sqoop]] (for relational databases) and [[Flume]] (for streaming log data) collect and import data into HDFS^[600-developer-big-data-big-data.md].
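To illustrate how a processing engine sits on top of the storage layer, here is the canonical MapReduce word-count job wired to read from and write to HDFS; the `/user/demo/...` input and output paths are hypothetical placeholders:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in its HDFS input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the per-word counts emitted by the mappers.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input and output live in HDFS; the paths below are hypothetical.
        FileInputFormat.addInputPath(job, new Path("/user/demo/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```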