# HDFS (Hadoop Distributed File System)

HDFS (Hadoop Distributed File System) is a distributed, scalable, and fault-tolerant file system designed to store very large data sets across clusters of commodity hardware.

It is a core component of the [[Apache Hadoop]] framework, serving as the primary storage layer for vast amounts of data, while other components within the ecosystem handle data processing and analysis^[600-developer-big-data-big-data.md].

## Key Characteristics

  • Data Storage: HDFS is designed specifically for the storage of massive volumes of data^[600-developer-big-data-big-data.md].
  • Commodity Hardware: It runs on standard, low-cost hardware rather than requiring specialized, expensive equipment^[600-developer-big-data-big-data.md].
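Storing massive files on commodity machines works because HDFS splits each file into fixed-size blocks and replicates every block across DataNodes for fault tolerance. A minimal Python sketch of the arithmetic, assuming HDFS's well-known defaults (128 MB block size, replication factor 3); the function name is hypothetical, purely for illustration:

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # HDFS default block size (128 MiB)
REPLICATION = 3                  # HDFS default replication factor

def hdfs_footprint(file_size_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (number of blocks, total raw bytes consumed across the cluster)."""
    blocks = math.ceil(file_size_bytes / block_size)
    # Each block is stored `replication` times on different DataNodes.
    # The last block only occupies its actual size, so raw usage is
    # file_size * replication, not blocks * block_size * replication.
    raw_bytes = file_size_bytes * replication
    return blocks, raw_bytes

# A 1 GiB file: 8 blocks of 128 MiB, 3 GiB of raw cluster storage.
blocks, raw = hdfs_footprint(1024 ** 3)
print(blocks, raw // 1024 ** 3)  # → 8 3
```

Losing a single DataNode therefore costs no data: every block it held still has two live replicas, and the NameNode schedules re-replication back to the target factor.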

As the storage backbone of the Hadoop ecosystem, HDFS interacts with various tools for data processing, management, and analysis^[600-developer-big-data-big-data.md]:

  • [[MapReduce]]: A programming model and processing engine for large-scale data computation^[600-developer-big-data-big-data.md].
  • [[HBase]]: A NoSQL database that runs on top of HDFS^[600-developer-big-data-big-data.md].
  • Data Analysis Engines: Tools such as [[Hive]] and [[Pig]] query and analyze data stored in HDFS^[600-developer-big-data-big-data.md].
  • Data Ingestion: [[Sqoop]] and [[Flume]] collect and import data into the system^[600-developer-big-data-big-data.md].
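The division of labor above — HDFS holds the data, engines like MapReduce compute over it — can be sketched with a toy, single-process word count in Python. This is only an illustration of the map/shuffle/reduce pattern; a real Hadoop job runs many map and reduce tasks in parallel, each reading its input split directly from HDFS blocks:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input split."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each word (shuffle/sort is implicit here)."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["HDFS stores data", "MapReduce processes data"]
print(reduce_phase(map_phase(lines)))
```

Because map tasks are scheduled on the DataNodes that already hold the relevant blocks, computation moves to the data rather than the other way around.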

## Sources