Apache HDFS is a Java-based file system for storing large volumes of data. Designed to span large clusters of commodity servers, HDFS provides scalable and reliable data storage.
Apache YARN is the resource management layer for distributed applications that run on multiple machines in a cluster. YARN lets you run various data processing engines for batch, interactive, and real-time stream processing of data stored in HDFS.
HDFS and YARN form the data management layer of Apache Hadoop. YARN provides the resource management while HDFS provides the storage.
Managing Data Operating System
Provides information about using Apache YARN for application management, cluster management, and resource allocation.
Scaling Namespaces and Optimizing Data Storage
Provides information about scaling namespaces and optimizing the data storage and performance of Apache HDFS.
Describes cluster maintenance procedures and provides port configuration details for a Hadoop cluster.