1. Understand the Basics

The Hortonworks Data Platform consists of three layers. Brief usage sketches for several of these components appear after the list below.

  • Core Hadoop: The basic components of Apache Hadoop.

    • Hadoop Distributed File System (HDFS): A special-purpose file system designed to work with the MapReduce engine. It provides high-throughput access to data in a highly distributed environment.

    • MapReduce: A framework for performing high-volume, distributed data processing using the MapReduce programming paradigm.

  • Essential Hadoop: A set of Apache components designed to ease working with Core Hadoop.

    • Apache Pig: A platform for creating higher-level data flow programs in Pig Latin, the platform’s native language, which are compiled into sequences of MapReduce programs.

    • Apache Hive: A tool for creating higher-level, SQL-like queries in HiveQL, the tool’s native language, which are compiled into sequences of MapReduce programs.

    • Apache HCatalog: A metadata abstraction layer that insulates users and scripts from how and where data is physically stored.

    • Templeton: A component that provides a set of REST-like APIs for HCatalog and related Hadoop components.

  • Supporting Components: A set of components that allow you to monitor your Hadoop installation and to connect Hadoop with your larger compute environment.

    • Apache Oozie: A server-based workflow engine optimized for running workflows that execute Hadoop jobs.

    • Apache Sqoop: A component that provides a mechanism for moving data between HDFS and external structured datastores. Sqoop can be integrated with Oozie workflows.
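
As a concrete example of the Core Hadoop layer, the following is a minimal word-count sketch written against the standard org.apache.hadoop.mapreduce API: the mapper emits (word, 1) pairs, the reducer sums them, and the input and output locations are HDFS paths supplied on the command line when the job is submitted. The class names and paths are illustrative only.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: emits (word, 1) for every word in each line read from HDFS.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reducer: sums the counts for each word and writes the totals back to HDFS.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output are HDFS paths supplied on the command line.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }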
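
Pig Latin statements are usually entered at the pig command line, but they can also be driven from Java through Pig's PigServer class. The sketch below assumes a cluster running in MapReduce mode; the input and output paths and the word-count logic are illustrative.

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigWordCount {
      public static void main(String[] args) throws Exception {
        // Each registered Pig Latin statement is compiled into MapReduce stages
        // and executed on the cluster when the final STORE is requested.
        PigServer pig = new PigServer(ExecType.MAPREDUCE);
        pig.registerQuery("lines = LOAD '/user/hdp/input' AS (line:chararray);");
        pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
        pig.registerQuery("grouped = GROUP words BY word;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");
        pig.store("counts", "/user/hdp/wordcount-output");  // writes the result to HDFS
      }
    }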
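
Assuming a HiveServer2 instance is available, HiveQL queries can also be issued from Java over Hive's JDBC driver, as in the sketch below. The host name, credentials, and the weblogs table are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuery {
      public static void main(String[] args) throws Exception {
        // Load the Hive JDBC driver and connect to a HiveServer2 endpoint
        // (host, port, database, and credentials are placeholders).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                 "jdbc:hive2://hive-host:10000/default", "hive", "");
             Statement stmt = con.createStatement();
             // Hive compiles this HiveQL into MapReduce jobs before returning rows.
             ResultSet rs = stmt.executeQuery(
                 "SELECT page, COUNT(*) AS hits FROM weblogs GROUP BY page")) {
          while (rs.next()) {
            System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
          }
        }
      }
    }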
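
Because Templeton's APIs are plain HTTP, any REST client can call them. The sketch below checks the server status with Java's built-in HttpURLConnection; the host name and user are placeholders, and 50111 is assumed to be the port Templeton listens on.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class TempletonStatus {
      public static void main(String[] args) throws Exception {
        // GET the Templeton status resource; user.name identifies the caller.
        URL url = new URL("http://templeton-host:50111/templeton/v1/status?user.name=hdp");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(conn.getInputStream()))) {
          String line;
          while ((line = in.readLine()) != null) {
            System.out.println(line);  // a healthy server replies with a small JSON status document
          }
        }
      }
    }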
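
An Oozie workflow is described by a workflow.xml file stored in HDFS and is normally submitted with the oozie command-line tool; the sketch below does the same through the Oozie Java client API. The host names, ports, and application path are placeholders.

    import java.util.Properties;

    import org.apache.oozie.client.OozieClient;
    import org.apache.oozie.client.WorkflowJob;

    public class SubmitWorkflow {
      public static void main(String[] args) throws Exception {
        // Point the client at the Oozie server (host and port are placeholders).
        OozieClient client = new OozieClient("http://oozie-host:11000/oozie");

        // Job properties; the application path is an HDFS directory containing workflow.xml.
        Properties conf = client.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/hdp/apps/sample-wf");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "jobtracker-host:8021");

        // Submit and start the workflow, then poll until it leaves the RUNNING state.
        String jobId = client.run(conf);
        while (client.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
          Thread.sleep(10 * 1000);
        }
        System.out.println("Workflow " + jobId + " finished: "
            + client.getJobInfo(jobId).getStatus());
      }
    }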
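
Sqoop itself is driven from the command line; a typical import copies a table from a relational database into an HDFS directory. The sketch below simply launches such a command from Java with ProcessBuilder, assuming the sqoop executable is on the PATH; the JDBC URL, table, credentials, and target directory are placeholders.

    public class SqoopImport {
      public static void main(String[] args) throws Exception {
        // Import the "orders" table from an external database into HDFS.
        // Connection details and paths are placeholders for illustration.
        ProcessBuilder pb = new ProcessBuilder(
            "sqoop", "import",
            "--connect", "jdbc:mysql://db-host/sales",
            "--username", "etl",
            "--password", "secret",
            "--table", "orders",
            "--target-dir", "/user/hdp/orders",
            "--num-mappers", "4");
        pb.inheritIO();                       // stream Sqoop's output to this console
        int exitCode = pb.start().waitFor();  // non-zero means the import failed
        System.exit(exitCode);
      }
    }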

For more information on the structure of the HDP, see Understanding Hadoop Ecosystem.