Chapter 1. Using WebHDFS REST API

Apache Hadoop provides native libraries for accessing HDFS. However, many users prefer to access HDFS remotely, without the heavyweight client-side native libraries. For example, some applications need to load data into and out of the cluster, or to interact with HDFS data from external systems. WebHDFS addresses these needs by providing a fully functional HTTP REST API for HDFS.

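Because every WebHDFS operation is an ordinary HTTP request, you can exercise the API with curl alone. A minimal sketch, assuming a NameNode reachable at the placeholder <HOST> on its HTTP port <PORT> (commonly 50070 on Hadoop 2.x clusters) and simple user.name authentication rather than Kerberos:

    # List the contents of /user/hadoop as the user "hadoop".
    # The NameNode answers with a JSON FileStatuses object.
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/user/hadoop?op=LISTSTATUS&user.name=hadoop"
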
WebHDFS provides the following features:

  • Provides read and write access, and supports all HDFS operations (such as setting permissions, configuring the replication factor, and retrieving block locations).

  • Supports all HDFS parameters, applying defaults for any that are omitted from a request.

  • Permits clients written in any language to access HDFS without installing Hadoop. You can also use common command-line tools such as curl and wget to access HDFS, as in the example above.

  • Uses the full bandwidth of the Hadoop cluster for streaming data: file read and file write requests are redirected to the appropriate DataNodes (see the sketch after this list).

  • Uses Kerberos (SPNEGO) and Hadoop delegation tokens for authentication.

  • WebHDFS is completely open source under Apache. Hortonworks contributed the code to Apache Hadoop, where it is a first-class, built-in component.

  • Requires no additional servers. However, a WebHDFS proxy (for example, HttpFS) can be useful in certain cases and is complementary to WebHDFS.

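The DataNode redirect behavior can be observed directly with curl. Below is a sketch of a two-step file write followed by a read, again assuming a NameNode at the placeholder <HOST>:<PORT>, a sample path /user/hadoop/sample.txt, and simple user.name authentication; on a Kerberos-secured cluster the same requests would instead authenticate via SPNEGO (curl --negotiate -u :).

    # Step 1: ask the NameNode to create the file. The NameNode does not
    # accept the file data itself; it replies "307 Temporary Redirect" with
    # a Location header pointing at a DataNode.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/user/hadoop/sample.txt?op=CREATE&user.name=hadoop"

    # Step 2: send the file content to the DataNode URL returned in the
    # Location header; a successful write returns "201 Created".
    curl -i -X PUT -T sample.txt "<DATANODE_LOCATION_FROM_STEP_1>"

    # Reads follow the same pattern; -L makes curl follow the redirect to
    # the DataNode, which streams the file content back.
    curl -i -L "http://<HOST>:<PORT>/webhdfs/v1/user/hadoop/sample.txt?op=OPEN&user.name=hadoop"

Because the data itself flows directly between the client and a DataNode, the NameNode only brokers each operation, which is what lets WebHDFS use the cluster's full aggregate bandwidth.
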
For more information, see: WebHDFS – HTTP REST Access to HDFS.

For information on administering WebHDFS, see the WebHDFS Administrator Guide.