Accessing Data on Azure

Hortonworks Data Platform (HDP) supports reading and writing both block blobs and page blobs in the Windows Azure Storage Blob (WASB) object store, as well as reading and writing files stored in an Azure Data Lake Storage (ADLS) account.

Accessing Data in ADLS

Azure Data Lake Store (ADLS) is an enterprise-wide hyper-scale repository for big data analytic workloads.

Prerequisites

If you want to use ADLS to store your data, you must enable your Azure subscription for Data Lake Store and then create an Azure Data Lake Store account.

Configuring Access to ADLS

ADLS is not supported as a default file system, but you can access data in ADLS via the adl connector. To configure access to ADLS from a cluster managed by Cloudbreak, follow the steps described in How to Configure Authentication with ADLS.

Testing Access to ADLS

To test access to ADLS, SSH to a cluster node and run a few hadoop fs shell commands against your existing ADLS account.

ADLS access path syntax is:

adl://account_name.azuredatalakestore.net/dir/file
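If you script against ADLS, a small helper that assembles paths in the syntax above can help avoid typos in the host name. The following is a minimal Bash sketch; the function name adl_uri is hypothetical and is not part of Hadoop or the adl connector.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): builds a URI of the form
#   adl://account_name.azuredatalakestore.net/dir/file
adl_uri() {
  local account="$1"   # ADLS account name, e.g. "myaccount"
  local path="${2#/}"  # path inside the account; one leading "/" is stripped
  printf 'adl://%s.azuredatalakestore.net/%s\n' "$account" "$path"
}
```

For example, hadoop fs -ls "$(adl_uri myaccount testdir)" would list the testdir directory of an account named "myaccount".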

For example, the following Hadoop FileSystem shell commands demonstrate access to an ADLS account named "myaccount":

hadoop fs -mkdir adl://myaccount.azuredatalakestore.net/testdir
hadoop fs -put testfile adl://myaccount.azuredatalakestore.net/testdir/testfile

To use DistCp against ADLS, use the following syntax:

hadoop distcp \
    [-D hadoop.security.credential.provider.path=localjceks://file/home/user/adls.jceks] \
    hdfs://namenode_hostname:9001/user/foo/007020615 \
    adl://myaccount.azuredatalakestore.net/testDir/
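The square brackets mark the -D option as optional. The following sketch is a hypothetical Bash wrapper named adls_distcp that passes hadoop.security.credential.provider.path only when a credential store path is supplied; it assumes the hadoop command is on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Hadoop) around "hadoop distcp".
# The credential provider setting is added only when a third argument
# is given, mirroring the optional [-D ...] part of the syntax.
adls_distcp() {
  local src="$1" dst="$2" provider="${3:-}"
  local args=()
  if [ -n "$provider" ]; then
    # e.g. localjceks://file/home/user/adls.jceks
    args+=(-D "hadoop.security.credential.provider.path=$provider")
  fi
  hadoop distcp "${args[@]}" "$src" "$dst"
}
```

For example: adls_distcp hdfs://namenode_hostname:9001/user/foo/007020615 adl://myaccount.azuredatalakestore.net/testDir/ localjceks://file/home/user/adls.jceks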

Working with ADLS

For more information about configuring the ADLS connector and working with data stored in ADLS, refer to the Cloud Data Access documentation.

Related Links
Cloud Data Access (Hortonworks)
How to Configure Authentication with ADLS (Hortonworks)
Azure Data Lake Store (External)
Create a Storage Account (External)
Get started with Azure Data Lake Store (External)

Accessing Data in WASB

Windows Azure Storage Blob (WASB) is an object store service available on Azure.

Prerequisites

If you want to use Windows Azure Storage Blob to store your data, you must enable your Azure subscription for Blob Storage and then create a storage account.

Configuring Access to WASB

To access data stored in your Azure blob storage account, you must configure your storage account access key in core-site.xml. Use the configuration property fs.azure.account.key.<account name>.blob.core.windows.net, with the access key as its value.

For example, use the following property for a storage account called "testaccount":

<property>
  <name>fs.azure.account.key.testaccount.blob.core.windows.net</name>
  <value>TESTACCOUNT-ACCESS-KEY</value>
</property>

You can obtain your access key from the Access keys page of your storage account settings in the Azure portal.
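Because the property name embeds the account name, it can be convenient to generate the entry rather than edit it by hand. The following is a minimal Bash sketch using a hypothetical helper named wasb_key_property. It only prints the XML, which you would then place inside the <configuration> element of core-site.xml. It does not escape XML special characters, on the assumption that the access key is a Base64 string, which contains none.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): prints the core-site.xml
# <property> entry for one WASB storage account. The property name
# follows the pattern fs.azure.account.key.<account name>.blob.core.windows.net
wasb_key_property() {
  local account="$1"  # storage account name, e.g. "testaccount"
  local key="$2"      # access key copied from the Access keys page
  cat <<EOF
<property>
  <name>fs.azure.account.key.${account}.blob.core.windows.net</name>
  <value>${key}</value>
</property>
EOF
}
```

Running wasb_key_property testaccount TESTACCOUNT-ACCESS-KEY prints the same property shown above.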

Testing Access to WASB

To test access to WASB, SSH to a cluster node and run a few hadoop fs shell commands against your existing storage account.

WASB access path syntax is:

wasb://container_name@storage_account_name.blob.core.windows.net/dir/file

For example, to access a file called "testfile" located in a directory called "testdir", stored in the container called "testcontainer" on the account called "hortonworks", the URL is:

wasb://testcontainer@hortonworks.blob.core.windows.net/testdir/testfile

You can also use the "wasbs" prefix for SSL-encrypted HTTPS access:

wasbs://container_name@storage_account_name.blob.core.windows.net/dir/file
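The wasb and wasbs URI forms differ only in the scheme. The following Bash sketch builds either form; the function name wasb_uri and its argument order are hypothetical and are not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): builds
#   <scheme>://container_name@storage_account_name.blob.core.windows.net/dir/file
# The optional fourth argument selects "wasb" (default) or "wasbs" (HTTPS).
wasb_uri() {
  local container="$1" account="$2" path="${3#/}" scheme="${4:-wasb}"
  case "$scheme" in
    wasb|wasbs) ;;
    *) echo "wasb_uri: scheme must be wasb or wasbs" >&2; return 1 ;;
  esac
  printf '%s://%s@%s.blob.core.windows.net/%s\n' \
    "$scheme" "$container" "$account" "$path"
}
```

For example, wasb_uri testcontainer hortonworks testdir/testfile prints the "hortonworks" URL shown above, and adding wasbs as a fourth argument prints its HTTPS variant.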

The following Hadoop FileSystem shell commands demonstrate access to a storage account named "myaccount" and a container named "mycontainer":

hadoop fs -ls wasb://mycontainer@myaccount.blob.core.windows.net/

hadoop fs -mkdir wasb://mycontainer@myaccount.blob.core.windows.net/testDir

hadoop fs -put testFile wasb://mycontainer@myaccount.blob.core.windows.net/testDir/testFile

hadoop fs -cat wasb://mycontainer@myaccount.blob.core.windows.net/testDir/testFile
test file content
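The commands above can be combined into a single round-trip check. The following sketch is a hypothetical Bash function, roundtrip_check, which assumes the hadoop command is on the PATH and that access has been configured as described earlier. It creates the target directory, uploads a small local file with -put -f (overwriting any existing testFile), reads it back with -cat, and compares the contents. Because it uses only hadoop fs commands, the same function should also work with an adl:// base path.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test (not part of Hadoop): write, read back, compare.
# $1 is a directory URI, e.g.
#   wasb://mycontainer@myaccount.blob.core.windows.net/testDir
roundtrip_check() {
  local base="${1%/}"                  # drop one trailing "/"
  local expected="test file content"
  local tmp actual
  tmp="$(mktemp)" || return 1
  printf '%s\n' "$expected" > "$tmp"

  # Create the directory and upload; clean up the temp file on failure.
  hadoop fs -mkdir -p "$base" &&
    hadoop fs -put -f "$tmp" "$base/testFile" || { rm -f "$tmp"; return 1; }
  rm -f "$tmp"

  # Read the file back and compare with what was written.
  actual="$(hadoop fs -cat "$base/testFile")" || return 1
  if [ "$actual" = "$expected" ]; then
    echo "OK: $base/testFile"
  else
    echo "MISMATCH: got '$actual'" >&2
    return 1
  fi
}
```

For example: roundtrip_check wasb://mycontainer@myaccount.blob.core.windows.net/testDir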

Working with WASB

For more information about configuring the WASB connector and working with data stored in WASB, refer to the Cloud Data Access documentation.

Related Links
Cloud Data Access (Hortonworks)
Create a Storage Account (External)