Accessing Cloud Data

Configuring Page Blob Support

The Azure Blob Storage interface for Hadoop supports two kinds of blobs: block blobs and page blobs.

Block blobs, which are used by default, are suitable for most big-data use cases, such as input data for Hive, Pig, and analytical MapReduce jobs.

Page blobs can be up to 1TB in size, larger than the maximum 200GB size for block blobs. Their primary use case is HBase write-ahead logs: page blobs can be written any number of times, whereas block blobs can only be appended to 50,000 times, at which point you run out of blocks and your writes fail. Because that limit makes block blobs unsuitable for HBase logs, page blob support was introduced to overcome it.

  1. To have the files that you create stored as page blobs, set the configuration property fs.azure.page.blob.dir in core-site.xml to a comma-separated list of folder names. For example:

    <property>
      <name>fs.azure.page.blob.dir</name>
      <value>/hbase/WALs,/hbase/oldWALs,/data/mypageblobfiles</value>
    </property>

    To make all files page blobs, you can simply set this to /.
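
    For example, a core-site.xml entry along the following lines (a minimal sketch reusing the same property) would cause every new file to be created as a page blob:

    <property>
      <name>fs.azure.page.blob.dir</name>
      <value>/</value>
    </property>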

  2. You can set two additional configuration properties related to page blobs, also in core-site.xml; see the example after this list:

  • The configuration property fs.azure.page.blob.size defines the default initial size for a page blob. Its value is an integer number of bytes, which must be at least 128MB and no more than 1TB.

  • The configuration property fs.azure.page.blob.extension.size defines the page blob extension size, that is, the amount by which a page blob is extended when it becomes full. Its value is an integer number of bytes and must be 128MB or greater.
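
  As a sketch, a core-site.xml fragment that sets both properties might look like the following; the byte values shown (a 256MB initial size and a 128MB extension size) are illustrative, and any values within the limits described above may be used:

    <property>
      <name>fs.azure.page.blob.size</name>
      <value>268435456</value>
    </property>
    <property>
      <name>fs.azure.page.blob.extension.size</name>
      <value>134217728</value>
    </property>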