HDFS Administration

Chapter 7. Configuring HDFS Compression

This section describes how to configure HDFS compression on Linux.

On Linux, the following codecs are supported: GzipCodec, DefaultCodec, BZip2Codec, LzoCodec, and SnappyCodec. GzipCodec is the codec typically used for HDFS compression. The following instructions show how to use GzipCodec.

  • Option I: To use GzipCodec for a single job:

    hadoop jar hadoop-examples-1.1.0-SNAPSHOT.jar sort \
      "-Dmapred.compress.map.output=true" \
      "-Dmapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
      "-Dmapred.output.compress=true" \
      "-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
      -outKey org.apache.hadoop.io.Text \
      -outValue org.apache.hadoop.io.Text input output
  • Option II: To enable GzipCodec as the default compression:

    • Edit the core-site.xml file on the NameNode host machine:

      <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,
          org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,
          org.apache.hadoop.io.compress.SnappyCodec</value>
        <description>A list of the compression codec classes that can be used
          for compression/decompression.</description>
      </property>
    • Edit the mapred-site.xml file on the JobTracker host machine:

      <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
      </property>
      
      <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>org.apache.hadoop.io.compress.GzipCodec</value>
      </property>
      
      <property>
        <name>mapreduce.output.fileoutputformat.compress.type</name>
        <value>BLOCK</value>
      </property>
    • (Optional) To compress job output, set the following two configuration parameters. Edit the mapred-site.xml file on the ResourceManager host machine:

      <property>
        <name>mapreduce.output.fileoutputformat.compress</name>
        <value>true</value>
      </property>
      
      <property>
        <name>mapreduce.output.fileoutputformat.compress.codec</name>
        <value>org.apache.hadoop.io.compress.GzipCodec</value>
      </property>
    • Restart the cluster using the applicable commands in Controlling HDP Services Manually.
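
After editing the files, and before restarting, it can help to confirm that the properties are actually set. The following is a minimal sketch and not part of the HDP tooling: get_prop and check_compression are hypothetical helper names, and the parser assumes each <name> and <value> element sits on a single line, as in the mapred-site.xml fragments above. It will therefore not read the multi-line io.compression.codecs value in core-site.xml.

```shell
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop *-site.xml file.
#   $1 = path to the config file, $2 = property name
# Assumes <name>...</name> and <value>...</value> each fit on one line.
get_prop() {
  awk -v name="$2" '
    # Match the full element so "x.compress" does not match "x.compress.codec".
    index($0, "<name>" name "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character <value> and 8-character </value> tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
    # Stop if the property closes without a single-line value.
    found && /<\/property>/ { exit }
  ' "$1"
}

# Hypothetical helper: report the map-output compression settings
# described above. Returns non-zero if any of them is missing.
#   $1 = path to mapred-site.xml
check_compression() {
  status=0
  for prop in \
    mapreduce.map.output.compress \
    mapreduce.map.output.compress.codec \
    mapreduce.output.fileoutputformat.compress.type
  do
    value=$(get_prop "$1" "$prop")
    if [ -n "$value" ]; then
      echo "$prop=$value"
    else
      echo "$prop is not set" >&2
      status=1
    fi
  done
  return $status
}
```

For example, `check_compression /etc/hadoop/conf/mapred-site.xml` prints one name=value line per property found and returns a non-zero status if any is missing. The path shown is an assumption; substitute the configuration directory your cluster actually uses.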