5.2. Configuration Properties

Configuration properties for centralized caching are specified in the hdfs-site.xml file.

Required Properties

Currently, only one property is required:

  • dfs.datanode.max.locked.memory

    This property sets the maximum amount of memory, in bytes, that a DataNode will use for caching. The "locked-in-memory size" ulimit (ulimit -l) of the DataNode user must also be raised to match or exceed this value (for more details, see the following section on OS Limits). When choosing a value, leave room in memory for other consumers, such as the DataNode and application JVM heaps and the operating system page cache.

    Example:

    <property>
        <name>dfs.datanode.max.locked.memory</name>
        <value>268435456</value>
    </property>
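
    As a sizing sketch: the value is a plain byte count, so a limit is easiest to derive by multiplying out from a binary unit. The 1 GiB figure below is an illustrative assumption, not a recommendation for any particular cluster:

    ```xml
    <!-- Illustrative only: 1 GiB = 1024 * 1024 * 1024 = 1073741824 bytes.
         The example above, 268435456, is 256 MiB by the same arithmetic. -->
    <property>
        <name>dfs.datanode.max.locked.memory</name>
        <value>1073741824</value>
    </property>
    ```

    Whatever value is chosen, the ulimit -l setting for the DataNode user must be at least as large.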

Optional Properties

The following properties are not required, but can be specified for tuning.

  • dfs.namenode.path.based.cache.refresh.interval.ms

    The NameNode uses this value as the interval, in milliseconds, between successive rescans of the cached paths. The default is 300000 (five minutes).

    Example:

    <property>
        <name>dfs.namenode.path.based.cache.refresh.interval.ms</name>
        <value>300000</value>
    </property>

  • dfs.time.between.resending.caching.directives.ms

    The NameNode uses this value as the interval, in milliseconds, between resending caching directives.

    Example:

    <property>
        <name>dfs.time.between.resending.caching.directives.ms</name>
        <value>300000</value>
    </property>

  • dfs.datanode.fsdatasetcache.max.threads.per.volume

    The DataNode uses this value as the maximum number of threads per volume for caching new data. The default is 4.

    Example:

    <property>
        <name>dfs.datanode.fsdatasetcache.max.threads.per.volume</name>
        <value>4</value>
    </property>

  • dfs.cachereport.intervalMsec

    The DataNode uses this value as the interval, in milliseconds, between full reports of its cache state to the NameNode. The default is 10000 (10 seconds).

    Example:

    <property>
        <name>dfs.cachereport.intervalMsec</name>
        <value>10000</value>
    </property>

  • dfs.namenode.path.based.cache.block.map.allocation.percent

    This property sets the percentage of the Java heap allocated to the cached blocks map, a hash map that uses chained hashing. A smaller map may be slower to access when the number of cached blocks is large; a larger map consumes more memory. The default value is 0.25 percent.

    Example:

    <property>
        <name>dfs.namenode.path.based.cache.block.map.allocation.percent</name>
        <value>0.25</value>
    </property>
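
Because all of these settings live in hdfs-site.xml, each <property> element goes inside that file's top-level <configuration> element. The following is a minimal sketch that combines the required property with one optional tuning property, reusing the values from the examples above. A real hdfs-site.xml will normally contain other, unrelated properties as well.

```xml
<?xml version="1.0"?>
<configuration>

    <!-- Required: maximum memory, in bytes, the DataNode may lock for caching
         (268435456 bytes = 256 MiB). -->
    <property>
        <name>dfs.datanode.max.locked.memory</name>
        <value>268435456</value>
    </property>

    <!-- Optional: interval, in milliseconds, between full cache reports
         from the DataNode to the NameNode (10000 ms = 10 seconds). -->
    <property>
        <name>dfs.cachereport.intervalMsec</name>
        <value>10000</value>
    </property>

</configuration>
```

Optional properties that are left out fall back to their defaults, so only the values you intend to change need to appear.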

