Using Apache Storm to Move Data

Configuring HDFS Spout

The following member functions are required for HdfsSpout:

.setReaderType()

Specifies which file reader to use:

  • To read sequence files, set this to 'seq'.

  • To read text files, set this to 'text'.

  • To use a custom file reader class that implements the org.apache.storm.hdfs.spout.FileReader interface, set this to the fully qualified class name.

.withOutputFields()

Specifies names of output fields for the spout. The number of fields depends upon the reader being used.

For convenience, built-in reader types expose a static member called defaultFields that can be used for setting this.

.setHdfsUri()

Specifies the URI of the HDFS NameNode; for example, hdfs://namenodehost:8020.

.setSourceDir()

Specifies the HDFS directory from which to read files; for example, /data/inputdir.

.setArchiveDir()

Specifies the HDFS directory to which a file is moved after it has been completely processed; for example, /data/done.

If this directory does not exist, it will be created automatically.

.setBadFilesDir()

Specifies the directory to which a file is moved if an error occurs while parsing its contents; for example, /data/badfiles.

If this directory does not exist, it will be created automatically.
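
Taken together, the required settings above might be chained on a single HdfsSpout instance as in the following minimal sketch. It assumes a storm-hdfs release in which these setters are available on HdfsSpout and return the spout for chaining. The class name HdfsSpoutConfigSketch, the component ID "hdfs-spout", and the parallelism of 1 are illustrative assumptions; the host, port, and directory paths are the example values used in the descriptions above and should be replaced with values for your cluster.

```java
import org.apache.storm.hdfs.spout.HdfsSpout;
import org.apache.storm.hdfs.spout.TextFileReader;
import org.apache.storm.topology.TopologyBuilder;

// Hypothetical class name, used only for illustration.
public class HdfsSpoutConfigSketch {

    public static TopologyBuilder buildTopology() {
        HdfsSpout spout = new HdfsSpout()
            // Built-in text reader; use "seq" for sequence files
            // or a fully qualified class name for a custom reader.
            .setReaderType("text")
            // Reuse the output field names exposed by the built-in reader.
            .withOutputFields(TextFileReader.defaultFields)
            // Example values from the descriptions above; adjust for your cluster.
            .setHdfsUri("hdfs://namenodehost:8020")
            .setSourceDir("/data/inputdir")
            .setArchiveDir("/data/done")
            .setBadFilesDir("/data/badfiles");

        TopologyBuilder builder = new TopologyBuilder();
        // Component ID and parallelism hint are assumptions for this sketch.
        builder.setSpout("hdfs-spout", spout, 1);
        return builder;
    }
}
```

The sketch only wires the spout into a TopologyBuilder; downstream bolts and topology submission are omitted because they depend on the application.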

For additional configuration settings, see Apache HDFS spout Configuration Settings.