Example: archiving indexed data

A working example of archiving indexed data.

In archive mode, the program fetches data from the Solr collection, writes it out to HDFS or S3, and then deletes the archived data from Solr.

The program fetches records from Solr and creates a file once the write block size is reached or when no more matching records are found in Solr. It tracks its progress by fetching the records ordered by the filter field and the id field, always saving the last values of both. Once a file is written, it is compressed using the configured compression type.
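The paging and file-splitting behaviour described above can be sketched as follows. This is a minimal illustration rather than the program's actual code: Solr is replaced by an in-memory list of documents, and the function names (fetch_block, archive_in_blocks) and their parameters are hypothetical.

```python
def fetch_block(docs, filter_field, end_value, last, read_block_size):
    """Return the next block of matching docs, ordered by (filter field, id).

    `docs` stands in for the Solr collection (assumption: a list of dicts).
    `last` is the (filter value, id) pair of the last record already
    processed, or None on the first call.
    """
    matching = [
        d for d in docs
        if d[filter_field] < end_value
        and (last is None or (d[filter_field], d["id"]) > last)
    ]
    matching.sort(key=lambda d: (d[filter_field], d["id"]))
    return matching[:read_block_size]


def archive_in_blocks(docs, filter_field, end_value,
                      read_block_size, write_block_size):
    """Group matching docs into files of at most write_block_size records."""
    files = []      # each entry represents the contents of one output file
    current = []    # records collected for the file being written
    last = None     # saved progress: last (filter value, id) seen
    while True:
        block = fetch_block(docs, filter_field, end_value,
                            last, read_block_size)
        if not block:
            break   # no more matching records
        for doc in block:
            current.append(doc)
            last = (doc[filter_field], doc["id"])
            if len(current) == write_block_size:
                files.append(current)   # write block size reached
                current = []
    if current:
        files.append(current)           # final, partially filled file
    return files, last
```

Because the saved position includes the id as well as the filter field, records that share the same filter value are neither skipped nor read twice when the next block is requested.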

After the compressed file is created, the program creates a command file containing instructions for the next steps. If an interruption or error occurs, the next run for the same collection starts by executing the saved command file, so that the data stays consistent. If the error is caused by an invalid configuration and the failures persist, the -g option can be used to ignore the saved command file. The program supports writing data to HDFS, S3, or local disk.
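The idea of a saved command file can be illustrated with a short sketch. It assumes a JSON file holding the list of remaining step names; the real program's file format and step names are not described here, so run_with_command_file, the step names, and the ignore_saved flag (standing in for -g) are all hypothetical.

```python
import json
import os


def run_with_command_file(command_path, steps, actions, ignore_saved=False):
    """Run steps in order, persisting the remaining ones before each step.

    command_path: where the pending steps are saved (hypothetical format).
    steps:        step names to run when there is no saved command file.
    actions:      mapping of step name to a callable performing that step.
    ignore_saved: discard a saved command file, similar in spirit to -g.
    """
    if ignore_saved and os.path.exists(command_path):
        os.remove(command_path)

    if os.path.exists(command_path):
        # A previous run was interrupted: resume from its saved steps.
        with open(command_path) as f:
            pending = json.load(f)
    else:
        pending = list(steps)

    while pending:
        # Save the remaining work first, so a crash leaves a usable file.
        with open(command_path, "w") as f:
            json.dump(pending, f)
        actions[pending[0]]()   # may raise; the command file then remains
        pending.pop(0)

    if os.path.exists(command_path):
        os.remove(command_path)  # everything finished, nothing to resume
```

If a step raises, the command file still lists that step and everything after it, so the next call with the same command_path picks up at the failed step instead of starting over.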

The command below archives data from the Solr collection hadoop_logs accessible at http://c6401.ambari.apache.org:8886/solr, based on the field logtime. It extracts everything older than 1 day, reads 10 documents at a time, writes 100 documents to each file, and copies the zip files into the local directory /tmp.

infra-solr-data-manager -m archive -s http://c6401.ambari.apache.org:8886/solr -c hadoop_logs -f logtime -d 1 -r 10 -w 100 -x /tmp -v
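For scripted use, the same command line can be assembled from named values, which makes the role of each option explicit. This is only a sketch: build_archive_command and its parameter names are hypothetical, the option meanings are taken from the description above, and -v is passed through unchanged because the text does not describe it.

```python
import shlex


def build_archive_command(solr_url, collection, filter_field, days,
                          read_block_size, write_block_size, local_path):
    """Return the argument list for an archive run (hypothetical helper)."""
    return [
        "infra-solr-data-manager",
        "-m", "archive",               # mode
        "-s", solr_url,                # Solr base URL
        "-c", collection,              # collection to archive
        "-f", filter_field,            # field used to select and order records
        "-d", str(days),               # archive everything older than this many days
        "-r", str(read_block_size),    # documents read at once
        "-w", str(write_block_size),   # documents written into one file
        "-x", local_path,              # local target directory
        "-v",                          # kept as in the example above
    ]


argv = build_archive_command(
    "http://c6401.ambari.apache.org:8886/solr",
    "hadoop_logs", "logtime", 1, 10, 100, "/tmp",
)
command_line = shlex.join(argv)  # shell-safe string form of the arguments
```

The resulting list can be handed to subprocess.run(argv) directly, which avoids any shell quoting concerns.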