Cloud Data Access

Improving Performance for DistCp


For tuning guidance, refer to the blog post Maximizing HDInsight throughput to Azure Blob Storage.

Amazon S3

If you plan to copy large amounts of data between HDFS and S3, you can accelerate the process by passing a -D option when invoking DistCp. For example:

hadoop distcp -D  s3a://dominika-test/driver-data /tmp/test2

The option significantly accelerates data upload by writing the data in blocks, possibly in parallel.
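The property name passed with -D is missing from the example above. As a minimal sketch, the script below assumes it is fs.s3a.fast.upload=true, a guess based on the S3A Fast Upload guide referenced in this section; confirm it against the S3A documentation for your Hadoop version. The RUN_DISTCP variable is hypothetical and only guards against running the copy by accident.

```shell
#!/bin/sh
# ASSUMPTION: the elided -D property is fs.s3a.fast.upload=true.
# Verify this against the S3A documentation for your Hadoop release.
SRC="s3a://dominika-test/driver-data"   # source path from the example above
DST="/tmp/test2"                        # destination path from the example above

# Build the full DistCp command line as a single string.
DISTCP_CMD="hadoop distcp -D fs.s3a.fast.upload=true $SRC $DST"

# Always print the command so it can be reviewed before running.
echo "$DISTCP_CMD"

# Run the copy only when explicitly requested (RUN_DISTCP is a hypothetical guard).
if [ "${RUN_DISTCP:-0}" = "1" ]; then
  $DISTCP_CMD
fi
```

Running the script as is only prints the command. Setting RUN_DISTCP=1 in the environment makes it start the copy, which requires the hadoop CLI and valid S3 credentials.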

For more tips on how to improve performance for DistCp with S3, refer to Configuring and Tuning S3A Fast Upload.