4. Encryption during Shuffle

Data securely loaded into HDFS is processed by Mappers and Reducers to derive meaningful business intelligence. Hadoop moves the intermediate output of the Mappers to the Reducers over HTTP in a step called the shuffle. In SSL parlance, the Reducer is the SSL client: it initiates the connection to the Mapper and asks for the data. Enabling HTTPS to encrypt shuffle traffic involves the following steps.

  • In the mapred-site.xml file, set mapreduce.shuffle.ssl.enabled=true.

  • Set the keystore properties and, optionally, the truststore properties (for 2-way SSL) as in the table above.
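
As a rough sketch, the two steps above might look like the following configuration fragments. The first sets the shuffle property named in this section. The second assumes the keystore settings live in Hadoop's stock ssl-server.xml file and uses its standard property names, which may not match the table above exactly. The keystore path and password are placeholders for illustration only.

```xml
<!-- mapred-site.xml: turn on HTTPS for shuffle traffic -->
<property>
  <name>mapreduce.shuffle.ssl.enabled</name>
  <value>true</value>
</property>

<!-- ssl-server.xml (assumed location of the keystore settings) -->
<!-- The path and password below are placeholders, not real values -->
<property>
  <name>ssl.server.keystore.location</name>
  <value>/path/to/server-keystore.jks</value>
</property>
<property>
  <name>ssl.server.keystore.password</name>
  <value>CHANGE_ME</value>
</property>
```

A change like this typically has to be present on every node that serves or fetches map output, and the affected services restarted, before shuffle traffic is actually encrypted.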