Pre-installation tasks for Data Plane Profiler Agent
Perform these tasks before you install the Data Profiler agent on the cluster.
- Ensure that the clusters are running the latest version of HDP.
Ensure that the following HDP components are installed and configured:
- Spark2 and Livy Server2
- If you plan to sync users from LDAP into Ranger, ensure a dpprofiler user is created in LDAP and synced into Ranger.
- Ensure that Ranger integration for HDFS and Hive is enabled.
- Make sure that HDFS Audit logging for Ranger is enabled.
Add the following proxy user details to the custom core-site.xml file:
If the cluster is Kerberos-enabled, go to the Kerberos configuration section in Ambari and look up the value of the global property called principal suffix. Then go to the Spark2 service, open the Custom livy2-conf section, and add this property.
Configure Spark2 to clean up old history files so that they do not fill up HDFS space over time:
- Log in to Ambari on the cluster.
- Select Spark2 > Configs > Custom spark2-defaults.
Add the following lines:
spark.history.fs.cleaner.enabled=true
spark.history.fs.cleaner.interval=1d
spark.history.fs.cleaner.maxAge=7d
This ensures that Spark history from jobs older than seven days is cleaned up once per day. Modify the values as needed.
- Restart the services as required.
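To make the effect of the maxAge value concrete, here is a minimal Python sketch of the age check the history cleaner applies on each run. It is a simplified model, not Spark's implementation: `parse_duration` understands only the suffixes `ms`, `s`, `m`, `h`, and `d`, and the log names and ages in the usage example are hypothetical.

```python
# Simplified model of the Spark history cleaner's age check (not Spark's own code).
# Only a subset of Spark's duration suffixes is supported here.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(value: str) -> float:
    """Convert a duration such as '7d' or '12h' to seconds."""
    value = value.strip().lower()
    # Check longer suffixes first so that "ms" is not mistaken for "s".
    for unit in sorted(_UNIT_SECONDS, key=len, reverse=True):
        if value.endswith(unit):
            return float(value[: -len(unit)]) * _UNIT_SECONDS[unit]
    raise ValueError(f"unsupported duration: {value!r}")


def logs_to_clean(log_ages_seconds: dict[str, float], max_age: str = "7d") -> list[str]:
    """Return the names of history logs older than max_age, sorted by name."""
    limit = parse_duration(max_age)
    return sorted(name for name, age in log_ages_seconds.items() if age > limit)


if __name__ == "__main__":
    DAY = 86400
    # Hypothetical application logs and their ages in seconds.
    ages = {"app-001": 2 * DAY, "app-002": 8 * DAY, "app-003": 30 * DAY}
    print(logs_to_clean(ages, max_age="7d"))
```

With a one-day interval, a check like this runs daily, so a log becomes eligible for deletion at most about one day after it passes the seven-day limit.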
Make sure the cluster meets the following minimum resource requirements for a default DSS configuration:
- RAM should be greater than or equal to 24 GB.
- CPU cores should be greater than or equal to 12.
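The two minimums can be expressed as a simple pre-flight check. This Python sketch assumes you already know the RAM (in GB) and CPU core count available to DSS; the function name `check_minimum_resources` is hypothetical, and how you gather the figures (per host or cluster-wide) is left to your deployment.

```python
# Hypothetical pre-flight check for the minimum DSS resource requirements.
MIN_RAM_GB = 24
MIN_CPU_CORES = 12


def check_minimum_resources(ram_gb: float, cpu_cores: int) -> list[str]:
    """Return a list of problems; an empty list means both minimums are met."""
    problems = []
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM is {ram_gb} GB; at least {MIN_RAM_GB} GB is required")
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"{cpu_cores} CPU cores; at least {MIN_CPU_CORES} are required")
    return problems


if __name__ == "__main__":
    # Example values only; substitute the figures for your cluster.
    for problem in check_minimum_resources(ram_gb=32, cpu_cores=8):
        print(problem)
```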
Update the YARN parameters as follows:
Set the yarn.scheduler.capacity.maximum-am-resource-percent parameter under YARN > Scheduler to a value x such that:
(x * total_memory_in_yarn) >= 8 GB
For example, if YARN has 16 GB of total memory, set x to 0.5 (0.5 * 16 GB = 8 GB).
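The rule (x * total_memory_in_yarn) >= 8G can be rearranged to x >= 8 / total_memory_in_yarn. The following Python sketch computes that lower bound, rounded up to two decimal places so the result never falls below the requirement. The function name and the rounding choice are assumptions for illustration.

```python
import math

REQUIRED_AM_MEMORY_GB = 8  # from the rule (x * total_memory_in_yarn) >= 8G


def min_am_resource_percent(total_yarn_memory_gb: float,
                            required_gb: float = REQUIRED_AM_MEMORY_GB) -> float:
    """Smallest x (a fraction in (0, 1]) such that x * total_yarn_memory_gb >= required_gb."""
    if total_yarn_memory_gb <= 0:
        raise ValueError("total YARN memory must be positive")
    x = required_gb / total_yarn_memory_gb
    if x > 1:
        raise ValueError(
            f"{total_yarn_memory_gb} GB of YARN memory cannot provide {required_gb} GB"
        )
    # round() removes floating-point noise before ceil() rounds up to 2 decimals.
    return math.ceil(round(x * 100, 9)) / 100


if __name__ == "__main__":
    for total in (16, 24, 32):
        print(total, "GB ->", min_am_resource_percent(total))
```

For 16 GB this gives 0.5, matching the example in the text; for 24 GB it gives 0.34 rather than 0.33, because 0.33 * 24 GB is slightly less than 8 GB.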
These resources must be allocated exclusively to the profiler agent and the profilers, so it is advisable to use a separate YARN queue for them.
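If you follow the advice to use a separate queue, the Capacity Scheduler needs a queue whose share of cluster capacity covers the profiler's resources. This Python sketch assumes a cluster whose root currently has only the `default` queue, and it uses a hypothetical queue name, `profiler`. It computes the percentage share (rounded up) and emits the corresponding `yarn.scheduler.capacity.*` properties, whose sibling capacities must sum to 100.

```python
import math


def dedicated_queue_properties(total_yarn_memory_gb: float, required_gb: float,
                               queue: str = "profiler") -> dict[str, str]:
    """Capacity Scheduler properties that split root into 'default' and a dedicated queue.

    Assumes root currently contains only 'default'; the queue name is hypothetical.
    """
    if not 0 < required_gb <= total_yarn_memory_gb:
        raise ValueError("required_gb must be positive and no larger than total YARN memory")
    # Percentage share for the dedicated queue, rounded up so it covers required_gb.
    share = math.ceil(round(required_gb / total_yarn_memory_gb * 100, 9))
    return {
        "yarn.scheduler.capacity.root.queues": f"default,{queue}",
        f"yarn.scheduler.capacity.root.{queue}.capacity": str(share),
        "yarn.scheduler.capacity.root.default.capacity": str(100 - share),
    }


if __name__ == "__main__":
    # Example: 96 GB of YARN memory, reserving 24 GB for the profiler agent and profilers.
    for key, value in dedicated_queue_properties(96, 24).items():
        print(f"{key}={value}")
```

Rounding the share up means the dedicated queue may receive slightly more than the requested memory; the `default` queue gets whatever remains.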