Pre-installation tasks for DataPlane Profiler Agent
Complete these tasks before you install the DataPlane Profiler Agent on the cluster.
- Ensure that the clusters are running the latest version of HDP.
- Ensure that the following HDP components are installed and configured:
  - Atlas
  - Ranger
  - Knox
  - Spark2 and Livy Server2
- If you plan to sync users from LDAP into Ranger, ensure a dpprofiler user is created in LDAP and synced into Ranger.
- Ensure that Ranger integration for HDFS and Hive is enabled.
- Make sure that HDFS Audit logging for Ranger is enabled.
- Make sure that the Hive client and HDFS client are installed on the machine where you plan to install the DataPlane Profiler Agent.
- Add the following proxy user details to the custom core-site.xml file:
  hadoop.proxyuser.livy.groups=*
  hadoop.proxyuser.livy.hosts=*
  hadoop.proxyuser.knox.groups=*
  hadoop.proxyuser.knox.hosts=*
- Ensure that the following configuration is set in Spark2 so that old history files are cleaned up and do not fill HDFS space over time:
  - Log in to Ambari on the cluster.
  - Select Spark2 > Configs > Custom spark2-defaults.
  - Add the following lines:
    spark.history.fs.cleaner.enabled=true
    spark.history.fs.cleaner.interval=1d
    spark.history.fs.cleaner.maxAge=7d
    With these values, the cleaner runs once per day and removes Spark history from jobs older than seven days. Modify the values as needed.
  - Restart the services as required.
- Make sure that the YARN queue resources for a default DSS configuration meet the following requirements:
  - RAM should be greater than or equal to 24 GB.
  - CPU cores should be greater than or equal to 12.
  Update the YARN parameters as follows:
  Set the yarn.scheduler.capacity.maximum-am-resource-percent parameter on YARN > Scheduler (call this value x) so that x multiplied by the total memory in YARN is greater than or equal to 8 GB. The condition is:
  (x * total_memory_in_yarn) >= 8G
  For example, with 16 GB of total memory in YARN, set x to 0.5.
  All of these resources must be allocated exclusively to the profiler agent and profilers. A separate queue is advisable.
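
The sizing rule for yarn.scheduler.capacity.maximum-am-resource-percent can be checked with a few lines of Python. This is a minimal sketch, not part of the product: the function names, the default of 8 GB taken from the rule above, and the choice to round x up to two decimal places are all assumptions made for illustration.

```python
import math


def min_am_resource_percent(total_memory_in_yarn_gb, required_am_gb=8.0):
    """Return the smallest x (rounded up to 2 decimals) such that
    x * total_memory_in_yarn_gb >= required_am_gb.

    Hypothetical helper: the names and the rounding are illustrative only.
    """
    if total_memory_in_yarn_gb <= 0:
        raise ValueError("total YARN memory must be positive")
    x = required_am_gb / total_memory_in_yarn_gb
    if x > 1.0:
        # x is a fraction of total memory, so it cannot exceed 1.0.
        raise ValueError("total YARN memory is below the required amount")
    # Round up so the rounded value still satisfies the inequality.
    return math.ceil(x * 100) / 100


def meets_am_requirement(x, total_memory_in_yarn_gb, required_am_gb=8.0):
    """Check the condition (x * total_memory_in_yarn) >= 8G."""
    return x * total_memory_in_yarn_gb >= required_am_gb


if __name__ == "__main__":
    print(min_am_resource_percent(16))    # 0.5, matching the example above
    print(meets_am_requirement(0.2, 16))  # False: 0.2 * 16 GB is only 3.2 GB
```

Rounding up rather than to the nearest value keeps the result on the safe side of the inequality when the division is not exact.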