Pre-installation tasks for DP Profiler Agent for HDP 3.x versions
Perform these tasks before you try to install the Data Profiler agent on the cluster.
- Ensure that the clusters are running the latest version of HDP.
- Ensure that the following HDP components are installed and configured:
- Spark2 with Livy for Spark2 and Spark Thrift Server for Spark2
- Ensure that Hive Interactive is enabled.
- If you plan to sync users from LDAP into Ranger, ensure a dpprofiler user is created in LDAP and synced into Ranger.
- Ensure that Ranger integration for HDFS and Hive is enabled.
- Make sure that HDFS Audit logging for Ranger is enabled.
- Make sure you install the Hive client and HDFS client on the machine where you plan to install the DataPlane Profiler Agent.
- Restart the services as required.
Make sure the resource requirements for the YARN queue used by a default DSS configuration are as follows:
- RAM should be greater than or equal to 24 GB.
- CPU cores should be greater than or equal to 12.
Update the YARN parameters as follows:
Set the yarn.scheduler.capacity.maximum-am-resource-percent parameter on YARN > Scheduler (call this value x) such that, when multiplied by the total memory in YARN, the result is greater than or equal to 8 GB.
The equation appears as follows:
(x * total_memory_in_yarn) >= 8 GB
For example, with 16 GB of total YARN memory, it is advised to set x to 0.5.
All these resources must be allocated exclusively for profiler agent and profilers. It is advisable to have a separate queue.
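The calculation above can be sketched as a small helper; this function is illustrative only (its name and signature are not part of DSS or YARN) and simply solves the inequality for the minimum x.

```python
def min_am_resource_percent(total_yarn_memory_gb, required_am_memory_gb=8):
    """Return the minimum value for
    yarn.scheduler.capacity.maximum-am-resource-percent such that
    application masters get at least required_am_memory_gb of RAM."""
    if total_yarn_memory_gb <= 0:
        raise ValueError("total YARN memory must be positive")
    # x * total_memory_in_yarn >= required  =>  x >= required / total
    return min(1.0, required_am_memory_gb / total_yarn_memory_gb)

# With 16 GB of total YARN memory, x must be at least 0.5,
# matching the example in the text.
print(min_am_resource_percent(16))  # 0.5
```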
Make sure a stable Hive LLAP instance is available with the following minimal requirements.
Considering the following parameters:
- a = average number of executors for sensitive/tablestats profilers
- b = average RAM per executor for sensitive/tablestats profilers
- c = average RAM per application master for sensitive/tablestats profilers
- y = RAM available in YARN for the dpprofilers queue
The following formula determines the minimal requirement:
x = floor(y / (a*b + c))
LLAP will have up to x additional jobs accessing data in Hive through LLAP, each with a parallelism of a.
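The sizing above can be sketched as follows. This helper is illustrative only (its name and the example numbers are assumptions, not values from the documentation): each job needs RAM for a executors of b GB plus one application master of c GB, and the queue of y GB is divided by that per-job cost.

```python
import math

def max_concurrent_profiler_jobs(a, b_gb, c_gb, y_gb):
    """Estimate how many profiler jobs can run concurrently in the
    dpprofilers YARN queue.

    a    -- average number of executors per profiler job
    b_gb -- average RAM (GB) per executor
    c_gb -- average RAM (GB) per application master
    y_gb -- RAM (GB) available in YARN for the dpprofilers queue
    """
    per_job_gb = a * b_gb + c_gb  # executors plus the application master
    return math.floor(y_gb / per_job_gb)

# Example with assumed numbers: 2 executors of 4 GB each plus a
# 2 GB application master per job, in a 24 GB queue.
print(max_concurrent_profiler_jobs(2, 4, 2, 24))  # 2
```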