Pre-installation tasks for Data Plane Profiler Agent
Perform these tasks before you install the Data Profiler agent on the cluster.
- Ensure that the clusters are running the latest version of HDP.
- Ensure that the following HDP components are installed and configured:
- Atlas
- Ranger
- Knox
- Spark2 and Livy Server2
- If you plan to sync users from LDAP into Ranger, ensure a dpprofiler user is created in LDAP and synced into Ranger.
- Make sure that HDFS Audit logging for Ranger is enabled.
- Add the following proxy user settings in the custom core-site.xml file:
hadoop.proxyuser.livy.groups=*
hadoop.proxyuser.livy.hosts=*
hadoop.proxyuser.knox.groups=*
hadoop.proxyuser.knox.hosts=*
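For reference, the four entries above end up as `<property>` elements in the core-site.xml that Ambari generates. This sketch only prints the equivalent XML so you can recognize it; Ambari writes the actual file for you:

```shell
# Illustrative only: show the <property> elements that the four
# proxy-user entries above produce in the generated core-site.xml.
proxy_xml=$(
  for user in livy knox; do
    for prop in groups hosts; do
      printf '<property>\n  <name>hadoop.proxyuser.%s.%s</name>\n  <value>*</value>\n</property>\n' \
        "$user" "$prop"
    done
  done
)
echo "$proxy_xml"
```

The `*` wildcard lets the livy and knox service users impersonate any user from any host; tighten these values if your security policy requires it.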
- If the cluster is Kerberos-enabled, go to the Kerberos configuration section in Ambari and look up the value of the global property named principal suffix. Then go to the Spark2 service, open the Custom livy2-conf section, and add the following property, replacing ${principalsuffix} with the value you looked up:
livy.superusers=dpprofiler${principalsuffix}
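As a concrete example of the substitution, suppose the principal suffix on your cluster is `-cluster1` (a placeholder value; use the one shown in your Ambari Kerberos configuration). The resulting property line would be composed like this:

```shell
# "-cluster1" is a hypothetical principal suffix; substitute the value
# from Ambari's Kerberos configuration section on your cluster.
principal_suffix="-cluster1"
livy_superusers="livy.superusers=dpprofiler${principal_suffix}"
echo "$livy_superusers"
```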
- Ensure that Spark2 is configured to clean up history files, so that they do not fill up HDFS space over time:
- Log in to Ambari on the cluster.
- Select Spark2 > Configs > Custom spark2-defaults.
- Add the following lines:
spark.history.fs.cleaner.enabled=true
spark.history.fs.cleaner.interval=1d
spark.history.fs.cleaner.maxAge=7d
This ensures that Spark history from jobs older than seven days will be cleaned up once per day. Modify the values as needed.
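Once the properties are saved, you can sanity-check that all three cleaner settings are present in the rendered configuration. A minimal sketch (the temporary file here stands in for the spark2-defaults configuration that Ambari manages):

```shell
# Sketch: write the three cleaner settings to a temp file standing in
# for the Ambari-managed spark2-defaults, then confirm all are present.
conf_file=$(mktemp)
cat > "$conf_file" <<'EOF'
spark.history.fs.cleaner.enabled=true
spark.history.fs.cleaner.interval=1d
spark.history.fs.cleaner.maxAge=7d
EOF
grep '^spark.history.fs.cleaner' "$conf_file"
```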
- Restart the services as required.