Data Steward Studio Installation

Pre-installation tasks for DP Profiler Agent on HDP 2.6.5

Perform these tasks before you install the DP Profiler Agent on the cluster.

  1. Ensure that the clusters are running the latest version of HDP.
  2. Ensure that the following HDP components are installed and configured:
    • Atlas
    • Ranger
    • Knox
    • Spark2 and Livy Server2
  3. If you plan to sync users from LDAP into Ranger, ensure a dpprofiler user is created in LDAP and synced into Ranger.
  4. Ensure that Ranger integration for HDFS and Hive is enabled.
  5. Make sure that HDFS Audit logging for Ranger is enabled.
  6. Install the Hive client and the HDFS client on the machine where you plan to install the DataPlane Profiler Agent.
  7. Add the following proxy user properties to the custom core-site.xml file:
    hadoop.proxyuser.livy.groups=*
    hadoop.proxyuser.livy.hosts=*
    hadoop.proxyuser.knox.groups=*
    hadoop.proxyuser.knox.hosts=*
  8. Restart the services as required.
  9. Make sure that the YARN queues meet the following resource requirements for a default DSS configuration:
    • RAM should be greater than or equal to 24 GB.
    • CPU cores should be greater than or equal to 12.

    Update the YARN parameters as follows:

    Set the yarn.scheduler.capacity.maximum-am-resource-percent parameter on YARN > Scheduler to a value x such that x multiplied by the total memory in YARN is greater than or equal to 8 GB.

    The equation appears as follows:

    (x * total_memory_in_yarn) >= 8G

    For example, if the total memory in YARN is 16 GB, set x to 0.5, because 0.5 * 16 GB = 8 GB.

    All of these resources must be allocated exclusively to the Profiler Agent and the profilers. A separate YARN queue is advisable.
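
The sizing rule in step 9 can be checked with a few lines of Python. This is a minimal sketch, not part of the product: the function name min_am_resource_percent is hypothetical, and rounding up to two decimal places is an assumption made here so that the chosen value never falls below the 8 GB threshold.

```python
import math

# Threshold taken from the rule (x * total_memory_in_yarn) >= 8G.
REQUIRED_AM_MEMORY_GB = 8


def min_am_resource_percent(total_yarn_memory_gb, required_gb=REQUIRED_AM_MEMORY_GB):
    """Return the smallest x (rounded up to two decimals, an assumption of
    this sketch) such that x * total_yarn_memory_gb >= required_gb."""
    if total_yarn_memory_gb <= 0:
        raise ValueError("total YARN memory must be positive")
    if total_yarn_memory_gb < required_gb:
        # x cannot exceed 1.0, so the rule cannot be satisfied.
        raise ValueError("total YARN memory is below the required amount")
    # Work in hundredths and round up so the product never drops below the threshold.
    hundredths = math.ceil(required_gb * 100 / total_yarn_memory_gb)
    return min(hundredths / 100, 1.0)


if __name__ == "__main__":
    for total_gb in (16, 24, 32):
        x = min_am_resource_percent(total_gb)
        print(f"total={total_gb} GB -> x >= {x} (x * total = {x * total_gb:.2f} GB)")
```

For 16 GB of YARN memory this yields 0.5, which matches the example in step 9. Any value of x at or above the returned minimum also satisfies the rule.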