Data Steward Studio Administration

Understanding the DSS Profiler

DSS includes a profiler engine that runs data profiling operations as a pipeline on data located in multiple data lakes. The profiler agent is installed in a data lake and can be set up to run on a schedule. It generates data profiles, which are metadata annotations that summarize the content and shape characteristics of data assets.

The technical preview version of DSS includes two built-in profilers: a Hive column univariate statistical profiler and a Ranger audit log summarizer. Both profiler agents ship as part of DSS and must be installed on the cluster.
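To illustrate what a univariate statistical profile of a single column might contain, here is a minimal Python sketch. It is not the DSS profiler's implementation or output format; the function name `profile_column` and the chosen metrics (row count, null count, distinct count, min, max, mean, standard deviation, most frequent values) are assumptions made for illustration only.

```python
import math
from collections import Counter


def profile_column(values):
    """Summarize one column independently of all other columns (univariate).

    Hypothetical helper for illustration; not part of DSS.
    """
    non_null = [v for v in values if v is not None]
    profile = {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
    }

    # Treat the column as numeric only if every non-null value is a number.
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric and len(numeric) == len(non_null):
        mean = sum(numeric) / len(numeric)
        # Population variance: divide by n, not n - 1.
        variance = sum((x - mean) ** 2 for x in numeric) / len(numeric)
        profile.update(
            min=min(numeric),
            max=max(numeric),
            mean=mean,
            stddev=math.sqrt(variance),
        )
    else:
        # For non-numeric columns, report the most frequent values instead.
        profile["top_values"] = Counter(non_null).most_common(3)
    return profile
```

Calling `profile_column` on the values of one column returns a dictionary that could be attached to the data asset as a metadata annotation. A real profiler would compute these statistics inside the cluster rather than in client memory.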

When an Asset Collection is created, all data assets in that collection are added to a scheduler in the profiler backend. You cannot trigger the profiler manually; the schedule is hardcoded, and the profiler runs once every hour.
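The scheduling behavior described above can be modeled with a short Python sketch. This is a conceptual model only, not the profiler backend's code: the class `ProfilerSchedule`, its method names, and the asset identifiers are hypothetical. It also assumes that the first run happens one hour after an asset is added, which the documentation does not state.

```python
from datetime import datetime, timedelta

# Fixed interval, mirroring the hardcoded hourly schedule.
PROFILER_INTERVAL = timedelta(hours=1)


class ProfilerSchedule:
    """Hypothetical model of a scheduler that profiles each asset hourly."""

    def __init__(self):
        self._next_run = {}  # asset name -> datetime of its next run

    def add_collection(self, assets, now):
        """Register every asset in a newly created collection."""
        for asset in assets:
            # Assumption: first run is one interval after registration.
            # An asset that is already scheduled keeps its existing slot.
            self._next_run.setdefault(asset, now + PROFILER_INTERVAL)

    def due(self, now):
        """Return the assets due at `now` and reschedule each of them."""
        ready = sorted(a for a, t in self._next_run.items() if t <= now)
        for asset in ready:
            self._next_run[asset] = now + PROFILER_INTERVAL
        return ready
```

The model deliberately exposes no method for running a single asset on demand, which matches the restriction that the profiler cannot be triggered manually.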