DSS Getting Started

Understanding the DSS Profiler

Data Steward Studio (DSS) includes a profiler engine that runs data profiling operations as a pipeline on data located in multiple data lakes. You can install the profiler agent in a data lake and set up a schedule on which it generates various types of data profiles. Each data profiler generates metadata annotations on the assets it processes.

For example, data profilers can create summarized information about the contents of an asset and also provide annotations that indicate its shape (such as the distribution of values in a box plot or histogram).
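To make the idea of a summary and shape annotation concrete, the following is a minimal sketch in Python of the kind of statistics such a profile might contain for one numeric column. It is not the DSS profiler implementation; the function name `profile_column`, its `bins` parameter, and the keys of the returned dictionary are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from statistics import mean, quantiles


def profile_column(values, bins=4):
    """Summarize a numeric column; None marks a missing value.

    Illustrative only: names and output layout are assumptions,
    not the format that DSS stores.
    """
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)

    # Quartiles are the numbers a box plot is drawn from.
    q1, median, q3 = quantiles(present, n=4, method="inclusive")

    # Equal-width histogram; fall back to width 1 if all values are equal.
    width = (hi - lo) / bins or 1
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in present)
    histogram = [counts.get(i, 0) for i in range(bins)]

    return {
        "count": len(present),
        "nulls": len(values) - len(present),
        "min": lo,
        "max": hi,
        "mean": mean(present),
        "q1": q1,
        "median": median,
        "q3": q3,
        "histogram": histogram,
    }


if __name__ == "__main__":
    # Hypothetical column values for demonstration.
    print(profile_column([1, 2, 3, 4, 5, None, 9]))
```

The summary fields (count, nulls, min, max, mean) correspond to the "summarized information" described above, while the quartiles and bucket counts are the raw numbers behind a box plot and a histogram.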

When you create an Asset Collection, all data assets in that collection are added to a scheduler in the profiler backend agent. You cannot manually trigger the profiler to run. Instead, the profiler runs according to a global refresh rate, which you can set in Ambari > Dataplane Profiler > Configs > Advanced > Refresh table cron.