Additional Configuration for Hive Column Profiler
In addition to the generic configuration, the Hive Column Profiler has additional parameters that you can optionally edit.
- Click Profilers in the main navigation menu on the left.
- Click Configs to view all configured profilers.
- Select the cluster for which you need to edit profiler configuration.
The list of profilers for the selected cluster is displayed.
- Click the Hive Column Profiler to edit.
The Profiler Configuration tab is displayed in the right panel.
- Select the queue and schedule details as specified in Edit Profiler Configuration.
Note: By default, the Hive Column Profiler is scheduled to run once every six hours, so after installation its first output is available only after six hours. To see the output sooner, change the cron expression so that the profiler runs earlier.
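To check which hours of the day a cron expression fires at before saving it, the following minimal sketch expands the hour field of an expression. It assumes a standard five-field cron syntax (minute, hour, day of month, month, day of week); the scheduler behind the profiler may expect a different dialect, such as Quartz syntax with a leading seconds field, so treat it as an illustration only. The function names `expand_field` and `run_hours` are hypothetical, and `0 */6 * * *` is used as an illustrative every-six-hours expression, not necessarily the shipped default.

```python
from __future__ import annotations

# Illustrative only: assumes standard five-field cron syntax
# (minute hour day-of-month month day-of-week). The profiler's scheduler
# may use another dialect (for example, Quartz with a seconds field).


def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field (e.g. '*/6', '1-5', '0,12') into matching values."""
    values: set[int] = set()
    for part in field.split(","):
        has_step = "/" in part
        step = 1
        if has_step:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(part)
            # 'a/n' means 'from a to the maximum, every n'; a bare 'a' matches once.
            end = high if has_step else start
        values.update(range(start, end + 1, step))
    return sorted(values)


def run_hours(expression: str) -> list[int]:
    """Return the hours of the day at which a five-field expression fires."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five fields: minute hour day month weekday")
    return expand_field(fields[1], 0, 23)


print(run_hours("0 */6 * * *"))  # [0, 6, 12, 18]
print(run_hours("0 */2 * * *"))  # every second hour, starting at midnight
```

Shortening the step in the hour field (for example, from `*/6` to `*/2`) is one way to see profiler output sooner, at the cost of more frequent runs on the cluster.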
- Select the Sample Data Size.
- From the drop-down list, select the sample data size type.
- Enter a value appropriate to the selected type.
- Choose the Selection Criteria.
- All tables - The Hive Column Profiler will run on all tables in the asset collection during its next scheduled run.
- Only changed tables - The Hive Column Profiler will run only on tables in the asset collection that have changed.
When configuring hive-site settings in Ambari, make sure that the hive.metastore.transactional.event.listeners parameter is set to org.apache.hive.hcatalog.listener.DbNotificationListener. If this parameter is not set as specified, you cannot select the Only changed tables option, and an error appears prompting you to update the parameter.
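Before choosing the Only changed tables option, you can verify the listener setting in the text of a hive-site.xml file. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module and the standard Hadoop-style `<configuration>/<property>/<name>/<value>` layout; the function and constant names are hypothetical. Because this Hive property accepts a comma-separated list of listener classes, the sketch checks for the required class anywhere in that list rather than requiring an exact match.

```python
import xml.etree.ElementTree as ET

LISTENER_PROPERTY = "hive.metastore.transactional.event.listeners"
REQUIRED_LISTENER = "org.apache.hive.hcatalog.listener.DbNotificationListener"


def has_notification_listener(hive_site_xml: str) -> bool:
    """Return True if the hive-site XML text registers DbNotificationListener."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name != LISTENER_PROPERTY:
            continue
        # The value may list several listener classes separated by commas.
        value = prop.findtext("value") or ""
        listeners = [entry.strip() for entry in value.split(",")]
        return REQUIRED_LISTENER in listeners
    # Property not present at all.
    return False


SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.transactional.event.listeners</name>
    <value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
  </property>
</configuration>"""

print(has_notification_listener(SAMPLE))  # True
```

This only inspects a configuration file's contents; in an Ambari-managed cluster, the value should still be changed through the Ambari hive-site settings as described above.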
- Click Save to apply the configuration changes to the selected profiler. The changes should appear in the profiler description.
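The two Selection Criteria options described above differ only in which tables from the asset collection are passed to the next scheduled run. The following minimal sketch illustrates that difference as a set operation. The function name, table names, and the `changed_tables` input are hypothetical; how the profiler actually detects changed tables is not described here, although the listener requirement above suggests it relies on Hive metastore notification events.

```python
from __future__ import annotations


def tables_to_profile(
    asset_collection: set[str], changed_tables: set[str], only_changed: bool
) -> list[str]:
    """Hypothetical selection step for one scheduled profiler run."""
    if not only_changed:
        # "All tables": every table in the asset collection is profiled.
        return sorted(asset_collection)
    # "Only changed tables": tables that are in the collection AND have changed.
    # Changed tables outside the collection are ignored.
    return sorted(asset_collection & changed_tables)


collection = {"sales.orders", "sales.customers", "hr.employees"}
changed = {"sales.orders", "finance.ledger"}
print(tables_to_profile(collection, changed, only_changed=False))
# ['hr.employees', 'sales.customers', 'sales.orders']
print(tables_to_profile(collection, changed, only_changed=True))
# ['sales.orders']
```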