Set up the cost-based optimizer and statistics
You can use the cost-based optimizer (CBO) and statistics to generate efficient query execution plans that can improve performance. You must generate column statistics to make CBO functional.
In this task, you enable and configure the CBO and configure Hive to gather column and table statistics for evaluating query performance. Column and table statistics are critical for estimating predicate selectivity and the cost of a plan. Certain advanced rewrites require column statistics.
You check and, if necessary, set the following properties in the hive-site.xml configuration file:
hive.stats.autogather: Controls collection of table-level statistics.
Controls collection of column-level statistics.
hive.compute.query.using.stats: Instructs Hive to use statistics when generating query plans.
All of these properties are enabled by default. You can manually generate table-level statistics for newly created tables and table partitions using the ANALYZE TABLE statement.
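As a rough illustration of the manual approach, the statements below sketch how ANALYZE TABLE might be run from Beeline or another Hive client. The table names `customers` and `sales` and the partition column `ds` are hypothetical placeholders, not names from this task; substitute your own.

```sql
-- Hypothetical tables: `customers` is unpartitioned,
-- `sales` is partitioned by a hypothetical column `ds`.

-- Gather table-level statistics (for example, row count and data size).
ANALYZE TABLE customers COMPUTE STATISTICS;

-- Gather table-level statistics for a single partition.
ANALYZE TABLE sales PARTITION (ds='2019-01-01') COMPUTE STATISTICS;

-- Gather column-level statistics, which the CBO relies on.
ANALYZE TABLE customers COMPUTE STATISTICS FOR COLUMNS;
```

After the statements finish, running `DESCRIBE FORMATTED customers;` should list the gathered table-level values, such as `numRows`, among the table parameters.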
- You installed Ambari.
- You added the Apache Hive service and started all components.
- You have administrative privileges to configure Hive in Ambari.
- In Ambari, select .
Enable cost-based optimization if you changed the default: In Filter, enter hive.cbo.enable, and select the checkbox.
Configure automatic gathering of table-level statistics for newly created tables and table partitions if you changed the default: In Filter, enter hive.stats.autogather, and select the checkbox.
Configure Hive to use statistics when generating query plans: In Filter, enter hive.compute.query.using.stats, and select the checkbox.
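The properties named in these steps can usually also be inspected, and overridden for a single session, from a Hive client. The following is a minimal sketch, assuming your session is permitted to change these properties (a security configuration may restrict SET); the table name `customers` is hypothetical.

```sql
-- Print the current value of each property.
-- SET with a key and no value only reads the setting.
SET hive.cbo.enable;
SET hive.stats.autogather;
SET hive.compute.query.using.stats;

-- Override the values for the current session only.
-- This does not modify hive-site.xml or the Ambari configuration.
SET hive.cbo.enable=true;
SET hive.stats.autogather=true;
SET hive.compute.query.using.stats=true;

-- Inspect the plan Hive generates for a query on a hypothetical table.
EXPLAIN SELECT COUNT(*) FROM customers;
```

Session-level overrides are convenient for comparing plans with and without a setting, but the Ambari settings remain the persistent configuration.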