Apache Hive Performance Tuning

Generate and view Apache Hive statistics

You can use statistics to optimize queries for improved performance. The cost-based optimizer (CBO) uses statistics to compare query plans and choose the best one. By viewing statistics instead of running a query, you can sometimes answer questions about your data faster.
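
As a rough illustration of answering a question from statistics rather than from a full scan, the sketch below reads the row count of mytable from its stored statistics. It assumes statistics have already been gathered for mytable and are up to date. Setting hive.compute.query.using.stats is an optional addition that is not part of this task, and whether Hive actually answers the aggregate from the metastore depends on your Hive version and on the statistics being accurate.

```sql
-- Assumption: statistics for mytable were gathered earlier and are current.

-- Read basic statistics from the metastore instead of scanning the data.
-- The Table Parameters section of the output typically lists values such as
-- numRows, numFiles, totalSize, and rawDataSize.
DESCRIBE FORMATTED mytable;

-- Optional (assumption: this property is not already enabled on your cluster).
-- When true, Hive can answer a few simple aggregates, such as COUNT(*),
-- from stored statistics rather than by running a job.
SET hive.compute.query.using.stats=true;
SELECT COUNT(*) FROM mytable;
```

If the statistics are stale, the values reported this way can differ from the actual data, so regather statistics before relying on them.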

This task shows how to generate different types of statistics about a table.

  1. Launch a Hive shell and log in.
  2. Gather statistics for the non-partitioned table mytable:
    ANALYZE TABLE mytable COMPUTE STATISTICS;
  3. Confirm that the hive.stats.autogather property is enabled.
    1. In Ambari, select Services > Hive > Configs.
    2. In Filter, enter hive.stats.autogather and check that the property is set to true.
  4. View the table statistics you generated:
    DESCRIBE EXTENDED mytable;
  5. Gather column statistics for the table:
    ANALYZE TABLE mytable COMPUTE STATISTICS FOR COLUMNS;
  6. View column statistics for the name column in my_table in the my_db database:
    DESCRIBE FORMATTED my_db.my_table name;
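
The steps above cover a non-partitioned table and check hive.stats.autogather through Ambari. The following sketch shows two variations you might try from the Hive shell instead: printing the property value directly, and gathering statistics for a single partition. The table sales and its partition column ds are hypothetical names used only for illustration; substitute your own table and partition values.

```sql
-- Print the current value of the property in this session
-- (SET with a property name and no value displays the setting).
SET hive.stats.autogather;

-- Hypothetical partitioned table: sales, partitioned by ds.
-- Gather basic statistics for one partition.
ANALYZE TABLE sales PARTITION (ds='2018-01-01') COMPUTE STATISTICS;

-- Gather column statistics for the same partition.
ANALYZE TABLE sales PARTITION (ds='2018-01-01') COMPUTE STATISTICS FOR COLUMNS;

-- View the statistics stored for that partition.
DESCRIBE FORMATTED sales PARTITION (ds='2018-01-01');
```

Limiting ANALYZE to the partitions that changed is usually cheaper than recomputing statistics for the whole table.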