Using Apache HiveQL
Also available as:
PDF

Aggregate and group data

You use AVG, SUM, or MAX functions to aggregate data, and the GROUP BY clause to group data query results in one or more table columns..

The GROUP BY clause explicitly groups data. Hive supports implicit grouping, which occurs when aggregating the table in full.
  1. Construct a query that returns the average salary of all employees in the engineering department grouped by year.
    SELECT year, AVG(salary)
    FROM Employees
    WHERE Department = 'engineering' GROUP BY year;
  2. Construct an implicit grouping query to get the highest paid employee.
    SELECT MAX(salary) as highest_pay, 
    AVG(salary) as average_pay
    FROM Employees
    WHERE Department = 'engineering';