Using Apache HiveQL
Also available as:
PDF

Create and use a materialized view

You can create a materialized view of a query to calculate and store results of an expensive operation, such as join.

In this task, you create and populate example tables. You create a materialized view of a join of the tables. Subsequently, when you run a query to join the tables, the query plan takes advantage of the precomputed join to accelerate processing.

  1. In the Hive shell or other Hive UI, create two tables:
    CREATE TABLE emps (
                        empid INT,
                        deptno INT,
                        name VARCHAR(256),
                        salary FLOAT,
                        hire_date TIMESTAMP)
                        STORED AS ORC;
                        
                        CREATE TABLE depts (
                        deptno INT,
                        deptname VARCHAR(256),
                        locationid INT)
                        STORED AS ORC;
  2. Insert some data into the tables for example purposes:
    INSERT INTO TABLE emps VALUES (10001,101,'jane doe',250000,'2017-01-17');
    INSERT INTO TABLE emps VALUES (10002,102,'john smith',150000,'2018-01-10');
    INSERT INTO TABLE emps VALUES (10003,101,'naoko murai',110000,'2017-01-17');
    INSERT INTO TABLE emps VALUES (10004,102,'gunnar thoresohn',150000,'2018-01-10');
                        
    INSERT INTO TABLE depts VALUES (101,'IT',25);
    INSERT INTO TABLE depts VALUES (102,'Eng',11);
  3. Create a materialized view to join the tables:
    CREATE MATERIALIZED VIEW mv1
      AS SELECT empid, deptname, hire_date
      FROM emps JOIN depts
      ON (emps.deptno = depts.deptno)
      WHERE hire_date >= '2017-01-17';
  4. Execute a query that takes advantage of the precomputation performed by the materialized view:
    SELECT empid, deptname
     FROM emps
     JOIN depts
     ON (emps.deptno = depts.deptno)
     WHERE hire_date >= '2017-01-17'
     AND hire_date <= '2018-01-10';
  5. Verify that the query rewrite used the materialized view by running an extended EXPLAIN statement:
    +-------------------------------------------------------------------------------------+
    |                      Explain                                                        |
    +-------------------------------------------------------------------------------------+
    | STAGE DEPENDENCIES:                                                                 |
    |   Stage-0 is a root stage                                                           |
    |                                                                                     |
    | STAGE PLANS:                                                                        |
    |   Stage: Stage-0                                                                    |
    |     Fetch Operator                                                                  |
    |       limit: -1                                                                     |
    |       Processor Tree:                                                               |
    |         TableScan                                                                   |
    |           alias: default.mv1                                                        |
    |           filterExpr: (hire_date <= TIMESTAMP'2018-01-10 00:00:00') (type: boolean) |
    |           GatherStats: false                                                        |
    |           Filter Operator                                                           |
    |             isSamplingPred: false                                                   |
    |             predicate: (hire_date <= TIMESTAMP'2018-01-10 00:00:00') (type: boolean)|
    |             Select Operator                                                         |
    |               expressions: empid (type: int), deptname (type: varchar(256))         |
    |               outputColumnNames: _col0, _col1                                       |
    |               ListSink                                                              |
    +-------------------------------------------------------------------------------------+
    The output shows the alias default.mv1 for the materialized view in the TableScan section of the plan.