Create and use a partitioned materialized view

When creating a materialized view, you can partition selected columns to improve performance. Partitioning separates the view of a table into parts, which often improves query rewrites of partition-wise joins of materialized views with tables or other materialized views.

This task assumes you created a materialized view of emps and depts tables. The emps table contains the following data:


empid	deptno	name	salary	hire_date
10001	101	jane doe	250000	2018-01-10
10005	100	somporn klailee	210000	2017-12-25
10006	200	jeiranan thongnopneua	175000	2018-05-05

The depts table contains the following data:


deptno	deptname	locationid
100	HR	10
101	Eng	11
200	Sup	20

In this task, you create two materialized views: one partitions data on department; the another partitions data on hire date. You select data, filtered by department,from the original table, not from either one of the materialized views. The explain plan shows that Hive rewrites your query for efficiency to select data from the materialized view that partitions data by department. In this task, you also see the effects of rebuilding a materialized view.

Create a materialized view of the emps table that partitions data into departments.

CREATE MATERIALIZED VIEW partition_mv_1 PARTITIONED ON (deptno) 
AS SELECT hire_date, deptno FROM emps WHERE deptno > 100 AND deptno < 200;

Create a second materialized view that partitions the data on the hire date instead of the department number.

CREATE MATERIALIZED VIEW partition_mv_2 PARTITIONED ON (hire_date)
  AS SELECT deptno, hire_date FROM emps where deptno > 100 AND deptno < 200;

Generate an extended explain plan by selecting data for department 101 directly from the emps table without using the materialized view.

EXPLAIN EXTENDED SELECT deptno, hire_date FROM emps_a where deptno = 101;

The explain plan shows that Hive rewrites your query for efficiency, using the better of the two materialized views for the job: partition_mv_1.

+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| OPTIMIZED SQL: SELECT CAST(101 AS INTEGER) AS `deptno`, `hire_date` |
| FROM `default`.`partition_mv_1`                    |
| WHERE 101 = `deptno`                               |
| STAGE DEPENDENCIES:                                |
|   Stage-0 is a root stage  
...

Insert another row for Jane Doe that uses a different hire date 2018-02-12, rebuild one of the materialized views, but not the other, and compare contents of both materialized views.

INSERT INTO emps VALUES (10001,101,'jane doe',250000,'2018-02-12');
ALTER MATERIALIZED VIEW partition_mv_1 REBUILD;
SELECT * FROM partition_mv_1 where deptno = 101;
SELECT * FROM partition_mv_2 where deptno = 101;

The output of selecting the rebuilt partition_mv_1 includes the original row and newly inserted row because INSERT does not perform in-place updates (overwrites).

+---------------------------+------------------------+
| partition_mv_1.hire_date  | partition_mv_1.deptno  |
+---------------------------+------------------------+
| 2018-01-10 00:00:00.0     | 101                    |
| 2018-02-12 00:00:00.0     | 101                    |
+---------------------------+------------------------+

The output from the other partition is stale because you did not rebuild it:

+------------------------+---------------------------+
| partition_mv_2.deptno  | partition_mv_2.hire_date  |
+------------------------+---------------------------+
| 101                    | 2018-01-10 00:00:00.0     |
+------------------------+---------------------------+

Create a second employees table and a materialized view of the tables joined on the department number.

CREATE TABLE emps2 TBLPROPERTIES AS SELECT * FROM emps;

CREATE MATERIALIZED VIEW partition_mv_3 PARTITIONED ON (deptno) AS
  SELECT emps.hire_date, emps.deptno FROM emps, emps2
  WHERE emps.deptno = emps2.deptno
  AND emps.deptno > 100 AND emps.deptno < 200;

Generate an explain plan that joins tables emps and emps2 on department number using a query that omits the partitioned materialized view.
```
EXPLAIN EXTENDED SELECT emps.hire_date, emps.deptno FROM emps, emps2
  WHERE emps.deptno = emps2.deptno
  AND emps.deptno > 100 AND emps.deptno < 200;
```
The output shows that Hive rewrites the query to use the partitioned materialized view partition_mv_3 even though your query omitted the materialized view.
Verify that the partition_mv_3 sets up the partition for deptno=101 for partition_mv_3.
```
SHOW PARTITIONS partition_mv_3;
```
Output is:
```
+-------------+
|  partition  |
+-------------+
| deptno=101  |
+-------------+
```