3. New Feature: Subqueries in WHERE Clauses

Previous versions of Hive allowed subqueries only in FROMclauses of SQL statements. A subquery is a SQL expression that returns a set of rows. The subquery is evaluated and its query result set used to evaluate the parent query, the outer query that contains the subquery. Version 0.13 of Hive expands the use the use of subqueries to include WHEREclauses, as shown in the following example.

SELECT state, net_payments FROM transfer_payments WHERE transfer_payments.year IN (SELECT year FROM us_census);

No configuration is required to enable execution of subqueries in Hive; the feature is available by default. However, several restrictions exist for the use of subqueries in WHERE clauses. The next section,Understanding Subqueries, describes the concepts necessary to understand these restrictions, and the following section,Restrictions on Subqueries in WHERE Clauses explains the restrictions.


loading table of contents...