Hive Data Types

Column Types

These are all the supported column types in Hive:

Primitive types:

Complex types:

  • arrays: ARRAY<data_type>
  • maps: MAP<primitive_type, data_type>
  • structs: STRUCT<col_name : data_type [COMMENT col_comment], ...>
  • union: UNIONTYPE<data_type, data_type, ...>

Timestamps

Supports traditional UNIX timestamp with optional nanosecond precision.

Supported conversions:

  • Integer numeric types: Interpreted as UNIX timestamp in seconds
  • Floating point numeric types: Intepreted as UNIX timestamp in seconds with decimal precision
  • Strings: JDBC compliant java.sql.Timestamp format "YYYY-MM-DD HH:MM:SS.fffffffff" (9 decimal place precision)

Timestamps are interpreted to be timezoneless and stored as an offset from the UNIX epoch. Convenience UDFs for conversion to and from timezones are provided (to_utc_timestamp, from_utc_timestamp).

All existing datetime UDFs (month, day, year, hour, etc.) work with the TIMESTAMP data type.

Union Types

Union types can at any one point hold exactly one of their specified data types. You can create an instance of this type using the create_union UDF:

CREATE TABLE union_test(foo UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>); SELECT foo FROM union_test; {0:1} {1:2.0} {2:["three","four"]} {3:{"a":5,"b":"five"}} {2:["six","seven"]} {3:{"a":8,"b":"eight"}} {0:9} {1:10.0}

The first part in the deserialized union is the tag which lets us know which part of the union is being used. In this example 0 means the first data_type from the definition which is an int and so on.

To create a union you have to provide this tag to the create_union UDF:

SELECT create_union(0, key), create_union(if(key<100, 0, 1), 2.0, value), create_union(1, "a", struct(2, "b")) FROM src LIMIT 2; {0:"238"} {1:"val_238"} {1:{"col1":2,"col2":"b"}} {0:"86"} {0:2.0} {1:{"col1":2,"col2":"b"}}

Literals

Integral Types

Integral literals are assumed to be INT by default, unless the number exceeds the range of INT in which case it is interpreted as a BIGINT, or if one of the following postfixes is present on the number.

Type Postfix Example
TINYINT Y 100Y
SMALLINT S 100S
BIGINT L 100L

String Types

String literals can be expressed with either single quotes (') or double quotes ("). Hive uses C-style escaping within the strings.

Floating Point Types

Floating point literals are assumed to be DOUBLE. Scientific notation is not yet supported.

BigDecimal Types

BigDecimal literals provide greater precision and range for floating point numbers than the DOUBLE data type. They are needed for use cases in which the precision of a DOUBLE is insufficient, such as financial applications, and for cases that deal with numbers outside the DOUBLE range (approximately -10308 to 10308) or very close to zero (-10-308 to 10-308). The precision of a BigDecimal is practically unlimited.

For a general discussion of the limits of the DOUBLE data type, see the Wikipedia article Double-precision floating-point format.

Using BigDecimal Types

You can create a table in Hive that uses the BigDecimal type with the following syntax:

create table decimal_1 (t decimal);

The table decimal_1 is a table having one field of type decimal which is basically a BigDecimal value.

You can read and write values in such a table using either the LazySimpleSerDe or the LazyBinarySerDe. For example:

alter table decimal_1 set serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe';

or:

alter table decimal_1 set serde 'org.apache.hadoop.hive.serde2.lazy.LazyBinarySerDe';

You can use a cast to convert a BigDecimal value to any other primitive type such as a BOOLEAN. For example:

select cast(t as boolean) from decimal_2;

BigDecimal also supports many mathematical UDFs with the same syntax as used in the case of DOUBLE.

Basic mathematical operations that can use decimal types include:

  • Plus
  • Negative
  • Addition
  • Subtraction
  • Multiplication
  • Division
  • Modulus

These rounding functions can also take decimal types:

  • Floor
  • Ceiling
  • Round

But these mathematical UDFs are not currently supported:

  • Exp
  • Log (Ln)
  • Log2
  • Log10
  • Sqrt

Power(decimal, n) only supports positive integer values for the exponent n.

Casting is supported between decimal values and integer, double, and so on.

Testing BigDecimal Types

Two new tests have been added as part of the TestCliDriver framework within Hive. They are decimal_1.q and decimal_2.q. Other tests such as udf7.q cover the gamut of UDFs mentioned above.

More tests need to be added that demonstrate failure or when certain types of casts are prevented (for example, casting to date). There is some ambiguity in the round function because the rounding of BigDecimal does not work exactly as the SQL standard, and therefore it has been omitted in the current work.

For general information about running Hive tests, see How to Contribute to Apache Hive and Hive Developer FAQ.

Handling of NULL Values

Missing values are represented by the special value NULL. To import data with NULL fields, check documentation of the SerDe used by the table. (The default Text Format uses LazySimpleSerDe which interprets the string \N as NULL when importing.)