Apache Hive Data Types

Hive is Data warehousing tool and used to process the data stored in hadoop and HDFS. Hive is similar to SQL because it analyze and process the data through querying language.

In this article we are discuss about basic data types for Hive query processing.

Recommended Reading – Basic Apache Hive Table Queries

Hive Data Types are classified into four types, given as follows

  • Column Types
  • Literal
  • Null Values
  • Complex Types

Column Types:

1. Integral


Integer type data can be used to Integral data types. Integral data types mentioned as INT. There are four types INT data types

TINYINT (1-byte signed integer, from -128 to 127)

SMALLINT (2-byte signed integer, from -32,768 to 32,767)

INT (4-byte signed integer, from -2,147,483,648 to 2,147,483,647)

BIGINT (8-byte signed integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)

2. String Types:


String Data Types mentioned by Single Quotes and Double Quotes. It contains two types – CHAR, VARCHAR

CHAR – CHAR is the fixed length and value shorter than are padded space. The maximum length of CHAR is 255.

VARCHAR – VARCHAR is variable length and we can specify the length of the field. (example : name VARCHAR(64)). Maximum length of VARCHAR is 65355 bytes.

3. Timestamp:


It supports traditional UNIX timestamp with nanosecond precision.

Timestamp Use Casts:

  • cast(date as date) – Same date value
  • cast(date as string) – Data formatted as “YYYY-MM-DD”
  • cast(date as timestamp) – Midnight of the year/month/day of the date value is returned as timestamp.
  • cast(string as date) – If the string is in the form ‘YYYY-MM-DD’, then a date value corresponding to that is returned. If the string value does not match this format, then NULL is returned.
  • cast(timestamp as date) – The year/month/day of the timestamp is returned as a date value.

Literals:


There are two types of literal types in Hive

Float – Floating point types are nothing but numbers with decimal points. Generally, this type of data is composed of DOUBLE data type.

Decimal – Decimal type data is nothing but floating point value with higher range than DOUBLE data type. The range of decimal type is approximately -10-308 to 10308.

Null Values:


Missing values are represented by the special value NULL.

Complex Data Types:


Complex Types can be built up from primitive types and other composite types. Data type of the fields in the collection are specified using an angled bracket notation.

The Hive complex data types are as follows:

1. Array – An Ordered sequences of similar type elements that are indexable using zero-based integers

Example – array (‘abc’, ‘abcd’ ); First element is accessed with array[0].

2. MAP – Collection of key-value pairs. Fields are accessed using array notation of keys (e.g., [‘key’]).

Example – ‘first’ -> ‘abc’ , ‘last -> ‘xyz’ is represented as map(‘first’, ‘abc’, ‘last’, ‘xyz’).

3. UNIONTYPE – It is similar to Unions in C. At any point of time, an Union Type can hold any one (exactly one) data type from its specified data types.

Example – CREATE TABLE test(col1 UNIONTYPE<INT, DOUBLE, ARRAY<VARCHAR>, STRUCT<a:INT,b:CHAR>>);