What is Hive?
Hive is Data warehousing tool and used to process the data in Hadoop and HDFS.Hive is similar to SQL because it analyzes and processes the data with querying language. Hive runs on MapReduce and top of the Hadoop. Hive also knew as HiveQL.Main Functions of the hive is data summarization, querying, and analysis.
Recommended Reading – Why and Benefits of Hadoop
Features of Hive:
- Hive processed data from Hadoop and HDFS
- Hive designed by OLAP
- It Supports SQL and HiveQl
Data Storage of Hive:
It stores all metadata database, tables, and datatypes.Metastore also tracks HDFS data.
There are two types of table.
1.First or Normal table
2.Second or External table
HDFS has used the external table for stores the data.The main difference between two tables are when deleting normal table all data also deleted in a normal table but in External table data are not deleted when deleting table of external.
Tables contains following columns
- user id
- IP address
Partition is a slice of the table and table contains many partitions.It mainly used to identify the rows and how is data stored.Partitions data are not stored but delivered the result on load. The syntax of partitions declaration is “WHERE” clause in a select statement.
Partitions data are divided into buckets based on a value of a hash function.Main Purpose of buckets is speeding up the join operation.
Hive is Not:
- Hive is not relational database
- Hive is not a design for OLTP
- Hive is not a programming language
It is a traditional database.HQL stores data in table format and all table contains columns and columns contains rows.All columns having own datatypes and HQL supports all datatypes like primitive and complex.
Primitive Datatype – Integer,Boolean,Bigint,smallint,TinyInt,Double,String,Binary
Complex Datatype – Map, Structs, and list
It supports Data Definition Statements like Create table, Deletes table, Drops table.It also supports Data Manipulation Statements like Load, select, insert
A load is used to take data from HDFS and move to Hive. Insert used to move data from hive to another one. Select is querying data.
Basic Query Statements:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment], …)] [COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], …)]
[CLUSTERED BY (col_name, col_name, …) [SORTED BY (col_name [ASC|DESC], …)] INTO num_buckets BUCKETS]
[ [ROW FORMAT row_format] [STORED AS file_format]]
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 …) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION … [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION …] select_statement2] …;
SELECT [ALL | DISTINCT] select_expr, select_expr, …
[GROUP BY col_list]
[CLUSTER BY col_list]
[SORT BY col_list]]