What is HCatalog in Hadoop?

What is HCatalog?

HCatalog is a table and storage management layer for Hadoop. It lets users of different data processing tools, such as Hive, Pig, and MapReduce, read and write data more easily. With HCatalog, users do not have to worry about where or in what format their data is stored. HCatalog is a key component of Hive: it gives other tools access to the Hive metastore, so you can use it to create and manage tables.

Why HCatalog?

Enabling the right tool for the right job
Integrating Hadoop with everything
Capturing processing states to enable sharing

HCatalog Architecture:

HCatalog supports reading and writing files in any format for which a SerDe (serializer-deserializer) can be written. By default, it supports the CSV, JSON, RCFile, SequenceFile, and ORC file formats. To use a custom format, the user must provide the InputFormat, OutputFormat, and SerDe.
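
To illustrate what supplying these three classes looks like, the sketch below writes a CREATE TABLE statement that names the SerDe, InputFormat, and OutputFormat explicitly, then passes it to the hcat CLI if it is installed. The table name web_logs, its columns, and the script file name are hypothetical. The classes shown are Hive's built-in text classes, used here only as stand-ins for custom ones.

```shell
# Write the DDL to a script file. Table, columns, and file name are hypothetical.
cat > create_web_logs.hql <<'EOF'
CREATE TABLE IF NOT EXISTS web_logs (
  host    STRING,
  request STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS
  INPUTFORMAT  'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
EOF

# Run the script through the HCatalog CLI only when hcat is on the PATH.
if command -v hcat >/dev/null 2>&1; then
  hcat -f create_web_logs.hql
fi
```

For a real custom format, the three quoted class names would be replaced with the user's own classes, which must be available on the classpath.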

HCatalog is built on top of the Hive metastore and incorporates Hive's DDL (Data Definition Language). It provides a command line interface for metadata exploration and for issuing Hive DDL statements such as CREATE TABLE.

HCatalog is not itself a database; it presents a relational view of the data, so stored data appears as tables. Tables can be partitioned on one or more keys, so that each combination of key values identifies one partition.
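
Partitioning can be sketched with a table declared with two partition keys. The table name page_views, its columns, and the partition keys view_date and country are hypothetical, chosen only for illustration.

```shell
# Write the DDL to a script file. All table and column names are hypothetical.
cat > create_page_views.hql <<'EOF'
CREATE TABLE IF NOT EXISTS page_views (
  user_id STRING,
  url     STRING
)
PARTITIONED BY (view_date STRING, country STRING)
STORED AS RCFILE;

SHOW PARTITIONS page_views;
EOF

# Run the script through the HCatalog CLI only when hcat is on the PATH.
if command -v hcat >/dev/null 2>&1; then
  hcat -f create_page_views.hql
fi
```

Each distinct (view_date, country) pair then corresponds to one partition of the table.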

HCatalog CLI Commands:

CLI – Command Line Interface

CLI commands are used to access the metadata stored in the Hive metastore, and they support DDL operations. Basic DDL operations of the CLI include creating, altering, viewing, and listing tables.
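
As a minimal sketch of metadata exploration from the CLI, the commands below pass single statements to hcat with the -e option. The table name employee is hypothetical, and the calls are skipped when hcat is not installed.

```shell
# Statements to run; the table name employee is hypothetical.
list_stmt="SHOW DATABASES;"
describe_stmt="DESCRIBE employee;"

# hcat -e takes one statement inline; skip the calls when hcat is not installed.
if command -v hcat >/dev/null 2>&1; then
  hcat -e "$list_stmt"
  hcat -e "$describe_stmt"
fi
```

Longer sequences of statements can be placed in a file and run with hcat -f instead.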

Basic CLI Queries:

Create Table:

The CREATE TABLE statement is used to create a table in Hive through HCatalog.

Syntax:

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]
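
A concrete instance of this syntax might look like the following sketch. The employee table, its columns, and the comma delimiter are hypothetical choices made for illustration.

```shell
# Write the DDL to a script file. Table, columns, and file name are hypothetical.
cat > create_employee.hql <<'EOF'
CREATE TABLE IF NOT EXISTS employee (
  eid    INT    COMMENT 'employee id',
  name   STRING COMMENT 'employee name',
  salary FLOAT
)
COMMENT 'Employee details'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
EOF

# Run the script through the HCatalog CLI only when hcat is on the PATH.
if command -v hcat >/dev/null 2>&1; then
  hcat -f create_employee.hql
fi
```

IF NOT EXISTS makes the script safe to run more than once, since an existing table is left untouched.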

Insert Table:

In SQL, data is usually added to a table with the INSERT statement. In HCatalog, data is loaded into a table with the LOAD DATA statement.

Syntax:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename
[PARTITION (partcol1=val1, partcol2=val2 ...)]
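
The sketch below loads a small local file into a table. The file path /tmp/employee_sample.csv, the sample rows, and the employee table are hypothetical. It also assumes that your hcat installation accepts LOAD DATA; the same statement can be run from the Hive CLI as well.

```shell
# Create a tiny sample file to load. Path and rows are illustrative only.
printf '1,Asha,45000\n2,Ravi,40000\n' > /tmp/employee_sample.csv

# Write the LOAD DATA statement to a script file (table name is hypothetical).
cat > load_employee.hql <<'EOF'
LOAD DATA LOCAL INPATH '/tmp/employee_sample.csv'
OVERWRITE INTO TABLE employee;
EOF

# Assumption: hcat accepts LOAD DATA. Skipped when hcat is not installed.
if command -v hcat >/dev/null 2>&1; then
  hcat -f load_employee.hql
fi
```

LOCAL tells Hive to read the file from the local file system rather than HDFS, and OVERWRITE replaces any data already in the table.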

Alter Table:

The ALTER TABLE statement is used to modify the definition of an existing table, such as its name or its columns. It changes the table's metadata, not the stored data values.

Syntax:

ALTER TABLE name RENAME TO new_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name DROP [COLUMN] column_name
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
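
Two of these forms can be sketched against a hypothetical employee table with a name column. The new column dept and the new name ename are illustrative only.

```shell
# Write the statements to a script file. Table and column names are hypothetical.
cat > alter_employee.hql <<'EOF'
ALTER TABLE employee ADD COLUMNS (dept STRING COMMENT 'department name');
ALTER TABLE employee CHANGE name ename STRING;
EOF

# Run the script through the HCatalog CLI only when hcat is on the PATH.
if command -v hcat >/dev/null 2>&1; then
  hcat -f alter_employee.hql
fi
```

ADD COLUMNS appends the new column after the existing ones, and CHANGE renames a column while restating its type.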

RENAME TO command:

It is used to rename a table.

Example:

./hcat -e "ALTER TABLE employee RENAME TO emp;"

Show Table:

The SHOW TABLES command displays the names of the tables in the current database, or in the database named in the IN clause. An optional wildcard pattern restricts the list to matching table names.

Syntax:

SHOW TABLES [IN database_name] ['identifier_with_wildcards'];
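
The three variants of this syntax can be sketched as follows. The pattern emp* is a hypothetical example; default is the database Hive creates out of the box.

```shell
# Write the statements to a script file; the pattern emp* is hypothetical.
cat > show_tables.hql <<'EOF'
SHOW TABLES;
SHOW TABLES IN default;
SHOW TABLES IN default 'emp*';
EOF

# Run the script through the HCatalog CLI only when hcat is on the PATH.
if command -v hcat >/dev/null 2>&1; then
  hcat -f show_tables.hql
fi
```

The last statement lists only the tables in default whose names begin with emp.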

Conclusion:

HCatalog is the table and storage management layer of Hadoop. It is used to access the metadata in the Hive metastore through DDL operations. With HCatalog, data can be read and written easily and shared among tools such as Pig, Hive, and MapReduce.