What is HBase in Hadoop?


What is HBase?

HBase is a NoSQL database that runs on top of HDFS. Its main purpose is reading and writing large amounts of data. HBase stores data in rows and columns. HBase itself is written in Java, and its native client API is Java-based. HBase does not support a structured query language such as SQL.

HBase Data Model:

The HBase data model stores semi-structured data, which may contain values of different types. A row key is used to identify each row of data. The HBase data model has the following components.

HBase Table – A table consists of multiple rows that store the data.

HBase Row – A row consists of a row key and one or more columns with their associated values. Rows are sorted alphabetically (lexicographically) by row key, so row key design is very important.
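Because row keys are compared lexicographically, numeric keys can sort in a surprising order. A minimal sketch (plain Java, no HBase dependency; `TreeMap` stands in for HBase's sorted row storage):

```java
import java.util.TreeMap;

public class RowKeyOrder {
    public static void main(String[] args) {
        // A TreeMap keeps keys sorted, mimicking HBase's lexicographic row key order.
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("row-2", "b");
        rows.put("row-10", "c");
        rows.put("row-1", "a");
        // Lexicographic order: row-1, row-10, row-2 ("10" sorts before "2")
        System.out.println(rows.keySet()); // prints [row-1, row-10, row-2]
    }
}
```

This is why row keys that must sort numerically are usually zero-padded (e.g. `row-02`, `row-10`).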

Column – A column consists of a column family and a column qualifier, separated by a colon (:).

Column Family – A column family groups a set of columns and their values and also carries storage properties. Column families are common to all rows in an HBase table.

Column Qualifier – A qualifier indexes a given piece of data within a column family. Unlike column families, which are fixed at table creation, qualifiers can differ from row to row.

Cell – A cell is the combination of a row, a column family, and a column qualifier, together with a value. The value of a cell is an array of bytes.

Timestamp – A timestamp identifies a given version of a value; by default it represents the time on the region server when the data was written.
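Putting these pieces together: a cell is addressed by (row key, column family, qualifier), and timestamps distinguish its versions. A stdlib-only sketch (class and variable names are illustrative, not the HBase API) of how versioned cell values behave:

```java
import java.util.Comparator;
import java.util.TreeMap;

public class CellVersions {
    public static void main(String[] args) {
        // Versions of one cell, keyed by timestamp, newest first.
        TreeMap<Long, byte[]> versions = new TreeMap<>(Comparator.reverseOrder());
        versions.put(1000L, "old value".getBytes());
        versions.put(2000L, "new value".getBytes());
        // A read without an explicit timestamp returns the newest version.
        byte[] latest = versions.firstEntry().getValue();
        System.out.println(new String(latest)); // prints "new value"
    }
}
```

Note that cell values are byte arrays, as the data model above describes; any typing is applied by the client.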

Main Components of HBase:

HBase Master Server:

  • The HBase Master handles table operations such as creating and deleting tables.
  • It manages the cluster and the load balancing of region servers.
  • It assigns regions to region servers with the help of ZooKeeper.
  • In an HDFS deployment, the Master typically runs on the NameNode host.
  • It handles administration tasks and distributes work across the regions.


Regions:

  • Tables are divided horizontally by row key range into regions, and each region is managed by a region server.
  • A region contains all rows between the region's start key and end key.
  • A region contains two main components: the MemStore and HFiles.
  • Each region is assigned to a node in the cluster, called a region server.
  • Regions serve data for read and write operations.
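A region covers a half-open row key range from its start key (inclusive) to its end key (exclusive). A small sketch (plain Java, illustrative names, not the HBase implementation) of the membership check that routes a row to its region:

```java
public class RegionRange {
    // True if rowKey falls in [startKey, endKey); an empty endKey means "no upper bound".
    static boolean inRegion(String rowKey, String startKey, String endKey) {
        return rowKey.compareTo(startKey) >= 0
                && (endKey.isEmpty() || rowKey.compareTo(endKey) < 0);
    }

    public static void main(String[] args) {
        System.out.println(inRegion("row-15", "row-10", "row-20")); // true
        System.out.println(inRegion("row-20", "row-10", "row-20")); // false: end key is exclusive
    }
}
```

Keys are compared lexicographically here, matching the row key ordering described in the data model section.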

Region Server:

  • It handles read, write, update, and delete requests from clients.
  • It runs on every data node in the Hadoop cluster.

A region server contains the following components:

1. Write Ahead Log (WAL) – The WAL is a file that records new data before it reaches permanent storage, so writes can be recovered after a failure.

2. Block Cache – The block cache is the read cache; it keeps frequently read data in memory. When the cache is full, the least recently used data is evicted.

3. MemStore – The MemStore is the write cache. It stores new data, kept sorted, before it is written to disk. There is one MemStore per column family per region.

4. HFiles – HFiles store the rows as sorted key/value pairs on disk.
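These components cooperate on the write path: a write is first appended to the WAL, then buffered in the sorted MemStore, and when the MemStore grows past a threshold it is flushed to an immutable HFile. A stdlib-only sketch of that flow (all names and the toy threshold are illustrative, not the HBase implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class WritePathSketch {
    static final int FLUSH_THRESHOLD = 3;               // toy flush size, not a real HBase default
    static final List<String> wal = new ArrayList<>();  // write-ahead log entries
    static final TreeMap<String, String> memStore = new TreeMap<>();          // sorted write cache
    static final List<TreeMap<String, String>> hFiles = new ArrayList<>();    // flushed, immutable "files"

    static void put(String rowKey, String value) {
        wal.add(rowKey + "=" + value);   // 1. durability first: append to the WAL
        memStore.put(rowKey, value);     // 2. buffer the write in the sorted MemStore
        if (memStore.size() >= FLUSH_THRESHOLD) {
            hFiles.add(new TreeMap<>(memStore)); // 3. flush sorted data as a new "HFile"
            memStore.clear();
        }
    }

    public static void main(String[] args) {
        put("r1", "a"); put("r3", "c"); put("r2", "b"); put("r4", "d");
        System.out.println("HFiles: " + hFiles);     // one flushed file holding r1, r2, r3 in sorted order
        System.out.println("MemStore: " + memStore); // r4 still buffered in memory
    }
}
```

Because the MemStore is already sorted, each flush produces a sorted HFile, which is what makes later reads and merges efficient.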


ZooKeeper:

  • ZooKeeper is an open-source project that provides services such as configuration management, naming, and distributed synchronization.
  • It tracks which servers are up and available and provides notifications of server failures.
  • ZooKeeper is mainly used for client communication with region servers.
  • When a client wants to communicate with a region, it contacts ZooKeeper first to locate the right region server.