Hadoop Cluster Architecture and Core Components


What is a Hadoop Cluster?

  • A cluster is a group of computers that work together as one system. A Hadoop cluster is a computer cluster used to run Hadoop.

  • A Hadoop cluster is mainly designed for storing large amounts of unstructured data in a distributed file system.

  • It is referred to as a “shared nothing” system: the only thing shared between nodes is the network that connects them.

  • Hadoop clusters are arranged in racks and contain three types of nodes: worker nodes, master nodes, and client nodes.

Hadoop Cluster Architecture:

  • The Hadoop cluster described here has 110 racks, and each rack holds slave machines.

  • One rack switch is placed on top of each rack.

  • The slave machines are connected by cables to the rack switch.

  • Each rack switch has 80 ports.

  • The NameNode and JobTracker of a Hadoop cluster are referred to as masters; the machines that run a DataNode and a TaskTracker are the slaves.
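The figures quoted above (110 racks, 80 ports per rack switch) put an upper bound on how many slave machines such a cluster can hold. The following is a minimal sketch of that arithmetic, assuming every slave uses exactly one switch port. The function name and the uplink_ports parameter are hypothetical and only illustrate that some ports may be reserved for connecting a rack to the rest of the network.

```python
# Figures taken from the description above.
RACKS = 110
PORTS_PER_RACK_SWITCH = 80


def max_slaves(racks, ports_per_switch, uplink_ports=0):
    """Upper bound on slave machines, assuming one switch port per machine.

    uplink_ports is a hypothetical parameter: the number of ports on each
    rack switch reserved for links to the rest of the network.
    """
    if uplink_ports > ports_per_switch:
        raise ValueError("cannot reserve more ports than the switch has")
    return racks * (ports_per_switch - uplink_ports)


print(max_slaves(RACKS, PORTS_PER_RACK_SWITCH))     # 8800
print(max_slaves(RACKS, PORTS_PER_RACK_SWITCH, 2))  # 8580
```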

Components of Hadoop:

A Hadoop cluster has three core components:

  • Client

  • Master

  • Slave


The main purpose of the client is to submit MapReduce jobs.

The client describes how the data should be processed.

Finally, it retrieves the results once the job has completed.
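To make "describing how the data is processed" concrete, here is a small word-count sketch that runs the map, shuffle, and reduce steps in a single Python process. It is only an illustration of the programming model, not Hadoop's API: on a real cluster the client submits the map and reduce logic and the framework runs it across the slaves. All function names here are made up for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in one process and return the result."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


result = run_job(["hadoop stores data", "hadoop processes data"])
print(result)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```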


The master consists of three nodes:

1.NameNode

2.Secondary NameNode

3.JobTracker


1.NameNode:

In a Hadoop cluster, the NameNode stores only file metadata, not the file contents. It monitors the health of the DataNodes and directs access to the data stored on them.

The NameNode tracks information about files, such as:

  • Which file is stored on which nodes of the cluster

  • The access time of each file

  • Which user is currently accessing a file
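The kinds of metadata listed above can be pictured as a small in-memory table. The sketch below is a toy model under that assumption; the class and field names are hypothetical and do not correspond to the NameNode's real data structures.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FileMetadata:
    # block id -> names of the nodes holding a copy of that block
    block_locations: dict = field(default_factory=dict)
    last_access_time: float = 0.0
    # users who currently have the file open
    current_readers: set = field(default_factory=set)


class NameNodeModel:
    """Toy model: keeps metadata about files, never the file contents."""

    def __init__(self):
        self.files = {}

    def add_file(self, path, block_locations):
        self.files[path] = FileMetadata(block_locations=dict(block_locations))

    def open_file(self, path, user):
        """Record the access and tell the caller where the blocks live."""
        meta = self.files[path]
        meta.last_access_time = time.time()
        meta.current_readers.add(user)
        return meta.block_locations

    def close_file(self, path, user):
        self.files[path].current_readers.discard(user)


nn = NameNodeModel()
nn.add_file("/logs/a.txt", {"blk_1": ["node1", "node2"]})
locations = nn.open_file("/logs/a.txt", "alice")
print(locations)  # {'blk_1': ['node1', 'node2']}
```

Note that open_file returns only locations: in this model, as in the description above, the caller would then read the actual data from the listed nodes.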

2.Secondary NameNode:

The Secondary NameNode contacts the NameNode every hour, collects the file metadata from it, and keeps its own copy of that metadata.

After collecting the file metadata, the Secondary NameNode cleans it up by merging the accumulated changes into a single, up-to-date copy.

Finally, it sends the new file metadata back to the NameNode.
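The collect, clean up, and send back cycle can be sketched as merging a list of recorded changes into a snapshot. This is a simplified model, assuming the metadata is a plain dictionary and each change is a (operation, path, value) tuple; the names image, edit_log, and checkpoint are illustrative only.

```python
def checkpoint(image, edit_log):
    """Apply each logged change to a copy of the image and return the new image."""
    new_image = dict(image)
    for op, path, value in edit_log:
        if op == "create":
            new_image[path] = value
        elif op == "delete":
            new_image.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return new_image


# Metadata snapshot and the changes recorded since it was taken.
image = {"/a.txt": ["blk_1"]}
edit_log = [("create", "/b.txt", ["blk_2"]), ("delete", "/a.txt", None)]

new_image = checkpoint(image, edit_log)
edit_log.clear()  # the merged changes no longer need to be kept

print(new_image)  # {'/b.txt': ['blk_2']}
```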

3.JobTracker:

The JobTracker accepts submitted MapReduce jobs and manages them while they run.

It manages the TaskTrackers on the data nodes and keeps track of available resources and of tasks that are already running.
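Tracking available resources and running tasks amounts to a bookkeeping problem: which worker has a free slot for the next task? The sketch below uses a simple first-fit rule as an assumption for illustration. It is not the JobTracker's actual scheduling algorithm, which also takes into account where the data is stored. The function name and the slot counts are hypothetical.

```python
def assign_tasks(pending_tasks, free_slots):
    """Give each pending task to the first tracker that still has a free slot.

    Returns (assignments, still_pending). free_slots maps a tracker name
    to the number of tasks it can still accept.
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    assignments = {}
    still_pending = []
    for task in pending_tasks:
        tracker = next((name for name, free in slots.items() if free > 0), None)
        if tracker is None:
            still_pending.append(task)  # no capacity left, task must wait
            continue
        assignments[task] = tracker
        slots[tracker] -= 1
    return assignments, still_pending


assignments, waiting = assign_tasks(
    ["map-0", "map-1", "map-2", "reduce-0"],
    {"tracker-a": 2, "tracker-b": 1},
)
print(assignments)  # {'map-0': 'tracker-a', 'map-1': 'tracker-a', 'map-2': 'tracker-b'}
print(waiting)      # ['reduce-0']
```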


Slaves are responsible for processing and storing the data.

Each slave runs a DataNode and a TaskTracker.

The DataNode takes its instructions from the NameNode, and the TaskTracker takes its instructions from the JobTracker.