Apache HDFS Architecture and Components

HDFS stands for Hadoop Distributed File System. It is the primary storage system of Hadoop and stores large data sets in a distributed manner across the nodes of a cluster. HDFS allows files to be read and written, but files cannot be updated in place (write once, read many). When a file is moved into HDFS, it is split into smaller pieces called blocks. HDFS is implemented as a master-slave architecture.
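
As a quick illustration (the path is hypothetical), the way HDFS split a stored file into blocks, and where the replicas live, can be inspected with the fsck tool:

$ $HADOOP_HOME/bin/hadoop fsck /user/input/file.txt -files -blocks -locations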

Main Components of HDFS:

1. NameNode

2. Secondary NameNode

3. DataNode

4. Block

NameNode:

  • NameNode is the master of Hadoop and the heart of HDFS.
  • It maintains the file system namespace of Hadoop.
  • NameNode stores the metadata of the data blocks; this metadata is persisted on the NameNode's local disk (see the configuration sketch after this list).
  • Because it holds only metadata, not the blocks themselves, its disk-space requirement is modest.
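
The local directory where the NameNode persists this metadata (the fsimage and edit log) is set in hdfs-site.xml. A minimal sketch, assuming the Hadoop 1.x property name (Hadoop 2.x renames it dfs.namenode.name.dir) and an illustrative path:

<!-- hdfs-site.xml: where the NameNode keeps its metadata on local disk -->
<property>
  <name>dfs.name.dir</name>
  <value>/var/hadoop/name</value>
</property>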

Secondary NameNode:

  • The main role of the Secondary NameNode is to periodically copy the namespace image from the NameNode and merge it with the edit log (a checkpoint; see the configuration sketch after this list).
  • The Secondary NameNode requires a large amount of memory to merge these files.
  • If the NameNode fails, the namespace image stored on the Secondary NameNode can be used to restart the NameNode.
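
How often a checkpoint is taken is configurable. A sketch assuming the Hadoop 2.x property name (older releases used fs.checkpoint.period); 3600 seconds is the usual default:

<!-- hdfs-site.xml: seconds between two consecutive checkpoints -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>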

DataNode:

  • DataNodes are also known as slaves.
  • A DataNode failure does not cause data loss, because every block it stores is replicated on other DataNodes.
  • DataNodes are configured with a large amount of disk space because they store the actual data (the report command after this list shows per-node capacity).
  • DataNodes perform read and write operations as per client requests.
  • DataNodes act only on instructions from the NameNode.
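
The DataNodes in the cluster, together with their configured and remaining disk capacity, can be listed with the dfsadmin report command:

$ $HADOOP_HOME/bin/hadoop dfsadmin -report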

Block:

  • A block is the location where data is stored.
  • Files are divided into one or more block-sized segments, and these segments are stored on separate nodes.
  • The minimum amount of data that HDFS reads or writes in one operation is called a block.
  • The default block size is 64 MB; it is configurable (see the sketch after this list).
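
The block size is set cluster-wide in hdfs-site.xml. A sketch assuming the Hadoop 1.x property name (dfs.blocksize in Hadoop 2.x), with the value given in bytes:

<!-- hdfs-site.xml: 64 MB = 67108864 bytes -->
<property>
  <name>dfs.block.size</name>
  <value>67108864</value>
</property>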

Goals of HDFS:

Fault Detection – HDFS runs on a large number of commodity hardware components, where failures are the norm, so HDFS provides mechanisms for quick, automatic fault detection and recovery.

Huge Datasets – HDFS should scale to a large number of nodes per cluster in order to manage applications having huge datasets.

Basic HDFS Operations:

1. Format the NameNode (required only once, when the cluster is first set up)

$ hadoop namenode -format

2. Start HDFS (this launches the NameNode, Secondary NameNode, and DataNodes)

$ start-dfs.sh
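
To verify that the daemons came up, jps (shipped with the JDK) lists the running Java processes; NameNode, DataNode, and SecondaryNameNode should be among them:

$ jps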

3. Listing files in HDFS

$ $HADOOP_HOME/bin/hadoop fs -ls <args>
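
For example, to list the contents of the /user directory (the path is illustrative):

$ $HADOOP_HOME/bin/hadoop fs -ls /user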

4. Insert the data into HDFS

Step 1: Create input directory

$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/input
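
Note that on Hadoop 2.x and later, -mkdir does not create missing parent directories by default; add -p if /user does not exist yet:

$ $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/input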

Step 2: Transfer the file from the local file system to HDFS

$ $HADOOP_HOME/bin/hadoop fs -put /home/file.txt /user/input
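
For local sources, fs -put is equivalent to fs -copyFromLocal:

$ $HADOOP_HOME/bin/hadoop fs -copyFromLocal /home/file.txt /user/input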

5. Retrieving the data

Step 1: View the data

$ $HADOOP_HOME/bin/hadoop fs -cat /user/output/outfile

Step 2: Retrieve the data from HDFS to the local file system

$ $HADOOP_HOME/bin/hadoop fs -get /user/output/ /home/hadoop_tp/
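
Likewise, fs -get is equivalent to fs -copyToLocal for local destinations:

$ $HADOOP_HOME/bin/hadoop fs -copyToLocal /user/output/ /home/hadoop_tp/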

6. Stop HDFS

$ stop-dfs.sh