MapReduce Architecture and Components

What is MapReduce in Hadoop?

MapReduce is one of Hadoop's processing tools, used for processing large amounts of data. It divides a main task into subtasks and processes them in parallel. Programmers write their programs against the MapReduce model, and the framework parallelizes them automatically. MapReduce also has a component called the driver, which initializes and submits a job to the framework. A MapReduce job consists of the following tasks:

1. Map

2. Reduce

Map:

  • Map gets its input from HDFS; the input is divided into splits, which are processed in parallel across the Hadoop cluster.
  • The input to a map task is a set of key/value pairs.
  • The main purpose of the map task is to organize the data for reduce processing.
  • Map tasks read their input in the file format configured for the job.
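The map phase above can be sketched in plain Java (a word-count example, not the real Hadoop `Mapper` API): each input line is tokenized, and a (word, 1) intermediate key/value pair is emitted for every word.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal sketch of a map function for word count (plain Java, no Hadoop API):
// takes one input value (a line of text) and emits intermediate (word, 1) pairs.
public class MapPhase {
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1)); // intermediate key/value pair
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(map("Hadoop processes data with Hadoop MapReduce"));
    }
}
```

In real Hadoop, the framework calls such a function once per input record and collects the emitted pairs for the shuffle.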

Reduce:

  • Reduce takes its input from the output of the map tasks.
  • A job can have several reducers, and they run independently of one another.
  • The number of reducers is chosen by the user; the default is one.
  • Reducers produce the final results from the map tasks' output.
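A reduce function for the same word-count example can be sketched as follows (again plain Java, not the Hadoop `Reducer` API): the framework has already grouped the intermediate pairs by key, so each reduce call receives one word together with all of its counts and sums them.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of a reduce function for word count: for a given key,
// sum all of the intermediate counts produced by the map tasks.
public class ReducePhase {
    public static int reduce(String key, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c; // accumulate the counts for this word
        }
        return sum;
    }

    public static void main(String[] args) {
        // e.g. the word "hadoop" appeared three times across all map outputs
        System.out.println("hadoop -> " + reduce("hadoop", Arrays.asList(1, 1, 1)));
    }
}
```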

Architecture and components of MapReduce:

[Figure: MapReduce architecture]

Job Client – submits MapReduce jobs to the Job Tracker.

Job Tracker – runs on the master node and assigns tasks to the Task Trackers.

Task Tracker – runs on a slave node and tracks the tasks assigned to it; once a task completes, it informs the Job Tracker.

Payload – the application code that implements the Map and Reduce functions; it forms the core of the job.

Mapper – maps the input key/value pairs to intermediate key/value pairs.

NameNode – manages the HDFS namespace and its metadata.

DataNode – node where the data is present in advance, before any processing takes place.

Master Node – node where the Job Tracker runs; it receives job requests from clients.

Slave Node – node where the Map and Reduce tasks run.
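The flow through these components can be tied together in a small local simulation (hypothetical names, one process, not the distributed Hadoop runtime): a "driver" splits the input, runs a map step on each split, shuffles the intermediate pairs by key, and runs the reduce step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local single-process sketch of a MapReduce job for word count:
// split -> map -> shuffle (group by key) -> reduce.
public class MiniJob {
    public static Map<String, Integer> run(List<String> splits) {
        // Map + shuffle: emit (word, 1) from each split, grouped by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String split : splits) {           // one "map task" per input split
            for (String word : split.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
                }
            }
        }
        // Reduce: sum the counts collected for each key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int c : e.getValue()) {
                sum += c;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hadoop map reduce", "map map")));
    }
}
```

In a real cluster, the Job Client submits this job to the Job Tracker, which farms the map and reduce tasks out to Task Trackers on the slave nodes; the logic per task is the same.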

Advantages of MapReduce:

Scalability – MapReduce scales to very large datasets by distributing the work across the nodes of the cluster.

Cost-effective – it stores and accesses data affordably on commodity hardware.

Flexibility – it can process both structured and unstructured data from the Hadoop cluster.

Fast – MapReduce moves the computation to the nodes where the input data already resides in HDFS, so tasks read data locally and processing is very fast.

Security – only authorized users are allowed to access the HDFS data.