What is MapReduce in Hadoop?
MapReduce is one of the processing frameworks of Hadoop, used to process large amounts of data. It divides a main task into subtasks and processes them in parallel. Programmers write programs in the MapReduce model, and the framework parallelizes them automatically. MapReduce has a component called the driver, which is used to initialize a MapReduce job. MapReduce consists of the following tasks:
1. Map
2. Reduce
- The Map task gets its input from HDFS (for example, through an input connector such as the MarkLogic connector) and splits that input to run across the Hadoop cluster.
- The input to the Map function is in the form of key/value pairs.
- The main purpose of the Map task is to organize the data for Reduce processing.
- Map tasks read their input from files in HDFS.
- The Reduce task takes the output of the Map tasks as its input.
- There can be several reducers, and they are independent of one another.
- The number of reducers is chosen by the user; the default number of reducers is one.
- Reducers produce the final results based on the Map task output.
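The Map and Reduce steps above can be sketched in plain Python as a simplified word-count simulation (this is only an illustration of the model, not actual Hadoop code; the function names `map_task` and `reduce_task` are assumptions for the example):

```python
from itertools import groupby
from operator import itemgetter

# Map: turn each input record (a line of text) into intermediate key/value pairs.
def map_task(line):
    for word in line.split():
        yield (word, 1)

# Reduce: combine all values that share a key into one final result.
def reduce_task(word, counts):
    return (word, sum(counts))

lines = ["hadoop processes data", "hadoop splits data"]

# Shuffle/sort phase: collect and group the intermediate pairs by key
# before handing them to the reducer.
intermediate = sorted(pair for line in lines for pair in map_task(line))
results = [reduce_task(key, [v for _, v in group])
           for key, group in groupby(intermediate, key=itemgetter(0))]

print(results)  # [('data', 2), ('hadoop', 2), ('processes', 1), ('splits', 1)]
```

Each line is mapped independently, so in a real cluster the Map calls would run in parallel on different nodes; the grouping step is what Hadoop performs during its shuffle phase.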
Architecture and components of MapReduce:
Job Client – Submits MapReduce jobs to the Job Tracker.
Job Tracker – Part of the master node; it assigns jobs to the Task Trackers.
Task Tracker – Part of a slave node; it tracks the tasks assigned to it and, once a task is completed, informs the Job Tracker.
PayLoad – The application code, mainly the Map and Reduce functions, that a MapReduce job runs.
Mapper – Maps the input data to intermediate key/value pairs.
NameNode – Manages the HDFS metadata.
DataNode – The node where the data is stored in advance, before any processing takes place.
Master Node – Receives job requests from clients; the Job Tracker runs here.
Slave Node – The node where the Map and Reduce jobs run.
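How intermediate key/value pairs from the Mapper are divided among the reducers can be sketched in plain Python (a toy stand-in for the idea behind Hadoop's default hash partitioning; the `partition` function and bucket layout are assumptions for the example, not Hadoop's API):

```python
# Toy partitioner: assigns each intermediate key to one of num_reducers
# buckets, mimicking the idea behind Hadoop's default hash partitioning.
def partition(key, num_reducers=1):  # default number of reducers is one
    return hash(key) % num_reducers

pairs = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]

num_reducers = 2
buckets = {r: [] for r in range(num_reducers)}
for key, value in pairs:
    buckets[partition(key, num_reducers)].append((key, value))

# Every occurrence of a key lands in the same bucket, so each reducer
# sees all the values for its keys and can work independently of the others.
for reducer_id, assigned in buckets.items():
    print(reducer_id, assigned)
```

This is why reducers are independent of one another: once the keys are partitioned, no reducer needs data held by any other reducer.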
Advantages of MapReduce:
Scalability – MapReduce can process very large amounts of data quickly by scaling across many nodes.
Cost Effective – It stores and accesses data in an affordable manner.
Flexibility – It can process both structured and unstructured data from the Hadoop cluster.
Fast – Input data in HDFS is stored on the same servers that process it (data locality), so data processing is very fast.
Security – Only authorized users are allowed to access the HDFS data.