Types of Joins and Counters in Apache MapReduce


What is MapReduce?

MapReduce is a processing technique and a programming model for distributed computing, and its Apache Hadoop implementation is written in Java. It consists of two important tasks, Map and Reduce. The Map task takes a set of data and converts it into another set of data in which individual elements are broken down into tuples (key/value pairs). The Reduce task takes the output of the Map task as its input and combines those tuples into a smaller set of tuples.
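To make the two phases concrete, here is a minimal single-process sketch of the classic word count in plain Java. It does not use the Hadoop API: the class `MiniMapReduce` and its methods are hypothetical, and the shuffle that Hadoop performs across the cluster is imitated by a `TreeMap` that groups the mapper output by key.

```java
import java.util.*;

// Plain-Java, single-process sketch of the MapReduce model (not the Hadoop API).
class MiniMapReduce {
    // Map phase: break one input line into (word, 1) tuples.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: combine all values for one key into a single value.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static SortedMap<String, Integer> run(List<String> lines) {
        // Shuffle and sort: group mapper output by key (TreeMap keeps keys sorted).
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // Call reduce once per distinct key.
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("deer bear river", "car car river", "deer car bear")));
    }
}
```

Running `main` prints `{bear=2, car=3, deer=2, river=2}`: the mappers emit one `(word, 1)` tuple per occurrence, and the reducer for each word sums those tuples into one.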

MapReduce Joins:

A MapReduce join combines two datasets on a common join key, and writing one by hand requires a fair amount of code. The best way to perform the join depends on the size of the datasets. If one dataset is much smaller than the other, the small dataset is distributed to every data node. Each node then matches the records of the large dataset against the small one and combines the matching records into output records.
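The small-dataset strategy described above is usually called a replicated (or broadcast) join. The sketch below imitates it in plain Java under simplifying assumptions: records are `Map.Entry<String, String>` pairs of (join key, value), `BroadcastJoin` is a hypothetical name, and copying the small dataset to each node (which Hadoop jobs commonly do through the distributed cache) is reduced to building one in-memory `HashMap` per mapper.

```java
import java.util.*;

// Sketch of a replicated (broadcast) join, not the Hadoop API.
// Records are (joinKey, value) pairs.
class BroadcastJoin {
    // Built once per mapper from the small dataset that was copied to every node.
    static Map<String, List<String>> buildLookup(List<Map.Entry<String, String>> small) {
        Map<String, List<String>> lookup = new HashMap<>();
        for (Map.Entry<String, String> rec : small) {
            lookup.computeIfAbsent(rec.getKey(), k -> new ArrayList<>()).add(rec.getValue());
        }
        return lookup;
    }

    // What one mapper does with its split of the large dataset: an inner join,
    // so large-side records with no match in the lookup table are dropped.
    static List<String> mapSplit(List<Map.Entry<String, String>> largeSplit,
                                 Map<String, List<String>> lookup) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> rec : largeSplit) {
            for (String smallValue : lookup.getOrDefault(rec.getKey(), List.of())) {
                out.add(rec.getKey() + "\t" + rec.getValue() + "\t" + smallValue);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        var lookup = buildLookup(List.of(Map.entry("1", "Alice"), Map.entry("2", "Bob")));
        // Two splits of the large dataset, as two different mappers would see them.
        System.out.println(mapSplit(List.of(Map.entry("1", "book"), Map.entry("3", "pen")), lookup));
        System.out.println(mapSplit(List.of(Map.entry("1", "lamp"), Map.entry("2", "mug")), lookup));
    }
}
```

No shuffle is needed because every mapper already holds the whole small dataset; the trade-off is that the lookup table must fit in each mapper's memory.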

Types of Joins:

Map-side Join – A map-side join is performed by the mappers, before the data is consumed by the map function. It requires the input to every map to be already partitioned and sorted: the input datasets must be divided into the same number of partitions and sorted by the join key.
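Because both inputs arrive partitioned and sorted by the join key, a mapper can join a matching pair of partitions in a single merge pass without any shuffle. The plain-Java sketch below shows only that merge step for one pair of partitions; the class name is hypothetical, records are (key, value) `Map.Entry` pairs, and both lists are assumed to be sorted in the same order (here, lexicographic `String` order).

```java
import java.util.*;

// Sketch of the merge step behind a map-side join of pre-sorted inputs (not the Hadoop API).
// Assumption: partition i of both datasets holds the same key range and is sorted by key.
class SortedMergeJoin {
    static List<String> joinPartition(List<Map.Entry<String, String>> left,
                                      List<Map.Entry<String, String>> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            String lk = left.get(i).getKey();
            String rk = right.get(j).getKey();
            int cmp = lk.compareTo(rk);
            if (cmp < 0) {
                i++; // left key is smaller, so it has no partner on the right
            } else if (cmp > 0) {
                j++; // right key is smaller, so it has no partner on the left
            } else {
                // Find the run of equal keys on each side and emit their cross product.
                int iEnd = i, jEnd = j;
                while (iEnd < left.size() && left.get(iEnd).getKey().equals(lk)) iEnd++;
                while (jEnd < right.size() && right.get(jEnd).getKey().equals(rk)) jEnd++;
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(lk + "\t" + left.get(a).getValue() + "\t" + right.get(b).getValue());
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        var customers = List.of(Map.entry("1", "Alice"), Map.entry("2", "Bob"), Map.entry("4", "Dan"));
        var orders = List.of(Map.entry("1", "book"), Map.entry("1", "lamp"),
                             Map.entry("3", "pen"), Map.entry("4", "mug"));
        joinPartition(customers, orders).forEach(System.out::println);
    }
}
```

Each input is read once, front to back, which is why the sorted, identically partitioned input is a hard precondition: an unsorted partition would make the merge silently miss matches.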

Reduce-side Join – A reduce-side join is performed by the reducers. It does not require the datasets to be in any particular form (partitioned or sorted). The mappers emit each record as a tuple keyed by its join key, so the shuffle delivers all records that share a join key to the same reducer, which groups on that key and combines the records.
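A reduce-side join can be sketched the same way. In this hypothetical plain-Java version, each mapper tags its output value with the dataset it came from (`L` or `R`), a `TreeMap` stands in for the shuffle that groups values by join key, and the reducer pairs every left value with every right value for that key. In a real Hadoop job the per-dataset tagging is often done by giving each input its own mapper class, for example with `MultipleInputs`.

```java
import java.util.*;

// In-memory sketch of a reduce-side join (not the Hadoop API).
class ReduceSideJoin {
    // Map phase: emit (joinKey, "TAG|value") so the reducer knows each value's source.
    static List<Map.Entry<String, String>> tagAll(String tag, List<Map.Entry<String, String>> records) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, String> r : records) {
            out.add(Map.entry(r.getKey(), tag + "|" + r.getValue()));
        }
        return out;
    }

    // Reduce phase: split one key's values by tag, then emit every left/right pair.
    static List<String> reduce(String key, List<String> taggedValues) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (String tv : taggedValues) {
            int bar = tv.indexOf('|'); // tags never contain '|', so the first '|' ends the tag
            String tag = tv.substring(0, bar);
            String value = tv.substring(bar + 1);
            if (tag.equals("L")) left.add(value); else right.add(value);
        }
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(key + "\t" + l + "\t" + r);
            }
        }
        return out;
    }

    static List<String> join(List<Map.Entry<String, String>> leftData,
                             List<Map.Entry<String, String>> rightData) {
        List<Map.Entry<String, String>> mapped = new ArrayList<>(tagAll("L", leftData));
        mapped.addAll(tagAll("R", rightData));
        // Shuffle and sort: group all tagged values by join key.
        SortedMap<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, String> kv : mapped) {
            groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> g : groups.entrySet()) {
            out.addAll(reduce(g.getKey(), g.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        var customers = List.of(Map.entry("2", "Bob"), Map.entry("1", "Alice"));
        var orders = List.of(Map.entry("1", "book"), Map.entry("3", "pen"),
                             Map.entry("2", "mug"), Map.entry("1", "lamp"));
        join(customers, orders).forEach(System.out::println);
    }
}
```

Note that the inputs need not be sorted; the price is that every record of both datasets travels through the shuffle.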

MapReduce Counters:

MapReduce counters collect statistics about a MapReduce job, which is helpful when diagnosing problems in job processing. They play a role similar to log messages written from the Map and Reduce code, but each counter is aggregated into a single value for the whole job.

Types of Counters:

Hadoop Built-in Counters – Hadoop's built-in counters fall into five groups:

1. MapReduce Task Counters – Collect information about each task while it executes, such as the number of input records a map task has read.

2. FileSystem Counters – Collect information about the number of bytes a task reads from and writes to each filesystem.

3. FileInputFormat Counters – Collect information about the data read through FileInputFormat.

4. FileOutputFormat Counters – Collect information about the data written through FileOutputFormat.

5. Job Counters – Maintained by the JobTracker, they record job-level statistics used while processing the job, such as the number of map and reduce tasks launched.
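Counters in the first four groups are reported by each task, and the framework sums them across tasks to produce the job-wide totals. The sketch below imitates that roll-up in plain Java; it is not the Hadoop API, the class name is hypothetical, and the group and counter names and numbers in `main` are illustrative values meant only to resemble Hadoop's.

```java
import java.util.*;

// Sketch (not the Hadoop API) of how per-task counters roll up into job totals:
// each task reports group -> counter -> value, and the job view sums them.
class CounterRollup {
    static Map<String, Map<String, Long>> aggregate(List<Map<String, Map<String, Long>>> perTask) {
        Map<String, Map<String, Long>> totals = new TreeMap<>();
        for (Map<String, Map<String, Long>> task : perTask) {
            for (Map.Entry<String, Map<String, Long>> group : task.entrySet()) {
                Map<String, Long> groupTotals =
                    totals.computeIfAbsent(group.getKey(), g -> new TreeMap<>());
                for (Map.Entry<String, Long> c : group.getValue().entrySet()) {
                    // Add this task's value to the running job-wide total.
                    groupTotals.merge(c.getKey(), c.getValue(), Long::sum);
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Illustrative values for two map tasks.
        var task1 = Map.of("Map-Reduce Framework", Map.of("MAP_INPUT_RECORDS", 100L),
                           "File Input Format Counters", Map.of("BYTES_READ", 4096L));
        var task2 = Map.of("Map-Reduce Framework", Map.of("MAP_INPUT_RECORDS", 50L),
                           "File Input Format Counters", Map.of("BYTES_READ", 2048L));
        System.out.println(aggregate(List.of(task1, task2)));
    }
}
```

Running `main` prints `{File Input Format Counters={BYTES_READ=6144}, Map-Reduce Framework={MAP_INPUT_RECORDS=150}}`.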

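User-Defined Counters – Besides the built-in counters, a Hadoop job can define its own counters to track application-specific metrics. They are created in the job's own code, for example by declaring a Java enum whose constants become the counter names.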
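In Hadoop's Java API, an enum-based counter is typically incremented from a mapper or reducer with `context.getCounter(...).increment(1)`, where the enum type identifies the counter group and each constant is one counter. The plain-Java sketch below imitates that pattern without Hadoop; the `RecordQuality` enum, the `UserCounterSketch` class, and the "key,value" input format are hypothetical, and an `EnumMap` stands in for the framework's counter storage.

```java
import java.util.*;

// Plain-Java stand-in for user-defined counters (not the Hadoop API).
// In a real Hadoop mapper the increment would be
// context.getCounter(RecordQuality.MALFORMED).increment(1);
class UserCounterSketch {
    // Hypothetical counter group: each enum constant becomes one counter.
    enum RecordQuality { VALID, MALFORMED }

    final EnumMap<RecordQuality, Long> counters = new EnumMap<>(RecordQuality.class);

    void increment(RecordQuality c) {
        counters.merge(c, 1L, Long::sum);
    }

    long get(RecordQuality c) {
        return counters.getOrDefault(c, 0L);
    }

    // Map-side logic: expects "key,value" lines and counts bad lines instead of failing.
    List<String> map(List<String> lines) {
        List<String> keys = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length != 2 || parts[0].trim().isEmpty()) {
                increment(RecordQuality.MALFORMED);
                continue;
            }
            increment(RecordQuality.VALID);
            keys.add(parts[0].trim());
        }
        return keys;
    }

    public static void main(String[] args) {
        var job = new UserCounterSketch();
        var keys = job.map(List.of("a,1", "garbage", "b,2", ",3"));
        System.out.println(keys + " " + job.counters);
    }
}
```

Running `main` prints `[a, b] {VALID=2, MALFORMED=2}`. Counting malformed records rather than throwing lets the job finish while still reporting how much of the input was skipped.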