Hadoop Pig Tutorial

What is Pig?

Pig is a tool for analyzing large amounts of data. Its scripting language is Pig Latin. Pig performs data manipulations in a way similar to SQL. It converts each script into Map and Reduce tasks, and those tasks run on Hadoop.
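To make the idea of Map and Reduce tasks concrete, here is a minimal sketch in plain Python of a "count rows per user" job split into the two phases. The records and the function names (map_phase, reduce_phase) are assumptions made up for illustration; this is not code that Pig generates.

```python
from collections import defaultdict

# Hypothetical input records: (user, page) pairs standing in for loaded rows.
records = [("alice", "home"), ("bob", "home"), ("alice", "cart")]

def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed on the grouping field."""
    for user, _page in rows:
        yield user, 1

def reduce_phase(pairs):
    """Sum the values seen for each key, as a reduce step would after grouping."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

visits_per_user = reduce_phase(map_phase(records))
print(visits_per_user)
```

On a real cluster the map and reduce steps run as separate tasks on different machines; here they run in one process only to show the division of work.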

Components of Pig:

Parser:

All Pig scripts are first handled by the parser. The parser checks the syntax of the Pig Latin script. Its output is a DAG (directed acyclic graph) that represents the script's logical statements and operations.

Optimizer:

The DAG produced by the parser is passed to the optimizer, which carries out logical optimizations on it.

Compiler:

The compiler turns the optimized plan into a series of MapReduce jobs.

Execution Engine:

Finally, the MapReduce jobs are passed to the execution engine, which runs them and returns the results.
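The role of the DAG can be sketched with Python's standard graphlib module. The plan below is a hypothetical four-step script (LOAD, FILTER, GROUP, STORE); the dictionary layout is an assumption for illustration and does not mirror Pig's internal classes.

```python
from graphlib import TopologicalSorter

# Hypothetical logical plan: each operator maps to the operators it depends on.
logical_plan = {
    "LOAD": set(),
    "FILTER": {"LOAD"},
    "GROUP": {"FILTER"},
    "STORE": {"GROUP"},
}

# A DAG has no cycles, so its operators can always be ordered
# so that every operator comes after the ones it depends on.
execution_order = list(TopologicalSorter(logical_plan).static_order())
print(execution_order)
```

Because the graph is acyclic, a valid execution order always exists, which is what lets the later stages plan the work step by step.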

Basics of Pig Latin Scripts:

Pig Latin is the scripting language used to analyze data stored in Hadoop. It also defines the data model that Pig works with.

Pig Latin Concepts:

It contains the following concepts:

1. Field – A single piece of data. All data in Pig Latin is made up of fields.

2. Tuple – An ordered collection of fields. A tuple is written in parentheses, for example (alice, 30).

3. Bag – A collection of tuples. A bag is comparable to a table.
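As a rough analogy, the three concepts above can be mirrored with ordinary Python values. The names and sample values are assumptions for illustration; Pig has its own types for tuples and bags.

```python
# A field is a single value.
name = "alice"
age = 30

# A tuple is an ordered collection of fields.
person = (name, age)

# A bag is a collection of tuples; duplicate tuples are allowed.
people = [person, ("bob", 25), ("bob", 25)]

# Read the second field of every tuple in the bag.
ages = [t[1] for t in people]
print(ages)
```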

Pig Latin Data Types:

1. int – 32-bit signed integer

2. long – 64-bit signed integer

3. float – 32-bit floating point

4. double – 64-bit floating point

5. chararray – A character array (string) in UTF-8 format

6. bytearray – Also known as a blob. bytearray is the default data type in Pig Latin.
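The widths listed above can be checked with a short Python sketch that uses the standard struct module. The mapping from Pig type names to struct format codes is an assumption made for illustration, and the sample string is made up.

```python
import struct

# Byte widths of the fixed-size types; ">" selects standard sizes.
sizes = {
    "int": struct.calcsize(">i"),     # 4 bytes = 32 bits
    "long": struct.calcsize(">q"),    # 8 bytes = 64 bits
    "float": struct.calcsize(">f"),   # 4 bytes = 32 bits
    "double": struct.calcsize(">d"),  # 8 bytes = 64 bits
}

# Largest values that fit in signed 32-bit and 64-bit integers.
int_max = 2**31 - 1
long_max = 2**63 - 1

# A chararray holds text encoded as UTF-8; a bytearray holds raw bytes.
text = "héllo"
raw = text.encode("utf-8")
print(sizes, int_max, long_max, len(text), len(raw))
```

The last two numbers differ because "é" takes two bytes in UTF-8, so the character count and the byte count of the same string are not equal.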

Run Modes in Pig:

1.Local Mode:

Local mode runs in a single JVM. It is mainly used for prototyping and works only with the local file system.

Command:

$ pig -x local

2.MapReduce Mode:

It is also known as Hadoop mode. It executes MapReduce jobs on a cluster.

Command:

$ pig (or) $ pig -x mapreduce
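When Pig is launched from another program, the two commands above can be assembled with a small helper. The function pig_command and the script name wordcount.pig are hypothetical; the sketch only builds the command line and does not run Pig.

```python
import shlex

def pig_command(mode="mapreduce", script=None):
    """Build the argument list for starting Pig in one of the two modes above."""
    if mode not in ("local", "mapreduce"):
        raise ValueError(f"unsupported mode: {mode}")
    argv = ["pig", "-x", mode]
    if script is not None:
        # A script path after the options runs that script instead of the shell.
        argv.append(script)
    return argv

print(shlex.join(pig_command("local")))
print(shlex.join(pig_command("mapreduce", "wordcount.pig")))
```

The resulting list can be handed to subprocess.run on a machine where Pig is installed.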