Apache Hadoop Oozie Tutorial

Introduction:

Apache Oozie is an open source workflow scheduler for Hadoop. It manages Hadoop jobs and chains multiple jobs in a defined order so that together they accomplish a larger task. Oozie supports MapReduce, Hive, and HDFS jobs, among others. An Oozie workflow is modeled as a Directed Acyclic Graph (DAG) built from two kinds of nodes: action nodes, which run the jobs, and control flow nodes, which decide the order in which the actions run.

A key advantage of Oozie is that it integrates with the Hadoop stack and supports MapReduce and HDFS jobs. Oozie provides the following three types of jobs:

1. Workflow Jobs – A sequence of actions (jobs) executed in a defined order.

2. Coordinator Jobs – Workflow jobs that are triggered by time.

3. Bundle Jobs – A package of workflow and coordinator jobs that is managed together.
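To make the time trigger of a coordinator job concrete, here is a minimal sketch that writes a small coordinator definition to a scratch directory. The application name, the dates, the daily frequency, and the workflow path are all placeholders chosen for illustration, and the schema version (0.2) is an assumption that should be checked against the installed Oozie release.

```shell
# Scratch directory for the example (hypothetical location).
COORD_DIR="${COORD_DIR:-${TMPDIR:-/tmp}/demo-coord}"
mkdir -p "$COORD_DIR"

# The quoted 'EOF' keeps the shell from expanding ${...}; Oozie resolves
# those expressions itself when the coordinator is submitted.
cat > "$COORD_DIR/coordinator.xml" <<'EOF'
<coordinator-app name="demo-coord" frequency="${coord:days(1)}"
                 start="2014-01-01T00:00Z" end="2014-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <!-- Placeholder path to a workflow application in HDFS -->
            <app-path>${nameNode}/user/${user.name}/demo-wf</app-path>
        </workflow>
    </action>
</coordinator-app>
EOF
```

With a definition like this, Oozie would start the referenced workflow once per day between the start and end times.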

Types of Nodes in Apache Oozie:

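Action Node – Runs a unit of work in the workflow, such as a MapReduce, Hive, HDFS, or Java job.

Control Flow Node – Controls the path of execution between actions. The start, end, and error nodes below are control flow nodes.

Start Node – The entry point where workflow execution begins.

End Node – The point where the workflow finishes successfully.

Error Node – If an error occurs while a job is running, the workflow moves to the error node, which stops the workflow and reports the error message. In the workflow XML this is the kill element.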
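The node types above can be seen together in a small workflow definition. The sketch below writes one to a scratch directory; it is only an illustration, assuming the 0.4 workflow schema and a single HDFS (fs) action. The names demo-wf, make-dir, and fail, and the output path, are hypothetical.

```shell
# Scratch directory for the example (hypothetical location).
WF_DIR="${WF_DIR:-${TMPDIR:-/tmp}/demo-wf}"
mkdir -p "$WF_DIR"

# The quoted 'EOF' keeps the shell from expanding ${...}; Oozie resolves
# those expressions itself when the workflow runs.
cat > "$WF_DIR/workflow.xml" <<'EOF'
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.4">
    <!-- Start node: entry point of the workflow -->
    <start to="make-dir"/>

    <!-- Action node: an HDFS action that creates a directory -->
    <action name="make-dir">
        <fs>
            <mkdir path="${nameNode}/user/${wf:user()}/demo-output"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <!-- Error (kill) node: stops the workflow and reports the message -->
    <kill name="fail">
        <message>Action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <!-- End node: successful completion -->
    <end name="end"/>
</workflow-app>
EOF
```

The ok and error transitions on the action are what connect it to the end and error nodes, which keeps the graph acyclic.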

Oozie Installation Steps:

Hadoop location – /home/hduser/hadoop

Step 1: Confirm the Home Directory

$ pwd

/home/hduser

Step 2: Download Oozie

$ wget http://supergsego.com/apache/oozie/3.3.2/oozie-3.3.2.tar.gz
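Mirrors tend to rotate old releases out over time, so the URL above may stop resolving. A possible fallback is the Apache archive. The helper below is a sketch that builds a download URL from a version number; the function name is hypothetical, and the archive layout (dist/oozie/VERSION/oozie-VERSION.tar.gz) is an assumption that should be checked in a browser before relying on it.

```shell
# Build a tarball URL for a given Oozie version.
# Usage: oozie_tarball_url <version> [base_url]
oozie_tarball_url() {
    version=$1
    # Assumed archive base; pass a second argument to use another mirror.
    base=${2:-https://archive.apache.org/dist/oozie}
    echo "${base}/${version}/oozie-${version}.tar.gz"
}
```

For example, wget "$(oozie_tarball_url 3.3.2)" would attempt the same download from the archive.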

Step 3: Untar

$ tar xvzf oozie-3.3.2.tar.gz

Step 4: Build Oozie

$ cd oozie-3.3.2/bin

$ ./mkdistro.sh -DskipTests
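mkdistro.sh runs a Maven build, so it needs a JDK and Maven on the PATH. A small preflight check can save a failed build. The function below is a sketch with a hypothetical name; it prints whichever of the given tools cannot be found.

```shell
# Print the tools from the argument list that are not on the PATH.
# Prints an empty line when everything is available.
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Drop the leading space before printing.
    echo "${missing# }"
}
```

Running missing_tools java mvn tar wget before the build lists anything that still has to be installed.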

Step 5: Oozie Server Setup

1. Copy the built binaries

$ cd ../../

$ cp -R oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/ oozie

2. Create Libext Directory

$ cd oozie

$ mkdir libext

3. Copy the Hadoop Library JARs

$ cp ../oozie-3.3.2/hadooplibs/target/oozie-3.3.2-hadooplibs.tar.gz .

$ tar xzvf oozie-3.3.2-hadooplibs.tar.gz

$ cp oozie-3.3.2/hadooplibs/hadooplib-1.1.1.oozie-3.3.2/* libext/

4. Update the Hadoop Files (add these properties to Hadoop's core-site.xml, then restart Hadoop so the change takes effect)

<property>

<name>hadoop.proxyuser.hduser.hosts</name>

<value>localhost</value>

</property>

<property>

<name>hadoop.proxyuser.hduser.groups</name>

<value>hadoop</value>

</property>
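The two properties follow a fixed pattern, hadoop.proxyuser.USER.hosts and hadoop.proxyuser.USER.groups, where USER is the account that runs the Oozie server. If Oozie runs under a different account than hduser, a helper like the one below can generate the matching snippet. This is a sketch; the function name is hypothetical and the output still has to be pasted inside the configuration element of core-site.xml by hand.

```shell
# Print the proxy user properties for the account that runs Oozie.
# Usage: proxyuser_properties <user> <hosts> <groups>
proxyuser_properties() {
    user=$1
    hosts=$2
    grps=$3
    cat <<EOF
<property>
  <name>hadoop.proxyuser.${user}.hosts</name>
  <value>${hosts}</value>
</property>
<property>
  <name>hadoop.proxyuser.${user}.groups</name>
  <value>${grps}</value>
</property>
EOF
}
```

Calling proxyuser_properties hduser localhost hadoop reproduces the values used in this tutorial.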

Step 6: Create the Oozie WAR File

$ ./bin/oozie-setup.sh prepare-war
setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

New Oozie WAR file with added 'ExtJS library, JARs' at /home/hduser/oozie/oozie-server/webapps/oozie.war

Step 7: Create the Share Library

$ ./bin/oozie-setup.sh sharelib create -fs hdfs://localhost:54310
setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

Step 8: Create Oozie DB

$ ./bin/ooziedb.sh create -sqlfile oozie.sql -run

setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

Validate DB Connection

DONE

Check DB schema does not exist

DONE

Check OOZIE_SYS table does not exist

DONE

Create SQL schema

DONE

Create OOZIE_SYS table

DONE

Step 9: Start Oozie as a Background Process

$ ./bin/oozied.sh start

Step 10: Alternatively, Start Oozie in the Foreground

$ ./bin/oozied.sh run

Step 11: Check the Oozie Status

$ ./bin/oozie admin -oozie http://localhost:11000/oozie -status

System mode: NORMAL
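The server can take a few seconds to come up after it is started, so the status check may fail on the first try. The sketch below polls a status command until its output contains "System mode: NORMAL". The function name and its arguments are hypothetical; the command to run is passed in, so nothing about the Oozie CLI is assumed beyond the output line shown above.

```shell
# Run a status command repeatedly until it reports NORMAL.
# Usage: wait_for_normal <attempts> <delay_seconds> <command...>
# Returns 0 once NORMAL is seen, 1 if the attempts run out.
wait_for_normal() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" 2>/dev/null | grep -q 'System mode: NORMAL'; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, wait_for_normal 10 3 ./bin/oozie admin -oozie http://localhost:11000/oozie -status would retry the check from Step 11 up to ten times, three seconds apart.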

Step 12: Set Up the Oozie Client

$ cd ..

$ cp oozie/oozie-client-3.3.2.tar.gz .

$ tar xvzf oozie-client-3.3.2.tar.gz

$ mv oozie-client-3.3.2 oozie-client

$ cd oozie-client/bin
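The status command in Step 11 passes the server URL with -oozie on every call. The Oozie client also reads the OOZIE_URL environment variable, so the URL can be set once per shell session. The sketch below does that and puts the client on the PATH. The OOZIE_CLIENT_HOME variable is a hypothetical convenience, and the default path assumes the client was unpacked into the home directory as in this step.

```shell
# Location of the unpacked client (assumed; override by exporting OOZIE_CLIENT_HOME).
OOZIE_CLIENT_HOME="${OOZIE_CLIENT_HOME:-${HOME:-/home/hduser}/oozie-client}"

# Default server URL used by the oozie command when -oozie is omitted.
export OOZIE_URL="http://localhost:11000/oozie"

# Make the oozie command available from any directory.
export PATH="$OOZIE_CLIENT_HOME/bin:$PATH"
```

With these set, oozie admin -status should report the same system mode as in Step 11. Adding the lines to ~/.bashrc makes them persistent.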