How to Install Hadoop on Ubuntu

What is Hadoop?

Hadoop is an open-source, Java-based framework for storing very large amounts of data and processing it across a cluster of machines. It provides several components for accessing that data. Because Hadoop is written in Java, Java must be installed before Hadoop itself. This article explains how to install Hadoop on the Ubuntu operating system as a single-node setup.

Hadoop has the following three main layers:

1. HDFS (Hadoop Distributed File System) – Stores large amounts of data in a distributed file system that runs across the machines of a Hadoop cluster.

2. MapReduce – Processes large data sets in the form of key/value pairs.

3. YARN – Manages the resources of the cluster and schedules applications.
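The key/value model behind MapReduce can be illustrated locally with a Unix pipeline. A map step emits a (word, 1) pair for every word. A sort then groups equal keys together, which is the job Hadoop's shuffle/sort phase does between map and reduce. Finally, a reduce step sums the values for each key. This is only an analogy to build intuition. The function names are hypothetical, and nothing here talks to a Hadoop cluster.

```shell
#!/usr/bin/env bash
# A local analogy of MapReduce word count using Unix tools; this is NOT Hadoop.

# Map: emit one "word<TAB>1" key/value pair per word.
map_words() {
  tr -s '[:space:]' '\n' | grep -v '^$' | awk '{ print $0 "\t" 1 }'
}

# Shuffle/sort: bring identical keys together (byte order for repeatability).
shuffle_sort() {
  LC_ALL=C sort
}

# Reduce: input is sorted by key, so keep a running sum until the key changes.
reduce_sum() {
  awk -F '\t' '
    $1 != key { if (NR > 1) print key "\t" sum; key = $1; sum = 0 }
    { sum += $2 }
    END { if (NR > 0) print key "\t" sum }
  '
}

word_count() {
  map_words | shuffle_sort | reduce_sum
}

# Example: prints cat, hat and the with counts 1, 1 and 2 (tab-separated).
printf 'the cat the hat\n' | word_count
```

Hadoop runs the same three phases, but the map and reduce tasks are spread across many machines and HDFS blocks.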

Steps to Install Java:

Step 1: Check the Java requirement

Hadoop is written in Java, so Java must be installed before Hadoop. This guide installs OpenJDK 8 from the Ubuntu package repositories, so no separate download is needed.

Step 2: Install Java

Commands:

$ sudo apt-get update

$ sudo apt-get install openjdk-8-jre

$ sudo apt-get install openjdk-8-jdk

$ java -version

Installing these packages does not set the JAVA_HOME variable or add anything to .bashrc. JAVA_HOME is set for Hadoop later, in the hadoop-env.sh file.

Step 3: Find where Java is installed

Command:

$ ls -l /etc/alternatives/javac

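lrwxrwxrwx 1 root root 36 Nov 14 23:15 /etc/alternatives/javac -> /usr/lib/jvm/java-8-oracle/bin/javac

The path before /bin/javac is the Java installation directory (JAVA_HOME). The output above comes from a machine with Oracle Java 8. With the OpenJDK 8 packages installed in Step 2, the link typically points to /usr/lib/jvm/java-8-openjdk-amd64/bin/javac on 64-bit x86 (amd64) systems.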
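The directory that JAVA_HOME should point to is the symlink target with the trailing /bin/javac removed. Below is a minimal sketch that derives it automatically. It assumes GNU readlink, which is standard on Ubuntu, and uses a hypothetical helper named java_home_from_javac.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the real path of javac into a JAVA_HOME value
# by stripping the trailing "/bin/javac".
java_home_from_javac() {
  local javac_path="$1"
  case "$javac_path" in
    */bin/javac) printf '%s\n' "${javac_path%/bin/javac}" ;;
    *) echo "unexpected javac path: $javac_path" >&2; return 1 ;;
  esac
}

# On a real machine: resolve the /etc/alternatives symlink chain with
# GNU readlink -f, then derive JAVA_HOME from the result.
if command -v javac >/dev/null 2>&1; then
  java_home_from_javac "$(readlink -f "$(command -v javac)")"
fi
```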

Step 4: Install SSH

  • Hadoop's start and stop scripts use SSH to launch and manage the Hadoop daemons, even when everything runs on one machine.
  • Setting up key-based login lets these scripts connect to localhost without asking for a password.

Commands:

$ sudo apt-get install ssh

$ ssh-keygen -t rsa -P ""

$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

$ ssh localhost
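Running the `cat ... >> authorized_keys` command more than once appends duplicate keys. In addition, sshd may ignore authorized_keys if the file or its directory is writable by other users. The sketch below is an idempotent version built around a hypothetical helper, add_authorized_key. It assumes the single-line public key that ssh-keygen produces above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to authorized_keys only if that
# exact line is not already present, then tighten permissions for sshd.
# Assumes the .pub file holds a single key on one line.
add_authorized_key() {
  local pubkey_file="$1"
  local auth_file="$2"
  local ssh_dir
  ssh_dir="$(dirname "$auth_file")"
  mkdir -p "$ssh_dir"
  chmod 0700 "$ssh_dir"
  touch "$auth_file"
  # -q quiet, -x whole-line match, -F fixed strings, -f patterns from the key file
  if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
    cat "$pubkey_file" >> "$auth_file"
  fi
  chmod 0600 "$auth_file"
}

# Usage with the key generated by ssh-keygen above:
# add_authorized_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```

Afterwards, `ssh -o BatchMode=yes localhost true` should succeed without a password prompt. BatchMode makes ssh fail outright instead of prompting, so it is a convenient non-interactive check.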

Steps to Install Hadoop:

Step 1: Download Hadoop

$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

Step 2: Extract the archive and move it to /usr/local/hadoop

$ tar xvzf hadoop-2.7.3.tar.gz

$ sudo mkdir -p /usr/local/hadoop

$ cd hadoop-2.7.3/

$ sudo mv * /usr/local/hadoop

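$ sudo chown -R hduser:hadoop /usr/local/hadoop

The chown command assumes that a dedicated user hduser in a group hadoop already exists. If you run Hadoop under your own account, substitute your own user and group names.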
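As an alternative to the mkdir/cd/mv sequence above, tar can extract the archive straight into the target directory. Its --strip-components=1 option drops the top-level hadoop-2.7.3/ folder, and GNU tar (the version Ubuntu ships) supports it. The sketch below wraps this in a hypothetical helper, extract_release.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack a release tarball directly into a target directory.
# --strip-components=1 drops the top-level folder (e.g. hadoop-2.7.3/), so
# bin/, etc/, sbin/ and the rest land directly inside the target.
extract_release() {
  local tarball="$1"
  local target="$2"
  mkdir -p "$target"
  tar -xzf "$tarball" -C "$target" --strip-components=1
}

# Usage (the target must be writable by the current user):
# extract_release hadoop-2.7.3.tar.gz /usr/local/hadoop
```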

The following six files must be configured to set up Hadoop:

1. .bashrc file

2. hadoop-env.sh file

3. core-site.xml file

4. mapred-site.xml file

5. hdfs-site.xml file

6. yarn-site.xml file
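Apart from .bashrc, which lives in your home directory, these files are in the etc/hadoop folder of the Hadoop installation. The sketch below uses a hypothetical helper, check_hadoop_conf, to report which of them are missing from a given conf directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which Hadoop config files are missing from a
# conf directory. Returns the number of missing files (0 means all present).
check_hadoop_conf() {
  local conf_dir="$1"
  local missing=0
  local f
  for f in hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
    if [ ! -f "$conf_dir/$f" ]; then
      echo "missing: $conf_dir/$f" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Usage: check_hadoop_conf /usr/local/hadoop/etc/hadoop
# On a fresh Hadoop 2.7.3 tree, mapred-site.xml exists only as
# mapred-site.xml.template, so it is reported until you create it.
```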

Step 3: Configure the .bashrc file

Add the following lines to the end of ~/.bashrc:

# Hadoop was moved to /usr/local/hadoop in Step 2
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Reload .bashrc so the changes take effect in the current shell:

$ source ~/.bashrc
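To confirm that the new variables took effect, the sketch below uses a hypothetical helper, check_hadoop_env. It checks that HADOOP_HOME points to a directory and that its bin and sbin folders are on PATH. If the check passes, `hadoop version` should print the installed version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the environment set up in ~/.bashrc.
check_hadoop_env() {
  if [ -z "${HADOOP_HOME:-}" ] || [ ! -d "$HADOOP_HOME" ]; then
    echo "HADOOP_HOME is unset or not a directory" >&2
    return 1
  fi
  local dir
  for dir in "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"; do
    # Wrap PATH in colons so the match works for the first and last entries too.
    case ":$PATH:" in
      *":$dir:"*) ;;
      *) echo "$dir is not on PATH" >&2; return 1 ;;
    esac
  done
  return 0
}

# Usage: source ~/.bashrc && check_hadoop_env && hadoop version
```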

Step 4: Configure the hadoop-env.sh file (sets JAVA_HOME for Hadoop)

Command:

$ vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh

In this file, set JAVA_HOME to the Java directory found in Step 3. The path below matches the example output shown there:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle
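Instead of editing hadoop-env.sh in vim, you can rewrite the JAVA_HOME line non-interactively. The sketch below uses a hypothetical helper, set_java_home. It assumes the file already contains a line starting with `export JAVA_HOME=`; the stock 2.7.3 file has `export JAVA_HOME=${JAVA_HOME}`. If no such line exists, the helper appends one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set JAVA_HOME in a hadoop-env.sh file without an editor.
# Assumes the Java path contains no "|" or "&" (they are special in the sed replacement).
set_java_home() {
  local env_file="$1"
  local java_home="$2"
  local tmp
  tmp="$(mktemp)" || return 1
  if grep -q '^export JAVA_HOME=' "$env_file"; then
    # Replace every existing "export JAVA_HOME=..." line.
    sed "s|^export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$env_file" > "$tmp"
  else
    # No such line yet: keep the file as-is and append one.
    cat "$env_file" > "$tmp"
    printf 'export JAVA_HOME=%s\n' "$java_home" >> "$tmp"
  fi
  # Write back through cat so the original file keeps its permissions.
  cat "$tmp" > "$env_file"
  rm -f "$tmp"
}

# Usage:
# set_java_home /usr/local/hadoop/etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-8-oracle
```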

Step 5: Modify the core-site.xml file

Command:

$ vim /usr/local/hadoop/etc/hadoop/core-site.xml

Configure the core-site.xml file:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
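Each *-site.xml file in this guide holds a single property, so the files can also be generated from the shell. The sketch below uses a hypothetical helper, write_site_xml. It overwrites the whole file, including the license comment in the shipped version. It also assumes the name and value contain no XML special characters (<, >, &).

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Hadoop *-site.xml file containing one property.
# Overwrites the target file; name/value must not contain <, > or &.
write_site_xml() {
  local file="$1"
  local name="$2"
  local value="$3"
  cat > "$file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>${name}</name>
    <value>${value}</value>
  </property>
</configuration>
EOF
}

# Usage (needs write access to the Hadoop conf directory):
# write_site_xml /usr/local/hadoop/etc/hadoop/core-site.xml fs.defaultFS hdfs://localhost:9000
```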

Step 6: Create the HDFS storage directories

HDFS runs two kinds of node, and each needs a local directory:

1. NameNode – stores the file system metadata.

2. DataNode – stores the actual data blocks.

Create the directories with the following commands:

$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode

$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode

$ sudo chown -R hduser:hadoop /usr/local/hadoop_store

Step 7: Modify the hdfs-site.xml file

Command:

$ vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

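Configure the hdfs-site.xml file. dfs.replication is set to 1 because a single-node setup has only one DataNode. The two directory properties point HDFS at the folders created in Step 6:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>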

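Step 8: Modify the mapred-site.xml file

Commands (Hadoop 2.7.3 ships this file only as mapred-site.xml.template, so copy the template before editing):

$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

$ vim /usr/local/hadoop/etc/hadoop/mapred-site.xml

Configure the mapred-site.xml file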

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Step 9: Modify the yarn-site.xml file

Command:

$ vim /usr/local/hadoop/etc/hadoop/yarn-site.xml

Configure the yarn-site.xml file

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
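To double-check what a site file actually contains, the sketch below uses a hypothetical helper, get_site_property, to print the value configured for a given property name. It is line-based rather than a real XML parser, and assumes each <name> and <value> sits on its own line, as in the files above. For keys in core-site.xml and hdfs-site.xml, Hadoop's own `hdfs getconf -confKey <key>` is the more reliable check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop *-site.xml file. Line-based sketch, not an XML parser: it assumes
# <name> and <value> each sit on their own line.
get_site_property() {
  local file="$1"
  local name="$2"
  awk -v want="$name" '
    /<name>/ {
      n = $0
      sub(/.*<name>[ \t]*/, "", n)       # drop everything up to <name>
      sub(/[ \t]*<\/name>.*/, "", n)     # drop </name> and anything after it
      found = (n == want)
      next
    }
    found && /<value>/ {
      v = $0
      sub(/.*<value>[ \t]*/, "", v)
      sub(/[ \t]*<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$file"
}

# Usage:
# get_site_property /usr/local/hadoop/etc/hadoop/yarn-site.xml yarn.nodemanager.aux-services
```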

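Step 10: Start Hadoop

Before starting Hadoop for the first time, format the NameNode. Do this only once, because formatting erases any existing HDFS metadata:

$ hdfs namenode -format

Then start the HDFS and YARN daemons:

$ start-all.sh

In Hadoop 2.x, start-all.sh is deprecated. Running start-dfs.sh followed by start-yarn.sh does the same job.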
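After the start scripts finish, the JDK's jps tool lists the running Java processes. On a single-node setup, the expected daemons are NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The sketch below uses a hypothetical helper, check_daemons, which reads jps output and reports which of these are missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output ("<pid> <ClassName>" per line) on
# stdin and report any expected single-node daemon that is not running.
check_daemons() {
  local running
  running="$(awk '{ print $2 }')"
  local missing=0
  local d
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -x: whole-line match, so "NameNode" does not match "SecondaryNameNode".
    if ! printf '%s\n' "$running" | grep -qx "$d"; then
      echo "not running: $d" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: jps | check_daemons && echo "all daemons are up"
```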

Step 11: Stop Hadoop

Command:

$ stop-all.sh