How to Import Data From MySql to Hadoop Using Sqoop

Sqoop is the basic data transfer tool and used to import/export data from Relational Database into hadoop. Sqoop is able to import Teradata and Other JDBC Databases. For Hadoop integration sqoop installation is most important so first install the sqoop on Hadoop.

How to Import Data From MySql to Hadoop Using Sqoop

Before accessing the MySql you have to make two changes in MySql db

1. Enable remote access for database:

Step 1: Open the file

vim /etc/mysql/my.cnf

Step 2: Change Bind address localhost to IP address

bind-address = <IP Address>

Step 3: Restart MySql

/etc/init.d/mysqld restart

2. Create user for all nodes:

Step 1: Connect with root password

mysql -uroot -p<root_pass>

Step 2: Create users

create user  ‘hadoop’@'<ip of master>’ IDENTIFIED BY ‘hadoop’;

(First ‘hadoop’ is username and second ‘hadoop’ is password)

Sqoop Installation:

Step 1: Sqoop install from Apche sqoop Link – http://www.apache.org/dyn/closer.cgi/sqoop/

Step 2: Set the Hadoop_Home

Step 3: Start the Sqoop

$SQOOP_HOME/bin/sqoop

Sqoop Import:

Sqoop import requires connection string (username and password) for accessing the mysql data. In Sqoop import command line are used to accessing the database value.

Step 1: Connect Database

connect jdbc:mysql://<ip address of mysql server> :<port of mysql>/<database_name>

Step 2: Import Data using Database username and password

–username hadoop –password hadoop

Step 3: Import MySql Table

sqoop import-all-tables –connect jdbc:mysql:

Table name is which table is imported. In sqoop you can access MySql queries also.

Step 4: Split the table data

–split-by <field to split the data> -m 8

Split by option is used to import the data in parallel. Split by is splits the data for all mapper and in this code m8 is the number of mappers. Sqoop calculate the MIN and MAX value field and create the mappers for this values.

Above process are runs correctly you can integrate the database successfully..