Apache Sqoop Tutorial – Basic Sqoop Import and Export Operations

What is Sqoop?

Sqoop is a tool for transferring data between relational databases (RDBMS) and HDFS. It imports data from external datastores into HDFS and exports data from HDFS back to them. Sqoop uses MapReduce to run the transfer in parallel, which lets it move large amounts of data. Sqoop works only with relational databases, and it is an open-source tool originally developed by Cloudera.

Main Functions of Sqoop:

  • Import a single table or a selected set of tables.
  • Import a complete database (all of its tables).
  • Select specific columns and rows from any table.

Workflow of Sqoop:

Sqoop Import – Imports an individual table from an RDBMS into HDFS. Each row of the table is treated as one record, and the records are stored as text files or SequenceFiles.

Sqoop Export – Exports a set of files from HDFS to an RDBMS. The files are read and parsed into records, and each record becomes a row in the target table.

Some Sqoop Import Operations:

1. General Syntax:

$ sqoop import (generic args) (import args)

$ sqoop-import (generic args) (import args)

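2. How to import a table into HDFS

$ sqoop import --connect <jdbc-uri> --table <table-name> --username <user> --password <password> --target-dir <hdfs-dir>

connect – the JDBC connection string of the source database

table – the name of the source table

target-dir – the HDFS directory to import into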
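
As an illustration, the import command could be filled in roughly as follows. The host dbhost, database salesdb, user sqoop_user, table customers, and target directory are all hypothetical. The sketch uses -P, which makes Sqoop prompt for the password instead of taking it on the command line, and it only prints the assembled command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# All names below are hypothetical placeholders; replace them with your own.
import_cmd=(
  sqoop import
  --connect "jdbc:mysql://dbhost:3306/salesdb"
  --username sqoop_user
  -P
  --table customers
  --target-dir /user/hadoop/customers
)

# Print the command for review. To actually run it on a machine
# with Sqoop installed, use: "${import_cmd[@]}"
echo "${import_cmd[*]}"
```

Keeping the arguments in a bash array means each option and its value stay together as separate words, even when a value contains spaces.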

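3. Importing Selected Data

$ sqoop import --connect <jdbc-uri> --table <table-name> --username <user> --password <password> --columns <col1,col2,...> --where <condition>

columns – the subset of columns to import, as a comma-separated list

where – the condition that selects which rows to import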
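
A selective import might look like the sketch below. The table customers, the columns id, name, and city, and the filter value are assumptions made up for illustration. The point of interest is the quoting: the --where condition contains spaces and single quotes, so it has to reach Sqoop as one argument.

```shell
#!/usr/bin/env bash
# Hypothetical table, columns, and filter; adjust to your schema.
select_cmd=(
  sqoop import
  --connect "jdbc:mysql://dbhost:3306/salesdb"
  --username sqoop_user
  -P
  --table customers
  --columns "id,name,city"
  --where "city = 'Pune'"
)

# %q prints each argument with shell quoting, which makes it visible
# that the --where condition is passed as a single argument.
# To run the import itself, use: "${select_cmd[@]}"
printf '%q ' "${select_cmd[@]}"
echo
```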

Sqoop Export and Utility Operations:

1. General Syntax:

$ sqoop export (generic args) (export args)

$ sqoop-export (generic args) (export args)

2. Sqoop Eval – runs a SQL query against the database and prints the result, which is useful for quick checks

$ sqoop eval --connect <jdbc-uri> --query "SQL query"
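
For example, eval can be used to check a row count before starting an import. The connection details and the table customers in this sketch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical connection details and table name.
eval_cmd=(
  sqoop eval
  --connect "jdbc:mysql://dbhost:3306/salesdb"
  --username sqoop_user
  -P
  --query "SELECT COUNT(*) FROM customers"
)

# Print the command with shell quoting; run it with: "${eval_cmd[@]}"
printf '%q ' "${eval_cmd[@]}"
echo
```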

3. Sqoop List Databases – lists all databases available on the server

$ sqoop list-databases --connect <jdbc-uri>
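
Most servers also require credentials for this command. A sketch with a hypothetical MySQL host and user, where the connection string names the server rather than one particular database:

```shell
#!/usr/bin/env bash
# Hypothetical server and user.
list_cmd=(
  sqoop list-databases
  --connect "jdbc:mysql://dbhost:3306/"
  --username sqoop_user
  -P
)

# Print the command for review; run it with: "${list_cmd[@]}"
echo "${list_cmd[*]}"
```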