What is Sqoop?
Sqoop is a tool used to transfer data between relational databases (RDBMS) and HDFS. It imports data from relational datastores into HDFS and exports data from HDFS back to them. It uses MapReduce to carry out the transfers, which lets it process large amounts of data in parallel. Sqoop is designed to work with relational databases, and it is an open source tool originally developed at Cloudera.
Main Functions of Sqoop:
- Import one or more selected tables.
- Import a complete database into Hadoop.
- Filter out selected columns and rows from any table.
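As a rough illustration of importing a complete database, Sqoop ships an import-all-tables tool. The sketch below assumes a hypothetical MySQL database named shop on db.example.com and a hypothetical user sqoop_user. The run helper is not part of Sqoop; it is only there so that the command is printed by default instead of executed.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (requires Sqoop).
run() { echo "+ $*"; [ "${DRY_RUN:-1}" != "0" ] || "$@"; }

# Hypothetical connection details, replace with your own.
JDBC_URL="jdbc:mysql://db.example.com:3306/shop"
DB_USER="sqoop_user"

# Import every table of the database; each table gets its own
# subdirectory under --warehouse-dir. -P prompts for the password.
run sqoop import-all-tables \
  --connect "$JDBC_URL" \
  --username "$DB_USER" -P \
  --warehouse-dir /data/shop
```

Running the script with DRY_RUN=0 would start the actual import.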
Workflow of Sqoop:
Sqoop Import – It imports individual tables from an RDBMS into HDFS. Each row of a table is treated as one record in Sqoop, and the records are stored as text files or SequenceFiles.
Sqoop Export – It exports files from HDFS to an RDBMS. The files are read as records, and each record becomes a row in the target table.
Some Sqoop Import Operations:
1. General Syntax:
$ sqoop import (generic args) (import args)
$ sqoop-import (generic args) (import args)
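2. How to import a table to HDFS
$ sqoop import --connect <jdbc-uri> --table <table-name> --username <user> --password <password> --target-dir <dir>
--connect – the JDBC connection string of the source database
--table – the name of the source table
--target-dir – the HDFS directory to import into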
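A filled-in version of this command might look like the sketch below. The host db.example.com, the database shop, the table customers, the user sqoop_user and the HDFS path are all made-up values, and the run helper is a hypothetical dry-run wrapper rather than anything provided by Sqoop. It uses -P, which makes Sqoop prompt for the password instead of taking it on the command line.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (requires Sqoop).
run() { echo "+ $*"; [ "${DRY_RUN:-1}" != "0" ] || "$@"; }

# Hypothetical connection details, replace with your own.
JDBC_URL="jdbc:mysql://db.example.com:3306/shop"
DB_USER="sqoop_user"

# Import the "customers" table into the given HDFS directory.
run sqoop import \
  --connect "$JDBC_URL" \
  --table customers \
  --username "$DB_USER" -P \
  --target-dir /data/shop/customers
```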
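3. Importing Selected Data
$ sqoop import --connect <jdbc-uri> --table <table-name> --username <user> --password <password> --columns <col1,col2,...> --where <condition>
--columns – selects a subset of the table's columns
--where – a SQL WHERE condition that selects which rows to import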
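To make the two options concrete, here is a minimal sketch that imports three columns of customers for one country only. The column names, the condition, the connection details and the target directory are assumptions made up for the example, and run is a hypothetical dry-run helper. Note that the helper prints the arguments without their shell quotes.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (requires Sqoop).
run() { echo "+ $*"; [ "${DRY_RUN:-1}" != "0" ] || "$@"; }

# Hypothetical connection details, replace with your own.
JDBC_URL="jdbc:mysql://db.example.com:3306/shop"
DB_USER="sqoop_user"

# Import only the id, name and country columns of rows where country is US.
run sqoop import \
  --connect "$JDBC_URL" \
  --table customers \
  --username "$DB_USER" -P \
  --columns "id,name,country" \
  --where "country = 'US'" \
  --target-dir /data/shop/customers_us
```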
Sqoop Export and Utility Operations:
1. General Syntax:
$ sqoop export (generic args) (export args)
$ sqoop-export (generic args) (export args)
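An export needs at least a connection, a target table and the HDFS directory to read from, which is given with --export-dir. The sketch below assumes a hypothetical table customers_summary that already exists in the database (Sqoop does not create it) and a made-up HDFS path. The run helper is again a hypothetical dry-run wrapper.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (requires Sqoop).
run() { echo "+ $*"; [ "${DRY_RUN:-1}" != "0" ] || "$@"; }

# Hypothetical connection details, replace with your own.
JDBC_URL="jdbc:mysql://db.example.com:3306/shop"
DB_USER="sqoop_user"

# Export the files under --export-dir into the existing target table.
run sqoop export \
  --connect "$JDBC_URL" \
  --table customers_summary \
  --username "$DB_USER" -P \
  --export-dir /data/shop/customers_summary
```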
2. Sqoop Eval – Used to quickly run a SQL query against the database and print the results
$ sqoop eval --connect <jdbc-uri> --query "SQL query"
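For example, eval can be handy for previewing a table before importing it. The query, table name and connection details below are assumptions for illustration only, and run is a hypothetical dry-run helper that prints the command without its shell quotes.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (requires Sqoop).
run() { echo "+ $*"; [ "${DRY_RUN:-1}" != "0" ] || "$@"; }

# Hypothetical connection details, replace with your own.
JDBC_URL="jdbc:mysql://db.example.com:3306/shop"
DB_USER="sqoop_user"

# Run a quick query and print the result to the console.
run sqoop eval \
  --connect "$JDBC_URL" \
  --username "$DB_USER" -P \
  --query "SELECT id, name FROM customers LIMIT 5"
```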
3. Sqoop List Databases – Lists all databases available on the server
$ sqoop list-databases --connect <jdbc-uri>
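A filled-in call could look like the following sketch. Here the JDBC URL points at a hypothetical MySQL server rather than at one specific database, and the user name is made up. The run helper is a hypothetical dry-run wrapper, not part of Sqoop.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (requires Sqoop).
run() { echo "+ $*"; [ "${DRY_RUN:-1}" != "0" ] || "$@"; }

# Hypothetical server address and user, replace with your own.
JDBC_URL="jdbc:mysql://db.example.com:3306/"
DB_USER="sqoop_user"

# List the databases visible to this user on the server.
run sqoop list-databases \
  --connect "$JDBC_URL" \
  --username "$DB_USER" -P
```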