Sqoop: Transfer data from RDBMS to HDFS (Big Data Series) (Part-1)

Datetime: 2017-04-05 05:14:58 · Topic: HDFS, MySQL

In this post, I will be talking about Sqoop, a simple yet powerful tool that transfers bulk data between Hadoop (HDFS) and traditional relational databases (RDBMS). Sqoop is a top-level Apache project, and as of writing this tutorial the current version is 1.4.6.

For the hands-on exercises, you’ll need a Cloudera or Hortonworks based instance on your desktop, if your machine has a good enough configuration. You can also set up a Hortonworks HDP trial on Azure for free ($200 in credits for a 1-month free trial).

I will be talking about the following in this tutorial:

  • Basic syntax
  • Showing tables in a MySQL database
  • Importing databases/tables from a MySQL database
  • Importing tables in text file, sequence file, and Avro data file formats
  • Importing tables using a “where” clause to import only certain rows
  • Exporting a table from HDFS to MySQL

Step 0: Getting Started

The documentation is good and you should definitely consider going through it once.

Let’s log into MySQL and check out the databases and some tables. The username and password are both “training”.

mysql -u training -p
Password: training
mysql> show databases;
mysql> use training;
mysql> show tables;
+--------------------+
| Tables_in_training |
+--------------------+
| Movies             |
| bible_freq         |
| cityByCountry      |
| countries          |
| shake_freq         |
+--------------------+
5 rows in set (0.00 sec)

mysql> select count(*) from Movies;
mysql> select * from Movies limit 1;
mysql> select * from shake_freq limit 1;
+-------+------+
| freq  | word |
+-------+------+
| 25848 | the  |
+-------+------+
1 row in set (0.02 sec)

mysql> desc shake_freq;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| freq  | int(11)      | YES  |     | NULL    |       |
| word  | varchar(256) | NO   | PRI | NULL    |       |
+-------+--------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

Finding the machine name

[training@localhost ~]$ hostname -f
localhost.localdomain

Checking out the contents of HDFS: To list the contents of an HDFS folder, we use the hadoop fs command followed by a sub-command for the operation we need. Here the -ls sub-command shows the folders inside the root directory of HDFS, which is addressed as “/”. If you omit the path and simply give a space, the command returns nothing. Also, we can tell a folder from a file by the “d” at the start of each line of the output.

[training@localhost ~]$ hadoop fs -ls /
Found 3 items
drwxrwxrwt   - hdfs supergroup          0 2013-09-10 10:35 /tmp
drwxrwxrwx   - hue  supergroup          0 2013-09-10 10:37 /user
drwxr-xr-x   - hdfs supergroup          0 2013-09-05 20:08 /var
[training@localhost ~]$ hadoop fs -ls /user
Found 3 items
drwxr-xr-x   - hue      supergroup          0 2013-09-05 20:08 /user/hive
drwxr-xr-x   - hue      hue                 0 2013-09-10 10:37 /user/hue
drwxr-xr-x   - training supergroup          0 2017-01-20 13:10 /user/training
[training@localhost ~]$ hadoop fs -ls /user/training
Found 1 items
-rw-r--r--   1 training supergroup  211312924 2017-01-20 13:10 /user/training/purchases.txt
[training@localhost ~]$

Step 1: Check the Sqoop version

[training@localhost ~]$ sqoop version
Sqoop 1.4.2-cdh4.2.1
git commit id a0862dc56393e3096c65d6ca26766c67f5ab5309
Compiled by jenkins on Mon Apr 22 11:36:27 PDT 2013

Step 2: See all the commands that come with Sqoop

The majority of them are self-explanatory; the important ones are import, import-all-tables, export, and eval.

[training@localhost ~]$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
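
To drill into a single command, pass its name to help, for example:

[training@localhost ~]$ sqoop help import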

Step 3: Sqoop list-databases command

For this exercise, I’ve used a Cloudera 4.2.1 training VM on my local machine. You can download and install the VM and follow the code below on your own instance. I will be connecting to MySQL to see the databases that are on the system. Before proceeding you need to know the following:

  • machine name – this can be found using “hostname” on a Linux terminal
  • the MySQL port – MySQL listens on port 3306 by default; confirm this before running (see the quick check after this list)
  • username and password for MySQL – these might change depending on your setup
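
A quick way to confirm the machine name and the port MySQL is actually listening on (a minimal check; the credentials here match this VM and may differ on yours):

[training@localhost ~]$ hostname -f
localhost.localdomain
[training@localhost ~]$ mysql -u training -ptraining -e "show variables like 'port';"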

This is done with the list-databases command, as seen below; it lists the MySQL databases, which here are information_schema, hue, metastore, mysql, test, and training. All it needs is the following:

  • connection string
  • username and password of the mysql
[training@localhost ~]$ sqoop list-databases \
> --connect "jdbc:mysql://localhost.localdomain:3306" \
> --username training \
> --password training
17/01/30 21:54:39 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/01/30 21:54:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
hue
metastore
mysql
test
training
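
As the warning above notes, putting the password on the command line is insecure. A sketch of the safer variant using -P, which makes Sqoop prompt for the password interactively instead:

[training@localhost ~]$ sqoop list-databases \
> --connect "jdbc:mysql://localhost.localdomain:3306" \
> --username training \
> -P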

Step 4: list-tables command

The next command lists all the tables in a database without actually logging into it. This is done using the list-tables command, which requires the following:

  • connection string
  • username and password of mysql
  • database name for which tables need to be fetched

As you can see, the tables in the training database are Movies, bible_freq, cityByCountry, countries, and shake_freq.

[training@localhost ~]$ sqoop list-tables --connect "jdbc:mysql://localhost.localdomain:3306/training" \
> --username training --password training
17/01/30 21:55:46 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/01/30 21:55:46 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Movies
bible_freq
cityByCountry
countries
shake_freq

Step 5: eval command

The eval command lets us evaluate a SQL query through Sqoop without actually logging into MySQL. This requires the following:

  • connection string
  • username and password of mysql
  • query that should be executed on a particular table
  • database name that contains the table on which the query is executed

Here we fetch the first 10 records from the countries table; the output can be seen below.

[training@localhost ~]$ sqoop eval --connect "jdbc:mysql://localhost.localdomain:3306/training" \
> --username training --password training --query "select * from countries limit 10;"
17/01/31 01:33:43 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/01/31 01:33:43 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
---------------------------------------------
| id          | name                 | code |
---------------------------------------------
| 1           | AFGHANISTAN          | AF   |
| 2           | ALBANIA              | AL   |
| 3           | ALGERIA              | DZ   |
| 4           | AMERICAN SAMOA       | AS   |
| 5           | ANDORRA              | AD   |
| 6           | ANGOLA               | AO   |
| 7           | ANGUILLA             | AI   |
| 8           | ANTARCTICA           | AQ   |
| 9           | ANTIGUA AND BARBUDA  | AG   |
| 10          | ARGENTINA            | AR   |
---------------------------------------------
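
eval is also handy for quick sanity checks before a heavier import, e.g. counting the rows you are about to pull (same connection options as above):

[training@localhost ~]$ sqoop eval --connect "jdbc:mysql://localhost.localdomain:3306/training" \
> --username training --password training --query "select count(*) from countries"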

Step 6: import-all-tables command

The import-all-tables command, as the name suggests, imports all the tables in a given database. It takes the following parameters:

  • connection string
  • username and password of mysql
  • target-dir – the location to which the files should be imported. This directory should not already exist on HDFS, else the job will fail; Sqoop will create it while importing.
  • num-mappers (or -m) – the number of parallel map tasks used to import the data. The default is 4, and depending on your data size this can be increased.

Note that this command can take a good amount of time, depending on the size of the data being imported and the number of mappers. I have purposely not pasted the output, as it was too long.

[training@localhost ~]$ sqoop import-all-tables \
> --connect "jdbc:mysql://localhost.localdomain:3306/training" \
> --username training --password training \
> --target-dir /user/training/sqoop_import --num-mappers 6
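
Because the job fails when the target directory already exists, a defensive pre-check before re-running is worthwhile (a minimal sketch; only delete the directory if the old copy is disposable):

[training@localhost ~]$ hadoop fs -test -d /user/training/sqoop_import && echo "target directory already exists"
[training@localhost ~]$ hadoop fs -rm -r /user/training/sqoop_import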

Step 7: Import a specific table

Oftentimes, the requirement is for a few specific tables and not all. This is achieved using the import command, which requires the following parameters:

  • connection string
  • username and password
  • table to be imported – it should exist in the MySQL database under the same name
  • database name from which the table should be imported
  • target-dir – the location to which the table will be imported on HDFS
  • num-mappers
[training@localhost ~]$ sqoop import --connect "jdbc:mysql://localhost.localdomain:3306/training" \
> --username training --password training --table shake_freq \
> --target-dir /user/training/sqoop_shake_freq_import --num-mappers 6
17/01/30 21:28:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/01/30 21:28:26 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/01/30 21:28:26 INFO tool.CodeGenTool: Beginning code generation
17/01/30 21:28:27 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `shake_freq` AS t LIMIT 1
17/01/30 21:28:27 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `shake_freq` AS t LIMIT 1
17/01/30 21:28:27 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop
Note: /tmp/sqoop-training/compile/bcbf38024ca522d9874110e2b55cdd49/shake_freq.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/01/30 21:28:37 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-training/compile/bcbf38024ca522d9874110e2b55cdd49/shake_freq.jar
17/01/30 21:28:37 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/01/30 21:28:37 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/01/30 21:28:37 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/01/30 21:28:37 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/01/30 21:28:37 INFO mapreduce.ImportJobBase: Beginning import of shake_freq
17/01/30 21:28:41 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/01/30 21:28:44 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`word`), MAX(`word`) FROM `shake_freq`
17/01/30 21:28:44 WARN db.TextSplitter: Generating splits for a textual index column.
17/01/30 21:28:44 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
17/01/30 21:28:44 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
17/01/30 21:28:45 INFO mapred.JobClient: Running job: job_201701302054_0001
17/01/30 21:28:46 INFO mapred.JobClient:  map 0% reduce 0%
17/01/30 21:29:15 INFO mapred.JobClient:  map 12% reduce 0%
17/01/30 21:29:16 INFO mapred.JobClient:  map 25% reduce 0%
17/01/30 21:29:31 INFO mapred.JobClient:  map 50% reduce 0%
17/01/30 21:29:46 INFO mapred.JobClient:  map 62% reduce 0%
17/01/30 21:29:47 INFO mapred.JobClient:  map 75% reduce 0%
17/01/30 21:30:06 INFO mapred.JobClient:  map 87% reduce 0%
17/01/30 21:30:08 INFO mapred.JobClient:  map 100% reduce 0%
17/01/30 21:30:15 INFO mapred.JobClient: Job complete: job_201701302054_0001
17/01/30 21:30:15 INFO mapred.JobClient: Counters: 23
17/01/30 21:30:15 INFO mapred.JobClient:   File System Counters
17/01/30 21:30:15 INFO mapred.JobClient:     FILE: Number of bytes read=0
17/01/30 21:30:15 INFO mapred.JobClient:     FILE: Number of bytes written=1593792
17/01/30 21:30:15 INFO mapred.JobClient:     FILE: Number of read operations=0
17/01/30 21:30:15 INFO mapred.JobClient:     FILE: Number of large read operations=0
17/01/30 21:30:15 INFO mapred.JobClient:     FILE: Number of write operations=0
17/01/30 21:30:15 INFO mapred.JobClient:     HDFS: Number of bytes read=1075
17/01/30 21:30:15 INFO mapred.JobClient:     HDFS: Number of bytes written=324840
17/01/30 21:30:15 INFO mapred.JobClient:     HDFS: Number of read operations=8
17/01/30 21:30:15 INFO mapred.JobClient:     HDFS: Number of large read operations=0
17/01/30 21:30:15 INFO mapred.JobClient:     HDFS: Number of write operations=8
17/01/30 21:30:15 INFO mapred.JobClient:   Job Counters
17/01/30 21:30:15 INFO mapred.JobClient:     Launched map tasks=8
17/01/30 21:30:15 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=152210
17/01/30 21:30:15 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
17/01/30 21:30:15 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
17/01/30 21:30:15 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
17/01/30 21:30:15 INFO mapred.JobClient:   Map-Reduce Framework
17/01/30 21:30:15 INFO mapred.JobClient:     Map input records=31809
17/01/30 21:30:15 INFO mapred.JobClient:     Map output records=31809
17/01/30 21:30:15 INFO mapred.JobClient:     Input split bytes=1075
17/01/30 21:30:15 INFO mapred.JobClient:     Spilled Records=0
17/01/30 21:30:15 INFO mapred.JobClient:     CPU time spent (ms)=17950
17/01/30 21:30:15 INFO mapred.JobClient:     Physical memory (bytes) snapshot=396861440
17/01/30 21:30:15 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3104276480
17/01/30 21:30:15 INFO mapred.JobClient:     Total committed heap usage (bytes)=130023424
17/01/30 21:30:15 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 97.074 seconds (0 bytes/sec)
17/01/30 21:30:15 INFO mapreduce.ImportJobBase: Retrieved 31809 records.
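
Two of the warnings above deserve attention. The TextSplitter warnings appear because shake_freq’s primary key (word) is textual; pointing Sqoop at an integral column with --split-by avoids them. The MySQLManager warning suggests --direct for a MySQL-specific fast path (it relies on mysqldump being available on the nodes). A sketch of the split-column fix, writing to a fresh, hypothetical target directory:

[training@localhost ~]$ sqoop import --connect "jdbc:mysql://localhost.localdomain:3306/training" \
> --username training --password training --table shake_freq \
> --split-by freq \
> --target-dir /user/training/sqoop_shake_freq_import2 --num-mappers 6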

After running the import command, let’s check the output and confirm. First, let’s see the table entries in MySQL.

mysql> use training;
Database changed
mysql> select * from shake_freq limit 5;
+-------+------+
| freq  | word |
+-------+------+
| 25848 | the  |
| 23031 | I    |
| 19671 | and  |
| 18038 | to   |
| 16700 | of   |
+-------+------+
5 rows in set (0.00 sec)

Let’s confirm what was imported:

[training@localhost ~]$ hadoop fs -ls /user/training/sqoop_seqfile_import/
Found 6 items
-rw-r--r--   1 training supergroup          0 2017-01-31 12:38 /user/training/sqoop_seqfile_import/_SUCCESS
drwxr-xr-x   - training supergroup          0 2017-01-31 12:36 /user/training/sqoop_seqfile_import/_logs
-rw-r--r--   1 training supergroup      92252 2017-01-31 12:37 /user/training/sqoop_seqfile_import/part-m-00000
-rw-r--r--   1 training supergroup      92234 2017-01-31 12:37 /user/training/sqoop_seqfile_import/part-m-00001
-rw-r--r--   1 training supergroup      91443 2017-01-31 12:38 /user/training/sqoop_seqfile_import/part-m-00002
-rw-r--r--   1 training supergroup      91333 2017-01-31 12:38 /user/training/sqoop_seqfile_import/part-m-00003

We can look at the actual data using the hadoop fs -cat command with the appropriate path.

[training@localhost ~]$ hadoop fs -cat /user/training/sqoop_shake_freq_import/part-m-00000 | head -n 5
49,1
1,10
1,11th
1,12th
1,1s
[training@localhost ~]$ hadoop fs -cat /user/training/sqoop_shake_freq_import/part-m-00001 | head -n 5
2027,A
72,AARON
1,ABATE
1,ABATEMENT
9,ABERGAVENNY
cat: Unable to write to output stream.
[training@localhost ~]$ hadoop fs -cat /user/training/sqoop_shake_freq_import/part-m-00002 | head -n 5
89,IACHIMO
299,IAGO
1,ICE
14,IDEN
1,IGNOMY
cat: Unable to write to output stream.
[training@localhost ~]$ hadoop fs -cat /user/training/sqoop_shake_freq_import/part-m-00003 | head -n 5
94,ULYSSES
1,UMBERED
1,UNANELED
1,UNAVOIDED
1,UNBARBED
cat: Unable to write to output stream.
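
To cross-check the “Retrieved 31809 records” line from the import log against what actually landed on HDFS, count the lines across all part files; with the default text format (one record per line) this should print 31809:

[training@localhost ~]$ hadoop fs -cat /user/training/sqoop_shake_freq_import/part-m-* | wc -l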

Step 8: Import a table as a text file, sequence file, or Avro data file

Sqoop can import in different file formats, as described on the documentation page. Here we import a table as a text file. Each of these file formats has its own advantages and disadvantages in terms of speed and size. This requires the following parameters:

  • connection string
  • username and password
  • table name to be imported
  • target-dir
  • database name
  • as-textfile – instructs Sqoop to import the table as a text file
[training@localhost ~]$ sqoop import \
> --connect "jdbc:mysql://localhost.localdomain:3306/training" \
> --username training \
> --password training \
> --table Movies \
> --as-textfile \
> --target-dir /user/training/sqoop_file_import

Let’s take a look at what was imported.

[training@localhost ~]$ hadoop fs -ls /user/training/sqoop_file_import/
Found 6 items
-rw-r--r--   1 training supergroup          0 2017-01-31 12:32 /user/training/sqoop_file_import/_SUCCESS
drwxr-xr-x   - training supergroup          0 2017-01-31 12:29 /user/training/sqoop_file_import/_logs
-rw-r--r--   1 training supergroup      58800 2017-01-31 12:31 /user/training/sqoop_file_import/part-m-00000
-rw-r--r--   1 training supergroup      58986 2017-01-31 12:31 /user/training/sqoop_file_import/part-m-00001
-rw-r--r--   1 training supergroup      58458 2017-01-31 12:32 /user/training/sqoop_file_import/part-m-00002
-rw-r--r--   1 training supergroup      58430 2017-01-31 12:32 /user/training/sqoop_file_import/part-m-00003

Let’s import as a sequence file; only one parameter changes. Guess which one?

[training@localhost ~]$ sqoop import --connect "jdbc:mysql://localhost.localdomain:3306/training" \
> --username training --password training --table Movies --as-sequencefile \
> --target-dir /user/training/sqoop_seqfile_import

Let’s import as an Avro data file; again, only one parameter changes. Guess which one?

[training@localhost ~]$ sqoop import --connect "jdbc:mysql://localhost.localdomain:3306/training" \
> --username training --password training --table Movies --as-avrodatafile \
> --target-dir /user/training/sqoop_avrofile_import
[training@localhost ~]$ hadoop fs -ls /user/training/sqoop_avrofile_import/
Found 6 items
-rw-r--r--   1 training supergroup          0 2017-01-31 12:43 /user/training/sqoop_avrofile_import/_SUCCESS
drwxr-xr-x   - training supergroup          0 2017-01-31 12:41 /user/training/sqoop_avrofile_import/_logs
-rw-r--r--   1 training supergroup      61900 2017-01-31 12:42 /user/training/sqoop_avrofile_import/part-m-00000.avro
-rw-r--r--   1 training supergroup      62044 2017-01-31 12:42 /user/training/sqoop_avrofile_import/part-m-00001.avro
-rw-r--r--   1 training supergroup      61222 2017-01-31 12:42 /user/training/sqoop_avrofile_import/part-m-00002.avro
-rw-r--r--   1 training supergroup      61021 2017-01-31 12:42 /user/training/sqoop_avrofile_import/part-m-00003.avro
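
Unlike the text import, these .avro part files are binary, so hadoop fs -cat prints gibberish. A sketch of inspecting them with Avro’s avro-tools jar (the jar file name and version here are assumptions; use the one matching your Avro release):

[training@localhost ~]$ hadoop fs -get /user/training/sqoop_avrofile_import/part-m-00000.avro .
[training@localhost ~]$ java -jar avro-tools-1.7.4.jar tojson part-m-00000.avro | head -n 5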

Take a look at the documentation for the full list of supported file formats.

Step 9: Import a table using a where clause

Sqoop import can also use a where clause to import only the part of a table that fulfills a criterion. For example, the command below imports the rows of the countries table whose name contains an “A”.

[training@localhost ~]$ sqoop import --connect "jdbc:mysql://localhost.localdomain:3306/training" \
> --username training --password training --where "name like '%A%'" --table countries -m 6 \
> --target-dir /user/training/sqoop_countries
17/01/31 13:35:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/01/31 13:35:36 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/01/31 13:35:36 INFO tool.CodeGenTool: Beginning code generation
17/01/31 13:35:37 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `countries` AS t LIMIT 1
17/01/31 13:35:37 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `countries` AS t LIMIT 1
17/01/31 13:35:37 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce
17/01/31 13:35:37 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-0.20-mapreduce/hadoop-core.jar
Note: /tmp/sqoop-training/compile/f7fb45e4e1b334b9296486aca1a1522c/countries.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/01/31 13:35:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-training/compile/f7fb45e4e1b334b9296486aca1a1522c/countries.jar
17/01/31 13:35:52 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/01/31 13:35:52 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/01/31 13:35:52 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/01/31 13:35:52 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/01/31 13:35:52 INFO mapreduce.ImportJobBase: Beginning import of countries
17/01/31 13:36:01 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/01/31 13:36:06 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `countries` WHERE ( name like '%A%' )
17/01/31 13:36:07 INFO mapred.JobClient: Running job: job_201701311056_0004
17/01/31 13:36:08 INFO mapred.JobClient:  map 0% reduce 0%
17/01/31 13:37:20 INFO mapred.JobClient:  map 16% reduce 0%
17/01/31 13:37:29 INFO mapred.JobClient:  map 33% reduce 0%
17/01/31 13:38:23 INFO mapred.JobClient:  map 50% reduce 0%
17/01/31 13:38:24 INFO mapred.JobClient:  map 66% reduce 0%
17/01/31 13:39:17 INFO mapred.JobClient:  map 83% reduce 0%
17/01/31 13:39:18 INFO mapred.JobClient:  map 100% reduce 0%
17/01/31 13:39:39 INFO mapred.JobClient: Job complete: job_201701311056_0004
17/01/31 13:39:40 INFO mapred.JobClient: Counters: 23
17/01/31 13:39:40 INFO mapred.JobClient:   File System Counters
17/01/31 13:39:40 INFO mapred.JobClient:     FILE: Number of bytes read=0
17/01/31 13:39:40 INFO mapred.JobClient:     FILE: Number of bytes written=1261446
17/01/31 13:39:40 INFO mapred.JobClient:     FILE: Number of read operations=0
17/01/31 13:39:40 INFO mapred.JobClient:     FILE: Number of large read operations=0
17/01/31 13:39:40 INFO mapred.JobClient:     FILE: Number of write operations=0
17/01/31 13:39:40 INFO mapred.JobClient:     HDFS: Number of bytes read=608
17/01/31 13:39:40 INFO mapred.JobClient:     HDFS: Number of bytes written=4028
17/01/31 13:39:40 INFO mapred.JobClient:     HDFS: Number of read operations=6
17/01/31 13:39:40 INFO mapred.JobClient:     HDFS: Number of large read operations=0
17/01/31 13:39:40 INFO mapred.JobClient:     HDFS: Number of write operations=6
17/01/31 13:39:40 INFO mapred.JobClient:   Job Counters
17/01/31 13:39:40 INFO mapred.JobClient:     Launched map tasks=6
17/01/31 13:39:40 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=374206
17/01/31 13:39:40 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
17/01/31 13:39:40 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
17/01/31 13:39:40 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
17/01/31 13:39:40 INFO mapred.JobClient:   Map-Reduce Framework
17/01/31 13:39:40 INFO mapred.JobClient:     Map input records=212
17/01/31 13:39:40 INFO mapred.JobClient:     Map output records=212
17/01/31 13:39:40 INFO mapred.JobClient:     Input split bytes=608
17/01/31 13:39:40 INFO mapred.JobClient:     Spilled Records=0
17/01/31 13:39:40 INFO mapred.JobClient:     CPU time spent (ms)=26160
17/01/31 13:39:40 INFO mapred.JobClient:     Physical memory (bytes) snapshot=560975872
17/01/31 13:39:40 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=4344078336
17/01/31 13:39:40 INFO mapred.JobClient:     Total committed heap usage (bytes)=95158272
17/01/31 13:39:40 INFO mapreduce.ImportJobBase: Transferred 3.9336 KB in 223.776 seconds (18.0001 bytes/sec)
17/01/31 13:39:40 INFO mapreduce.ImportJobBase: Retrieved 212 records.

As we can see, it retrieved 212 records, whereas the full countries table in MySQL has 249 rows.

mysql> select count(*) from countries limit 1;
+----------+
| count(*) |
+----------+
|      249 |
+----------+
1 row in set (0.02 sec)

Step 10: Export

Sqoop also gives us the option to export data from HDFS back to an RDBMS. It is used much like the import command. Make sure the target database and table already exist in MySQL before exporting, otherwise the job will fail with an error.
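
A minimal sketch of preparing that target in MySQL, assuming the training_new database does not exist yet and that cloning the source table’s definition is acceptable:

mysql> create database training_new;
mysql> create table training_new.countries like training.countries;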

sqoop export --connect "jdbc:mysql://localhost.localdomain:3306/training_new" \
  --username training --password training \
  --table countries \
  --export-dir /user/hive/warehouse/sqoop_countries \
  --input-fields-terminated-by ',' \
  --input-lines-terminated-by '\n' \
  --num-mappers 6 \
  --batch

For this tutorial, I will stop here and cover more details in another one. Till then, Happy Sqooping!







