
Let's install Sqoop (1.4.7)! MySQL 5 to Hadoop 3

케키키케 2020. 12. 14. 23:40

Download Sqoop

archive.apache.org/dist/sqoop/

$ tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
$ cp -r sqoop-1.4.7.bin__hadoop-2.6.0 /usr/local/sqoop
$ sudo chown -R <user>:<group> /usr/local/sqoop

 

Configure Sqoop environment variables

Open ~/.bashrc, add SQOOP_HOME, and append it to PATH.

$ cat ~/.bashrc | grep SQOOP
export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin
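
After editing, reload ~/.bashrc so the new variables take effect in the current shell (a quick check, not part of the original steps):

$ source ~/.bashrc
$ echo $SQOOP_HOME
// should print /usr/local/sqoop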

 

Add the MySQL Connector

$ sudo cp mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar $SQOOP_HOME/lib/
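
To confirm the connector jar landed where Sqoop can see it (a quick sanity check, not in the original post):

$ ls $SQOOP_HOME/lib | grep mysql-connector
// should list mysql-connector-java-5.1.45-bin.jar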

 

Configure Sqoop's sqoop-env.sh

$ cd $SQOOP_HOME/conf
$ cp sqoop-env-template.sh sqoop-env.sh
$ vi sqoop-env.sh
export HADOOP_COMMON_HOME=<Hadoop-home>
export HADOOP_MAPRED_HOME=<Hadoop-home>
export HIVE_HOME=<Hive-home>
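
For reference, a filled-in sqoop-env.sh could look like the following, assuming the /usr/local/hadoop path that appears in the logs later in this post; /usr/local/hive is only a hypothetical example path:

export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive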

 

Hadoop setup

Copy the Sqoop library into Hadoop, then restart Hadoop.

  $ cp $SQOOP_HOME/sqoop-1.4.7.jar $HADOOP_HOME/share/hadoop/tools/lib/
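
Restarting Hadoop can be done with the standard scripts under $HADOOP_HOME/sbin (shown here as a reminder; adjust to however you normally manage the cluster):

$ $HADOOP_HOME/sbin/stop-yarn.sh && $HADOOP_HOME/sbin/stop-dfs.sh
$ $HADOOP_HOME/sbin/start-dfs.sh && $HADOOP_HOME/sbin/start-yarn.sh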

 

Add a jar file (only if the error below occurs)

$ cp commons-lang-2.6/commons-lang-2.6.jar $SQOOP_HOME/lib/
$ rm $SQOOP_HOME/lib/commons-lang3-3.4.jar
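
After the swap, $SQOOP_HOME/lib should contain commons-lang-2.6.jar and no longer contain commons-lang3-3.4.jar (a quick check):

$ ls $SQOOP_HOME/lib | grep commons-lang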

Error message

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/lang/StringUtils
        at org.apache.sqoop.manager.MySQLManager.initOptionDefaults(MySQLManager.java:73)
        at org.apache.sqoop.manager.SqlManager.<init>(SqlManager.java:89)
        at com.cloudera.sqoop.manager.SqlManager.<init>(SqlManager.java:33)
        at org.apache.sqoop.manager.GenericJdbcManager.<init>(GenericJdbcManager.java:51)
        at com.cloudera.sqoop.manager.GenericJdbcManager.<init>(GenericJdbcManager.java:30)
        at org.apache.sqoop.manager.CatalogQueryManager.<init>(CatalogQueryManager.java:46)
        at com.cloudera.sqoop.manager.CatalogQueryManager.<init>(CatalogQueryManager.java:31)
        at org.apache.sqoop.manager.InformationSchemaManager.<init>(InformationSchemaManager.java:38)
        at com.cloudera.sqoop.manager.InformationSchemaManager.<init>(InformationSchemaManager.java:31)
        at org.apache.sqoop.manager.MySQLManager.<init>(MySQLManager.java:65)
        at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
        at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
        at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:272)
        at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:44)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.lang.StringUtils
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 20 more

 

Running Sqoop

Listing databases

sqoop list-databases --connect jdbc:mysql://<MySQL-server-address> --username <MySQL-user> -P

$ sqoop list-databases --connect jdbc:mysql://<MySQL-server-address> --username <MySQL-user> -P
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2366: HADOOP_ORG.APACHE.SQOOP.SQOOP_USER: bad substitution
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2461: HADOOP_ORG.APACHE.SQOOP.SQOOP_OPTS: bad substitution
2020-12-14 19:32:25,055 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
2020-12-14 19:32:27,249 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Mon Dec 14 19:32:27 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
information_schema
employees
mysql
performance_schema
sys

 

Listing tables

sqoop list-tables --connect jdbc:mysql://<MySQL-server-address>/<database> --username <MySQL-user> -P

$ sqoop list-tables --connect jdbc:mysql://<MySQL-server-address>/employees --username hadoopuser -P
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2366: HADOOP_ORG.APACHE.SQOOP.SQOOP_USER: bad substitution
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2461: HADOOP_ORG.APACHE.SQOOP.SQOOP_OPTS: bad substitution
2020-12-14 19:34:11,108 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
2020-12-14 19:34:12,958 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Mon Dec 14 19:34:13 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
current_dept_emp
departments
dept_emp
dept_emp_latest_date
dept_manager
employees
salaries
titles

 

 

MySQL to Hadoop

1) Importing a MySQL table

Let's import the data stored in MySQL into Hadoop using Sqoop!

 

 

The MySQL setup used here follows the configuration from the post below.

bachong.tistory.com/48

 


 

Let's import a MySQL table with Sqoop!

$ sqoop import --connect jdbc:mysql://<MySQL-server-address>/employees --table employees --target-dir /user/sqoop/employees_result --username hadoopuser -P

*ERROR! java.lang.ClassNotFoundException: Class employees not found

The employees table clearly exists in the MySQL database, yet Sqoop cannot find it. Retrying with other tables gives the same result: none of them can be found.

$ sqoop import --connect jdbc:mysql://<MySQL-server-address>/employees --table employees --target-dir /user/sqoop/employees_result --username hadoopuser -P
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2366: HADOOP_ORG.APACHE.SQOOP.SQOOP_USER: bad substitution
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2461: HADOOP_ORG.APACHE.SQOOP.SQOOP_OPTS: bad substitution
2020-12-14 20:10:33,755 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
2020-12-14 20:10:38,336 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2020-12-14 20:10:38,337 INFO tool.CodeGenTool: Beginning code generation
Mon Dec 14 20:10:38 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-12-14 20:10:38,577 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employees` AS t LIMIT 1
2020-12-14 20:10:38,589 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employees` AS t LIMIT 1
2020-12-14 20:10:38,593 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-hadoopuser/compile/a8dd0a30854dfe1cf60311bcc8c6d267/employees.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2020-12-14 20:10:39,465 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoopuser/compile/a8dd0a30854dfe1cf60311bcc8c6d267/employees.jar
2020-12-14 20:10:39,476 WARN manager.MySQLManager: It looks like you are importing from mysql.
2020-12-14 20:10:39,476 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
2020-12-14 20:10:39,476 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
2020-12-14 20:10:39,476 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
2020-12-14 20:10:39,479 INFO mapreduce.ImportJobBase: Beginning import of employees
2020-12-14 20:10:39,480 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2020-12-14 20:10:39,673 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2020-12-14 20:10:40,094 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2020-12-14 20:10:40,159 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2020-12-14 20:10:40,213 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2020-12-14 20:10:40,214 INFO impl.MetricsSystemImpl: JobTracker metrics system started
Mon Dec 14 20:10:40 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-12-14 20:10:40,349 INFO db.DBInputFormat: Using read commited transaction isolation
2020-12-14 20:10:40,430 INFO mapreduce.JobSubmitter: number of splits:1
2020-12-14 20:10:40,503 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local474282191_0001
2020-12-14 20:10:40,503 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-12-14 20:10:40,628 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-hadoopuser/mapred/local/job_local474282191_0001_bd91b32b-17a5-409b-9f2c-feb4206eff7a/libjars <- /usr/local/sqoop/bin/libjars/*
2020-12-14 20:10:40,630 WARN fs.FileUtil: Command 'ln -s /tmp/hadoop-hadoopuser/mapred/local/job_local474282191_0001_bd91b32b-17a5-409b-9f2c-feb4206eff7a/libjars /usr/local/sqoop/bin/libjars/*' failed 1 with: ln: failed to create symbolic link '/usr/local/sqoop/bin/libjars/*': No such file or directory

2020-12-14 20:10:40,631 WARN mapred.LocalDistributedCacheManager: Failed to create symlink: /tmp/hadoop-hadoopuser/mapred/local/job_local474282191_0001_bd91b32b-17a5-409b-9f2c-feb4206eff7a/libjars <- /usr/local/sqoop/bin/libjars/*
2020-12-14 20:10:40,631 INFO mapred.LocalDistributedCacheManager: Localized file:/tmp/hadoop/mapred/staging/hadoopuser474282191/.staging/job_local474282191_0001/libjars as file:/tmp/hadoop-hadoopuser/mapred/local/job_local474282191_0001_bd91b32b-17a5-409b-9f2c-feb4206eff7a/libjars
2020-12-14 20:10:40,662 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2020-12-14 20:10:40,663 INFO mapreduce.Job: Running job: job_local474282191_0001
2020-12-14 20:10:40,663 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2020-12-14 20:10:40,671 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2020-12-14 20:10:40,671 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2020-12-14 20:10:40,672 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2020-12-14 20:10:40,738 INFO mapred.LocalJobRunner: Waiting for map tasks
2020-12-14 20:10:40,739 INFO mapred.LocalJobRunner: Starting task: attempt_local474282191_0001_m_000000_0
2020-12-14 20:10:40,755 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2020-12-14 20:10:40,755 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2020-12-14 20:10:40,775 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
Mon Dec 14 20:10:40 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-12-14 20:10:40,781 INFO db.DBInputFormat: Using read commited transaction isolation
2020-12-14 20:10:40,784 INFO mapred.MapTask: Processing split: 1=1 AND 1=1
2020-12-14 20:10:40,788 INFO mapred.LocalJobRunner: map task executor complete.
2020-12-14 20:10:40,808 WARN mapred.LocalJobRunner: job_local474282191_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class employees not found
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:552)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class employees not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2638)
        at org.apache.sqoop.mapreduce.db.DBConfiguration.getInputClass(DBConfiguration.java:403)
        at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.createDBRecordReader(DataDrivenDBInputFormat.java:270)
        at org.apache.sqoop.mapreduce.db.DBInputFormat.createRecordReader(DBInputFormat.java:266)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:527)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: Class employees not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2542)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2636)
        ... 12 more
2020-12-14 20:10:41,667 INFO mapreduce.Job: Job job_local474282191_0001 running in uber mode : false
2020-12-14 20:10:41,668 INFO mapreduce.Job:  map 0% reduce 0%
2020-12-14 20:10:41,670 INFO mapreduce.Job: Job job_local474282191_0001 failed with state FAILED due to: NA
2020-12-14 20:10:41,675 INFO mapreduce.Job: Counters: 0
2020-12-14 20:10:41,678 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
2020-12-14 20:10:41,679 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 1.573 seconds (0 bytes/sec)
2020-12-14 20:10:41,680 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2020-12-14 20:10:41,680 INFO mapreduce.ImportJobBase: Retrieved 0 records.
2020-12-14 20:10:41,680 ERROR tool.ImportTool: Import failed: Import job failed!

 

If this table-not-found error occurs, run the command again with the --bindir option added.

Add the --bindir $SQOOP_HOME/lib option

$ sqoop import --connect jdbc:mysql://<MySQL-server-address>/<database> --table <table> --target-dir /user/sqoop/result --bindir $SQOOP_HOME/lib --username hadoopuser -P
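
One more thing to watch when re-running an import: the --target-dir must not already exist, or the MapReduce job fails with an "output directory already exists" error. Remove it first (or, in Sqoop 1.4.7, add the --delete-target-dir option), for example:

$ hadoop fs -rm -r /user/sqoop/result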

 

+ The employees database contains a table with the same name as the database (employees), and that one could never be found. All the other tables imported successfully!

$ sqoop import --connect jdbc:mysql://<MySQL-server-address>/<database> --table <table> --target-dir /user/sqoop/result --bindir $SQOOP_HOME/lib --username hadoopuser -P
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2366: HADOOP_ORG.APACHE.SQOOP.SQOOP_USER: bad substitution
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2461: HADOOP_ORG.APACHE.SQOOP.SQOOP_OPTS: bad substitution
2020-12-14 23:47:43,892 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
2020-12-14 23:47:46,602 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2020-12-14 23:47:46,602 INFO tool.CodeGenTool: Beginning code generation
Mon Dec 14 23:47:46 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-12-14 23:47:46,828 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
2020-12-14 23:47:46,839 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
2020-12-14 23:47:46,844 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /usr/local/sqoop/lib/departments.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2020-12-14 23:47:47,690 INFO orm.CompilationManager: Writing jar file: /usr/local/sqoop/lib/departments.jar
2020-12-14 23:47:47,699 WARN manager.MySQLManager: It looks like you are importing from mysql.
2020-12-14 23:47:47,699 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
2020-12-14 23:47:47,699 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
2020-12-14 23:47:47,700 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
2020-12-14 23:47:47,805 INFO mapreduce.ImportJobBase: Beginning import of departments
2020-12-14 23:47:47,806 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2020-12-14 23:47:47,909 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2020-12-14 23:47:48,368 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2020-12-14 23:47:48,433 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2020-12-14 23:47:48,492 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2020-12-14 23:47:48,494 INFO impl.MetricsSystemImpl: JobTracker metrics system started
Mon Dec 14 23:47:48 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-12-14 23:47:48,672 INFO db.DBInputFormat: Using read commited transaction isolation
2020-12-14 23:47:48,683 INFO mapreduce.JobSubmitter: number of splits:1
2020-12-14 23:47:48,752 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1320311531_0001
2020-12-14 23:47:48,752 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-12-14 23:47:48,877 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-hadoopuser/mapred/local/job_local1320311531_0001_4c1f58f4-0549-4eb9-92ed-5063f41288ad/libjars <- /home/hadoopuser/libjars/*
2020-12-14 23:47:48,884 WARN fs.FileUtil: Command 'ln -s /tmp/hadoop-hadoopuser/mapred/local/job_local1320311531_0001_4c1f58f4-0549-4eb9-92ed-5063f41288ad/libjars /home/hadoopuser/libjars/*' failed 1 with: ln: failed to create symbolic link '/home/hadoopuser/libjars/*': No such file or directory

2020-12-14 23:47:48,884 WARN mapred.LocalDistributedCacheManager: Failed to create symlink: /tmp/hadoop-hadoopuser/mapred/local/job_local1320311531_0001_4c1f58f4-0549-4eb9-92ed-5063f41288ad/libjars <- /home/hadoopuser/libjars/*
2020-12-14 23:47:48,885 INFO mapred.LocalDistributedCacheManager: Localized file:/tmp/hadoop/mapred/staging/hadoopuser1320311531/.staging/job_local1320311531_0001/libjars as file:/tmp/hadoop-hadoopuser/mapred/local/job_local1320311531_0001_4c1f58f4-0549-4eb9-92ed-5063f41288ad/libjars
2020-12-14 23:47:48,913 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2020-12-14 23:47:48,913 INFO mapreduce.Job: Running job: job_local1320311531_0001
2020-12-14 23:47:48,914 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2020-12-14 23:47:48,920 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2020-12-14 23:47:48,920 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2020-12-14 23:47:48,920 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2020-12-14 23:47:48,961 INFO mapred.LocalJobRunner: Waiting for map tasks
2020-12-14 23:47:48,961 INFO mapred.LocalJobRunner: Starting task: attempt_local1320311531_0001_m_000000_0
2020-12-14 23:47:48,976 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2020-12-14 23:47:48,976 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2020-12-14 23:47:48,988 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
Mon Dec 14 23:47:48 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-12-14 23:47:48,991 INFO db.DBInputFormat: Using read commited transaction isolation
2020-12-14 23:47:48,994 INFO mapred.MapTask: Processing split: 1=1 AND 1=1
2020-12-14 23:47:49,040 INFO db.DBRecordReader: Working on split: 1=1 AND 1=1
2020-12-14 23:47:49,041 INFO db.DBRecordReader: Executing query: SELECT `dept_no`, `dept_name` FROM `departments` AS `departments` WHERE ( 1=1 ) AND ( 1=1 )
2020-12-14 23:47:49,043 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
2020-12-14 23:47:49,045 INFO mapred.LocalJobRunner:
2020-12-14 23:47:49,073 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-12-14 23:47:49,158 INFO mapred.Task: Task:attempt_local1320311531_0001_m_000000_0 is done. And is in the process of committing
2020-12-14 23:47:49,161 INFO mapred.LocalJobRunner:
2020-12-14 23:47:49,161 INFO mapred.Task: Task attempt_local1320311531_0001_m_000000_0 is allowed to commit now
2020-12-14 23:47:49,177 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1320311531_0001_m_000000_0' to hdfs://hadoop1:9000/user/sqoop/result
2020-12-14 23:47:49,178 INFO mapred.LocalJobRunner: map
2020-12-14 23:47:49,178 INFO mapred.Task: Task 'attempt_local1320311531_0001_m_000000_0' done.
2020-12-14 23:47:49,183 INFO mapred.Task: Final Counters for attempt_local1320311531_0001_m_000000_0: Counters: 21
        File System Counters
                FILE: Number of bytes read=5163
                FILE: Number of bytes written=548322
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=0
                HDFS: Number of bytes written=153
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
                HDFS: Number of bytes read erasure-coded=0
        Map-Reduce Framework
                Map input records=9
                Map output records=9
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=0
                Total committed heap usage (bytes)=350224384
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=153
2020-12-14 23:47:49,185 INFO mapred.LocalJobRunner: Finishing task: attempt_local1320311531_0001_m_000000_0
2020-12-14 23:47:49,185 INFO mapred.LocalJobRunner: map task executor complete.
2020-12-14 23:47:49,917 INFO mapreduce.Job: Job job_local1320311531_0001 running in uber mode : false
2020-12-14 23:47:49,918 INFO mapreduce.Job:  map 100% reduce 0%
2020-12-14 23:47:49,920 INFO mapreduce.Job: Job job_local1320311531_0001 completed successfully
2020-12-14 23:47:49,924 INFO mapreduce.Job: Counters: 21
        File System Counters
                FILE: Number of bytes read=5163
                FILE: Number of bytes written=548322
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=0
                HDFS: Number of bytes written=153
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
                HDFS: Number of bytes read erasure-coded=0
        Map-Reduce Framework
                Map input records=9
                Map output records=9
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=0
                Total committed heap usage (bytes)=350224384
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=153
2020-12-14 23:47:49,927 INFO mapreduce.ImportJobBase: Transferred 153 bytes in 1.5536 seconds (98.481 bytes/sec)
2020-12-14 23:47:49,928 INFO mapreduce.ImportJobBase: Retrieved 9 records.

 

2) Check HDFS

Check that the results were saved to HDFS.

//The output directory specified at import time was created.
$ hadoop fs -ls /user/sqoop/result
Found 2 items
-rw-r--r--   2 hadoopuser supergroup          0 2020-12-14 23:47 /user/sqoop/result/_SUCCESS
-rw-r--r--   2 hadoopuser supergroup        153 2020-12-14 23:47 /user/sqoop/result/part-m-00000

//The result directory contains the following data.
$ hadoop fs -cat /user/sqoop/result/part-m-00000
2020-12-14 23:51:25,297 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
d009,Customer Service
d005,Development
d002,Finance
d003,Human Resources
d001,Marketing
d004,Production
d006,Quality Management
d008,Research
d007,Sales

 

Hadoop to MySQL (RDBMS)

1) Let's export the data stored in Hadoop to MySQL! (A table whose schema matches the data must already exist in the RDBMS.)

 

First, create a table in MySQL whose schema matches the data to be loaded.

$ mysql -h <host> -u hadoopuser -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 48
Server version: 5.7.32-0ubuntu0.18.04.1 (Ubuntu)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use employees;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
//Create a dept table with the same structure as departments, which was used in the import exercise.
mysql> create table dept like departments;
Query OK, 0 rows affected (0.03 sec)

mysql> desc dept;
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| dept_no   | char(4)     | NO   | PRI | NULL    |       |
| dept_name | varchar(40) | NO   | UNI | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql> desc departments;
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| dept_no   | char(4)     | NO   | PRI | NULL    |       |
| dept_name | varchar(40) | NO   | UNI | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql> select * from dept;
Empty set (0.00 sec)

 

Now let's export!

$ sqoop export --connect jdbc:mysql://<MySQL-server-address>/<database> --table <table> --export-dir /user/sqoop/result --bindir /usr/local/sqoop/lib --username <MySQL-user> -P
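
Sqoop export parses the HDFS files as comma-delimited by default, which matches what the import above wrote. If the data had been written with a different delimiter, you would tell export how to parse it with --input-fields-terminated-by, roughly like this (a sketch using the same placeholders):

$ sqoop export --connect jdbc:mysql://<MySQL-server-address>/<database> --table <table> --export-dir <HDFS-path> --input-fields-terminated-by '\t' --bindir /usr/local/sqoop/lib --username <MySQL-user> -P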

$ sqoop export --connect jdbc:mysql://<MySQL-server-address>/<database> --table <table> --export-dir <HDFS-path> --bindir /usr/local/sqoop/lib --username <MySQL-user> -P
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2366: HADOOP_ORG.APACHE.SQOOP.SQOOP_USER: bad substitution
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2461: HADOOP_ORG.APACHE.SQOOP.SQOOP_OPTS: bad substitution
2020-12-15 00:07:32,120 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
2020-12-15 00:07:36,677 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2020-12-15 00:07:36,680 INFO tool.CodeGenTool: Beginning code generation
Tue Dec 15 00:07:36 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-12-15 00:07:36,917 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dept` AS t LIMIT 1
2020-12-15 00:07:36,927 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dept` AS t LIMIT 1
2020-12-15 00:07:36,932 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /usr/local/sqoop/lib/dept.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2020-12-15 00:07:37,861 INFO orm.CompilationManager: Writing jar file: /usr/local/sqoop/lib/dept.jar
2020-12-15 00:07:37,870 INFO mapreduce.ExportJobBase: Beginning export of dept
2020-12-15 00:07:37,870 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2020-12-15 00:07:37,949 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2020-12-15 00:07:38,490 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-12-15 00:07:38,559 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2020-12-15 00:07:38,562 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2020-12-15 00:07:38,562 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2020-12-15 00:07:38,698 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2020-12-15 00:07:38,756 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2020-12-15 00:07:38,757 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2020-12-15 00:07:38,825 INFO input.FileInputFormat: Total input files to process : 1
2020-12-15 00:07:38,829 INFO input.FileInputFormat: Total input files to process : 1
2020-12-15 00:07:38,851 INFO mapreduce.JobSubmitter: number of splits:4
2020-12-15 00:07:38,871 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2020-12-15 00:07:38,921 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1030561943_0001
2020-12-15 00:07:38,921 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-12-15 00:07:39,042 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-hadoopuser/mapred/local/job_local1030561943_0001_ed4b336f-5b3b-4583-947f-3cbbf7b14fe5/libjars <- /home/hadoopuser/libjars/*
2020-12-15 00:07:39,045 WARN fs.FileUtil: Command 'ln -s /tmp/hadoop-hadoopuser/mapred/local/job_local1030561943_0001_ed4b336f-5b3b-4583-947f-3cbbf7b14fe5/libjars /home/hadoopuser/libjars/*' failed 1 with: ln: failed to create symbolic link '/home/hadoopuser/libjars/*': No such file or directory

2020-12-15 00:07:39,045 WARN mapred.LocalDistributedCacheManager: Failed to create symlink: /tmp/hadoop-hadoopuser/mapred/local/job_local1030561943_0001_ed4b336f-5b3b-4583-947f-3cbbf7b14fe5/libjars <- /home/hadoopuser/libjars/*
2020-12-15 00:07:39,045 INFO mapred.LocalDistributedCacheManager: Localized file:/tmp/hadoop/mapred/staging/hadoopuser1030561943/.staging/job_local1030561943_0001/libjars as file:/tmp/hadoop-hadoopuser/mapred/local/job_local1030561943_0001_ed4b336f-5b3b-4583-947f-3cbbf7b14fe5/libjars
2020-12-15 00:07:39,072 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2020-12-15 00:07:39,072 INFO mapreduce.Job: Running job: job_local1030561943_0001
2020-12-15 00:07:39,074 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2020-12-15 00:07:39,077 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.sqoop.mapreduce.NullOutputCommitter
2020-12-15 00:07:39,101 INFO mapred.LocalJobRunner: Waiting for map tasks
2020-12-15 00:07:39,101 INFO mapred.LocalJobRunner: Starting task: attempt_local1030561943_0001_m_000000_0
2020-12-15 00:07:39,133 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2020-12-15 00:07:39,136 INFO mapred.MapTask: Processing split: Paths:/user/sqoop/result/part-m-00000:114+19,/user/sqoop/result/part-m-00000:133+20
2020-12-15 00:07:39,142 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
2020-12-15 00:07:39,142 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
2020-12-15 00:07:39,146 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
Tue Dec 15 00:07:39 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-12-15 00:07:39,181 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-12-15 00:07:39,188 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
2020-12-15 00:07:39,190 INFO mapred.LocalJobRunner:
2020-12-15 00:07:39,207 INFO mapred.Task: Task:attempt_local1030561943_0001_m_000000_0 is done. And is in the process of committing
2020-12-15 00:07:39,208 INFO mapred.LocalJobRunner: map
2020-12-15 00:07:39,208 INFO mapred.Task: Task 'attempt_local1030561943_0001_m_000000_0' done.
2020-12-15 00:07:39,213 INFO mapred.Task: Final Counters for attempt_local1030561943_0001_m_000000_0: Counters: 21
        File System Counters
                FILE: Number of bytes read=5658
                FILE: Number of bytes written=547917
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=68
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=15
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
                HDFS: Number of bytes read erasure-coded=0
        Map-Reduce Framework
                Map input records=2
                Map output records=2
                Input split bytes=198
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=0
                Total committed heap usage (bytes)=350748672
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0
2020-12-15 00:07:39,213 INFO mapred.LocalJobRunner: Finishing task: attempt_local1030561943_0001_m_000000_0
2020-12-15 00:07:39,213 INFO mapred.LocalJobRunner: Starting task: attempt_local1030561943_0001_m_000001_0
2020-12-15 00:07:39,214 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2020-12-15 00:07:39,215 INFO mapred.MapTask: Processing split: Paths:/user/sqoop/result/part-m-00000:0+38
Tue Dec 15 00:07:39 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-12-15 00:07:39,227 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
2020-12-15 00:07:39,228 INFO mapred.LocalJobRunner:
2020-12-15 00:07:39,232 INFO mapred.Task: Task:attempt_local1030561943_0001_m_000001_0 is done. And is in the process of committing
2020-12-15 00:07:39,233 INFO mapred.LocalJobRunner: map
2020-12-15 00:07:39,233 INFO mapred.Task: Task 'attempt_local1030561943_0001_m_000001_0' done.
2020-12-15 00:07:39,233 INFO mapred.Task: Final Counters for attempt_local1030561943_0001_m_000001_0: Counters: 21
        File System Counters
                FILE: Number of bytes read=6272
                FILE: Number of bytes written=547917
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=224
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=18
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
                HDFS: Number of bytes read erasure-coded=0
        Map-Reduce Framework
                Map input records=2
                Map output records=2
                Input split bytes=131
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=0
                Total committed heap usage (bytes)=350748672
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0
2020-12-15 00:07:39,233 INFO mapred.LocalJobRunner: Finishing task: attempt_local1030561943_0001_m_000001_0
2020-12-15 00:07:39,233 INFO mapred.LocalJobRunner: Starting task: attempt_local1030561943_0001_m_000002_0
2020-12-15 00:07:39,236 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2020-12-15 00:07:39,237 INFO mapred.MapTask: Processing split: Paths:/user/sqoop/result/part-m-00000:38+38
Tue Dec 15 00:07:39 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-12-15 00:07:39,248 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
2020-12-15 00:07:39,249 INFO mapred.LocalJobRunner:
2020-12-15 00:07:39,255 INFO mapred.Task: Task:attempt_local1030561943_0001_m_000002_0 is done. And is in the process of committing
2020-12-15 00:07:39,256 INFO mapred.LocalJobRunner: map
2020-12-15 00:07:39,256 INFO mapred.Task: Task 'attempt_local1030561943_0001_m_000002_0' done.
2020-12-15 00:07:39,256 INFO mapred.Task: Final Counters for attempt_local1030561943_0001_m_000002_0: Counters: 21
        File System Counters
                FILE: Number of bytes read=6886
                FILE: Number of bytes written=547917
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=342
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=21
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
                HDFS: Number of bytes read erasure-coded=0
        Map-Reduce Framework
                Map input records=3
                Map output records=3
                Input split bytes=131
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=0
                Total committed heap usage (bytes)=350748672
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0
2020-12-15 00:07:39,256 INFO mapred.LocalJobRunner: Finishing task: attempt_local1030561943_0001_m_000002_0
2020-12-15 00:07:39,256 INFO mapred.LocalJobRunner: Starting task: attempt_local1030561943_0001_m_000003_0
2020-12-15 00:07:39,257 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2020-12-15 00:07:39,258 INFO mapred.MapTask: Processing split: Paths:/user/sqoop/result/part-m-00000:76+38
Tue Dec 15 00:07:39 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-12-15 00:07:39,270 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
2020-12-15 00:07:39,270 INFO mapred.LocalJobRunner:
2020-12-15 00:07:39,275 INFO mapred.Task: Task:attempt_local1030561943_0001_m_000003_0 is done. And is in the process of committing
2020-12-15 00:07:39,276 INFO mapred.LocalJobRunner: map
2020-12-15 00:07:39,276 INFO mapred.Task: Task 'attempt_local1030561943_0001_m_000003_0' done.
2020-12-15 00:07:39,276 INFO mapred.Task: Final Counters for attempt_local1030561943_0001_m_000003_0: Counters: 21
        File System Counters
                FILE: Number of bytes read=7500
                FILE: Number of bytes written=547917
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=422
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=24
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
                HDFS: Number of bytes read erasure-coded=0
        Map-Reduce Framework
                Map input records=2
                Map output records=2
                Input split bytes=131
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=0
                Total committed heap usage (bytes)=350748672
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0
2020-12-15 00:07:39,276 INFO mapred.LocalJobRunner: Finishing task: attempt_local1030561943_0001_m_000003_0
2020-12-15 00:07:39,277 INFO mapred.LocalJobRunner: map task executor complete.
2020-12-15 00:07:40,075 INFO mapreduce.Job: Job job_local1030561943_0001 running in uber mode : false
2020-12-15 00:07:40,076 INFO mapreduce.Job:  map 100% reduce 0%
2020-12-15 00:07:40,078 INFO mapreduce.Job: Job job_local1030561943_0001 completed successfully
2020-12-15 00:07:40,089 INFO mapreduce.Job: Counters: 21
        File System Counters
                FILE: Number of bytes read=26316
                FILE: Number of bytes written=2191668
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1056
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=78
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
                HDFS: Number of bytes read erasure-coded=0
        Map-Reduce Framework
                Map input records=9
                Map output records=9
                Input split bytes=591
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=0
                Total committed heap usage (bytes)=1402994688
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0
2020-12-15 00:07:40,092 INFO mapreduce.ExportJobBase: Transferred 1.0312 KB in 1.5227 seconds (693.5028 bytes/sec)
2020-12-15 00:07:40,095 INFO mapreduce.ExportJobBase: Exported 9 records.

 

Finally, let's check that the data landed in the MySQL database!

$ mysql -h <host> -u <user> -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 64
Server version: 5.7.32-0ubuntu0.18.04.1 (Ubuntu)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use employees;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed

//The previously empty dept table now contains the data!
mysql> select * from dept;
+---------+--------------------+
| dept_no | dept_name          |
+---------+--------------------+
| d009    | Customer Service   |
| d005    | Development        |
| d002    | Finance            |
| d003    | Human Resources    |
| d001    | Marketing          |
| d004    | Production         |
| d006    | Quality Management |
| d008    | Research           |
| d007    | Sales              |
+---------+--------------------+
9 rows in set (0.00 sec)

mysql> exit
Bye
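
Note that sqoop export only inserts rows; running the same export again would hit the primary key on dept_no and fail on duplicate keys. To repeat the test, empty the table first (Sqoop also offers an --update-key option for update-style exports):

mysql> truncate table dept;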

 

 

Importing only specific rows from MySQL into Hadoop

$ sqoop import --connect jdbc:mysql://<MySQL-server-address>/employees --table titles --where "title='Staff'" --target-dir /user/sqoop/titles --bindir /usr/local/sqoop/lib --username <MySQL-user> -P

*Caution: wrap the --where condition in double quotes!

$ sqoop import --connect jdbc:mysql://<MySQL-server-address>/employees --table titles --where "title='Staff'" --target-dir /user/sqoop/titles --bindir /usr/local/sqoop/lib --username <MySQL-user> -P
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2366: HADOOP_ORG.APACHE.SQOOP.SQOOP_USER: bad substitution
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2461: HADOOP_ORG.APACHE.SQOOP.SQOOP_OPTS: bad substitution
2020-12-15 00:45:54,887 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
2020-12-15 00:45:56,948 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2020-12-15 00:45:56,948 INFO tool.CodeGenTool: Beginning code generation
Tue Dec 15 00:45:57 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-12-15 00:45:57,181 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `titles` AS t LIMIT 1
2020-12-15 00:45:57,193 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `titles` AS t LIMIT 1
2020-12-15 00:45:57,197 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /usr/local/sqoop/lib/titles.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2020-12-15 00:45:58,049 INFO orm.CompilationManager: Writing jar file: /usr/local/sqoop/lib/titles.jar
2020-12-15 00:45:58,166 WARN manager.MySQLManager: It looks like you are importing from mysql.
2020-12-15 00:45:58,166 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
2020-12-15 00:45:58,166 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
2020-12-15 00:45:58,166 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
2020-12-15 00:45:58,168 WARN manager.CatalogQueryManager: The table titles contains a multi-column primary key. Sqoop will default to the column emp_no only for this job.
2020-12-15 00:45:58,169 WARN manager.CatalogQueryManager: The table titles contains a multi-column primary key. Sqoop will default to the column emp_no only for this job.
2020-12-15 00:45:58,169 INFO mapreduce.ImportJobBase: Beginning import of titles
2020-12-15 00:45:58,169 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2020-12-15 00:45:58,248 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2020-12-15 00:45:58,655 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2020-12-15 00:45:58,719 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2020-12-15 00:45:58,775 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2020-12-15 00:45:58,775 INFO impl.MetricsSystemImpl: JobTracker metrics system started
Tue Dec 15 00:45:58 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-12-15 00:45:58,984 INFO db.DBInputFormat: Using read commited transaction isolation
2020-12-15 00:45:58,994 INFO mapreduce.JobSubmitter: number of splits:1
2020-12-15 00:45:59,065 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1442629496_0001
2020-12-15 00:45:59,065 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-12-15 00:45:59,201 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-hadoopuser/mapred/local/job_local1442629496_0001_753bf5a2-9c4e-4612-9018-2e96f3ddd6eb/libjars <- /home/hadoopuser/libjars/*
2020-12-15 00:45:59,209 WARN fs.FileUtil: Command 'ln -s /tmp/hadoop-hadoopuser/mapred/local/job_local1442629496_0001_753bf5a2-9c4e-4612-9018-2e96f3ddd6eb/libjars /home/hadoopuser/libjars/*' failed 1 with: ln: failed to create symbolic link '/home/hadoopuser/libjars/*': No such file or directory

2020-12-15 00:45:59,209 WARN mapred.LocalDistributedCacheManager: Failed to create symlink: /tmp/hadoop-hadoopuser/mapred/local/job_local1442629496_0001_753bf5a2-9c4e-4612-9018-2e96f3ddd6eb/libjars <- /home/hadoopuser/libjars/*
2020-12-15 00:45:59,209 INFO mapred.LocalDistributedCacheManager: Localized file:/tmp/hadoop/mapred/staging/hadoopuser1442629496/.staging/job_local1442629496_0001/libjars as file:/tmp/hadoop-hadoopuser/mapred/local/job_local1442629496_0001_753bf5a2-9c4e-4612-9018-2e96f3ddd6eb/libjars
2020-12-15 00:45:59,240 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2020-12-15 00:45:59,241 INFO mapreduce.Job: Running job: job_local1442629496_0001
2020-12-15 00:45:59,242 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2020-12-15 00:45:59,249 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2020-12-15 00:45:59,249 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2020-12-15 00:45:59,250 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2020-12-15 00:45:59,293 INFO mapred.LocalJobRunner: Waiting for map tasks
2020-12-15 00:45:59,293 INFO mapred.LocalJobRunner: Starting task: attempt_local1442629496_0001_m_000000_0
2020-12-15 00:45:59,311 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2020-12-15 00:45:59,311 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2020-12-15 00:45:59,324 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
Tue Dec 15 00:45:59 KST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-12-15 00:45:59,327 INFO db.DBInputFormat: Using read commited transaction isolation
2020-12-15 00:45:59,329 INFO mapred.MapTask: Processing split: 1=1 AND 1=1
2020-12-15 00:45:59,382 INFO db.DBRecordReader: Working on split: 1=1 AND 1=1
2020-12-15 00:45:59,382 INFO db.DBRecordReader: Executing query: SELECT `emp_no`, `title`, `from_date`, `to_date` FROM `titles` AS `titles` WHERE ( title='Staff' ) AND ( 1=1 ) AND ( 1=1 )
2020-12-15 00:45:59,491 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-12-15 00:45:59,908 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
2020-12-15 00:45:59,910 INFO mapred.LocalJobRunner:
2020-12-15 00:45:59,930 INFO mapred.Task: Task:attempt_local1442629496_0001_m_000000_0 is done. And is in the process of committing
2020-12-15 00:45:59,933 INFO mapred.LocalJobRunner:
2020-12-15 00:45:59,933 INFO mapred.Task: Task attempt_local1442629496_0001_m_000000_0 is allowed to commit now
2020-12-15 00:45:59,948 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1442629496_0001_m_000000_0' to hdfs://hadoop1:9000/user/sqoop/titles
2020-12-15 00:45:59,949 INFO mapred.LocalJobRunner: map
2020-12-15 00:45:59,949 INFO mapred.Task: Task 'attempt_local1442629496_0001_m_000000_0' done.
2020-12-15 00:45:59,954 INFO mapred.Task: Final Counters for attempt_local1442629496_0001_m_000000_0: Counters: 21
        File System Counters
                FILE: Number of bytes read=7167
                FILE: Number of bytes written=551025
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=0
                HDFS: Number of bytes written=3726491
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
                HDFS: Number of bytes read erasure-coded=0
        Map-Reduce Framework
                Map input records=107391
                Map output records=107391
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=8
                Total committed heap usage (bytes)=374865920
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=3726491
2020-12-15 00:45:59,954 INFO mapred.LocalJobRunner: Finishing task: attempt_local1442629496_0001_m_000000_0
2020-12-15 00:45:59,954 INFO mapred.LocalJobRunner: map task executor complete.
2020-12-15 00:46:00,245 INFO mapreduce.Job: Job job_local1442629496_0001 running in uber mode : false
2020-12-15 00:46:00,246 INFO mapreduce.Job:  map 100% reduce 0%
2020-12-15 00:46:00,248 INFO mapreduce.Job: Job job_local1442629496_0001 completed successfully
2020-12-15 00:46:00,252 INFO mapreduce.Job: Counters: 21
        File System Counters
                FILE: Number of bytes read=7167
                FILE: Number of bytes written=551025
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=0
                HDFS: Number of bytes written=3726491
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
                HDFS: Number of bytes read erasure-coded=0
        Map-Reduce Framework
                Map input records=107391
                Map output records=107391
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=8
                Total committed heap usage (bytes)=374865920
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=3726491
2020-12-15 00:46:00,253 INFO mapreduce.ImportJobBase: Transferred 3.5539 MB in 1.5886 seconds (2.2371 MB/sec)
2020-12-15 00:46:00,254 INFO mapreduce.ImportJobBase: Retrieved 107391 records.

 

Let's check that it was saved to HDFS!

$ hadoop fs -ls /user/sqoop/titles
Found 2 items
-rw-r--r--   2 hadoopuser supergroup          0 2020-12-15 00:45 /user/sqoop/titles/_SUCCESS
-rw-r--r--   2 hadoopuser supergroup    3726491 2020-12-15 00:45 /user/sqoop/titles/part-m-00000

//Rows with title = 'Staff' pour out.
$ hadoop fs -cat /user/sqoop/titles/part-m-00000
252317,Staff,1985-09-18,1993-09-18
252319,Staff,1999-05-04,9999-01-01
252327,Staff,1988-07-21,1996-07-21
252328,Staff,1993-02-03,1999-02-03
252331,Staff,1992-10-20,1998-10-20
252332,Staff,1989-02-14,1995-02-14
252334,Staff,1992-10-27,2001-10-27
252335,Staff,1987-12-17,1994-12-17
252336,Staff,1987-11-08,1992-11-07
252339,Staff,1998-03-16,9999-01-01
252343,Staff,1997-02-08,2002-02-08
252344,Staff,1999-10-10,9999-01-01
...
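
As a side note (not run in this post), Sqoop can also import the result of a free-form query instead of a whole table. With --query, the statement must contain WHERE $CONDITIONS, and you must give a --split-by column (or use -m 1); --table is not used in that case. A sketch using the same placeholders:

$ sqoop import --connect jdbc:mysql://<MySQL-server-address>/employees --query "SELECT emp_no, title, from_date, to_date FROM titles WHERE title='Staff' AND \$CONDITIONS" --split-by emp_no --target-dir /user/sqoop/titles_query --bindir /usr/local/sqoop/lib --username <MySQL-user> -P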