Importing a PostgreSQL table into a Hive table with Sqoop

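To import a PostgreSQL table into a Hive table with Sqoop, first make sure that Sqoop and its dependencies (such as the PostgreSQL JDBC driver) are installed. Then follow these steps:

1. Write a script file named sqoop_import.sh with the following content:

#!/bin/bash
sqoop import \
--connect jdbc:postgresql://:/ \
--username \
--password \
--table \
--hive-import \
--hive-table \
-m 1

The option values are left blank here. Fill in the host, port, and database name in the JDBC URL, and supply a value after --username, --password, --table, and --hive-table.

2. Make the script executable:

chmod +x sqoop_import.sh

3. Run the script:

./sqoop_import.sh

The data in the PostgreSQL table is then imported into the Hive table.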
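Passing the password with --password exposes it in the shell history and the process list. Sqoop also accepts -P (prompt on the console) and --password-file. The following is a minimal sketch of preparing such a file; the function name write_password_file and the path in the usage comment are hypothetical, and whether a local path needs the file:// prefix depends on how the cluster's default file system is configured.

```shell
#!/bin/sh
# Write a password to a file without a trailing newline and make it
# readable by the owner only.
# Arguments: $1 = file path, $2 = password.
write_password_file() {
    # The subshell keeps the umask change from leaking into the caller.
    ( umask 077 && printf '%s' "$2" > "$1" ) && chmod 400 "$1"
}

# Usage (hypothetical path):
#   write_password_file "$HOME/.pg_password" 'password'
#   sqoop import ... --username postgres \
#       --password-file "file://$HOME/.pg_password" ...
```

printf '%s' is used instead of echo so that no newline ends up in the file, since Sqoop would treat a trailing newline as part of the password.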

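Sqoop is a tool for transferring bulk data between Hadoop and structured data stores such as relational databases. It can import data from a relational database into HDFS, Hive, or HBase, and it can export data from Hadoop back to a relational database. This article describes how to use Sqoop to import a PostgreSQL table into a Hive table.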

Environment preparation

1. Install and configure Hadoop, Hive, PostgreSQL, and Sqoop.

2. Create a table in PostgreSQL, for example:

CREATE TABLE test_postgresql (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    age INT
);

3. Create a table in Hive with the same structure as the PostgreSQL table:

-- Plain text table used as the import target.
-- The tab delimiter matches the field delimiter given to Sqoop.
CREATE TABLE test_hive (
    id INT,
    name STRING,
    age INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
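To have something to import, the PostgreSQL table needs a few rows. The following sketch prints INSERT statements for made-up sample data; the names and ages are invented for illustration, and the function name sample_rows_sql is hypothetical. The third row contains NULL values so that the null handling of the import can be checked later.

```shell
#!/bin/sh
# Print INSERT statements for a few made-up sample rows.
sample_rows_sql() {
    cat <<'SQL'
INSERT INTO test_postgresql (id, name, age) VALUES
    (1, 'Alice', 30),
    (2, 'Bob', 25),
    (3, NULL, NULL);
SQL
}

# To load the rows (assumes psql is installed and the "test" database exists):
#   sample_rows_sql | psql -h localhost -p 5432 -U postgres -d test
```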

Using Sqoop to import the PostgreSQL table into the Hive table

1. Run the following command to import the data from the PostgreSQL table into the Hive table:

sqoop import \
--connect jdbc:postgresql://localhost:5432/test \
--username postgres \
--password password \
--table test_postgresql \
--columns "id,name,age" \
--target-dir /user/hadoop/test_hive \
--hive-import \
--hive-table test_hive \
--as-textfile \
--fields-terminated-by '\t' \
--null-string '\\N' \
--null-non-string '\\N' \
-m 1

2. After the command finishes, the data from the PostgreSQL table is in the Hive table. Run the following command to view it:

beeline -u "jdbc:hive2://localhost:10000/default" --outputformat=tsv2 -e "select * from test_hive;"
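Beyond looking at the rows, a quick way to check the import is to compare row counts on both sides. The following is a minimal sketch, assuming psql and beeline are installed and can log in with the connection settings used in this article; the function name compare_counts is hypothetical.

```shell
#!/bin/sh
# Compare two row counts and report the result.
# Arguments: $1 = count from PostgreSQL, $2 = count from Hive.
compare_counts() {
    if [ "$1" -eq "$2" ]; then
        echo "OK: $1 rows in both tables"
    else
        echo "MISMATCH: PostgreSQL=$1 Hive=$2"
        return 1
    fi
}

# Fetching the counts (not run here, requires live services):
#   pg_count=$(psql -h localhost -p 5432 -U postgres -d test -At \
#       -c "SELECT count(*) FROM test_postgresql;")
#   hive_count=$(beeline -u "jdbc:hive2://localhost:10000/default" \
#       --silent=true --showHeader=false --outputformat=tsv2 \
#       -e "SELECT count(*) FROM test_hive;")
#   compare_counts "$pg_count" "$hive_count"
```

A matching count does not prove that every value arrived intact, but a mismatch is a cheap early signal that the import was incomplete or was run more than once.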
