Migrating Data to HBase with MapReduce
Goals
- Write txt data into HBase via MapReduce
- Write SQL Server data into a Hive table, then load it from the Hive table into HBase
- Write SQL Server data directly into HBase
Implementation
1. The first goal
I. Upload the source data file
Upload the data.txt file to HDFS. Its content is as follows:
key1 col1 value1
Fields are separated by tab characters (\t).
II. Write the data as HFiles
Use MapReduce to convert data.txt into HFiles laid out according to the HBase table format.
The dependencies in pom.xml are as follows:
<!-- hadoop -->
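Only the first comment line of the pom.xml excerpt survived above. A minimal sketch of the dependencies such a bulk-load project typically needs is shown below; the artifact list and the CDH version strings are assumptions inferred from the Hadoop 2.6.0-cdh5.4.0 and HBase 1.0.0-cdh5.4.0 versions used later in this post, not the original file (CDH artifacts also require the Cloudera Maven repository):

<!-- hadoop -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0-cdh5.4.0</version>
</dependency>
<!-- hbase -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.0.0-cdh5.4.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.0.0-cdh5.4.0</version>
</dependency>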
Write the BulkLoadMapper
public class BulkLoadMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
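Only the class declaration of the mapper survived above. Below is a minimal sketch of what the map() body could look like, assuming each line of data.txt is "rowkey \t col \t value" and that the column families and qualifiers (personal:words, professional:sum) match the scan output shown at the end of this post; everything except the class declaration is illustrative:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BulkLoadMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line: rowkey \t col \t value
        String[] fields = value.toString().split("\t");
        if (fields.length < 3) {
            return; // skip malformed lines
        }
        byte[] rowKey = Bytes.toBytes(fields[0]);
        Put put = new Put(rowKey);
        // Family/qualifier pairs match the scan output at the end of the post
        put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("words"), Bytes.toBytes(fields[1]));
        put.addColumn(Bytes.toBytes("professional"), Bytes.toBytes("sum"), Bytes.toBytes(fields[2]));
        context.write(new ImmutableBytesWritable(rowKey), put);
    }
}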
Write the BulkLoadDriver
public class BulkLoadDriver extends Configured implements Tool {
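Likewise, only the class declaration of the driver survived. The sketch below shows a Tool-based driver that wires the mapper to HFileOutputFormat2 so the job emits region-sorted HFiles; the argument layout (input path, HFile output path, table name) is an assumption, and it presumes the target table (here "truman", with the personal and professional families) already exists:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class BulkLoadDriver extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(HBaseConfiguration.create(), new BulkLoadDriver(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] args) throws Exception {
        // args[0] = input txt path, args[1] = HFile output path, args[2] = table name
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, "bulkload-" + args[2]);
        job.setJarByClass(BulkLoadDriver.class);

        job.setMapperClass(BulkLoadMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        job.setInputFormatClass(TextInputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Configure sorting/partitioning so the emitted HFiles line up with the table's regions
        HTable table = new HTable(conf, args[2]);
        HFileOutputFormat2.configureIncrementalLoad(job, table);

        return job.waitForCompletion(true) ? 0 : 1;
    }
}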
The project needs hbase-site.xml, yarn-site.xml, and mapred-site.xml placed under resources. If the job fails when run locally, also add org.apache.hadoop.io.nativeio.NativeIO to the current project.
III. Load the data
- Command-line approach
First modify the hadoop-env.sh configuration and add the following:
export HBASE_HOME=/data/bigdata/hbase-1.0.0-cdh5.4.0
Write the data as HFiles into HDFS, then go to $HBASE_HOME/bin and run the following command:
/data/bigdata/hadoop-2.6.0-cdh5.4.0/bin/hadoop jar ../lib/hbase-server-1.0.0-cdh5.4.0.jar completebulkload /user/truman/hfile truman
- Java approach
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class HFileLoader {
    public static void doBulkLoad(String pathToHFile, String tableName, Configuration configuration) {
        try {
            HBaseConfiguration.addHbaseResources(configuration);
            LoadIncrementalHFiles loadFiles = new LoadIncrementalHFiles(configuration);
            HTable hTable = new HTable(configuration, tableName); // target table
            loadFiles.doBulkLoad(new Path(pathToHFile), hTable);  // load the HFiles into the table
            System.out.println("Bulk Load Completed..");
        } catch (Exception exception) {
            exception.printStackTrace();
        }
    }
}
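For completeness, a hypothetical call site for the loader above; the HFile path and table name mirror the command-line example:

Configuration conf = HBaseConfiguration.create();
HFileLoader.doBulkLoad("/user/truman/hfile", "truman", conf);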
IV. Query the results

[root@LAB3 bin]# ./hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.0.0-cdh5.4.0, rUnknown, Tue Apr 21 12:19:34 PDT 2015
hbase(main):001:0> scan 'truman'
ROW COLUMN+CELL
key1 column=personal:words, timestamp=1476179060738, value=col1
key1 column=professional:sum, timestamp=1476179060738, value=value1
key2 column=personal:words, timestamp=1476179060738, value=col2
key2 column=professional:sum, timestamp=1476179060738, value=value2
key3 column=personal:words, timestamp=1476179060738, value=col3
key3 column=professional:sum, timestamp=1476179060738, value=value3
key4 column=personal:words, timestamp=1476179060738, value=col4
key4 column=professional:sum, timestamp=1476179060738, value=value4
4 row(s) in 0.4300 seconds