CN109271365A - A method for accelerating HBase database reads and writes based on Spark in-memory technology

Info

Publication number: CN109271365A
Application number: CN201811093336.1A
Authority: CN
Inventors: 王文文; 路国隋; 梁志勇; 牛硕
Original assignee: Inspur Software Co Ltd
Current assignee: Inspur Software Co Ltd
Priority date: 2018-09-19
Filing date: 2018-09-19
Publication date: 2019-01-25

Abstract

The invention discloses a method for accelerating HBase database reads and writes based on Spark in-memory technology. The HBase database is improved by moving the computation on the data from HBase into Spark memory, while the data itself is stored efficiently using the HBase on HDFS architecture. The corresponding APIs are called to perform add, delete, modify, and query operations, and the advantages of in-memory computing are used to meet the real-time query requirements of a large-scale columnar database in high-concurrency, low-latency scenarios. Compared with the prior art, the method transforms the traditional HBase database with in-memory computing and significantly improves performance for storing and querying massive data. It not only effectively improves the computing performance of a big data cluster and shortens the product development cycle, but also, through the high I/O throughput of memory, effectively improves the cluster's high-concurrency capability and stability.

Description

A method for accelerating HBase database reads and writes based on Spark in-memory technology

Technical field

The present invention relates to the field of big data computing technology, and specifically to a method for accelerating HBase database reads and writes based on Spark in-memory technology.

Background art

In the big data era, hundreds of millions of records must be stored and computed every day. HBase, an outstanding column-oriented storage component, offers good storage, computation, stability, and parallel scalability, and it is used ever more widely, but it still faces some problems: for example, insufficient performance for mixed reads and writes under high concurrency, poor full-table scan performance, and insufficient support for complex query syntax. Spark, a high-performance in-memory computing framework, has a great advantage in the field of real-time computation.

Summary of the invention

The technical task of the present invention is to provide a method for accelerating HBase database reads and writes based on Spark in-memory technology.

The technical task of the present invention is achieved as follows:

A method for accelerating HBase database reads and writes based on Spark in-memory technology, in which the HBase database is improved: the computation on the data is moved from HBase into Spark memory, the data is stored efficiently using the HBase on HDFS architecture, the corresponding APIs are called to perform add, delete, modify, and query operations, and the advantages of in-memory computing are used to meet the real-time query requirements of a large-scale columnar database in high-concurrency, low-latency scenarios.

Calling the corresponding APIs to perform add, delete, modify, and query operations includes calling the corresponding APIs using the Scala or Java language.
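The four operations can be illustrated with the HBase client API from Scala. This is a minimal sketch rather than the patent's own code: it assumes the HBase 1.x or later client API (`ConnectionFactory`, `Table`), a reachable cluster whose hbase-site.xml is on the classpath, and the 'access_log' table with column family 'info' created in step 2. The row key `row-0001` and the cell values are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Delete, Get, Put, Scan}
import org.apache.hadoop.hbase.util.Bytes
import scala.collection.JavaConverters._

// Assumption: hbase-site.xml is on the classpath, so create() picks up the cluster settings.
val conf = HBaseConfiguration.create()
val connection = ConnectionFactory.createConnection(conf)
val table = connection.getTable(TableName.valueOf("access_log"))

val family = Bytes.toBytes("info")
val qualifier = Bytes.toBytes("value")
val rowKey = Bytes.toBytes("row-0001") // hypothetical row key

try {
  // Add: write one cell.
  table.put(new Put(rowKey).addColumn(family, qualifier, Bytes.toBytes("first value")))

  // Modify: HBase has no separate update call; a later Put to the same cell supersedes it.
  table.put(new Put(rowKey).addColumn(family, qualifier, Bytes.toBytes("updated value")))

  // Query a single row by key.
  val result = table.get(new Get(rowKey))
  println(Bytes.toString(result.getValue(family, qualifier)))

  // Query a range with a scan; the scanner holds server resources and must be closed.
  val scanner = table.getScanner(new Scan().addFamily(family))
  try scanner.asScala.take(10).foreach(r => println(Bytes.toString(r.getRow)))
  finally scanner.close()

  // Delete the whole row.
  table.delete(new Delete(rowKey))
} finally {
  table.close()
  connection.close()
}
```

The same calls are available from Java with identical class and method names, which is why the method leaves the choice between Scala and Java open.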

The specific steps of the method are as follows:

Step 1) Configure the JAR package dependencies for HBase so that Spark can load the HBase classes and methods correctly;

Step 2) Create the table using HBase Shell: create 'access_log', 'info';

Step 3) Start Spark Shell by executing bin/spark-shell --master yarn --deploy-mode client --num-executors 5 --executor-memory 500m --executor-cores 2;

Step 4) Modify the configuration: the Spark application needs to connect to the ZooKeeper cluster and then access the HBase cluster through ZooKeeper;

Step 5) Write data to HBase by bulk load: the 45 MB access_log.log file is imported into HBase in 7 seconds.
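The timing reported for step 5 can be put in context with simple arithmetic. Embodiment 1 gives 7 minutes for the traditional insert mode on the same 45 MB file; the sketch below only recomputes the ratio and the implied throughput from those reported figures. The object and member names are illustrative, not part of the patent.

```scala
object BulkLoadSpeedup {
  // Figures as reported in the patent text.
  val fileSizeMb: Double = 45.0
  val bulkLoadSeconds: Double = 7.0
  val insertSeconds: Double = 7 * 60.0 // 7 minutes for the traditional insert mode

  // Ratio of the two wall-clock times.
  def speedup: Double = insertSeconds / bulkLoadSeconds

  // Average throughput in MB/s for a given duration.
  def throughputMbPerSec(seconds: Double): Double = fileSizeMb / seconds
}
```

With these numbers the ratio is 420 s / 7 s = 60, and the implied throughput rises from roughly 0.11 MB/s to roughly 6.4 MB/s. The patent reports a single run on a single file, so these values describe that measurement only.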

In step 1), configuring the JAR package dependencies for HBase is based on Spark 1.0 or later.

In step 4), accessing the HBase cluster through ZooKeeper comprises: adding the hbase-site.xml file to the classpath.

In step 4), accessing the HBase cluster through ZooKeeper comprises: setting the connection parameters in an HBaseConfiguration instance.

Compared with the prior art, the method of the present invention for accelerating HBase database reads and writes based on Spark in-memory technology transforms the traditional HBase database with in-memory computing and significantly improves performance for storing and querying massive data. Using the Spark on HBase mode, it not only effectively improves the computing performance of a big data cluster and shortens the product development cycle, but also, through the high I/O throughput of memory, effectively improves the cluster's high-concurrency capability and stability.

Specific embodiment

Embodiment 1:

A method for accelerating HBase database reads and writes based on Spark in-memory technology, in which the HBase database is improved: the computation on the data is moved from HBase into Spark memory, the data is stored efficiently using the HBase on HDFS architecture, the corresponding APIs are called using the Scala or Java language to perform add, delete, modify, and query operations, and the advantages of in-memory computing are used to meet the real-time query requirements of a large-scale columnar database in high-concurrency, low-latency scenarios.
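One way to picture "moving the computation into Spark memory" is to load the HBase table as an RDD, cache it, and run queries against the cached copy. The following is a hedged sketch, not the patent's code: it assumes it runs inside spark-shell (where `sc` is the provided SparkContext), that hbase-site.xml is on the classpath, and that rows carry an `info:value` cell as in the bulk-load example of this embodiment. The filter on "404" is a hypothetical query.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes

// Assumption: executed in spark-shell, so `sc` already exists.
val readConf = HBaseConfiguration.create()
readConf.set(TableInputFormat.INPUT_TABLE, "access_log")

// Each element is one HBase row: (row key, Result).
val hbaseRdd = sc.newAPIHadoopRDD(
  readConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

// Convert to plain strings before caching: Spark's documentation warns that
// Hadoop record readers may reuse Writable objects between records.
// Rows without an info:value cell are skipped.
val lines = hbaseRdd.flatMap { case (_, result) =>
  Option(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("value")))
    .map(bytes => Bytes.toString(bytes))
}.cache()

// The first action scans HBase; later actions reuse the cached partitions.
println(lines.count())
println(lines.filter(_.contains("404")).count()) // hypothetical query on the log text
```

Caching trades executor memory for repeated-query latency, so whether it helps depends on the table size relative to the memory given to the executors.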

The specific steps of the method are as follows:

Step 1) Based on Spark 1.0 or later, configure the JAR package dependencies for HBase so that Spark can load the HBase classes and methods correctly. For example, add the configuration item spark.driver.extraClassPath /opt/hadoop/data/lib/* to spark-defaults.conf, where the /opt/hadoop/data/lib/ path holds the JAR packages that HBase depends on;

Step 2) Create the table using HBase Shell: create 'access_log', 'info';

Step 4) Modify the configuration: the Spark application needs to connect to the ZooKeeper cluster and then access the HBase cluster through ZooKeeper. There are two ways to do this: one is to add the hbase-site.xml file to the classpath; the other is to set the connection parameters in an HBaseConfiguration instance;

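For example:

    import org.apache.hadoop.hbase.HBaseConfiguration

    val conf = HBaseConfiguration.create()
    // ZooKeeper hosts, client port, and the znode under which HBase registers
    conf.set("hbase.zookeeper.quorum", "slave1,slave2,slave3")
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    conf.set("zookeeper.znode.parent", "/hbase")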

Step 5) Write data to HBase by bulk load: the 45 MB access_log.log file is imported into HBase in 7 seconds. Compared with the 7 minutes taken by the traditional insert mode, this is a 60-fold speedup.

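For example:

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue}
    import org.apache.hadoop.hbase.client.HTable
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.{HFileOutputFormat, LoadIncrementalHFiles, TableInputFormat}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.hadoop.mapreduce.Job

    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", "slave1,slave2,slave3")
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    conf.set("zookeeper.znode.parent", "/hbase")
    conf.set(TableInputFormat.INPUT_TABLE, "access_log")

    // Configure the job so that the generated HFiles match the target table
    val table = new HTable(conf, "access_log")
    lazy val job = Job.getInstance(conf)
    job.setMapOutputKeyClass(classOf[ImmutableBytesWritable])
    job.setMapOutputValueClass(classOf[KeyValue])
    HFileOutputFormat.configureIncrementalLoad(job, table)

    // One row per log line: row key = line number (starting at 1), cell = info:value
    val rdd = sc.textFile("/opt/hadoop/data/access.log").zipWithIndex().map(x => {
      val kv: KeyValue = new KeyValue(Bytes.toBytes(x._2 + 1), "info".getBytes(), "value".getBytes(), x._1.getBytes())
      (new ImmutableBytesWritable(Bytes.toBytes(x._2 + 1)), kv)
    })

    // Write HFiles to a staging directory, then hand them over to HBase
    rdd.saveAsNewAPIHadoopFile("/tmp/data1", classOf[ImmutableBytesWritable], classOf[KeyValue], classOf[HFileOutputFormat], conf)
    val bulkLoader = new LoadIncrementalHFiles(conf)
    bulkLoader.doBulkLoad(new Path("/tmp/data1"), table)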
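A detail worth noting in the example above is the row key. HFiles must be written in ascending row-key order, and the example performs no explicit sort; it relies on `zipWithIndex` producing increasing line numbers and on those numbers being encoded as fixed-width big-endian bytes, so that byte order matches numeric order. The sketch below checks that property in plain Scala. It assumes that `Bytes.toBytes` applied to a `Long` yields the same 8-byte big-endian array as `java.nio.ByteBuffer.putLong`; the object and function names are illustrative.

```scala
import java.nio.ByteBuffer

object RowKeyOrder {
  // 8-byte big-endian encoding of a Long; assumed to match HBase's Bytes.toBytes(Long).
  def encode(n: Long): Array[Byte] = ByteBuffer.allocate(8).putLong(n).array()

  // Unsigned lexicographic comparison, the order in which HBase sorts row keys.
  def compareKeys(a: Array[Byte], b: Array[Byte]): Int = {
    val firstDiff = a.zip(b).collectFirst {
      case (x, y) if x != y => (x & 0xff) - (y & 0xff)
    }
    // If one key is a prefix of the other, the shorter key sorts first.
    firstDiff.getOrElse(a.length - b.length)
  }

  // True if every key is strictly smaller than its successor.
  def isAscending(keys: Seq[Array[Byte]]): Boolean =
    keys.zip(keys.drop(1)).forall { case (a, b) => compareKeys(a, b) < 0 }
}
```

For line numbers starting at 1, `RowKeyOrder.isAscending((1L to 1000L).map(RowKeyOrder.encode))` holds, which is why the example can write HFiles without sorting first. Keys built from decimal strings would not have this property ("10" sorts before "2"), so a different key scheme would likely need a `sortByKey` step before `saveAsNewAPIHadoopFile`.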

Those skilled in the art can readily implement the present invention from the specific embodiments above. It should be understood, however, that the present invention is not limited to these specific embodiments. On the basis of the disclosed embodiments, those skilled in the art may combine different technical features as desired to realize different technical solutions.

Claims

1. A method for accelerating HBase database reads and writes based on Spark in-memory technology, characterized in that the HBase database is improved: the computation on the data is moved from HBase into Spark memory, the data is stored efficiently using the HBase on HDFS architecture, the corresponding APIs are called to perform add, delete, modify, and query operations, and the advantages of in-memory computing are used to meet the real-time query requirements of a large-scale columnar database in high-concurrency, low-latency scenarios.

2. The method for accelerating HBase database reads and writes based on Spark in-memory technology according to claim 1, characterized in that calling the corresponding APIs to perform add, delete, modify, and query operations includes calling the corresponding APIs using the Scala or Java language.

3. The method for accelerating HBase database reads and writes based on Spark in-memory technology according to claim 1, characterized in that the specific steps of the method are as follows:

Step 2) Create the table using HBase Shell: create 'access_log', 'info';

4. The method for accelerating HBase database reads and writes based on Spark in-memory technology according to claim 3, characterized in that, in step 1), configuring the JAR package dependencies for HBase is based on Spark 1.0 or later.

5. The method for accelerating HBase database reads and writes based on Spark in-memory technology according to claim 3, characterized in that, in step 4), accessing the HBase cluster through ZooKeeper comprises: adding the hbase-site.xml file to the classpath.

6. The method for accelerating HBase database reads and writes based on Spark in-memory technology according to claim 3, characterized in that, in step 4), accessing the HBase cluster through ZooKeeper comprises: setting the connection parameters in an HBaseConfiguration instance.