CN109271365A - A method for accelerating HBase database reads and writes based on Spark in-memory technology - Google Patents


Info

Publication number
CN109271365A
Authority
CN
China
Prior art keywords
hbase
spark
writing
memory
acceleration reading
Legal status
Pending
Application number
CN201811093336.1A
Other languages
Chinese (zh)
Inventor
王文文
路国隋
梁志勇
牛硕
Current Assignee
Inspur Software Co Ltd
Original Assignee
Inspur Software Co Ltd
Priority date
2018-09-19
Filing date
2018-09-19
Publication date
2019-01-25
Application filed by Inspur Software Co Ltd
Priority to CN201811093336.1A
Publication of CN109271365A
Current legal status: Pending


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for accelerating HBase database reads and writes based on Spark in-memory technology. The HBase database is improved by moving data computation from HBase to Spark's in-memory engine, while data continues to be stored efficiently in HBase on the HDFS architecture. The corresponding APIs are called to perform create, delete, update, and query operations, and the advantages of in-memory computing are used to meet the real-time query requirements of large-scale columnar databases in high-concurrency, low-latency scenarios. Compared with the prior art, the method retrofits the traditional HBase database with in-memory computing and significantly improves the storage and query performance for massive data. It not only effectively improves the computing performance of big data clusters and shortens the product development cycle, but also, owing to the high I/O throughput of memory, effectively improves the cluster's high-concurrency capability and stability.

Description

A method for accelerating HBase database reads and writes based on Spark in-memory technology
Technical field
The present invention relates to the field of big data computing, and specifically to a method for accelerating HBase database reads and writes based on Spark in-memory technology.
Background art
In the big data era, hundreds of millions of records must be stored and computed every day. HBase, an outstanding column-oriented storage component, offers good storage, computing, stability, and parallel scalability, and is used ever more widely, but it still faces some problems, such as poor support for mixed high-concurrency reads and writes, expensive full-table scans, and insufficient support for complex query syntax. Spark, a high-performance in-memory computing framework, has great advantages in real-time computing.
Summary of the invention
The technical task of the invention is to provide a method for accelerating HBase database reads and writes based on Spark in-memory technology.
The technical task of the invention is achieved as follows:
A method for accelerating HBase database reads and writes based on Spark in-memory technology: the HBase database is improved by moving data computation from HBase to Spark's in-memory engine, while data continues to be stored efficiently in HBase on the HDFS architecture; the corresponding APIs are called to perform create, delete, update, and query operations, and the advantages of in-memory computing are used to meet the real-time query requirements of large-scale columnar databases in high-concurrency, low-latency scenarios.
Calling the corresponding APIs to perform create, delete, update, and query operations includes using the Scala or Java language to call those APIs.
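The create, delete, update, and query operations described above act on HBase's logical data model: rows kept sorted by row key, each holding `family:qualifier` cells. The toy model below illustrates those semantics in plain Scala so that it runs without a cluster. `ToyTable` is a hypothetical illustration only: it does not use the HBase client classes (`Put`, `Get`, `Delete`, `Scan`) and ignores timestamps, versions, and regions.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical toy model of one HBase table's logical layout; NOT the HBase client API.
// Row keys stay sorted; String order matches HBase's byte order for ASCII keys.
final class ToyTable {
  // row key -> ("family:qualifier" -> value)
  private var rows = TreeMap.empty[String, Map[String, String]]

  // Create or update one cell. Overwriting models an update; real HBase
  // instead writes a newer version of the cell.
  def put(row: String, family: String, qualifier: String, value: String): Unit = {
    val cells = rows.getOrElse(row, Map.empty[String, String])
    rows = rows.updated(row, cells.updated(s"$family:$qualifier", value))
  }

  // Read one cell, if it exists.
  def get(row: String, family: String, qualifier: String): Option[String] =
    rows.get(row).flatMap(_.get(s"$family:$qualifier"))

  // Delete a whole row.
  def deleteRow(row: String): Unit =
    rows = rows - row

  // Query: row keys in the half-open range [startRow, stopRow), like a Scan.
  def scan(startRow: String, stopRow: String): List[String] =
    rows.range(startRow, stopRow).keys.toList
}
```

For example, after putting rows `row2`, `row1`, and `row3` in that order, `scan("row1", "row3")` returns `List(row1, row2)`, mirroring the sorted, stop-row-exclusive behaviour of an HBase scan.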
The method comprises the following specific steps:
Step 1): Configure the JAR dependencies required by HBase so that Spark can load the HBase classes;
Step 2): Create the table with the HBase shell: create 'access_log', 'info' (table access_log with column family info);
Step 3): Start the Spark shell by running bin/spark-shell --master yarn --deploy-mode client --num-executors 5 --executor-memory 500m --executor-cores 2;
Step 4): Modify the configuration: the Spark application must connect to the ZooKeeper cluster and then access the HBase cluster through ZooKeeper;
Step 5): Write data into HBase with bulk load: importing a 45 MB access_log.log file into HBase took 7 seconds.
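The executor flags in step 3) determine how much parallelism and memory the Spark shell requests from YARN. A minimal sketch of that arithmetic, assuming each executor receives exactly the requested heap and ignoring YARN's per-container memory overhead; `ExecutorRequest` and its members are hypothetical helpers for illustration, not part of Spark's API.

```scala
// Hypothetical helper modelling the resource flags passed to spark-shell in step 3).
final case class ExecutorRequest(numExecutors: Int, memoryMb: Int, cores: Int) {
  require(numExecutors > 0 && memoryMb > 0 && cores > 0, "all values must be positive")

  // Maximum number of tasks that can run at the same time across all executors.
  def taskSlots: Int = numExecutors * cores

  // Total executor heap requested, excluding YARN memory overhead.
  def totalHeapMb: Int = numExecutors * memoryMb

  // Render the request as spark-shell command-line arguments.
  def toArgs: Seq[String] = Seq(
    "--master", "yarn",
    "--deploy-mode", "client",
    "--num-executors", numExecutors.toString,
    "--executor-memory", s"${memoryMb}m",
    "--executor-cores", cores.toString
  )
}
```

With the values from step 3), `ExecutorRequest(5, 500, 2)` yields 10 concurrent task slots and 2500 MB of executor heap, and `toArgs` reproduces the flags shown in that step.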
In step 1), the HBase JAR dependencies are configured for Spark 1.0 or later.
In step 4), accessing the HBase cluster through ZooKeeper includes adding the hbase-site.xml file to the classpath.
In step 4), accessing the HBase cluster through ZooKeeper includes setting the parameters in an HBaseConfiguration instance.
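Either option in step 4) ends with the client knowing `hbase.zookeeper.quorum` and `hbase.zookeeper.property.clientPort`, which together form a ZooKeeper connect string such as `slave1:2181,slave2:2181,slave3:2181`. A minimal sketch of that combination, assuming hosts written without a port use the client port and that stray spaces should be trimmed; `zkConnectString` is a hypothetical helper that approximates what the HBase client does internally, not an HBase method.

```scala
object ZkQuorum {
  // Hypothetical helper (not an HBase method): combine hbase.zookeeper.quorum and
  // hbase.zookeeper.property.clientPort into a ZooKeeper connect string.
  def zkConnectString(quorum: String, clientPort: Int): String =
    quorum
      .split(",")
      .map(_.trim)        // tolerate spacing such as "slave1, slave2"
      .filter(_.nonEmpty) // ignore empty entries
      .map(host => if (host.contains(":")) host else s"$host:$clientPort")
      .mkString(",")
}
```

Trimming matters because a quorum value written as `slave1, slave2` would otherwise produce a host name with a leading space.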
Compared with the prior art, the method of the invention retrofits the traditional HBase database with in-memory computing and significantly improves the storage and query performance for massive data. Using Spark on HBase, it not only effectively improves the computing performance of the big data cluster and shortens the product development cycle, but also, owing to the high I/O throughput of memory, effectively improves the cluster's high-concurrency capability and stability.
Specific embodiment
Embodiment 1:
A method for accelerating HBase database reads and writes based on Spark in-memory technology: the HBase database is improved by moving data computation from HBase to Spark's in-memory engine, while data continues to be stored efficiently in HBase on the HDFS architecture. The corresponding APIs are called from Scala or Java to perform create, delete, update, and query operations, and the advantages of in-memory computing are used to meet the real-time query requirements of large-scale columnar databases in high-concurrency, low-latency scenarios.
The method comprises the following specific steps:
Step 1): For Spark 1.0 or later, configure the JAR dependencies required by HBase so that Spark can load the HBase classes. For example, add the configuration item spark.driver.extraClassPath /opt/hadoop/data/lib/* to spark-defaults.conf, where the /opt/hadoop/data/lib/ directory holds the JARs that HBase depends on; because executors also load HBase classes, spark.executor.extraClassPath usually needs the same setting;
Step 2): Create the table with the HBase shell: create 'access_log', 'info';
Step 3): Start the Spark shell by running bin/spark-shell --master yarn --deploy-mode client --num-executors 5 --executor-memory 500m --executor-cores 2;
Step 4): Modify the configuration. The Spark application must connect to the ZooKeeper cluster and then access the HBase cluster through ZooKeeper. There are two ways to do this: one is to add the hbase-site.xml file to the classpath; the other is to set the parameters in an HBaseConfiguration instance. For example:

    import org.apache.hadoop.hbase.HBaseConfiguration

    // Point the HBase client at the ZooKeeper ensemble that coordinates the HBase cluster
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", "slave1,slave2,slave3")
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    conf.set("zookeeper.znode.parent", "/hbase")
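Step 5): Write data into HBase with bulk load. Importing the 45 MB access_log.log file into HBase took 7 seconds, compared with 7 minutes using the traditional insertion mode, roughly a 60-fold speedup. For example:

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue}
    import org.apache.hadoop.hbase.client.HTable
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.{HFileOutputFormat, LoadIncrementalHFiles, TableInputFormat}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.hadoop.mapreduce.Job

    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", "slave1,slave2,slave3")
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    conf.set("zookeeper.znode.parent", "/hbase")
    conf.set(TableInputFormat.INPUT_TABLE, "access_log")
    // HTable and HFileOutputFormat are the pre-2.0 HBase APIs
    val table = new HTable(conf, "access_log")

    // lazy keeps spark-shell from printing the Job, whose toString fails before it runs
    lazy val job = Job.getInstance(conf)
    job.setMapOutputKeyClass(classOf[ImmutableBytesWritable])
    job.setMapOutputValueClass(classOf[KeyValue])
    HFileOutputFormat.configureIncrementalLoad(job, table)

    // Each log line becomes one info:value cell; the 1-based line index is the row key
    val rdd = sc.textFile("/opt/hadoop/data/access.log").zipWithIndex().map { x =>
      val rowKey = Bytes.toBytes(x._2 + 1)
      val kv = new KeyValue(rowKey, "info".getBytes(), "value".getBytes(), x._1.getBytes())
      (new ImmutableBytesWritable(rowKey), kv)
    }

    // Write HFiles with the job's configuration so the settings from
    // configureIncrementalLoad apply, then move the HFiles into the table's regions
    rdd.saveAsNewAPIHadoopFile("/tmp/data1", classOf[ImmutableBytesWritable],
      classOf[KeyValue], classOf[HFileOutputFormat], job.getConfiguration)
    val bulkLoader = new LoadIncrementalHFiles(conf)
    bulkLoader.doBulkLoad(new Path("/tmp/data1"), table)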
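Step 5) relies on each HFile receiving its cells in ascending row-key order, and the embodiment's row keys (`Bytes.toBytes(index + 1)`, an 8-byte big-endian `Long`) sort in the same order as the numeric index for non-negative values. The pure-Scala sketch below checks that property so it runs without HBase on the classpath; `toBigEndian` and `compareUnsigned` are hypothetical stand-ins for `Bytes.toBytes(Long)` and HBase's unsigned lexicographic byte comparison, and `strictlyAscending` is likewise a hypothetical helper.

```scala
object RowKeyOrder {
  // Stand-in for Bytes.toBytes(Long): 8-byte big-endian encoding.
  def toBigEndian(v: Long): Array[Byte] =
    Array.tabulate(8)(i => (v >>> (8 * (7 - i))).toByte)

  // Stand-in for HBase-style comparison: bytes compared as unsigned, left to right.
  def compareUnsigned(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val c = (a(i) & 0xff) - (b(i) & 0xff)
      if (c != 0) return c
      i += 1
    }
    a.length - b.length
  }

  // True if every key sorts after the one before it.
  def strictlyAscending(keys: Seq[Array[Byte]]): Boolean =
    keys.zip(keys.drop(1)).forall { case (x, y) => compareUnsigned(x, y) < 0 }
}
```

Because `zipWithIndex` numbers lines in ascending order within each partition, each partition's output arrives already sorted. Signed values would break this: `-1L` encodes as eight `0xff` bytes and sorts after every positive key.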
A person skilled in the art can readily implement the present invention from the specific embodiments above. It should be understood, however, that the invention is not limited to these embodiments; on the basis of the disclosed embodiments, a person skilled in the art may freely combine different technical features to realize different technical solutions.

Claims (6)

1. A method for accelerating HBase database reads and writes based on Spark in-memory technology, characterized in that the HBase database is improved: data computation is moved from HBase to Spark's in-memory engine, data is stored efficiently in HBase on the HDFS architecture, the corresponding APIs are called to perform create, delete, update, and query operations, and the advantages of in-memory computing are used to meet the real-time query requirements of large-scale columnar databases in high-concurrency, low-latency scenarios.
2. The method for accelerating HBase database reads and writes based on Spark in-memory technology according to claim 1, characterized in that calling the corresponding APIs to perform create, delete, update, and query operations includes using the Scala or Java language to call those APIs.
3. The method for accelerating HBase database reads and writes based on Spark in-memory technology according to claim 1, characterized in that the method comprises the following specific steps:
Step 1): Configure the JAR dependencies required by HBase so that Spark can load the HBase classes;
Step 2): Create the table with the HBase shell: create 'access_log', 'info';
Step 3): Start the Spark shell by running bin/spark-shell --master yarn --deploy-mode client --num-executors 5 --executor-memory 500m --executor-cores 2;
Step 4): Modify the configuration: the Spark application must connect to the ZooKeeper cluster and then access the HBase cluster through ZooKeeper;
Step 5): Write data into HBase with bulk load: importing a 45 MB access_log.log file into HBase took 7 seconds.
4. The method for accelerating HBase database reads and writes based on Spark in-memory technology according to claim 3, characterized in that, in step 1), the HBase JAR dependencies are configured for Spark 1.0 or later.
5. The method for accelerating HBase database reads and writes based on Spark in-memory technology according to claim 3, characterized in that, in step 4), accessing the HBase cluster through ZooKeeper includes adding the hbase-site.xml file to the classpath.
6. The method for accelerating HBase database reads and writes based on Spark in-memory technology according to claim 3, characterized in that, in step 4), accessing the HBase cluster through ZooKeeper includes setting the parameters in an HBaseConfiguration instance.
CN201811093336.1A 2018-09-19 2018-09-19 A method for accelerating HBase database reads and writes based on Spark in-memory technology Pending CN109271365A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811093336.1A CN109271365A (en) 2018-09-19 2018-09-19 A method for accelerating HBase database reads and writes based on Spark in-memory technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811093336.1A CN109271365A (en) 2018-09-19 2018-09-19 A method for accelerating HBase database reads and writes based on Spark in-memory technology

Publications (1)

Publication Number Publication Date
CN109271365A (en) 2019-01-25

Family

ID=65198063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811093336.1A Pending CN109271365A (en) 2018-09-19 2018-09-19 A method for accelerating HBase database reads and writes based on Spark in-memory technology

Country Status (1)

Country Link
CN (1) CN109271365A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617211A (en) * 2013-11-20 2014-03-05 浪潮电子信息产业股份有限公司 HBase loaded data importing method
CN104021194A (en) * 2014-06-13 2014-09-03 浪潮(北京)电子信息产业有限公司 Mixed type processing system and method oriented to industry big data diversity application
US20180196867A1 (en) * 2017-01-09 2018-07-12 Alexander WIESMAIER System, method and computer program product for analytics assignment
CN108491277A (en) * 2017-12-28 2018-09-04 华南师范大学 A kind of real-time hot spot collaborative filtering of students in middle and primary schools' education resource and the method for recommendation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
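FAYSON: "Using Spark to quickly import data into HBase via BulkLoad" (使用Spark通过BulkLoad快速导入数据到HBase), Hadoop实操 (Hadoop in Practice) *
胡沛, 韩璞 (Hu Pei, Han Pu): "Research on Big Data Technology and Applications" (大数据技术及应用探究), 31 August 2018 *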

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977160A (en) * 2019-03-28 2019-07-05 上海中通吉网络技术有限公司 Data manipulation method, device, equipment and storage medium
CN110287172A (en) * 2019-07-01 2019-09-27 四川新网银行股份有限公司 A method of formatting HBase data
CN110287172B (en) * 2019-07-01 2023-05-02 四川新网银行股份有限公司 Method for formatting HBase data
CN111159112A (en) * 2019-12-20 2020-05-15 新华三大数据技术有限公司 Data processing method and system
CN112100182A (en) * 2020-09-27 2020-12-18 中国建设银行股份有限公司 Data warehousing processing method and device and server

Similar Documents

Publication Publication Date Title
CN109271365A (en) A method for accelerating HBase database reads and writes based on Spark in-memory technology
US8868576B1 (en) Storing files in a parallel computing system based on user-specified parser function
CN106156278B (en) Database data reading and writing method and device
WO2020029844A1 (en) Blockchain node and transaction method
KR20170019352A (en) Data query method and apparatus
US10949386B2 (en) Apparatus and method for accessing data from a database as a file
Dede et al. Processing Cassandra datasets with Hadoop-streaming based approaches
CN105095247B (en) symbol data analysis method and system
CA2846417A1 (en) Shared cache used to provide zero copy memory mapped database
CN106970929A (en) Data lead-in method and device
CN106919697B (en) Method for simultaneously importing data into multiple Hadoop assemblies
CN103699656A (en) GPU-based mass-multimedia-data-oriented MapReduce platform
US9767107B1 (en) Parallel file system with metadata distributed across partitioned key-value store
CN105573967A (en) Multi-format file online browsing method and system
Xu et al. ZQL: a unified middleware bridging both relational and NoSQL databases
CN115237599A (en) Rendering task processing method and device
US20170228423A1 (en) Declarative partitioning for data collection queries
CN108319604B (en) Optimization method for association of large and small tables in hive
CN105677579B (en) Data access method in caching system and system
US11030177B1 (en) Selectively scanning portions of a multidimensional index for processing queries
CN109033295A (en) The merging method and device of super large data set
CN107562943B (en) Data calculation method and system
US20210149960A1 (en) Graph Data Storage Method, System and Electronic Device
Sun et al. Co-kv: A collaborative key-value store using near-data processing to improve compaction for the lsm-tree
Kiraz et al. Iot data storage: Relational & non-relational database management systems performance comparison

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190125