CN109271365A - Method for accelerating HBase database reads and writes based on Spark in-memory technology - Google Patents
Method for accelerating HBase database reads and writes based on Spark in-memory technology
- Publication number
- CN109271365A CN109271365A CN201811093336.1A CN201811093336A CN109271365A CN 109271365 A CN109271365 A CN 109271365A CN 201811093336 A CN201811093336 A CN 201811093336A CN 109271365 A CN109271365 A CN 109271365A
- Authority
- CN
- China
- Prior art keywords
- hbase
- spark
- writing
- memory
- acceleration reading
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for accelerating reads and writes to an HBase database based on Spark in-memory technology. The HBase database is improved: the computation of data is moved from HBase to Spark in-memory computing, while efficient data storage continues to use HBase on the HDFS architecture; the corresponding APIs are called to perform insert, delete, update, and query operations; and the advantages of in-memory computing are used to enable a large-scale columnar database to meet real-time query requirements in high-concurrency, low-latency scenarios. Compared with the prior art, this method transforms the traditional HBase database with in-memory computing and brings a significant performance boost to the storage and querying of massive data. It not only effectively improves the computing performance of a big data cluster and shortens the product development cycle, but also, through the high I/O throughput of memory, effectively improves the cluster's high-concurrency capability and stability.
Description
Technical field
The present invention relates to the field of big data computing technology, and specifically to a method for accelerating HBase database reads and writes based on Spark in-memory technology.
Background art
In the big data era, hundreds of millions of records may be stored and computed every day. HBase, an excellent column-oriented storage component, offers good storage and computing capability, stability, and parallel scalability, and its range of use keeps widening; nevertheless it still faces some problems, such as interference between mixed high-concurrency reads and writes, low full-table scan performance, and insufficient support for complex query syntax. Spark, as a high-performance in-memory computing framework, has great advantages in the field of real-time computing.
Summary of the invention
The technical task of the present invention is to provide a method for accelerating HBase database reads and writes based on Spark in-memory technology.
The technical task of the present invention is achieved in the following manner:
A method for accelerating HBase database reads and writes based on Spark in-memory technology: the HBase database is improved so that the computation of data is moved from HBase to Spark in-memory computing, while efficient data storage continues to use HBase on the HDFS architecture; the corresponding APIs are called to perform insert, delete, update, and query operations; and the advantages of in-memory computing are used to enable a large-scale columnar database to meet real-time query requirements in high-concurrency, low-latency scenarios.
Calling the corresponding APIs to perform insert, delete, update, and query operations includes calling those APIs using the Scala or Java language.
The method operates through the following specific steps:
Step 1) configure the HBase Jar package dependencies to ensure that HBase methods can be accessed normally by Spark;
Step 2) create the table using the HBase shell: create 'access_log', 'info';
Step 3) start the Spark shell by executing bin/spark-shell --master yarn --deploy-mode client --num-executors 5 --executor-memory 500m --executor-cores 2;
Step 4) modify the configuration files; the Spark application must connect to the ZooKeeper cluster and then access the HBase cluster through ZooKeeper;
Step 5) bulk-load the data into HBase: importing a 45 MB access_log.log file into HBase took 7 seconds.
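Steps 2) and 3) above can be sketched as the following commands (a minimal sketch: the table and column-family names follow the embodiment, while host, executor count, and memory settings are illustrative values to be tuned per cluster):

```shell
# Step 2: create the target table 'access_log' with one column family 'info'
echo "create 'access_log', 'info'" | hbase shell

# Step 3: start an interactive Spark shell on YARN with 5 small executors
bin/spark-shell --master yarn --deploy-mode client \
  --num-executors 5 --executor-memory 500m --executor-cores 2
```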
In step 1), the configuration of the HBase Jar package dependencies is based on Spark 1.0 or above.
In step 4), accessing the HBase cluster through ZooKeeper includes adding the hbase-site.xml file to the classpath.
In step 4), accessing the HBase cluster through ZooKeeper alternatively includes setting the connection parameters in an HBaseConfiguration instance.
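As an illustration of the classpath approach, an hbase-site.xml on the classpath would carry the ZooKeeper connection settings; the host names slave1..slave3 below follow the example in the embodiment and are not prescribed by the method:

```xml
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>slave1,slave2,slave3</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
</configuration>
```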
Compared with the prior art, this method for accelerating HBase database reads and writes based on Spark in-memory technology transforms the traditional HBase database with in-memory computing and brings a significant performance boost to the storage and querying of massive data. Using the Spark-on-HBase mode not only effectively improves the computing performance of a big data cluster and shortens the product development cycle, but also, through the high I/O throughput of memory, effectively improves the cluster's high-concurrency capability and stability.
Specific embodiment
Embodiment 1:
A method for accelerating HBase database reads and writes based on Spark in-memory technology: the HBase database is improved so that the computation of data is moved from HBase to Spark in-memory computing, while efficient data storage continues to use HBase on the HDFS architecture; the corresponding APIs are called using the Scala or Java language to perform insert, delete, update, and query operations; and the advantages of in-memory computing are used to enable a large-scale columnar database to meet real-time query requirements in high-concurrency, low-latency scenarios.
The method operates through the following specific steps:
Step 1) based on Spark 1.0 or above, configure the HBase Jar package dependencies to ensure that HBase methods can be accessed normally by Spark. For example, add the configuration item spark.driver.extraClassPath /opt/hadoop/data/lib/* to spark-defaults.conf, where the Jar packages that HBase depends on are placed under the path /opt/hadoop/data/lib/;
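The dependency configuration of step 1) amounts to entries in spark-defaults.conf; the path /opt/hadoop/data/lib/ is the example location from the embodiment, and any directory holding the HBase client jars would serve. The embodiment only mentions the driver classpath; the matching executor entry is a common companion setting and is included here as an assumption:

```
spark.driver.extraClassPath   /opt/hadoop/data/lib/*
spark.executor.extraClassPath /opt/hadoop/data/lib/*
```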
Step 2) create the table using the HBase shell: create 'access_log', 'info';
Step 3) start the Spark shell by executing bin/spark-shell --master yarn --deploy-mode client --num-executors 5 --executor-memory 500m --executor-cores 2;
Step 4) modify the configuration files; the Spark application must connect to the ZooKeeper cluster and then access the HBase cluster through ZooKeeper. There are two implementation methods: one is to add the hbase-site.xml file to the classpath; the other is to set the connection parameters in an HBaseConfiguration instance.
For example:
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", "slave1,slave2,slave3")
conf.set("hbase.zookeeper.property.clientPort", "2181")
conf.set("zookeeper.znode.parent", "/hbase")
Step 5) bulk-load the data into HBase: importing the 45 MB access_log.log file into HBase took 7 seconds, versus about 7 minutes for the traditional row-by-row insert approach, a roughly 60-fold speedup.
For example (imports from the HBase and Hadoop client libraries are added for completeness):
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue}
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{HFileOutputFormat, LoadIncrementalHFiles, TableInputFormat}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
// Point the configuration at the ZooKeeper quorum and the target table
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", "slave1,slave2,slave3")
conf.set("hbase.zookeeper.property.clientPort", "2181")
conf.set("zookeeper.znode.parent", "/hbase")
conf.set(TableInputFormat.INPUT_TABLE, "access_log")
val table = new HTable(conf, "access_log")
// Configure a job that writes HFiles laid out for the table's regions
lazy val job = Job.getInstance(conf)
job.setMapOutputKeyClass(classOf[ImmutableBytesWritable])
job.setMapOutputValueClass(classOf[KeyValue])
HFileOutputFormat.configureIncrementalLoad(job, table)
// Turn each log line into a KeyValue keyed by its 1-based line number
val rdd = sc.textFile("/opt/hadoop/data/access.log").zipWithIndex().map(x => {
  val kv: KeyValue = new KeyValue(Bytes.toBytes(x._2 + 1), "info".getBytes(), "value".getBytes(), x._1.getBytes())
  (new ImmutableBytesWritable(Bytes.toBytes(x._2 + 1)), kv)
})
// Write the HFiles to a temporary directory, then bulk-load them into HBase
rdd.saveAsNewAPIHadoopFile("/tmp/data1", classOf[ImmutableBytesWritable], classOf[KeyValue], classOf[HFileOutputFormat], conf)
val bulkLoader = new LoadIncrementalHFiles(conf)
bulkLoader.doBulkLoad(new Path("/tmp/data1"), table)
Those skilled in the art can readily implement the present invention from the specific embodiments above. However, it should be understood that the present invention is not limited to the above specific embodiments. On the basis of the disclosed embodiments, those skilled in the art may combine different technical features arbitrarily to realize different technical solutions.
Claims (6)
1. A method for accelerating HBase database reads and writes based on Spark in-memory technology, characterized in that the HBase database is improved: the computation of data is moved from HBase to Spark in-memory computing, while efficient data storage continues to use HBase on the HDFS architecture; the corresponding APIs are called to perform insert, delete, update, and query operations; and the advantages of in-memory computing are used to enable a large-scale columnar database to meet real-time query requirements in high-concurrency, low-latency scenarios.
2. The method for accelerating HBase database reads and writes based on Spark in-memory technology according to claim 1, characterized in that calling the corresponding APIs to perform insert, delete, update, and query operations comprises calling those APIs using the Scala or Java language.
3. The method for accelerating HBase database reads and writes based on Spark in-memory technology according to claim 1, characterized in that the method operates through the following specific steps:
Step 1) configure the HBase Jar package dependencies to ensure that HBase methods can be accessed normally by Spark;
Step 2) create the table using the HBase shell: create 'access_log', 'info';
Step 3) start the Spark shell by executing bin/spark-shell --master yarn --deploy-mode client --num-executors 5 --executor-memory 500m --executor-cores 2;
Step 4) modify the configuration files; the Spark application must connect to the ZooKeeper cluster and then access the HBase cluster through ZooKeeper;
Step 5) bulk-load the data into HBase: importing a 45 MB access_log.log file into HBase takes 7 seconds.
4. The method for accelerating HBase database reads and writes based on Spark in-memory technology according to claim 3, characterized in that in step 1) the configuration of the HBase Jar package dependencies is based on Spark 1.0 or above.
5. The method for accelerating HBase database reads and writes based on Spark in-memory technology according to claim 3, characterized in that in step 4), accessing the HBase cluster through ZooKeeper comprises adding the hbase-site.xml file to the classpath.
6. The method for accelerating HBase database reads and writes based on Spark in-memory technology according to claim 3, characterized in that in step 4), accessing the HBase cluster through ZooKeeper comprises setting the connection parameters in an HBaseConfiguration instance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811093336.1A CN109271365A (en) | 2018-09-19 | 2018-09-19 | Method for accelerating HBase database reads and writes based on Spark in-memory technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811093336.1A CN109271365A (en) | 2018-09-19 | 2018-09-19 | Method for accelerating HBase database reads and writes based on Spark in-memory technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109271365A true CN109271365A (en) | 2019-01-25 |
Family
ID=65198063
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811093336.1A Pending CN109271365A (en) | 2018-09-19 | 2018-09-19 | Method for accelerating HBase database reads and writes based on Spark in-memory technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109271365A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977160A (en) * | 2019-03-28 | 2019-07-05 | 上海中通吉网络技术有限公司 | Data manipulation method, device, equipment and storage medium |
CN110287172A (en) * | 2019-07-01 | 2019-09-27 | 四川新网银行股份有限公司 | A method of formatting HBase data |
CN111159112A (en) * | 2019-12-20 | 2020-05-15 | 新华三大数据技术有限公司 | Data processing method and system |
CN112100182A (en) * | 2020-09-27 | 2020-12-18 | 中国建设银行股份有限公司 | Data warehousing processing method and device and server |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103617211A (en) * | 2013-11-20 | 2014-03-05 | 浪潮电子信息产业股份有限公司 | HBase loaded data importing method |
CN104021194A (en) * | 2014-06-13 | 2014-09-03 | 浪潮(北京)电子信息产业有限公司 | Mixed type processing system and method oriented to industry big data diversity application |
US20180196867A1 (en) * | 2017-01-09 | 2018-07-12 | Alexander WIESMAIER | System, method and computer program product for analytics assignment |
CN108491277A (en) * | 2017-12-28 | 2018-09-04 | 华南师范大学 | A kind of real-time hot spot collaborative filtering of students in middle and primary schools' education resource and the method for recommendation |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103617211A (en) * | 2013-11-20 | 2014-03-05 | 浪潮电子信息产业股份有限公司 | HBase loaded data importing method |
CN104021194A (en) * | 2014-06-13 | 2014-09-03 | 浪潮(北京)电子信息产业有限公司 | Mixed type processing system and method oriented to industry big data diversity application |
US20180196867A1 (en) * | 2017-01-09 | 2018-07-12 | Alexander WIESMAIER | System, method and computer program product for analytics assignment |
CN108491277A (en) * | 2017-12-28 | 2018-09-04 | 华南师范大学 | A kind of real-time hot spot collaborative filtering of students in middle and primary schools' education resource and the method for recommendation |
Non-Patent Citations (2)
Title |
---|
FAYSON: "Quickly Importing Data into HBase via BulkLoad with Spark" (使用Spark通过BulkLoad快速导入数据到HBase), Hadoop in Practice (《HADOOP实操》) * |
HU Pei, HAN Pu: "Exploration of Big Data Technology and Applications" (《大数据技术及应用探究》), 31 August 2018 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977160A (en) * | 2019-03-28 | 2019-07-05 | 上海中通吉网络技术有限公司 | Data manipulation method, device, equipment and storage medium |
CN110287172A (en) * | 2019-07-01 | 2019-09-27 | 四川新网银行股份有限公司 | A method of formatting HBase data |
CN110287172B (en) * | 2019-07-01 | 2023-05-02 | 四川新网银行股份有限公司 | Method for formatting HBase data |
CN111159112A (en) * | 2019-12-20 | 2020-05-15 | 新华三大数据技术有限公司 | Data processing method and system |
CN112100182A (en) * | 2020-09-27 | 2020-12-18 | 中国建设银行股份有限公司 | Data warehousing processing method and device and server |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109271365A (en) | Method for accelerating HBase database reads and writes based on Spark in-memory technology | |
US8868576B1 (en) | Storing files in a parallel computing system based on user-specified parser function | |
CN106156278B (en) | Database data reading and writing method and device | |
WO2020029844A1 (en) | Blockchain node and transaction method | |
KR20170019352A (en) | Data query method and apparatus | |
US10949386B2 (en) | Apparatus and method for accessing data from a database as a file | |
Dede et al. | Processing Cassandra datasets with Hadoop-streaming based approaches | |
CN105095247B (en) | symbol data analysis method and system | |
CA2846417A1 (en) | Shared cache used to provide zero copy memory mapped database | |
CN106970929A (en) | Data lead-in method and device | |
CN106919697B (en) | Method for simultaneously importing data into multiple Hadoop assemblies | |
CN103699656A (en) | GPU-based mass-multimedia-data-oriented MapReduce platform | |
US9767107B1 (en) | Parallel file system with metadata distributed across partitioned key-value store | |
CN105573967A (en) | Multi-format file online browsing method and system | |
Xu et al. | ZQL: a unified middleware bridging both relational and NoSQL databases | |
CN115237599A (en) | Rendering task processing method and device | |
US20170228423A1 (en) | Declarative partitioning for data collection queries | |
CN108319604B (en) | Optimization method for association of large and small tables in hive | |
CN105677579B (en) | Data access method in caching system and system | |
US11030177B1 (en) | Selectively scanning portions of a multidimensional index for processing queries | |
CN109033295A (en) | The merging method and device of super large data set | |
CN107562943B (en) | Data calculation method and system | |
US20210149960A1 (en) | Graph Data Storage Method, System and Electronic Device | |
Sun et al. | Co-kv: A collaborative key-value store using near-data processing to improve compaction for the lsm-tree | |
Kiraz et al. | Iot data storage: Relational & non-relational database management systems performance comparison |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190125 |