CN103646073A - Condition query optimizing method based on HBase table - Google Patents

Publication number
CN103646073A
Authority
CN
China
Prior art keywords
region
hbase
data
query
condition query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310667847.0A
Other languages
Chinese (zh)
Inventor
郭美思
吴楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201310667847.0A
Publication of CN103646073A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a condition query optimization method based on an HBase table. Drawing on the data analysis and processing capability and the parallel computing characteristics of a distributed computing framework, the method runs condition queries on an HBase table within the MapReduce computing framework, which improves query efficiency. The optimization is achieved mainly through Region pre-allocation, RowKey design, and a MapReduce module. Compared with the prior art, the method improves the efficiency of condition queries and provides an optimization approach for them; its query process is simple and supports parallel computation, so the method is efficient, practical, and easy to popularize.

Description

A condition query optimization method based on an HBase table
Technical field
The present invention relates to the field of communication and information technology, and specifically to a condition query optimization method based on an HBase table.
Background art
With the rapid growth of massive data, big data has become a focus of attention. Among the technologies that big data requires, distributed file systems and distributed databases are well suited to it. HBase is a distributed, column-oriented, open-source database that uses Hadoop HDFS as its file storage system. As its performance and stability have continued to improve, HBase has gradually become one of the standards in the big-data NoSQL field. It has been adopted by many large companies, such as Facebook, Twitter, Adobe, Cloudera, and IBM. Optimizing condition queries on HBase tables is therefore very important.
At present, HBase queries are implemented in two ways: the Get method, which obtains a single unique record by specifying its RowKey, and the Scan method, which obtains a batch of records that match specified conditions. Condition queries use the Scan method. A scan can be sped up with the setCaching and setBatch methods (trading space for time), and its range can be limited with setStartRow and setEndRow; the smaller the range, the better the performance. With a well-designed RowKey, the records of a batch result set are stored together (they should fall in the same Region), so traversing the results performs well.
In a distributed computing framework, parallel computation greatly improves efficiency, and the distributed file system HDFS can store large amounts of data and scales well. For these reasons, a condition query optimization method based on HBase tables is proposed. The scheme imports the source data into an HBase table using Region pre-allocation and a well-designed RowKey, uses the MapReduce framework to run condition queries on the HBase table and improve query efficiency, and sets the corresponding cluster configuration parameters to achieve the optimization.
Summary of the invention
The technical task of the present invention is to overcome the deficiencies of the prior art by providing a condition query optimization method based on an HBase table.
The technical solution of the present invention is implemented as follows. The specific implementation process of the condition query optimization method based on an HBase table is:
Determine the number of pre-allocated table partitions (Regions) according to the data volume of the table and the configuration of the cluster: in Region pre-allocation, the number of Regions is determined by the volume of data imported into HBase and the scale of the distributed cluster, and the Regions are then designed and allocated in advance according to the data volume and the RowKey rules. As an HBase Region grows, it eventually reaches a size threshold; once the threshold is reached, the Region splits automatically. More Regions help guarantee concurrency performance;
Design the RowKey according to the application of the condition query: if the RowKey of a record falls between the start key and the end key of a Region, the record is stored in that Region;
Improve query performance by means of Region pre-allocation, the RowKey, and the distributed programming framework: in the MapReduce computing framework, the Map module queries the HBase table for the records that satisfy the condition; the optimization scheme for the condition query is designed according to the characteristics of the query and of the RowKey; the StartKey and EndKey parameters of Scan are used to improve search performance; and the number of Map tasks is configured according to the parameters of the distributed cluster, so as to achieve the optimization.
The MapReduce programming framework carries out the processing that yields the final condition query result: the Map module queries the HBase table according to the condition and obtains the matching records, then processes those records into the desired output format; the framework also provides a programming interface for traversal.
The detailed steps of the optimization method are:
First, for Region pre-allocation, set the ZooKeeper properties according to the cluster environment and configuration, define the name of the HBase table to be imported into and the name of its column family, and then create the HBase table using a purpose-written Split function. The Split function determines the number of splits from the format of the source data and represents the Regions as a two-dimensional array. Once Region pre-allocation is complete, HFile files are generated from the source data according to the RowKey design. Finally, the completebulkload command completes the import, at which point the data has been loaded into the HBase table in the intended format.
Write the MapReduce program: the MapReduce program written by the user is a job. The user configures the job and submits it to the framework, which breaks it down into a series of map tasks and reduce tasks and is responsible for distributing and executing them. In this query optimization only the map stage needs to do work, and the results that satisfy the user's query are output at the end. The output process is:
According to the query condition specified by the user program, first format the input query parameters by concatenating them into a form that matches the RowKey of the HBase table, then set the start and end positions of the query through the startKey and endKey of the scan, and set the other scan parameters. Next, submit the job to the Hadoop framework with the initTableMapperJob function; in this process each data split produced by the framework corresponds to one Map task. Each Map task iterates over its records and processes each one as specified in the Map function.
In the map function, first obtain the RowKey of each matching record with value.getRow() and the data of the first column with value.value(). Then process the extracted RowKey and first-column data into the output format the user expects; the substring function can be used to extract the required strings and arrange them in the intended format. The matching results are written directly to the specified file, and the output path is set with the setOutputPath function.
Compared with the prior art, the present invention has the following beneficial effects:
The condition query optimization method based on an HBase table of the present invention supports parallel computation. Through a well-designed RowKey and pre-allocated Regions, together with an implementation of the map interface for condition queries, it achieves the optimization: the HDFS of the Hadoop distributed cluster provides ample storage capacity, and map tasks can read data locally, which reduces the resources consumed by data transfer. In the Hadoop distributed cluster, the number of map tasks and the corresponding HBase parameters can be set so that operations are processed in parallel, which improves the efficiency of condition queries and provides an optimization approach for them. The query process of the method is concise and supports parallel computation, so the method is efficient, practical, and easy to popularize.
Description of the drawings
Fig. 1 is a flowchart of the HBase table condition query optimization of the present invention.
Fig. 2 is a structural diagram of the HRegionServer.
Fig. 3 is a flowchart of the computing framework without a reduce stage.
Detailed description of the embodiments
The condition query optimization method based on an HBase table of the present invention is described in detail below with reference to the accompanying drawings.
HBase is a distributed, column-oriented, open-source database that uses Hadoop HDFS as its file storage system. As its performance and stability have continued to improve, HBase has gradually become one of the standards in the big-data NoSQL field, so optimizing queries on HBase tables is very important. For condition queries on HBase tables, the present invention provides an optimization method that covers three aspects: Region pre-allocation when the table is created, the design of the RowKey, and the design of the query scheme. The number of pre-allocated Regions is determined from the data volume of the table and the configuration of the cluster, and a RowKey designed for the condition query distributes the source data evenly across these Regions. This prepares the HBase table for condition queries and provides a suitable query environment. The RowKey is designed according to the application of the condition query. In the Map stage, query performance is improved by means of Region pre-allocation, the RowKey, and the distributed programming framework: using the parameters passed in by the condition query, this stage processes the results of the Scan and converts the matching records into the format the user expects. Because there is no Reduce stage, the results can be written directly to the output directory, which lessens the constraint of network bandwidth on data transfer.
The specific implementation process of the method is:
Determine the number of pre-allocated table partitions (Regions) according to the data volume of the table and the configuration of the cluster: in Region pre-allocation, the number of Regions is determined by the volume of data imported into HBase and the scale of the distributed cluster, and the Regions are then designed and allocated in advance according to the data volume and the RowKey rules. This significantly reduces the number of Region splits and may avoid them entirely. As an HBase Region grows, it eventually reaches a size threshold; once the threshold is reached, the Region splits automatically. More Regions help guarantee concurrency performance. The number of Regions should therefore be chosen sensibly for the scale of the distributed cluster and created programmatically to suit the application, which improves concurrency.
Design the RowKey according to the application of the condition query: if the RowKey of a record falls between the start key and the end key of a Region, the record is stored in that Region. If, during some period, a large share of the row keys fall within one particular key range, the Region for that range becomes very busy while the other Regions sit largely idle, which wastes resources and hurts performance. The RowKey design should therefore take the application of the imported data into account and ensure, as far as possible, that the row keys of the data written per unit of time are evenly distributed across the Regions.
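The placement rule above (a record goes to the Region whose start key and end key bracket its row key) can be illustrated with a small Python sketch. This is not HBase code: the function name `find_region`, the three boundary keys, and the sample row keys are all hypothetical and chosen only to contrast a skewed key set with a spread one.

```python
from bisect import bisect_right
from collections import Counter

def find_region(split_keys, row_key):
    """Return the index of the region whose [start, end) range holds row_key.

    split_keys are the sorted boundaries between regions; region 0 has no
    lower bound and the last region has no upper bound.
    """
    return bisect_right(split_keys, row_key)

# Hypothetical boundaries giving four regions: [..C), [C..M), [M..T), [T..)
split_keys = [b"C", b"M", b"T"]

# Keys that share a narrow prefix all land in one region (a hotspot).
skewed = [b"QVW%03d" % i for i in range(8)]
# Keys spread over the key space are shared out among the regions.
spread = [b"AAA001", b"DXK210", b"HZZ999", b"NBC123",
          b"QVW755", b"SAA001", b"UYT042", b"ZZZ999"]

skewed_load = Counter(find_region(split_keys, k) for k in skewed)
spread_load = Counter(find_region(split_keys, k) for k in spread)
print("skewed:", dict(skewed_load))
print("spread:", dict(spread_load))
```

With the skewed keys every record falls into a single region, while the spread keys reach all four, which is the imbalance the RowKey design is meant to avoid.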
Improve query performance by means of Region pre-allocation, the RowKey, and the distributed programming framework: in the MapReduce computing framework, the Map module queries the HBase table for the records that satisfy the condition, and the optimization scheme for the condition query is designed according to the characteristics of the query and of the RowKey. The StartKey and EndKey parameters of Scan are used to improve search performance, and the number of Map tasks is configured according to the parameters of the distributed cluster, so as to achieve the optimization.
The MapReduce programming framework, intended for parallel computation over large data sets (larger than 1 TB), carries out the processing that yields the final condition query result. The Map module queries the HBase table according to the condition and obtains the matching records, then processes those records into the desired output format. No Reducer is used, which lessens the constraint of bandwidth within the cluster, and the results are written to the specified output directory. Multiple map tasks can be configured for this process, which improves processing efficiency and greatly improves performance. The MapReduce framework removes much of the difficulty of writing parallel programs and provides a programming interface for traversal.
The detailed steps of the optimization method are:
First, for Region pre-allocation, set the ZooKeeper properties according to the cluster environment and configuration, define the name of the HBase table to be imported into and the name of its column family, and then create the HBase table using a purpose-written Split function. The Split function determines the number of splits from the format of the source data and represents the Regions as a two-dimensional array. Once Region pre-allocation is complete, HFile files are generated from the source data according to the RowKey design. Because the data has been analyzed and the RowKey designed accordingly, the data in each Region is uniform: no Region ends up with far more data, or far less, than the others. Finally, the completebulkload command completes the import, at which point the data has been loaded into the HBase table in the intended format.
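The text does not say how the Split function chooses its boundaries, so the following Python sketch makes an assumption: it picks boundary keys at even quantiles of a sample of row keys. The function name `split_keys_from_sample` and the sample keys are hypothetical. The returned list of byte strings plays the role of the two-dimensional array mentioned above (an array of byte arrays).

```python
def split_keys_from_sample(sample_keys, num_regions):
    """Pick num_regions - 1 boundary keys at even quantiles of a key sample.

    Assumption: the sample is representative of the data to be imported,
    so equal slices of the sample give roughly equal regions.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    keys = sorted(set(sample_keys))
    if len(keys) < num_regions:
        raise ValueError("sample has fewer distinct keys than regions")
    # Boundary i is the first key of slice i, for slices 1..num_regions-1.
    return [keys[i * len(keys) // num_regions] for i in range(1, num_regions)]

# Twelve hypothetical sample keys divided into four regions.
sample = [b"k%02d" % i for i in range(12)]
boundaries = split_keys_from_sample(sample, 4)
print(boundaries)
```

A table created with these three boundaries would start with four Regions, each covering a quarter of the sampled keys.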
ZooKeeper, used in the technical solution above, is a formal sub-project of Hadoop. It is a reliable coordination system for large-scale distributed systems, and its functions include configuration maintenance, naming services, distributed synchronization, and group services. The goal of ZooKeeper is to encapsulate complex, error-prone key services and to offer users a system with a simple, easy-to-use interface, efficient performance, and stable functionality.
The condition query optimization method for HBase tables is implemented by writing a MapReduce program. The MapReduce program written by the user is a job. The user configures the job and submits it to the framework, which breaks it down into a series of map tasks and reduce tasks and is responsible for distributing and executing them. In this query optimization only the map stage needs to do work, and the results that satisfy the user's query are output at the end.
The condition query optimization based on an HBase table mainly implements the Map interface. The main processing flow of this module is as follows. According to the query condition specified by the user program, first format the input query parameters by concatenating them into a form that matches the RowKey of the HBase table, then set the start and end positions of the query through the startKey and endKey of the scan, and set the other scan parameters, such as Batch and Caching. Next, submit the job to the Hadoop framework with the initTableMapperJob function; in this process each data split produced by the framework corresponds to one Map task. Each Map task iterates over its records and processes each one as specified in the Map function. In the map function, first obtain the RowKey of each matching record with value.getRow() and the data of the first column with value.value(). Then process the extracted RowKey and first-column data into the output format the user expects; the substring function can be used to extract the required strings and arrange them in the intended format. To maintain performance and avoid being limited by network bandwidth, no Reduce function is used: the matching results are written directly to the specified file, and the output path can be set with the setOutputPath function.
The distributed architecture raises the degree of parallelism. The number of map tasks and the HBase parameters that govern reads are then set according to the scale of the Hadoop cluster. For example, a larger caching value benefits reads, and a larger batch value allows more data to be fetched at a time. Setting these parameters sensibly improves performance and achieves the optimization.
The embodiment is shown in Fig. 1, Fig. 2, and Fig. 3, and its specific operation is as follows:
First, deploy the distributed cluster environment. The hardware in this cluster consists of 7 servers, each with 96 GB of memory, a 24-core CPU, and 12 × 2 TB hard disks. The operating system is CentOS 6.3. The Hadoop components are installed on the servers according to the official documentation, and the HDFS, MapReduce, and HBase services are then started in the normal order. In this example, a source record has the form QVW75520121124120403222,22222,4,3. The first 6 characters of the first column are the license plate number, the next 8 are the date, and the last 9 are the hour, minute, second, and millisecond; the second column is the checkpoint number. A condition query takes a license plate number, a start date and time, and an end date and time, and finds the checkpoints that the vehicle with that plate passed during the period. The source data contains 10 billion records, so improving query efficiency is essential. Analysis shows that when the source data is imported into HBase, the RowKey can be designed to suit this kind of query: using the license plate number together with the date and time as the RowKey allows the wanted records to be extracted very quickly. The flow of the condition query optimization method based on an HBase table is shown in Fig. 1. First, the Split function yields the number of Regions, M; the Region pre-allocation program calls the Split function to create an HBase table with M Regions; the data is then imported into the HBase table according to the designed RowKey; and finally a MapReduce program is written to run the condition query and achieve the optimized result.
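The record layout described above (6 characters of license plate, 8 of date, 9 of time with milliseconds, then the checkpoint number in the second column) can be made concrete with a short Python sketch. The function name `parse_record` and the dictionary keys are hypothetical; the sample line is the one given in the text, and the meaning of the third and fourth columns is not stated in the source, so they are ignored here.

```python
def parse_record(line):
    """Split one source line into its row key parts and checkpoint number.

    Layout of the first column, as described in the text:
    6 chars plate + 8 chars date (YYYYMMDD) + 9 chars time (HHMMSSmmm).
    """
    fields = line.strip().split(",")
    first = fields[0]
    if len(first) != 23:
        raise ValueError("first column must be 23 characters, got %d" % len(first))
    plate, date, time_ms = first[:6], first[6:14], first[14:23]
    return {
        "row_key": plate + date + time_ms,  # plate followed by date and time
        "plate": plate,
        "date": date,
        "time_ms": time_ms,
        "checkpoint": fields[1],
    }

record = parse_record("QVW75520121124120403222,22222,4,3")
print(record)
```

Because the plate comes first and the timestamp follows in a fixed-width, sortable form, all records for one vehicle sort next to each other in time order.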
The HRegionServer is the most crucial module in HBase; it is mainly responsible for responding to user I/O requests and for reading and writing data in the HDFS file system. The structure of the HRegionServer is shown in Fig. 2. An HRegionServer manages a series of HRegion objects, each of which corresponds to one Region of a table, and each HRegion is made up of multiple HStores. Each HStore corresponds to the storage of one Column Family of the table. A Column Family is in effect a unit of centralized storage, so columns with similar I/O characteristics are best placed in the same Column Family, which is the most efficient arrangement. Therefore, when data is imported into HBase, the data is first distributed evenly over the M Regions by Region pre-allocation, the RowKey is then designed according to the query conditions of the application, and all the data is imported into the HBase table according to the RowKey established in advance.
In the optimized query, the client first communicates once with the RegionServer, locates the Region on that RegionServer, scans the Region, and returns a certain amount of data. That amount is specified by the Batch setting of the scan. The effect of caching is that a single communication locates the Region and then invokes the scan caching times; in other words, with these two parameters, one round trip can return caching × batch records. This clearly reduces the traffic between the client and the RegionServer.
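The effect on client traffic can be checked with simple arithmetic. The sketch below follows the model stated in the paragraph above, in which one round trip returns caching × batch records; the function name `round_trips` and the sample numbers are assumptions for illustration, and the real HBase client may count rows and cells differently.

```python
def round_trips(total_records, caching, batch):
    """Round trips needed if each trip returns caching * batch records."""
    if total_records < 0 or caching < 1 or batch < 1:
        raise ValueError("total_records must be >= 0; caching and batch >= 1")
    per_trip = caching * batch
    # Integer ceiling division avoids floating-point rounding.
    return (total_records + per_trip - 1) // per_trip

# Hypothetical result set of one million records.
print(round_trips(1_000_000, caching=1, batch=1))
print(round_trips(1_000_000, caching=100, batch=10))
```

Under this model, raising caching and batch from 1 to 100 and 10 cuts the number of client-to-server exchanges by a factor of one thousand, at the cost of more client memory per exchange.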
When writing the MapReduce program, no reduce function is used, in order to lessen the constraint of bandwidth. The flow of the computing framework without a reduce stage is shown in Fig. 3. While a MapReduce algorithm runs, there is one main control program, called the master. The master spawns many worker programs, called workers, and assigns the M map tasks to these workers for execution. A worker that has been assigned a map task reads and processes the relevant input data and parses out key/value pairs. Because there is no reduce function, the intermediate key/value pairs produced by the map() function are written directly to the output file.
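A map-only job of this kind can be modelled in a few lines of Python. This is a sequential simulation, not Hadoop code: the function name `run_map_only`, the round-robin way of forming splits, and the sample map function are all hypothetical. Each inner list stands for the output file that one map task would write directly, with no shuffle or reduce step in between.

```python
def run_map_only(records, num_map_tasks, map_fn):
    """Apply map_fn to records in num_map_tasks splits, with no reduce step.

    Returns one output list per map task. Records for which map_fn returns
    None are treated as not matching and are dropped.
    """
    if num_map_tasks < 1:
        raise ValueError("num_map_tasks must be at least 1")
    # Round-robin split: task i takes records i, i + M, i + 2M, ...
    splits = [records[i::num_map_tasks] for i in range(num_map_tasks)]
    outputs = []
    for split in splits:
        part = [out for out in (map_fn(r) for r in split) if out is not None]
        outputs.append(part)
    return outputs

def keep_even(n):
    """Hypothetical map function: emit even numbers, drop odd ones."""
    return "even:%d" % n if n % 2 == 0 else None

parts = run_map_only(list(range(10)), 3, keep_even)
print(parts)
```

Since no task needs another task's output, the tasks are independent and can run in parallel on separate workers.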
The Map function first parses the start date and the end date. The parse function of the SimpleDateFormat class converts the input parameters into dates, and the format function of the same class then renders the parsed dates in the required form, after which the start date and the license plate number can be combined into the RowKey format. In this condition query the RowKey is the license plate number followed by the date and time, so the start position of the query is determined by the license plate number and the start date and time, and the end position by the license plate number and the end date and time; these two values are assigned to the startkey and endkey of the scan respectively. Because the license plate number is fixed, the checkpoint records for the vehicle may all lie in one Region. Returned data is first cached on the client, and the caching configuration item sets the number of records that the HBase scanner fetches from the server at a time. Setting it to a sensible value reduces the time spent in next() during the scan, at the cost of the client memory that the scanner needs to hold the cached rows. In this test there are 7 servers with 96 GB of memory each, so caching can be set to a larger value to improve performance.
The job is then started with initTableMapperJob(sourceTable, scan, Mapper.class, Text.class, Text.class, job), where sourceTable is the table that was imported into HBase, that is, the table to be queried; the records that satisfy the query are found by setting startkey, endkey, and suitable caching and batch sizes in the scan. Mapper.class mainly contains the implementation of the map function, which processes the matching records. The output records have the form: QVW755 2012-11-24 10:23 22222. The RowKey is obtained with value.getRow() and the checkpoint information with value.value(), and this information is processed into the required output format: substring can be applied to the RowKey to extract the date and the time, which are then converted into the expected form according to the output format. Because there is no reduce stage, the results of the map stage can be written directly to the designated file, and the output directory can be set with setOutputPath.
Finally, the configuration files of the whole cluster are tuned, covering the parameters of the MapReduce job, such as the number of map tasks, and the parameters of HBase. A read request first looks for the data in the Memstore; if it is not found there, it looks in the BlockCache; if it is still not found, it reads from disk and places the result in the BlockCache. A RegionServer has one BlockCache and N Memstores, and the sum of their sizes must not be greater than or equal to heapsize * 0.8, otherwise HBase cannot start. By default the BlockCache is 0.2 and the Memstore is 0.4. For a system that emphasizes read response time, the BlockCache can be made larger, for example BlockCache=0.4 and Memstore=0.39, to raise the cache hit rate.
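The key construction and output formatting described in this embodiment can be sketched in Python, with `datetime.strptime` and `strftime` standing in for the parse and format functions of SimpleDateFormat and string slicing standing in for substring. The function names, the input date format, and the choice of "000" and "999" as millisecond padding are assumptions; the text does not specify them. Note also that a stop key is typically exclusive in an HBase scan, so a record stamped at exactly the last millisecond of the end time would need separate handling.

```python
from datetime import datetime

ROWKEY_TIME_FORMAT = "%Y%m%d%H%M%S"  # followed by 3 millisecond digits

def scan_bounds(plate, start_text, end_text, input_format="%Y-%m-%d %H:%M:%S"):
    """Build (start_key, end_key) for one plate and a time window."""
    start = datetime.strptime(start_text, input_format)
    end = datetime.strptime(end_text, input_format)
    if end < start:
        raise ValueError("end time precedes start time")
    # Assumed padding: lowest milliseconds for the start, highest for the end.
    start_key = plate + start.strftime(ROWKEY_TIME_FORMAT) + "000"
    end_key = plate + end.strftime(ROWKEY_TIME_FORMAT) + "999"
    return start_key, end_key

def format_row(row_key, checkpoint_no):
    """Render a row key and checkpoint number as 'PLATE YYYY-MM-DD HH:MM NO'."""
    plate = row_key[:6]
    when = datetime.strptime(row_key[6:20], ROWKEY_TIME_FORMAT)
    return "%s %s %s" % (plate, when.strftime("%Y-%m-%d %H:%M"), checkpoint_no)

start_key, end_key = scan_bounds(
    "QVW755", "2012-11-24 00:00:00", "2012-11-24 23:59:59")
sample_key = "QVW75520121124120403222"
if start_key <= sample_key < end_key:
    print(format_row(sample_key, "22222"))
```

Because the row key is fixed-width and the timestamp sorts lexicographically in time order, a plain string comparison against the two bounds is enough to select the window for one vehicle.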
The foregoing is merely an embodiment of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (3)

1. A condition query optimization method based on an HBase table, characterized in that the specific implementation process of the method is:
Determine the number of pre-allocated table partitions (Regions) according to the data volume of the table and the configuration of the cluster: in Region pre-allocation, the number of Regions is determined by the volume of data imported into HBase and the scale of the distributed cluster, and the Regions are then designed and allocated in advance according to the data volume and the rules of the row key (RowKey); as an HBase Region grows, it eventually reaches a size threshold; once the threshold is reached, the Region splits automatically, and more Regions help guarantee concurrency performance;
Design the RowKey according to the application of the condition query: if the RowKey of a record falls between the start key and the end key of a Region, the record is stored in that Region;
Improve query performance by means of Region pre-allocation, the RowKey, and the distributed programming framework: in the MapReduce computing framework, the Map module queries the HBase table for the records that satisfy the condition; the optimization scheme for the condition query is designed according to the characteristics of the query and of the RowKey; the StartKey and EndKey parameters of Scan are used to improve search performance; and the number of Map tasks is configured according to the parameters of the distributed cluster, so as to achieve the optimization.
2. a kind of condition query optimization method based on HBase table according to claim 1, it is characterized in that: described MapReduce programming framework is the processing procedure that finally obtains condition query result: Map resume module according to condition query hbase, show, finally obtain the record of inquiry, again the record of inquiry is processed, the query note form that obtains wanting, and the DLL (dynamic link library) of traversal is provided.
3. a kind of condition query optimization method based on HBase table according to claim 2, is characterized in that: the detailed step of described optimization method is:
First in table subregion Region predistribution, according to the environment in cluster and configuration, the attribute of zookeeper is set, creates pre-HBase table name and the row Praenomen importing and claim, then according to the Split function creation HBase table of writing; Wherein Split function is according to source data form, to determine the number of Split, represents the number of region by two-dimensional array; Treat that Region predistribution finishes, can source data be generated to Hfile file according to the design of RowKey; Finally will with completebulkload order, complete the importing of data, at this moment data have imported to HBase table according to predetermined form;
Write MapReduce program: the MapReduce program that this user writes is an operation, user's configuration is also submitted to an operation in framework, and framework can resolve into this operation Map tasks and the reduce tasks of some row; Framework is responsible for task distribution and is carried out; In this query optimization, need to complete the work in map stage, finally will meet user's Query Result output, this output procedure is:
According to the querying condition of user program appointment, first according to the query argument of input, carry out the processing of form, query argument is spliced into the form that meets line unit RowKey in HBase table, the reference position of inquiry is set according to the startKey in scan and endKey again, and the parameter setting in scan; Then will be submitted in Hadoop framework according to initTableMapperJob function, in the corresponding Map task of each data block of this process middle frame split; For each Map task, according to the mode of iteration, each data recording is processed according to the method for appointment in Map function:
In the map function, the RowKey of each record that satisfies the query is first obtained with value.getRow(), and the data of the first column is obtained with value.value(); the extracted RowKey and first-column data are then processed and converted into the output format the user expects; the required character strings can be extracted with the substring function and arranged in the predetermined format; the qualifying results are written directly to the specified file, and the output path is set with the setOutputPath function.
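The query procedure described above (splicing the query parameters into RowKey form, bounding the scan with a start key and an end key, and formatting each matching record in the map function) can be sketched in plain Python as follows. This is an in-memory stand-in, not HBase or Hadoop code: the RowKey layout of a 4-character device id followed by an 8-digit day, the tab-separated output, and the names build_scan_range, scan, map_record and run_query are assumptions made for illustration. The end key is treated as exclusive, which matches how an HBase scan treats its stop row by default.

```python
from bisect import bisect_left
from typing import List, Tuple

Row = Tuple[bytes, bytes]  # (RowKey, value of the first column)


def build_scan_range(device_id: str, start_day: str, end_day: str) -> Tuple[bytes, bytes]:
    """Splice query parameters into start and end RowKeys (end is exclusive)."""
    start_key = f"{device_id}{start_day}".encode("ascii")
    end_key = f"{device_id}{end_day}".encode("ascii")
    return start_key, end_key


def scan(table: List[Row], start_key: bytes, end_key: bytes) -> List[Row]:
    """Return rows with start_key <= RowKey < end_key from a key-sorted table."""
    keys = [key for key, _ in table]
    lo = bisect_left(keys, start_key)
    hi = bisect_left(keys, end_key)
    return table[lo:hi]


def map_record(row_key: bytes, value: bytes) -> str:
    """Format one matching record, slicing the RowKey the way substring would."""
    key = row_key.decode("ascii")
    device_id, day = key[:4], key[4:12]
    return f"{device_id}\t{day}\t{value.decode('utf-8')}"


def run_query(table: List[Row], device_id: str, start_day: str, end_day: str) -> List[str]:
    """Apply map_record to every row inside the spliced key range."""
    start_key, end_key = build_scan_range(device_id, start_day, end_day)
    return [map_record(key, value) for key, value in scan(table, start_key, end_key)]
```

Because rows are stored in RowKey order, only the rows between the two spliced keys are visited; the rest of the table is never read. In an actual job, each returned line would be emitted by the mapper and written under the configured output path instead of being collected in a list.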
CN201310667847.0A 2013-12-11 2013-12-11 Condition query optimizing method based on HBase table Pending CN103646073A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310667847.0A CN103646073A (en) 2013-12-11 2013-12-11 Condition query optimizing method based on HBase table

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310667847.0A CN103646073A (en) 2013-12-11 2013-12-11 Condition query optimizing method based on HBase table

Publications (1)

Publication Number Publication Date
CN103646073A (en) 2014-03-19

Family

ID=50251287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310667847.0A Pending CN103646073A (en) 2013-12-11 2013-12-11 Condition query optimizing method based on HBase table

Country Status (1)

Country Link
CN (1) CN103646073A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101957863A (en) * 2010-10-14 2011-01-26 广州从兴电子开发有限公司 Data parallel processing method, device and system
US20110154339A1 (en) * 2009-12-17 2011-06-23 Electronics And Telecommunications Research Institute Incremental mapreduce-based distributed parallel processing system and method for processing stream data
CN102725753A (en) * 2011-11-28 2012-10-10 华为技术有限公司 Method and apparatus for optimizing data access, method and apparatus for optimizing data storage
CN103246700A (en) * 2013-04-01 2013-08-14 厦门市美亚柏科信息股份有限公司 Mass small file low latency storage method based on HBase

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
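宋亚奇, 刘树仁, 朱永利, 王德文, 李莉: "Research on cloud storage technology for high-speed sampled data on power equipment status" (电力设备状态高速采样数据的云存储技术研究), Electric Power Automation Equipment (《电力自动化设备》) *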

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104252536B (en) * 2014-09-16 2017-12-08 福建新大陆软件工程有限公司 A kind of internet log data query method and device based on hbase
CN104252536A (en) * 2014-09-16 2014-12-31 福建新大陆软件工程有限公司 Hbase-based internet log data inquiring method and device
CN104361090A (en) * 2014-11-17 2015-02-18 浙江宇视科技有限公司 Data query method and device
CN104361090B (en) * 2014-11-17 2018-01-05 浙江宇视科技有限公司 Data query method and device
CN104537003A (en) * 2014-12-16 2015-04-22 北京中交兴路车联网科技有限公司 Universal high-performance data writing method for Hbase database
CN104537003B (en) * 2014-12-16 2018-01-09 北京中交兴路车联网科技有限公司 A kind of general high-performance data wiring method of Hbase databases
CN104516985A (en) * 2015-01-15 2015-04-15 浪潮(北京)电子信息产业有限公司 Rapid mass data importing method based on HBase database
CN105187498A (en) * 2015-08-10 2015-12-23 携程计算机技术(上海)有限公司 Region allocation method and system for HBase table
CN105187498B (en) * 2015-08-10 2018-05-08 携程计算机技术(上海)有限公司 The Region distribution methods and system of HBase table
CN106528573B (en) * 2015-09-14 2019-08-20 北京国双科技有限公司 The data query method and apparatus of HBase database
CN106528573A (en) * 2015-09-14 2017-03-22 北京国双科技有限公司 Data query method and apparatus for HBase
CN105206062A (en) * 2015-10-23 2015-12-30 浪潮(北京)电子信息产业有限公司 Searching method and device
CN105630896A (en) * 2015-12-21 2016-06-01 浪潮集团有限公司 Method for quickly importing mass data
US20220012213A1 (en) * 2016-03-08 2022-01-13 International Business Machines Corporation Spatial-temporal storage system, method, and recording medium
CN105956043A (en) * 2016-04-26 2016-09-21 海尔优家智能科技(北京)有限公司 Method and device for allocating Map task for MapReduce running on Hbase database
CN107368477A (en) * 2016-05-11 2017-11-21 北京京东尚科信息技术有限公司 The method and system of class SQL query based on HBase coprocessors
CN106407432B (en) * 2016-09-28 2020-02-07 苏州浪潮智能科技有限公司 Query method and device for Oracle data warehouse
CN106407432A (en) * 2016-09-28 2017-02-15 郑州云海信息技术有限公司 Oracle data warehouse query method and device
CN106294886A (en) * 2016-10-17 2017-01-04 北京集奥聚合科技有限公司 A kind of method and system of full dose extracted data from HBase
CN108228581B (en) * 2016-12-09 2022-06-28 阿里云计算有限公司 Zookeeper compatible communication method, server and system
CN108228581A (en) * 2016-12-09 2018-06-29 阿里巴巴集团控股有限公司 Zookeeper compatible communication methods, server and system
CN107070645A (en) * 2016-12-30 2017-08-18 华为技术有限公司 Compare the method and system of the data of tables of data
CN107070645B (en) * 2016-12-30 2020-06-16 华为技术有限公司 Method and system for comparing data of data table
CN106874132A (en) * 2017-01-03 2017-06-20 努比亚技术有限公司 A kind of abnormality eliminating method and device
CN108319604A (en) * 2017-01-16 2018-07-24 南京烽火软件科技有限公司 The associated optimization method of size table in a kind of hive
CN108319604B (en) * 2017-01-16 2021-10-19 南京烽火天地通信科技有限公司 Optimization method for association of large and small tables in hive
CN107239517A (en) * 2017-05-23 2017-10-10 中国联合网络通信集团有限公司 Many condition searching method and device based on Hbase databases
CN107239517B (en) * 2017-05-23 2020-09-29 中国联合网络通信集团有限公司 Multi-condition searching method and device based on Hbase database
CN107145607A (en) * 2017-06-12 2017-09-08 济南浪潮高新科技投资发展有限公司 A kind of many optional condition method for quickly querying of hbaes
CN107370797B (en) * 2017-06-30 2021-07-27 北京百度网讯科技有限公司 HBase-based strongly-ordered queue operation method and device
CN107370797A (en) * 2017-06-30 2017-11-21 北京百度网讯科技有限公司 A kind of method and apparatus of the strongly-ordered queue operation based on HBase
CN110019199A (en) * 2017-09-29 2019-07-16 株式会社理光 Data storage, querying method, device, equipment, computer readable storage medium
CN110019094A (en) * 2017-12-28 2019-07-16 中国移动通信集团广东有限公司 Ticket retrieve method, system, electronic equipment and storage medium
CN108446383B (en) * 2018-03-21 2021-12-10 吉林大学 Data task redistribution method based on geographic distributed data query
CN108446383A (en) * 2018-03-21 2018-08-24 吉林大学 A kind of data task redistribution method based on geographically distributed data query
CN109657009B (en) * 2018-12-21 2021-03-12 北京锐安科技有限公司 Method, device, equipment and storage medium for creating data pre-partition storage periodic table
CN109657009A (en) * 2018-12-21 2019-04-19 北京锐安科技有限公司 The pre- partitioned storage periodic table creation method of data, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140319