CN104750815A - Lob data storing method and device based on HBase - Google Patents

Publication number: CN104750815A
Authority: CN (China)
Prior art keywords: data, lob, data block, file, unloading
Legal status: Granted
Application number: CN201510144162.7A
Other languages: Chinese (zh)
Other versions: CN104750815B (en)
Inventors: 贾德星, 徐正礼, 魏金雷
Current Assignee: Inspur Cloud Information Technology Co Ltd
Original Assignee: Inspur Group Co Ltd

Abstract

The invention provides a method and device for storing Lob data based on HBase. In the method, when a storage request is sent to a LobTable object, the LobTable object determines that the storage request carries Lob data and creates a LobPut object; the Lob data is written into a Lob output stream provided by the LobPut object; each time the LobPut object receives a data block that reaches a first threshold, it generates a corresponding Put object for that data block; the generated Put objects are submitted to the server side, which uses a configured transfer mechanism to transfer the data blocks corresponding to the Put objects into a Lob file in HDFS, in the order of the block numbers contained in the submitted Put objects. The method and device lighten the workload of the client and make storing data in HBase more practical.

Description

A method and device for storing Lob data based on HBase
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for storing Lob data based on HBase.
Background
HBase is an open-source, column-oriented distributed storage system that can store billions, or even more than ten billion, records. When it is used to store large object (Large Object, LOB) data such as documents, music and video, the following problems may arise:
1. At the storage layer, HBase stores data in KeyValue form. When a client updates data, the server side first places the KeyValue objects to be updated in memory (the MemStore); once the KeyValue objects held in memory reach a threshold, they are written out as a StoreFile. If a KeyValue value is too large, the server side can therefore easily run out of memory.
2. When an HBase table stores data, it is split into multiple regions (Region) by row key (RowKey), and each region contains multiple HFile files. Because the column values of Lob data are very large, HFile files easily exceed the region size limit, so the table is split into a large number of regions that each hold only a few rows. Too many regions make scans (Scan) of the HBase table inefficient.
To work around these problems, a client that stores Lob data usually has to bring in HDFS (a distributed file system) as well: the client stores the Lob data in HDFS and the other structured data in HBase. Because the client must use two storage systems (HBase and HDFS) to store Lob data and structured data separately, it needs a large amount of code tailored to the storage mode of each system, so this approach is not very practical.
Summary of the invention
In view of this, the invention provides a method and device for storing Lob data based on HBase, which solve the problem of storing large object data while remaining practical.
The invention provides a method for storing Lob data based on HBase, in which a LobTable object is extended from the HTable object, a Qualifier for identifying Lob data is set, and a transfer mechanism by which the server corresponding to HBase transfers data to HDFS is set. The method further comprises:
When the client sends a storage request to the LobTable object, the LobTable object, on determining that the storage request contains the Qualifier, determines that the storage request carries Lob data and creates a LobPut object;
The client writes the Lob data carried in the storage request into the Lob output stream provided by the LobPut object;
The LobPut object receives the Lob output stream into which the Lob data has been written and, each time it has received a data block that reaches a first threshold, generates a corresponding Put object for that data block, where the Put object contains the block number of the data block within the Lob data;
The generated Put objects are submitted to the server side, so that the server side uses the configured transfer mechanism to transfer the data block corresponding to each Put object, in the order of the block numbers contained in the submitted Put objects, into a Lob file in HDFS.
Preferably,
The method further comprises: the client calls the LobTable object to submit a query (scan) request to the server side, where the query request is used to query Lob data on the server side, and receives the Result that the server side returns for the query request, the Result containing the Row Key of the queried Lob data; the LobTable object wraps the Result in a LobResult object, and the LobResult object generates a LobGet object from the Row Key of the queried Lob data, so that the LobGet object reads the Lob data from the server side block by block, according to the start address of each data block within the queried Lob data.
The invention also provides a method for storing large object data based on HBase, in which a Lob Coprocessor is created and a transfer mechanism by which the server corresponding to HBase transfers data to HDFS is set. The method further comprises:
The server side uses the Lob Coprocessor to receive, block by block, the Put objects that the client submits block by block, where each Put object contains the block number of its data block within the corresponding Lob data;
For the data block corresponding to the first Put object received, the server side determines whether the Lob file in HDFS currently contains data related to that data block; if so, it uses the configured transfer mechanism to update the related data with the data block corresponding to the first Put object received; if not, it uses the configured transfer mechanism to create the data block corresponding to the first Put object received in the Lob file;
And, in the order of the block numbers contained in the submitted Put objects, the server side uses the configured transfer mechanism to transfer the data blocks corresponding to the subsequently received Put objects into the Lob file, and assembles the data blocks into the Lob data according to their block numbers.
Preferably, the method further comprises: throwing, through the PrePut object, an exception that is a specific subclass of DoNotRetryIOException, so as to stop the Lob Coprocessor from writing the data block corresponding to each Put object into HBase.
Preferably,
The method further comprises: creating a LobScanner object for reading data;
The method further comprises: receiving the Get object that the client submits through LobGet, the Get object containing the Row Key of the queried Lob data; the Lob Coprocessor opens the LobScanner object; the LobScanner object determines the storage location of the queried Lob data from the Row Key in the Get object, reads each data block of the Lob data block by block from the determined storage location, and converts each data block into KeyValue form, so that the server side returns each data block in KeyValue form to the client;
Or,
The method further comprises: the LobStore additionally stores Lob data in the form of HFile files and in the MemStore;
The method further comprises: traversing the Lob files in HDFS and merging the traversed Lob data items that are smaller than a second threshold into an HFile file;
The method further comprises: storing the deletion information of the Lob data items merged into the HFile file in the MemStore.
The invention also provides a client, comprising:
An extension unit, configured to extend a LobTable object from the HTable object and to set a Qualifier for identifying Lob data;
A processing unit, configured so that, when a storage request is sent to the LobTable object, the LobTable object, on determining that the storage request contains the Qualifier, determines that the storage request carries Lob data and creates a LobPut object;
A writing unit, configured to write the Lob data carried in the storage request into the Lob output stream provided by the LobPut object;
A receiving unit, configured to use the LobPut object to receive the Lob output stream into which the Lob data has been written and, each time a data block that reaches a first threshold has been received, to generate a corresponding Put object for that data block, where the Put object contains the block number of the data block within the Lob data;
A submission unit, configured to submit the generated Put objects to the server side, so that the server side, using the preset transfer mechanism by which the server corresponding to HBase transfers data to HDFS, transfers the data block corresponding to each Put object, in the order of the block numbers contained in the submitted Put objects, into a Lob file in HDFS.
Preferably,
The client further comprises: a data query unit, configured to call the LobTable object to submit a query request to the server side, where the query request is used to query Lob data on the server side, and to receive the Result that the server side returns for the query request, the Result containing the Row Key of the queried Lob data; the LobTable object wraps the Result in a LobResult object, and the LobResult object generates a LobGet object from the Row Key of the queried Lob data, so that the LobGet object reads the Lob data from the server side block by block, according to the start address of each data block within the queried Lob data.
The invention also provides a server, comprising:
A storage unit, configured to create a Lob Coprocessor and to set the transfer mechanism by which the server corresponding to HBase transfers data to HDFS;
A receiving unit, configured to use the Lob Coprocessor to receive, block by block, the Put objects that the client submits block by block, where each Put object contains the block number of its data block within the corresponding Lob data;
A processing unit, configured to determine, for the data block corresponding to the first Put object received, whether the Lob file in HDFS currently contains data related to that data block; if so, to use the configured transfer mechanism to update the related data with the data block corresponding to the first Put object received; if not, to use the configured transfer mechanism to create the data block corresponding to the first Put object received in the Lob file;
A transfer unit, configured to use the configured transfer mechanism to transfer the data blocks corresponding to the subsequently received Put objects into the Lob file, in the order of the block numbers contained in the submitted Put objects, and to assemble the data blocks into the Lob data according to their block numbers.
Preferably,
The server further comprises: a termination unit, configured to throw, through the PrePut object, an exception that is a specific subclass of DoNotRetryIOException, so as to stop the Lob Coprocessor from writing the data block corresponding to each Put object into HBase.
Preferably,
The storage unit is further configured to create a LobScanner object for reading data;
The server further comprises: a query unit, configured to receive the Get object that the client submits through LobGet, the Get object containing the Row Key of the queried Lob data; the Lob Coprocessor opens the LobScanner object; the LobScanner object determines the storage location of the queried Lob data from the Row Key in the Get object, reads each data block of the Lob data block by block from the determined storage location, and converts each data block into KeyValue form, so that the server side returns each data block in KeyValue form to the client;
Or,
The server further comprises: the LobStore additionally stores Lob data in the form of HFile files and in the MemStore;
The server further comprises: a traversal unit, configured to traverse the Lob files in HDFS and to merge the traversed Lob data items that are smaller than a second threshold into an HFile file;
The server further comprises: a merging unit, configured to store the deletion information of the Lob data items merged into the HFile file in the MemStore.
The embodiments of the invention provide a method and device for storing Lob data based on HBase. When Lob data is stored on the server side, the client uploads it to the HBase server side block by block according to a preset value, and the server side stores the blocks as they are uploaded, which avoids memory overflow on the server side. In addition, so that the client does not have to deal with two storage technologies, the client can upload both Lob data and other structured data to the server side, and the server side transfers the Lob data into Lob files in HDFS. The client only needs to be programmed against the storage mode of HBase, which reduces the client's workload and makes storing data in HBase more practical.
Description of the drawings
Fig. 1 is a flowchart of a method provided by an embodiment of the invention;
Fig. 2 is a flowchart of a method provided by another embodiment of the invention;
Fig. 3 is a flowchart of a method provided by a further embodiment of the invention;
Fig. 4 is a schematic diagram of the data storage method provided by an embodiment of the invention;
Fig. 5 is a schematic diagram of the data query method provided by an embodiment of the invention;
Fig. 6 is a schematic structural diagram of the client provided by an embodiment of the invention;
Fig. 7 is a schematic structural diagram of the server provided by an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, an embodiment of the invention provides a method for storing Lob data based on HBase, in which a LobTable object is extended from the HTable object, a Qualifier for identifying Lob data is set, and a transfer mechanism by which the server corresponding to HBase transfers data to HDFS is set. The method may comprise the following steps:
Step 101: When the client sends a storage request to the LobTable object, the LobTable object, on determining that the storage request contains the Qualifier, determines that the storage request carries Lob data and creates a LobPut object.
Step 102: The client writes the Lob data carried in the storage request into the Lob output stream provided by the LobPut object.
Step 103: The LobPut object receives the Lob output stream into which the Lob data has been written and, each time it has received a data block that reaches a first threshold, generates a corresponding Put object for that data block, where the Put object contains the block number of the data block within the Lob data.
Step 104: The generated Put objects are submitted to the server side, so that the server side uses the configured transfer mechanism to transfer the data block corresponding to each Put object, in the order of the block numbers contained in the submitted Put objects, into a Lob file in HDFS.
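As a minimal sketch of steps 102 and 103, the splitting of a Lob stream into numbered blocks could be modelled as follows. The `Put` dataclass and `split_into_puts` are hypothetical stand-ins written for illustration, not the HBase `Put` class or the patent's LobPut implementation, and the block size passed in plays the role of the first threshold.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Iterator


@dataclass
class Put:
    """Hypothetical stand-in for a Put object that carries one Lob data block."""
    row_key: bytes
    block_no: int   # position of the block within the Lob data
    value: bytes


def split_into_puts(row_key: bytes, stream: BinaryIO, block_size: int) -> Iterator[Put]:
    """Read the Lob stream block by block and emit one Put per block."""
    block_no = 0
    while True:
        chunk = stream.read(block_size)  # the last chunk may be shorter
        if not chunk:
            break
        yield Put(row_key, block_no, chunk)
        block_no += 1


# A 10-byte "Lob" with a 4-byte block size, purely to keep the example small.
puts = list(split_into_puts(b"row-1", io.BytesIO(b"x" * 10), block_size=4))
```

Because each Put carries its own block number, the receiving side can restore the original order even though the blocks arrive as separate requests.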
With this scheme, when Lob data is stored on the server side, the client uploads it to the HBase server side block by block according to a preset value, and the server side stores the blocks as they are uploaded, which avoids memory overflow on the server side. In addition, so that the client does not have to deal with two storage technologies, the client can upload both Lob data and other structured data to the server side, and the server side transfers the Lob data into Lob files in HDFS. The client only needs to be programmed against the storage mode of HBase, which reduces the client's workload and makes storing data in HBase more practical.
As shown in Fig. 2, an embodiment of the invention provides a method for storing large object data based on HBase, in which a Lob Coprocessor is created and a transfer mechanism by which the server corresponding to HBase transfers data to HDFS is set. The method may comprise the following steps:
Step 201: The server side uses the Lob Coprocessor to receive, block by block, the Put objects that the client submits block by block, where each Put object contains the block number of its data block within the corresponding Lob data.
Step 202: For the data block corresponding to the first Put object received, the server side determines whether the Lob file in HDFS currently contains data related to that data block; if so, it uses the configured transfer mechanism to update the related data with the data block corresponding to the first Put object received; if not, it uses the configured transfer mechanism to create the data block corresponding to the first Put object received in the Lob file.
Step 203: In the order of the block numbers contained in the submitted Put objects, the server side uses the configured transfer mechanism to transfer the data blocks corresponding to the subsequently received Put objects into the Lob file, and assembles the data blocks into the Lob data according to their block numbers.
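Steps 201 to 203 can be sketched with an in-memory model. `LobFileStore` below is a hypothetical stand-in for Lob files in HDFS, not an HDFS or HBase API, and treating "update the related data" as replacing the existing file content when the first block arrives is an assumption made for this illustration.

```python
class LobFileStore:
    """In-memory stand-in for Lob files in HDFS (hypothetical)."""

    def __init__(self):
        self.files = {}  # file name -> bytearray

    def transfer(self, name, block_no, data):
        if block_no == 0:
            # First block: create the file, or replace existing content (update).
            self.files[name] = bytearray(data)
        else:
            # Later blocks are appended in block-number order.
            self.files[name].extend(data)


def store_lob(store, name, puts):
    """puts: iterable of (block_no, data) pairs, possibly out of order."""
    for block_no, data in sorted(puts, key=lambda p: p[0]):
        store.transfer(name, block_no, data)
    return bytes(store.files[name])


store = LobFileStore()
created = store_lob(store, "row1_f_video", [(1, b"bb"), (0, b"aa"), (2, b"c")])
updated = store_lob(store, "row1_f_video", [(0, b"zz")])
```

Sorting by block number is what allows the blocks to be assembled back into the original Lob data regardless of arrival order.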
With this scheme, the server side stores the data block by block as the blocks are uploaded, which avoids memory overflow on the server side. In addition, so that the client does not have to deal with two storage technologies, the client can upload both Lob data and other structured data to the server side, and the server side transfers the Lob data into Lob files in HDFS. The client only needs to be programmed against the storage mode of HBase, which reduces the client's workload and makes storing data in HBase more practical.
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 3, an embodiment of the invention provides a method for storing large object data based on HBase. The method may comprise the following steps:
Step 301: Build the storage model. The server side associates a corresponding LobStore with each Region and sets the transfer mechanism by which the HBase server side transfers data to HDFS.
As shown in Fig. 4, in this embodiment the LobStore can store objects in the following three forms:
Lob file: Lob files are stored in HDFS. When the client stores a Lob object on the server side, the server side transfers the Lob object into a Lob file in HDFS. The original value of the Lob object (for example, the binary value of a video file) is stored directly in HDFS, and the value of every Put is written directly to the Lob file. The name of a Lob file may consist of row + column family + column name, and its directory is: table directory/region/Lob.
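The naming rule for Lob files can be illustrated with a short sketch. The function name, the underscore separator and the sample directory names are assumptions made for this example; the text only states that the file name may consist of row, column family and column name, and that the directory is table directory/region/Lob.

```python
from pathlib import PurePosixPath


def lob_file_path(table_dir, region, row, family, qualifier):
    """Build <table directory>/<region>/Lob/<row + family + qualifier>."""
    file_name = "_".join([row, family, qualifier])  # separator is an assumption
    return PurePosixPath(table_dir) / region / "Lob" / file_name


path = lob_file_path("/hbase/docs", "region-01", "row1", "f", "video")
```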
HFile: used to combine small Lob objects into an HFile, so that large numbers of small object files are not created in HDFS. Files of this kind are produced during table compaction. The key stored in the HFile may consist of the business table's row + f + q, and the value may be the small Lob value or DEL.
MemStore: records the Deleted markers of Lob objects, which are used during table compaction to purge small Lob objects that have been deleted.
In this embodiment, a Lob column can be defined as follows: at the [Family+Qualifier] level of the HBase table definition, declare whether a given Qualifier of the table is a Lob column. When a storage request or query request is received, this Qualifier determines whether the data being stored or queried is a Lob object.
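The Lob column check can be sketched as a lookup against a table-level declaration. `LOB_COLUMNS` and `is_lob_request` are hypothetical names, and the declared columns are invented sample values.

```python
# Hypothetical table-level declaration of which (family, qualifier) pairs are Lob columns.
LOB_COLUMNS = {("f", "video"), ("f", "attachment")}


def is_lob_request(columns):
    """columns: iterable of (family, qualifier) pairs touched by a request."""
    return any(column in LOB_COLUMNS for column in columns)


lob_hit = is_lob_request([("f", "name"), ("f", "video")])
plain_hit = is_lob_request([("f", "name")])
```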
Further, a threshold separating "large Lob data" from "small Lob data" can be defined, for example 10M. Lob data smaller than the threshold is called "small Lob data" and is stored in HFile form; Lob data larger than the threshold is called "large Lob data" and is stored in HDFS as an individual Lob file.
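The size rule can be written as a one-line classification. The function name is hypothetical, 10M is taken as 10 * 1024 * 1024 bytes, and treating a Lob of exactly the threshold size as large is an assumption, since the text only covers "smaller than" and "larger than".

```python
LOB_SIZE_THRESHOLD = 10 * 1024 * 1024  # the 10M example threshold, in bytes


def storage_form(lob_size):
    """Return the storage form for a Lob of the given size in bytes."""
    # Sizes equal to the threshold are treated as large here (an assumption).
    return "HFile" if lob_size < LOB_SIZE_THRESHOLD else "Lob file"


small = storage_form(200 * 1024)          # a 200 KB document
large = storage_form(700 * 1024 * 1024)   # a 700 MB video
```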
Step 302: The client stores data on the HBase server side.
As shown in Fig. 4, in this embodiment the client can store or update (put) data on the HBase server side as follows:
1. The client sends a put (storage) request to the LobTable object.
2. When the LobTable object determines that the put request carries structured data, such as data in two-dimensional table form, it submits the Put request to the server side and step 3 is performed; when it determines that the put request carries unstructured data, such as Lob data, step 4 is performed.
3. The structured data is stored in the normal way in the HBase Store of the HBase database.
4. The LobTable object creates a LobPut object.
5. The client writes the Lob data carried in the put request into the Lob output stream provided by the LobPut object. This Lob output stream can be regarded as the data stream through which the client's LobPut object sends data to the server side.
6. The LobPut object receives the Lob output stream into which the Lob data has been written, reads the Lob data block by block, and generates a corresponding Put object for each data block read. The block size threshold can be set to 4M, so that each full 4M of Lob data read forms one data block. The last data block of the Lob data may be smaller than 4M.
7. Each Put object is submitted to the HRegion on the server side, where each Put object contains the block number of its data block within the Lob data.
8. The Lob Coprocessor created on the server side receives, block by block, the data block corresponding to each Put object.
9. The Lob Coprocessor uses the configured transfer mechanism to transfer the data blocks corresponding to the received Put objects, in order, into the Lob file in HDFS.
In step 9, for the data block corresponding to the first Put object received, the Lob Coprocessor determines whether the Lob file in HDFS currently contains data related to that data block; if so, it uses the configured transfer mechanism to update the related data with the data block corresponding to the first Put object received; if not, it uses the configured transfer mechanism to create the data block corresponding to the first Put object received in the Lob file. Then, in the order of the block numbers contained in the submitted Put objects, it uses the configured transfer mechanism to transfer the data blocks corresponding to the subsequently received Put objects into the Lob file, and assembles the data blocks into the Lob data according to their block numbers.
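A worked example of the 4M blocking from step 6 may help. The sketch below assumes 4M means 4 * 1024 * 1024 bytes and that block numbers start at 0; `block_layout` is a hypothetical helper that lists, for each block, its number, its start address within the Lob data, and its length.

```python
BLOCK_SIZE = 4 * 1024 * 1024  # the 4M example threshold, in bytes


def block_layout(lob_size, block_size=BLOCK_SIZE):
    """Return (block_no, start_address, length) for every block of a Lob."""
    count = -(-lob_size // block_size)  # integer ceiling division
    return [
        (n, n * block_size, min(block_size, lob_size - n * block_size))
        for n in range(count)
    ]


MB = 1024 * 1024
layout = block_layout(10 * MB)  # a 10 MB Lob gives blocks of 4 MB, 4 MB and 2 MB
```

Under these assumptions the start address of a block is simply its block number multiplied by the block size, which is what lets blocks be written to, and later read from, the Lob file independently.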
After the above steps have been carried out, the method may further comprise: throwing, through the PrePut object, an exception that is a specific subclass of DoNotRetryIOException, so as to stop the Lob Coprocessor from writing the data block corresponding to each Put object into an HFile.
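The effect of throwing the exception can be sketched in plain Python. All names below are hypothetical stand-ins: `DoNotRetryError` merely mimics the role of HBase's DoNotRetryIOException, `pre_put` models a hook that runs before the normal write path, and treating the exception as a success signal on the client is an assumption of this sketch.

```python
class DoNotRetryError(IOError):
    """Stand-in for the role of DoNotRetryIOException (not the real class)."""


class LobBlockTransferred(DoNotRetryError):
    """Hypothetical subclass: the block has already been written to the Lob file."""


def pre_put(put, lob_file):
    # Hook that runs before the normal write path.
    if put["is_lob"]:
        lob_file.extend(put["value"])
        raise LobBlockTransferred(put["row"])


def region_put(put, lob_file, hbase_store):
    pre_put(put, lob_file)   # may raise, which skips the line below
    hbase_store.append(put)  # normal HBase write path


def client_submit(put, lob_file, hbase_store):
    try:
        region_put(put, lob_file, hbase_store)
        return "stored in HBase"
    except LobBlockTransferred:
        return "transferred to Lob file"  # assumed to count as success, no retry


lob_file, hbase_store = bytearray(), []
r1 = client_submit({"row": "r1", "is_lob": True, "value": b"abc"}, lob_file, hbase_store)
r2 = client_submit({"row": "r2", "is_lob": False, "value": b"x"}, lob_file, hbase_store)
```

The point of the design is that the Lob block reaches the Lob file exactly once and never enters the normal store, so it cannot inflate the MemStore or the HFiles.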
Here, two new APIs are defined on the client, LobTable and LobPut, which inherit from the original HBase classes HTable and Put respectively. LobTable and LobPut can be understood as interfaces on the client. In addition, before storing the Lob data, the client can submit a Put object whose Lob column value is byte[0], so that the primary key is stored in HBase.
Step 303: The client queries data on the HBase server side.
As shown in Fig. 5, in this embodiment, for the client to query data on the HBase server side, three APIs can be added by extension: LobTable, LobResult and LobGet. A data query (scan) can be carried out as follows:
1. The client sends a Scan request to the LobTable object.
2. The LobTable object submits the Scan request to the server side.
3. The HRegion on the server side returns a Result to the client in response to the Scan request. The Result contains the Row Key of the Lob data queried by the Scan request.
4. The LobTable object wraps the Result in a LobResult object according to the Row Key contained in the Result.
5. The LobResult object generates a LobGet object from the Row Key of the queried Lob data.
6. The LobGet object reads the Lob data from the server side block by block, according to the start address of each data block of the queried Lob data. The LobGet object also has to read the Lob data in blocks, and the block size threshold can be 4M.
7. The HRegion receives the Get object that the client submits through LobGet; the Get object contains the Row Key of the queried Lob data.
8. The Lob Coprocessor opens the LobScanner object.
9. The LobScanner object determines from the Row Key in the Get object whether the Lob data is stored in an HFile or in a Lob file. If it is stored in an HFile, the KeyValue value is read directly and returned to the client's LobGet object, followed by an EOF end marker. If it is stored in a Lob file, the LobScanner object reads each data block according to its start address and converts each data block into KeyValue form, so that the server side returns each data block in KeyValue form to the client's LobGet object, followed by an EOF end marker.
10. The client reads the data from the input stream provided by the LobGet object. This input stream can be regarded as the data stream through which the LobGet object obtains data from the server side.
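The read path in steps 9 and 10 can be sketched as a generator. `scan_lob`, the tuple used in place of a KeyValue, and the `EOF` sentinel are all hypothetical; the HFile and the Lob files are modelled as plain dictionaries keyed by Row Key.

```python
EOF = object()  # hypothetical end-of-data marker


def scan_lob(row_key, hfile, lob_files, block_size):
    """Yield (row_key, block_no, value) tuples, then the EOF marker."""
    if row_key in hfile:
        # Small Lob: the value is read directly from the HFile.
        yield (row_key, 0, hfile[row_key])
    else:
        # Large Lob: read each block from the Lob file by its start address.
        data = lob_files[row_key]
        for block_no, start in enumerate(range(0, len(data), block_size)):
            yield (row_key, block_no, data[start:start + block_size])
    yield EOF


small_read = list(scan_lob("a", {"a": b"tiny"}, {}, block_size=4))
large_read = list(scan_lob("b", {}, {"b": b"0123456789"}, block_size=4))
# Client side: concatenate block values until EOF arrives.
payload = b"".join(item[2] for item in large_read if item is not EOF)
```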
In this embodiment, the size of Lob data is unpredictable in some application scenarios; for example, files stored in a network disk application may be a few KB, or may be hundreds of MB or more than a GB. If all Lob data were stored as Lob files in HDFS, too many small files could reduce the efficiency of HDFS. The small Lob data in the Lob files in HDFS can therefore be compacted (merged).
For the small-file problem, Hadoop itself offers three solutions: Hadoop Archive, SequenceFile and CombineFileInputFormat. With these solutions, however, once a small file has been merged into a large archive file it can no longer be deleted or modified unless the archive is repacked. In this embodiment, therefore, the Lob files in HDFS can be traversed and multiple small Lob data items of less than 10M merged, while ensuring that the merged file is no larger than another threshold, such as 1G; the merged file is stored in HBase in HFile format. The content of an HFile cannot in fact be modified or deleted either; small Lob KVs marked DELETE are cleaned up by Major Compact, that is, by traversing multiple HFiles, cleaning up the KVs and generating a new HFile.
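One way to plan such a merge is a simple greedy pass, sketched below. The text does not specify the grouping strategy, so the greedy first-fit order, the function name and the sample sizes are assumptions; only the two thresholds (10M for small Lob data, 1G for the merged file) come from the embodiment.

```python
SMALL_LOB = 10 * 1024 * 1024        # 10M: Lobs below this size are merged
MAX_MERGED = 1024 * 1024 * 1024     # 1G: upper bound for one merged file


def plan_merges(lob_sizes, small=SMALL_LOB, max_merged=MAX_MERGED):
    """lob_sizes: dict of name -> size. Return a list of bundles (lists of names)."""
    bundles, current, current_size = [], [], 0
    for name, size in lob_sizes.items():
        if size >= small:
            continue  # large Lobs stay as individual Lob files
        if current and current_size + size > max_merged:
            bundles.append(current)  # close the bundle before it gets too big
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        bundles.append(current)
    return bundles


# Tiny thresholds keep the example readable.
bundles = plan_merges({"a": 3, "b": 4, "c": 12, "d": 5, "e": 2}, small=10, max_merged=8)
```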
In this embodiment, the Lob Coprocessor can determine whether a Lob column is being deleted (including deletion of an entire row or of an entire column family); if so, it generates a delete object for the Lob column, stores it in the MemStore, and at the same time deletes the Lob file in HDFS, if one exists.
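Deletion and the later clean-up can be sketched together. The dictionaries and the set below are hypothetical stand-ins for the HFiles, the Lob files in HDFS and the delete records kept in the MemStore, and `DEL` stands for the DEL value that an HFile entry may carry.

```python
DEL = object()  # stand-in for the DEL value an HFile entry may carry


def delete_lob(key, memstore_deletes, lob_files):
    """Record the delete in the MemStore model and remove the Lob file if present."""
    memstore_deletes.add(key)
    lob_files.pop(key, None)


def major_compact(hfiles, memstore_deletes):
    """Merge HFiles (oldest first) into a new one, dropping deleted entries."""
    merged = {}
    for hfile in hfiles:
        merged.update(hfile)  # newer entries overwrite older ones
    return {
        key: value
        for key, value in merged.items()
        if value is not DEL and key not in memstore_deletes
    }


lob_files = {"big": b"large object bytes"}
deletes = set()
delete_lob("big", deletes, lob_files)  # large Lob: the file is removed at once
delete_lob("c", deletes, lob_files)    # small Lob: only the delete record is kept
new_hfile = major_compact([{"a": b"1", "b": b"2"}, {"b": DEL, "c": b"3"}], deletes)
```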
In this embodiment, because a LobStore is associated with a Region, the LobStore must also be split when the Region is split. Splitting the Lob files only requires moving them to different directories, whereas splitting an HFile requires traversing it in full and distributing its entries to the different Regions according to the split key.
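The two split behaviours can be contrasted in a short sketch. The function and directory names are hypothetical, and assigning the split key itself to the upper Region is an assumption of this sketch.

```python
def split_lob_store(lob_file_rows, hfile, split_key):
    """Split a LobStore model at split_key into two daughter Regions."""
    # Lob files: no content is rewritten, each file just gets a new directory.
    lob_dirs = {
        row: ("daughter_a/Lob" if row < split_key else "daughter_b/Lob")
        for row in lob_file_rows
    }
    # HFile: every entry is visited and routed by its key.
    lower = {k: v for k, v in hfile.items() if k < split_key}
    upper = {k: v for k, v in hfile.items() if k >= split_key}
    return lob_dirs, lower, upper


lob_dirs, lower, upper = split_lob_store(
    ["apple", "zebra"], {"a": 1, "m": 2, "z": 3}, split_key="m"
)
```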
As shown in Fig. 6, an embodiment of the invention also provides a client, comprising:
An extension unit 601, configured to extend a LobTable object from the HTable object and to set a Qualifier for identifying Lob data;
A processing unit 602, configured so that, when a storage request is sent to the LobTable object, the LobTable object, on determining that the storage request contains the Qualifier, determines that the storage request carries Lob data and creates a LobPut object;
A writing unit 603, configured to write the Lob data carried in the storage request into the Lob output stream provided by the LobPut object;
A receiving unit 604, configured to use the LobPut object to receive the Lob output stream into which the Lob data has been written and, each time a data block that reaches a first threshold has been received, to generate a corresponding Put object for that data block, where the Put object contains the block number of the data block within the Lob data;
A submission unit 605, configured to submit the generated Put objects to the server side, so that the server side, using the preset transfer mechanism by which the server corresponding to HBase transfers data to HDFS, transfers the data block corresponding to each Put object, in the order of the block numbers contained in the submitted Put objects, into a Lob file in HDFS.
The client further comprises: a data query unit 606, configured to call the LobTable object to submit a query request to the server side, where the query request is used to query Lob data on the server side, and to receive the Result that the server side returns for the query request, the Result containing the Row Key of the queried Lob data; the LobTable object wraps the Result in a LobResult object, and the LobResult object generates a LobGet object from the Row Key of the queried Lob data, so that the LobGet object reads the Lob data from the server side block by block, according to the start address of each data block within the queried Lob data.
As shown in Figure 7, an embodiment of the present invention further provides a server, comprising:
Storage unit 701, configured to create a Lob Coprocessor and to set the transfer mechanism by which the server corresponding to HBase transfers data to HDFS;
Receiving unit 702, configured to use the Lob Coprocessor to receive, block by block, the Put objects that the client submits block by block, wherein each Put object contains the block number of its data block within the corresponding Lob data;
Processing unit 703, configured to determine, for the data block of the first Put object received, whether the Lob file in HDFS currently contains data related to that data block; if so, to use the set transfer mechanism to update the related data with the data block of the first Put object received; if not, to use the set transfer mechanism to create the data block of the first Put object received in the Lob file;
Transfer unit 704, configured to use the set transfer mechanism to transfer the data blocks of the subsequently received Put objects into the Lob file, in the order of the block numbers contained in the submitted Put objects, and to combine the data blocks into the Lob data according to the block number of each data block.
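The server-side handling of the first and subsequent blocks can be sketched as follows. This is a plain-Python model, not HBase or HDFS code: `LobFileStore` is a hypothetical in-memory stand-in for the Lob files in HDFS, block numbers are assumed to start at 0, and an "update" of existing data is modelled, as an assumption, by replacing the old contents.

```python
from typing import Dict


class LobFileStore:
    """In-memory stand-in for Lob files in HDFS, keyed by row key (hypothetical)."""

    def __init__(self) -> None:
        self.files: Dict[bytes, bytearray] = {}
        self._next_block: Dict[bytes, int] = {}

    def accept(self, row_key: bytes, block_no: int, data: bytes) -> None:
        """Store one block; blocks must arrive in block-number order."""
        if block_no == 0:
            # First block: update the existing Lob or create a new one.
            # Both cases are modelled by starting from an empty buffer.
            self.files[row_key] = bytearray()
            self._next_block[row_key] = 0
        if self._next_block.get(row_key) != block_no:
            raise ValueError(f"unexpected block number {block_no}")
        self.files[row_key] += data  # append in block-number order
        self._next_block[row_key] = block_no + 1


# Usage: three blocks are combined into one Lob.
store = LobFileStore()
for no, data in enumerate([b"abcd", b"efgh", b"ij"]):
    store.accept(b"row-1", no, data)
```

Appending each block as it arrives means the server side never holds more than one block of a Lob in memory.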
The server further comprises: blocking unit 705, configured to call PrePut to throw an exception of a specific subclass of DoNotRetryIOException, so as to prevent the Lob Coprocessor from writing the data block of each Put object into HBase.
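The idea of aborting the normal write with a dedicated exception can be sketched as follows. This is a plain-Python simulation only: `DoNotRetryIOError` mirrors the role of HBase's DoNotRetryIOException, while `LobDivertedError`, `LOB_QUALIFIER`, `pre_put` and `region_put` are hypothetical names introduced for illustration.

```python
from typing import Dict, List

LOB_QUALIFIER = b"lob"  # hypothetical marker identifying Lob columns


class DoNotRetryIOError(IOError):
    """Python stand-in for HBase's DoNotRetryIOException."""


class LobDivertedError(DoNotRetryIOError):
    """Hypothetical subclass: the block was stored elsewhere, skip the HBase write."""


def pre_put(put: Dict[str, bytes], lob_sink: List[bytes]) -> None:
    """Hook run before the normal write: divert Lob blocks, then abort the write."""
    if put["qualifier"] == LOB_QUALIFIER:
        lob_sink.append(put["value"])
        raise LobDivertedError("Lob block stored outside HBase")


def region_put(put: Dict[str, bytes], table: List[Dict[str, bytes]],
               lob_sink: List[bytes]) -> None:
    pre_put(put, lob_sink)  # an exception here skips the append below
    table.append(put)


table: List[Dict[str, bytes]] = []
lob_sink: List[bytes] = []
region_put({"qualifier": b"name", "value": b"alice"}, table, lob_sink)
try:
    region_put({"qualifier": LOB_QUALIFIER, "value": b"block-0"}, table, lob_sink)
    diverted = False
except LobDivertedError:
    diverted = True
```

In this model the caller treats `LobDivertedError` as confirmation that the block was handled, rather than as a failure to retry.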
The storage unit is further configured to create a LobScanner object for reading data;
The server further comprises: query unit 706, configured to receive the Get object that the client submits through LobGet, the Get object containing the Row Key of the queried Lob data; the Lob Coprocessor opens the LobScanner object; the LobScanner object determines the storage location of the queried Lob data from the Row Key in the Get object, reads each data block of the Lob data block by block at the determined storage location, and converts each data block to KeyValue form, so that the server side returns each data block in KeyValue form to the client;
or,
LobStore further stores Lob data both in the form of HFile files and in MemStore form;
The server further comprises: traversal unit 707, configured to traverse the Lob files in HDFS and to merge the traversed Lob data items that are smaller than the second threshold into an HFile file;
The server further comprises: merging unit 708, configured to store in MemStore the deletion information of the Lob data items merged into the HFile file.
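The merging of small Lob data can be sketched as follows. This is a plain-Python model under one possible reading of the text: Lob data below the second threshold is packed into a single merged file with an index, and a set of deleted keys plays the role of the deletion information kept in MemStore. All names here (`merge_small_lobs`, `read_merged`, the index layout) are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Index = Dict[str, Tuple[int, int]]  # key -> (offset, length) in the merged file


def merge_small_lobs(lob_files: Dict[str, bytes], threshold: int) -> Tuple[bytes, Index]:
    """Pack every Lob smaller than `threshold` bytes into one merged byte string."""
    merged = bytearray()
    index: Index = {}
    for key in sorted(lob_files):  # deterministic traversal order
        data = lob_files[key]
        if len(data) < threshold:
            index[key] = (len(merged), len(data))
            merged += data
    return bytes(merged), index


def read_merged(key: str, merged: bytes, index: Index, deleted: Set[str]) -> Optional[bytes]:
    """Return the Lob for `key`, or None if it was not merged or is marked deleted."""
    if key in deleted or key not in index:
        return None
    offset, length = index[key]
    return merged[offset:offset + length]


# Usage: "a" and "c" are below the 5-byte threshold, "b" is not.
lob_files = {"a": b"xx", "b": b"y" * 10, "c": b"zzz"}
merged, index = merge_small_lobs(lob_files, threshold=5)
deleted: Set[str] = set()
before = read_merged("a", merged, index, deleted)
deleted.add("a")  # deletion is recorded separately; the merged file is untouched
after = read_merged("a", merged, index, deleted)
```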
As described above, the embodiments of the present invention provide at least the following beneficial effects:
1. When Lob data is stored to the server side, the client uploads it block by block to the HBase server side according to a preset value, and the server side stores the blocks as they are uploaded, which avoids memory overflow on the server side.
2. So that the client does not have to learn two storage technologies, the client can upload Lob data to the server side together with other structured data, and the server side transfers the Lob data into the Lob file in HDFS. The client only needs to program against the way data is stored to HBase, which reduces the client's workload and makes storing data to HBase more practical.
The information interaction between the units of the above devices and their execution processes are based on the same concept as the method embodiments of the present invention; for details, refer to the description of the method embodiments, which is not repeated here.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise" and "include", and any variants of them, are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article or device. Without further limitation, an element introduced by the word "comprising" does not exclude the presence of further identical elements in the process, method, article or device that comprises it.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes ROM, RAM, magnetic disks, optical discs and other media capable of storing program code.
Finally, it should be noted that the above are only preferred embodiments of the present invention, intended merely to illustrate its technical solution and not to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within its scope of protection.

Claims (10)

1. A method for storing Lob data based on HBase, characterized in that a LobTable object is extended from the HTable object; a Qualifier for identifying Lob data is set; and a transfer mechanism by which the server corresponding to HBase transfers data to HDFS is set; the method further comprising:
when the client sends a storage request to the LobTable object and LobTable determines that the storage request contains the Qualifier, determining that the storage request carries Lob data and creating a LobPut object;
the client writing the Lob data carried in the storage request into the Lob output stream provided by the LobPut object;
the LobPut object receiving the Lob output stream into which the Lob data has been written and, each time a data block meeting the first threshold has been received, generating a corresponding Put object for that data block, wherein the Put object contains the block number of the corresponding data block within the Lob data;
submitting the generated Put objects to the server side, so that the server side uses the set transfer mechanism to transfer the data block of each Put object in turn into the Lob file in HDFS, in the order of the block numbers contained in the submitted Put objects.
2. The method according to claim 1, characterized in that
it further comprises: the client calling the LobTable object to submit a query request to the server side, wherein the query request is used to query Lob data on the server side, and receiving the Result returned by the server side for the query request, the Result containing the Row Key of the queried Lob data; the LobTable object encapsulating the Result as a LobResult object, and the LobResult object generating a LobGet object from the Row Key of the queried Lob data, so that the LobGet object reads the Lob data from the server side block by block, according to the start address of each data block within the queried Lob data.
3. A method for storing large object (Lob) data based on HBase, characterized in that a Lob Coprocessor is created and a transfer mechanism by which the server corresponding to HBase transfers data to HDFS is set; the method further comprising:
the server side using the Lob Coprocessor to receive, block by block, the Put objects that the client submits block by block, wherein each Put object contains the block number of its data block within the corresponding Lob data;
determining, for the data block of the first Put object received, whether the Lob file in HDFS currently contains data related to that data block; if so, using the set transfer mechanism to update the related data with the data block of the first Put object received; if not, using the set transfer mechanism to create the data block of the first Put object received in the Lob file;
and using the set transfer mechanism to transfer the data blocks of the subsequently received Put objects into the Lob file, in the order of the block numbers contained in the submitted Put objects, and combining the data blocks into the Lob data according to the block number of each data block.
4. The method according to claim 3, characterized in that
it further comprises: calling PrePut to throw an exception of a specific subclass of DoNotRetryIOException, so as to prevent the Lob Coprocessor from writing the data block of each Put object into HBase.
5. The method according to claim 3 or 4, characterized in that
it further comprises: creating a LobScanner object for reading data;
and further comprises: receiving the Get object that the client submits through LobGet, the Get object containing the Row Key of the queried Lob data; the Lob Coprocessor opening the LobScanner object; the LobScanner object determining the storage location of the queried Lob data from the Row Key in the Get object, reading each data block of the Lob data block by block at the determined storage location, and converting each data block to KeyValue form, so that the server side returns each data block in KeyValue form to the client;
or,
it further comprises: LobStore storing Lob data both in the form of HFile files and in MemStore form;
and further comprises: traversing the Lob files in HDFS, and merging the traversed Lob data items that are smaller than the second threshold into an HFile file;
and further comprises: storing in MemStore the deletion information of the Lob data items merged into the HFile file.
6. A client, characterized by comprising:
an extension unit, configured to extend a LobTable object from the HTable object and to set a Qualifier for identifying Lob data;
a processing unit, configured to: when a storage request is sent to the LobTable object and LobTable determines that the storage request contains the Qualifier, determine that the storage request carries Lob data and create a LobPut object;
a writing unit, configured to write the Lob data carried in the storage request into the Lob output stream provided by the LobPut object;
a receiving unit, configured to use the LobPut object to receive the Lob output stream into which the Lob data has been written and, each time a data block meeting the first threshold has been received, generate a corresponding Put object for that data block, wherein the Put object contains the block number of the corresponding data block within the Lob data;
a submitting unit, configured to submit the generated Put objects to the server side, so that the server side, using the preset transfer mechanism by which the server corresponding to HBase transfers data to HDFS, transfers the data block of each Put object in turn into the Lob file in HDFS, in the order of the block numbers contained in the submitted Put objects.
7. The client according to claim 6, characterized in that
it further comprises: a data query unit, configured to call the LobTable object to submit a query request to the server side, wherein the query request is used to query Lob data on the server side, and to receive the Result returned by the server side for the query request, the Result containing the Row Key of the queried Lob data; the LobTable object encapsulates the Result as a LobResult object, and the LobResult object generates a LobGet object from the Row Key of the queried Lob data, so that the LobGet object reads the Lob data from the server side block by block, according to the start address of each data block within the queried Lob data.
8. A server, characterized by comprising:
a storage unit, configured to create a Lob Coprocessor and to set the transfer mechanism by which the server corresponding to HBase transfers data to HDFS;
a receiving unit, configured to use the Lob Coprocessor to receive, block by block, the Put objects that the client submits block by block, wherein each Put object contains the block number of its data block within the corresponding Lob data;
a processing unit, configured to determine, for the data block of the first Put object received, whether the Lob file in HDFS currently contains data related to that data block; if so, to use the set transfer mechanism to update the related data with the data block of the first Put object received; if not, to use the set transfer mechanism to create the data block of the first Put object received in the Lob file;
a transfer unit, configured to use the set transfer mechanism to transfer the data blocks of the subsequently received Put objects into the Lob file, in the order of the block numbers contained in the submitted Put objects, and to combine the data blocks into the Lob data according to the block number of each data block.
9. The server according to claim 8, characterized in that
it further comprises: a blocking unit, configured to call PrePut to throw an exception of a specific subclass of DoNotRetryIOException, so as to prevent the Lob Coprocessor from writing the data block of each Put object into HBase.
10. The server according to claim 8 or 9, characterized in that
the storage unit is further configured to create a LobScanner object for reading data;
and the server further comprises: a query unit, configured to receive the Get object that the client submits through LobGet, the Get object containing the Row Key of the queried Lob data; the Lob Coprocessor opens the LobScanner object; the LobScanner object determines the storage location of the queried Lob data from the Row Key in the Get object, reads each data block of the Lob data block by block at the determined storage location, and converts each data block to KeyValue form, so that the server side returns each data block in KeyValue form to the client;
or,
LobStore further stores Lob data both in the form of HFile files and in MemStore form;
and the server further comprises: a traversal unit, configured to traverse the Lob files in HDFS and to merge the traversed Lob data items that are smaller than the second threshold into an HFile file;
and further comprises: a merging unit, configured to store in MemStore the deletion information of the Lob data items merged into the HFile file.
CN201510144162.7A 2015-03-30 2015-03-30 The storage method and device of a kind of Lob data based on HBase Active CN104750815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510144162.7A CN104750815B (en) 2015-03-30 2015-03-30 The storage method and device of a kind of Lob data based on HBase

Publications (2)

Publication Number Publication Date
CN104750815A true CN104750815A (en) 2015-07-01
CN104750815B CN104750815B (en) 2017-11-03

Family

ID=53590499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510144162.7A Active CN104750815B (en) 2015-03-30 2015-03-30 The storage method and device of a kind of Lob data based on HBase

Country Status (1)

Country Link
CN (1) CN104750815B (en)

Also Published As

Publication number Publication date
CN104750815B (en) 2017-11-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20180807

Address after: 250100 S06 tower, 1036, Chao Lu Road, hi tech Zone, Ji'nan, Shandong.

Patentee after: Shandong wave cloud Mdt InfoTech Ltd

Address before: No. 1036, Shandong high tech Zone wave road, Ji'nan, Shandong

Patentee before: Inspur Group Co., Ltd.

CP03 Change of name, title or address

Address after: 250100 No. 1036 Tidal Road, Jinan High-tech Zone, Shandong Province, S01 Building, Tidal Science Park

Patentee after: Inspur cloud Information Technology Co., Ltd

Address before: 250100 Ji'nan science and technology zone, Shandong high tide Road, No. 1036 wave of science and Technology Park, building S06

Patentee before: SHANDONG LANGCHAO YUNTOU INFORMATION TECHNOLOGY Co.,Ltd.
