Summary of the Invention
In view of this, the invention provides a method and device for storing Lob (large object) data based on HBase, which solve the problem of storing large object data while preserving practicality.
The invention provides a method for storing Lob data based on HBase. A LobTable object is extended from the HTable object; a Qualifier for identifying Lob data is set; and an offload mechanism by which the server corresponding to HBase offloads data to HDFS is set. The method further comprises:
when a client sends a storage request to the LobTable object, the LobTable object, upon determining that the storage request contains the Qualifier, determines that the storage request carries Lob data and creates a LobPut object;
the client writes the Lob data carried in the storage request into the Lob output stream provided by the LobPut object;
the LobPut object receives the Lob output stream into which the Lob data has been written and, each time it has received a data block that meets a first threshold, generates a corresponding Put object for that data block, wherein the Put object contains the block number of the corresponding data block within the Lob data;
the generated Put objects are submitted to the server, so that the server, in order of the block numbers contained in the submitted Put objects and using the offload mechanism that has been set, offloads the data block corresponding to each Put object in turn into a Lob file in HDFS.
Preferably, the method further comprises: the client calls the LobTable object to submit a query (scan) request to the server, wherein the query request is used to query Lob data on the server, and receives the Result returned by the server for the query request, the Result containing the Row Key of the Lob data queried by the query request; the LobTable object encapsulates the Result as a LobResult object, and the LobResult object generates a LobGet object according to the Row Key of the queried Lob data, so that the LobGet object reads the Lob data from the server block by block according to the start address, within the Lob data, of each data block of the queried Lob data.
The invention also provides a method for storing large object data based on HBase. A Lob Coprocessor is created, and an offload mechanism by which the server corresponding to HBase offloads data to HDFS is set. The method further comprises:
the server uses the Lob Coprocessor to receive, block by block, the Put objects that the client submits block by block, wherein each Put object contains the block number of the corresponding data block within its Lob data;
for the data block corresponding to the first Put object received, the server determines whether the Lob files of the HDFS system currently contain data related to that data block; if so, it uses the offload mechanism to update the related data with the data block corresponding to the first Put object received; if not, it uses the offload mechanism to create the data block corresponding to the first Put object received in a Lob file;
and, in order of the block numbers contained in the submitted Put objects and using the offload mechanism, the server offloads the data blocks corresponding to the subsequently received Put objects into the Lob file, and combines the data blocks into the Lob data according to the block number of each data block.
Preferably, the method further comprises: throwing, by calling PrePut, an exception of a specific subclass of DoNotRetryIOException, so as to terminate the operation in which the Lob Coprocessor writes the data block corresponding to each Put object into HBase.
Preferably, the method further comprises: creating a LobScanner object for reading data;
and further comprises: receiving the Get object that the client submits through LobGet, the Get object containing the Row Key of the queried Lob data; the Lob Coprocessor opens the LobScanner object; the LobScanner object determines the storage location of the queried Lob data according to the Row Key in the Get object, reads each data block of the Lob data block by block from the determined storage location, and converts each data block into KeyValue form, so that the server returns each data block in KeyValue form to the client;
Or,
the method further comprises: the LobStore additionally stores Lob data in HFile form and in MemStore form;
the method further comprises: traversing the Lob files in HDFS, and merging the traversed Lob data items that are smaller than a second threshold into an HFile file;
the method further comprises: storing, in the MemStore, the deletion information of the Lob data items merged into the HFile file.
The invention also provides a client, comprising:
an extension unit, configured to extend a LobTable object from the HTable object and to set a Qualifier for identifying Lob data;
a processing unit, configured such that, when a storage request is sent to the LobTable object, the LobTable object, upon determining that the storage request contains the Qualifier, determines that the storage request carries Lob data and creates a LobPut object;
a writing unit, configured to write the Lob data carried in the storage request into the Lob output stream provided by the LobPut object;
a receiving unit, configured to use the LobPut object to receive the Lob output stream into which the Lob data has been written and, each time a data block meeting a first threshold has been received, to generate a corresponding Put object for that data block, wherein the Put object contains the block number of the corresponding data block within the Lob data;
a submission unit, configured to submit the generated Put objects to the server, so that the server, using the preset offload mechanism by which the server corresponding to HBase offloads data to HDFS, and in order of the block numbers contained in the submitted Put objects, offloads the data block corresponding to each Put object in turn into a Lob file in HDFS.
Preferably, the client further comprises: a data query unit, configured to call the LobTable object to submit a query request to the server, wherein the query request is used to query Lob data on the server, and to receive the Result returned by the server for the query request, the Result containing the Row Key of the Lob data queried by the query request; the LobTable object encapsulates the Result as a LobResult object, and the LobResult object generates a LobGet object according to the Row Key of the queried Lob data, so that the LobGet object reads the Lob data from the server block by block according to the start address, within the Lob data, of each data block of the queried Lob data.
The invention also provides a server, comprising:
a storage unit, configured to create a Lob Coprocessor and to set the offload mechanism by which the server corresponding to HBase offloads data to HDFS;
a receiving unit, configured to use the Lob Coprocessor to receive, block by block, the Put objects that the client submits block by block, wherein each Put object contains the block number of the corresponding data block within its Lob data;
a processing unit, configured to determine, for the data block corresponding to the first Put object received, whether the Lob files of the HDFS system currently contain data related to that data block; if so, to use the offload mechanism to update the related data with the data block corresponding to the first Put object received; if not, to use the offload mechanism to create the data block corresponding to the first Put object received in a Lob file;
an offload unit, configured to offload, in order of the block numbers contained in the submitted Put objects and using the offload mechanism, the data blocks corresponding to the subsequently received Put objects into the Lob file, and to combine the data blocks into the Lob data according to the block number of each data block.
Preferably, the server further comprises: a termination unit, configured to throw, by calling PrePut, an exception of a specific subclass of DoNotRetryIOException, so as to terminate the operation in which the Lob Coprocessor writes the data block corresponding to each Put object into HBase.
Preferably, the storage unit is further configured to create a LobScanner object for reading data;
and the server further comprises: a query unit, configured to receive the Get object that the client submits through LobGet, the Get object containing the Row Key of the queried Lob data; the Lob Coprocessor opens the LobScanner object; the LobScanner object determines the storage location of the queried Lob data according to the Row Key in the Get object, reads each data block of the Lob data block by block from the determined storage location, and converts each data block into KeyValue form, so that the server returns each data block in KeyValue form to the client;
Or,
the LobStore additionally stores Lob data in HFile form and in MemStore form;
the server further comprises: a traversal unit, configured to traverse the Lob files in HDFS and to merge the traversed Lob data items that are smaller than a second threshold into an HFile file;
the server further comprises: a merge unit, configured to store, in the MemStore, the deletion information of the Lob data items merged into the HFile file.
The embodiments of the invention provide a method and device for storing Lob data based on HBase. When Lob data is stored to the server, the client uploads the Lob data to the HBase server block by block according to a preset value, and the server stores the data blocks as they are uploaded, which avoids memory overflow on the server. In addition, so that the client does not have to learn two storage technologies, the client can upload Lob data together with other structured data to the server, and the server offloads the Lob data into Lob files in HDFS. The client only needs to program against the HBase storage interface, which reduces the client's workload and makes storing data to HBase more practical.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention, without creative effort, fall within the scope of protection of the invention.
As shown in Figure 1, an embodiment of the invention provides a method for storing Lob data based on HBase. A LobTable object is extended from the HTable object; a Qualifier for identifying Lob data is set; and the offload mechanism by which the server corresponding to HBase offloads data to HDFS is set. The method may comprise the following steps:
Step 101: When the client sends a storage request to the LobTable object, the LobTable object, upon determining that the storage request contains the Qualifier, determines that the storage request carries Lob data and creates a LobPut object.
Step 102: The client writes the Lob data carried in the storage request into the Lob output stream provided by the LobPut object.
Step 103: The LobPut object receives the Lob output stream into which the Lob data has been written and, each time it has received a data block that meets the first threshold, generates a corresponding Put object for that data block, wherein the Put object contains the block number of the corresponding data block within the Lob data.
Step 104: The generated Put objects are submitted to the server, so that the server, in order of the block numbers contained in the submitted Put objects and using the offload mechanism, offloads the data block corresponding to each Put object in turn into a Lob file in HDFS.
According to the above solution, when Lob data is stored to the server, the client uploads the Lob data to the HBase server block by block according to a preset value, and the server stores the data blocks as they are uploaded, which avoids memory overflow on the server. In addition, so that the client does not have to learn two storage technologies, the client can upload Lob data together with other structured data to the server, and the server offloads the Lob data into Lob files in HDFS. The client only needs to program against the HBase storage interface, which reduces the client's workload and makes storing data to HBase more practical.
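The client-side splitting in steps 101 to 104 can be illustrated with a minimal sketch. It is not the LobPut implementation: the function names `read_up_to` and `split_into_blocks`, the zero-based block numbering, and the use of an in-memory stream are assumptions made only for illustration, and the 4M default follows the example value given later in the embodiment.

```python
import io
from typing import BinaryIO, Iterator, Tuple

# Assumed first threshold; the embodiment mentions 4M as an example value.
BLOCK_SIZE = 4 * 1024 * 1024


def read_up_to(stream: BinaryIO, size: int) -> bytes:
    """Read until `size` bytes are collected or the stream ends."""
    buf = bytearray()
    while len(buf) < size:
        part = stream.read(size - len(buf))
        if not part:
            break
        buf.extend(part)
    return bytes(buf)


def split_into_blocks(stream: BinaryIO,
                      block_size: int = BLOCK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (block_number, data) pairs; only one block is held at a time.

    Block numbers start at 0 here (an assumption). The last block may be
    shorter than block_size.
    """
    block_number = 0
    while True:
        data = read_up_to(stream, block_size)
        if not data:
            return
        yield block_number, data
        block_number += 1


# Small demonstration with a 10-byte "Lob" and a 4-byte block size.
demo_blocks = list(split_into_blocks(io.BytesIO(b"x" * 10), block_size=4))
print([(n, len(d)) for n, d in demo_blocks])  # [(0, 4), (1, 4), (2, 2)]
```

Each yielded pair corresponds to what one Put object would carry: the block payload plus its block number. Because the generator never holds more than one block, neither side needs to buffer the whole Lob in memory.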
As shown in Figure 2, an embodiment of the invention provides a method for storing large object data based on HBase. A Lob Coprocessor is created, and the offload mechanism by which the server corresponding to HBase offloads data to HDFS is set. The method may comprise the following steps:
Step 201: The server uses the Lob Coprocessor to receive, block by block, the Put objects that the client submits block by block, wherein each Put object contains the block number of the corresponding data block within its Lob data.
Step 202: For the data block corresponding to the first Put object received, the server determines whether the Lob files of the HDFS system currently contain data related to that data block; if so, it uses the offload mechanism to update the related data with the data block corresponding to the first Put object received; if not, it uses the offload mechanism to create the data block corresponding to the first Put object received in a Lob file.
Step 203: In order of the block numbers contained in the submitted Put objects and using the offload mechanism, the server offloads the data blocks corresponding to the subsequently received Put objects into the Lob file, and combines the data blocks into the Lob data according to the block number of each data block.
According to the above solution, the server stores the data block by block as the blocks are uploaded, which avoids memory overflow on the server. In addition, so that the client does not have to learn two storage technologies, the client can upload Lob data together with other structured data to the server, and the server offloads the Lob data into Lob files in HDFS. The client only needs to program against the HBase storage interface, which reduces the client's workload and makes storing data to HBase more practical.
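Steps 201 to 203 can be sketched on the server side as follows. This is only an illustration under stated assumptions: `LobFileStore` is a hypothetical in-memory stand-in for Lob files in HDFS, block numbers start at 0, "updating the related data" is modeled as replacing the existing file content, and blocks are assumed to arrive in block-number order.

```python
from typing import Dict


class LobFileStore:
    """Hypothetical in-memory stand-in for Lob files in HDFS."""

    def __init__(self) -> None:
        self.files: Dict[str, bytearray] = {}
        self.next_block: Dict[str, int] = {}

    def put_block(self, name: str, block_number: int, data: bytes) -> str:
        """Store one block and report what happened to the Lob file."""
        if block_number == 0:
            # First block: update the existing file or create a new one.
            action = "updated" if name in self.files else "created"
            self.files[name] = bytearray(data)
            self.next_block[name] = 1
            return action
        expected = self.next_block.get(name)
        if expected != block_number:
            # Assumption: out-of-order blocks are rejected rather than buffered.
            raise ValueError(f"expected block {expected}, got {block_number}")
        # Subsequent blocks are appended in block-number order.
        self.files[name].extend(data)
        self.next_block[name] = expected + 1
        return "appended"


store = LobFileStore()
print(store.put_block("row1", 0, b"ab"))  # created
print(store.put_block("row1", 1, b"cd"))  # appended
print(bytes(store.files["row1"]))         # b'abcd'
```

Rejecting out-of-order blocks keeps the server from holding pending blocks in memory, which matches the stated goal of avoiding memory overflow; a real implementation might choose a different policy.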
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Figure 3, an embodiment of the invention provides a method for storing large object data based on HBase. The method may comprise the following steps:
Step 301: Build the storage model. The server associates a corresponding LobStore with each Region, and sets the offload mechanism by which the HBase server offloads data to HDFS.
As shown in Figure 4, in this embodiment the LobStore can store objects in the following three forms:
Lob file: Lob files are stored in HDFS. When the client stores a Lob object to the server, the server offloads the Lob object into a Lob file in HDFS; the original value of the Lob object (for example, the binary content of a video file) is stored directly in HDFS, and each Put value is written directly to the Lob file. The file name of a Lob file may comprise: row + column family + column name; its directory is: table directory/region/Lob.
HFile: used to combine small Lob objects into an HFile, which avoids producing a large number of small object files on HDFS. Files of this kind are produced during table compaction. The key of an HFile entry may comprise: business-table row + f + q; the Value may comprise: a small Lob value or DEL.
MemStore: records the Deleted markers of Lob objects, which are used during table compaction to purge the small Lob objects that have been deleted.
In this embodiment, a Lob column may be defined as follows: at the [Family+Qualifier] level of an HBase Table, declare whether a given Qualifier of the table is a Lob column. When a storage request or a query request is received, this Qualifier can be used to determine whether the data to be stored or queried is a Lob object.
Further, a threshold separating "large Lob data" from "small Lob data" can be defined, for example 10M. Lob data smaller than this threshold is called "small Lob data" and is stored in HFile form; Lob data larger than this threshold is called "large Lob data" and is stored in HDFS as an individual Lob file.
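The threshold rule above amounts to a simple size check. The sketch below is illustrative only: the function name `storage_form` is hypothetical, and because the text does not say how data exactly equal to the threshold is treated, the sketch assumes it is handled as large Lob data.

```python
# Example threshold from the embodiment: 10M.
SMALL_LOB_THRESHOLD = 10 * 1024 * 1024


def storage_form(lob_size: int, threshold: int = SMALL_LOB_THRESHOLD) -> str:
    """Return the storage form for a Lob of the given size in bytes.

    Assumption: a Lob exactly equal to the threshold counts as large.
    """
    return "HFile" if lob_size < threshold else "LobFile"


for size in (4 * 1024, 10 * 1024 * 1024, 300 * 1024 * 1024):
    print(size, storage_form(size))
```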
Step 302: The client stores data to the HBase server.
As shown in Figure 4, in this embodiment the client can store or update (put) data on the HBase server as follows:
1. The client sends a put (store) request to the LobTable object.
2. When the LobTable object determines that the put request carries structured data, for example data of a two-dimensional table type, it submits the Put request to the server and step 3 is performed; when it determines that the put request carries unstructured data, that is, Lob data, step 4 is performed.
3. The structured data is stored normally in the HBase Store of the HBase database.
4. The LobTable object creates a LobPut object.
5. The client writes the Lob data carried in the put request into the Lob output stream provided by the LobPut object. This Lob output stream can be regarded as the data stream through which the LobPut object of the client sends data to the server.
6. The LobPut object receives the Lob output stream into which the Lob data has been written, reads the Lob data block by block, and generates a corresponding Put object for each data block read. The block-read threshold may be set to 4M; for example, each 4M of Lob data read forms one data block. The last data block of the Lob data may be smaller than 4M.
7. Each Put object is submitted to the HRegion of the server, wherein the Put object contains the block number of the corresponding data block within the Lob data.
8. The Lob Coprocessor created on the server receives, block by block, the data block corresponding to each Put object.
9. The Lob Coprocessor uses the offload mechanism to offload the data block corresponding to each received Put object in turn into a Lob file in HDFS.
In step 9, for the data block corresponding to the first Put object received, the Lob Coprocessor determines whether the Lob files of the HDFS system currently contain data related to that data block; if so, it uses the offload mechanism to update the related data with the data block corresponding to the first Put object received; if not, it uses the offload mechanism to create the data block corresponding to the first Put object received in a Lob file. Then, in order of the block numbers contained in the submitted Put objects and using the offload mechanism, it offloads the data blocks corresponding to the subsequently received Put objects into the Lob file, and combines the data blocks into the Lob data according to the block number of each data block.
After the above steps are completed, the method may further comprise: throwing, by calling PrePut, an exception of a specific subclass of DoNotRetryIOException, so as to terminate the operation in which the Lob Coprocessor writes the data block corresponding to each Put object into an HFile.
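The idea of offloading a block and then aborting the normal write path can be modeled as below. This is a plain-Python model, not HBase code: `DoNotRetryIOError` and `LobHandledError` are hypothetical stand-ins for DoNotRetryIOException and its specific subclass, `pre_put` and `region_put` are hypothetical names, and the text does not describe how the exception is surfaced to the caller, so here the region simply catches it.

```python
from typing import Callable, Dict, List, Set


class DoNotRetryIOError(IOError):
    """Stand-in for HBase's DoNotRetryIOException (model only)."""


class LobHandledError(DoNotRetryIOError):
    """Hypothetical subclass signalling that the block was offloaded."""


def pre_put(put: Dict, lob_qualifiers: Set[str],
            offload: Callable[[Dict], None]) -> None:
    """Hook run before the normal write; offloads Lob blocks, then aborts."""
    if put["qualifier"] in lob_qualifiers:
        offload(put)             # write the block to the Lob file
        raise LobHandledError()  # stop the normal write into the store


def region_put(put: Dict, hbase_store: List[Dict], lob_qualifiers: Set[str],
               offload: Callable[[Dict], None]) -> str:
    try:
        pre_put(put, lob_qualifiers, offload)
    except LobHandledError:
        return "offloaded"       # normal write skipped
    hbase_store.append(put)      # structured data is stored as usual
    return "stored"


hbase_store: List[Dict] = []
lob_blocks: List[Dict] = []
print(region_put({"row": "r1", "qualifier": "video", "value": b"\x00"},
                 hbase_store, {"video"}, lob_blocks.append))  # offloaded
print(region_put({"row": "r1", "qualifier": "name", "value": b"clip"},
                 hbase_store, {"video"}, lob_blocks.append))  # stored
```

Using a non-retryable exception type reflects the intent that the client should not resubmit a block that has already been offloaded.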
Two new client APIs are defined, LobTable and LobPut, which inherit from the original HBase classes HTable and Put respectively. LobTable and LobPut can be understood as interfaces on the client. In addition, before storing Lob data, the client may submit a Put object whose lob column value is byte[0], so that the primary key is stored in HBase.
Step 303: The client queries data on the HBase server.
As shown in Figure 5, in this embodiment, to let the client query data on the HBase server, the following three APIs can be added: LobTable, LobResult, and LobGet. A data query (scan) can be carried out as follows:
1. The client sends a Scan request to the LobTable object.
2. The LobTable object submits the Scan request to the server.
3. The HRegion of the server returns a Result to the client according to the Scan request. The Result contains the Row Key of the Lob data queried by the scan request.
4. Because the Result contains the Row Key, the LobTable object encapsulates the Result as a LobResult object.
5. The LobResult object generates a LobGet object according to the Row Key of the queried Lob data.
6. The LobGet object reads the Lob data from the server block by block according to the start address of each data block of the queried Lob data. The LobGet object also needs to read the Lob data block by block, and the block-read threshold may be 4M.
7. The HRegion receives the Get object that the client submits through LobGet; the Get object contains the Row Key of the queried Lob data.
8. The Lob Coprocessor opens the LobScanner object.
9. The LobScanner object determines, according to the Row Key in the Get object, whether the Lob data is stored in an HFile or in a Lob file. If it is stored in an HFile, the KeyValue is read directly and returned to the LobGet object of the client, together with an EOF end marker. If it is stored in a Lob file, the LobScanner object reads each data block according to its start address and converts each data block into KeyValue form, so that the server returns each data block in KeyValue form to the LobGet object of the client, followed by an EOF end marker.
10. The client reads the data from the input stream provided by the LobGet object. This input stream can be regarded as the data stream through which the LobGet object obtains data from the server.
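The block-wise read in steps 6 to 10 can be sketched as a loop over start addresses that stops at the EOF marker. The names `read_block` and `read_lob` are hypothetical, the stored Lob is modeled as a bytes value rather than a Lob file in HDFS, and the KeyValue wrapping of each block is omitted for brevity.

```python
from typing import Tuple


def read_block(lob: bytes, start: int, block_size: int) -> Tuple[bytes, bool]:
    """Server side: return the block at `start` and an EOF flag."""
    data = lob[start:start + block_size]
    eof = start + block_size >= len(lob)
    return data, eof


def read_lob(lob: bytes, block_size: int) -> bytes:
    """Client side: request blocks by start address until EOF is signalled."""
    out = bytearray()
    start = 0
    while True:
        data, eof = read_block(lob, start, block_size)
        out.extend(data)
        if eof:
            break
        start += block_size
    return bytes(out)


stored = b"abcdefghij"
print(read_block(stored, 8, 4))        # (b'ij', True)
print(read_lob(stored, 4) == stored)   # True
```

As with the write path, only one block is in flight at a time, so the size of the Lob does not determine how much memory the server needs for a read.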
In this embodiment, the size of Lob data is uncertain in some application scenarios; for example, a file stored in a network-disk application may be a few KB, several hundred MB, or more than a GB. If all Lob data were stored in Lob files in HDFS, too many small files could degrade the efficiency of the HDFS system. Therefore, the small Lob data in the Lob files in HDFS can be compacted (merged).
For the small-file problem, Hadoop itself provides three solutions: Hadoop Archive, SequenceFile, and CombineFileInputFormat. In these solutions, however, once a small file has been merged into a large archive file it cannot be deleted or modified unless the archive is repacked. Therefore, in this embodiment, the Lob files in HDFS can be traversed and multiple small Lob data items of less than 10M merged, with the merged file kept no larger than another threshold, for example 1G; the merged file is stored in HBase in HFile format. In fact, the content of an HFile cannot be modified or deleted in place either; small lob KVs marked DELETE are cleaned up by Major Compact, that is, multiple HFiles are traversed, the marked KVs are purged, and a new HFile is generated.
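One way to plan such a merge is a simple greedy pass over the small Lob files. The sketch below is an assumption about how batches might be formed, not the method of the embodiment: `plan_merges` is a hypothetical name, files are visited in name order, and the small-file threshold is assumed to be smaller than the merged-file limit so that any single small file fits in a batch.

```python
from typing import Dict, List


def plan_merges(sizes: Dict[str, int], small_threshold: int,
                max_merged: int) -> List[List[str]]:
    """Group small Lob files into batches no larger than max_merged bytes."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sorted(sizes.items()):
        if size >= small_threshold:
            continue  # large Lob data stays in its own Lob file
        if current and current_size + size > max_merged:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Toy sizes: threshold 10 and limit 10 stand in for 10M and 1G.
lob_sizes = {"a": 4, "b": 5, "c": 3, "d": 20}
print(plan_merges(lob_sizes, small_threshold=10, max_merged=10))
# [['a', 'b'], ['c']]
```

Each resulting batch would become one merged HFile; file "d" is skipped because it is not small.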
In this embodiment, the Lob Coprocessor can determine whether a lob column is being deleted (including Delete Entire Row and deletion of an entire column family); if so, it generates a delete marker for the lob column and stores it in the MemStore, and at the same time deletes the Lob file in HDFS, if one exists.
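The interplay between delete markers and compaction can be modeled as follows. This is a toy model under stated assumptions: `delete_lob` and `major_compact` are hypothetical names, the MemStore is modeled as a set of deleted keys, HFiles as lists of (key, value) pairs, and Lob files in HDFS as a dictionary.

```python
from typing import Dict, List, Set, Tuple

HFile = List[Tuple[str, bytes]]


def delete_lob(key: str, deleted: Set[str], lob_files: Dict[str, bytes]) -> None:
    """Record a delete marker and remove the Lob file if one exists."""
    deleted.add(key)
    lob_files.pop(key, None)


def major_compact(hfiles: List[HFile], deleted: Set[str]) -> HFile:
    """Traverse all HFiles and write a new one without deleted entries."""
    merged: Dict[str, bytes] = {}
    for hfile in hfiles:
        for key, value in hfile:
            if key not in deleted:
                merged[key] = value
    return sorted(merged.items())


hfiles = [[("r1", b"a"), ("r2", b"b")], [("r3", b"c")]]
lob_files = {"r9": b"large object"}
deleted: Set[str] = set()

delete_lob("r2", deleted, lob_files)  # small Lob held in an HFile
delete_lob("r9", deleted, lob_files)  # large Lob held in a Lob file
print(lob_files)                       # {}
print(major_compact(hfiles, deleted))  # [('r1', b'a'), ('r3', b'c')]
```

The marker is needed because an HFile cannot be edited in place: the small Lob only disappears physically when compaction rewrites the file.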
In this embodiment, because a LobStore is associated with one Region, the LobStore must also be split when the Region is split. Splitting Lob files only requires moving them to different directories. Splitting an HFile, by contrast, requires traversing the whole file and dividing its entries among the different Regions according to the split key.
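The two split cases can be contrasted in a short sketch. The names `split_hfile` and `target_region` and the labels "lower" and "upper" are hypothetical; the sketch assumes that rows sorting below the split key go to one daughter Region and rows at or above it go to the other.

```python
from typing import List, Tuple

Entry = Tuple[str, bytes]


def split_hfile(entries: List[Entry],
                split_key: str) -> Tuple[List[Entry], List[Entry]]:
    """Traverse every entry and divide the HFile content by split key."""
    lower = [(k, v) for k, v in entries if k < split_key]
    upper = [(k, v) for k, v in entries if k >= split_key]
    return lower, upper


def target_region(row_key: str, split_key: str) -> str:
    """Pick the daughter Region whose directory a Lob file is moved to."""
    return "lower" if row_key < split_key else "upper"


entries = [("r1", b"a"), ("r5", b"b"), ("r9", b"c")]
print(split_hfile(entries, "r5"))
# ([('r1', b'a')], [('r5', b'b'), ('r9', b'c')])
print(target_region("r3", "r5"))  # lower
```

A Lob file holds the data of a single row, so it can be moved whole, whereas an HFile mixes many rows and must be rewritten.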
As shown in Figure 6, an embodiment of the invention also provides a client, comprising:
an extension unit 601, configured to extend a LobTable object from the HTable object and to set a Qualifier for identifying Lob data;
a processing unit 602, configured such that, when a storage request is sent to the LobTable object, the LobTable object, upon determining that the storage request contains the Qualifier, determines that the storage request carries Lob data and creates a LobPut object;
a writing unit 603, configured to write the Lob data carried in the storage request into the Lob output stream provided by the LobPut object;
a receiving unit 604, configured to use the LobPut object to receive the Lob output stream into which the Lob data has been written and, each time a data block meeting a first threshold has been received, to generate a corresponding Put object for that data block, wherein the Put object contains the block number of the corresponding data block within the Lob data;
a submission unit 605, configured to submit the generated Put objects to the server, so that the server, using the preset offload mechanism by which the server corresponding to HBase offloads data to HDFS, and in order of the block numbers contained in the submitted Put objects, offloads the data block corresponding to each Put object in turn into a Lob file in HDFS.
The client further comprises: a data query unit 606, configured to call the LobTable object to submit a query request to the server, wherein the query request is used to query Lob data on the server, and to receive the Result returned by the server for the query request, the Result containing the Row Key of the Lob data queried by the query request; the LobTable object encapsulates the Result as a LobResult object, and the LobResult object generates a LobGet object according to the Row Key of the queried Lob data, so that the LobGet object reads the Lob data from the server block by block according to the start address, within the Lob data, of each data block of the queried Lob data.
As shown in Figure 7, an embodiment of the invention also provides a server, comprising:
a storage unit 701, configured to create a Lob Coprocessor and to set the offload mechanism by which the server corresponding to HBase offloads data to HDFS;
a receiving unit 702, configured to use the Lob Coprocessor to receive, block by block, the Put objects that the client submits block by block, wherein each Put object contains the block number of the corresponding data block within its Lob data;
a processing unit 703, configured to determine, for the data block corresponding to the first Put object received, whether the Lob files of the HDFS system currently contain data related to that data block; if so, to use the offload mechanism to update the related data with the data block corresponding to the first Put object received; if not, to use the offload mechanism to create the data block corresponding to the first Put object received in a Lob file;
an offload unit 704, configured to offload, in order of the block numbers contained in the submitted Put objects and using the offload mechanism, the data blocks corresponding to the subsequently received Put objects into the Lob file, and to combine the data blocks into the Lob data according to the block number of each data block.
The server further comprises: a termination unit 705, configured to throw, by calling PrePut, an exception of a specific subclass of DoNotRetryIOException, so as to terminate the operation in which the Lob Coprocessor writes the data block corresponding to each Put object into HBase.
The storage unit is further configured to create a LobScanner object for reading data;
the server further comprises: a query unit 706, configured to receive the Get object that the client submits through LobGet, the Get object containing the Row Key of the queried Lob data; the Lob Coprocessor opens the LobScanner object; the LobScanner object determines the storage location of the queried Lob data according to the Row Key in the Get object, reads each data block of the Lob data block by block from the determined storage location, and converts each data block into KeyValue form, so that the server returns each data block in KeyValue form to the client;
Or,
the LobStore additionally stores Lob data in HFile form and in MemStore form;
the server further comprises: a traversal unit 707, configured to traverse the Lob files in HDFS and to merge the traversed Lob data items that are smaller than a second threshold into an HFile file;
the server further comprises: a merge unit 708, configured to store, in the MemStore, the deletion information of the Lob data items merged into the HFile file.
As described above, the embodiments of the invention can provide at least the following beneficial effects:
1. When Lob data is stored to the server, the client uploads the Lob data to the HBase server block by block according to a preset value, and the server stores the data blocks as they are uploaded, which avoids memory overflow on the server.
2. So that the client does not have to learn two storage technologies, the client can upload Lob data together with other structured data to the server, and the server offloads the Lob data into Lob files in HDFS. The client only needs to program against the HBase storage interface, which reduces the client's workload and makes storing data to HBase more practical.
Because the information exchange between the units of the above devices and their execution processes are based on the same concept as the method embodiments of the invention, the details can be found in the description of the method embodiments and are not repeated here.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes ROM, RAM, magnetic disks, optical discs, and other media capable of storing program code.
Finally, it should be noted that the above are only preferred embodiments of the invention, intended to illustrate its technical solutions and not to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention falls within the scope of protection of the invention.