CN104750815B - Method and device for storing Lob data based on HBase

Publication number
CN104750815B
CN104750815B CN201510144162.7A CN201510144162A CN104750815B CN 104750815 B CN104750815 B CN 104750815B CN 201510144162 A CN201510144162 A CN 201510144162A CN 104750815 B CN104750815 B CN 104750815B
Authority
CN
China
Prior art keywords
data
lob
objects
block
data block
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510144162.7A
Other languages
Chinese (zh)
Other versions
CN104750815A (en)
Inventor
贾德星
徐正礼
魏金雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Group Co Ltd
Application filed by Inspur Group Co Ltd
Priority to CN201510144162.7A
Publication of CN104750815A
Application granted
Publication of CN104750815B

Abstract

The present invention provides a method and device for storing Lob data based on HBase. The method includes: when a storage request is sent to a LobTable object, the LobTable object determines that the storage request carries Lob data and creates a LobPut object; the Lob data is written into the Lob output stream provided by the LobPut object; each time the LobPut object has received a data block that meets a first threshold, it generates a corresponding Put object for that data block; the generated Put objects are submitted to the server, so that the server, using the configured unloading mechanism, transfers the data block corresponding to each Put object into a Lob file in HDFS in turn, in the order of the block numbers contained in the submitted Put objects. This scheme reduces the client's workload and makes storing data in HBase more practical.

Description

Method and device for storing Lob data based on HBase
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for storing Lob data based on HBase.
Background
HBase (a distributed storage system) is a distributed, column-oriented open-source database that can store tens of billions of records or more. When large object (Large Object, LOB) data such as documents, music, or video is stored in it, the following problems may arise:
1. At the storage layer, HBase stores data in KeyValue form. When a client updates data, the server first places the KeyValue objects to be updated in memory (the MemStore); once the KeyValue objects held in memory reach a threshold, they are written out as a StoreFile. Consequently, if a KeyValue value is too large, the server can easily run out of memory;
2. When storing data, an HBase table is split by row key (RowKey) into multiple regions (Region), each of which contains multiple HFile files. Because Lob column values are very large, the HFile files easily exceed the region size limit, so regions are split and a large number of regions form, each holding only a few rows. The excessive number of regions makes scans (Scan) over the HBase table inefficient.
To address these problems, HDFS (a distributed file system) is often used as an auxiliary store when a client needs to store Lob data: the client stores the Lob data in HDFS and the other, structured data in HBase. Because the client must then store Lob data and structured data separately in two storage systems (HBase and HDFS), it has to implement a large amount of program logic for each system's storage interface, so this approach is not very practical.
Summary of the invention
In view of this, the present invention provides a method and device for storing Lob data based on HBase, which solve the problem of storing large object data while remaining practical.
The present invention provides a method for storing Lob data based on HBase. A LobTable object is extended from the HTable object; a Qualifier for identifying Lob data is set; and an unloading mechanism, by which the server corresponding to HBase unloads (transfers) data to HDFS, is set. The method further includes:
When the client sends a storage request to the LobTable object and the LobTable object determines that the storage request includes the Qualifier, the LobTable object determines that the storage request carries Lob data and creates a LobPut object;
The client writes the Lob data carried in the storage request into the Lob output stream provided by the LobPut object;
The LobPut object receives the Lob output stream into which the Lob data has been written and, each time it has received a data block that meets the first threshold, generates a corresponding Put object for that data block, where the Put object includes the block number of the data block within the Lob data;
The generated Put objects are submitted to the server, so that the server, using the configured unloading mechanism, transfers the data block corresponding to each Put object into a Lob file in HDFS in turn, in the order of the block numbers included in the submitted Put objects.
Preferably,
The method further comprises: the client calls the LobTable object to submit a query (scan) request to the server, where the query request is used to query Lob data on the server, and receives the Result that the server returns for the query request, the Result including the Row Key of the Lob data queried by the query request; the LobTable object wraps the Result as a LobResult object, and the LobResult object generates a LobGet object from the Row Key of the queried Lob data, so that the LobGet object reads the Lob data from the server block by block according to the start address, within the Lob data, of each data block of the queried Lob data.
The present invention also provides a method for storing large object data based on HBase. A Lob Corprocessor (an HBase coprocessor) is created, and an unloading mechanism by which the server corresponding to HBase unloads data to HDFS is set. The method further includes:
The server uses the Lob Corprocessor to receive, block by block, each Put object that the client submits block by block, where each Put object includes the block number of its data block within the corresponding Lob data;
According to the data block corresponding to the first Put object received, the server determines whether the Lob files of the HDFS system currently contain data related to that data block; if so, it uses the configured unloading mechanism to update the related data with the data block corresponding to the first Put object received; if not, it uses the configured unloading mechanism to create the data block corresponding to the first Put object received in a Lob file;
Then, in the order of the block numbers included in the submitted Put objects, the server uses the configured unloading mechanism to unload the data blocks corresponding to the subsequently received Put objects into the Lob file, and combines the data blocks into the Lob data according to their block numbers.
Preferably, the method further comprises: throwing an exception of a specific DoNotRetryIOException subclass in the PrePut call, so as to prevent the Lob Corprocessor from performing the operation of writing the data block corresponding to each Put object into HBase.
Preferably,
The method further comprises: creating a LobScanner object for reading data;
The method further comprises: receiving the Get object that the client submits through LobGet, the Get object including the Row Key of the queried Lob data; the Lob Corprocessor opens the LobScanner object; the LobScanner object determines the storage location of the queried Lob data from the Row Key in the Get object, reads each data block of the Lob data block by block from the determined storage location, and converts each data block into KeyValue form, so that the server returns the data blocks in KeyValue form to the client;
Or,
The method further comprises: the LobStore additionally stores Lob data in the form of HFile files and in the form of a MemStore;
The method further comprises: traversing the Lob files in HDFS, and merging the traversed Lob data items that are smaller than a second threshold into an HFile file;
The method further comprises: storing the deletion information of the Lob data items merged into the HFile file in the MemStore.
The present invention also provides a client, including:
An extension unit, configured to extend a LobTable object from the HTable object and to set a Qualifier for identifying Lob data;
A processing unit, configured so that, when a storage request is sent to the LobTable object and the LobTable object determines that the storage request includes the Qualifier, the LobTable object determines that the storage request carries Lob data and creates a LobPut object;
A writing unit, configured to write the Lob data carried in the storage request into the Lob output stream provided by the LobPut object;
A receiving unit, configured to use the LobPut object to receive the Lob output stream into which the Lob data has been written and, each time a data block that meets the first threshold has been received, to generate a corresponding Put object for that data block, where the Put object includes the block number of the data block within the Lob data;
A submission unit, configured to submit the generated Put objects to the server, so that the server, according to the preset unloading mechanism by which the server corresponding to HBase unloads data to HDFS, and in the order of the block numbers included in the submitted Put objects, transfers the data block corresponding to each Put object into a Lob file in HDFS in turn.
Preferably,
The client further comprises: a data query unit, configured to call the LobTable object to submit a query request to the server, where the query request is used to query Lob data on the server, and to receive the Result that the server returns for the query request, the Result including the Row Key of the Lob data queried by the query request; the LobTable object wraps the Result as a LobResult object, and the LobResult object generates a LobGet object from the Row Key of the queried Lob data, so that the LobGet object reads the Lob data from the server block by block according to the start address, within the Lob data, of each data block of the queried Lob data.
The present invention also provides a server, including:
A storage unit, configured to create a Lob Corprocessor and to set the unloading mechanism by which the server corresponding to HBase unloads data to HDFS;
A receiving unit, configured to use the Lob Corprocessor to receive, block by block, each Put object that the client submits block by block, where each Put object includes the block number of its data block within the corresponding Lob data;
A processing unit, configured to determine, according to the data block corresponding to the first Put object received, whether the Lob files of the HDFS system currently contain data related to that data block; if so, to use the configured unloading mechanism to update the related data with the data block corresponding to the first Put object received; if not, to use the configured unloading mechanism to create the data block corresponding to the first Put object received in a Lob file;
An unloading unit, configured to use the configured unloading mechanism, in the order of the block numbers included in the submitted Put objects, to unload the data blocks corresponding to the subsequently received Put objects into the Lob file, and to combine the data blocks into the Lob data according to their block numbers.
Preferably,
The server further comprises: a blocking unit, configured to throw an exception of a specific DoNotRetryIOException subclass in the PrePut call, so as to prevent the Lob Corprocessor from performing the operation of writing the data block corresponding to each Put object into HBase.
Preferably,
The storage unit is further configured to create a LobScanner object for reading data;
The server further comprises: a query unit, configured to receive the Get object that the client submits through LobGet, the Get object including the Row Key of the queried Lob data; the Lob Corprocessor opens the LobScanner object; the LobScanner object determines the storage location of the queried Lob data from the Row Key in the Get object, reads each data block of the Lob data block by block from the determined storage location, and converts each data block into KeyValue form, so that the server returns the data blocks in KeyValue form to the client;
Or,
The server further comprises: a LobStore that additionally stores Lob data in the form of HFile files and in the form of a MemStore;
The server further comprises: a traversal unit, configured to traverse the Lob files in HDFS and to merge the traversed Lob data items that are smaller than a second threshold into an HFile file;
The server further comprises: a merging unit, configured to store the deletion information of the Lob data items merged into the HFile file in the MemStore.
The embodiments of the present invention provide a method and device for storing Lob data based on HBase. When Lob data is stored on the server, the client uploads the Lob data to the HBase server in blocks of a preset size, and the server stores the data block by block as uploaded, which avoids memory overflow on the server. In addition, so that client developers do not have to learn two storage technologies, the client can upload Lob data to the server together with the other, structured data, and the server transfers the Lob data into Lob files in HDFS. The client only needs to be programmed against the HBase storage interface, which reduces the client's workload and makes storing data in HBase more practical.
Brief description of the drawings
Fig. 1 is a flow chart of a method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of a method provided by another embodiment of the present invention;
Fig. 3 is a flow chart of a method provided by a further embodiment of the present invention;
Fig. 4 is a schematic diagram of a data storage method provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of a data query method provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a client provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a server provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides a method for storing Lob data based on HBase. A LobTable object is extended from the HTable object; a Qualifier for identifying Lob data is set; and an unloading mechanism by which the server corresponding to HBase unloads data to HDFS is set. The method may include the following steps:
Step 101: When the client sends a storage request to the LobTable object and the LobTable object determines that the storage request includes the Qualifier, the LobTable object determines that the storage request carries Lob data and creates a LobPut object.
Step 102: The client writes the Lob data carried in the storage request into the Lob output stream provided by the LobPut object.
Step 103: The LobPut object receives the Lob output stream into which the Lob data has been written and, each time it has received a data block that meets the first threshold, generates a corresponding Put object for that data block, where the Put object includes the block number of the data block within the Lob data.
Step 104: The generated Put objects are submitted to the server, so that the server, using the configured unloading mechanism, transfers the data block corresponding to each Put object into a Lob file in HDFS in turn, in the order of the block numbers included in the submitted Put objects.
With this scheme, when Lob data is stored on the server, the client uploads the Lob data to the HBase server in blocks of a preset size, and the server stores the data block by block as uploaded, which avoids memory overflow on the server. In addition, so that client developers do not have to learn two storage technologies, the client can upload Lob data to the server together with the other, structured data, and the server transfers the Lob data into Lob files in HDFS. The client only needs to be programmed against the HBase storage interface, which reduces the client's workload and makes storing data in HBase more practical.
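The client-side splitting in steps 102 to 104 can be sketched as follows. This is a minimal Python model under stated assumptions, not the patented implementation and not the HBase API: `BlockPut` and `split_into_puts` are hypothetical names, block numbers are assumed to start at 0, and the 4M default is the example threshold mentioned later in the embodiment.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Iterator

# Example first threshold (4M); the text gives this only as a possible value.
BLOCK_SIZE = 4 * 1024 * 1024


@dataclass(frozen=True)
class BlockPut:
    """Hypothetical stand-in for a Put object that carries one data block."""
    row_key: bytes
    block_number: int  # position of the block within the Lob data (assumed 0-based)
    data: bytes


def split_into_puts(row_key: bytes, stream: BinaryIO,
                    block_size: int = BLOCK_SIZE) -> Iterator[BlockPut]:
    """Read the Lob stream block by block and yield one BlockPut per block.

    Assumes a buffered stream whose read(n) returns n bytes unless the end
    of the data is reached, so only the last block may be shorter.
    """
    block_number = 0
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            break
        yield BlockPut(row_key, block_number, chunk)
        block_number += 1
```

For instance, `list(split_into_puts(b"row1", io.BytesIO(b"abcdefghij"), block_size=4))` produces three blocks of 4, 4, and 2 bytes, so only one block at a time needs to be held in memory by the sender.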
As shown in Fig. 2, an embodiment of the present invention provides a method for storing large object data based on HBase. A Lob Corprocessor is created, and an unloading mechanism by which the server corresponding to HBase unloads data to HDFS is set. The method may include the following steps:
Step 201: The server uses the Lob Corprocessor to receive, block by block, each Put object that the client submits block by block, where each Put object includes the block number of its data block within the corresponding Lob data.
Step 202: According to the data block corresponding to the first Put object received, the server determines whether the Lob files of the HDFS system currently contain data related to that data block; if so, it uses the configured unloading mechanism to update the related data with the data block corresponding to the first Put object received; if not, it uses the configured unloading mechanism to create the data block corresponding to the first Put object received in a Lob file.
Step 203: Then, in the order of the block numbers included in the submitted Put objects, the server uses the configured unloading mechanism to unload the data blocks corresponding to the subsequently received Put objects into the Lob file, and combines the data blocks into the Lob data according to their block numbers.
With this scheme, the server stores the data block by block as uploaded, which avoids memory overflow on the server. In addition, so that client developers do not have to learn two storage technologies, the client can upload Lob data to the server together with the other, structured data, and the server transfers the Lob data into Lob files in HDFS. The client only needs to be programmed against the HBase storage interface, which reduces the client's workload and makes storing data in HBase more practical.
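Steps 201 to 203 can be illustrated with a small in-memory model. The sketch below is only an assumption-laden illustration: `store_blocks` is a hypothetical name, a Python dictionary stands in for the Lob files in HDFS, and "updating" existing data is interpreted here as replacing the previous content of the file, which the text does not spell out.

```python
from typing import Dict, Iterable, Tuple


def store_blocks(lob_files: Dict[str, bytearray], path: str,
                 puts: Iterable[Tuple[int, bytes]]) -> str:
    """Write (block_number, data) pairs into the Lob file at `path`.

    The first block either creates the file or updates (here: replaces)
    an existing one; later blocks are appended in block-number order.
    """
    ordered = sorted(puts, key=lambda put: put[0])
    if not ordered:
        return "noop"
    action = "updated" if path in lob_files else "created"
    # First block: create the Lob file, or overwrite the related old data.
    lob_files[path] = bytearray(ordered[0][1])
    # Subsequent blocks: append in order so the blocks recombine into the Lob data.
    for _, data in ordered[1:]:
        lob_files[path].extend(data)
    return action
```

Calling `store_blocks(files, "table/region/Lob/row1", [(1, b"cd"), (0, b"ab"), (2, b"e")])` on an empty `files` dictionary creates the file with content `b"abcde"`, even though the blocks were supplied out of order; the path string is purely illustrative.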
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 3, an embodiment of the present invention provides a method for storing large object data based on HBase. The method may include the following steps:
Step 301: Build the storage model. The server associates each Region with a corresponding LobStore, and the unloading mechanism by which the HBase server unloads data to HDFS is set.
As shown in Fig. 4, in this embodiment the LobStore can store objects in the following three forms:
Lob file: Lob files are stored in HDFS. When the client stores a Lob object on the server, the server unloads the Lob object into a Lob file in HDFS. The original value of the Lob object (for example, the binary content of a video file) is stored directly in HDFS, and each Put value is written directly into the Lob file. The file name of a Lob file may consist of: row key + column family + column name; its directory is: table directory/region/Lob.
HFile: Used to combine small Lob objects into an HFile, which avoids producing a large number of small object files on HDFS. Such files are produced during table compaction. The key of an HFile entry may consist of: business-table row + f + q, and the Value may be: the small Lob value, or DEL.
MemStore: Records the Deleted markers of Lob objects; its purpose is to allow deleted small Lob objects to be removed during table compaction.
In this embodiment, a Lob column can be defined as follows: it is defined at the [Family + Qualifier] level of an HBase Table, declaring whether a given Qualifier of a table is a Lob column. When a storage request or a query request is received, the Qualifier can be used to determine whether the data to be stored or queried is a Lob object.
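As a rough illustration of this column-level declaration, the check can be modelled as a lookup of (family, qualifier) pairs. The table name, column names, and the `LOB_COLUMNS` registry below are all hypothetical; the text does not say where or how the declaration is stored.

```python
from typing import Dict, Iterable, Set, Tuple

Column = Tuple[str, str]  # (family, qualifier)

# Hypothetical table-level declaration of which columns are Lob columns.
LOB_COLUMNS: Dict[str, Set[Column]] = {
    "docs": {("content", "video")},
}


def carries_lob(table: str, columns: Iterable[Column]) -> bool:
    """Return True if any column in the request is declared as a Lob column."""
    declared = LOB_COLUMNS.get(table, set())
    return any(column in declared for column in columns)
```

A request touching `("content", "video")` in table `docs` would be treated as carrying Lob data and routed to the Lob path, while a request touching only `("meta", "title")` would follow the ordinary HBase path.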
Further, a threshold separating "big Lob data" from "small Lob data" can be defined, for example 10M. Lob data smaller than the threshold is called "small Lob data" and is stored in HFile form; Lob data larger than the threshold is called "big Lob data" and is stored as a separate Lob file in HDFS.
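The size-based choice of storage form can be written as a one-line rule. In this sketch the 10M value is the example from the text; how data of exactly the threshold size is treated is not stated, so treating it as big Lob data is an assumption, and `storage_form` is a hypothetical name.

```python
SMALL_LOB_LIMIT = 10 * 1024 * 1024  # example threshold (10M) from the text


def storage_form(lob_size: int, limit: int = SMALL_LOB_LIMIT) -> str:
    """Choose the storage form for Lob data of the given size in bytes.

    Sizes below the limit count as small Lob data (HFile form); everything
    else is stored as a separate Lob file. Equality is treated as "big",
    which is an assumption.
    """
    return "HFile" if lob_size < limit else "LobFile"
```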
Step 302: The client stores data on the HBase server.
As shown in Fig. 4, in this embodiment the client can store or update (put) data on the HBase server as follows:
1. The client sends a storage (put) request to the LobTable object.
2. When the LobTable object determines that the put request carries structured data, such as two-dimensional table data, it submits the Put request to the server and proceeds to step 3; when it determines that the put request carries unstructured data, such as Lob data, it proceeds to step 4.
3. The structured data is stored in the usual way in the HBase Store of the HBase database.
4. The LobTable object creates a LobPut object.
5. The client writes the Lob data carried in the put request into the Lob output stream provided by the LobPut object. The Lob output stream can be understood as the data stream that the client's LobPut object sends to the server.
6. The LobPut object receives the Lob output stream into which the Lob data has been written, reads the Lob data block by block, and generates a corresponding Put object for each data block read. The block-read threshold can be set to 4M; for example, a data block is taken each time 4M of Lob data has been read. The last data block of the Lob data may be smaller than 4M.
7. Each Put object is submitted to the HRegion on the server, where the Put object includes the block number of its data block within the Lob data.
8. The Lob Corprocessor created on the server receives, block by block, the data block corresponding to each Put object.
9. Using the configured unloading mechanism, the Lob Corprocessor transfers the data block corresponding to each received Put object into the Lob file in HDFS in turn.
In step 9, the Lob Corprocessor determines, according to the data block corresponding to the first Put object received, whether the Lob files of the HDFS system currently contain data related to that data block; if so, it uses the configured unloading mechanism to update the related data with the data block corresponding to the first Put object received; if not, it uses the configured unloading mechanism to create the data block corresponding to the first Put object received in a Lob file. Then, in the order of the block numbers included in the submitted Put objects, it uses the configured unloading mechanism to unload the data blocks corresponding to the subsequently received Put objects into the Lob file, and combines the data blocks into the Lob data according to their block numbers.
After the above steps have finished, the method may further include: throwing an exception of a specific DoNotRetryIOException subclass in the PrePut call, so as to prevent the Lob Corprocessor from performing the operation of writing the data block corresponding to each Put object into an HFile.
The client defines two new APIs, LobTable and LobPut, which inherit from the original HBase classes HTable and Put respectively. LobTable and LobPut can be understood as interfaces on the client. In addition, before storing the Lob data, the client may first submit a Put object whose lob column value is byte[0], in order to store the primary key in HBase.
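The interception described above, where the pre-put hook unloads the block and then throws so that the ordinary HBase write never happens, can be modelled as follows. This is a plain Python sketch, not HBase coprocessor code: `LobBlockStored`, `RegionModel`, and `client_put` are hypothetical, and the idea that the client treats this particular exception as confirmation of a successful unload is an assumption, since the text only says that the exception prevents the write.

```python
from typing import Dict, Set, Tuple


class LobBlockStored(Exception):
    """Hypothetical stand-in for the specific DoNotRetryIOException subclass."""


class RegionModel:
    """Toy model of a region with a pre-put hook (not the HBase API)."""

    def __init__(self, lob_columns: Set[str]) -> None:
        self.lob_columns = lob_columns
        self.hbase_cells: Dict[Tuple[str, str], bytes] = {}
        self.lob_blocks: Dict[Tuple[str, str], Dict[int, bytes]] = {}

    def pre_put(self, row: str, column: str, data: bytes, block_number: int) -> None:
        # For Lob columns: unload the block, then abort the normal write path.
        if column in self.lob_columns:
            self.lob_blocks.setdefault((row, column), {})[block_number] = data
            raise LobBlockStored(f"{row}/{column} block {block_number}")

    def put(self, row: str, column: str, data: bytes, block_number: int = 0) -> None:
        self.pre_put(row, column, data, block_number)  # may raise and skip the write
        self.hbase_cells[(row, column)] = data


def client_put(region: RegionModel, row: str, column: str,
               data: bytes, block_number: int = 0) -> str:
    """Client side: interpret the special exception as 'block was unloaded' (assumption)."""
    try:
        region.put(row, column, data, block_number)
    except LobBlockStored:
        return "unloaded"
    return "stored-in-hbase"
```

The point of the design is that the Lob block is never placed in the normal write path, so it never occupies the MemStore or ends up in a regular HFile, while structured columns continue to be written as usual.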
Step 303: The client queries data on the HBase server.
As shown in Fig. 5, in this embodiment, to query data on the HBase server the client can extend the following three APIs: LobTable, LobResult, and LobGet. A data query (scan) can be carried out as follows:
1. The client sends a Scan request to the LobTable object.
2. The LobTable object submits the Scan request to the server.
3. The HRegion on the server returns a Result to the client in response to the Scan request. The Result includes the Row Key of the Lob data queried by the scan request.
4. Based on the Row Key included in the Result, the LobTable object wraps the Result as a LobResult object.
5. The LobResult object generates a LobGet object from the Row Key of the queried Lob data.
6. The LobGet object reads the Lob data from the server block by block according to the start address of each data block of the queried Lob data. The LobGet object also reads the Lob data in blocks; the block-read threshold can be 4M.
7. The HRegion receives the Get object that the client submits through LobGet; the Get object includes the Row Key of the queried Lob data.
8. The Lob Corprocessor opens the LobScanner object.
9. Based on the Row Key in the Get object, the LobScanner object determines whether the Lob data is stored in an HFile or in a Lob file. If it is stored in an HFile, the LobScanner object reads the KeyValue value directly, returns it to the client's LobGet object, and then returns an EOF end marker to the client. If it is stored in a Lob file, the LobScanner object reads each data block according to its start address and converts each data block into KeyValue form, so that the server returns the data blocks in KeyValue form to the LobGet object on the client, followed by an EOF end marker.
10. The client reads the data from the input stream provided by the LobGet object. The input stream can be understood as the data stream through which the LobGet object obtains data from the server.
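The block-wise read path can be sketched in the same spirit. The model below is an illustration only: `serve_blocks` plays the role of the server-side scanner over a Lob file, `read_lob` plays the role of the client-side reader, and the `EOF` sentinel object is a hypothetical representation of the EOF end marker; the conversion to KeyValue form is omitted.

```python
from typing import Iterable, Iterator, List

EOF = object()  # hypothetical representation of the EOF end marker


def block_start_addresses(total_size: int, block_size: int) -> List[int]:
    """Start address of every block in Lob data of `total_size` bytes."""
    return list(range(0, total_size, block_size))


def serve_blocks(lob: bytes, block_size: int) -> Iterator[object]:
    """Server side: return the Lob data one block at a time, then the EOF marker."""
    for start in block_start_addresses(len(lob), block_size):
        yield lob[start:start + block_size]
    yield EOF


def read_lob(responses: Iterable[object]) -> bytes:
    """Client side: consume blocks until the EOF marker and reassemble the data."""
    buffer = bytearray()
    for item in responses:
        if item is EOF:
            break
        buffer.extend(item)
    return bytes(buffer)
```

With a block size of 4 bytes, 10 bytes of Lob data are returned as three blocks starting at addresses 0, 4, and 8; a real deployment would use the 4M example threshold instead.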
In this embodiment, the size of Lob data in some application scenarios is hard to predict. For example, a file stored in a Dropbox-style network drive application may be a few KB, or it may be hundreds of MB or more than a GB. If all Lob data were stored in Lob files in HDFS, too many small files could degrade the efficiency of the HDFS system. Therefore, the small Lob data in the Lob files in HDFS can be compacted (merged).
For the small-file problem, Hadoop itself provides three solutions: Hadoop Archive, SequenceFile, and CombineFileInputFormat. With these schemes, however, once small files have been merged into a large archive file they can no longer be deleted or modified unless the archive is repacked. Therefore, in this embodiment, the Lob files in HDFS can be traversed and multiple small Lob data items smaller than 10M can be merged, ensuring that the merged file is no larger than another threshold, e.g., 1G; the merged file is stored in HBase in HFile format. In fact, HFile content cannot be modified or deleted in place either; deleted small Lob KVs are only marked, and are later cleaned up by a major compaction (Major Compact), which traverses multiple HFiles, discards the marked KVs, and generates new HFile files.
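One possible way to plan such a merge is shown below. The text fixes only the two thresholds (10M for small Lob data and, e.g., 1G for a merged file); the greedy grouping in name order and the name `plan_compaction` are assumptions made for illustration.

```python
from typing import Dict, List

SMALL_LOB_LIMIT = 10 * 1024 * 1024      # example "small Lob" threshold (10M)
MAX_MERGED_SIZE = 1024 * 1024 * 1024    # example cap for a merged file (1G)


def plan_compaction(lob_sizes: Dict[str, int],
                    small_limit: int = SMALL_LOB_LIMIT,
                    max_merged: int = MAX_MERGED_SIZE) -> List[List[str]]:
    """Group small Lob data items into merge batches that stay within the cap.

    Items are visited in name order and packed greedily (an assumption);
    items at or above `small_limit` are left as separate Lob files.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sorted(lob_sizes.items()):
        if size >= small_limit:
            continue  # big Lob data is not merged
        if current and current_size + size > max_merged:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups
```

Each returned group corresponds to one merged file to be written in HFile format; the Lob files named in a group could then be removed from HDFS.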
In this embodiment, the Lob Corprocessor can determine whether a lob column is being deleted (including deletion of an entire row and deletion of an entire column family), then generate a delete object for the lob column and store it in the MemStore, while also deleting the Lob file in HDFS (if one exists).
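How recorded delete markers take effect during a later compaction can be sketched as follows. This is a simplified model, not HBase's compaction code: each HFile is represented as a dictionary from key to small Lob value, the set `deleted` stands for the delete markers held in the MemStore, and `major_compact` is a hypothetical name.

```python
from typing import Dict, List, Set


def major_compact(hfiles: List[Dict[str, bytes]],
                  deleted: Set[str]) -> Dict[str, bytes]:
    """Rewrite several HFiles into one new file, dropping deleted entries.

    `hfiles` is ordered from oldest to newest, so a newer value for the
    same key replaces an older one. The input files are left untouched,
    mirroring the fact that HFile content is not modified in place.
    """
    merged: Dict[str, bytes] = {}
    for hfile in hfiles:
        for key, value in hfile.items():
            if key not in deleted:
                merged[key] = value
    return merged
```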
In this embodiment, because a LobStore is associated with a Region, the LobStore also has to be split when the Region is split. Splitting the Lob files only requires moving them to different directories, whereas splitting an HFile requires a full traversal, with the entries distributed to the different Regions according to the split key.
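The HFile part of the split can be illustrated with a short partitioning function. The sketch assumes the usual HBase convention that the split key is the first row key of the upper daughter Region; the dictionary representation of an HFile and the name `split_hfile` are hypothetical.

```python
from typing import Dict, Tuple


def split_hfile(entries: Dict[bytes, bytes],
                split_key: bytes) -> Tuple[Dict[bytes, bytes], Dict[bytes, bytes]]:
    """Traverse all entries and distribute them to two daughter Regions.

    Keys below the split key go to the lower Region; the split key itself
    and everything above it go to the upper Region.
    """
    lower: Dict[bytes, bytes] = {}
    upper: Dict[bytes, bytes] = {}
    for key, value in entries.items():
        (lower if key < split_key else upper)[key] = value
    return lower, upper
```

Lob files, by contrast, would not need to be read at all in this model: each file is simply moved to the Lob directory of whichever daughter Region its row key falls into.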
As shown in Fig. 6, an embodiment of the present invention also provides a client, including:
An extension unit 601, configured to extend a LobTable object from the HTable object and to set a Qualifier for identifying Lob data;
A processing unit 602, configured so that, when a storage request is sent to the LobTable object and the LobTable object determines that the storage request includes the Qualifier, the LobTable object determines that the storage request carries Lob data and creates a LobPut object;
A writing unit 603, configured to write the Lob data carried in the storage request into the Lob output stream provided by the LobPut object;
A receiving unit 604, configured to use the LobPut object to receive the Lob output stream into which the Lob data has been written and, each time a data block that meets the first threshold has been received, to generate a corresponding Put object for that data block, where the Put object includes the block number of the data block within the Lob data;
A submission unit 605, configured to submit the generated Put objects to the server, so that the server, according to the preset unloading mechanism by which the server corresponding to HBase unloads data to HDFS, and in the order of the block numbers included in the submitted Put objects, transfers the data block corresponding to each Put object into a Lob file in HDFS in turn.
The client further comprises: a data query unit 606, configured to call the LobTable object to submit a query request to the server, where the query request is used to query Lob data on the server, and to receive the Result that the server returns for the query request, the Result including the Row Key of the Lob data queried by the query request; the LobTable object wraps the Result as a LobResult object, and the LobResult object generates a LobGet object from the Row Key of the queried Lob data, so that the LobGet object reads the Lob data from the server block by block according to the start address, within the Lob data, of each data block of the queried Lob data.
As shown in Fig. 7, an embodiment of the present invention further provides a server, including:
Storage unit 701, configured to create a Lob Coprocessor and to set the mechanism by which the server corresponding to HBase transfers data to HDFS;
Receiving unit 702, configured to use the Lob Coprocessor to receive, block by block, each Put object that the client submits block by block, wherein each Put object includes the block number of its data block within the corresponding Lob data;
Processing unit 703, configured to determine, according to the data block corresponding to the first received Put object, whether the Lob file in HDFS currently includes data related to that data block; if so, to use the set transfer mechanism to update the related data with the data block corresponding to the first received Put object; if not, to use the set transfer mechanism to create the data block corresponding to the first received Put object in the Lob file;
Transfer unit 704, configured to use the set transfer mechanism to transfer the data blocks corresponding to the subsequently received Put objects into the Lob file, in the order of the block numbers included in the submitted Put objects, and to combine the data blocks into the Lob data according to the block number of each data block.
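The create-or-update decision of unit 703 and the ordered transfer of unit 704 can be sketched as follows. This is an illustrative model only: `LobFileStore` is a hypothetical in-memory stand-in for Lob files in HDFS, not an HDFS or HBase API, and "data related to the data block" is assumed here to mean existing content stored under the same row key.

```python
from typing import Dict, Iterable, Tuple


class LobFileStore:
    """In-memory stand-in for Lob files in HDFS (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.files: Dict[str, bytearray] = {}

    def write_first_block(self, row_key: str, data: bytes) -> str:
        """Create the Lob content, or replace existing content for this row key."""
        action = "updated" if row_key in self.files else "created"
        self.files[row_key] = bytearray(data)
        return action

    def append_block(self, row_key: str, data: bytes) -> None:
        """Append a later block to content that write_first_block started."""
        self.files[row_key].extend(data)


def store_lob(store: LobFileStore, row_key: str, blocks: Iterable[Tuple[int, bytes]]) -> str:
    """Store (block_number, data) pairs in block-number order; report create/update."""
    ordered = sorted(blocks, key=lambda block: block[0])
    if not ordered:
        raise ValueError("at least one block is required")
    action = store.write_first_block(row_key, ordered[0][1])
    for _, data in ordered[1:]:
        store.append_block(row_key, data)
    return action


store = LobFileStore()
# Blocks may arrive out of order; the block numbers restore the original sequence.
first = store_lob(store, "row1", [(1, b"o wo"), (0, b"hell"), (2, b"rld")])
second = store_lob(store, "row1", [(0, b"new")])
```

Sorting by block number before writing is one simple way to honor the required order; a real server would more likely write blocks as they arrive and rely on the client submitting them in sequence.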
Further comprising: blocking unit 705, configured to throw, by calling prePut, an exception of a specific subclass of DoNotRetryIOException, so as to prevent the Lob Coprocessor from writing the data block corresponding to each Put object to HBase.
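The control flow behind unit 705 can be modelled in a few lines: a hook that runs before the normal write diverts a Lob block into the Lob file and then raises a marker exception so that the ordinary write never happens. The sketch below is plain Python rather than HBase code. `DoNotRetryError` merely stands in for HBase's DoNotRetryIOException, and `LobBlockTransferred`, `pre_put`, and `handle_put` are hypothetical names; treating the marker exception as a normal outcome is an assumption, not something the text states.

```python
class DoNotRetryError(IOError):
    """Plain-Python stand-in for HBase's DoNotRetryIOException."""


class LobBlockTransferred(DoNotRetryError):
    """Hypothetical marker subclass: the block already went to the Lob file."""


def pre_put(block: bytes, is_lob: bool, lob_file: list) -> None:
    """Hook run before the normal write; diverts Lob blocks away from it."""
    if is_lob:
        lob_file.append(block)
        raise LobBlockTransferred("block stored in Lob file")


def handle_put(block: bytes, is_lob: bool, lob_file: list, table_rows: list) -> str:
    """Run the hook, then perform the normal write unless the hook aborted it."""
    try:
        pre_put(block, is_lob, lob_file)
    except LobBlockTransferred:
        return "transferred"  # assumption: the marker exception is not an error
    table_rows.append(block)
    return "written"


lob_file, table_rows = [], []
results = [
    handle_put(b"big", True, lob_file, table_rows),
    handle_put(b"small", False, lob_file, table_rows),
]
```

Using a dedicated subclass lets the caller tell "already transferred" apart from a genuine failure, and a non-retryable exception type keeps the client from resubmitting the same block.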
The storage unit is further configured to create a LobScanner object for reading data;
Further comprising: query unit 706, configured to receive a Get object that the client submits through LobGet, the Get object including the Row Key of the Lob data being queried. The Lob Coprocessor opens the LobScanner object; the LobScanner object determines the storage location of the queried Lob data from the Row Key in the Get object, reads each data block of the Lob data block by block from the determined storage location, and converts each data block into KeyValue form, so that the server side returns each data block in KeyValue form to the client;
Or,
Further, a LobStore additionally stores Lob data in the form of HFile files and in MemStore form;
Further comprising: traversal unit 707, configured to traverse the Lob files in HDFS and to merge the traversed Lob data items that are smaller than the second threshold into an HFile file;
Further comprising: merging unit 708, configured to store, in the MemStore, the deletion information of the Lob data items that were merged into the HFile file.
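The small-file merge performed by units 707 and 708 can be sketched as below. The sketch makes several assumptions that the text leaves open: the second threshold is a size in bytes, the merged HFile is modelled as a key-sorted dictionary, and the "deletion information" is modelled as a list of row keys whose original small Lob files have been superseded. The function name `merge_small_lobs` is hypothetical.

```python
from typing import Dict, List, Tuple


def merge_small_lobs(
    lob_files: Dict[str, bytes], threshold: int
) -> Tuple[Dict[str, bytes], List[str], Dict[str, bytes]]:
    """Merge Lob items smaller than threshold; return (hfile, delete_markers, remaining)."""
    small = {key: data for key, data in lob_files.items() if len(data) < threshold}
    hfile = dict(sorted(small.items()))  # merged file, kept in row-key order
    delete_markers = sorted(small)       # would be recorded in the MemStore
    remaining = {key: data for key, data in lob_files.items() if key not in small}
    return hfile, delete_markers, remaining


hfile, delete_markers, remaining = merge_small_lobs(
    {"b": b"12", "a": b"1", "c": b"123456"}, threshold=4
)
```

Merging many small items into one sorted file reduces the number of files HDFS has to track, and the recorded markers tell readers that the originals are no longer the authoritative copy.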
In summary, the embodiments of the present invention provide at least the following beneficial effects:
1. When Lob data is stored to the server side, the client uploads the Lob data to the HBase server side in blocks of a preset size, and the server side stores the data blocks as they are uploaded, which avoids memory overflow on the server side.
2. So that the client does not need to learn two storage technologies, the client can upload Lob data to the server side together with other structured data, and the server side transfers the Lob data into Lob files in HDFS. The client only needs to program against the HBase storage interface, which reduces the client's workload and improves the practicality of storing data to HBase.
The information exchange between the units of the above devices, their execution processes, and similar details are based on the same concept as the method embodiments of the present invention; for specifics, refer to the description of the method embodiments, which is not repeated here.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing describes only preferred embodiments of the present invention, which serve merely to illustrate its technical solution and are not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention falls within its scope of protection.

Claims (10)

1. A method for storing Lob data based on HBase, characterized in that a LobTable object is extended from the HTable object; a Qualifier for identifying Lob data is set; and a mechanism by which the server corresponding to HBase transfers data to HDFS is set; the method further comprising:
when the client sends a storage request to the LobTable object and LobTable determines that the storage request includes the Qualifier, determining that the storage request carries Lob data and creating a LobPut object;
the client writing the Lob data carried in the storage request into the Lob output stream provided by the LobPut object;
the LobPut object receiving the Lob output stream into which the Lob data has been written and, each time a received data block meets the first threshold, generating a corresponding Put object from that data block, wherein each Put object includes the block number of its data block within the Lob data;
submitting the generated Put objects to the server side, so that the server side, using the set transfer mechanism, transfers the data block corresponding to each Put object into a Lob file in HDFS, one after another, in the order of the block numbers included in the submitted Put objects.
2. The method according to claim 1, characterized in that
it further comprises: the client calling the LobTable object to submit a query request to the server side, wherein the query request is used to query Lob data on the server side, and receiving the Result that the server side returns for the query request, the Result including the Row Key of the Lob data being queried; the LobTable object wrapping the Result in a LobResult object, and the LobResult object generating a LobGet object from the Row Key of the queried Lob data, so that the LobGet object reads the Lob data from the server side block by block, according to the start address within the Lob data of each data block of the queried Lob data.
3. A method for storing large object data based on HBase, characterized in that a Lob Coprocessor is created and a mechanism by which the server corresponding to HBase transfers data to HDFS is set; the method further comprising:
the server side using the Lob Coprocessor to receive, block by block, each Put object that the client submits block by block, wherein each Put object includes the block number of its data block within the corresponding Lob data;
determining, according to the data block corresponding to the first received Put object, whether the Lob file in HDFS currently includes data related to that data block; if so, using the set transfer mechanism to update the related data with the data block corresponding to the first received Put object; if not, using the set transfer mechanism to create the data block corresponding to the first received Put object in the Lob file;
and, in the order of the block numbers included in the submitted Put objects, using the set transfer mechanism to transfer the data blocks corresponding to the subsequently received Put objects into the Lob file, and combining the data blocks into the Lob data according to the block number of each data block.
4. The method according to claim 3, characterized in that
it further comprises: throwing, by calling prePut, an exception of a specific subclass of DoNotRetryIOException, so as to prevent the Lob Coprocessor from writing the data block corresponding to each Put object to HBase.
5. The method according to claim 3 or 4, characterized in that
it further comprises: creating a LobScanner object for reading data;
it further comprises: receiving a Get object that the client submits through LobGet, the Get object including the Row Key of the Lob data being queried; the Lob Coprocessor opening the LobScanner object; the LobScanner object determining the storage location of the queried Lob data from the Row Key in the Get object, reading each data block of the Lob data block by block from the determined storage location, and converting each data block into KeyValue form, so that the server side returns each data block in KeyValue form to the client;
or,
it further comprises: a LobStore additionally storing Lob data in the form of HFile files and in MemStore form;
it further comprises: traversing the Lob files in HDFS, and merging the traversed Lob data items that are smaller than the second threshold into an HFile file;
it further comprises: storing, in the MemStore, the deletion information of the Lob data items that were merged into the HFile file.
6. A client, characterized by comprising:
an extension unit, configured to extend a LobTable object from the HTable object and to set a Qualifier for identifying Lob data;
a processing unit, configured so that, when a storage request is sent to the LobTable object and LobTable determines that the storage request includes the Qualifier, it determines that the storage request carries Lob data and creates a LobPut object;
a writing unit, configured to write the Lob data carried in the storage request into the Lob output stream provided by the LobPut object;
a receiving unit, configured to use the LobPut object to receive the Lob output stream into which the Lob data has been written and, each time a received data block meets the first threshold, to generate a corresponding Put object from that data block, wherein each Put object includes the block number of its data block within the Lob data;
a submission unit, configured to submit the generated Put objects to the server side, so that the server side, using the preset mechanism by which the server corresponding to HBase transfers data to HDFS, transfers the data block corresponding to each Put object into a Lob file in HDFS, one after another, in the order of the block numbers included in the submitted Put objects.
7. The client according to claim 6, characterized in that
it further comprises: a data query unit, configured to call the LobTable object to submit a query request to the server side, wherein the query request is used to query Lob data on the server side, and to receive the Result that the server side returns for the query request, the Result including the Row Key of the Lob data being queried; the LobTable object wraps the Result in a LobResult object, and the LobResult object generates a LobGet object from the Row Key of the queried Lob data, so that the LobGet object reads the Lob data from the server side block by block, according to the start address within the Lob data of each data block of the queried Lob data.
8. A server, characterized by comprising:
a storage unit, configured to create a Lob Coprocessor and to set the mechanism by which the server corresponding to HBase transfers data to HDFS;
a receiving unit, configured to use the Lob Coprocessor to receive, block by block, each Put object that the client submits block by block, wherein each Put object includes the block number of its data block within the corresponding Lob data;
a processing unit, configured to determine, according to the data block corresponding to the first received Put object, whether the Lob file in HDFS currently includes data related to that data block; if so, to use the set transfer mechanism to update the related data with the data block corresponding to the first received Put object; if not, to use the set transfer mechanism to create the data block corresponding to the first received Put object in the Lob file;
a transfer unit, configured to use the set transfer mechanism to transfer the data blocks corresponding to the subsequently received Put objects into the Lob file, in the order of the block numbers included in the submitted Put objects, and to combine the data blocks into the Lob data according to the block number of each data block.
9. The server according to claim 8, characterized in that
it further comprises: a blocking unit, configured to throw, by calling prePut, an exception of a specific subclass of DoNotRetryIOException, so as to prevent the Lob Coprocessor from writing the data block corresponding to each Put object to HBase.
10. The server according to claim 8 or 9, characterized in that
the storage unit is further configured to create a LobScanner object for reading data;
it further comprises: a query unit, configured to receive a Get object that the client submits through LobGet, the Get object including the Row Key of the Lob data being queried; the Lob Coprocessor opens the LobScanner object; the LobScanner object determines the storage location of the queried Lob data from the Row Key in the Get object, reads each data block of the Lob data block by block from the determined storage location, and converts each data block into KeyValue form, so that the server side returns each data block in KeyValue form to the client;
or,
it further comprises: a LobStore additionally storing Lob data in the form of HFile files and in MemStore form;
it further comprises: a traversal unit, configured to traverse the Lob files in HDFS and to merge the traversed Lob data items that are smaller than the second threshold into an HFile file;
it further comprises: a merging unit, configured to store, in the MemStore, the deletion information of the Lob data items that were merged into the HFile file.
CN201510144162.7A 2015-03-30 2015-03-30 The storage method and device of a kind of Lob data based on HBase Active CN104750815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510144162.7A CN104750815B (en) 2015-03-30 2015-03-30 The storage method and device of a kind of Lob data based on HBase

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510144162.7A CN104750815B (en) 2015-03-30 2015-03-30 The storage method and device of a kind of Lob data based on HBase

Publications (2)

Publication Number Publication Date
CN104750815A CN104750815A (en) 2015-07-01
CN104750815B true CN104750815B (en) 2017-11-03

Family

ID=53590499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510144162.7A Active CN104750815B (en) 2015-03-30 2015-03-30 The storage method and device of a kind of Lob data based on HBase

Country Status (1)

Country Link
CN (1) CN104750815B (en)


Also Published As

Publication number Publication date
CN104750815A (en) 2015-07-01


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20180807

Address after: 250100 S06 tower, 1036, Chao Lu Road, hi tech Zone, Ji'nan, Shandong.

Patentee after: Shandong wave cloud Mdt InfoTech Ltd

Address before: No. 1036, Shandong high tech Zone wave road, Ji'nan, Shandong

Patentee before: Inspur Group Co., Ltd.

CP03 Change of name, title or address

Address after: 250100 No. 1036 Tidal Road, Jinan High-tech Zone, Shandong Province, S01 Building, Tidal Science Park

Patentee after: Inspur cloud Information Technology Co., Ltd

Address before: 250100 Ji'nan science and technology zone, Shandong high tide Road, No. 1036 wave of science and Technology Park, building S06

Patentee before: SHANDONG LANGCHAO YUNTOU INFORMATION TECHNOLOGY Co.,Ltd.
