CN103959259B - Data storage method, data storage device and data storage system - Google Patents
- Publication number: CN103959259B
- Application number: CN201280005841.0A
- Authority: CN (China)
- Prior art keywords: data block, fingerprint, block, data, history
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
A data storage method, a data storage device and a data storage system. The data storage method includes: dividing data to be stored into n data blocks; determining whether each of the divided data blocks is a duplicate data block; and storing the data blocks that are not duplicate data blocks. The determining step includes: determining, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block, where i is an integer greater than 1 and less than or equal to n, and the follow-up history of the (i-1)-th data block includes the content of data blocks that were once read or stored immediately after the content of the (i-1)-th data block; and, if the follow-up history of the (i-1)-th data block includes the content of the i-th data block, the i-th data block is a duplicate data block; otherwise, continuing to determine, based on a storage history, whether the i-th data block is a duplicate data block, where the storage history records the content of all stored data blocks. The invention can optimize the performance of data deduplication.
Description
Technical field
The present invention relates to the field of data storage, and more particularly to a data storage method, a data storage device and a data storage system that perform data deduplication.
Background
Data deduplication is a technique in the storage field. In one approach to data deduplication, the storage system divides the data written by a user into data blocks according to a certain algorithm; the data blocks may be of fixed or variable length. A fingerprint is computed for every divided data block according to a predetermined algorithm (such as SHA-1 or MD5), each data block is identified by its computed fingerprint, and a fingerprint library covering all stored data blocks is built. When a new data block is to be stored, the fingerprint library is first searched using the fingerprint of the new data block to determine whether a data block corresponding to that fingerprint has already been stored. If the block fingerprint is found in the fingerprint library, no space needs to be allocated and no copy needs to be stored for this data block again. In this way, the storage system keeps only one copy of all data blocks with the same fingerprint, which can greatly save storage space.
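The scheme described above can be sketched roughly as follows. This is a minimal illustration, assuming fixed-length blocks of 4 bytes and SHA-1 fingerprints; the function names and the in-memory dictionary that stands in for the fingerprint library are hypothetical and do not come from the patent.

```python
import hashlib


def split_fixed(data: bytes, block_size: int = 4) -> list[bytes]:
    """Split data into fixed-length blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def fingerprint(block: bytes) -> str:
    """Compute a block fingerprint; SHA-1 is one of the algorithms the text names."""
    return hashlib.sha1(block).hexdigest()


def store_unique(data: bytes, library: dict[str, bytes], block_size: int = 4) -> list[str]:
    """Keep one copy per distinct block; return the fingerprints describing the data."""
    recipe = []
    for block in split_fixed(data, block_size):
        fp = fingerprint(block)
        if fp not in library:      # fingerprint not seen before: keep one copy
            library[fp] = block
        recipe.append(fp)          # duplicates are represented by fingerprint only
    return recipe


library: dict[str, bytes] = {}
recipe = store_unique(b"abcdabcdwxyz", library)
restored = b"".join(library[fp] for fp in recipe)
```

Here the input contains the block `abcd` twice, so the library holds two blocks while the returned list still has three fingerprints, which is enough to rebuild the original data.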
Fig. 1 is a schematic diagram of the structure of the information recorded in a prior-art fingerprint table. As shown in Fig. 1, the prior-art fingerprint table records the block fingerprints of all stored data blocks together with information associated with each block fingerprint, such as the block storage address, block reference count and block length.
Fig. 2 is a flowchart of a prior-art data storage method that performs data deduplication. As shown in Fig. 2, when data to be stored is received, the received data is divided into n data blocks (n >= 1) in step S101. Each data block is then stored in turn, as described in detail below taking data block i as an example. In step S102, a fingerprint calculation is performed to obtain block fingerprint i of data block i. In step S103, the fingerprint table is searched to determine whether block fingerprint i already exists in the fingerprint table.
If block fingerprint i of data block i is not found in the fingerprint table, data block i is determined not to be an already stored data block, and the method proceeds to step S104. In step S104, storage space is allocated for data block i and data block i is written to the allocated storage space; the method then proceeds to steps S105 and S106. In step S105, an entry for block fingerprint i is added to the fingerprint table and filled in with the corresponding block storage address, block length, block reference count and other information, with the block reference count set to 1. In step S106, the block storage address corresponding to block fingerprint i is returned to the upper layer.
On the other hand, if block fingerprint i is found in the fingerprint table, data block i is determined to be a duplicate data block, and the method proceeds to steps S107 and S106. In step S107, the block reference count corresponding to block fingerprint i in the fingerprint table is updated; specifically, the corresponding block reference count is incremented by 1.
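Steps S102 to S107 of the prior-art flow can be sketched as below. This is only an illustration under stated assumptions: a Python list index stands in for a block storage address, SHA-1 is the fingerprint algorithm, and the class and field names (`PriorArtStore`, `Entry`) are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    address: int      # block storage address
    length: int       # block length
    ref_count: int    # block reference count


class PriorArtStore:
    def __init__(self) -> None:
        self.fingerprint_table: dict[str, Entry] = {}
        self.blocks: list[bytes] = []   # list index stands in for a storage address

    def store_block(self, block: bytes) -> int:
        fp = hashlib.sha1(block).hexdigest()       # S102: compute block fingerprint
        entry = self.fingerprint_table.get(fp)     # S103: search the fingerprint table
        if entry is None:
            self.blocks.append(block)              # S104: allocate space and write
            entry = Entry(len(self.blocks) - 1, len(block), 1)
            self.fingerprint_table[fp] = entry     # S105: add entry, count set to 1
        else:
            entry.ref_count += 1                   # S107: duplicate, count + 1
        return entry.address                       # S106: return storage address


store = PriorArtStore()
first = store.store_block(b"x")
second = store.store_block(b"y")
third = store.store_block(b"x")   # duplicate of the first block
```

Every call searches the whole fingerprint table in step S103, which is the cost the defect described next refers to.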
The above data storage method with a data deduplication function has the following defect: as the number of block fingerprints grows, the time spent looking up a block fingerprint increases, and this becomes the main bottleneck affecting block fingerprint lookup performance.
Summary of the invention
In view of this, embodiments of the present invention provide a data storage method, a data storage device and a data storage system that can reduce the time spent looking up fingerprints during data storage with data deduplication.
In a first aspect, an embodiment of the present invention provides a data storage method, including: dividing data to be stored into n data blocks, where n is an integer greater than [...]; determining whether each of the divided data blocks is a duplicate data block, where a duplicate data block is a data block whose content has already been stored; and storing the data blocks that are not duplicate data blocks. The step of determining whether each of the divided data blocks is a duplicate data block includes: determining, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block, where i is an integer greater than 1 and less than or equal to n, the (i-1)-th data block is the data block processed by the data storage method immediately before the i-th data block, and the follow-up history of the (i-1)-th data block includes the content of data blocks that were once read or stored immediately after the content of the (i-1)-th data block; and, if the follow-up history of the (i-1)-th data block includes the content of the i-th data block, the i-th data block is a duplicate data block; otherwise, continuing to determine, based on a storage history, whether the i-th data block is a duplicate data block, where the storage history records the content of all stored data blocks.
With reference to the first aspect, in a first possible implementation, the data storage method further includes: updating the follow-up history of the (i-1)-th data block when the determination based on the follow-up history of the (i-1)-th data block finds that the i-th data block is not a duplicate data block.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation, the data storage method further includes: before the step of determining whether each of the divided data blocks is a duplicate data block, generating a fingerprint of each data block according to a predetermined algorithm, so that the fingerprint represents the content of each data block. The step of determining whether each of the divided data blocks is a duplicate data block specifically includes: determining, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block, where the follow-up history of the (i-1)-th data block includes the fingerprints of data blocks that were once read or stored immediately after the content of the (i-1)-th data block; and, if the follow-up history of the (i-1)-th data block includes the fingerprint of the i-th data block, the i-th data block is a duplicate data block; otherwise, continuing to determine, based on the storage history, whether the i-th data block is a duplicate data block, where the storage history records the fingerprints of all stored data blocks.
With reference to the second possible implementation of the first aspect, in a third possible implementation, the data storage method further includes: recording the storage history in a fingerprint table, where the fingerprint table includes a fingerprint and, associated with that fingerprint, a block storage address, a block reference count and a block follow-up history, where: the block storage address indicates the storage address of the data block corresponding to the fingerprint; the block reference count indicates the number of occurrences of the data block corresponding to the fingerprint; and the block follow-up history indicates the follow-up history of the data block corresponding to the fingerprint.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation, the data storage method further includes: recording, in a path table, the block follow-up history associated with a given fingerprint in the fingerprint table, where the path table includes a next-block fingerprint and a record address associated with that next-block fingerprint, where: the next-block fingerprint indicates the fingerprint of a data block that was once read or stored immediately after the data block corresponding to the given fingerprint; and the record address indicates the address at which the next-block fingerprint is recorded in the fingerprint table.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, the step of determining, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block includes: determining whether the path table associated with the fingerprint of the (i-1)-th data block in the fingerprint table contains the fingerprint of the i-th data block; and, if it does, determining the record address of the fingerprint of the i-th data block from the path table associated with the fingerprint of the (i-1)-th data block, incrementing by 1, according to the determined record address, the block reference count associated with the fingerprint of the i-th data block in the fingerprint table, and returning the block storage address associated with the fingerprint of the i-th data block.
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation, the step of determining, based on the storage history, whether the i-th data block is a duplicate data block includes: determining whether the fingerprint table contains the fingerprint of the i-th data block; and, if it does, incrementing by 1 the block reference count associated with the fingerprint of the i-th data block in the fingerprint table, returning the block storage address associated with the fingerprint of the i-th data block, and updating the path table associated with the fingerprint of the (i-1)-th data block to include the record address pointing to the fingerprint of the i-th data block.
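The fifth and sixth implementations can be pictured with the following sketch, in which each fingerprint record carries a path table mapping a next-block fingerprint to a record address. All names (`Record`, `PredictiveStore`, `full_lookups`) are hypothetical, a list index stands in for both record addresses and storage addresses, and a dictionary models the search of the whole fingerprint table; the counter only shows how often that full search is needed.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    fingerprint: str
    address: int                     # block storage address
    ref_count: int = 1               # block reference count
    # path table: next-block fingerprint -> record address (index into `records`)
    paths: dict[str, int] = field(default_factory=dict)


class PredictiveStore:
    def __init__(self) -> None:
        self.records: list[Record] = []     # rows of the fingerprint table
        self.index: dict[str, int] = {}     # models a search of the whole table
        self.blocks: list[bytes] = []
        self.full_lookups = 0               # number of whole-table searches

    def store(self, data_blocks: list[bytes]) -> list[int]:
        addresses = []
        prev: Optional[Record] = None
        for block in data_blocks:
            fp = hashlib.sha1(block).hexdigest()
            rec_addr = prev.paths.get(fp) if prev is not None else None
            if rec_addr is not None:
                # Predicted by the path table of the previous block.
                rec = self.records[rec_addr]
                rec.ref_count += 1
            else:
                # Not predicted: fall back to the storage history.
                self.full_lookups += 1
                rec_addr = self.index.get(fp)
                if rec_addr is None:
                    self.blocks.append(block)               # new block: store it
                    rec = Record(fp, len(self.blocks) - 1)
                    self.records.append(rec)
                    rec_addr = len(self.records) - 1
                    self.index[fp] = rec_addr
                else:
                    rec = self.records[rec_addr]
                    rec.ref_count += 1
                if prev is not None:
                    prev.paths[fp] = rec_addr               # update follow-up history
            addresses.append(rec.address)
            prev = rec
        return addresses


store = PredictiveStore()
first = store.store([b"A", b"B", b"C"])
second = store.store([b"A", b"B", b"D"])
```

In the second call, block B is found through the path table of block A, so only A and D require a search of the whole table.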
In a second aspect, an embodiment of the present invention provides a data storage device, including: a blocking unit, configured to divide data to be stored into n data blocks, where n is an integer greater than [...]; a duplicate determination unit, configured to determine whether each of the divided data blocks is a duplicate data block, where a duplicate data block is a data block whose content has already been stored; and a data block storage unit, configured to store the data blocks that are not duplicate data blocks. The duplicate determination unit includes: a prediction module, configured to determine, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block, where i is an integer greater than 1 and less than or equal to n, the (i-1)-th data block is the data block processed by the data storage device immediately before the i-th data block, and the follow-up history of the (i-1)-th data block includes the content of data blocks that were once read or stored immediately after the content of the (i-1)-th data block; and a lookup module, configured to determine, based on a storage history, whether the i-th data block is a duplicate data block when the prediction module finds that the i-th data block is not a duplicate data block, where the storage history records the content of all stored data blocks.
With reference to the second aspect, in a first possible implementation, the data storage device further includes: a history update unit, configured to update the follow-up history of the (i-1)-th data block when the prediction module finds that the i-th data block is not a duplicate data block.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation, the data storage device further includes: a fingerprint calculation unit, configured to generate a fingerprint of each data block according to a predetermined algorithm, so that the fingerprint represents the content of each data block. The prediction module determines, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block, where the follow-up history of the (i-1)-th data block includes the fingerprints of data blocks that were once read or stored immediately after the content of the (i-1)-th data block. When the prediction module finds that the i-th data block is not a duplicate data block, the lookup module determines, based on the storage history, whether the i-th data block is a duplicate data block, where the storage history records the fingerprints of all stored data blocks.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the history update unit records the storage history in a fingerprint table, where the fingerprint table includes a fingerprint and, associated with that fingerprint, a block storage address, a block reference count and a block follow-up history, where: the block storage address indicates the storage address of the data block corresponding to the fingerprint; the block reference count indicates the number of occurrences of the data block corresponding to the fingerprint; and the block follow-up history indicates the follow-up history of the data block corresponding to the fingerprint.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation, the history update unit records, in a path table, the block follow-up history associated with a given fingerprint in the fingerprint table, where the path table includes a next-block fingerprint and a record address associated with that next-block fingerprint, where: the next-block fingerprint indicates the fingerprint of a data block that was once read or stored immediately after the data block corresponding to the given fingerprint; and the record address indicates the address at which the next-block fingerprint is recorded in the fingerprint table.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, the prediction module determines whether the path table associated with the fingerprint of the (i-1)-th data block in the fingerprint table contains the fingerprint of the i-th data block; and, if it does, the history update unit determines the record address of the fingerprint of the i-th data block from the path table associated with the fingerprint of the (i-1)-th data block, increments by 1, according to the determined record address, the block reference count associated with the fingerprint of the i-th data block in the fingerprint table, and returns the block storage address associated with the fingerprint of the i-th data block.
With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation, the lookup module determines whether the fingerprint table contains the fingerprint of the i-th data block; and, if it does, the history update unit increments by 1 the block reference count associated with the fingerprint of the i-th data block in the fingerprint table, returns the block storage address associated with the fingerprint of the i-th data block, and updates the path table associated with the fingerprint of the (i-1)-th data block to include the record address pointing to the fingerprint of the i-th data block.
In a third aspect, an embodiment of the present invention provides a data storage system, including: a memory, configured to provide storage space for storing data blocks; and the data storage device according to the second aspect or any one of the first to sixth possible implementations of the second aspect.
In a fourth aspect, an embodiment of the present invention provides a storage controller, including a communication interface, a processor and a computer-readable medium, where the communication interface, the processor and the computer-readable medium are connected by a bus; the communication interface is configured to communicate with a memory; and the computer-readable medium is configured to store program code which, when executed by the processor, causes the processor to perform the data storage method according to the first aspect or any one of the first to sixth possible implementations of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a data storage system, including: a memory, configured to provide storage space for storing data blocks; and the storage controller according to the fourth aspect.
The data storage method, data storage device and data storage system provided by the embodiments of the present invention add a record of follow-up history for each data block, the follow-up history including the content of data blocks that were once read or stored immediately after the content of that data block. As a result, during data storage with a data deduplication function, subsequent duplicate data blocks can be predicted in advance from the follow-up history of the preceding data block, which effectively reduces the time spent looking up fingerprints, in turn reduces the time needed to determine duplicate data blocks, relieves the performance bottleneck of duplicate data block lookup, and achieves the goal of optimizing data deduplication performance.
Brief description of the drawings
Fig. 1 is a schematic diagram of the structure of the information recorded in a prior-art fingerprint table.
Fig. 2 is a flowchart of a prior-art data storage method with a data deduplication function.
Fig. 3 is a flowchart of a data storage method with a data deduplication function according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the structure of the information recorded in a fingerprint table constructed according to another embodiment of the present invention.
Fig. 5 is a flowchart of a data storage method with a data deduplication function according to another embodiment of the present invention.
Fig. 6 is a structural diagram of a data storage device with a data deduplication function according to a further embodiment of the present invention.
Fig. 7 is a structural diagram of a data storage device with a data deduplication function according to yet another embodiment of the present invention.
Fig. 8 is a structural diagram of a data storage system with a data deduplication function according to an embodiment of the present invention.
Detailed description of the embodiments
Reference will now be made in detail to embodiments of the present invention. Although the present invention is described and illustrated through these embodiments, it should be noted that the present invention is not limited to these embodiments. On the contrary, the present invention covers all alternatives, variants and equivalents within the spirit and scope of the invention as defined by the claims.
In addition, numerous specific details are given in the following detailed description in order to better illustrate the present invention. Those skilled in the art will understand that the present invention can equally be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits are not described in detail, so as to highlight the gist of the present invention.
As described above, the present invention can optimize data deduplication performance during data storage. One way to implement this is to predict in advance the data blocks that may need to be accessed next, thereby effectively reducing the time needed to determine duplicate data blocks and relieving the performance bottleneck of duplicate data block lookup.
Fig. 3 is a flowchart of a data storage method with a data deduplication function according to an embodiment of the present invention. This data storage method is applicable to any data storage process and to any device or system capable of storing data. As shown in Fig. 3, after the data to be stored is received and divided into n blocks (n >= 1), each data block is stored based on steps S301 to S306, as described in detail below taking data block i as an example.
In step S301, it is determined whether the follow-up history of data block (i-1), that is, the data block preceding data block i, predicts the content of data block i, where i is an integer greater than 1 and less than or equal to n, and the follow-up history of data block (i-1) includes the content of data blocks that were once read or stored immediately after the content of the (i-1)-th data block. Whether the follow-up history of data block (i-1) predicts the content of data block i can be determined by checking whether that follow-up history includes the content of data block i. If it is determined in step S301 that the follow-up history of data block (i-1) predicts the content of data block i, data block i is considered a duplicate data block, that is, the content of data block i has been stored before, and the method proceeds to steps S304 and S306. In step S304, the storage history of the stored data blocks is updated. In step S306, the storage address at which the content of data block i is stored is returned.
On the other hand, if it is determined in step S301 that the follow-up history of data block (i-1) does not predict the content of data block i, the method proceeds to step S302. In step S302, it is determined based on the storage history whether the content of data block i has already been stored, where the storage history records the content of all stored data blocks. If it is determined that the content of data block i has been stored, data block i is considered a duplicate data block, and the method proceeds to steps S304, S305 and S306. In step S304, the storage history of the stored data blocks is updated. In step S305, the follow-up history of data block (i-1) is updated. In step S306, the storage address at which the content of data block i is stored is returned.
If it is determined in step S302 that the content of data block i has not been stored, data block i is determined to be a new data block, and the method proceeds to step S303. In step S303, storage space is allocated for data block i and data block i is written to the allocated storage space; the method then proceeds to steps S304, S305 and S306. In step S304, the storage history of the stored data blocks is updated. In step S305, the follow-up history of data block (i-1) is updated. In step S306, the storage address at which the content of data block i is stored is returned.
In summary: when data block i is determined to be a duplicate data block based on the follow-up history of data block (i-1), no data block storage operation is performed for data block i, and the storage history is updated; when data block i is determined to be a duplicate data block based on the storage history of all data blocks, again no data block storage operation is performed for data block i, and both the storage history and the follow-up history of data block (i-1) are updated; and when data block i is determined not to be a duplicate data block, a data block storage operation is performed for data block i, and both the storage history and the follow-up history of data block (i-1) are updated.
As can be seen from the above, the data storage method according to this embodiment of the present invention builds on existing duplicate data block deletion schemes: it records the follow-up history of data blocks according to the order in which the data blocks appear, and makes a prediction in advance when determining duplicate data blocks. In other words, before a duplicate data block is determined based on the complete storage history, a small-scope duplicate determination is first made based on the part of the storage history with a higher hit probability, namely the follow-up history of the preceding data block. This can effectively reduce the time needed to determine duplicate data blocks and improve the storage efficiency of duplicate data block deletion.
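The three cases of steps S301 to S306 can be sketched at the level of block content as follows. This is a rough illustration under assumptions: the storage history is modeled as a dictionary from content to an address and an occurrence count, the update in step S304 is taken to be incrementing that count, and the function and variable names are hypothetical.

```python
def store_blocks(blocks, storage, history, follow_up):
    """Store blocks in order and return one storage address per block.

    storage:   list of stored contents (list index = storage address)
    history:   content -> {"address": int, "count": int}  (storage history)
    follow_up: content -> set of contents once seen immediately after it
    """
    addresses = []
    prev = None
    for block in blocks:
        # S301: does the follow-up history of the previous block predict this one?
        predicted = prev is not None and block in follow_up.get(prev, set())
        if not predicted and block not in history:       # S302: search storage history
            storage.append(block)                        # S303: store the new block
            history[block] = {"address": len(storage) - 1, "count": 0}
        history[block]["count"] += 1                     # S304: update storage history
        if not predicted and prev is not None:
            follow_up.setdefault(prev, set()).add(block) # S305: update follow-up history
        addresses.append(history[block]["address"])      # S306: return storage address
        prev = block
    return addresses


storage, history, follow_up = [], {}, {}
first = store_blocks([b"A", b"B", b"C"], storage, history, follow_up)
second = store_blocks([b"A", b"B", b"D"], storage, history, follow_up)
```

When a block is predicted in step S301, the search of the full storage history in step S302 is skipped, which is where the time saving described above comes from.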
According to another embodiment of the present invention, for data storage in which duplicate data blocks are determined based on the fingerprint of each data block generated according to a predetermined algorithm (identical fingerprints mean identical data block content), historical path information for block fingerprints can be added to the fingerprint table.
In the embodiments of the present invention, the follow-up history of a data block refers to the content of the data blocks that were once read or stored immediately after that data block, in the order of storing and reading. Take as an example the case where the follow-up history is the content of the data blocks that were once stored immediately after a data block, in the order of storing. Suppose there were two stores before the current store: in the first store, data blocks A, B and C were stored in sequence; in the second store, data blocks A, B and D were stored in sequence. That is, data block B was once stored after data block A, and data blocks C and D were each once stored after data block B. In the current store, then, the follow-up history of data block A is the content of data block B, and the follow-up history of data block B contains two items: the content of data block C and the content of data block D.
In another embodiment of the present invention, in addition to the content of data blocks stored immediately after a data block, the content of data blocks read immediately after that data block can also serve as follow-up history. For example, suppose there were two stores and one read before the current store. In the first store, data blocks A, B and C were stored in sequence; in the second store, data blocks A, B and D were stored in sequence; in the read, data blocks A, E and F were read in sequence. In the current store, the follow-up history of data block A then includes the content of data block B and the content of data block E, the follow-up history of data block B includes the content of data block C and the content of data block D, and the follow-up history of data block E includes the content of data block F.
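The two examples above can be reproduced with a short sketch. It assumes each earlier store or read is given as a sequence of block labels and that the follow-up history is kept as a set per block; the function name `follow_up_history` is hypothetical.

```python
from collections import defaultdict


def follow_up_history(sequences):
    """Map each block to the set of blocks that once came immediately after it."""
    history = defaultdict(set)
    for seq in sequences:
        # Pair every block with its immediate successor in the same sequence.
        for current, nxt in zip(seq, seq[1:]):
            history[current].add(nxt)
    return dict(history)


# Two earlier stores: A, B, C and A, B, D.
stores_only = follow_up_history(["ABC", "ABD"])

# Two earlier stores plus one earlier read of A, E, F.
with_read = follow_up_history(["ABC", "ABD", "AEF"])
```

Blocks that were never followed by another block (C, D and F here) simply have no entry.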
The storage history in the embodiments of the present invention refers to the content of the data blocks that have already been stored. In the embodiments of the present invention, data is divided into data blocks and then stored in the storage system; the data block is the unit of data storage in the storage system. The size of a data block may be fixed or variable, and can be set according to conditions such as read and write (store) efficiency and the size of the storage space. The operation of dividing data into data blocks may be performed by the storage system or by an application server that communicates with the storage system. When a user needs to read stored data, the data blocks that make up the data are looked up, and these data blocks are then restored into the data and returned to the user. As the amount of stored data grows, the number of data blocks in the storage history grows with it. To reduce the storage space occupied, the data blocks in the storage history can be deduplicated: data blocks with identical content are stored only once and are not stored repeatedly.
The fingerprint in the embodiments of the present invention is used to identify a data block: when the content of two data blocks is the same, their fingerprints are also the same. A fingerprint can be obtained by performing a hash operation on the content of the data block. Besides hash algorithms, other algorithms capable of identifying data block content may also be used. The fingerprint may be an identifier of fixed length or of variable length, as long as it serves to identify the content of the data block.
Because a fingerprint serves to identify a data block, the determinations
based on the follow-up history and the storage history described above can
be made on the data block content itself, or on the fingerprint of the data
block.
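As a minimal sketch of fingerprint generation, the example below hashes the block content with SHA-256 from Python's standard `hashlib` module. The choice of SHA-256 and the function name `fingerprint` are assumptions made for illustration; the text only requires some hash or other identifying algorithm.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Return a fixed-length fingerprint (hex digest) of the block content.

    SHA-256 is an assumption of this sketch, not an algorithm named by the patent.
    """
    return hashlib.sha256(block).hexdigest()


# Blocks with identical content always yield identical fingerprints.
same = fingerprint(b"block content") == fingerprint(b"block content")
different = fingerprint(b"block content") != fingerprint(b"other content")
```

Identical content always produces the same fingerprint. The converse holds only as far as the chosen hash avoids collisions, which is why a strong hash is the usual choice in practice.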
Fig. 4 is a schematic diagram of the record structure of a fingerprint table
400 constructed according to another embodiment of the present invention. The
fingerprint table 400 shown in Fig. 4 can be used to record the storage
history of data blocks. In the fingerprint table 400 shown in Fig. 4, the
information associated with a fingerprint 410 includes, in addition to
information such as a block storage address 420, a block length 430 and a
block reference count 440, a block follow-up history 450. The block storage
address 420 indicates the storage address of the data block corresponding to
the fingerprint 410; the block length 430 indicates the length of the data
block corresponding to the fingerprint 410; the block reference count 440
indicates the number of occurrences of the data block corresponding to the
fingerprint 410; and the block follow-up history 450 indicates the follow-up
history of the data block corresponding to the fingerprint 410. It is used to
record the fingerprints of data blocks that were previously read or stored
immediately after the content of the (i-1)-th data block, that is, the
fingerprints of subsequent data blocks and the locations of those
fingerprints.
For the fingerprint table 400, a routing table can be used to record the
block follow-up history 450 associated with the fingerprint 410. The routing
table may include n paths, and the information corresponding to each path
may include a next-block fingerprint 451 and a fingerprint address 452. The
next-block fingerprint 451 indicates the fingerprint of a data block that was
previously read or stored immediately after this data block, and the
fingerprint address 452 indicates the record address of the next-block
fingerprint 451. Thus, the record address of the fingerprint of a subsequent
data block can be predicted from the block follow-up history 450, and the
block storage address of the next data block can then be found quickly.
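The record structure of Fig. 4 can be sketched with two small data classes. All names here (`Path`, `FingerprintRecord`, `predict`, the `fp-A` style fingerprints) are hypothetical, and using a list index as the record address is an assumption of this sketch rather than something the text specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    next_fingerprint: str      # next-block fingerprint 451
    fingerprint_address: int   # fingerprint address 452 (record address in the table)


@dataclass
class FingerprintRecord:
    fingerprint: str           # fingerprint 410
    block_address: int         # block storage address 420
    block_length: int          # block length 430
    ref_count: int = 1         # block reference count 440
    follow_up: list[Path] = field(default_factory=list)  # block follow-up history 450


def predict(table: list[FingerprintRecord], prev_address: int, fingerprint: str):
    """Return the record for `fingerprint` if the previous block's routing
    table holds a path to it; otherwise return None."""
    for path in table[prev_address].follow_up:
        if path.next_fingerprint == fingerprint:
            return table[path.fingerprint_address]
    return None


# Record address is simply the list index here (assumption of this sketch).
table = [
    FingerprintRecord("fp-A", block_address=0, block_length=4096),
    FingerprintRecord("fp-B", block_address=4096, block_length=4096),
]
table[0].follow_up.append(Path("fp-B", fingerprint_address=1))  # B once followed A

hit = predict(table, prev_address=0, fingerprint="fp-B")
```

Because each path carries the record address of the next fingerprint, a predicted hit reaches the next block's record, and therefore its block storage address, without searching the whole table.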
Fig. 5 is a flowchart of a data storage method with a data de-duplication
function according to this other embodiment of the present invention. In this
embodiment, each data block is stored in turn according to steps S501 to S508
below.
In step S501, a fingerprint calculation is performed on data block i to
generate block fingerprint i of data block i. In step S502, it is determined
whether the routing table 450 associated with the fingerprint 410 of data
block (i-1) contains a path 451 pointing to block fingerprint i. If it is
determined in step S502 that a path pointing to block fingerprint i is
contained, data block i is considered to be a duplicate data block; the
record address 452 of this fingerprint is then determined from the path, and
the method proceeds to steps S507 and S508. In step S507, the block storage
address 420 associated with this fingerprint in the fingerprint table is
returned according to the record address. In step S508, the block reference
count 440 associated with this fingerprint in the fingerprint table is
incremented by 1 according to the record address 452.
On the other hand, if it is determined in step S502 that no path pointing to
block fingerprint i is contained, the method proceeds to step S503. In step
S503, it is determined whether block fingerprint i already exists in the
fingerprint table 400. If it is determined in step S503 that block
fingerprint i exists in the fingerprint table 400, data block i is considered
to be a duplicate data block, and the method proceeds to steps S506 to S508.
In step S506, the routing table associated with the fingerprint of data block
(i-1), that is, the block follow-up history 450, is updated so that a path
pointing to block fingerprint i of data block i is added to the routing
table. In step S507, the block storage address 420 associated with this
fingerprint in the fingerprint table is returned. In step S508, the block
reference count 440 associated with this fingerprint in the fingerprint table
is incremented by 1.
If it is determined in step S503 that block fingerprint i does not exist in
the fingerprint table 400, data block i is considered to be a new data block
that has not been stored, and the method proceeds to step S504. In step S504,
storage space is allocated for data block i and data block i is written to
this storage space; the method then proceeds to steps S505 to S507. In step
S505, a record of block fingerprint i is added to the fingerprint table 400,
that is, block fingerprint i 410 is added together with the block storage
address 420, the block length 430, the block reference count 440 (which in
this case can be set to 1) and the block follow-up history 450 (which in this
case can be set to empty) of data block i associated with block fingerprint
i. In step S506, the routing table associated with the fingerprint of data
block (i-1), that is, the block follow-up history 450, is updated so that a
path pointing to block fingerprint i of data block i is added to the routing
table. In step S507, the block storage address at which the content of data
block i is stored is returned.
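Steps S501 to S508 can be sketched in one small class. This is an illustrative sketch under several assumptions: SHA-256 as the fingerprint algorithm, Python lists standing in for the storage space and the fingerprint table (so addresses are list indices), a dictionary standing in for the full table lookup of step S503, and a `predicted_hits` counter added only to make the effect of step S502 visible. The class name `DedupStore` and its fields are hypothetical.

```python
import hashlib


class DedupStore:
    def __init__(self):
        self.blocks = []          # simulated storage; block storage address = index
        self.records = []         # fingerprint table; record address = index
        self.index = {}           # fingerprint -> record address (full lookup, S503)
        self.prev = None          # record address of data block (i-1)
        self.predicted_hits = 0   # duplicates found through the routing table

    def store(self, block: bytes) -> int:
        fp = hashlib.sha256(block).hexdigest()                  # S501
        addr = None
        if self.prev is not None:                               # S502
            addr = self.records[self.prev]["follow_up"].get(fp)
        if addr is not None:
            self.predicted_hits += 1
            self.records[addr]["ref_count"] += 1                # S508
        else:
            addr = self.index.get(fp)                           # S503
            if addr is not None:
                self.records[addr]["ref_count"] += 1            # S508
            else:
                self.blocks.append(block)                       # S504
                self.records.append({                           # S505
                    "fingerprint": fp,
                    "block_address": len(self.blocks) - 1,
                    "block_length": len(block),
                    "ref_count": 1,
                    "follow_up": {},
                })
                addr = len(self.records) - 1
                self.index[fp] = addr
            if self.prev is not None:                           # S506
                self.records[self.prev]["follow_up"][fp] = addr
        self.prev = addr
        return self.records[addr]["block_address"]              # S507


store = DedupStore()
addresses = [store.store(b) for b in [b"A", b"B", b"C", b"A", b"B", b"C"]]
```

In this run the second occurrence of A is found through the full lookup, which also records the path from C to A, while the second occurrences of B and C are found directly through the routing tables of A and B.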
In summary, because path information 450 indicating the fingerprint of the
subsequent data block and the address of that fingerprint is added to the
fingerprint table 400, the data storage method according to the embodiment of
the present invention can make predictions from the historical path
relationships between data blocks and prefetch the fingerprint information of
data blocks with a high hit probability. Accordingly, in addition to steps
S501, S503, S504, S505, S507 and S508, which correspond respectively to steps
S102 to S107 in Fig. 2, Fig. 5 also includes steps S502 and S506. By
predicting in advance, in step S502, the fingerprints of duplicate data
blocks with a high hit probability, the time required for fingerprint lookup
can be effectively reduced. Step S506 is used to maintain the historical path
relationships between data blocks, that is, the routing table.
In addition, in one embodiment, when the fingerprint table 400 is stored in a
memory with a relatively slow read/write speed, such as a hard disk, the
routing table associated with the fingerprint of data block (i-1) can be
prefetched into a buffer memory with a faster read/write speed, thereby
further increasing the processing speed of step S502.
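The prefetching idea can be sketched with a small least-recently-used cache placed in front of a slower table. Everything here is an assumption made for illustration: the class name `RoutingTableCache`, the dictionary that stands in for the slow memory, the LRU eviction policy, and the `slow_reads` counter used to show how often the slow memory is touched.

```python
from collections import OrderedDict


class RoutingTableCache:
    """Keeps recently used routing tables in fast memory (LRU eviction)."""

    def __init__(self, slow_table: dict, capacity: int = 2):
        self.slow_table = slow_table   # fingerprint -> {next fingerprint: record address}
        self.capacity = capacity
        self.cache = OrderedDict()
        self.slow_reads = 0            # number of accesses to the slow memory

    def prefetch(self, fp: str) -> None:
        """Load the routing table of fingerprint `fp` into the cache."""
        if fp in self.cache:
            self.cache.move_to_end(fp)
            return
        self.slow_reads += 1
        self.cache[fp] = self.slow_table.get(fp, {})
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used entry

    def predict(self, prev_fp: str, fp: str):
        """Step S502 served from the cache: record address of `fp`, or None."""
        self.prefetch(prev_fp)
        return self.cache[prev_fp].get(fp)


slow_table = {"fp-A": {"fp-B": 1}}
cache = RoutingTableCache(slow_table)
first = cache.predict("fp-A", "fp-B")
second = cache.predict("fp-A", "fp-X")
```

The second prediction is answered from the cached routing table, so the slow memory is read only once in this example.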
Fig. 6 is a structural diagram of a data storage device 600 with a data
de-duplication function according to yet another embodiment of the present
invention. The data storage device in any embodiment of the present invention
is, for example, a storage controller, and may also be a personal computer
(PC) with the same function.
The data storage device 600 includes a blocking unit 610, a repetition
judging unit 630, a data block storage unit 640 and a history updating unit
650. The blocking unit 610 is configured to divide data to be stored into n
data blocks, where n is an integer greater than or equal to 1; the repetition
judging unit 630 is configured to determine whether each of the divided data
blocks is a duplicate of a data block that has already been stored; the data
block storage unit 640 is configured to store the data blocks that are not
duplicate data blocks; and the history updating unit 650 is configured to
update the storage history of the data blocks.
As shown in Fig. 6, the repetition judging unit 630 includes a prediction
module 631 and a lookup module 632. The prediction module 631 is configured
to, when determining whether the i-th data block is a duplicate data block,
predict the content of the i-th data block based on the follow-up history of
the (i-1)-th data block, that is, the previous data block, where the
follow-up history of the (i-1)-th data block includes the content of data
blocks that were previously read or stored immediately after the content of
the (i-1)-th data block. The lookup module 632 is configured to, when the
prediction module 631 has not predicted the content of the i-th data block,
determine whether the content of the i-th data block has been stored by
looking up the storage history, where the storage history contains the
content of all data blocks that have been stored.
When determining whether the i-th data block is a duplicate data block (i
being an integer greater than 1 and less than or equal to n), the prediction
module 631 determines whether the follow-up history of the (i-1)-th data
block predicts the content of the i-th data block. If the prediction module
631 determines that the follow-up history of the (i-1)-th data block predicts
the content of the i-th data block, the data block storage unit 640 does not
perform a data block storage operation for the i-th data block, and the
history updating unit 650 updates the storage history. If the prediction
module 631 determines that the follow-up history of the (i-1)-th data block
does not predict the content of the i-th data block, the lookup module 632
determines, based on the storage history, whether the content of the i-th
data block has been stored.
If the lookup module 632 determines that the content of the i-th data block
has been stored in the storage history, the data block storage unit 640 does
not perform a data block storage operation for the i-th data block, and the
history updating unit 650 updates the storage history and the follow-up
history of the (i-1)-th data block. If the lookup module 632 determines that
the content of the i-th data block has not been stored in the storage
history, the data block storage unit 640 performs a data block storage
operation for the i-th data block, and the history updating unit 650 returns
the address of the storage space as the storage address of data block i and
updates the storage history and the follow-up history of the (i-1)-th data
block.
Fig. 7 is a structural diagram of a data storage device 700 with a data
de-duplication function according to still another embodiment of the present
invention. Components in Fig. 7 with the same reference numerals as in Fig. 6
have the same functions. The data storage device 700 shown in Fig. 7 may
further include a fingerprint computing unit 620, which is configured to
generate the fingerprint of each data block according to a predetermined
algorithm.
Accordingly, the history updating unit 650 can use the fingerprint table 400
shown in Fig. 4 to record the storage history of the data blocks, and use the
routing table associated with a fingerprint in the fingerprint table 400 to
record the follow-up history of the data block corresponding to that
fingerprint. The repetition judging unit 630 can make the duplicate data
block determination based on the fingerprints generated by the fingerprint
computing unit 620; that is, identical fingerprints mean that the contents of
the data blocks are identical.
In one embodiment, the prediction module 631 determines whether the i-th data
block is a duplicate data block based on the routing table 450 associated
with the fingerprint of the (i-1)-th data block in the fingerprint table 400,
and the lookup module 632 determines whether the i-th data block is a
duplicate data block based on the entire fingerprint table 400.
Further, when the prediction module 631 confirms that the i-th data block is
a duplicate data block, the data block storage unit 640 does not perform a
data block storage operation for the i-th data block, and the history
updating unit 650 increments by 1 the block reference count associated with
the fingerprint of the i-th data block in the fingerprint table 400 and
returns the block storage address associated with that fingerprint. When the
lookup module 632 confirms that the i-th data block is a duplicate data
block, the data block storage unit 640 likewise does not perform a data block
storage operation for the i-th data block, and the history updating unit 650
increments by 1 the block reference count associated with the fingerprint of
the i-th data block in the fingerprint table 400, returns the block storage
address associated with that fingerprint, and adds a path pointing to the
i-th data block to the routing table associated with the fingerprint of the
(i-1)-th data block. When the lookup module 632 confirms that the i-th data
block is not a duplicate data block, the data block storage unit 640 performs
a data block storage operation for the i-th data block, and the history
updating unit 650 adds a record of the fingerprint of the i-th data block to
the fingerprint table 400 and adds a path pointing to the i-th data block to
the routing table associated with the fingerprint of the (i-1)-th data block.
Fig. 8 is a structural diagram of a data storage system 800 with a data
de-duplication function according to an embodiment of the present invention.
As shown in Fig. 8, the data storage system 800 includes a memory 801 and a
storage controller 802. The memory 801 is used to provide the storage space
required for storing data. It may be a storage medium of any form, for
example one composed of storage media such as hard disks, tapes and
solid-state drives, and these storage media can improve data reliability by
means such as a redundant array of inexpensive disks (RAID). The storage
controller 802 is connected to the memory 801 and is configured to control
the storage operations of the memory 801 by performing the data storage
method shown in Fig. 3 or Fig. 5. The storage controller includes a
processor, a computer-readable medium and a communication interface, which
are connected by a bus. The communication interface communicates with the
memory 801: when a data block needs to be stored, the storage controller 802
sends the data block and a storage instruction to the memory 801 through the
communication interface, and the memory 801 physically records the content of
the data block. The computer-readable medium is used to store program code;
when this program code is executed by the processor in the storage controller
802, the processor can perform the data storage method of the above
embodiments of the present invention.
In addition, according to another embodiment of the present invention, a data
storage system may also include a memory and the data storage device of the
above embodiments of the present invention.
Other marks that can represent or stand in for the content of a data block
may also be used in place of the fingerprint in the present invention. In
short, the data storage method and device provided according to the
embodiments of the present invention can, during data storage with a data
de-duplication function, predict likely subsequent duplicate data blocks from
the historical order of data blocks. This effectively reduces the time
required to identify duplicate data blocks, relieves the performance
bottleneck of duplicate data block confirmation, and achieves the goal of
optimizing data de-duplication performance.
Those of ordinary skill in the art will appreciate that the units and
algorithm steps of the examples described in connection with the embodiments
disclosed herein can be implemented in electronic hardware, or in a
combination of computer software and electronic hardware. Whether these
functions are performed in hardware or software depends on the particular
application and the design constraints of the technical solution. Skilled
practitioners may use different methods to implement the described functions
for each particular application, but such implementations should not be
considered to go beyond the scope of the present invention.
If the functions are implemented in the form of software functional units and
sold or used as an independent product, they can be stored in a
computer-readable storage medium. Based on this understanding, the technical
solution of the present invention in essence, or the part that contributes to
the prior art, or a part of the technical solution, can be embodied in the
form of a software product. The computer software product is stored in a
storage medium and includes several instructions that cause a computer device
(which may be a personal computer, a server, a network device or the like) to
perform all or part of the steps of the methods described in the embodiments
of the present invention. The aforementioned storage medium includes various
media that can store program code, such as a USB flash drive, a removable
hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic
disk or an optical disc.
The specific embodiments and drawings above are merely common embodiments of
the present invention. Obviously, various additions, modifications and
substitutions can be made without departing from the spirit and scope of the
present invention as defined by the claims. Those skilled in the art will
understand that, in practical applications, the present invention may vary in
form, structure, layout, proportion, material, element, component and other
aspects according to the specific environment and working requirements,
without departing from the principles of the invention. Therefore, the
embodiments disclosed herein are illustrative and not restrictive, and the
scope of the present invention is defined by the claims and their legal
equivalents, and is not limited to the foregoing description.
Claims (15)
1. A data storage method, characterized by comprising:
dividing data to be stored into n data blocks, where n is an integer greater
than 1;
determining whether each of the divided data blocks is a duplicate data
block, where a duplicate data block is a data block whose content has already
been stored; and
storing the data blocks that are not duplicate data blocks,
wherein the step of determining whether each of the divided data blocks is a
duplicate data block comprises:
determining, based on the follow-up history of the (i-1)-th data block,
whether the i-th data block is a duplicate data block, where i is an integer
greater than 1 and less than or equal to n, the (i-1)-th data block is the
data block processed by the data storage method immediately before the i-th
data block, and the follow-up history of the (i-1)-th data block includes the
content of data blocks that were previously read or stored immediately after
the content of the (i-1)-th data block; and
if the follow-up history of the (i-1)-th data block includes the content of
the i-th data block, the i-th data block is a duplicate data block;
otherwise, continuing to determine, based on a storage history, whether the
i-th data block is a duplicate data block, where the storage history contains
the content of all data blocks that have been stored.
2. The data storage method according to claim 1, characterized by further
comprising:
updating the follow-up history of the (i-1)-th data block when the result of
the determination based on the follow-up history of the (i-1)-th data block
is that the i-th data block is not a duplicate data block.
3. The data storage method according to claim 1 or 2, characterized by
further comprising:
before the step of determining whether each of the divided data blocks is a
duplicate data block, generating a fingerprint of each data block according
to a predetermined algorithm, so that the fingerprint is used to represent
the content of each data block,
wherein the step of determining whether each of the divided data blocks is a
duplicate data block specifically comprises:
determining, based on the follow-up history of the (i-1)-th data block,
whether the i-th data block is a duplicate data block, where the follow-up
history of the (i-1)-th data block includes the fingerprints of data blocks
that were previously read or stored immediately after the content of the
(i-1)-th data block; and
if the follow-up history of the (i-1)-th data block includes the fingerprint
of the i-th data block, the i-th data block is a duplicate data block;
otherwise, continuing to determine, based on the storage history, whether the
i-th data block is a duplicate data block, where the storage history contains
the fingerprints of all data blocks that have been stored.
4. The data storage method according to claim 3, characterized in that the
method further comprises: recording the storage history using a fingerprint
table, the fingerprint table including fingerprints and, associated with each
fingerprint, a block storage address, a block reference count and a block
follow-up history, wherein:
the block storage address indicates the storage address of the data block
corresponding to the fingerprint;
the block reference count indicates the number of occurrences of the data
block corresponding to the fingerprint; and
the block follow-up history indicates the follow-up history of the data block
corresponding to the fingerprint.
5. The data storage method according to claim 4, characterized in that the
method further comprises: using a routing table to record the block follow-up
history associated with a given fingerprint in the fingerprint table, the
routing table including a next-block fingerprint and a record address
associated with that next-block fingerprint, wherein:
the next-block fingerprint indicates the fingerprint of a data block that was
previously read or stored immediately after the data block corresponding to
the given fingerprint; and
the record address indicates the record address of the next-block fingerprint
in the fingerprint table.
6. The data storage method according to claim 5, characterized in that the
step of determining, based on the follow-up history of the (i-1)-th data
block, whether the i-th data block is a duplicate data block comprises:
determining whether the routing table associated with the fingerprint of the
(i-1)-th data block in the fingerprint table contains the fingerprint of the
i-th data block; and
if the result of the determination is that it does, determining the record
address of the fingerprint of the i-th data block from the routing table
associated with the fingerprint of the (i-1)-th data block, incrementing by
1, according to the determined record address, the block reference count
associated with the fingerprint of the i-th data block in the fingerprint
table, and returning the block storage address associated with the
fingerprint of the i-th data block.
7. The data storage method according to claim 6, characterized in that the
step of determining, based on the storage history, whether the i-th data
block is a duplicate data block comprises determining whether the fingerprint
table contains the fingerprint of the i-th data block; and
if the result of the determination is that it does, incrementing by 1 the
block reference count associated with the fingerprint of the i-th data block
in the fingerprint table, returning the block storage address associated with
the fingerprint of the i-th data block, and updating the routing table
associated with the fingerprint of the (i-1)-th data block to include the
record address pointing to the fingerprint of the i-th data block.
8. A data storage device, characterized by comprising:
a blocking unit, configured to divide data to be stored into n data blocks,
where n is an integer greater than 1;
a repetition judging unit, configured to determine whether each of the
divided data blocks is a duplicate data block, where a duplicate data block
is a data block whose content has already been stored; and
a data block storage unit, configured to store the data blocks that are not
duplicate data blocks,
wherein the repetition judging unit comprises:
a prediction module, configured to determine, based on the follow-up history
of the (i-1)-th data block, whether the i-th data block is a duplicate data
block, where i is an integer greater than 1 and less than or equal to n, the
(i-1)-th data block is the data block processed by the data storage device
immediately before the i-th data block, and the follow-up history of the
(i-1)-th data block includes the content of data blocks that were previously
read or stored immediately after the content of the (i-1)-th data block; and
a lookup module, configured to determine, based on a storage history, whether
the i-th data block is a duplicate data block when the result determined by
the prediction module is that the i-th data block is not a duplicate data
block, where the storage history contains the content of all data blocks that
have been stored.
9. The data storage device according to claim 8, characterized by further
comprising:
a history updating unit, configured to update the follow-up history of the
(i-1)-th data block when the result determined by the prediction module is
that the i-th data block is not a duplicate data block.
10. The data storage device according to claim 8 or 9, characterized by
further comprising: a fingerprint computing unit, configured to generate a
fingerprint of each data block according to a predetermined algorithm, so
that the fingerprint is used to represent the content of each data block,
wherein the prediction module determines, based on the follow-up history of
the (i-1)-th data block, whether the i-th data block is a duplicate data
block, where the follow-up history of the (i-1)-th data block includes the
fingerprints of data blocks that were previously read or stored immediately
after the content of the (i-1)-th data block, and
the lookup module determines, based on the storage history, whether the i-th
data block is a duplicate data block when the result determined by the
prediction module is that the i-th data block is not a duplicate data block,
where the storage history contains the fingerprints of all data blocks that
have been stored.
11. The data storage device according to claim 10, characterized in that
the history updating unit records the storage history using a fingerprint
table, the fingerprint table including fingerprints and, associated with each
fingerprint, a block storage address, a block reference count and a block
follow-up history, wherein:
the block storage address indicates the storage address of the data block
corresponding to the fingerprint;
the block reference count indicates the number of occurrences of the data
block corresponding to the fingerprint; and
the block follow-up history indicates the follow-up history of the data block
corresponding to the fingerprint.
12. The data storage device according to claim 11, characterized in that the
history updating unit uses a routing table to record the block follow-up
history associated with a given fingerprint in the fingerprint table, the
routing table including a next-block fingerprint and a record address
associated with that next-block fingerprint, wherein:
the next-block fingerprint indicates the fingerprint of a data block that was
previously read or stored immediately after the data block corresponding to
the given fingerprint; and
the record address indicates the record address of the next-block fingerprint
in the fingerprint table.
13. The data storage device according to claim 12, characterized in that the
prediction module determines whether the routing table associated with the
fingerprint of the (i-1)-th data block in the fingerprint table contains the
fingerprint of the i-th data block; and
if the result of the determination is that it does, the history updating unit
determines the record address of the fingerprint of the i-th data block from
the routing table associated with the fingerprint of the (i-1)-th data block,
increments by 1, according to the determined record address, the block
reference count associated with the fingerprint of the i-th data block in the
fingerprint table, and returns the block storage address associated with the
fingerprint of the i-th data block.
14. The data storage device according to claim 13, characterized in that the
lookup module determines whether the fingerprint table contains the
fingerprint of the i-th data block; and
if the result of the determination is that it does, the history updating unit
increments by 1 the block reference count associated with the fingerprint of
the i-th data block in the fingerprint table, returns the block storage
address associated with the fingerprint of the i-th data block, and updates
the routing table associated with the fingerprint of the (i-1)-th data block
to include the record address pointing to the fingerprint of the i-th data
block.
15. A data storage system, characterized by comprising:
a memory, configured to provide storage space for storing data blocks; and
the data storage device according to any one of claims 8 to 14.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2012/084901 WO2014078990A1 (en) | 2012-11-20 | 2012-11-20 | Data storage method, data storage device and data storage system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103959259A CN103959259A (en) | 2014-07-30 |
CN103959259B true CN103959259B (en) | 2016-11-30 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1672137A (en) * | 2002-07-25 | 2005-09-21 | 三洋电机株式会社 | Data storage device capable of storing multiple sets of history information on input/output processing of security data without duplication |
CN1708763A (en) * | 2002-11-08 | 2005-12-14 | 皇家飞利浦电子股份有限公司 | Method and system for providing previous selection information |
US7487162B2 (en) * | 2003-04-11 | 2009-02-03 | Hitachi, Ltd. | Method and data processing system with data replication |
CN102222085A (en) * | 2011-05-17 | 2011-10-19 | 华中科技大学 | Data de-duplication method based on combination of similarity and locality |
CN102624908A (en) * | 2012-03-12 | 2012-08-01 | 浙江大学 | Method for detecting semantic Web service based on mixed P2P (peer-to-peer) network structure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |