CN103959259B - Data storage method, data storage device and data storage system - Google Patents


Info

Publication number
CN103959259B
Authority
CN
China
Prior art keywords
data block
fingerprint
block
data
history
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201280005841.0A
Other languages
Chinese (zh)
Other versions
CN103959259A (en
Inventor
魏明昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority claimed from PCT/CN2012/084901 (WO2014078990A1)
Publication of CN103959259A
Application granted
Publication of CN103959259B

Abstract

A data storage method, a data storage device and a data storage system. The data storage method includes: dividing data to be stored into n data blocks; judging whether each of the resulting data blocks is a duplicate data block; and storing the data blocks that are not duplicate data blocks. The judging step includes: judging, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block, where i is an integer greater than 1 and less than or equal to n, and the follow-up history of the (i-1)-th data block includes the content of data blocks that were read or stored immediately after the content of the (i-1)-th data block; and if the follow-up history of the (i-1)-th data block includes the content of the i-th data block, the i-th data block is a duplicate data block; otherwise, continuing to judge, based on a storage history, whether the i-th data block is a duplicate data block, where the storage history holds the content of all data blocks that have been stored. The invention can optimize deduplication performance.

Description

Data storage method, data storage device and data storage system
Technical field
The present invention relates to the field of data storage, and in particular to a data storage method, a data storage device and a data storage system that perform data deduplication.
Background
Data deduplication is a storage technology. In one approach, the storage system divides the data written by a user into data blocks according to some algorithm; the blocks may be of fixed or variable length. A fingerprint is calculated for every resulting data block using a predetermined algorithm (for example SHA-1 or MD5), each data block is labeled with its calculated fingerprint, and a fingerprint library of all stored data blocks is built. When a new data block is to be stored, the fingerprint library is first searched for the fingerprint of the new block to determine whether a data block corresponding to that fingerprint has already been stored. If the block's fingerprint is found in the fingerprint library, no space needs to be allocated for the block and it is not stored again. In this way the storage system keeps only one copy of all data blocks that share a fingerprint, which can save a great deal of storage space.
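The basic approach described above can be sketched in a few lines. This is a minimal illustration, assuming fixed-length blocks and SHA-1 fingerprints; the function names and the in-memory dictionary standing in for the fingerprint library are hypothetical and do not come from the patent.

```python
import hashlib


def split_fixed(data: bytes, block_size: int) -> list[bytes]:
    """Split data into fixed-length blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def fingerprint(block: bytes) -> str:
    """Label a block with the SHA-1 digest of its content."""
    return hashlib.sha1(block).hexdigest()


def store_unique(data: bytes, block_size: int, stored: dict[str, bytes]) -> list[str]:
    """Store each distinct block once; return the fingerprints that rebuild data."""
    recipe = []
    for block in split_fixed(data, block_size):
        fp = fingerprint(block)
        if fp not in stored:       # only unseen content consumes space
            stored[fp] = block
        recipe.append(fp)
    return recipe
```

Storing `b"abcdabcdabcdxy"` with a block size of 4 yields four blocks but only two stored copies, and the returned fingerprint list is enough to reassemble the original data.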
Fig. 1 is a schematic diagram of the information recorded in a prior-art fingerprint table. As shown in Fig. 1, the prior-art fingerprint table records the block fingerprint of every stored data block, together with information associated with each block fingerprint, such as the block storage address, block reference count and block length.
Fig. 2 is a flowchart of a prior-art data storage method that performs deduplication. As shown in Fig. 2, when data to be stored is received, in step S101 the received data is divided into n data blocks (n ≥ 1). Each data block is then stored in turn; the procedure is described in detail below using data block i as an example. In step S102, a fingerprint is calculated to obtain block fingerprint i of data block i. In step S103, the fingerprint table is searched to determine whether block fingerprint i already exists in it.
If block fingerprint i of data block i is not found in the fingerprint table, data block i is judged not to be an already stored data block, and the method proceeds to step S104. In step S104, storage space is allocated for data block i and data block i is written to the allocated space; the method then proceeds to steps S105 and S106. In step S105, an entry for block fingerprint i is added to the fingerprint table and filled in with the corresponding block storage address, block length, block reference count and other information, with the block reference count set to 1. In step S106, the block storage address corresponding to block fingerprint i is returned to the upper layer.
On the other hand, if block fingerprint i is found in the fingerprint table, data block i is judged to be a duplicate data block, and the method proceeds to steps S107 and S106. In step S107, the block reference count corresponding to block fingerprint i in the fingerprint table is updated, specifically by incrementing it by 1.
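The prior-art flow of Fig. 2 can be sketched as follows. This is only an illustration under stated assumptions: the class and field names are hypothetical, SHA-1 is used as the fingerprint algorithm, and a `bytearray` stands in for the allocated storage space, so a block storage address is simply an offset into it.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    address: int     # block storage address
    length: int      # block length
    ref_count: int   # block reference count


class PriorArtStore:
    def __init__(self):
        self.table: dict[str, Entry] = {}   # fingerprint table
        self.space = bytearray()            # stand-in for the storage space

    def store_block(self, block: bytes) -> int:
        fp = hashlib.sha1(block).hexdigest()                  # S102: compute fingerprint
        entry = self.table.get(fp)                            # S103: search fingerprint table
        if entry is None:
            address = len(self.space)                         # S104: allocate space and write
            self.space += block
            self.table[fp] = Entry(address, len(block), 1)    # S105: add entry, count = 1
            return address                                    # S106: return storage address
        entry.ref_count += 1                                  # S107: increment reference count
        return entry.address                                  # S106: return storage address
```

Every call performs one search of the whole fingerprint table (S103), which is the cost that grows with the number of fingerprints.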
The above data storage method with deduplication has the following defect: as the number of block fingerprints grows, the time spent looking up a block fingerprint increases, and this becomes the main bottleneck for block fingerprint lookup performance.
Summary of the invention
In view of this, embodiments of the present invention provide a data storage method, a data storage device and a data storage system that can reduce the time spent looking up fingerprints during data storage with deduplication.
In a first aspect, an embodiment of the present invention provides a data storage method, including: dividing data to be stored into n data blocks, where n is an integer greater than [value missing in the source]; judging whether each of the resulting data blocks is a duplicate data block, where a duplicate data block is a data block whose content has already been stored; and storing the data blocks that are not duplicate data blocks. The step of judging whether each of the resulting data blocks is a duplicate data block includes: judging, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block, where i is an integer greater than 1 and less than or equal to n, the (i-1)-th data block is the data block processed by the data storage method immediately before the i-th data block, and the follow-up history of the (i-1)-th data block includes the content of data blocks that were read or stored immediately after the content of the (i-1)-th data block; and if the follow-up history of the (i-1)-th data block includes the content of the i-th data block, the i-th data block is a duplicate data block; otherwise, continuing to judge, based on a storage history, whether the i-th data block is a duplicate data block, where the storage history holds the content of all data blocks that have been stored.
With reference to the first aspect, in a first possible implementation, the data storage method further includes: updating the follow-up history of the (i-1)-th data block when the result of the judgment based on the follow-up history of the (i-1)-th data block is that the i-th data block is not a duplicate data block.
With reference to the first aspect or its first possible implementation, in a second possible implementation, the data storage method further includes: before the step of judging whether each of the resulting data blocks is a duplicate data block, generating a fingerprint of each data block according to a predetermined algorithm, so that the fingerprint represents the content of each data block. The step of judging whether each of the resulting data blocks is a duplicate data block specifically includes: judging, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block, where the follow-up history of the (i-1)-th data block includes the fingerprints of data blocks that were read or stored immediately after the content of the (i-1)-th data block; and if the follow-up history of the (i-1)-th data block includes the fingerprint of the i-th data block, the i-th data block is a duplicate data block; otherwise, continuing to judge, based on the storage history, whether the i-th data block is a duplicate data block, where the storage history holds the fingerprints of all data blocks that have been stored.
With reference to the second possible implementation of the first aspect, in a third possible implementation, the data storage method further includes: recording the storage history in a fingerprint table, where the fingerprint table includes fingerprints and, associated with each fingerprint, a block storage address, a block reference count and a block follow-up history, where: the block storage address represents the storage address of the data block corresponding to the fingerprint; the block reference count represents the number of occurrences of the data block corresponding to the fingerprint; and the block follow-up history represents the follow-up history of the data block corresponding to the fingerprint.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation, the data storage method further includes: using a routing table to record the block follow-up history associated with a given fingerprint in the fingerprint table, where the routing table includes next block fingerprints and a record address associated with each next block fingerprint, where: a next block fingerprint represents the fingerprint of a data block that was read or stored immediately after the data block corresponding to the given fingerprint, and the record address represents the address at which that next block fingerprint is recorded in the fingerprint table.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, the step of judging, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block includes: judging whether the routing table associated with the fingerprint of the (i-1)-th data block in the fingerprint table contains the fingerprint of the i-th data block; and if it does, determining the record address of the fingerprint of the i-th data block from the routing table associated with the fingerprint of the (i-1)-th data block, incrementing by 1, according to the determined record address, the block reference count associated with the fingerprint of the i-th data block in the fingerprint table, and returning the block storage address associated with the fingerprint of the i-th data block.
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation, the step of judging, based on the storage history, whether the i-th data block is a duplicate data block includes: judging whether the fingerprint table contains the fingerprint of the i-th data block; and if it does, incrementing by 1 the block reference count associated with the fingerprint of the i-th data block in the fingerprint table, returning the block storage address associated with the fingerprint of the i-th data block, and updating the routing table associated with the fingerprint of the (i-1)-th data block so that it includes a record address pointing to the fingerprint of the i-th data block.
In a second aspect, an embodiment of the present invention provides a data storage device, including: a blocking unit, configured to divide data to be stored into n data blocks, where n is an integer greater than [value missing in the source]; a duplicate judging unit, configured to judge whether each of the resulting data blocks is a duplicate data block, where a duplicate data block is a data block whose content has already been stored; and a data block storage unit, configured to store the data blocks that are not duplicate data blocks. The duplicate judging unit includes: a prediction module, configured to judge, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block, where i is an integer greater than 1 and less than or equal to n, the (i-1)-th data block is the data block processed by the data storage device immediately before the i-th data block, and the follow-up history of the (i-1)-th data block includes the content of data blocks that were read or stored immediately after the content of the (i-1)-th data block; and a lookup module, configured to judge, based on a storage history, whether the i-th data block is a duplicate data block when the result of the judgment by the prediction module is that the i-th data block is not a duplicate data block, where the storage history holds the content of all data blocks that have been stored.
With reference to the second aspect, in a first possible implementation, the data storage device further includes: a history update unit, configured to update the follow-up history of the (i-1)-th data block when the result of the judgment by the prediction module is that the i-th data block is not a duplicate data block.
With reference to the second aspect or its first possible implementation, in a second possible implementation, the data storage device further includes: a fingerprint calculation unit, configured to generate a fingerprint of each data block according to a predetermined algorithm, so that the fingerprint represents the content of each data block. The prediction module judges, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block, where the follow-up history of the (i-1)-th data block includes the fingerprints of data blocks that were read or stored immediately after the content of the (i-1)-th data block. When the result of the judgment by the prediction module is that the i-th data block is not a duplicate data block, the lookup module judges, based on the storage history, whether the i-th data block is a duplicate data block, where the storage history holds the fingerprints of all data blocks that have been stored.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the history update unit records the storage history in a fingerprint table, where the fingerprint table includes fingerprints and, associated with each fingerprint, a block storage address, a block reference count and a block follow-up history, where: the block storage address represents the storage address of the data block corresponding to the fingerprint; the block reference count represents the number of occurrences of the data block corresponding to the fingerprint; and the block follow-up history represents the follow-up history of the data block corresponding to the fingerprint.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation, the history update unit uses a routing table to record the block follow-up history associated with a given fingerprint in the fingerprint table, where the routing table includes next block fingerprints and a record address associated with each next block fingerprint, where: a next block fingerprint represents the fingerprint of a data block that was read or stored immediately after the data block corresponding to the given fingerprint, and the record address represents the address at which that next block fingerprint is recorded in the fingerprint table.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, the prediction module judges whether the routing table associated with the fingerprint of the (i-1)-th data block in the fingerprint table contains the fingerprint of the i-th data block; and if it does, the history update unit determines the record address of the fingerprint of the i-th data block from the routing table associated with the fingerprint of the (i-1)-th data block, increments by 1, according to the determined record address, the block reference count associated with the fingerprint of the i-th data block in the fingerprint table, and returns the block storage address associated with the fingerprint of the i-th data block.
With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation, the lookup module judges whether the fingerprint table contains the fingerprint of the i-th data block; and if it does, the history update unit increments by 1 the block reference count associated with the fingerprint of the i-th data block in the fingerprint table, returns the block storage address associated with the fingerprint of the i-th data block, and updates the routing table associated with the fingerprint of the (i-1)-th data block so that it includes a record address pointing to the fingerprint of the i-th data block.
In a third aspect, an embodiment of the present invention provides a data storage system, including: a memory, configured to provide storage space for storing data blocks; and the data storage device according to the second aspect or any one of the first to sixth possible implementations of the second aspect.
In a fourth aspect, an embodiment of the present invention provides a storage controller, including a communication interface, a processor and a computer-readable medium, where the communication interface, the processor and the computer-readable medium are connected by a bus. The communication interface is configured to communicate with a memory. The computer-readable medium is configured to store program code; when the program code is executed by the processor, the processor performs the data storage method according to the first aspect or any one of the first to sixth possible implementations of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a data storage system, including: a memory, configured to provide storage space for storing data blocks; and the storage controller according to the fourth aspect.
The data storage method, data storage device and data storage system provided by the embodiments of the present invention add a record of follow-up history for each data block. This follow-up history includes the content of data blocks that were read or stored immediately after the content of that data block. During data storage with deduplication, subsequent duplicate data blocks can therefore be predicted in advance from the follow-up history of the preceding data block. This effectively reduces the time spent looking up fingerprints, which in turn reduces the time needed to judge duplicate data blocks, relieves the performance bottleneck of duplicate block lookup, and optimizes deduplication performance.
Brief description of the drawings
Fig. 1 is a schematic diagram of the information recorded in a prior-art fingerprint table.
Fig. 2 is a flowchart of a prior-art data storage method with deduplication.
Fig. 3 is a flowchart of a data storage method with deduplication according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the information recorded in a fingerprint table constructed according to another embodiment of the present invention.
Fig. 5 is a flowchart of a data storage method with deduplication according to another embodiment of the present invention.
Fig. 6 is a structural diagram of a data storage device with deduplication according to a further embodiment of the present invention.
Fig. 7 is a structural diagram of a data storage device with deduplication according to yet another embodiment of the present invention.
Fig. 8 is a structural diagram of a data storage system with deduplication according to an embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below. Although the invention is illustrated and explained through these embodiments, it should be noted that the invention is not limited to them. On the contrary, the invention covers all alternatives, variants and equivalents within the spirit and scope defined by the claims.
In addition, numerous specific details are given in the detailed description below in order to better explain the invention. Those skilled in the art will understand that the invention can be practiced without these details. In other instances, well-known methods, procedures, components and circuits are not described in detail, so as to highlight the gist of the invention.
As stated above, the present invention can optimize deduplication performance during data storage. One way of implementing this is to predict in advance the data blocks that are likely to be accessed next, which effectively reduces the time needed to judge duplicate data blocks and relieves the performance bottleneck of duplicate block lookup.
Fig. 3 is a flowchart of a data storage method with deduplication according to an embodiment of the present invention. The data storage method applies to any data storage procedure and to any device or system capable of storing data. As shown in Fig. 3, after the data to be stored is received and divided into n blocks (n ≥ 1), each data block is stored based on steps S301 to S306, described in detail below using data block i as an example.
In step S301, it is judged whether the follow-up history of data block (i-1), the block preceding data block i, predicts the content of data block i, where i is an integer greater than 1 and less than or equal to n, and the follow-up history of data block (i-1) includes the content of data blocks that were read or stored immediately after the content of the (i-1)-th data block. Whether the follow-up history of data block (i-1) predicts the content of data block i can be determined by checking whether that follow-up history includes the content of data block i. If step S301 finds that the follow-up history of data block (i-1) predicts the content of data block i, data block i is considered a duplicate data block, that is, its content has been stored before, and the method proceeds to steps S304 and S306. In step S304, the storage history of stored data blocks is updated. In step S306, the storage address at which the content of data block i is stored is returned.
On the other hand, if step S301 finds that the follow-up history of data block (i-1) does not predict the content of data block i, the method proceeds to step S302. In step S302, it is judged based on the storage history whether the content of data block i has already been stored, where the storage history holds the content of all data blocks that have been stored. If the content of data block i is found to have been stored, data block i is considered a duplicate data block, and the method proceeds to steps S304, S305 and S306. In step S304, the storage history of stored data blocks is updated. In step S305, the follow-up history of data block (i-1) is updated. In step S306, the storage address at which the content of data block i is stored is returned.
If step S302 finds that the content of data block i has not been stored, data block i is judged to be a new data block, and the method proceeds to step S303. In step S303, storage space is allocated for data block i and data block i is written to the allocated space; the method then proceeds to steps S304, S305 and S306. In step S304, the storage history of stored data blocks is updated. In step S305, the follow-up history of data block (i-1) is updated. In step S306, the storage address at which the content of data block i is stored is returned.
In summary: when data block i is determined to be a duplicate data block based on the follow-up history of data block (i-1), no block storage operation is performed for data block i, and the storage history is updated. When data block i is determined to be a duplicate data block based on the storage history of all data blocks, no block storage operation is performed for data block i either, and both the storage history and the follow-up history of data block (i-1) are updated. When data block i is determined not to be a duplicate data block, a block storage operation is performed for data block i, and both the storage history and the follow-up history of data block (i-1) are updated.
As the above description shows, the data storage method according to this embodiment builds on existing duplicate block deletion schemes by recording the follow-up history of each data block according to the order in which data blocks appear, and by making a prediction in advance when judging duplicate data blocks. In other words, before duplicate data blocks are judged against the complete storage history, a small-scope duplicate judgment is first made against the part of the storage history with the higher hit probability, namely the follow-up history of the preceding data block. This can effectively reduce the time needed to judge duplicate data blocks and improve the storage efficiency of duplicate block deletion.
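The Fig. 3 flow can be sketched on raw block content as follows. This is a simplified illustration, not the patented implementation: the class and attribute names are hypothetical, the storage history is reduced to a content-to-address map (so the S304 update only has visible effect for new blocks), and a counter of full lookups is added purely to show what the prediction step saves.

```python
class PredictiveStore:
    def __init__(self):
        self.history: dict[bytes, int] = {}            # storage history: content -> address
        self.followers: dict[bytes, set[bytes]] = {}   # follow-up history per block content
        self.blocks: list[bytes] = []                  # stand-in for the storage space
        self.full_lookups = 0                          # number of S302 lookups performed

    def store(self, blocks: list[bytes]) -> list[int]:
        addresses = []
        prev = None                                    # content of data block (i-1)
        for block in blocks:
            if prev is not None and block in self.followers.get(prev, ()):
                address = self.history[block]          # S301 hit: duplicate, no full lookup
            else:
                self.full_lookups += 1
                if block in self.history:              # S302: search the storage history
                    address = self.history[block]
                else:
                    address = len(self.blocks)         # S303: allocate space and write
                    self.blocks.append(block)
                    self.history[block] = address      # S304: update the storage history
                if prev is not None:                   # S305: update follow-up history
                    self.followers.setdefault(prev, set()).add(block)
            addresses.append(address)                  # S306: return the storage address
            prev = block
        return addresses
```

Storing the sequence A, B, C twice costs three full lookups the first time but only one the second time, because B is predicted from A and C from B.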
According to another embodiment of the present invention, for data storage in which duplicate data blocks are judged based on a fingerprint generated for each data block by a predetermined algorithm (identical fingerprints meaning identical block content), historical path information for block fingerprints can be added to the fingerprint table.
In the embodiments of the present invention, the follow-up history of a data block refers to the content of the data blocks that were read or stored immediately after that data block, in the order of storing or reading.
Take as an example the case where the follow-up history is the content of data blocks that were stored immediately after the data block, in storage order. Suppose two store operations took place before the current one: in the first, the three data blocks A, B and C were stored in sequence; in the second, the three data blocks A, B and D were stored in sequence. That is, data block B was stored immediately after data block A, and data blocks C and D were each stored immediately after data block B. In the current store operation, the follow-up history of data block A is therefore the content of data block B, and the follow-up history of data block B has two entries: the content of data block C and the content of data block D.
In another embodiment of the present invention, in addition to the content of data blocks stored immediately after a data block, the content of data blocks read immediately after that data block can also serve as follow-up history. For example, suppose two store operations and one read operation took place before the current store operation. In the first store, the three data blocks A, B and C were stored in sequence; in the second store, the three data blocks A, B and D were stored in sequence; in the read, the three data blocks A, E and F were read in sequence. In the current store operation, the follow-up history of data block A then includes the content of data block B and the content of data block E, the follow-up history of data block B includes the content of data blocks C and D, and the follow-up history of data block E includes the content of data block F.
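The two stores and one read in this example can be replayed in a short sketch. The helper name `record_sequence` and the use of single letters in place of block content are assumptions made for illustration only.

```python
from collections import defaultdict


def record_sequence(followers, sequence):
    """For each block in the sequence, record the block that came immediately after it."""
    for current, following in zip(sequence, sequence[1:]):
        followers[current].add(following)


followers = defaultdict(set)                  # block -> its follow-up history
record_sequence(followers, ["A", "B", "C"])   # first store
record_sequence(followers, ["A", "B", "D"])   # second store
record_sequence(followers, ["A", "E", "F"])   # one read
```

After these calls, the follow-up history of A holds B and E, that of B holds C and D, and that of E holds F, matching the example in the text. Blocks that were never followed by another block (C, D and F) have no follow-up history.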
The storage history in the embodiments of the present invention refers to the content of the data blocks that have been stored so far. In the embodiments, data is divided into data blocks and then stored in the storage system; the data block is the unit of data storage in the storage system. The size of a data block may be fixed or variable, and can be set according to conditions such as read and write (store) efficiency and the size of the storage space. The operation of dividing data into data blocks may be performed by the storage system or by an application server that communicates with the storage system. When a user needs to read stored data, the data blocks that make up the data are looked up, restored into the data, and returned to the user. As the amount of stored data grows, the number of data blocks in the storage history grows with it. To reduce the storage space occupied, the data blocks in the storage history can be deduplicated: data blocks with identical content are stored only once and are not stored repeatedly.
The fingerprint in the embodiments of the present invention is used to identify a data block: when two data blocks have the same content, their fingerprints are also the same. A fingerprint can be obtained by applying a hash operation to the content of the data block. Besides hash algorithms, other algorithms capable of identifying data block content may also be used. The fingerprint may be a fixed-length label or a variable-length label, as long as it serves to identify the content of the data block.
Because a fingerprint identifies a data block, the judgments based on follow-up history and storage history described above can be made either on the data block content itself or on the fingerprint of the data block.
Fig. 4 shows the record information according to the fingerprint table 400 constructed by another embodiment of the present invention The schematic diagram of structure.Fingerprint table 400 shown in Fig. 4 can be used to record the storage history of data block. In fingerprint table 400 as shown in Figure 4, the information being associated with fingerprint 410, except including that block is deposited The information such as storage address 420, block length 430, block reference count 440, also include the follow-up history of block 450. Wherein, the storage address of the data block that the expression of block storage address 420 is corresponding with this fingerprint 410;Block Length 430 represents the length of the data block corresponding with fingerprint 410;Block reference count 440 represents and is somebody's turn to do The occurrence number of the data block of fingerprint 410 correspondence;And block follow-up history 450 represents and this fingerprint The follow-up history of the data block of 410 correspondences, for record once immediately preceding the content of the i-th-1 data block The fingerprint of the data block that read afterwards or stored, i.e. subsequent data chunk and fingerprint position.
For the fingerprint table 400, a routing table can be used to record the block follow-up history 450 associated with a fingerprint 410. The routing table may include n paths, and the information for each path may include a next-block fingerprint 451 and a fingerprint address 452. The next-block fingerprint 451 represents the fingerprint of a data block that was once read or stored immediately after this data block, and the fingerprint address 452 represents the record address of the next-block fingerprint 451. Thus, the record address of the fingerprint of the subsequent data block can be predicted from the block follow-up history 450, and the block storage address of the next data block can then be found quickly.
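The record structure of Fig. 4 can be sketched as follows. This is only an illustration under assumptions: the class name `FingerprintRecord`, the field names, the placeholder fingerprints `"fpA"` and `"fpB"`, and the use of a list index as the "fingerprint address" are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FingerprintRecord:
    fingerprint: str            # fingerprint (410)
    block_address: int          # block storage address (420)
    block_length: int           # block length (430)
    ref_count: int = 1          # block reference count (440)
    # Block follow-up history (450) kept as a routing table:
    # next-block fingerprint (451) -> fingerprint address (452), here a slot index.
    next_paths: Dict[str, int] = field(default_factory=dict)


# Two records: block A was once followed by block B.
table: List[FingerprintRecord] = [
    FingerprintRecord("fpA", block_address=0, block_length=4096),
    FingerprintRecord("fpB", block_address=4096, block_length=4096),
]
table[0].next_paths["fpB"] = 1   # path from A to the slot holding B's record

# Having just processed A, B's record is reachable without searching the whole table.
slot = table[0].next_paths.get("fpB")
next_address = table[slot].block_address if slot is not None else None
```

Storing the record address alongside the next-block fingerprint is what lets the prediction step jump directly to the matching record.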
Fig. 5 is a flowchart of a data storage method with a data deduplication function according to this other embodiment of the present invention. In this embodiment, each data block is stored in turn based on the following steps S501 to S508.
In step S501, a fingerprint is calculated for data block i to generate block fingerprint i. In step S502, it is judged whether the routing table 450 associated with the fingerprint 410 of data block (i-1) contains a path 451 pointing to block fingerprint i. If step S502 finds such a path, data block i is considered a duplicate data block; the record address 452 of the fingerprint is then determined from the path, and the flow proceeds to steps S507 and S508. In step S507, the block storage address 420 associated with the fingerprint in the fingerprint table is returned according to the record address. In step S508, the block reference count 440 associated with the fingerprint in the fingerprint table is incremented by 1 according to the record address 452.
On the other hand, if step S502 finds no path pointing to block fingerprint i, the flow proceeds to step S503. In step S503, it is judged whether block fingerprint i already exists in the fingerprint table 400. If step S503 determines that block fingerprint i is already in the fingerprint table 400, data block i is considered a duplicate data block, and the flow proceeds to steps S506 to S508. In step S506, the routing table associated with the fingerprint of data block (i-1), i.e., the block follow-up history 450, is updated to add a path pointing to block fingerprint i of data block i. In step S507, the block storage address 420 associated with the fingerprint in the fingerprint table is returned. In step S508, the block reference count 440 associated with the fingerprint in the fingerprint table is incremented by 1.
If step S503 determines that block fingerprint i is not in the fingerprint table 400, data block i is considered a new data block that has not been stored, and the flow proceeds to step S504. In step S504, storage space is allocated for data block i and data block i is written to that space; the flow then proceeds to steps S505 to S507. In step S505, a record for block fingerprint i is added to the fingerprint table 400, that is, block fingerprint i 410 is added together with the block storage address 420, block length 430, block reference count 440 (which can be set to 1 in this case), and block follow-up history 450 (which can be set to empty in this case) of data block i associated with block fingerprint i. In step S506, the routing table associated with the fingerprint of data block (i-1), i.e., the block follow-up history 450, is updated to add a path pointing to block fingerprint i of data block i. In step S507, the block storage address where the content of data block i is stored is returned.
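The flow of steps S501 to S508 can be sketched in a few lines. This is a simplified in-memory illustration, not the patented implementation: the names `Record`, `DedupStore`, `index`, and `predicted_hits` are hypothetical, SHA-256 is an assumed fingerprint algorithm, a Python list stands in for physical storage, and the reference count is bumped just before the address is returned rather than after.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    fingerprint: str
    address: int
    length: int
    ref_count: int = 1
    next_paths: Dict[str, int] = field(default_factory=dict)  # routing table (450)


class DedupStore:
    def __init__(self) -> None:
        self.table: List[Record] = []         # fingerprint table (400)
        self.index: Dict[str, int] = {}       # full lookup: fingerprint -> slot
        self.blocks: List[bytes] = []         # stand-in for physical storage
        self.prev_slot: Optional[int] = None  # slot of data block (i-1)
        self.predicted_hits = 0               # duplicates found by S502 alone

    def store(self, block: bytes) -> int:
        fp = hashlib.sha256(block).hexdigest()                        # S501
        prev = self.table[self.prev_slot] if self.prev_slot is not None else None
        slot = prev.next_paths.get(fp) if prev is not None else None  # S502
        if slot is not None:
            self.predicted_hits += 1
            self.table[slot].ref_count += 1                           # S508
        else:
            slot = self.index.get(fp)                                 # S503
            if slot is None:
                self.blocks.append(block)                             # S504
                slot = len(self.table)
                self.table.append(
                    Record(fp, len(self.blocks) - 1, len(block)))     # S505
                self.index[fp] = slot
            else:
                self.table[slot].ref_count += 1                       # S508
            if prev is not None:
                prev.next_paths[fp] = slot                            # S506
        self.prev_slot = slot
        return self.table[slot].address                               # S507


# The same three-block sequence written twice.
store = DedupStore()
stream = [b"A" * 8, b"B" * 8, b"C" * 8] * 2
addresses = [store.store(b) for b in stream]
```

On the second pass, the first block is found through the full lookup (S503), while the following two blocks are found through the routing table of their predecessor (S502), which is the case the method aims to make fast.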
In summary, because path information 450 indicating the fingerprint of the subsequent data block and the address of that fingerprint is added to the fingerprint table 400, the above data storage method according to the embodiments of the present invention can make predictions from the historical path relationships among data blocks and prefetch the fingerprint information of data blocks with a high hit probability. Accordingly, in addition to steps S501, S503, S504, S505, S507, and S508, which correspond respectively to steps S102 to S107 in Fig. 2, Fig. 5 also includes steps S502 and S506. By predicting in step S502 the duplicate-block fingerprints with a high hit probability, the time needed for fingerprint lookup can be reduced effectively. Step S506 maintains the historical path relationships among data blocks, i.e., the routing table.
In addition, in one embodiment, when the fingerprint table 400 is stored in a memory with a slow read/write speed, such as a hard disk, the routing table associated with the fingerprint of data block (i-1) can be prefetched into a buffer memory with a faster read/write speed, further improving the processing speed of step S502.
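The prefetching idea can be illustrated with a small sketch. Everything here is an assumption for illustration: `SlowFingerprintTable` merely simulates a table on slow storage and counts reads, `RoutingCache` is an unbounded dictionary standing in for the faster buffer memory, and the fingerprints are placeholder strings.

```python
from typing import Dict, Optional


class SlowFingerprintTable:
    """Stand-in for a fingerprint table on slow storage; counts each read."""

    def __init__(self, routing: Dict[str, Dict[str, int]]) -> None:
        self._routing = routing
        self.reads = 0

    def load_routing_table(self, fp: str) -> Dict[str, int]:
        self.reads += 1
        return dict(self._routing.get(fp, {}))


class RoutingCache:
    """Keeps prefetched routing tables in fast memory (unbounded in this sketch)."""

    def __init__(self, table: SlowFingerprintTable) -> None:
        self.table = table
        self.cache: Dict[str, Dict[str, int]] = {}

    def prefetch(self, prev_fp: str) -> None:
        # Load the routing table of data block (i-1) once, ahead of step S502.
        if prev_fp not in self.cache:
            self.cache[prev_fp] = self.table.load_routing_table(prev_fp)

    def predict(self, prev_fp: str, fp: str) -> Optional[int]:
        # Step S502 served from the cache: record address of fp, or None.
        self.prefetch(prev_fp)
        return self.cache[prev_fp].get(fp)


slow = SlowFingerprintTable({"fpA": {"fpB": 1}})
cache = RoutingCache(slow)
cache.prefetch("fpA")
hit = cache.predict("fpA", "fpB")
miss = cache.predict("fpA", "fpC")
```

After the single prefetch, both predictions are answered without touching the slow table again. A practical cache would also need an eviction policy and invalidation when step S506 updates a routing table, which this sketch omits.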
Fig. 6 is a structural diagram of a data storage device 600 with a data deduplication function according to a further embodiment of the present invention. The data storage device in any embodiment of the present invention is, for example, a storage controller, and may also be a personal computer (PC) with the same function.
The data storage device 600 includes a blocking unit 610, a duplicate judging unit 630, a data block storage unit 640, and a history updating unit 650, where: the blocking unit 610 divides the data to be stored into n data blocks, n being an integer greater than or equal to 1; the duplicate judging unit 630 judges whether each divided data block is a duplicate data block that has already been stored; the data block storage unit 640 stores the data blocks that are not duplicate data blocks; and the history updating unit 650 updates the storage history of the data blocks.
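The role of the blocking unit can be illustrated with a short sketch. Fixed-size division, the function name `split_into_blocks`, and the default block size are assumptions made for illustration; the text does not state how the blocking unit chooses block boundaries.

```python
from typing import List


def split_into_blocks(data: bytes, block_size: int = 4096) -> List[bytes]:
    """Divide data into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


# Ten bytes split into blocks of four: the last block is shorter.
blocks = split_into_blocks(b"abcdefghij", block_size=4)
```

Each resulting block would then be passed in order to the duplicate judging unit, so that block (i-1) is always the block processed immediately before block i.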
As shown in Fig. 6, the duplicate judging unit 630 includes a prediction module 631 and a lookup module 632. When judging whether the i-th data block is a duplicate data block, the prediction module 631 predicts the content of the i-th data block based on the follow-up history of the (i-1)-th data block, i.e., the previous data block, where the follow-up history of the (i-1)-th data block includes the content of data blocks that were once read or stored immediately after the content of the (i-1)-th data block. When the prediction module 631 has not predicted the content of the i-th data block, the lookup module 632 judges whether the content of the i-th data block has been stored by searching the storage history, where the storage history contains the content of all data blocks that have been stored.
When judging whether the i-th data block is a duplicate data block (i being an integer greater than 1 and less than or equal to n), the prediction module 631 judges whether the follow-up history of the (i-1)-th data block predicts the content of the i-th data block. If the prediction module 631 determines that it does, the data block storage unit 640 does not perform a data block storage operation for the i-th data block, and the history updating unit 650 updates the storage history. If the prediction module 631 determines that it does not, the lookup module 632 judges, based on the storage history, whether the content of the i-th data block has been stored.
If the lookup module 632 determines that the storage history already holds the content of the i-th data block, the data block storage unit 640 does not perform a data block storage operation for the i-th data block, and the history updating unit 650 updates the storage history and the follow-up history of the (i-1)-th data block. If the lookup module 632 determines that the storage history does not hold the content of the i-th data block, the data block storage unit 640 performs a data block storage operation for the i-th data block, and the history updating unit 650 returns the address of the storage space as the storage address of data block i and updates the storage history and the follow-up history of the (i-1)-th data block.
Fig. 7 is a structural diagram of a data storage device 700 with a data deduplication function according to yet another embodiment of the present invention. Components in Fig. 7 that bear the same reference numerals as in Fig. 6 have the same functions. The data storage device 700 shown in Fig. 7 may further include a fingerprint calculation unit 620, which generates the fingerprint of each data block according to a predetermined algorithm.
Accordingly, the history updating unit 650 may use the fingerprint table 400 shown in Fig. 4 to record the storage history of data blocks, and use the routing table associated with a fingerprint in the fingerprint table 400 to record the follow-up history of the data block corresponding to that fingerprint. The duplicate judging unit 630 may judge duplicate data blocks based on the fingerprints generated by the fingerprint calculation unit 620; that is, identical fingerprints mean identical data block content.
In one embodiment, the prediction module 631 judges whether the i-th data block is a duplicate data block based on the routing table 450 associated with the fingerprint of the (i-1)-th data block in the fingerprint table 400, and the lookup module 632 judges whether the i-th data block is a duplicate data block based on the whole fingerprint table 400.
Further, when the prediction module 631 confirms that the i-th data block is a duplicate data block, the data block storage unit 640 does not perform a data block storage operation for the i-th data block, and the history updating unit 650 increments by 1 the block reference count associated with the fingerprint of the i-th data block in the fingerprint table 400 and returns the block storage address associated with that fingerprint. When the lookup module 632 confirms that the i-th data block is a duplicate data block, the data block storage unit 640 likewise does not perform a data block storage operation for the i-th data block, and the history updating unit 650 increments by 1 the block reference count associated with the fingerprint of the i-th data block in the fingerprint table 400, returns the block storage address associated with that fingerprint, and adds a path pointing to the i-th data block to the routing table associated with the fingerprint of the (i-1)-th data block. When the lookup module 632 confirms that the i-th data block is not a duplicate data block, the data block storage unit 640 performs a data block storage operation for the i-th data block, and the history updating unit 650 adds a record for the fingerprint of the i-th data block to the fingerprint table 400 and adds a path pointing to the i-th data block to the routing table associated with the fingerprint of the (i-1)-th data block.
Fig. 8 is a structural diagram of a data storage system 800 with a data deduplication function according to an embodiment of the present invention. As shown in Fig. 8, the data storage system 800 includes a memory 801 and a storage controller 802. The memory 801 provides the storage space required for storing data. It may be any form of storage medium, for example hard disks, magnetic tapes, or solid-state drives, and the reliability of the data on these media can be improved by means such as a redundant array of inexpensive disks (RAID). The storage controller 802 is connected to the memory 801 and controls the storage operations of the memory 801 by performing the data storage method illustrated in Fig. 3 or Fig. 5. The storage controller includes a processor, a computer-readable medium, and a communication interface, which are connected by a bus. The communication interface communicates with the memory 801: when a data block needs to be stored, the storage controller 802 sends the data block and a storage instruction to the memory 801 through the communication interface, and the memory 801 physically records the content of the data block. The computer-readable medium stores program code; when this program code is executed by the processor in the storage controller 802, the processor can perform the data storage method of the above embodiments of the present invention.
In addition, according to another embodiment of the present invention, the data storage system may also include a memory and the data storage device of the above embodiments of the present invention.
Other markers that can represent or stand in for the content of a data block may also be used in place of the fingerprint in the present invention. In short, in a data storage process with a data deduplication function, the data storage method and device provided according to the embodiments of the present invention can predict likely subsequent duplicate data blocks from the historical order of data blocks. This effectively reduces the time required for duplicate data block judgment, relieves the performance bottleneck of duplicate data block confirmation, and achieves the goal of optimizing deduplication performance.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The specific embodiments and drawings above are merely typical embodiments of the present invention. Obviously, various additions, modifications, and substitutions may be made without departing from the spirit and scope of the invention as defined by the claims. Those skilled in the art should understand that, in practical application, the present invention may vary in form, structure, layout, proportion, material, elements, components, and other aspects according to the specific environment and working requirements without departing from the principles of the invention. Therefore, the embodiments disclosed herein are illustrative and not restrictive; the scope of the invention is defined by the claims and their legal equivalents and is not limited by the preceding description.

Claims (15)

1. A data storage method, characterized by comprising:
dividing data to be stored into n data blocks, where n is an integer greater than 1;
judging whether each of the divided data blocks is a duplicate data block, where a duplicate data block is a data block whose content has already been stored; and
storing the data blocks that are not duplicate data blocks,
wherein the step of judging whether each of the divided data blocks is a duplicate data block comprises:
judging, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block, where i is an integer greater than 1 and less than or equal to n, the (i-1)-th data block is the data block processed by the data storage method immediately before the i-th data block, and the follow-up history of the (i-1)-th data block includes the content of data blocks that were once read or stored immediately after the content of the (i-1)-th data block; and
if the follow-up history of the (i-1)-th data block includes the content of the i-th data block, the i-th data block is a duplicate data block; otherwise, continuing to judge, based on a storage history, whether the i-th data block is a duplicate data block, where the storage history contains the content of all data blocks that have been stored.
2. The data storage method according to claim 1, characterized by further comprising: when the result of the judgment based on the follow-up history of the (i-1)-th data block is that the i-th data block is not a duplicate data block, updating the follow-up history of the (i-1)-th data block.
3. The data storage method according to claim 1 or 2, characterized by further comprising: before the step of judging whether each of the divided data blocks is a duplicate data block, generating a fingerprint of each data block according to a predetermined algorithm, so that the fingerprint represents the content of each data block,
wherein the step of judging whether each of the divided data blocks is a duplicate data block specifically comprises:
judging, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block, where the follow-up history of the (i-1)-th data block includes the fingerprints of data blocks that were once read or stored immediately after the content of the (i-1)-th data block; and
if the follow-up history of the (i-1)-th data block includes the fingerprint of the i-th data block, the i-th data block is a duplicate data block; otherwise, continuing to judge, based on the storage history, whether the i-th data block is a duplicate data block, where the storage history contains the fingerprints of all data blocks that have been stored.
4. The data storage method according to claim 3, characterized in that the method further comprises: recording the storage history using a fingerprint table, the fingerprint table including a fingerprint and the block storage address, block reference count, and block follow-up history associated with the fingerprint, where:
the block storage address represents the storage address of the data block corresponding to the fingerprint;
the block reference count represents the number of occurrences of the data block corresponding to the fingerprint; and
the block follow-up history represents the follow-up history of the data block corresponding to the fingerprint.
5. The data storage method according to claim 4, characterized in that the method further comprises: recording, using a routing table, the block follow-up history associated with a given fingerprint in the fingerprint table, the routing table including a next-block fingerprint and a record address associated with the next-block fingerprint, where:
the next-block fingerprint represents the fingerprint of a data block that was once read or stored immediately after the data block corresponding to the given fingerprint, and
the record address represents the record address of the next-block fingerprint in the fingerprint table.
6. The data storage method according to claim 5, characterized in that the step of judging, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block comprises:
judging whether the routing table associated with the fingerprint of the (i-1)-th data block in the fingerprint table contains the fingerprint of the i-th data block,
if the result of the judgment is that it does, determining the record address of the fingerprint of the i-th data block according to the routing table associated with the fingerprint of the (i-1)-th data block, incrementing by 1, according to the determined record address, the block reference count associated with the fingerprint of the i-th data block in the fingerprint table, and returning the block storage address associated with the fingerprint of the i-th data block.
7. The data storage method according to claim 6, characterized in that the step of judging, based on the storage history, whether the i-th data block is a duplicate data block comprises judging whether the fingerprint table contains the fingerprint of the i-th data block,
if the result of the judgment is that it does, incrementing by 1 the block reference count associated with the fingerprint of the i-th data block in the fingerprint table, returning the block storage address associated with the fingerprint of the i-th data block, and updating the routing table associated with the fingerprint of the (i-1)-th data block to include the record address pointing to the fingerprint of the i-th data block.
8. A data storage device, characterized by comprising:
a blocking unit, for dividing data to be stored into n data blocks, where n is an integer greater than 1;
a duplicate judging unit, for judging whether each of the divided data blocks is a duplicate data block, where a duplicate data block is a data block whose content has already been stored; and
a data block storage unit, for storing the data blocks that are not duplicate data blocks,
wherein the duplicate judging unit comprises:
a prediction module, for judging, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block, where i is an integer greater than 1 and less than or equal to n, the (i-1)-th data block is the data block processed by the data storage device immediately before the i-th data block, and the follow-up history of the (i-1)-th data block includes the content of data blocks that were once read or stored immediately after the content of the (i-1)-th data block; and
a lookup module, for judging, based on a storage history, whether the i-th data block is a duplicate data block when the result of the judgment by the prediction module is that the i-th data block is not a duplicate data block, where the storage history contains the content of all data blocks that have been stored.
9. The data storage device according to claim 8, characterized by further comprising: a history updating unit, for updating the follow-up history of the (i-1)-th data block when the result of the judgment by the prediction module is that the i-th data block is not a duplicate data block.
10. The data storage device according to claim 8 or 9, characterized by further comprising: a fingerprint calculation unit, for generating a fingerprint of each data block according to a predetermined algorithm, so that the fingerprint represents the content of each data block,
wherein the prediction module judges, based on the follow-up history of the (i-1)-th data block, whether the i-th data block is a duplicate data block, where the follow-up history of the (i-1)-th data block includes the fingerprints of data blocks that were once read or stored immediately after the content of the (i-1)-th data block, and
the lookup module judges, based on the storage history, whether the i-th data block is a duplicate data block when the result of the judgment by the prediction module is that the i-th data block is not a duplicate data block, where the storage history contains the fingerprints of all data blocks that have been stored.
11. The data storage device according to claim 10, characterized in that
the history updating unit records the storage history using a fingerprint table, the fingerprint table including a fingerprint and the block storage address, block reference count, and block follow-up history associated with the fingerprint, where:
the block storage address represents the storage address of the data block corresponding to the fingerprint;
the block reference count represents the number of occurrences of the data block corresponding to the fingerprint; and
the block follow-up history represents the follow-up history of the data block corresponding to the fingerprint.
12. The data storage device according to claim 11, characterized in that the history updating unit records, using a routing table, the block follow-up history associated with a given fingerprint in the fingerprint table, the routing table including a next-block fingerprint and a record address associated with the next-block fingerprint, where:
the next-block fingerprint represents the fingerprint of a data block that was once read or stored immediately after the data block corresponding to the given fingerprint, and
the record address represents the record address of the next-block fingerprint in the fingerprint table.
13. The data storage device according to claim 12, characterized in that the prediction module judges whether the routing table associated with the fingerprint of the (i-1)-th data block in the fingerprint table contains the fingerprint of the i-th data block,
if the result of the judgment is that it does, the history updating unit determines the record address of the fingerprint of the i-th data block according to the routing table associated with the fingerprint of the (i-1)-th data block, increments by 1, according to the determined record address, the block reference count associated with the fingerprint of the i-th data block in the fingerprint table, and returns the block storage address associated with the fingerprint of the i-th data block.
14. The data storage device according to claim 13, characterized in that the lookup module judges whether the fingerprint table contains the fingerprint of the i-th data block,
if the result of the judgment is that it does, the history updating unit increments by 1 the block reference count associated with the fingerprint of the i-th data block in the fingerprint table, returns the block storage address associated with the fingerprint of the i-th data block, and updates the routing table associated with the fingerprint of the (i-1)-th data block to include the record address pointing to the fingerprint of the i-th data block.
15. A data storage system, characterized by comprising:
a memory, for providing storage space for storing data blocks; and
the data storage device according to any one of claims 8 to 14.
CN201280005841.0A 2012-11-20 Data storage method, data storage device and data storage system Active CN103959259B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/084901 WO2014078990A1 (en) 2012-11-20 2012-11-20 Data storage method, data storage device and data storage system

Publications (2)

Publication Number Publication Date
CN103959259A CN103959259A (en) 2014-07-30
CN103959259B true CN103959259B (en) 2016-11-30

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1672137A (en) * 2002-07-25 2005-09-21 三洋电机株式会社 Data storage device capable of storing multiple sets of history information on input/output processing of security data without duplication
CN1708763A (en) * 2002-11-08 2005-12-14 皇家飞利浦电子股份有限公司 Method and system for providing previous selection information
US7487162B2 (en) * 2003-04-11 2009-02-03 Hitachi, Ltd. Method and data processing system with data replication
CN102222085A (en) * 2011-05-17 2011-10-19 华中科技大学 Data de-duplication method based on combination of similarity and locality
CN102624908A (en) * 2012-03-12 2012-08-01 浙江大学 Method for detecting semantic Web service based on mixed P2P (peer-to-peer) network structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant