CN107368545A - Deduplication method and device based on a Merkle tree variant algorithm - Google Patents
- Publication number
- CN107368545A CN201710507717.9A
- Authority
- CN
- China
- Prior art keywords
- block
- hash
- hash value
- data
- content
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1748—De-duplication implemented within the file system, e.g. based on file segments
- G06F16/1752—De-duplication implemented within the file system, e.g. based on file segments based on file chunks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/137—Hash-based
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the present invention provides a deduplication method and device based on a Merkle tree variant algorithm, including: dividing first data into blocks, calculating the hash value of each block, and setting a reference count for each block; comparing the hash value of the first block and a first hash subtree with a pre-established first hash tree; if the hash value and content of the first block are identical to a first hash value in the first hash tree and the content of its block, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, incrementing the reference count of each block by 1; if the root node of the first hash subtree differs from the second hash value in the first hash tree, incrementing the reference count of the first block by 1 and deleting the first block to obtain second data, then performing the above operations on the second data, ending when the second data is the last block. The embodiments provided by the invention can improve deduplication efficiency and reduce deduplication time while preserving the deduplication ratio.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a deduplication method and device based on a Merkle tree variant algorithm.
Background
Deduplication, also known as data de-duplication, greatly increases the value of disk-based data protection and significantly improves WAN-based backup consolidation and disaster recovery strategies for remote and branch offices. The technique identifies duplicate data and eliminates redundancy, thereby reducing the amount of data that is transmitted and stored.
Deduplication is commonly classified into file-level deduplication and block-level deduplication.
File-level deduplication uses file attributes as an index to compare the file to be backed up or archived against existing files. If the file is unique, it is stored and its index is updated; if it already exists, only a pointer to the existing file is stored. As a result, only one instance of the file is saved, and every subsequent copy is replaced by a tag pointing to the actual file.
Block-level deduplication splits data into fragments (data blocks or data slices), checks these blocks for redundancy, and compares them with existing information. The most common way to identify redundant data is to use an algorithm such as hashing to assign each data block a unique identifier, producing a unique ID or "fingerprint" of the block. This identifier is compared with the identifiers held in a central index service. If the ID already exists, the corresponding data block has already been processed and stored, so only a pointer to the previously stored data needs to be saved. If the ID is not found, the data block is unique: the ID is added to the central index and the unique data block is stored.
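The block-level lookup described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `BlockStore`, the use of SHA-256 as the fingerprint, and the in-memory dictionary standing in for the central index service are all assumptions.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Assumed fingerprint function: SHA-256 of the block content."""
    return hashlib.sha256(block).hexdigest()


class BlockStore:
    """Toy stand-in for the central index service (hypothetical name)."""

    def __init__(self):
        self.index = {}  # fingerprint -> stored block content
        self.refs = {}   # fingerprint -> number of pointers to the block

    def put(self, block: bytes) -> str:
        fp = fingerprint(block)
        if fp not in self.index:
            # Unique block: add the ID to the index and store the data.
            self.index[fp] = block
        # In both cases the caller only keeps a pointer (the fingerprint).
        self.refs[fp] = self.refs.get(fp, 0) + 1
        return fp


store = BlockStore()
pointers = [store.put(b) for b in (b"alpha", b"beta", b"alpha")]
```

Here the third write stores no new data; it only yields a second pointer to the block already held for `b"alpha"`.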
The drawback of the traditional file-level method is that both its write efficiency and its deduplication ratio are relatively low.
With the file-level method, any change to a file causes the whole file to be saved again. A file may undergo only a simple change, such as an altered title page to reflect a new speaker or data, yet the whole file is saved again. By contrast, block-level duplicate detection saves only the data blocks that differ between the new version and the old version. The deduplication ratio of file-level methods is typically about 5:1 or lower, whereas block-level deduplication has been found to reach 20:1 to 50:1.
In the traditional block-level method, the existing index table is large, so searching for runs of consecutive duplicate data is time-consuming. For example, if a 1 GB file is split into 4 KB blocks, 256K hash calculations and index lookups are required.
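The figure of 256K lookups follows directly from the file and block sizes; the constant names below are illustrative only.

```python
FILE_SIZE = 1 * 1024 ** 3   # 1 GB file
BLOCK_SIZE = 4 * 1024       # 4 KB blocks

# One hash calculation and one index lookup per block.
lookups = FILE_SIZE // BLOCK_SIZE
```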
In the prior art described above, file-level deduplication suffers from a low deduplication ratio, and block-level deduplication takes too long. How to reduce the time consumed while also improving the deduplication ratio has become an urgent problem to be solved.
Summary of the invention
In view of the defects in the prior art, embodiments of the present invention provide a deduplication method and device based on a Merkle tree variant algorithm.
In a first aspect, an embodiment of the present invention provides a deduplication method based on a Merkle tree variant algorithm, including:
S1: dividing first data into blocks and calculating the hash value of each block, a reference count being set for each block;
S2: taking the hash value of the first block, and building a first hash subtree from the hash values of the blocks of the first data;
S3: comparing the hash value of the first block and the first hash subtree with a pre-established first hash tree:
if the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, the reference count of each block is incremented by 1;
if the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is deleted to obtain second data;
S4: repeating steps S2-S3 on the second data, ending when the second data is the last block.
In a second aspect, an embodiment of the present invention provides a deduplication device based on a Merkle tree variant algorithm, including:
a first blocking module, configured to divide first data into blocks and calculate the hash value of each block, a reference count being set for each block;
a first processing module, configured to take the hash value of the first block and build a first hash subtree from the hash values of the blocks of the first data;
a first judging module, configured to compare the hash value of the first block and the first hash subtree with a pre-established first hash tree:
if the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, the reference count of each block is incremented by 1;
if the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is deleted to obtain second data;
a loop module, configured to repeat the above steps on the second data, ending when the second data is the last block.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory and a processor that communicate with each other over a bus. The memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the following method:
S1: dividing first data into blocks and calculating the hash value of each block, a reference count being set for each block;
S2: taking the hash value of the first block, and building a first hash subtree from the hash values of the blocks of the first data;
S3: comparing the hash value of the first block and the first hash subtree with a pre-established first hash tree:
if the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, the reference count of each block is incremented by 1;
if the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is deleted to obtain second data;
S4: repeating steps S2-S3 on the second data, ending when the second data is the last block.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the foregoing method is implemented.
The deduplication method and device based on a Merkle tree variant algorithm provided by the embodiments of the present invention divide the data to be written into blocks, calculate their hash values, build a hash tree using the Merkle tree variant algorithm, compare it with the hash tree already stored in the system, and deduplicate the data to be written according to the result of the comparison. By exploiting the properties of the Merkle tree, the embodiments can quickly detect runs of multiple consecutive data blocks. The deduplication method and device provided by the embodiments can therefore improve deduplication efficiency and reduce deduplication time while preserving the deduplication ratio, so that the deduplication ratio reaches block level while the deduplication efficiency approaches file level.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the deduplication method based on a Merkle tree variant algorithm provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the deduplication method based on a Merkle tree variant algorithm provided by another embodiment of the present invention;
Fig. 3 shows the first hash tree built in advance from data already written to the system, provided by an embodiment of the present invention;
Fig. 4 shows the first hash tree in the case where the hash values of two blocks are identical but their contents differ, provided by an embodiment of the present invention;
Fig. 5 shows the first hash subtree built from the first data, provided by an embodiment of the present invention;
Fig. 6 is a schematic flowchart of the deduplication method based on a Merkle tree variant algorithm provided by another embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the deduplication device based on a Merkle tree variant algorithm provided by yet another embodiment of the present invention;
Fig. 8 is a schematic structural diagram of the deduplication device based on a Merkle tree variant algorithm provided by yet another embodiment of the present invention.
Detailed description
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of the deduplication method based on a Merkle tree variant algorithm provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:
S1: dividing first data into blocks and calculating the hash value of each block, a reference count being set for each block;
The deduplication method based on a Merkle tree variant algorithm provided by the embodiments of the present invention can be applied in the storage path of distributed storage software and can expose a globally consistent deduplication service. For example, it can be applied to hard disk write operations, read operations and data reclamation operations.
Suppose a file is already stored on the hard disk of a computer system. When a user writes a file to the hard disk and the system receives this request, the system determines whether the file to be written is identical to the stored file, and converts the file to be written into a segment of data IO to be written, that is, the interface for data interaction. The data IO carries the physical address (offset address), the data length, the operation type, the content to be written or the memory space to be read into, the hash values, and so on. When the system converts the file to be written into data IO to be written, it assigns a physical address to that data IO.
For example, a file is already stored on the hard disk, and each block of this file has its own physical address. For a segment of data IO to be written, the system automatically assigns a physical address to that segment.
The system divides this segment of data IO into blocks of equal size. The block size may be 4K, 8K, or an integer multiple of 4K or 8K; the embodiments of the present invention are described in detail using 4K as an example.
After the system divides the data IO to be written into 4K blocks, it calculates the hash value of each block and sets a reference count for each block. Before a block is written to the hard disk, its reference count is set to 0; each time the block is referenced, the count is incremented by 1. In other words, the reference count records how many times the block has been referenced. For example, a block of data 123 is stored in sector 100 of the hard disk with its reference count initially set to 0. When this block is referenced once from drive C, its reference count is incremented to 1; when it is then referenced from drive D, the count is incremented again to 2.
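Step S1 can be sketched as below. This is an illustrative sketch only: the `Block` dataclass, the function name `split_into_blocks` and the choice of SHA-256 are assumptions, since the text does not name a hash function or a data layout.

```python
import hashlib
from dataclasses import dataclass
from typing import List

BLOCK_SIZE = 4 * 1024  # the 4K block size used as the running example


@dataclass
class Block:
    content: bytes
    digest: str
    ref_count: int = 0  # set to 0 before the block is written to disk


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[Block]:
    """Divide the data IO into fixed-size blocks and hash each one."""
    blocks = []
    for offset in range(0, len(data), block_size):
        content = data[offset:offset + block_size]
        blocks.append(Block(content, hashlib.sha256(content).hexdigest()))
    return blocks


# Two full 4K blocks followed by a short tail block.
blocks = split_into_blocks(b"a" * 4096 + b"b" * 4096 + b"a" * 100)
```

The tail block is shorter than 4K here; how the described system pads or handles a partial final block is not stated in the text, so the sketch simply keeps it as is.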
S2: taking the hash value of the first block, and building a first hash subtree from the hash values of the blocks of the first data;
As described in the step above, the first data is divided into blocks and the hash value of each block is calculated. The hash value of the first block is extracted, and the Merkle tree variant algorithm is applied to the hash values of the blocks of the first data to build the first hash subtree.
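A hash subtree over the block hashes can be sketched as follows. The text does not spell out how its Merkle tree variant combines nodes, so this sketch assumes a plain binary Merkle tree with SHA-256, in which an unpaired node is carried up to the next level unchanged. The names `h` and `merkle_root` are illustrative.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Assumed node hash: SHA-256."""
    return hashlib.sha256(data).digest()


def merkle_root(leaf_hashes):
    """Root of a plain binary Merkle tree built over the given leaf hashes."""
    if not leaf_hashes:
        raise ValueError("at least one leaf hash is required")
    level = list(leaf_hashes)
    while len(level) > 1:
        # Hash adjacent pairs to form the next level up.
        nxt = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired node is carried up unchanged
        level = nxt
    return level[0]


leaves = [h(b"L30"), h(b"L40")]
root = merkle_root(leaves)
```

With a single root value standing for a whole run of blocks, one lookup of the root can replace one lookup per block, which is the property the method relies on.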
S3: comparing the hash value of the first block and the first hash subtree with a pre-established first hash tree:
if the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, the reference count of each block is incremented by 1;
if the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is deleted to obtain second data;
In the embodiment above, some files are already stored on the hard disk. The Merkle tree variant algorithm is first used to build a hash tree of the stored files, which is the pre-established first hash tree. The hash value of the first block and the first hash subtree are then compared with this pre-established first hash tree.
S4: repeating steps S2-S3 on the second data, ending when the second data is the last block.
Suppose the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, so that the reference count of the first block has been incremented by 1 and the first block has been deleted to obtain the second data. Step S2 is then performed again: the hash value of the first block of the second data is taken, and a second hash subtree is built from the hash values of the blocks of the second data. Step S3 is then performed again: the hash value of the first block of the second data and the second hash subtree are compared with the pre-established first hash tree:
if the hash value of the first block of the second data is identical to a first hash value in the pre-established first hash tree, the content of that block is identical to the content of the block having the first hash value, and the root node of the second hash subtree is identical to a second hash value in the first hash tree, the reference count of each block of the second data is incremented by 1;
if the hash value of the first block of the second data is identical to a first hash value in the pre-established first hash tree, the content of that block is identical to the content of the block having the first hash value, and the root node of the second hash subtree differs from the second hash value in the first hash tree, the reference count of the first block of the second data is incremented by 1 and that block is deleted to obtain third data. The steps above are executed in a loop, ending when the remaining data is the last block of the first data.
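The S2-S4 loop can be sketched as follows. This is one possible reading of the steps, under several assumptions: the stored first hash tree is modelled as a dictionary of leaf hashes plus a set of all node hashes, the tree is a plain binary Merkle tree with SHA-256, a matching subtree root is taken to mean that the remaining blocks are duplicates, and blocks whose hash is missing or whose content differs are simply skipped because those branches are described separately. All function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_nodes(leaf_hashes):
    """Return (root, set of all node hashes) of a plain binary Merkle tree."""
    level = list(leaf_hashes)
    nodes = set(level)
    while len(level) > 1:
        nxt = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired node is carried up unchanged
        nodes.update(nxt)
        level = nxt
    return level[0], nodes


def dedup_counts(blocks, stored_blocks):
    """Return the reference count given to each incoming block."""
    stored_leaves = {h(b): b for b in stored_blocks}
    _, stored_nodes = merkle_nodes([h(b) for b in stored_blocks])
    ref_counts = [0] * len(blocks)
    start = 0  # index of the "first block" of the current data
    while start < len(blocks):
        first = blocks[start]
        if stored_leaves.get(h(first)) != first:
            start += 1  # no match or different content: not modelled here
            continue
        ref_counts[start] += 1  # hash and content both match
        root, _ = merkle_nodes([h(b) for b in blocks[start:]])  # S2
        if root in stored_nodes:  # S3, first case: whole remaining run matches
            for i in range(start + 1, len(blocks)):
                ref_counts[i] += 1
            break
        start += 1  # S3, second case: drop the first block, then S4 repeats
    return ref_counts


stored = [b"L1", b"L2", b"L3", b"L4"]
counts = dedup_counts([b"L3", b"L4"], stored)
```

In this example the incoming pair matches the stored subtree over `L3` and `L4`, so a single root lookup confirms both blocks and the loop ends after one pass.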
The deduplication method based on a Merkle tree variant algorithm provided by this embodiment divides the data to be written into blocks, calculates their hash values, builds a hash tree using the Merkle tree variant algorithm, and compares it with the hash tree already stored in the system. If the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, the reference count of each block is incremented by 1. If the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is deleted to obtain second data; the process ends when the second data is the last block. By exploiting the properties of the Merkle tree, the embodiment can quickly detect runs of multiple consecutive data blocks. The deduplication method and device provided by the embodiments can therefore improve deduplication efficiency and reduce deduplication time while preserving the deduplication ratio, so that the deduplication ratio reaches block level while the deduplication efficiency approaches file level.
Optionally, comparing the hash value of the first block and the first hash subtree with the pre-established first hash tree further includes:
if the hash value of the first block is identical to a first hash value in the pre-established first hash tree, and the content of the first block differs from the content of the block having the first hash value, the reference count of the first block is incremented by 1, and the content of the first block is appended after the content of the block having the first hash value.
On the basis of the embodiment above, another case arises when the hash value of the first block and the first hash subtree are compared with the pre-established first hash tree: the hash value of the first block is identical to a first hash value in the pre-established first hash tree, but the content of the first block differs from the content of the block having the first hash value. In this case the reference count of the first block is incremented by 1, and the content of the first block is appended after the content of the block having the first hash value. There is then no need to compare the root node of the first hash subtree with the hash values in the pre-established first hash tree.
In another possible case, no hash value identical to that of the first block is found in the first hash tree. A new memory space is then requested from the system, the reference count of the first block is incremented by 1, and the content of the first block is written into the newly requested memory space.
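The three outcomes of looking up the first block (same hash and same content, same hash but different content, hash not found) can be sketched as below. To make a collision easy to demonstrate, the sketch assumes a deliberately weak toy hash; the `Entry` dataclass, the bucket list per hash value, and the returned labels are illustrative choices, not taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    content: bytes
    ref_count: int = 0


def toy_hash(content: bytes) -> int:
    """Deliberately weak hash (assumption) so collisions are easy to produce."""
    return sum(content) % 256


def store_first_block(index, content: bytes) -> str:
    """Apply the three lookup cases to one block; index maps hash -> entries."""
    digest = toy_hash(content)
    bucket = index.get(digest)
    if bucket is None:
        # Hash not found: allocate new space and write the content there.
        index[digest] = [Entry(content, 1)]
        return "new"
    for entry in bucket:
        if entry.content == content:
            # Same hash and same content: only the reference count changes.
            entry.ref_count += 1
            return "duplicate"
    # Same hash, different content: append after the existing content.
    bucket.append(Entry(content, 1))
    return "appended"


index = {}
results = [store_first_block(index, c) for c in (b"ab", b"ab", b"ba")]
```

Keeping colliding contents in one list under the shared hash value mirrors the description of two blocks pointing to the same hash value, and is why the content comparison is needed in addition to the hash comparison.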
The deduplication method based on a Merkle tree variant algorithm provided by this embodiment divides the data to be written into blocks, calculates their hash values, builds a hash tree using the Merkle tree variant algorithm, compares it with the hash tree already stored in the system, and performs different operations depending on the case, so it is applicable to a variety of situations. By exploiting the properties of the Merkle tree, the embodiment can quickly detect runs of multiple consecutive data blocks. The deduplication method and device provided by the embodiments can therefore improve deduplication efficiency and reduce deduplication time while preserving the deduplication ratio, so that the deduplication ratio reaches block level while the deduplication efficiency approaches file level.
The embodiments of the present invention are described in detail below using a specific write process as an example.
Fig. 2 is a schematic flowchart of the deduplication method based on a Merkle tree variant algorithm provided by another embodiment of the present invention;
Step 1: the data to be written is divided into several blocks;
The system divides this segment of data IO into blocks of equal size. The block size may be 4K, 8K, or an integer multiple of 4K or 8K; the embodiments of the present invention are described in detail using 4K as an example.
Step 2: the hash value and reference count of each block are obtained; specifically, the hash value of each block is calculated and a reference count is set for each block;
After the system divides the data IO to be written into 4K blocks, it calculates the hash value of each block and sets a reference count for each block. Before a block is written to the hard disk, its reference count is set to 0; each time the block is referenced, the count is incremented by 1. In other words, the reference count records how many times the block has been referenced. For example, a block of data 123 is stored in sector 100 of the hard disk with its reference count initially set to 0. When this block is referenced once from drive C, its reference count is incremented to 1; when it is then referenced from drive D, the count is incremented again to 2.
Step 3: the hash value of the first block is taken and looked up in the pre-established first hash tree. If an identical hash value is found, the content of the first block is compared with the content of the leaf node that has the identical hash value in the first hash tree. If the two contents are identical, the reference count of the first block is incremented by 1 and step 4 is performed. If the two contents differ, the system appends the content of the first block to the storage space following the content of the leaf node that has the identical hash value in the first hash tree; the two blocks then form an object list and jointly point to the same hash value.
If no identical hash value is found in the first hash tree, a new memory space is requested from the system, the reference count of the first block is incremented by 1, and the content of the first block is written into the newly requested memory space.
Step 4: a first hash subtree is built from the hash values of the first block and all remaining blocks, and the hash value of the root node of the first hash subtree is looked up in the pre-established global hash tree. If an identical hash value is found, the reference count of each remaining block is incremented by 1;
If no identical hash value is found, the system deletes the first block, takes the hash value of the second block, and repeats step 3 above.
Fig. 3 shows the first hash tree built in advance from data already written to the system, provided by an embodiment of the present invention;
Fig. 4 shows the first hash tree in the case where the hash values of two blocks are identical but their contents differ, provided by an embodiment of the present invention;
Fig. 5 shows the first hash subtree built from the first data, provided by an embodiment of the present invention;
With reference to Fig. 3-Fig. 5, carry out the specific scheme that the embodiment of the present invention is discussed in detail for a specific example.
First, a file in the system is divided into blocks, for example L1, L2, L3 and L4, and the hash value of each block is calculated. Using the Merkle Tree variant algorithm, the hash values of the blocks are assembled into a Merkle Tree hash tree, which the embodiments of the present invention call the pre-established first hash tree, as shown in Fig. 3.
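As a rough illustration of how block hashes are assembled into a tree, the sketch below builds a plain binary Merkle tree level by level. It does not reproduce the variant algorithm of the embodiment, whose details are not given here; SHA-256, the promotion of an unpaired node, and the name `build_merkle_levels` are all assumptions made for the example.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_merkle_levels(blocks):
    """Return the tree level by level: levels[0] are leaf hashes, levels[-1] is the root."""
    if not blocks:
        raise ValueError("at least one block is required")
    levels = [[sha256(block) for block in blocks]]
    while len(levels[-1]) > 1:
        current = levels[-1]
        parents = []
        for i in range(0, len(current), 2):
            pair = current[i:i + 2]
            if len(pair) == 2:
                # A parent is the hash of its two children's hashes, concatenated.
                parents.append(sha256(pair[0] + pair[1]))
            else:
                # Assumption: an unpaired last node is promoted unchanged.
                parents.append(pair[0])
        levels.append(parents)
    return levels


# Four blocks give four leaves, two inner nodes and one root.
levels = build_merkle_levels([b"L1", b"L2", b"L3", b"L4"])
```

Here `levels[1][1]` is the inner node covering L3 and L4, which corresponds to the position of Hash6 in the worked example.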
As shown in Fig. 5, when a user needs to write a new piece of data, the data is divided into blocks of 4K size, for example L30 and L40, the hash values hash(L30) and hash(L40) of the blocks are calculated, and the reference count of each block is set to 0.
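The blocking step can be sketched as below, assuming fixed-size 4096-byte blocks and SHA-256 as the hash function; the `Chunk` type and `split_into_chunks` are hypothetical names introduced only for this example.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass

BLOCK_SIZE = 4096  # the 4K block size used in the example


@dataclass
class Chunk:
    content: bytes
    digest: str
    ref_count: int = 0  # every new block starts with a reference count of 0


def split_into_chunks(data: bytes, block_size: int = BLOCK_SIZE) -> list[Chunk]:
    """Cut data into fixed-size blocks and hash each one; the last block may be short."""
    chunks = []
    for offset in range(0, len(data), block_size):
        content = data[offset:offset + block_size]
        chunks.append(Chunk(content, hashlib.sha256(content).hexdigest()))
    return chunks


chunks = split_into_chunks(b"a" * 4096 + b"b" * 100)
```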
The hash value hash(L30) of the first block is compared with each hash value in the first hash tree. If a value hash(L3) equal to hash(L30) is found in the first hash tree, the content of L3 is compared with the content of L30. If the two contents are identical, the reference count of the first block L30 of the data to be written is incremented by 1. A first hash subtree is then built from the hash value hash(L30) of the first block L30 and the hash value hash(L40) of the second block L40, and the hash value Hash60 of the root node of the first hash subtree is compared with the hash values in the pre-established global hash tree. If the pre-established first hash tree contains a hash value Hash6 equal to Hash60, the reference count of the remaining block L40 of the data to be written is incremented by 1;
If the pre-established global hash tree contains no hash value equal to the root hash value Hash60 of the first hash subtree, the first block L30 is removed from the data to be written, a second hash subtree is built starting from the hash value of the second block L40, and the above steps are repeated until all remaining blocks have been compared with the pre-established hash tree.
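The subtree-root comparison (Hash60 against Hash6) can be illustrated with the following sketch. It assumes a plain binary Merkle tree over SHA-256 and models the pre-established tree as the set of all its node hashes; `all_node_hashes`, `merkle_root` and `first_known_suffix` are hypothetical helpers. For brevity it omits the per-block content comparison that precedes the root check in the text.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def next_level(level):
    # Hash adjacent pairs; an unpaired last node is promoted unchanged (assumption).
    return [
        sha256(level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
        for i in range(0, len(level), 2)
    ]


def all_node_hashes(leaf_hashes):
    """Collect the hash of every node (leaves, inner nodes, root) of the tree."""
    nodes, level = set(leaf_hashes), list(leaf_hashes)
    while len(level) > 1:
        level = next_level(level)
        nodes.update(level)
    return nodes


def merkle_root(leaf_hashes):
    level = list(leaf_hashes)
    while len(level) > 1:
        level = next_level(level)
    return level[0]


def first_known_suffix(incoming, known_nodes):
    """Drop leading blocks one at a time until the root of the rest is a known node."""
    for start in range(len(incoming)):
        if merkle_root(incoming[start:]) in known_nodes:
            return start  # blocks from `start` onward are all duplicates
    return None


stored = [sha256(b) for b in (b"L1", b"L2", b"L3", b"L4")]
known = all_node_hashes(stored)
# L30 and L40 are assumed to hold the same bytes as L3 and L4.
incoming = [sha256(b"L3"), sha256(b"L4")]
```

When the root matches, a single lookup confirms every remaining block at once, which is where the claimed speed-up over block-by-block comparison comes from.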
If the content of L3 differs from the content of L30, the reference count of the first block L30 of the data to be written is incremented by 1, and the content of L30 is appended behind the content of L3 in the first hash tree; the two blocks form an object list and point to the same hash value, as shown in Fig. 4.
If no hash value equal to the hash value hash(L30) of the first block is found in the first hash tree, the system requests a new memory space, increments the reference count of the first block L30 by 1, and writes the content of the first block L30 into the new memory space.
The deduplication method based on the Merkle Tree variant algorithm provided by the embodiment of the present invention divides the data to be written into blocks, calculates their hash values, builds a hash tree based on the Merkle Tree variant algorithm, and compares it with the hash tree already stored in the system, performing different operations in different cases, so it is applicable to a variety of situations. By exploiting the properties of the Merkle Tree, the embodiment can quickly detect runs of multiple consecutive data blocks. The method and device for deleting duplicate data provided by the embodiment can therefore improve deduplication efficiency and reduce deduplication time while preserving the deduplication rate, so that the deduplication rate reaches block level while deduplication efficiency approaches file level.
Fig. 6 is a schematic flow chart of the deduplication method based on the Merkle Tree variant algorithm provided by a further embodiment of the present invention. As shown in Fig. 6, the method further includes a data reading process, which is specifically:
S21: obtain at least one logical address of the data to be read;
S22: using the pre-established object index and the obtained at least one logical address, determine the block to be read that corresponds to the at least one logical address;
S23: read the data content of the block to be read.
On the basis of the above embodiment, the deduplication method based on the Merkle Tree variant algorithm is also applicable to the data reading process. The specific implementation steps are as follows:
Before the system can perform a read operation, data must already have been written to the hard disk. For each block written to the hard disk, the correspondence between the block and its logical address is recorded in an index when the block is first written, and the index is saved in the system. The same block may be referenced by multiple logical addresses; that is, the relationship between logical addresses and blocks in the index can be many-to-one.
When a user needs to read a file and clicks on it, the system receives an instruction to read the file and parses out the logical addresses contained in the read instruction. For each logical address, the system uses the stored index to find the block corresponding to that logical address. Each block includes its own physical address and content, so once the system has found the block it knows the block's physical address on the hard disk, reads the content from that physical address, and returns the content to the system. For example, given multiple logical addresses, the system starts with the first logical address, uses the correspondence between logical addresses and blocks in the index to find the first block, and reads out the content contained in that block. The same operation is then carried out for each remaining logical address, and the contents read are combined and returned to the system together.
As an illustration: a file is prepared in advance and divided into 3 blocks. The content of the first block is a and its logical address on the hard disk is 0010; the content of the second block is b and its logical address on the hard disk is 0011; the content of the third block is c and its logical address on the disk is 0012. The correspondence between each block and its logical address forms the index, which is stored in the system.
When the user wants to read a file on the disk and clicks on it, the system receives the instruction to read the file and parses out at least one logical address, for example 0010, 0011 and 0012. The system performs a query for each logical address. For the first logical address, 0010, the system looks it up in the index stored in the system, finds the first block corresponding to 0010, locates the first block on the hard disk (that is, the physical address where the first block resides), and reads the content a of the first block from that physical address. The second and third logical addresses are read in the same way, and the contents a, b and c that were read are returned to the system together.
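The read path in this example can be sketched with two lookup tables. The physical addresses, the extra logical address 0020, and the name `read_file` are invented for illustration; only the logical addresses 0010 to 0012 and the contents a, b and c come from the example above.

```python
from __future__ import annotations

# Physical address -> block content (hypothetical physical addresses).
disk = {0: b"a", 1: b"b", 2: b"c"}

# Logical address -> physical address. The mapping is many-to-one:
# "0020" is a made-up second reference to the block that holds "a".
index = {"0010": 0, "0011": 1, "0012": 2, "0020": 0}


def read_file(logical_addresses: list[str], index: dict[str, int],
              disk: dict[int, bytes]) -> bytes:
    """Resolve each logical address through the index, then read and join the contents."""
    return b"".join(disk[index[address]] for address in logical_addresses)


result = read_file(["0010", "0011", "0012"], index, disk)
```

Because deduplicated blocks are stored once, two logical addresses can resolve to the same physical block without the reader needing to know.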
With the deduplication method based on the Merkle Tree variant algorithm provided by the embodiment of the present invention, the properties of the Merkle Tree allow runs of multiple consecutive data blocks to be detected quickly during data writing and data reading. The method and device for deleting duplicate data provided by the embodiment can improve deduplication efficiency and reduce deduplication time while preserving the deduplication rate, so that the deduplication rate reaches block level while deduplication efficiency approaches file level.
Optionally, the method further includes a data reclamation process, specifically: when the reference count of a block decreases to 0, the space occupied by that block is released.
On the basis of the above embodiments, the embodiment of the present invention also includes a data reclamation mechanism. During deduplication, as the number of deduplication operations grows, the storage space on the hard disk becomes occupied by blocks holding data content, and when new data content needs to be written there is no space left on the hard disk to write it. The space on the hard disk that the user no longer needs is therefore released.
When the user deletes a volume, where a volume is equivalent to a disk, for example when the user wants to clear the space of disk C, the system needs to release the references that other disks hold to all blocks on disk C. When the reference count of each block decreases to 0, all the blocks are reclaimed by the system; disk C is then empty, and new data can be written again.
When the user deletes a file, suppose for example that the file exists on disks C, D and E, so the reference count of each block in the file is 3. To delete it, the copy stored on disk E is deleted first, and the reference count of each block in the file is decremented by 1 to 2; the copy stored on disk D is then deleted, and the reference count of each block is decremented by 1 to 1; finally the copy stored on disk C is deleted, and the reference count of each block is decremented by 1 to 0. At this point the reference count of every block of the file is 0, all the blocks are reclaimed by the system, the storage space the file occupied on the hard disk is released, and the user can write new content again.
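The reclamation rule can be sketched with a plain reference-count table. The block identifiers and the helpers `add_reference` and `drop_reference` are hypothetical; the walk-through mirrors the example of a file stored on disks C, D and E.

```python
def add_reference(ref_counts, block_id):
    ref_counts[block_id] = ref_counts.get(block_id, 0) + 1


def drop_reference(ref_counts, block_id):
    """Decrement the count; return True if the block's space was reclaimed."""
    ref_counts[block_id] -= 1
    if ref_counts[block_id] == 0:
        del ref_counts[block_id]  # count reached 0: release the space
        return True
    return False


file_blocks = ["block-1", "block-2"]
ref_counts = {}

# The same file is stored on three disks, so each block is referenced three times.
for _disk in ("C", "D", "E"):
    for block_id in file_blocks:
        add_reference(ref_counts, block_id)

counts_seen = [ref_counts["block-1"]]
reclaimed = []

# Delete the copies on E, then D, then C.
for _disk in ("E", "D", "C"):
    for block_id in file_blocks:
        if drop_reference(ref_counts, block_id):
            reclaimed.append(block_id)
    counts_seen.append(ref_counts.get("block-1", 0))
```

The blocks are only reclaimed on the third deletion, when the last reference disappears.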
With the deduplication method based on the Merkle Tree variant algorithm provided by the embodiment of the present invention, the properties of the Merkle Tree allow runs of multiple consecutive data blocks to be detected quickly during data writing, data reading and data reclamation. The method for deleting duplicate data provided by the embodiment can improve deduplication efficiency and reduce deduplication time while preserving the deduplication rate, so that the deduplication rate reaches block level while deduplication efficiency approaches file level.
Fig. 7 is a schematic structural diagram of the deduplication device based on the Merkle Tree variant algorithm provided by yet another embodiment of the present invention. As shown in Fig. 7, the device includes a first blocking module 10, a first processing module 20, a first judgment module 30 and a loop module 40, wherein:
The first blocking module 10 is configured to divide the first data into blocks, calculate the hash value of each block, and set a reference count for each block;
The first processing module 20 is configured to take the hash value of the first block and to build a first hash subtree from the hash values of the blocks of the first data;
The first judgment module 30 is configured to compare the hash value of the first block and the first hash subtree with the pre-established first hash tree:
If the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, the reference count of each block is incremented by 1;
If the hash value of the first block is identical to the first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is removed to obtain second data;
The loop module 40 is configured to repeat the above steps on the second data, ending when the second data is the last block.
Specifically, the first blocking module 10 divides the data to be written, that is, the first data, into blocks, calculates the hash value of each block, and sets a reference count for each block;
The first processing module 20 takes the hash value of the first block and builds a first hash subtree from the hash values of the blocks of the first data;
The first judgment module 30 compares the hash value of the first block and the first hash subtree with the pre-established first hash tree:
If the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, the reference count of each block is incremented by 1;
If the hash value of the first block is identical to the first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is removed to obtain second data;
The loop module 40 repeats the above steps on the second data, ending when the second data is the last block.
The device provided by the embodiment of the present invention is suitable for the method described above; for its specific functions, refer to the method embodiments above, which are not repeated here.
The deduplication device based on the Merkle Tree variant algorithm provided by the embodiment of the present invention divides the data to be written into blocks, calculates their hash values, builds a hash tree based on the Merkle Tree variant algorithm, and compares it with the hash tree already stored in the system. If the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, the reference count of each block is incremented by 1. If the hash value of the first block is identical to the first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is removed to obtain second data; the process ends when the second data is the last block. By exploiting the properties of the Merkle Tree, the embodiment can quickly detect runs of multiple consecutive data blocks. The device for deleting duplicate data provided by the embodiment can improve deduplication efficiency and reduce deduplication time while preserving the deduplication rate, so that the deduplication rate reaches block level while deduplication efficiency approaches file level.
Optionally, comparing the hash value of the first block and the first hash subtree with the pre-established first hash tree further includes:
If the hash value of the first block is identical to the first hash value in the pre-established first hash tree, and the content of the first block differs from the content of the block having the first hash value, the reference count of the first block is incremented by 1, and the content of the first block is appended behind the content of the block having the first hash value.
On the basis of the above embodiment, when the hash value of the first block and the first hash subtree are compared with the pre-established first hash tree, there is a further case: when the hash value of the first block is identical to the first hash value in the pre-established first hash tree and the content of the first block differs from the content of the block having the first hash value, the reference count of the first block is incremented by 1, and the content of the first block is appended behind the content of the block having the first hash value. In this case there is no need to compare the root node of the first hash subtree with the hash values in the pre-established first hash tree.
In another possible case, if no hash value identical to that of the first block is found in the first hash tree, a new memory space is requested from the system, the reference count of the first block is incremented by 1, and the content of the first block is written into the newly allocated memory space.
The deduplication device based on the Merkle Tree variant algorithm provided by the embodiment of the present invention divides the data to be written into blocks, calculates their hash values, builds a hash tree based on the Merkle Tree variant algorithm, and compares it with the hash tree already stored in the system, performing different operations in different cases, so it is applicable to a variety of situations. By exploiting the properties of the Merkle Tree, the embodiment can quickly detect runs of multiple consecutive data blocks. The device for deleting duplicate data provided by the embodiment can improve deduplication efficiency and reduce deduplication time while preserving the deduplication rate, so that the deduplication rate reaches block level while deduplication efficiency approaches file level.
Optionally, the device further includes a data reading unit, which specifically comprises:
An acquisition module, configured to obtain at least one logical address of the data to be read;
A lookup module, configured to determine, using the pre-established object index and the obtained at least one logical address, the block to be read that corresponds to the at least one logical address;
A reading module, configured to read the data content of the block to be read.
On the basis of the above embodiment, the deduplication device based on the Merkle Tree variant algorithm is also applicable to the data reading process. The specific implementation steps are as follows:
Before the system can perform a read operation, data must already have been written to the hard disk. For each block written to the hard disk, the correspondence between the block and its logical address is recorded in an index when the block is first written, and the index is saved in the system. The same block may be referenced by multiple logical addresses; that is, the relationship between logical addresses and blocks in the index can be many-to-one.
When a user needs to read a file and clicks on it, the acquisition module receives the instruction to read the file and parses out the logical addresses contained in the read instruction. For each logical address, the lookup module uses the stored index to find the block corresponding to that logical address. Each block includes its own physical address and content, so once the lookup module has found the block it knows the block's physical address on the hard disk; the reading module then reads the content from that physical address and returns it to the system. For example, given multiple logical addresses, the lookup module starts with the first logical address and uses the correspondence between logical addresses and blocks in the index to find the first block; once the first block has been found, the reading module can read out the content contained in it. The same operation is then carried out for each remaining logical address, and the contents read are combined and returned to the system together.
With the deduplication device based on the Merkle Tree variant algorithm provided by the embodiment of the present invention, the properties of the Merkle Tree allow runs of multiple consecutive data blocks to be detected quickly during data writing and data reading. The method and device for deleting duplicate data provided by the embodiment can improve deduplication efficiency and reduce deduplication time while preserving the deduplication rate, so that the deduplication rate reaches block level while deduplication efficiency approaches file level.
Optionally, the device further includes a data reclamation unit, specifically: when the reference count of a block decreases to 0, the space occupied by that block is released.
On the basis of the above embodiments, the embodiment of the present invention also includes a data reclamation mechanism. During deduplication, as the number of deduplication operations grows, the storage space on the hard disk becomes occupied by blocks holding data content, and when new data content needs to be written there is no space left on the hard disk to write it. The space on the hard disk that the user no longer needs is therefore released.
With the deduplication device based on the Merkle Tree variant algorithm provided by the embodiment of the present invention, the properties of the Merkle Tree allow runs of multiple consecutive data blocks to be detected quickly during data writing, data reading and data reclamation. The device for deleting duplicate data provided by the embodiment can improve deduplication efficiency and reduce deduplication time while preserving the deduplication rate, so that the deduplication rate reaches block level while deduplication efficiency approaches file level.
Fig. 8 is a structural block diagram of the computer equipment provided by an embodiment of the present invention. Referring to Fig. 8, the computer equipment includes a processor 801, a memory 802 and a bus 803;
The processor 801 and the memory 802 communicate with each other through the bus 803;
The processor 801 is configured to call the program instructions in the memory 802 to perform the methods provided by the above method embodiments, for example including: S1: divide the first data into blocks, calculate the hash value of each block, and set a reference count for each block;
S2: take the hash value of the first block, and build a first hash subtree from the hash values of the blocks of the first data;
S3: compare the hash value of the first block and the first hash subtree with the pre-established first hash tree:
If the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, the reference count of each block is incremented by 1;
If the hash value of the first block is identical to the first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is removed to obtain second data;
S4: repeat steps S2 to S3 on the second data, ending when the second data is the last block.
This embodiment discloses a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer can perform the methods provided by the above method embodiments, for example including: S1: divide the first data into blocks, calculate the hash value of each block, and set a reference count for each block;
S2: take the hash value of the first block, and build a first hash subtree from the hash values of the blocks of the first data;
S3: compare the hash value of the first block and the first hash subtree with the pre-established first hash tree:
If the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, the reference count of each block is incremented by 1;
If the hash value of the first block is identical to the first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is removed to obtain second data;
S4: repeat steps S2 to S3 on the second data, ending when the second data is the last block.
This embodiment provides a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause the computer to perform the methods provided by the above method embodiments, for example including: S1: divide the first data into blocks, calculate the hash value of each block, and set a reference count for each block;
S2: take the hash value of the first block, and build a first hash subtree from the hash values of the blocks of the first data;
S3: compare the hash value of the first block and the first hash subtree with the pre-established first hash tree:
If the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, the reference count of each block is incremented by 1;
If the hash value of the first block is identical to the first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is removed to obtain second data;
S4: repeat steps S2 to S3 on the second data, ending when the second data is the last block.
A person of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.
The embodiments described above, such as the test equipment of the display device, are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
From the above description of the embodiments, a person skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution, or the part of it that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in some parts of the embodiments.
The device and system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
Claims (10)
- 1. A deduplication method based on a Merkle Tree variant algorithm, characterized by comprising: S1: dividing first data into blocks, calculating the hash value of each block, and setting a reference count for each block; S2: taking the hash value of the first block, and building a first hash subtree from the hash values of the blocks of the first data; S3: comparing the hash value of the first block and the first hash subtree with a pre-established first hash tree: if the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, the reference count of each block is incremented by 1; if the hash value of the first block is identical to the first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is removed to obtain second data; S4: repeating steps S2 to S3 on the second data, ending when the second data is the last block.
- 2. The method according to claim 1, characterized in that comparing the hash value of the first block and the first hash subtree with the pre-established first hash tree further comprises:
  if the hash value of the first block is identical to the first hash value in the pre-established first hash tree but the content of the first block differs from the content of the block having the first hash value, incrementing the reference count of the first block by 1 and appending the content of the first block after the content of the block having the first hash value.
- 3. The method according to claim 1, characterized in that the method further comprises a data read process, the data read process being specifically:
  obtaining at least one logical address of the data to be read;
  determining, through a pre-established object index and according to the at least one logical address obtained, the block to be read corresponding to the at least one logical address;
  reading the data content in the block to be read.
- 4. The method according to claim 1, characterized in that the method further comprises a data record process, specifically: when the reference count of a block is reduced to 0, the space occupied by that block is released.
- 5. A deduplication device based on a Merkle Tree variant algorithm, characterized by comprising:
  a first blocking module, configured to divide first data into blocks and calculate the hash value of each block, a reference count being set for each block;
  a first processing module, configured to take the hash value of the first block and build a first hash subtree from the hash values of the blocks of the first data;
  a first judging module, configured to compare the hash value of the first block and the first hash subtree with a pre-established first hash tree:
  if the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, the reference count of each block is incremented by 1;
  if the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is deleted to obtain second data;
  a loop module, configured to repeat the above steps on the second data, terminating when the second data is the last block.
- 6. The device according to claim 5, characterized in that comparing the hash value of the first block and the first hash subtree with the pre-established first hash tree further comprises:
  if the hash value of the first block is identical to the first hash value in the pre-established first hash tree but the content of the first block differs from the content of the block having the first hash value, incrementing the reference count of the first block by 1 and appending the content of the first block after the content of the block having the first hash value.
- 7. The device according to claim 5, characterized in that the device further comprises a data reading unit, the data reading unit specifically comprising:
  an acquisition module, configured to obtain at least one logical address of the data to be read;
  a lookup module, configured to determine, through a pre-established object index and according to the at least one logical address obtained, the block to be read corresponding to the at least one logical address;
  a read module, configured to read the data content in the block to be read.
- 8. The device according to claim 5, characterized in that the device further comprises a data record unit, specifically: when the reference count of a block is reduced to 0, the space occupied by that block is released.
- 9. A computer device, characterized by comprising a memory and a processor, the processor and the memory communicating with each other via a bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the method according to any one of claims 1 to 4.
- 10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 4.
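The flow of claims 1 to 4 can be illustrated with a minimal Python sketch. It is an interpretation under stated assumptions, not the patented implementation: the names `Store`, `Entry`, `chunk`, `block_hash` and `merkle_root` are hypothetical, SHA-256 and a fixed 4-byte block size are arbitrary choices, the "hash subtree" is taken to be a Merkle tree over the blocks still to be processed, and the roots of previously seen subtrees stand in for the "second hash value" of the pre-established tree. A block whose hash is not yet known is simply stored as new, a case the claims do not spell out.

```python
import hashlib
from dataclasses import dataclass, field


def chunk(data: bytes, size: int = 4) -> list[bytes]:
    """S1: split data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_hash(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def merkle_root(hashes: list[str]) -> str:
    """Fold leaf hashes pairwise up to one root; an unpaired node is carried up."""
    if not hashes:
        raise ValueError("need at least one hash")
    level = hashes
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            nxt.append(block_hash("".join(pair).encode()) if len(pair) == 2 else pair[0])
        level = nxt
    return level[0]


@dataclass
class Entry:
    content: bytes
    refs: int = 0


@dataclass
class Store:
    # hash -> entries; several entries under one hash model a collision (claim 2)
    blocks: dict[str, list[Entry]] = field(default_factory=dict)
    roots: set[str] = field(default_factory=set)  # assumption: subtree roots seen so far
    index: dict[str, list[tuple[str, int]]] = field(default_factory=dict)

    def _add_block(self, block: bytes) -> tuple[str, int]:
        h = block_hash(block)
        bucket = self.blocks.setdefault(h, [])
        for slot, entry in enumerate(bucket):
            if entry.content == block:  # same hash and same content: count a reference
                entry.refs += 1
                return h, slot
        bucket.append(Entry(block, 1))  # new block, or same hash with different content
        return h, len(bucket) - 1

    def write(self, name: str, data: bytes, size: int = 4) -> None:
        remaining = chunk(data, size)
        refs: list[tuple[str, int]] = []
        while remaining:                                    # S4: repeat on the rest
            root = merkle_root([block_hash(b) for b in remaining])  # S2
            if root in self.roots:                          # S3: whole remainder is known
                refs.extend(self._add_block(b) for b in remaining)
                break
            self.roots.add(root)
            refs.append(self._add_block(remaining[0]))      # S3: handle the first block only
            remaining = remaining[1:]
        self.index[name] = refs                             # object index used for reads

    def read(self, name: str) -> bytes:
        """Claim 3: resolve the index entries to blocks and concatenate their content."""
        return b"".join(self.blocks[h][slot].content for h, slot in self.index[name])

    def release(self, name: str) -> None:
        """Claim 4: drop references; free a block's content when its count reaches 0."""
        for h, slot in self.index.pop(name):
            entry = self.blocks[h][slot]
            entry.refs -= 1
            if entry.refs == 0:
                entry.content = b""  # slot is kept so other index entries stay valid

    def stored_bytes(self) -> int:
        return sum(len(e.content) for bucket in self.blocks.values() for e in bucket)
```

Writing the same payload under two names stores each block once and raises its reference count to 2; a payload that shares only a tail with earlier data adds just its new leading blocks. Recomputing the root for every remainder costs quadratic hashing in the number of blocks, which is acceptable for illustration only.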
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710507717.9A CN107368545B (en) | 2017-06-28 | 2017-06-28 | A kind of De-weight method and device based on Merkle Tree deformation algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107368545A (en) | 2017-11-21 |
CN107368545B (en) | 2019-08-27 |
Family ID=60305694
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710507717.9A (Active; CN107368545B (en)) | A kind of De-weight method and device based on Merkle Tree deformation algorithm | 2017-06-28 | 2017-06-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107368545B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108897760A (en) * | 2018-05-22 | 2018-11-27 | 贵阳信息技术研究院(中科院软件所贵阳分部) | Electronic evidence chain integrity verification method based on Merkel tree |
CN110109920A (en) * | 2019-03-19 | 2019-08-09 | 咪咕文化科技有限公司 | Data comparison method and server |
CN110968575A (en) * | 2018-09-30 | 2020-04-07 | 南京工程学院 | Duplication eliminating method for big data processing system |
US11481371B2 (en) | 2020-07-27 | 2022-10-25 | Hewlett Packard Enterprise Development Lp | Storage system capacity usage estimation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8190835B1 (en) * | 2007-12-31 | 2012-05-29 | Emc Corporation | Global de-duplication in shared architectures |
US8615500B1 (en) * | 2012-03-29 | 2013-12-24 | Emc Corporation | Partial block allocation for file system block compression using virtual block metadata |
CN105354246A (en) * | 2015-10-13 | 2016-02-24 | 华南理工大学 | Distributed memory calculation based data deduplication method |
CN106612320A (en) * | 2016-06-14 | 2017-05-03 | 四川用联信息技术有限公司 | Encrypted data dereplication method for cloud storage |
Non-Patent Citations (2)
Title |
---|
CHANG LIU et al.: "MuR-DPA: Top-Down Levelled Multi-Replica Merkle Hash Tree Based Secure Public Auditing for Dynamic Big Data Storage on Cloud", IEEE TRANSACTIONS ON COMPUTERS * |
HAN Ying et al.: "A data integrity verification algorithm for deduplication backup systems", Application Research of Computers (计算机应用研究) * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108897760A (en) * | 2018-05-22 | 2018-11-27 | 贵阳信息技术研究院(中科院软件所贵阳分部) | Electronic evidence chain integrity verification method based on Merkel tree |
CN110968575A (en) * | 2018-09-30 | 2020-04-07 | 南京工程学院 | Duplication eliminating method for big data processing system |
CN110968575B (en) * | 2018-09-30 | 2023-06-06 | 南京工程学院 | Deduplication method of big data processing system |
CN110109920A (en) * | 2019-03-19 | 2019-08-09 | 咪咕文化科技有限公司 | Data comparison method and server |
CN110109920B (en) * | 2019-03-19 | 2022-03-22 | 咪咕文化科技有限公司 | Data comparison method and server |
US11481371B2 (en) | 2020-07-27 | 2022-10-25 | Hewlett Packard Enterprise Development Lp | Storage system capacity usage estimation |
Also Published As
Publication number | Publication date |
---|---|
CN107368545B (en) | 2019-08-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104866497B (en) | The metadata updates method, apparatus of distributed file system column storage, host | |
CN104978151B (en) | Data reconstruction method in the data de-duplication storage system perceived based on application | |
US9223660B2 (en) | Storage device to backup content based on a deduplication system | |
CN104657459B (en) | A kind of mass data storage means based on file granularity | |
CN106201771B (en) | Data-storage system and data read-write method | |
US8271456B2 (en) | Efficient backup data retrieval | |
US7805439B2 (en) | Method and apparatus for selecting data records from versioned data | |
CN107368545B (en) | A kind of De-weight method and device based on Merkle Tree deformation algorithm | |
CN106021031B (en) | A kind of the deletion data reconstruction method and device of BTRFS file system | |
JP2005267600A5 (en) | ||
CN105224528B (en) | Big data processing method and device based on graph calculation | |
CN106294595A (en) | A kind of document storage, search method and device | |
CN109358987A (en) | A kind of backup cluster based on two-stage data deduplication | |
CN107451138A (en) | A kind of distributed file system storage method and system | |
CN110347643B (en) | Method and device for cloning NTFS (New technology File System) volume between disks | |
CN106970958A (en) | A kind of inquiry of stream file and storage method and device | |
CN109445703A (en) | A kind of Delta compression storage assembly based on block grade data deduplication | |
CN103942301B (en) | Distributed file system oriented to access and application of multiple data types | |
US7653663B1 (en) | Guaranteeing the authenticity of the data stored in the archive storage | |
CN105493080B (en) | The method and apparatus of data de-duplication based on context-aware | |
CN104750729A (en) | Data management method and system based on journal file | |
CN110297781A (en) | A method of restore to be deleted data in APFS based on copy-on-write | |
CN106528703A (en) | Deduplication mode switching method and apparatus | |
CN105468733A (en) | Source end data deduplication-based volume replication method | |
CN102831240B (en) | The storage means of extended metadata file and storage organization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20220322; Address after: No. 407, floor 4, No. 9, No. 9, shangdijiu street, Haidian District, Beijing 100085; Patentee after: Shenzhou Yunke (Beijing) Technology Co.,Ltd.; Address before: 518131 F3, 11th floor, No. 8 Kefa Road, Yuehai Street, Nanshan District, Shenzhen City, Guangdong Province; Patentee before: Shenzhen science and Technology Co.,Ltd. digital cloud data |