CN107368545A

CN107368545A - Deduplication method and device based on a Merkle Tree variant algorithm

Info

Publication number: CN107368545A
Application number: CN201710507717.9A
Authority: CN
Inventors: 高华龙
Original assignee: Shenzhen Science And Technology Co Ltd Digital Cloud Data
Current assignee: Shenzhou Yunke (Beijing) Technology Co.,Ltd.
Priority date: 2017-06-28
Filing date: 2017-06-28
Publication date: 2017-11-21
Anticipated expiration: 2037-06-28
Also published as: CN107368545B

Abstract

An embodiment of the present invention provides a deduplication method and device based on a Merkle Tree variant algorithm. The method includes: dividing first data into blocks, calculating the hash value of each block, and setting a reference count for each block; and comparing the hash value of the first block and a first hash subtree against a pre-established first hash tree. If the hash value and content of the first block are identical to a first hash value in the first hash tree and the content of the corresponding block, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, the reference count of each block is incremented by 1. If the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is removed to obtain second data; the above operations are then performed on the second data, ending when the second data is the last block. The embodiments provided by the invention can improve deduplication efficiency and shorten deduplication time while maintaining the deduplication ratio.

Description

A deduplication method and device based on a Merkle Tree variant algorithm

Technical field

The present invention relates to the technical field of data processing, and in particular to a deduplication method and device based on a Merkle Tree variant algorithm.

Background

Duplicate removal technology, also known as data deduplication, greatly increases the value of disk-based data protection and significantly improves WAN-based backup consolidation and disaster recovery strategies for remote and branch offices. The technology identifies duplicate data and eliminates the redundancy, thereby reducing the volume of data transmitted and stored.

The common categories are file-level deduplication and block-level deduplication.

File-level deduplication uses file attributes as an index to compare a file to be backed up or archived against existing files. If the file is unique, it is stored and the index is updated; if it already exists, only a pointer to the existing file is stored. As a result, only one instance of the file is saved, and every subsequent copy is replaced by a stub pointing to the actual file.

Block-level deduplication splits data into fragments (data blocks or data chunks) and checks these blocks for redundancy by comparing them with existing information. The most common way to identify redundant data is to use an algorithm such as hashing to assign each data block a unique identifier, producing a unique ID or "fingerprint" for the block. This identifier is compared against the identifiers held in a central index. If the ID already exists, the corresponding block has been processed and stored before, so only a pointer to the previously stored data needs to be saved. If the ID is not found, the block is unique: the ID is added to the central index and the block is stored.
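The block-level scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHA-256 stands in for the unspecified fingerprint algorithm, a plain dictionary stands in for the central index, and the name `dedup_blocks` is hypothetical.

```python
import hashlib

def dedup_blocks(data: bytes, index: dict, block_size: int = 4096) -> list:
    """Store each unique block once in `index`; return one fingerprint per block.

    `index` maps fingerprint -> block content and plays the role of the
    central index. The returned list acts as the list of pointers.
    """
    pointers = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()  # assumed hash choice
        if fingerprint not in index:
            # Unique block: add its ID to the index and store the content.
            index[fingerprint] = block
        # Duplicate or not, the caller only keeps a pointer to the block.
        pointers.append(fingerprint)
    return pointers

# Three blocks, two of them identical: only two blocks are stored.
sample = b"a" * 4096 + b"b" * 4096 + b"a" * 4096
central_index = {}
sample_pointers = dedup_blocks(sample, central_index)
```

The original data can be rebuilt by following the pointers in order, which is what makes storing only one copy of each block safe.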

The drawback of the traditional file-level method is that both its write efficiency and its deduplication ratio are relatively low.

In the file-level method, any change to a file causes the whole file to be saved again. A file may undergo a simple change, such as a revised title page reflecting a new speaker or date, and this causes the entire file to be stored again. By contrast, block-level deduplication saves only the blocks that changed between the old version and the new version. File-level deduplication ratios are typically about 5:1 or lower, whereas block-level deduplication has been found to reach 20:1 to 50:1.

In the traditional block-level method, the existing index table is relatively large, so searching for consecutive duplicate data is time-consuming. For example, if a 1 GB file is split into 4 KB blocks, the hash must be calculated and the corresponding position looked up 256K times.
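The 256K figure follows directly from the two sizes, as this short Python check shows (binary units are assumed, that is, 1 GB = 1024³ bytes and 4 KB = 4096 bytes):

```python
file_size = 1 * 1024 ** 3   # 1 GB in bytes (binary units assumed)
block_size = 4 * 1024       # 4 KB in bytes

# One hash calculation and one index lookup per block.
lookups = file_size // block_size
print(lookups)              # 262144, which is 256 * 1024 = 256K
```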

In the prior art described above, file-level deduplication has a low deduplication ratio and block-level deduplication takes too long. How to reduce the time consumed while also improving the deduplication ratio has become an urgent problem to be solved.

Summary of the invention

In view of the defects in the prior art, embodiments of the present invention provide a deduplication method and device based on a Merkle Tree variant algorithm.

In a first aspect, an embodiment of the present invention provides a deduplication method based on a Merkle Tree variant algorithm, including:

S1: dividing first data into blocks and calculating the hash value of each block, and setting a reference count for each block;

S2: taking the hash value of the first block, and building a first hash subtree over the hash values of the blocks of the first data;

S3: comparing the hash value of the first block and the first hash subtree against a pre-established first hash tree:

if the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, then the reference count of each block is incremented by 1;

if the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, then the reference count of the first block is incremented by 1 and the first block is removed to obtain second data;

S4: repeating steps S2-S3 on the second data, ending when the second data is the last block.

In a second aspect, an embodiment of the present invention provides a deduplication device based on a Merkle Tree variant algorithm, including:

a first blocking module, configured to divide first data into blocks, calculate the hash value of each block, and set a reference count for each block;

a first processing module, configured to take the hash value of the first block and build a first hash subtree over the hash values of the blocks of the first data;

a first judging module, configured to compare the hash value of the first block and the first hash subtree against a pre-established first hash tree:

a loop module, configured to repeat the above steps on the second data, ending when the second data is the last block.

In a third aspect, an embodiment of the present invention provides a computer device, including a memory and a processor that communicate with each other over a bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the following method:

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the foregoing method.

The deduplication method and device based on a Merkle Tree variant algorithm provided by the embodiments of the present invention divide the data to be written into blocks, calculate hash values, build a hash tree using the Merkle Tree variant algorithm, compare it with the hash tree already stored in the system, and deduplicate the data to be written according to the result of the comparison. By exploiting the properties of the Merkle Tree, the embodiments can quickly detect multiple consecutive data blocks. The deduplication method and device can therefore improve deduplication efficiency and shorten deduplication time while maintaining the deduplication ratio, so that the deduplication ratio reaches block level while deduplication efficiency approaches file level.

Brief description of the drawings

To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the accompanying drawings needed for the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of the deduplication method based on a Merkle Tree variant algorithm provided by an embodiment of the present invention;

Fig. 2 is a schematic flowchart of the deduplication method based on a Merkle Tree variant algorithm provided by a further embodiment of the present invention;

Fig. 3 shows the first hash tree built in advance from data already written to the system, provided by an embodiment of the present invention;

Fig. 4 shows the first hash tree provided by an embodiment of the present invention in the case where blocks have the same hash value but different content;

Fig. 5 shows the first hash subtree built from the first data, provided by an embodiment of the present invention;

Fig. 6 is a schematic flowchart of the deduplication method based on a Merkle Tree variant algorithm provided by a further embodiment of the present invention;

Fig. 7 is a schematic structural diagram of the deduplication device based on a Merkle Tree variant algorithm provided by yet another embodiment of the present invention;

Fig. 8 is a schematic structural diagram of the deduplication device based on a Merkle Tree variant algorithm provided by yet another embodiment of the present invention.

Detailed description

To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

Fig. 1 is a schematic flowchart of the deduplication method based on a Merkle Tree variant algorithm provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:

The deduplication method based on a Merkle Tree variant algorithm provided by the embodiments of the present invention can be applied in the storage path of distributed storage software and exposed as a globally consistent deduplication service. For example, it can be applied to write operations, read operations, and data reclamation operations on a hard disk.

Suppose a file is already stored on the hard disk of a computer system. When a user writes a file to the hard disk and the system receives the request, the system determines whether the file to be written is identical to a stored file. It converts the file to be written into a segment of data IO to be written, that is, an interface for data interaction. The data IO carries the physical address (offset address) to be written, the data length, the operation type, the content or the memory space to be read, the hash value, and so on. When the system converts the file to be written into data IO, it assigns a physical address to that data IO.

For example, a file is already stored on the hard disk and each block of that file has its own physical address; for a segment of data IO to be written, the system automatically provides a physical address for that data IO.

The system divides this data IO into blocks of equal size, for example 4K, 8K, or an integer multiple of 4K/8K. The embodiments of the present invention are described in detail using 4K as an example.

After dividing the data IO to be written into 4K blocks, the system calculates the hash value of each block and sets a reference count for each block. Before a block is written to the hard disk its reference count is set to 0; each time the block is referenced, the count is incremented by 1. In other words, the reference count records how many times the block is referenced. For example, a block with data 123 is stored in sector 100 of the hard disk with an initial reference count of 0. When this block is referenced once by disk C, its reference count is incremented to 1; when it is then referenced by disk D, the count is incremented again and becomes 2.
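The blocking and reference-count bookkeeping described above might look like the following Python sketch. The `Block` class and `split_into_blocks` function are hypothetical names, and SHA-256 is an assumed hash; the text does not name a specific hash function.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Block:
    content: bytes
    digest: str          # hash value of the block (SHA-256 assumed)
    ref_count: int = 0   # set to 0 before the block is written

def split_into_blocks(data: bytes, block_size: int = 4096) -> list:
    """Divide data into fixed-size blocks and hash each one."""
    blocks = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        blocks.append(Block(chunk, hashlib.sha256(chunk).hexdigest()))
    return blocks

blocks = split_into_blocks(b"x" * 10000)   # 4096 + 4096 + 1808 bytes

# The block is referenced by disk C, then by disk D.
shared = blocks[0]
shared.ref_count += 1   # referenced from disk C -> 1
shared.ref_count += 1   # referenced from disk D -> 2
```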

Following the above steps, the first data is divided into blocks and the hash value of each block is calculated. The hash value of the first block is extracted, and the Merkle Tree variant algorithm is applied to the hash values of the blocks of the first data to build the first hash subtree over those hash values.

In the embodiment above, some files are already stored on the hard disk. The Merkle Tree variant algorithm is first used to build a hash tree of the stored files, which is the pre-established first hash tree. The hash value of the first block and the newly built first hash subtree are then compared against the pre-established first hash tree.
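The relationship between the pre-established hash tree and a hash subtree can be illustrated with a small Python sketch. The text does not give the details of the variant algorithm, so the sketch assumes an ordinary binary Merkle tree: SHA-256, parents formed by hashing the concatenation of two child hashes, and an unpaired node promoted unchanged. The names `merkle_levels` and `all_nodes` are hypothetical.

```python
import hashlib

def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_levels(leaf_hashes):
    """Return every level of the tree, leaf hashes first and root last."""
    levels = [list(leaf_hashes)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        parents = []
        for i in range(0, len(current), 2):
            if i + 1 < len(current):
                parents.append(sha(current[i] + current[i + 1]))
            else:
                parents.append(current[i])  # assumed rule for an odd node
        levels.append(parents)
    return levels

def all_nodes(levels):
    """Flatten a tree into the set of all node hashes it contains."""
    return {node for level in levels for node in level}

# Pre-established tree over four stored blocks.
stored = [b"block-1", b"block-2", b"block-3", b"block-4"]
first_tree = merkle_levels([sha(b) for b in stored])

# Subtree over two incoming blocks that repeat the last two stored blocks.
incoming = [b"block-3", b"block-4"]
subtree_root = merkle_levels([sha(b) for b in incoming])[-1][0]

# One comparison of the subtree root covers both incoming blocks.
found = subtree_root in all_nodes(first_tree)
```

In this sketch a run of blocks is recognised only when it lines up with a node of the stored tree; a pair such as `block-2`, `block-3` spans two parents and its root is not present.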

If the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, then the reference count of the first block is incremented by 1 and the first block is removed, yielding the second data. Step S2 is then performed again: the hash value of the first block of the second data is taken, and a second hash subtree is built over the hash values of the blocks of the second data. Step S3 is then performed again: the hash value of the first block of the second data and the second hash subtree are compared against the pre-established first hash tree:

If the hash value of the first block of the second data is identical to a first hash value in the pre-established first hash tree, the content of that block is identical to the content of the block having the first hash value, and the root node of the second hash subtree is identical to a second hash value in the first hash tree, then the reference count of each block in the second data is incremented by 1;

If the hash value of the first block of the second data is identical to a first hash value in the pre-established first hash tree, the content of that block is identical to the content of the block having the first hash value, and the root node of the second hash subtree differs from the second hash value in the first hash tree, then the reference count of the first block of the second data is incremented by 1 and that block is removed from the second data, yielding third data. The above steps are executed in a loop, ending when the remaining data is the last block of the first data.

The deduplication method based on a Merkle Tree variant algorithm provided by the embodiments of the present invention divides the data to be written into blocks, calculates hash values, builds a hash tree using the Merkle Tree variant algorithm, and compares it with the hash tree already stored in the system. If the hash value of the first block is identical to a first hash value in the pre-established first hash tree, the content of the first block is identical to the content of the block having the first hash value, and the root node of the first hash subtree is identical to a second hash value in the first hash tree, then the reference count of each block is incremented by 1. If the first two conditions hold but the root node of the first hash subtree differs from the second hash value in the first hash tree, then the reference count of the first block is incremented by 1 and the first block is removed to obtain the second data, ending when the second data is the last block. By exploiting the properties of the Merkle Tree, the embodiments can quickly detect multiple consecutive data blocks. The deduplication method and device can therefore improve deduplication efficiency and shorten deduplication time while maintaining the deduplication ratio, so that the deduplication ratio reaches block level while deduplication efficiency approaches file level.

Optionally, comparing the hash value of the first block and the first hash subtree against the pre-established first hash tree further includes:

If the hash value of the first block is identical to a first hash value in the pre-established first hash tree, but the content of the first block differs from the content of the block having the first hash value, then the reference count of the first block is incremented by 1 and the content of the first block is appended after the content of the block having the first hash value.

On the basis of the above embodiment, a further case arises when the hash value of the first block and the first hash subtree are compared against the pre-established first hash tree: when the hash value of the first block is identical to a first hash value in the pre-established first hash tree but the content of the first block differs from the content of the block having the first hash value, the reference count of the first block is incremented by 1 and the content of the first block is appended after the content of the block having the first hash value. In this case there is no need to compare the root node of the first hash subtree with the hash values in the pre-established first hash tree.

Another possible case is that no identical hash value is found in the first hash tree for the hash value of the first block. In that case a new memory space is requested from the system, the reference count of the first block is incremented by 1, and the content of the first block is written to the newly requested memory space.

The deduplication method based on a Merkle Tree variant algorithm provided by the embodiments of the present invention divides the data to be written into blocks, calculates hash values, builds a hash tree using the Merkle Tree variant algorithm, compares it with the hash tree already stored in the system, and performs different operations in different cases, so it is applicable to a variety of situations. By exploiting the properties of the Merkle Tree, the embodiments can quickly detect multiple consecutive data blocks. The deduplication method and device can therefore improve deduplication efficiency and shorten deduplication time while maintaining the deduplication ratio, so that the deduplication ratio reaches block level while deduplication efficiency approaches file level.

The embodiments of the present invention are described in detail below using a specific example of a write process.

Step 1: The data to be written is divided into several blocks.

Step 2: The hash value and reference count of each block are obtained; specifically, the hash value of each block is calculated and a reference count is set for each block.

Step 3: The hash value of the first block is taken and looked up in the pre-established first hash tree. If an identical hash value is found, the content of the first block is compared with the content of the leaf node corresponding to that hash value in the first hash tree. If the two contents are identical, the reference count of the first block is incremented by 1 and step 4 is performed. If the two contents differ, the system appends the content of the first block to the storage space after the content of the leaf node corresponding to that hash value in the first hash tree; the two blocks together form an object list and both point to the same hash value.

If no identical hash value is found in the first hash tree, a new memory space is requested from the system, the reference count of the first block is incremented by 1, and the content of the first block is written to the newly requested memory space.

Step 4: The first hash subtree is built from the hash values of the first block and all remaining blocks, and the hash value of the root node of the first hash subtree is looked up in the pre-established global hash tree. If an identical hash value is found, the reference count of each remaining block is incremented by 1.

If no identical hash value is found, the system removes the first block, takes the hash value of the second block, and repeats the operation of step 3.
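The four steps above can be put together in a compact Python sketch. It is a simplified reading of the write process, not the patented implementation, and it rests on several assumptions: SHA-256 as the hash, an ordinary binary Merkle tree in place of the unspecified variant algorithm, the global hash tree flattened into a set of node hashes, no update of that tree for newly written blocks, and continuation with the next block after a "different content" or "not found" outcome. `DedupStore` and its methods are hypothetical names.

```python
import hashlib

def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def parent_level(level):
    # Assumed pairing rule: hash two children together, promote an odd node.
    return [sha(level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
            for i in range(0, len(level), 2)]

def merkle_root(hashes):
    level = list(hashes)
    while len(level) > 1:
        level = parent_level(level)
    return level[0]

class DedupStore:
    def __init__(self, stored_blocks):
        self.buckets = {}        # leaf hash -> list of [content, ref_count]
        self.tree_nodes = set()  # every node hash of the pre-established tree
        for block in stored_blocks:
            self.add_block(block)
        level = [sha(b) for b in stored_blocks]
        self.tree_nodes.update(level)
        while len(level) > 1:
            level = parent_level(level)
            self.tree_nodes.update(level)

    def add_block(self, content: bytes) -> bool:
        """Step 3. Return True only if hash and content both match."""
        bucket = self.buckets.setdefault(sha(content), [])
        for entry in bucket:
            if entry[0] == content:
                entry[1] += 1            # identical block: count + 1
                return True
        # Hash not found (new space) or same hash with different content
        # (appended after the existing entry, forming an object list).
        bucket.append([content, 1])
        return False

    def write(self, data: bytes, block_size: int = 4096) -> int:
        """Return how many blocks were covered by a subtree-root match."""
        blocks = [data[i:i + block_size]
                  for i in range(0, len(data), block_size)]   # step 1
        hashes = [sha(b) for b in blocks]                      # step 2
        for i, block in enumerate(blocks):
            if self.add_block(block):                          # step 3
                # Step 4: subtree over this block and all remaining blocks.
                if merkle_root(hashes[i:]) in self.tree_nodes:
                    for rest in blocks[i + 1:]:
                        self.add_block(rest)   # remaining counts + 1
                    return len(blocks) - i - 1
            # Otherwise drop this block and continue with the next one.
        return 0
```

A short usage example: with `store = DedupStore([b"L1", b"L2", b"L3", b"L4"])`, the call `store.write(b"L3L4", block_size=2)` matches the first block individually and the second through the subtree root.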

The scheme of the embodiments of the present invention is described in detail below using a specific example, with reference to Fig. 3 to Fig. 5.

First, in the system, a file is divided into blocks, for example L1, L2, L3, and L4, and the hash value of each block is calculated. Using the Merkle Tree variant algorithm, the hash values of the blocks are built into a Merkle Tree hash tree, referred to in the embodiments of the present invention as the pre-established first hash tree, as shown in Fig. 3.

As shown in Fig. 5, when the user needs to write a new segment of data, the segment is divided into 4K blocks, for example L30 and L40, the hash values hash(L30) and hash(L40) of the blocks are calculated, and the reference count of each block is set to 0.

The hash value hash(L30) of the first block is compared with each hash value in the first hash tree. If the value hash(L3) found in the first hash tree is identical to hash(L30), the content of L3 is compared with the content of L30. If the two contents are identical, the reference count of the first block L30 of the data IO to be written is incremented by 1. The first hash subtree is then built from the hash value hash(L30) of the first block L30 and the hash value hash(L40) of the second block L40, and the hash value Hash60 of the root node of the first hash subtree is compared with the hash values in the pre-established global hash tree. If the pre-established first hash tree contains a hash value Hash6 identical to Hash60, the reference count of the remaining block L40 of the data to be written is incremented by 1;

If the pre-established global hash tree contains no hash value identical to the hash value Hash60 of the root node of the first hash subtree, the first block L30 of the data to be written is removed, the hash value of the second block L40 is taken, a second hash subtree is built, and the above steps are repeated until all remaining blocks have been compared with the pre-established hash tree.

If the content of L3 differs from the content of L30, the reference count of the first block L30 of the data IO to be written is incremented by 1 and the content of L30 is appended after the content of L3 in the first hash tree. The two blocks form an object list and point to the same hash value, as shown in Fig. 4.

If no hash value identical to the hash value hash(L30) of the first block is found in the first hash tree, the system requests a new storage space, increments the reference count of the first block L30 by 1, and writes the content of the first block L30 to the new storage space.
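The three outcomes for a single block in this example (identical block, same hash with different content as in Fig. 4, and hash not found) can be demonstrated with a deliberately weak hash so that a collision is easy to produce. The toy hash, the dictionary layout, and the name `store_block` are assumptions made only for this illustration.

```python
def toy_hash(content: bytes) -> int:
    # Deliberately weak: only the first byte, so collisions are easy to force.
    return content[0]

def store_block(buckets: dict, content: bytes) -> str:
    """Apply the single-block rules and report which case occurred."""
    key = toy_hash(content)
    if key not in buckets:
        # Hash not found: allocate new space and write the content.
        buckets[key] = [{"content": content, "ref_count": 1}]
        return "new"
    for entry in buckets[key]:
        if entry["content"] == content:
            # Same hash and same content: only the count changes.
            entry["ref_count"] += 1
            return "duplicate"
    # Same hash, different content: append after the existing block so that
    # both entries form one object list under the same hash value.
    buckets[key].append({"content": content, "ref_count": 1})
    return "collision"

buckets = {}
results = [
    store_block(buckets, b"L3-data"),    # first time seen
    store_block(buckets, b"L3-data"),    # identical block
    store_block(buckets, b"L30-other"),  # same toy hash, different content
    store_block(buckets, b"X"),          # hash not present
]
```

After these calls the bucket for the byte `L` holds two entries under one hash value, which mirrors the object list of Fig. 4.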

Fig. 6 is a schematic flowchart of the deduplication method based on a Merkle Tree variant algorithm provided by a further embodiment of the present invention. As shown in Fig. 6, the method further includes a data read process, which is specifically:

S21: obtaining at least one logical address of the data to be read;

S22: determining, through a pre-established object index and according to the at least one logical address obtained, the block to be read that corresponds to the at least one logical address;

S23: reading the data content of the block to be read.

On the basis of the above embodiment, the deduplication method based on a Merkle Tree variant algorithm is also applicable to the data read process. The specific steps are as follows:

A read operation presupposes that data has previously been written to the hard disk. For each block written to the hard disk, when the block is first written, the correspondence between the logical address of the block and the block itself is recorded as an index entry and saved in the system. The same block may be referenced by multiple logical addresses; that is, the relationship between logical addresses and blocks in the index can be many-to-one.

When the user needs to read a file and clicks on it, the system receives the instruction to read the file and parses out the logical addresses contained in the read instruction. For each logical address, the system uses the stored index to find the block corresponding to that logical address. Each block has its own physical address and content, so once the system has found the block, it has the block's physical address on the hard disk, reads the content from that physical address, and returns the content to the system. For example, given multiple logical addresses, the system takes the first logical address, looks up the corresponding first block using the correspondence between logical addresses and blocks in the index, and reads the content contained in that block. The same operation is then performed for each remaining logical address, and the contents read are combined and returned to the system.

To illustrate: a file is stored in advance and divided into 3 blocks. The content of the first block is a and its logical address on the hard disk is 0010; the content of the second block is b and its logical address is 0011; the content of the third block is c and its logical address is 0012. The correspondence between each block and its logical address forms the index, which is stored in the system.

When the user wants to read a file on the disk and clicks on it, the system receives the instruction to read the file and parses out at least one logical address, for example 0010, 0011, and 0012. The system queries by each logical address in turn. It first takes the first logical address 0010, searches the index stored in the system for the first block corresponding to 0010, locates that block on the hard disk (that is, its physical address), and reads the content a of the first block from that physical address. The second and third logical addresses are read in the same way, and the contents a, b, and c are returned to the system together.
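The read path in this example reduces to two lookups per logical address, as the following Python sketch shows. The dictionaries standing in for the object index and the disk, the physical addresses 100 to 102, the extra logical address 0013, and the name `read_file` are all assumptions made for illustration.

```python
def read_file(logical_addresses, index, disk) -> bytes:
    """Resolve each logical address to a block and concatenate the contents."""
    contents = []
    for logical in logical_addresses:
        physical = index[logical]        # S22: logical address -> block
        contents.append(disk[physical])  # S23: read content at that address
    return b"".join(contents)

# Blocks on disk, keyed by an assumed physical address.
disk = {100: b"a", 101: b"b", 102: b"c"}

# Index from logical address to block; 0013 is a second reference to the
# block holding "a", showing the many-to-one relationship.
index = {"0010": 100, "0011": 101, "0012": 102, "0013": 100}

file_content = read_file(["0010", "0011", "0012"], index, disk)
shared_content = read_file(["0013"], index, disk)
```

Because deduplicated blocks are shared, reads never need the hash tree: the index alone is enough to find the single stored copy.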

De-weight method provided in an embodiment of the present invention based on Merkle Tree deformation algorithms, in data write-in and data In reading process, the quick of multiple consecutive data blocks can be realized using Merkle Tree characteristic using the embodiment of the present invention Detection, the delet method and device of duplicate data provided in an embodiment of the present invention can improve on the premise of duplicate removal rate is ensured Deduplicated efficiency, reduce the duplicate removal time so that while duplicate removal rate reaches block rank, deduplicated efficiency is close to file-level.

Optionally, the method further includes a data reclamation process, specifically: when the reference count of a block is reduced to 0, the space occupied by that block is released.

On the basis of the above embodiments, the embodiment of the present invention also includes a data reclamation mechanism. During deduplication, as the number of deduplication operations grows, the storage space on the hard disk becomes occupied by blocks that hold data content. When new data content needs to be written, no space is left on the hard disk to write it, so the user needs to release the space on the hard disk that is no longer needed.

When a user deletes a volume (a volume being equivalent to a disk), for example when the user wants to free the space of disk C, the system needs to release the references that all blocks in disk C hold in the other disks. When the reference count of each block is reduced to 0, all of these blocks are reclaimed by the system. Disk C is then empty, and new data can be written to it again.

When a user deletes a file, for example a file that exists on disk C, disk D and disk E, the reference count of each block in this file is 3. To delete the file, the copy stored on disk E is deleted first, and the reference count of each block in the file is decremented by 1 and becomes 2. The copy stored on disk D is then deleted, and the reference count of each block is decremented by 1 and becomes 1. Finally the copy stored on disk C is deleted, and the reference count of each block is decremented by 1 and becomes 0. At this point the reference count of every block of the file to be deleted is 0, all of the blocks are reclaimed by the system, the storage space that the file occupied on the hard disk is released by the system, and the user can write new content again.
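The reference-count behaviour in the file deletion example can be sketched as follows. This is a minimal illustration under stated assumptions: the class `BlockStore`, its method names and the block identifiers `h1` and `h2` are hypothetical and do not come from the source.

```python
from collections import Counter


class BlockStore:
    """Toy block store: a block's space is released when its count reaches 0."""

    def __init__(self):
        self.refcount = Counter()  # block id -> number of references
        self.blocks = {}           # block id -> stored content

    def add_reference(self, block_id, content):
        # Store the content once; later references only bump the counter.
        self.blocks.setdefault(block_id, content)
        self.refcount[block_id] += 1

    def release_reference(self, block_id):
        self.refcount[block_id] -= 1
        if self.refcount[block_id] == 0:
            # No reference is left, so the block is reclaimed.
            del self.refcount[block_id]
            del self.blocks[block_id]


store = BlockStore()
file_blocks = {"h1": b"a", "h2": b"b"}  # hypothetical blocks of one file

# The same file exists on disks C, D and E: each block is referenced 3 times.
for disk in ("C", "D", "E"):
    for block_id, content in file_blocks.items():
        store.add_reference(block_id, content)

# Deleting the copies on E, D and C in turn drops each count 3 -> 2 -> 1 -> 0.
for disk in ("E", "D", "C"):
    for block_id in file_blocks:
        store.release_reference(block_id)
```

After the last loop the store holds no blocks, which corresponds to the space being released for new writes.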

With the deduplication method based on the Merkle Tree deformation algorithm provided by the embodiment of the present invention, the properties of the Merkle Tree can be used during data writing, data reading and data reclamation to quickly detect multiple consecutive data blocks. The method for deleting duplicate data provided by the embodiment of the present invention can improve deduplication efficiency and reduce deduplication time while preserving the deduplication rate, so that the deduplication rate reaches block level while the deduplication efficiency approaches file level.

Fig. 7 is a schematic structural diagram of a device for the deduplication technique based on the Merkle Tree deformation algorithm provided by yet another embodiment of the present invention. As shown in Fig. 7, the device includes a first blocking module 10, a first processing module 20, a first judging module 30 and a loop module 40, wherein:

The first blocking module 10 is configured to partition the first data into blocks and compute the hash value of each block, with a reference count set for each block;

The first processing module 20 is configured to take out the hash value of the first block and build the first hash subtree from the hash values of the blocks of the first data;

The first judging module 30 is configured to compare the hash value of the first block and the first hash subtree with the pre-established first hash tree;

The loop module 40 is configured to repeat the above steps on the second data, ending when the second data is the last block.

Specifically, the first blocking module 10 partitions the data to be written, that is, the first data, into blocks and computes the hash value of each block, with a reference count set for each block;

The first processing module 20 takes out the hash value of the first block and builds the first hash subtree from the hash values of the blocks of the first data;

The first judging module 30 compares the hash value of the first block and the first hash subtree with the pre-established first hash tree;

The loop module 40 repeats the above steps on the second data, ending when the second data is the last block.

The device provided by the embodiment of the present invention is applicable to the method described above. For its specific functions, refer to the above method embodiments; they are not repeated here.
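The blocking step and the construction of a hash subtree over the block hashes can be sketched as follows. This is only an illustration: it builds a plain binary Merkle tree with SHA-256, whereas the deformation algorithm of this document may combine hashes differently. The block size of 4 bytes, the function names and the rule of duplicating the last hash on odd levels are assumptions.

```python
import hashlib


def split_into_blocks(data: bytes, block_size: int = 4):
    """Partition the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hash(block: bytes) -> str:
    """Hash value of one block (SHA-256 is an assumption)."""
    return hashlib.sha256(block).hexdigest()


def merkle_root(hashes):
    """Hash the list of block hashes pairwise, level by level, up to one root."""
    if not hashes:
        raise ValueError("at least one block hash is required")
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed convention: repeat the last hash
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


blocks = split_into_blocks(b"abcdefghij")
hashes = [block_hash(b) for b in blocks]
root = merkle_root(hashes)         # root over all blocks of the first data
rest_root = merkle_root(hashes[1:])  # root after the first block is removed
```

Because the root depends on every block hash beneath it, two runs of blocks with equal roots can be treated as equal without comparing the blocks one by one, which is the property the quick detection of consecutive blocks relies on.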

The deduplication device based on the Merkle Tree deformation algorithm provided by the embodiment of the present invention partitions the data to be written into blocks, computes the hash values, builds a hash tree based on the Merkle Tree deformation algorithm, and compares it with the hash tree already stored in the system. If the hash value of the first block is the same as the first hash value in the pre-established first hash tree, the content of the first block is the same as the content of the block having the first hash value, and the root node of the first hash subtree is the same as the second hash value in the first hash tree, the reference count of each block is incremented by 1. If the hash value of the first block is the same as the first hash value in the pre-established first hash tree, the content of the first block is the same as the content of the block having the first hash value, but the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is deleted to obtain the second data; the process ends when the second data is the last block. By using the properties of the Merkle Tree, the embodiment of the present invention can quickly detect multiple consecutive data blocks. The device for deleting duplicate data provided by the embodiment of the present invention can improve deduplication efficiency and reduce deduplication time while preserving the deduplication rate, so that the deduplication rate reaches block level while the deduplication efficiency approaches file level.

On the basis of the above embodiment, when the hash value of the first block and the first hash subtree are compared with the pre-established first hash tree, there is a further case: when the hash value of the first block is the same as the first hash value in the pre-established first hash tree but the content of the first block differs from the content of the block having the first hash value, the reference count of the first block is incremented by 1, and the content of the first block is appended after the content of the block having the first hash value. In this case, there is no need to compare the root node of the first hash subtree with the hash values in the pre-established first hash tree.
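The comparison cases can be sketched as a single write loop. This is a simplified illustration under several assumptions, none of which come from the source: the class `DedupStore`, the action labels, the set of remembered roots, the handling of a hash that has never been seen, and `simple_root`, which stands in for the root of the hash subtree by hashing the concatenated block hashes.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def simple_root(hashes):
    # Stand-in for the hash subtree root: one hash over all block hashes.
    return sha256_hex("".join(hashes).encode())


class DedupStore:
    def __init__(self, hash_fn=sha256_hex, root_fn=simple_root):
        self.hash_fn = hash_fn
        self.root_fn = root_fn
        self.contents = {}  # hash -> list of block contents with that hash
        self.refcount = {}  # hash -> reference count
        self.roots = set()  # roots of block runs seen so far (assumption)

    def write(self, blocks):
        """Process the blocks in order and return the action taken per step."""
        hashes = [self.hash_fn(b) for b in blocks]
        actions = []
        for i, (block, h) in enumerate(zip(blocks, hashes)):
            root = self.root_fn(hashes[i:])  # root over the remaining data
            stored = self.contents.get(h)
            if stored is None:
                # Assumption: an unseen hash is stored as a new block.
                self.contents[h] = [block]
                self.refcount[h] = 1
                actions.append("new")
            elif block not in stored:
                # Same hash, different content: count it and append the content.
                self.refcount[h] += 1
                stored.append(block)
                actions.append("collision")
            elif root in self.roots:
                # Same hash, content and root: every remaining block is a duplicate.
                for rest in hashes[i:]:
                    self.refcount[rest] += 1
                actions.append("run-duplicate")
                break
            else:
                # Same hash and content, different root: count this block only,
                # then continue with the remaining blocks (the second data).
                self.refcount[h] += 1
                actions.append("block-duplicate")
            self.roots.add(root)  # remember this run for later writes
        return actions


store = DedupStore()
first = store.write([b"aa", b"bb", b"cc"])   # all blocks are new
second = store.write([b"aa", b"bb", b"cc"])  # whole run matches by root
third = store.write([b"aa", b"xx"])          # only the first block matches
```

When the root matches, the loop stops after one comparison regardless of how many blocks remain, which is where the saving over block-by-block comparison comes from in this sketch.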

The deduplication device based on the Merkle Tree deformation algorithm provided by the embodiment of the present invention partitions the data to be written into blocks, computes the hash values, builds a hash tree based on the Merkle Tree deformation algorithm, compares it with the hash tree already stored in the system, and performs different operations according to the different cases, so that it is applicable to a variety of situations. By using the properties of the Merkle Tree, the embodiment of the present invention can quickly detect multiple consecutive data blocks. The device for deleting duplicate data provided by the embodiment of the present invention can improve deduplication efficiency and reduce deduplication time while preserving the deduplication rate, so that the deduplication rate reaches block level while the deduplication efficiency approaches file level.

Optionally, the device further includes a data reading unit, and the data reading unit specifically includes:

An acquisition module, configured to obtain at least one logical address of the data to be read;

A lookup module, configured to determine, through a pre-established target index and according to the at least one logical address obtained, the block to be read corresponding to the at least one logical address;

A reading module, configured to read the data content in the block to be read.

On the basis of the above embodiment, the device for the deduplication technique based on the Merkle Tree deformation algorithm is also applicable to the data reading process. The specific implementation steps are as follows:

A user who needs to read a file clicks on the file. The acquisition module obtains the instruction to read the file and then parses out the multiple logical addresses contained in the read instruction. For each logical address, the lookup module uses the pre-stored index and the logical address to find the block corresponding to that logical address; each block includes its own physical address and content. Once the lookup module has found the block, that is, the physical address of the block on the hard disk, the reading module reads the content from the physical address, and the content is then returned to the system. For example, given multiple logical addresses and starting from the first logical address, the lookup module uses the first logical address and the correspondence between logical addresses and blocks in the index to find the first block corresponding to the first logical address. Each block contains content, so once the lookup module has found the first block, the reading module can read the content contained in the first block. The same operation is then carried out for each remaining logical address, and the contents read are combined and returned to the system together.

With the deduplication device based on the Merkle Tree deformation algorithm provided by the embodiment of the present invention, the properties of the Merkle Tree can be used during data writing and data reading to quickly detect multiple consecutive data blocks. The method and device for deleting duplicate data provided by the embodiment of the present invention can improve deduplication efficiency and reduce deduplication time while preserving the deduplication rate, so that the deduplication rate reaches block level while the deduplication efficiency approaches file level.

Optionally, the device further includes a data reclamation unit, specifically: when the reference count of a block is reduced to 0, the space occupied by that block is released.

With the deduplication device based on the Merkle Tree deformation algorithm provided by the embodiment of the present invention, the properties of the Merkle Tree can be used during data writing, data reading and data reclamation to quickly detect multiple consecutive data blocks. The device for deleting duplicate data provided by the embodiment of the present invention can improve deduplication efficiency and reduce deduplication time while preserving the deduplication rate, so that the deduplication rate reaches block level while the deduplication efficiency approaches file level.

Fig. 8 is a structural block diagram of a computer device provided by an embodiment of the present invention. Referring to Fig. 8, the computer device includes: a processor 801, a memory 802 and a bus 803;

Wherein, the processor 801 and the memory 802 communicate with each other through the bus 803;

The processor 801 is configured to call the program instructions in the memory 802 to execute the methods provided by the above method embodiments, for example including: S1: partitioning the first data into blocks and computing the hash value of each block, with a reference count set for each block.

This embodiment discloses a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer is able to execute the methods provided by the above method embodiments, for example including: S1: partitioning the first data into blocks and computing the hash value of each block, with a reference count set for each block.

This embodiment provides a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause the computer to execute the methods provided by the above method embodiments, for example including: S1: partitioning the first data into blocks and computing the hash value of each block, with a reference count set for each block.

One of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical discs.

The embodiments described above, such as the test equipment of the display device, are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

The device and system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.

Claims

1. A deduplication method based on the Merkle Tree deformation algorithm, characterized by comprising:

S1: partitioning the first data into blocks and computing the hash value of each block, with a reference count set for each block;

S2: taking out the hash value of the first block, and building the first hash subtree from the hash values of the blocks of the first data;

S3: comparing the hash value of the first block and the first hash subtree with the pre-established first hash tree:

if the hash value of the first block is the same as the first hash value in the pre-established first hash tree, the content of the first block is the same as the content of the block having the first hash value, and the root node of the first hash subtree is the same as the second hash value in the first hash tree, incrementing the reference count of each block by 1;

if the hash value of the first block is the same as the first hash value in the pre-established first hash tree, the content of the first block is the same as the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, incrementing the reference count of the first block by 1 and deleting the first block to obtain the second data;

S4: repeating steps S2-S3 on the second data, ending when the second data is the last block.
2. The method according to claim 1, characterized in that comparing the hash value of the first block and the first hash subtree with the pre-established first hash tree further comprises:

if the hash value of the first block is the same as the first hash value in the pre-established first hash tree, and the content of the first block differs from the content of the block having the first hash value, incrementing the reference count of the first block by 1, and appending the content of the first block after the content of the block having the first hash value.
3. The method according to claim 1, characterized in that the method further comprises a data reading process, the data reading process specifically being:

obtaining at least one logical address of the data to be read;

determining, through a pre-established target index and according to the at least one logical address obtained, the block to be read corresponding to the at least one logical address;

reading the data content in the block to be read.
4. The method according to claim 1, characterized in that the method further comprises a data reclamation process, specifically: when the reference count of a block is reduced to 0, the space occupied by that block is released.
5. A deduplication device based on the Merkle Tree deformation algorithm, characterized by comprising:

a first blocking module, configured to partition the first data into blocks and compute the hash value of each block, with a reference count set for each block;

a first processing module, configured to take out the hash value of the first block and build the first hash subtree from the hash values of the blocks of the first data;

a first judging module, configured to compare the hash value of the first block and the first hash subtree with the pre-established first hash tree:

if the hash value of the first block is the same as the first hash value in the pre-established first hash tree, the content of the first block is the same as the content of the block having the first hash value, and the root node of the first hash subtree is the same as the second hash value in the first hash tree, the reference count of each block is incremented by 1;

if the hash value of the first block is the same as the first hash value in the pre-established first hash tree, the content of the first block is the same as the content of the block having the first hash value, and the root node of the first hash subtree differs from the second hash value in the first hash tree, the reference count of the first block is incremented by 1 and the first block is deleted to obtain the second data;

a loop module, configured to repeat the above steps on the second data, ending when the second data is the last block.
6. The device according to claim 5, characterized in that comparing the hash value of the first block and the first hash subtree with the pre-established first hash tree further comprises:

if the hash value of the first block is the same as the first hash value in the pre-established first hash tree, and the content of the first block differs from the content of the block having the first hash value, incrementing the reference count of the first block by 1, and appending the content of the first block after the content of the block having the first hash value.
7. The device according to claim 5, characterized in that the device further comprises a data reading unit, the data reading unit specifically comprising:

an acquisition module, configured to obtain at least one logical address of the data to be read;

a lookup module, configured to determine, through a pre-established target index and according to the at least one logical address obtained, the block to be read corresponding to the at least one logical address;

a reading module, configured to read the data content in the block to be read.
8. The device according to claim 5, characterized in that the device further comprises a data reclamation unit, specifically: when the reference count of a block is reduced to 0, the space occupied by that block is released.
9. A computer device, characterized by comprising a memory and a processor, the processor and the memory communicating with each other through a bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the method according to any one of claims 1 to 4.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 4.