CN109445703B - Delta compression storage component based on block-level data deduplication - Google Patents


Info

Publication number
CN109445703B
Authority
CN
China
Prior art keywords
container
block
data
delta
similar
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811259886.6A
Other languages
Chinese (zh)
Other versions
CN109445703A (en)
Inventor
杨天明
汤震
李景富
吴海涛
黄平
杨奕
樊宜和
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Lihe Technology Innovation Center Co ltd
Yami Technology Guangzhou Co ltd
Original Assignee
Huanghuai University
Application filed by Huanghuai University
Priority to CN201811259886.6A
Publication of CN109445703A
Application granted
Publication of CN109445703B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G06F3/061 Improving I/O performance
    • G06F3/0611 Improving I/O performance in relation to response time
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065 Replication mechanisms
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a Delta compression storage component based on block-level data deduplication. The component includes a container access module. This module runs a container storage algorithm and a container recovery algorithm using three data structures: a similarity index, a similarity buffer and a container buffer. The container storage algorithm receives write-container commands from the upper-layer block-level deduplication storage system, applies Delta compression to the container, and writes the compressed container into the container storage pool on disk. The container recovery algorithm receives read-container commands from the upper-layer system and reads the specified container from the on-disk container storage pool via the container index. It then restores the container and returns it to the upper-layer system. The invention applies Delta compression to similar data blocks, which eliminates byte-level duplicate data and further improves the deduplication ratio and storage space utilization.

Description

Delta compression storage component based on block-level data deduplication
Technical field
The invention belongs to the field of computer storage and backup technology. In particular, it relates to a Delta compression storage component based on block-level data deduplication.
Background art
Data centers currently use data deduplication to save storage space and improve resource utilization. Data deduplication eliminates redundant files, data blocks or bytes so that only a single instance of each piece of data is stored on disk. It is also called a capacity-optimized protection technique, because it reduces the capacity needed for data protection. The two main deduplication techniques are chunking compression and Delta compression.
Chunking compression, also called block-level data deduplication, is currently the mainstream deduplication technique. Its basic idea is to split a data stream (or file) into blocks and then eliminate duplicate blocks. Simple fixed-length chunking suffers from the boundary-shift problem: an insertion shifts every later block boundary. For this reason, content-based chunking methods such as CDC (Content-Defined Chunking) are now commonly used. They place block boundaries by content and produce variable-length blocks whose sizes vary around an expected value. A cryptographic hash function (such as MD5 or SHA-1) computes the hash of each block's contents, and this hash serves as the block's fingerprint. Blocks are indexed by fingerprint, and duplicates are eliminated by comparing fingerprints: blocks with the same fingerprint are duplicates.
However, current block-level deduplication still has drawbacks. First, it cannot eliminate redundant data between blocks whose contents are similar but not identical. Second, it is hard to choose an optimal expected block size. Smaller blocks improve the compression ratio, but they increase the number of blocks processed per unit time. This lowers read/write performance and adds metadata overhead, such as index storage. Mainstream block-level deduplication typically uses an expected block size of 4 KB or 8 KB, so duplicate data at a granularity below 4 KB or 8 KB cannot be removed. Studies show that about 50% of files in file systems are smaller than 4 KB, and more than 80% are 64 KB or smaller. Modifying these files produces large amounts of duplicate data at granularities below 4 KB or 8 KB. For such workloads, block-level deduplication alone can hardly achieve good compression.
Delta compression, also called differential compression, compares two files $V_0$ and $V_1$ to produce a smaller Delta file $\Delta_{0,1}$. Given $V_0$ and $\Delta_{0,1}$, the Delta inverse operation recovers the original $V_1$:

$$\Delta(V_0, V_1) = \Delta_{0,1}, \qquad \Delta^{-1}(V_0, \Delta_{0,1}) = V_1$$

Here $\Delta(\cdot,\cdot)$ denotes the Delta operation and $\Delta^{-1}(\cdot,\cdot)$ the Delta inverse operation. The Delta file $\Delta_{0,1}$ is the optimal sequence of edit operations, or a minimal covering set of string-block move commands, that converts $V_0$ into $V_1$. $V_0$ is called the reference file of $V_1$ (or of $\Delta_{0,1}$), and $V_1$ is called the original of $\Delta_{0,1}$. If $V_0$ and $V_1$ are highly similar, $\Delta_{0,1}$ is very small, so storing $\Delta_{0,1}$ instead of $V_1$ saves a great deal of space. Delta compression forms Delta chains of the following form, where $u \ge 0$ is an integer:

$$V_0 \leftarrow \Delta_{0,1} \leftarrow \Delta_{1,2} \leftarrow \cdots \leftarrow \Delta_{u,u+1}$$

Apart from the end file $V_0$, every element of a Delta chain is a Delta file. Delta file $\Delta_{i,i+1}$ ($0 \le i \le u$) has original $V_{i+1}$ and reference file $V_i$. In other words, $\Delta_{0,1}$ uses the chain's end node $V_0$ as its reference file, and every other Delta file uses the original of the Delta file before it. Restoring an original on a Delta chain therefore means applying the Delta inverse operation repeatedly. The process starts from the end node $V_0$ and moves against the direction of the chain:

$$V_k = \Delta^{-1}(V_{k-1}, \Delta_{k-1,k}), \qquad k = 1, \dots, u+1$$
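The Delta operation, its inverse, and restoration along a Delta chain can be illustrated with a small Python sketch. The COPY/ADD encoding built on the standard library's `difflib.SequenceMatcher` is an assumption for illustration only. Production Delta encoders use faster matching and a compact binary format. `restore` rebuilds a version by applying the inverse operation step by step, starting from the end file V0.

```python
from difflib import SequenceMatcher

def delta(ref: bytes, target: bytes) -> list:
    """Delta operation: describe target as COPY ranges of ref plus literal ADD bytes."""
    ops = []
    sm = SequenceMatcher(None, ref, target, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("COPY", i1, i2 - i1))    # reuse ref[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("ADD", target[j1:j2]))  # bytes absent from ref
        # "delete": bytes that exist only in ref are simply not copied
    return ops

def inverse_delta(ref: bytes, ops: list) -> bytes:
    """Delta inverse operation: rebuild the original from ref and its Delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "COPY":
            _, start, length = op
            out += ref[start:start + length]
        else:
            out += op[1]
    return bytes(out)

def restore(v0: bytes, chain: list, k: int) -> bytes:
    """Restore V_k from end file V_0 and chain [Delta_{0,1}, Delta_{1,2}, ...]."""
    v = v0
    for d in chain[:k]:
        v = inverse_delta(v, d)  # V_i = inverse(V_{i-1}, Delta_{i-1,i})
    return v
```

Each restore step needs the previous version, so a long chain makes reads slower. This is the cost the maximum chain length L in the storage algorithm below is meant to bound.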
For highly similar file sets, especially sets of file versions, Delta compression can greatly reduce storage usage, sometimes reaching compression ratios of 100:1 or more. Delta compression algorithms also run efficiently. For files that are not very similar to each other, however, Delta compression performs poorly. File sizes in real systems vary widely, so two files that share a lot of duplicate data may still have low overall similarity and be unsuitable for Delta compression. Consequently, file-level Delta compression alone does not give satisfactory deduplication in a real storage system.
In summary, an ideal deduplication scheme combines chunking compression with Delta compression so that each offsets the other's weaknesses. This requires adding a Delta compression storage component on top of an existing block-level deduplication storage system. To limit cost, the new component should be transparent to the existing block-level deduplication storage system and require no changes to it. The new component should also keep good read/write performance while it improves the data compression ratio.
For Delta compression of similar data blocks in a large-scale storage system, the key problem is how to efficiently find a similar block to serve as the reference block for each block to be compressed. Suppose two data blocks $D_1$ and $D_2$ contain the sets of data patches $S_1$ and $S_2$. Let $H(S_1)$ and $H(S_2)$ be the sets of fingerprints obtained by hashing each element of $S_1$ and $S_2$, and let $\min(H(S))$ denote the smallest element of $H(S)$. Broder's data similarity theory then gives the following conclusion. If $\min(H(S_1)) = \min(H(S_2))$, the contents of $D_1$ and $D_2$ are, with high probability, highly similar. Conversely, if the contents of $D_1$ and $D_2$ are highly similar, then $\min(H(S_1)) = \min(H(S_2))$ with high probability.
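Broder's result can be made concrete. For a min-wise independent hash function, the probability that min(H(S1)) equals min(H(S2)) equals the Jaccard resemblance |S1 ∩ S2| / |S1 ∪ S2| of the two patch sets. The Python sketch below assumes that the "data patches" are overlapping 8-byte shingles and that H is a 64-bit slice of SHA-1. Both choices are illustrative and not taken from the patent.

```python
import hashlib

def shingles(block: bytes, w: int = 8) -> set:
    """Patch set S: all overlapping w-byte substrings of the block (w is assumed)."""
    return {block[i:i + w] for i in range(max(1, len(block) - w + 1))}

def h(patch: bytes) -> int:
    """Hash H applied to one patch: first 8 bytes of SHA-1 as an integer (assumed)."""
    return int.from_bytes(hashlib.sha1(patch).digest()[:8], "big")

def min_hash(block: bytes) -> int:
    """min(H(S)): the smallest patch fingerprint of the block."""
    return min(h(p) for p in shingles(block))

def resemblance(b1: bytes, b2: bytes) -> float:
    """Jaccard resemblance of the two patch sets, i.e. the probability
    that the two min-hashes agree under a random min-wise hash."""
    s1, s2 = shingles(b1), shingles(b2)
    return len(s1 & s2) / len(s1 | s2)
```

A single min-hash is a cheap yes/no similarity test, which is why it can serve as a per-block signature in an index without comparing block contents.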
Summary of the invention
The object of the invention is to provide a Delta compression storage component based on block-level data deduplication that eliminates byte-level duplicate data and improves the deduplication ratio and storage space utilization.
To achieve this objective, the invention adopts the following technical solution. The invention discloses a Delta compression storage component based on block-level data deduplication. The component includes a container access module, which runs a container storage algorithm and a container recovery algorithm using three data structures: a similarity index, a similarity buffer and a container buffer. The container storage algorithm receives write-container commands from the upper-layer block-level deduplication storage system, applies Delta compression to the container, and writes the compressed container into the container storage pool on the disk device. The container recovery algorithm receives read-container commands from the upper-layer system and reads the specified container from the on-disk container storage pool via the container index. It then restores the container and returns it to the upper-layer system. The component also receives read-container-metadata commands from the upper-layer system. For these commands, it reads the metadata of the specified container from the container storage pool on the disk device and sends that metadata to the upper-layer system.
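The following Python is a hedged sketch of the in-memory state the container access module keeps. It models the similarity index, the similarity buffer with its index-node lists, the container buffer and the stack as plain containers. All class and field names, and the string values used as type flags, are hypothetical: the patent names these structures but does not give a concrete layout.

```python
from collections import defaultdict
from dataclasses import dataclass, field

DATA_BLOCK, DELTA_BLOCK = "data", "delta"   # hypothetical type-field values

@dataclass
class IndexNode:
    """Similarity buffer index node: locates one stored block."""
    kind: str      # DATA_BLOCK or DELTA_BLOCK
    cid: int       # identifier of the container holding the block
    offset: int    # position of the block in that container's data area

@dataclass
class ContainerAccessState:
    """In-memory structures of the container access module (names assumed)."""
    # similarity index: hook signature -> ids of containers containing that hook
    similarity_index: dict = field(default_factory=lambda: defaultdict(set))
    # similarity buffer: similarity signature -> list of IndexNode (the "linked list")
    similarity_buffer: dict = field(default_factory=lambda: defaultdict(list))
    # container buffer: container id -> container already read from the pool
    container_buffer: dict = field(default_factory=dict)
    # stack of Delta blocks used while walking Delta chains
    stack: list = field(default_factory=list)
```

Only the similarity index outlives a single container. Step (231) writes it back to the configuration file, while the similarity buffer is rebuilt for each incoming container.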
The container storage algorithm consists of the following steps, performed in order:
(201) Initialize:
First, read parameters S, R, Sr and L from the configuration file. The configuration file resides on the disk device and records the system's configuration information.
Parameter S is a preset positive integer. It gives the maximum number of similar containers read from the on-disk container storage pool into memory during Delta compression. Similar containers are containers with similar content.
Parameter R is a preset positive number giving the minimum allowed Delta compression ratio. When a data block is Delta-compressed against a reference block to produce a Delta block, the Delta compression ratio is the data block size divided by the Delta block size.
Parameter Sr is a preset positive integer; 1/Sr is the hook signature sampling rate.
Parameter L is a preset positive integer giving the maximum Delta chain length.
Next, check whether the system is in its initial configuration stage. If it is, create an empty similarity index in memory. Otherwise, load the backed-up similarity index from the configuration file into memory.
Finally, create three more structures in memory: an empty similarity buffer; an empty container buffer, which holds containers read from the on-disk container storage pool; and an empty stack, denoted Stack.
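Here is a minimal sketch of the parameter handling in step (201). It assumes a hypothetical `name = value` configuration format, because the text does not define one. It also encodes the Delta compression ratio exactly as defined for R: data block size divided by Delta block size.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeltaParams:
    S: int     # max similar containers loaded per Work container
    R: float   # minimum acceptable Delta compression ratio
    Sr: int    # hook sampling: one in Sr similarity signatures
    L: int     # maximum Delta chain length

def parse_params(text: str) -> DeltaParams:
    """Parse 'name = value' lines with optional '#' comments (hypothetical format)."""
    kv = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            name, value = (s.strip() for s in line.split("=", 1))
            kv[name] = value
    p = DeltaParams(int(kv["S"]), float(kv["R"]), int(kv["Sr"]), int(kv["L"]))
    if min(p.S, p.Sr, p.L) < 1 or p.R <= 0:
        raise ValueError("S, Sr and L must be positive integers and R positive")
    return p

def delta_ratio(block_size: int, delta_size: int) -> float:
    """Delta compression ratio = data block size / Delta block size."""
    return block_size / delta_size
```

For example, a 4096-byte block encoded as a 512-byte Delta block has ratio 8.0. With R = 2.0, that Delta block would be accepted.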
(202) Receive a container: receive a write-container command from the upper-layer block-level deduplication storage system and extract the container to be written, denoted the upper-layer container. Create an empty format container in memory, denoted the Work container, and write it into the container buffer. Clear the similarity buffer. The upper-layer container uses the container format of the upper-layer block-level deduplication storage system. The format container uses the container format of the container access module of the Delta compression storage component.
(203) Copy fingerprints: read the container identifier from the metadata area of the upper-layer container and write it into the container identifier field of the Work container's header. Read the data block fingerprints from the metadata area of the upper-layer container and write them into the fingerprint area of the Work container in their original order.
(204) Compute similarity signatures: compute the similarity signature of each data block in the data area of the upper-layer container in turn. Create a similarity signature block for each similarity signature and write the signature into that block's similarity signature field. Write the similarity signature blocks into the similarity signature area of the Work container, in the order of their corresponding data blocks.
(205) Extract hook signatures: sample all similarity signatures in the Work container at a rate of 1/Sr. Use the sampled signatures as hook signatures and write them in order into the hook signature area of the Work container. Take the smallest of all similarity signatures in the Work container as the container's similarity signature, and write it into the container signature field of the Work container's header.
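Steps (204) and (205) can be sketched as below. Two points are assumptions. First, a block's similarity signature is taken to be a min-hash over 8-byte shingles, following the Broder-style reasoning in the background section. Second, the 1/Sr sampling keeps signatures whose value is divisible by Sr. This content-based rule samples the same signature in every container that holds it. The text fixes only the sampling rate, not the sampling method.

```python
import hashlib

def similarity_signature(block: bytes, w: int = 8) -> int:
    """Min-hash over w-byte shingles (an assumed choice of similarity signature)."""
    return min(
        int.from_bytes(hashlib.sha1(block[i:i + w]).digest()[:8], "big")
        for i in range(max(1, len(block) - w + 1))
    )

def build_signatures(blocks: list, Sr: int):
    """Steps (204)-(205): per-block signatures, sampled hooks, container signature."""
    sigs = [similarity_signature(b) for b in blocks]   # same order as the data blocks
    # Content-based sampling keeps roughly 1/Sr of the signatures (assumption).
    hooks = [s for s in sigs if s % Sr == 0]
    container_sig = min(sigs)                          # smallest signature in container
    return sigs, hooks, container_sig
```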
(206) Update the similarity index:
First, assign the container identifier of the Work container to variable cid.
Then, for each hook signature in the Work container's hook signature area, assign the signature to variable hook and insert the mapping <hook, cid> into the similarity index.
(207) Find similar containers: query the similarity index for containers that share hook signatures with the Work container. If none are found, go to step (228). Otherwise, rank the found containers by how many hook signatures each shares with the Work container, from most to fewest, and select at most S of them. The selected containers are the similar containers of the Work container. Look up these similar containers in the container buffer, and read any that are missing from the container storage pool into the container buffer. Then go to step (208).
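Steps (206) and (207) amount to a multimap update followed by a vote count. In this Python sketch, the similarity index is a dict of sets and S caps the number of results. Excluding the Work container's own identifier from the candidates is an assumption: after step (206), the Work container's own hooks are already in the index.

```python
from collections import Counter, defaultdict

def update_similarity_index(index, cid: int, hooks) -> None:
    """Step (206): insert a <hook, cid> mapping for every hook signature."""
    for hook in hooks:
        index[hook].add(cid)          # one hook may map to many containers

def find_similar_containers(index, cid: int, hooks, S: int) -> list:
    """Step (207): up to S containers, ranked by number of shared hooks."""
    shared = Counter()
    for hook in set(hooks):
        for other in index.get(hook, ()):
            if other != cid:          # skip the Work container itself (assumption)
                shared[other] += 1
    return [c for c, _ in shared.most_common(S)]
```

An empty result corresponds to the branch to step (228), where the container is stored without Delta compression.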
(208) Fill the similarity buffer:
Scan the similarity signature area of each of the Work container's similar containers in the container buffer in turn, and read every similarity signature block in it. Process each block read as follows. Create a similarity buffer index node in memory, denoted Node. Copy the block's type field value and offset field value into Node's type field and offset field, respectively. Write the identifier of the container that holds the block into Node's container identifier field. Read the block's similarity signature field, denote the value sign, and insert <sign, Node> into the similarity buffer.
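Here is a sketch of step (208). It assumes each similar container is represented, hypothetically, as a dict with an `id` and a `sig_area` list of `(sign, kind, offset)` tuples that mirror the similarity signature blocks. Each list in the resulting buffer plays the role of the index node linked list searched in steps (211) to (223).

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class IndexNode:
    kind: str     # copied from the signature block's type field ("data" / "delta")
    cid: int      # container that holds the block
    offset: int   # copied from the signature block's offset field

def fill_similarity_buffer(similar_containers: list) -> dict:
    """Step (208): index every signature block of every similar container."""
    buffer = defaultdict(list)                 # sign -> list of IndexNode
    for container in similar_containers:
        for sign, kind, offset in container["sig_area"]:
            buffer[sign].append(IndexNode(kind, container["id"], offset))
    return buffer
```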
(209) Prepare to process data blocks: set read pointer P1 to the first similarity signature block in the Work container's similarity signature area. Set read pointer P2 to the first data block in the upper-layer container's data area.
(210) Read a data block: read the data block pointed to by P2 from the upper-layer container's data area, denoted $D_r$. Read the similarity signature block pointed to by P1 from the Work container's similarity signature area, denoted Block. Read the value of Block's similarity signature field, denoted sign1.
(211) Search the similarity buffer: look for the similarity signature node whose similarity signature field equals sign1. If none is found, go to step (224). Otherwise, set read pointer P3 to the first index node in that signature node's index node linked list and go to step (212).
(212) Check the block type: if the type field of the index node pointed to by P3 holds the Delta block flag, go to step (214). Otherwise it holds the data block flag; go to step (213).
(213) Short-chain Delta operation: assign the container identifier field and offset field of the index node pointed to by P3 to variables cid0 and offset0, respectively. Read the data block at address <cid0, offset0> in the container buffer, denoted $D_0$. Perform the Delta operation on $D_r$ with $D_0$ as the reference block, producing Delta block $\Delta_{0,r}$. If the Delta compression ratio is greater than or equal to R, compression succeeds; go to step (225). Otherwise compression fails; go to the next step.
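Step (213) is a single Delta attempt followed by a ratio test. The sketch below replaces the real encoder with a deliberately simple Delta operation: it stores the lengths of the shared prefix and suffix plus the differing middle bytes. `toy_delta` and `toy_inverse` are hypothetical stand-ins, and R is the minimum Delta compression ratio from step (201).

```python
def toy_delta(ref: bytes, target: bytes) -> bytes:
    """Stand-in Delta operation (assumption): prefix/suffix lengths + middle bytes."""
    n, p, s = min(len(ref), len(target)), 0, 0
    while p < n and ref[p] == target[p]:
        p += 1
    while s < n - p and ref[-1 - s] == target[-1 - s]:
        s += 1
    return p.to_bytes(4, "big") + s.to_bytes(4, "big") + target[p:len(target) - s]

def toy_inverse(ref: bytes, d: bytes) -> bytes:
    """Matching Delta inverse operation."""
    p, s = int.from_bytes(d[:4], "big"), int.from_bytes(d[4:8], "big")
    return ref[:p] + d[8:] + ref[len(ref) - s:]

def try_short_chain(d_r: bytes, d_0: bytes, R: float):
    """Step (213): Delta-compress D_r against data block D_0.
    Returns the Delta block if the compression ratio reaches R, else None."""
    d = toy_delta(d_0, d_r)
    return d if len(d_r) / len(d) >= R else None
```

A `None` result corresponds to "compression fails". The algorithm then moves on to look for a Delta block in the same index node list.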
(214) Skip Delta blocks: advance P3 to the next index node. If P3 is not null, go to step (212). Otherwise the end of the index node linked list has been reached: point P3 back at the first index node of the list and go to the next step.
(215) Check for Delta blocks: if the type field of the index node pointed to by P3 holds the data block flag, go to step (223). Otherwise it holds the Delta block flag; go to the next step.
(216) Read the Delta block: the container identifier field and offset field of the index node pointed to by P3 give the address of a Delta block in the container buffer. Read the Delta block at that address.
(217) Push onto the stack: push the Delta block onto Stack. Read the reference block address from the Delta block's Delta header and store it in <cid1, offset1>, where cid1 is the container identifier and offset1 is the position of the reference block in the data area of container cid1.
(218) Read the reference block: if container cid1 is in the container buffer, read reference block <cid1, offset1> from the container buffer. Otherwise, first read container cid1 from the container storage pool into the container buffer, then read reference block <cid1, offset1>.
(219) Check the reference block: if reference block <cid1, offset1> is a Delta block, go to step (217). Otherwise it is a data block: store its content in variable $D_0$, assign <cid1, offset1> to <cid0, offset0>, set variable length to 1, and go to the next step.
(220) Long-chain Delta operation: perform the Delta operation on $D_r$ with $D_0$ as the reference block, producing Delta block $\Delta_{0,r}$. If the Delta compression ratio is greater than or equal to R and length is less than or equal to L, compression succeeds; go to step (225). Otherwise compression fails; go to the next step.
(221) Check the stack: if Stack is empty, go to step (223). Otherwise go to the next step.
(222) Pop the stack: pop a Delta block from Stack, denoted Δ, and store its address in <cid0, offset0>. Apply the Delta inverse operation to $D_0$ and Δ and store the result in $D_0$. Increment length by 1 and go to step (220).
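Steps (216) to (222) start from a matched Delta block and walk back to the data block at the end of its chain. They then retry the Delta operation against progressively newer versions, which are rebuilt by popping the stack. The sketch below models the container buffer as a dict keyed by `(cid, offset)`. That layout, and the toy Delta functions, are assumptions for illustration.

```python
def toy_delta(ref: bytes, target: bytes) -> bytes:
    """Stand-in Delta operation (assumption): prefix/suffix lengths + middle bytes."""
    n, p, s = min(len(ref), len(target)), 0, 0
    while p < n and ref[p] == target[p]:
        p += 1
    while s < n - p and ref[-1 - s] == target[-1 - s]:
        s += 1
    return p.to_bytes(4, "big") + s.to_bytes(4, "big") + target[p:len(target) - s]

def toy_inverse(ref: bytes, d: bytes) -> bytes:
    """Matching Delta inverse operation."""
    p, s = int.from_bytes(d[:4], "big"), int.from_bytes(d[4:8], "big")
    return ref[:p] + d[8:] + ref[len(ref) - s:]

def try_long_chain(d_r: bytes, delta_addr, store: dict, R: float, L: int):
    """Steps (216)-(222). `store` maps (cid, offset) to ("data", bytes) or
    ("delta", ref_addr, delta), a hypothetical stand-in for the container
    buffer. Returns (Delta block, reference address) or None on failure."""
    stack, addr, obj = [], delta_addr, store[delta_addr]
    while obj[0] == "delta":                   # (217)-(219): walk to the base block
        stack.append((addr, obj))
        addr = obj[1]
        obj = store[addr]
    d_0, ref_addr, length = obj[1], addr, 1    # (219)
    while True:
        d = toy_delta(d_0, d_r)                # (220)
        if len(d_r) / len(d) >= R and length <= L:
            return d, ref_addr
        if not stack:                          # (221)
            return None
        ref_addr, (_, _, delta) = stack.pop()  # (222): rebuild the next version
        d_0 = toy_inverse(d_0, delta)
        length += 1
```

The first attempt uses the base data block (length 1). Each pop rebuilds the next version and makes the resulting chain one step longer, so the test `length <= L` caps how deep new Delta blocks can be stacked.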
(223) Skip data blocks: advance P3 to the next index node. If P3 is not null, go to step (215). Otherwise the end of the index node linked list has been reached; go to the next step.
(224) Store a data block: prepend a data block header to $D_r$ and write the data block flag and the size of $D_r$ into it. If the Work container's data area is not empty, append $D_r$ with its header after the existing data in that area. Otherwise, write it at the start of the Work container's data area. Write the data block flag into Block's type field, and write the position of $D_r$ in the Work container's data area into Block's offset field. Go to step (226).
(225) Store a Delta block: prepend a Delta header to Delta block $\Delta_{0,r}$ and write three items into it: the Delta block flag, the size of $\Delta_{0,r}$, and its reference block address <cid0, offset0>. If the Work container's data area is not empty, append $\Delta_{0,r}$ with its header after the existing data in that area. Otherwise, write it at the start of the Work container's data area. Write the Delta block flag into Block's type field, and write the position of $\Delta_{0,r}$ in the Work container's data area into Block's offset field. Empty Stack.
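Steps (224) and (225) append a header plus payload to the Work container's data area and record where it starts in the signature block's offset field. The header layouts below are hypothetical: a 1-byte flag, a 4-byte size and, for Delta blocks, an 8-byte container id and a 4-byte offset. The text lists the header fields but not their widths.

```python
import struct

DATA_FLAG, DELTA_FLAG = 0, 1           # hypothetical flag values
DATA_HDR = struct.Struct(">BI")        # flag, block size
DELTA_HDR = struct.Struct(">BIQI")     # flag, Delta size, ref cid, ref offset

def append_data_block(area: bytearray, block: bytes) -> int:
    """Step (224): append header + block; return the offset to store in Block."""
    offset = len(area)                 # 0 when the data area is still empty
    area += DATA_HDR.pack(DATA_FLAG, len(block)) + block
    return offset

def append_delta_block(area: bytearray, delta: bytes, ref_cid: int, ref_off: int) -> int:
    """Step (225): append a Delta header carrying the reference address, then the Delta."""
    offset = len(area)
    area += DELTA_HDR.pack(DELTA_FLAG, len(delta), ref_cid, ref_off) + delta
    return offset

def read_object(area: bytes, offset: int):
    """Parse the object at `offset` into ('data', bytes) or ('delta', (cid, off), bytes)."""
    if area[offset] == DATA_FLAG:
        _, size = DATA_HDR.unpack_from(area, offset)
        start = offset + DATA_HDR.size
        return ("data", bytes(area[start:start + size]))
    _, size, cid, ref_off = DELTA_HDR.unpack_from(area, offset)
    start = offset + DELTA_HDR.size
    return ("delta", (cid, ref_off), bytes(area[start:start + size]))
```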
(226) Update the similarity buffer: create a similarity buffer index node in memory, denoted Node1. Copy Block's type field value and offset field value into Node1's type field and offset field, respectively. Write the Work container's container identifier into Node1's container identifier field. Insert <sign1, Node1> into the similarity buffer.
(227) Check whether all data blocks have been processed: advance P1 to the next similarity signature block in the Work container's similarity signature area, and advance P2 to the next data block in the upper-layer container's data area. If P2 is null, every data block in the upper-layer container has been processed; go to step (229). Otherwise, go to step (210).
(228) Store the original container: First, process each data block in the upper-layer container's data area in turn, starting with the first. Prepend a data block header to the block and write the data block flag and the block's size into the header. If the Work container's data area is not empty, append the block with its header after the existing data in that area. Otherwise, write it at the start of the Work container's data area. For the block's corresponding similarity signature block in the Work container's similarity signature area, write the data block flag into its type field and the block's position in the Work container's data area into its offset field.
Second, discard the upper-layer container. Compute the size of the Work container and write it into the Work container's header. If the container storage pool is not empty, append the Work container after the existing data in the pool. Otherwise, write it at the start of the container storage pool.
Finally, write two items into the container index: the identifier of the Work container just written to the container storage pool, and its position in the pool. Go to step (230).
(229) Store the new container:
First, discard the upper-layer container. Compute the size of the Work container and write it into the Work container's header. If the container storage pool is not empty, append the Work container after the existing data in the pool. Otherwise, write it at the start of the container storage pool.
Second, write two items into the container index: the identifier of the Work container just written to the container storage pool, and its position in the pool.
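Here is a sketch of the container write at the end of steps (228) and (229), and of the lookup that the recovery algorithm later performs through the container index. An in-memory file stands in for the on-disk container storage pool. The 12-byte header holding the container identifier and the total container size is an assumed layout.

```python
import io
import struct

HEADER = struct.Struct(">QI")   # hypothetical container header: container id, total size

def store_container(pool: io.BytesIO, index: dict, cid: int, body: bytes) -> None:
    """Write the size into the header, append the container to the end of the
    pool (offset 0 if the pool is empty), and record <cid, position> in the index."""
    pool.seek(0, io.SEEK_END)
    position = pool.tell()
    pool.write(HEADER.pack(cid, HEADER.size + len(body)) + body)
    index[cid] = position

def load_container(pool: io.BytesIO, index: dict, cid: int) -> bytes:
    """Find a container through the container index and return its body."""
    pool.seek(index[cid])
    stored_cid, size = HEADER.unpack(pool.read(HEADER.size))
    if stored_cid != cid:
        raise ValueError("container index points at the wrong container")
    return pool.read(size - HEADER.size)
```

Because containers are only ever appended, positions recorded in the index stay valid. Reclaiming space from deleted containers would need a separate compaction step, which this section does not cover.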
(230) Check for termination: if a termination instruction has been received, go to the next step. Otherwise, go to step (202).
(231) Terminate:
First, stop accepting containers from the upper-layer block-level deduplication storage system.
Then, write the in-memory similarity index to the configuration file.
Finally, destroy the in-memory similarity index, container buffer, similarity buffer and Stack, and exit.
The container recovery algorithms in turn include the following steps:
(301), it initializes: generating an empty vessel buffers area in memory, deposited for temporary from the container on disk The container in memory is read in reservoir;An empty storehouse is generated in memory, is denoted as Stack;
(302) Receive read command: receive a read-container command from the upper-layer block-level deduplication storage system and extract the container identifier from it, denoted cid; create an empty container in the upper-layer format in memory, called the upper-layer container;
(303) Read container: read the container whose identifier is cid from the container storage pool on disk, call it the Work container, and write it into the container buffer;
(304) Recover metadata: according to the types and format requirements of the upper-layer container's metadata, read the corresponding metadata from the metadata area of the Work container and write it into the metadata area of the upper-layer container;
(305) Prepare to process the data area: set a read pointer P1 to the first object in the data area of the Work container;
(306) Identify object: if the object pointed to by P1 is a data block, denote it D_r and go to step (312); otherwise it is a Delta block; proceed to the next step;
(307) Push onto stack: push the Delta block onto Stack and read the reference block address from its Delta block header; denote the address <cid1, offset1>, where cid1 is a container identifier and offset1 is the position of the reference block in the data area of container cid1;
(308) Read reference block: if container cid1 is in the container buffer, read reference block <cid1, offset1> from the buffer; otherwise, first load container cid1 from the container storage pool into the container buffer, then read reference block <cid1, offset1>;
(309) Check reference block: if reference block <cid1, offset1> is a Delta block, go to step (307); otherwise it is a data block: store it in variable D and proceed to the next step;
(310) Pop: pop a Delta block from Stack, denoted Δ; apply the inverse Delta operation to D and Δ and store the result in D;
(311) Check stack: if Stack is non-empty, go to step (310); otherwise, take the content of D as data block D_r and proceed to the next step;
(312) Copy data block: if the data area of the upper-layer container is non-empty, append D_r after the existing data in that area; otherwise, write D_r at the start of the upper-layer container's data area;
(313) Check data area: advance P1 by one to the next object in the Work container's data area; if P1 is non-null, go to step (306); otherwise the data area has been fully processed; proceed to the next step;
(314) End-of-run check: send the restored upper-layer container to the upper-layer block-level deduplication storage system; if no end-of-run command has been received, go to step (302); otherwise, proceed to the next step;
(315) Terminate: destroy the container buffer and Stack, then exit.
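Steps (306) through (311) resolve a Delta chain with an explicit stack. They follow reference addresses down to a plain data block, then apply the inverse Delta operation once per popped Delta block. The sketch below shows only that control flow, under several assumptions:

- The patent does not name its Delta algorithm, so DEFLATE with the reference block as a preset dictionary (`zlib.compressobj(zdict=...)`) is swapped in as a stand-in.
- The container layout is hypothetical: a dict from container identifier to a list of objects, where the list index plays the role of the offset.
- All function names are hypothetical, and the container buffer is omitted.

```python
import zlib


def delta_encode(ref: bytes, target: bytes) -> bytes:
    """Stand-in Delta operation (assumption): DEFLATE with ref as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=ref)
    return comp.compress(target) + comp.flush()


def delta_decode(ref: bytes, delta: bytes) -> bytes:
    """Inverse Delta operation for the stand-in encoder above."""
    decomp = zlib.decompressobj(zdict=ref)
    return decomp.decompress(delta) + decomp.flush()


# Hypothetical layout: pool[cid] is the container's data area as a list of objects,
# and the list index plays the role of the offset. Objects are either
#   ("data", block_bytes) or ("delta", (ref_cid, ref_offset), delta_bytes).
def restore_object(pool: dict, cid: int, offset: int) -> bytes:
    stack = []                        # Stack from steps (307)-(311)
    obj = pool[cid][offset]
    while obj[0] == "delta":          # (307)-(309): follow references to a data block
        stack.append(obj[2])
        ref_cid, ref_offset = obj[1]
        obj = pool[ref_cid][ref_offset]
    d = obj[1]                        # variable D
    while stack:                      # (310)-(311): undo Deltas, nearest the data block first
        d = delta_decode(d, stack.pop())
    return d


def restore_container(pool: dict, cid: int) -> list:
    """(305)-(313): rebuild every object of container cid as a plain data block."""
    return [restore_object(pool, cid, off) for off in range(len(pool[cid]))]
```

The stack is needed because a chain is stored from newest to oldest. The inverse operations must therefore run in the opposite order to the traversal.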
The similarity index is an in-memory hash table consisting of an array of buckets. Each bucket has a number, and a hash function maps hook signatures to bucket numbers; hook signatures mapped into a bucket are stored in hook-signature nodes. Each hook-signature node stores one unique hook signature and is associated with a container-identifier queue, which holds the identifiers of the containers that contain that hook signature. A hook-signature node consists of a hook-signature field, an overflow-chain pointer field and a container-identifier-queue pointer field. The hook-signature field stores one unique hook signature. The overflow-chain pointer field stores, for collision handling, the address of another hook-signature node mapped to the same bucket. The container-identifier-queue pointer field stores the head address of the container-identifier queue associated with the node.
The similarity buffer is an in-memory hash table consisting of an array of buckets. Each bucket has a number, and a hash function maps similarity signatures to bucket numbers; similarity signatures mapped into a bucket are stored in similarity-signature nodes. Each similarity-signature node stores one unique similarity signature and is associated with an index-node linked list. Each index node in the list stores the information of one data block or Delta block having that similarity signature; the similarity signature of a Delta block is the similarity signature of the data block the Delta block encodes.

A similarity-signature node consists of a similarity-signature field, an overflow-chain pointer field and an index-node-list pointer field:
- the similarity-signature field stores one unique similarity signature;
- the overflow-chain pointer field stores, for collision handling, the address of another similarity-signature node mapped to the same bucket;
- the index-node-list pointer field stores the head address of the index-node linked list associated with the node.

An index node consists of a type field, a container identifier field, an offset field and a list pointer field:
- the type field stores a data block flag or a Delta flag;
- the container identifier field and offset field give the address of the data block or Delta block;
- the list pointer field stores the address of the next index node in the list.
The container buffer is a conventional in-memory linked list, and containers read into the buffer are linked into this list. When the buffer is full, containers are evicted from the list using the well-known least-recently-used (LRU) replacement algorithm. The Work container and its similar containers stay resident in the container buffer. When a new Work container and its similar containers are read in, the old Work container and its similar containers are likely to be selected by the replacement algorithm and evicted from the buffer.
The container storage pool resides on the disk device and stores containers. The container index also resides on the disk device; it maps each container identifier to the position, in the container storage pool, of the container with that identifier.
The present invention proposes a Delta compression technique based on block-level data deduplication. It runs in the background of current mainstream block-level deduplication storage systems and Delta-compresses similar data blocks. This eliminates byte-level duplicate data and further increases the deduplication ratio and storage space utilization.

The invention identifies similar data blocks by computing and comparing their similarity signatures: data blocks with the same similarity signature are treated as similar. Processing is done one container at a time. Background operations such as container compression, storage and recovery are transparent to the upper-layer system, which allows seamless integration with it. Data buffering and indexing let similar data be found immediately and can effectively improve Delta compression and container read/write performance, so that the technique can meet the needs of large-scale, high-performance data backup. The specific advantages are as follows:
1. Delta compression of similar data blocks eliminates byte-level duplicate data, further increasing the deduplication ratio and storage space utilization;
2. The invention can be used in the background without modifying the existing block-level deduplication storage system;
3. Processing one container at a time preserves the redundancy locality of the data stream, and the similarity index, similarity buffer and container buffer can effectively improve background data-processing performance;
4. Containers are appended sequentially to the container storage pool on disk, which avoids small random disk writes and gives high read/write performance.
Description of the drawings
Fig. 1 is a structural diagram of the component;
Fig. 2 is a flowchart of container storage;
Fig. 3 is a flowchart of container recovery;
Fig. 4 is a diagram of the container structure;
Fig. 5 is a diagram of the similarity-signature block structure;
Fig. 6 is a diagram of the data block header structure;
Fig. 7 is a diagram of the Delta block header structure;
Fig. 8 is a schematic diagram of the similarity index structure;
Fig. 9 is a diagram of the hook-signature node structure;
Fig. 10 is a schematic diagram of the similarity buffer structure;
Fig. 11 is a diagram of the similarity-signature node structure;
Fig. 12 is a diagram of the similarity-buffer index node structure.
Detailed description of embodiments
The invention discloses a Delta compression storage component based on block-level data deduplication. As shown in Fig. 1, the Delta compression storage component includes a container access module. The container access module uses the similarity index, similarity buffer and container buffer data structures to run a container storage algorithm and a container recovery algorithm.

The container storage algorithm receives write-container commands from the upper-layer block-level deduplication storage system, Delta-compresses each container, and writes the compressed container into the container storage pool on disk. The container recovery algorithm receives read-container commands from the upper-layer system, uses the container index to read the specified container from the container storage pool on disk, restores it, and returns it to the upper-layer system.

The Delta compression storage component runs in the background of the block-level deduplication storage system and Delta-compresses the containers that system sends. This further eliminates byte-level duplicate data between similar data blocks and thereby improves the deduplication ratio and storage space utilization. The invention identifies similar data blocks by computing and comparing their similarity signatures, and processes data one container at a time. Background operations such as container compression, storage and recovery are transparent to the upper-layer system, which allows seamless integration with it. The similarity index, similarity buffer and container buffer let similar data be found immediately and can effectively improve Delta compression and container read/write performance, so that the technique can meet the needs of large-scale, high-performance data backup.
As shown in Fig. 2, the container storage algorithm comprises the following steps in sequence:
(201) Initialize:
First: read parameters S, R, Sr and L from the configuration file. The configuration file resides on the disk device and records the system configuration;
Parameter S is a preset positive integer. It gives the maximum number of similar containers that may be read into memory from the container storage pool on disk during Delta compression; similar containers are containers with similar content. If S is too large, Delta compression of data blocks becomes slower; if it is too small, the Delta compression ratio of data blocks drops. In an implementation, S may be set to 2, 3 or 4.
Parameter R is a preset positive number giving the minimum allowed Delta compression ratio. The Delta compression ratio is the size of a data block divided by the size of the Delta block produced when that data block is Delta-compressed against a reference block. In an implementation, R may be set to 2, 2.5 or 3.
Parameter Sr is a preset positive integer, and 1/Sr is the hook-signature sampling rate. The sampling rate is an important parameter. If the rate is too low, too few hook signatures are generated, which reduces the accuracy of finding similar data blocks. If the rate is too high, too many hook signatures are generated, which makes the similarity index too large and its memory overhead high. In an implementation, Sr may be 64 or 32, depending on system scale.
Parameter L is a preset positive integer giving the maximum Delta chain length. Setting L too small reduces the effectiveness of Delta compression. Setting it too large reduces data read/write performance while yielding little additional compression gain. In an implementation, L may be set to 5, 6 or 7.
Then: if the system is in its initial configuration stage, create an empty similarity index in memory; otherwise, load the backed-up similarity index from the configuration file into memory;
The similarity index is an in-memory hash table. As shown in Fig. 8, the hash table consists of an array of buckets; each bucket has a number, and a hash function maps hook signatures to bucket numbers. Hook signatures mapped into a bucket are stored in hook-signature nodes. Each hook-signature node stores one unique hook signature and is associated with a container-identifier queue, which holds the identifiers of the containers that contain that hook signature.

As shown in Fig. 9, a hook-signature node consists of a hook-signature field, an overflow-chain pointer field and a container-identifier-queue pointer field:
- the hook-signature field stores one unique hook signature;
- the overflow-chain pointer field stores, for collision handling, the address of another hook-signature node mapped to the same bucket;
- the container-identifier-queue pointer field stores the head address of the container-identifier queue associated with the node;
The similarity index maps each hook signature to the containers that contain it. More than one container may contain the same hook signature, so querying the similarity index quickly finds the containers that share hook signatures. The invention treats containers that share hook signatures as similar containers, that is, containers with similar content.
In this embodiment, the similarity index resides in memory for fast lookup. Methods for determining whether the system is in its initial configuration stage are mature prior art.
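Below is a minimal sketch of the similarity index as a bucket array with overflow chains. It mirrors the hook-signature node of Fig. 9, which holds a hook-signature field, an overflow-chain pointer and a container-identifier queue. Three details are assumptions made for illustration: the hash function (`hook % n_buckets`), the bucket count, and skipping duplicate container identifiers within a queue.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HookNode:
    hook: int                                   # hook-signature field
    cids: deque = field(default_factory=deque)  # container-identifier queue
    overflow: Optional["HookNode"] = None       # overflow-chain pointer field


class SimilarityIndex:
    def __init__(self, n_buckets: int = 1024) -> None:
        self.buckets = [None] * n_buckets       # bucket array; each slot heads a chain

    def _bucket(self, hook: int) -> int:
        return hook % len(self.buckets)         # assumed hash function

    def _find(self, hook: int) -> Optional[HookNode]:
        node = self.buckets[self._bucket(hook)]
        while node is not None and node.hook != hook:
            node = node.overflow                # walk the overflow chain on collisions
        return node

    def insert(self, hook: int, cid: int) -> None:
        """Insert the mapping <hook, cid> (step (206))."""
        node = self._find(hook)
        if node is None:
            b = self._bucket(hook)
            node = HookNode(hook, overflow=self.buckets[b])  # push onto the chain head
            self.buckets[b] = node
        if cid not in node.cids:
            node.cids.append(cid)

    def containers(self, hook: int) -> list:
        """Identifiers of all containers holding this hook signature."""
        node = self._find(hook)
        return list(node.cids) if node is not None else []
```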
Finally: create an empty similarity buffer in memory; create an empty container buffer in memory to hold containers read from the container storage pool on disk; create an empty stack in memory, denoted Stack;
The similarity buffer is an in-memory hash table. As shown in Fig. 10, the hash table consists of an array of buckets; each bucket has a number, and a hash function maps similarity signatures to bucket numbers. Similarity signatures mapped into a bucket are stored in similarity-signature nodes. Each similarity-signature node stores one unique similarity signature and is associated with an index-node linked list. Each index node in the list stores the information of one data block or Delta block having that similarity signature; the similarity signature of a Delta block is the similarity signature of the data block the Delta block encodes.

As shown in Fig. 11, a similarity-signature node consists of a similarity-signature field, an overflow-chain pointer field and an index-node-list pointer field:
- the similarity-signature field stores one unique similarity signature;
- the overflow-chain pointer field stores, for collision handling, the address of another similarity-signature node mapped to the same bucket;
- the index-node-list pointer field stores the head address of the index-node linked list associated with the node.

As shown in Fig. 12, an index node consists of a type field, a container identifier field, an offset field and a list pointer field:
- the type field stores a data block flag or a Delta flag;
- the container identifier field and offset field give the address of the data block or Delta block;
- the list pointer field stores the address of the next index node in the list.
Each similarity-signature node in the similarity buffer is associated with an index-node linked list. Each index node in that list stores the index information of one data block or Delta block, and all data blocks or Delta blocks in the same list have the same similarity signature.
In this embodiment, the similarity buffer resides in memory, so that during Delta compression the reference block for a data block to be compressed can be found quickly.
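The similarity buffer can be sketched in a similar way. Here a Python `dict` plays the role of the bucket array and its collision handling. Each similarity-signature entry points to the head of a singly linked list of index nodes (type, container identifier, offset, next pointer), as in Figs. 11 and 12. Flag values and class names are hypothetical. New nodes are appended at the tail, so a lookup returns them in the order step (208) inserted them.

```python
from dataclasses import dataclass
from typing import Optional

DATA_FLAG, DELTA_FLAG = 0, 1   # hypothetical type-field values


@dataclass
class IndexNode:
    kind: int                           # type field: DATA_FLAG or DELTA_FLAG
    cid: int                            # container identifier field
    offset: int                         # offset field
    link: Optional["IndexNode"] = None  # pointer to the next index node


class SimilarityBuffer:
    def __init__(self) -> None:
        # The dict stands in for the bucket array; each value heads an index-node list.
        self.heads = {}

    def insert(self, sign: int, node: IndexNode) -> None:
        """Insert <sign, node>, appending at the tail of that signature's list."""
        head = self.heads.get(sign)
        if head is None:
            self.heads[sign] = node
            return
        while head.link is not None:
            head = head.link
        head.link = node

    def lookup(self, sign: int) -> list:
        """Return all index nodes for a similarity signature (empty if none)."""
        nodes, node = [], self.heads.get(sign)
        while node is not None:
            nodes.append(node)
            node = node.link
        return nodes

    def clear(self) -> None:
        """Empty the buffer, as step (202) does for each new Work container."""
        self.heads.clear()
```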
The container buffer is a conventional in-memory linked list, and containers read into the buffer are linked into this list. When the buffer is full, containers are evicted from the list using the well-known least-recently-used (LRU) replacement algorithm. The Work container and its similar containers stay resident in the container buffer. When a new Work container and its similar containers are read in, the old Work container and its similar containers are likely to be selected by the replacement algorithm and evicted from the buffer.
The container buffer helps improve data read/write performance. Containers preserve the redundancy locality of the data stream, so the data blocks similar to those in one container are very likely to reside together in another single container. Reading a whole container from disk in one operation therefore avoids small random disk I/O and also raises the buffer hit rate, which reduces the number of disk reads and writes.
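The LRU container buffer can be approximated with `collections.OrderedDict`, which keeps its keys in a doubly linked list and supports `move_to_end` and `popitem(last=False)`. The sketch assumes a caller-supplied `load` function that reads a whole container from the storage pool. The capacity (counted in containers) and the `disk_reads` counter are illustrative additions, not part of the described component.

```python
from collections import OrderedDict
from typing import Callable


class ContainerBuffer:
    """LRU container buffer sketch; `load` reads one whole container from the pool."""

    def __init__(self, capacity: int, load: Callable[[int], bytes]) -> None:
        self.capacity = capacity        # maximum number of resident containers (assumed unit)
        self.load = load
        self.cache = OrderedDict()      # cid -> container, least recently used first
        self.disk_reads = 0             # illustrative counter, not part of the component

    def get(self, cid: int) -> bytes:
        if cid in self.cache:
            self.cache.move_to_end(cid)   # hit: mark as most recently used
            return self.cache[cid]
        self.disk_reads += 1              # miss: read the whole container once
        self.put(cid, self.load(cid))
        return self.cache[cid]

    def put(self, cid: int, container: bytes) -> None:
        """Insert a container (e.g. a new Work container), evicting the LRU one if full."""
        self.cache[cid] = container
        self.cache.move_to_end(cid)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)
```

Once the next Work container and its similar containers are loaded, the previous group becomes the least recently used. That matches the eviction behavior the text describes.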
(202) Receive container: receive a write-container command from the upper-layer block-level deduplication storage system and extract from it the container to be written, called the upper-layer container. Create an empty formatted container in memory, called the Work container, and write it into the container buffer. Clear the similarity buffer for use. The upper-layer container is the container used by the upper-layer block-level deduplication storage system; the formatted container is the container used by the container access module of the Delta compression storage component.

As shown in Fig. 4, a container consists of a metadata area and a data area. The metadata area is read and written from top to bottom, and the data area from bottom to top. Once both have been written, they are joined and packaged into a container. The metadata area consists of the container header, the fingerprint area, the similarity-signature area and the hook-signature area:
- The container header consists of a container identifier field, a size field and a container signature field, which store the container's identifier, size and similarity signature respectively.
- The fingerprint area stores data block fingerprints.
- The similarity-signature area stores similarity-signature blocks. As shown in Fig. 5, a similarity-signature block consists of a similarity-signature field, a type field and an offset field. The similarity-signature field stores the similarity signature of the corresponding data block; the type field stores a data block flag or a Delta block flag; the offset field stores the address of the corresponding data block or Delta block within the data area.
- The hook-signature area stores hook signatures.

The data area stores data blocks and Delta blocks. When a data block is stored in the data area, a data block header is prepended to it; as shown in Fig. 6, the data block header consists of a data block flag and a data block size field. When a Delta block is stored in the data area, a Delta block header is prepended to it; as shown in Fig. 7, the Delta block header consists of a Delta block flag, a Delta block size field and a reference block address field, and the reference block address field consists of a container identifier field and an offset field;
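The data block header and Delta block header of Figs. 6 and 7 can be serialized with the standard `struct` module. The patent gives neither field widths nor flag values, so the following are assumptions: 1-byte flags, 4-byte sizes, an 8-byte container identifier and a 4-byte offset. The sketch also lays objects out front to back and does not model the bottom-to-top write order of the data area.

```python
import struct

DATA_FLAG, DELTA_FLAG = 0x01, 0x02      # hypothetical flag values
DATA_HDR = struct.Struct("<BI")         # flag, data block size
DELTA_HDR = struct.Struct("<BIQI")      # flag, Delta block size, reference cid, reference offset


def pack_data_block(block: bytes) -> bytes:
    """Prepend a data block header (Fig. 6 layout, assumed widths)."""
    return DATA_HDR.pack(DATA_FLAG, len(block)) + block


def pack_delta_block(delta: bytes, ref_cid: int, ref_offset: int) -> bytes:
    """Prepend a Delta block header carrying the reference block address (Fig. 7)."""
    return DELTA_HDR.pack(DELTA_FLAG, len(delta), ref_cid, ref_offset) + delta


def unpack_object(area: bytes, offset: int):
    """Decode the object at `offset`; returns (kind, payload, ref_address, next_offset)."""
    flag = area[offset]
    if flag == DATA_FLAG:
        _, size = DATA_HDR.unpack_from(area, offset)
        start = offset + DATA_HDR.size
        return "data", area[start:start + size], None, start + size
    if flag == DELTA_FLAG:
        _, size, cid, ref_off = DELTA_HDR.unpack_from(area, offset)
        start = offset + DELTA_HDR.size
        return "delta", area[start:start + size], (cid, ref_off), start + size
    raise ValueError(f"unknown object flag {flag:#x} at offset {offset}")
```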
(203) Copy fingerprints: read the container identifier from the metadata area of the upper-layer container and write it into the container identifier field of the Work container's header. Read the data block fingerprints from the metadata area of the upper-layer container and write them, in their original order, into the fingerprint area of the Work container;
(204) Compute similarity signatures: compute, in turn, the similarity signature of each data block in the upper-layer container's data area. For each similarity signature, create a similarity-signature block and write the signature into its similarity-signature field. Write the similarity-signature blocks into the Work container's similarity-signature area in the order of their corresponding data blocks;
The similarity signature of a data block is computed with a mature prior-art method. A fixed-size window slides through the data block one byte at a time, starting at the beginning of the block. At each position, the well-known Rabin fingerprint algorithm computes the Rabin fingerprint of the data patch inside the window. The smallest Rabin fingerprint over all data patches is taken as the similarity signature of the data block.
In this embodiment, the window size is a predetermined constant and may be 512 bytes; the Rabin fingerprint may be 4 bytes long.
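Step (204) takes the minimum fingerprint over all windows of a block. The sketch below keeps that sliding-window-minimum structure but swaps in a Rabin-Karp polynomial rolling hash modulo a Mersenne prime for the GF(2) Rabin fingerprint named in the text, truncated to 4 bytes. The base, the modulus and the fallback for blocks no longer than the window are assumptions.

```python
BASE = 257                 # polynomial base (assumption)
MOD = (1 << 61) - 1        # Mersenne prime modulus (assumption)
MASK32 = 0xFFFFFFFF        # keep 4-byte fingerprints, as in the embodiment


def window_hash(window: bytes) -> int:
    """Polynomial hash of one window, computed from scratch."""
    h = 0
    for byte in window:
        h = (h * BASE + byte) % MOD
    return h


def similarity_signature(block: bytes, window: int = 512) -> int:
    """Minimum 4-byte rolling-hash fingerprint over all windows of the block."""
    if len(block) <= window:
        # Fallback (assumption): a short block is fingerprinted as a single window.
        return window_hash(block) & MASK32
    top = pow(BASE, window - 1, MOD)       # weight of the byte leaving the window
    h = window_hash(block[:window])
    best = h & MASK32
    for i in range(window, len(block)):    # slide one byte at a time
        h = ((h - block[i - window] * top) * BASE + block[i]) % MOD
        best = min(best, h & MASK32)
    return best
```

Updating the hash in constant time per byte is what makes the sliding-window minimum affordable. Recomputing each window from scratch would cost O(window) per position.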
(205) Extract hook signatures: sample all similarity signatures in the Work container at a rate of 1/Sr, take the sampled signatures as hook signatures, and write them in order into the Work container's hook-signature area. Take the smallest of all similarity signatures in the Work container as the container's similarity signature, and write it into the container signature field of the Work container's header;
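The text fixes the sampling rate at 1/Sr but not the sampling rule. One common choice, assumed here, is content-defined sampling: a similarity signature is kept as a hook when `signature % Sr == 0`. Sampling every Sr-th position would not keep the same signatures across containers. This rule does: the same signature is selected in every container that contains it, so shared hooks can actually be found through the similarity index. The function name is hypothetical.

```python
def extract_hooks(signatures: list, sr: int) -> tuple:
    """Step (205) sketch: content-defined 1/Sr sampling plus the container signature."""
    if not signatures:
        raise ValueError("a container must hold at least one data block")
    # Keep a signature as a hook when it is divisible by Sr (assumed sampling rule);
    # for uniformly distributed signatures this keeps about 1/Sr of them.
    hooks = [sig for sig in signatures if sig % sr == 0]
    container_signature = min(signatures)  # smallest similarity signature in the container
    return hooks, container_signature
```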
(206) Update the similarity index:
First: assign the container identifier of the Work container to variable cid;
Second: for each hook signature in the Work container's hook-signature area, assign it to variable hook, form the mapping <hook, cid>, and insert <hook, cid> into the similarity index;
In this embodiment, inserting <hook, cid> into the similarity index is equivalent to inserting a <key, value> pair into an in-memory hash table, which is mature prior art.
(207) Find similar containers: query the similarity index for the containers that share hook signatures with the Work container. If none are found, go to step (228). Otherwise, from the containers found, select at most S containers in descending order of the number of hook signatures they share with the Work container, and treat the selected containers as the Work container's similar containers. Look up these similar containers in the container buffer, and load any that are not already there from the container storage pool into the container buffer;
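The ranking in step (207) amounts to counting how many hook signatures each candidate container shares with the Work container and keeping the top S. The sketch uses a plain `dict` from hook signature to container identifiers as a stand-in for the similarity index. It excludes the Work container itself, which step (206) has already inserted into the index; the text leaves that exclusion implicit, so it is an assumption here.

```python
from collections import Counter


def find_similar_containers(hooks, index, work_cid, s):
    """Step (207) sketch: rank containers by shared hook signatures, keep at most s.

    index maps a hook signature to the identifiers of containers holding it
    (a plain-dict stand-in for the similarity index).
    """
    shared = Counter()
    for hook in dict.fromkeys(hooks):          # de-duplicate hooks, keep their order
        for cid in index.get(hook, ()):
            if cid != work_cid:                # skip the Work container itself (assumption)
                shared[cid] += 1
    return [cid for cid, _ in shared.most_common(s)]
```

An empty result corresponds to the "none found" branch that jumps to step (228).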
(208) Write the similarity buffer:
Scan, in turn, the similarity-signature area of each of the Work container's similar containers in the container buffer, and read all similarity-signature blocks there. For each similarity-signature block read:
- create a similarity-buffer index node in memory, denoted Node;
- copy the block's type field and offset field values to Node's type and offset fields;
- write the identifier of the container holding the block into Node's container identifier field;
- read the block's similarity-signature field, denote its value sign, and insert <sign, Node> into the similarity buffer;
In this embodiment, inserting <sign, Node> into the similarity buffer is equivalent to inserting a <key, value> pair into an in-memory hash table, which is mature prior art.
(209) Prepare to process data blocks: set a read pointer P1 to the first similarity-signature block in the Work container's similarity-signature area, and a read pointer P2 to the first data block in the upper-layer container's data area;
(210) Read data block: read the data block pointed to by P2 from the upper-layer container's data area, denoted D_r. Read the similarity-signature block pointed to by P1 from the Work container's similarity-signature area, denoted Block, and read the value of Block's similarity-signature field, denoted sign1;
(211) Look up the similarity buffer: search the similarity buffer for the similarity-signature node whose similarity-signature field equals sign1. If none is found, go to step (224). Otherwise, set a read pointer P3 to the first index node in that node's index-node linked list and proceed to the next step;
Before D_r is Delta-compressed, a data block with content similar to D_r must be found to serve as its reference block. In this embodiment, the similarity buffer is searched for the similarity-signature node keyed by sign1, the similarity signature of D_r. If no node is found, D_r has no reference block, and step (224) stores D_r unchanged. If a node is found, its index-node linked list holds the information of all potential reference blocks for D_r, and the actual reference block is then sought by traversing that list.
In steps (212), (213) and (214), data-block index nodes are examined first. Delta-block index nodes are examined only if none of the data blocks pointed to by data-block index nodes is suitable as the reference block for D_r. This ordering effectively improves Delta compression performance. A data block can serve directly as a reference block, whereas a Delta block must first be converted back into a data block. That conversion involves traversing the Delta chain and performing inverse Delta operations, at considerable time cost.
(212) Check for a data block: if the type field of the index node pointed to by P3 holds the Delta block flag, go to step (214); otherwise it holds the data block flag; proceed to the next step;
(213) Short-chain Delta operation: assign the values of the container identifier field and offset field of the index node pointed to by P3 to variables cid0 and offset0 respectively, and read the data block at address <cid0, offset0> in the container buffer, denoted D_0. Using D_0 as the reference block, Delta-encode D_r to produce Delta block Δ(0,r). If the Delta compression ratio is greater than or equal to R, compression succeeds; go to step (225). Otherwise compression fails; proceed to the next step;
(214) Skip Delta block: advance P3 by one to the next index node; if P3 is non-null, go to step (212). Otherwise the end of the index-node linked list has been reached: point P3 back to the first index node of the list and proceed to the next step;
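The acceptance test in step (213) compares the Delta compression ratio, |D_r| / |Δ(0,r)|, against the threshold R. In the sketch below, DEFLATE with the reference block as a preset dictionary again stands in for the unspecified Delta algorithm. Because DEFLATE also removes redundancy inside D_r itself, it can report a higher ratio than a pure Delta encoder would. Note also that DEFLATE only uses the last 32 KiB of a preset dictionary. Function names are hypothetical.

```python
import zlib


def delta_encode(ref: bytes, target: bytes) -> bytes:
    """Stand-in Delta operation (assumption): DEFLATE with ref as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=ref)
    return comp.compress(target) + comp.flush()


def try_short_chain(ref: bytes, target: bytes, r: float):
    """Step (213) sketch: return the Delta block if |target| / |delta| >= r, else None."""
    delta = delta_encode(ref, target)
    ratio = len(target) / len(delta)     # Delta compression ratio as defined for R
    return delta if ratio >= r else None
```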
When none of the data blocks pointed to by data-block index nodes is suitable as the reference block for D_r, the Delta-block index nodes are examined. Steps (215) to (223) examine and process each Delta-block index node in the index-node linked list in turn. This continues until one of two outcomes: a suitable reference block is found, D_r is Delta-compressed successfully and step (225) stores D_r's Delta block; or no suitable reference block is found and step (224) stores D_r unchanged.
Examining and processing any Delta-block index node in the list involves two processes: first, traversing the Delta chain; second, testing candidate reference blocks.
Steps (216) to (219) traverse the Delta chain. Starting from the Delta block pointed to by the Delta-block index node, they read each Delta block along the chain, following reference addresses, until the data block at the end of the chain is reached.
Steps (220) to (222) test candidate reference blocks. First, the data block at the end of the Delta chain is used as the reference block for Delta-compressing D_r; if this succeeds, step (225) stores D_r's Delta block. Otherwise, inverse Delta operations are performed back along the chain, and each data block they produce is used in turn as the reference block for Delta-compressing D_r. This continues until compression succeeds and step (225) stores D_r's Delta block, or until every Delta block in the chain has been tested without finding a suitable reference block. In the latter case, step (223) moves on to the next Delta-block index node in the list.
(215) Check for a Delta block: if the type field of the index node pointed to by P3 holds the data block flag, go to step (223); otherwise it holds the Delta block flag; proceed to the next step;
(216) Read Delta block: the values of the container identifier field and offset field of the index node pointed to by P3 give the address of a Delta block in the container buffer; read the Delta block at that address;
(217) Push onto stack: push the Delta block onto Stack, read the reference block address from its Delta block header, and store it in variables <cid1, offset1>, where cid1 is a container identifier and offset1 is the position of the reference block in the data area of container cid1;
(218) Read reference block: if container cid1 is in the container buffer, read reference block <cid1, offset1> from the buffer; otherwise, first load container cid1 from the container storage pool into the container buffer, then read reference block <cid1, offset1>;
(219) Check reference block: if reference block <cid1, offset1> is a Delta block, go to step (217). Otherwise it is a data block: store its content in variable D_0, assign <cid1, offset1> to variables <cid0, offset0>, set variable length to 1, and proceed to the next step;
(220), long-chain Delta calculation step: with D0For reference block and DrIt carries out Delta operation and generates Delta block △0,r ;If Delta compression ratio is greater than or equal to R and length is less than or equal to L, success is compressed, (225) are gone to step, Otherwise, compression failure turns in next step;
(221), judge storehouse: if Stack goes to step (223) for sky, otherwise, turning in next step;
(222), pop-up a stack: popping up a Delta block from Stack, be denoted as △, by the address of △ deposit variable < Cid0, offset0 >, to D0Delta inverse operation is carried out with △, the result of Delta inverse operation is stored in variables D0In, by variable The value of length increases by 1, goes to step (220);
(223), skip data block: P3 moves forward a step, is directed toward next index node, if P3 non-empty, turns the (215) otherwise step shows the tail portion for having arrived at index node chained list, turn in next step;
(224) Store the data block: prepend a data block header to data block D_r and write the data block flag and the size of D_r into it. If the data area of the working container is non-empty, append D_r (with its header) after the existing data in the working container's data area; otherwise, write it at the start of that data area. Write the data block flag into the type field of Block, write the position of D_r in the working container's data area into the offset field of Block, and go to step (226).
(225) Store the Delta block: prepend a Delta header to Delta block △_{0,r} and write into it the Delta block flag, the size of △_{0,r}, and the address <cid0, offset0> of the reference block of △_{0,r}. If the data area of the working container is non-empty, append △_{0,r} (with its header) after the existing data in the working container's data area; otherwise, write it at the start of that data area. Write the Delta block flag into the type field of Block, write the position of △_{0,r} in the working container's data area into the offset field of Block, and empty Stack.
(226) Update the similarity buffer: create a similarity buffer index node in memory, denoted Node1; copy the type field and offset field values of Block into the type and offset fields of Node1; write the container identifier of the working container into the container identifier field of Node1; insert <sign1, Node1> into the similarity buffer.
In this embodiment, inserting <sign1, Node1> into the similarity buffer is equivalent to inserting a <key, value> pair into an in-memory hash table, which is well-established prior art.
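The similarity buffer operations of steps (208), (211), and (226) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a built-in `dict` stands in for the in-memory hash table, Python lists stand in for the index node linked lists, and the names `IndexNode`, `build_similarity_buffer`, `lookup`, and `record_block` are hypothetical. New index nodes are appended at the tail of each list; the text does not specify head or tail insertion.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class IndexNode:
    type: str    # type field: "data" or "delta"
    cid: int     # container identifier field
    offset: int  # offset field

# A similarity signature block read from a container: (signature, type, offset).
SigBlock = Tuple[int, str, int]

def build_similarity_buffer(similar: Dict[int, List[SigBlock]]) -> Dict[int, List[IndexNode]]:
    """Step (208): index every signature block of every similar container."""
    buf: Dict[int, List[IndexNode]] = defaultdict(list)
    for cid, sig_blocks in similar.items():
        for sign, kind, offset in sig_blocks:
            buf[sign].append(IndexNode(kind, cid, offset))  # insert <sign, Node>
    return buf

def lookup(buf: Dict[int, List[IndexNode]], sign1: int) -> List[IndexNode]:
    """Step (211): the index node list for sign1, or [] if the signature is absent."""
    return buf.get(sign1, [])  # .get does not create an entry in a defaultdict

def record_block(buf: Dict[int, List[IndexNode]], sign1: int, kind: str,
                 work_cid: int, offset: int) -> None:
    """Step (226): make the block just stored visible to later blocks of the container."""
    buf[sign1].append(IndexNode(kind, work_cid, offset))  # insert <sign1, Node1>
```

Recording each stored block in step (226) is what lets later blocks of the same working container use earlier ones as references.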
(227) Check whether all data blocks have been processed: advance P1 by one to the next similarity signature block in the similarity signature area of the working container, and advance P2 by one to the next data block in the data area of the upper-layer container. If P2 is null, all data blocks in the upper-layer container have been processed; go to step (229). Otherwise, go to step (210).
(228) Store the container without Delta compression: First: starting from the first data block, process each data block in the data area of the upper-layer container in turn. Prepend a data block header to the data block and write the data block flag and the size of the block into it. If the data area of the working container is non-empty, append the block (with its header) after the existing data in that data area; otherwise, write it at the start of that data area. Then update the similarity signature block that corresponds to this data block in the similarity signature area of the working container: write the data block flag into its type field and the position of the data block in the working container's data area into its offset field.
Second: discard the upper-layer container; compute the size of the working container and write it into the container header of the working container. If the container storage pool is non-empty, append the working container after the existing data in the pool; otherwise, write the working container at the start of the pool.
Finally: write the container identifier of the working container just written to the container storage pool, together with its position in the pool, into the container index, and go to step (230).
(229) Store the new container:
First: discard the upper-layer container; compute the size of the working container and write it into the container header of the working container. If the container storage pool is non-empty, append the working container after the existing data in the pool; otherwise, write the working container at the start of the pool.
Second: write the container identifier of the working container just written to the container storage pool, together with its position in the pool, into the container index.
The container storage pool resides on the disk device and stores containers.
The container index resides on the disk device and maps each container identifier to the position of the corresponding container in the container storage pool.
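A minimal sketch of how the container storage pool and the container index interact in steps (228) and (229), assuming an in-memory `bytearray` stands in for the on-disk pool and a Python `dict` for the container index. The 12-byte header layout (container identifier, total size) and the class name `ContainerStore` are hypothetical; the text only states that the container header records the container identifier and the container size.

```python
import struct
from typing import Dict, Tuple

# Hypothetical container header: container identifier (8 bytes) and total size (4 bytes).
HEADER = struct.Struct(">QI")

class ContainerStore:
    def __init__(self) -> None:
        self.pool = bytearray()                      # stands in for the on-disk storage pool
        self.index: Dict[int, Tuple[int, int]] = {}  # container index: cid -> (position, size)

    def append(self, cid: int, body: bytes) -> int:
        size = HEADER.size + len(body)               # size written into the container header
        position = len(self.pool)                    # 0 for an empty pool, else after existing data
        self.pool += HEADER.pack(cid, size) + body
        self.index[cid] = (position, size)           # write the container index entry
        return position

    def read(self, cid: int) -> bytes:
        position, size = self.index[cid]             # locate the container via the index
        stored_cid, stored_size = HEADER.unpack_from(self.pool, position)
        if (stored_cid, stored_size) != (cid, size):
            raise ValueError(f"corrupt header for container {cid}")
        return bytes(self.pool[position + HEADER.size:position + size])
```

Because containers are only ever appended, the index is the sole way to find a container's position, which is why both (228) and (229) end by writing the index entry.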
(230) Check for end of run: if no end-of-run instruction has been received, go to step (202); otherwise, proceed to the next step.
(231) Terminate:
First: stop accepting containers sent by the upper-layer block-level data deduplication storage system.
Then: write the in-memory similarity index to the configuration file.
Finally: destroy the in-memory similarity index, container buffer, similarity buffer, and Stack, and exit.
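Steps (211) to (223) decide whether the current data block D_r is stored as a Delta block and, if so, against which reference. The Python sketch below reproduces that control flow under several assumptions: blocks live in a `dict` keyed by <cid, offset> (standing in for the container buffer), a data block is stored as `("data", bytes)` and a Delta block as `("delta", reference_address, delta_bytes)`, and a toy prefix/suffix codec replaces a real Delta tool such as xdelta. All function names are hypothetical.

```python
import struct
from typing import Dict, List, Optional, Tuple

Addr = Tuple[int, int]  # (container id, offset) address of a stored block

# Toy prefix/suffix delta codec: a hypothetical stand-in for vdelta/xdelta/zdelta.
def delta_encode(ref: bytes, target: bytes) -> bytes:
    n = min(len(ref), len(target))
    p = 0
    while p < n and ref[p] == target[p]:
        p += 1                                   # length of the common prefix
    s = 0
    while s < n - p and ref[-1 - s] == target[-1 - s]:
        s += 1                                   # length of the common suffix
    return struct.pack(">II", p, s) + target[p:len(target) - s]

def delta_decode(ref: bytes, delta: bytes) -> bytes:
    p, s = struct.unpack(">II", delta[:8])
    return ref[:p] + delta[8:] + ref[len(ref) - s:]

def ratio_ok(target: bytes, delta: bytes, R: float) -> bool:
    return len(target) / len(delta) >= R         # Delta compression ratio >= R

def find_reference(nodes: List[Tuple[str, Addr]], target: bytes,
                   store: Dict[Addr, tuple], R: float, L: int
                   ) -> Optional[Tuple[Addr, bytes]]:
    # Steps (212)-(214): short chain, data-block index nodes as direct references.
    for kind, addr in nodes:
        if kind == "data":
            delta = delta_encode(store[addr][1], target)
            if ratio_ok(target, delta, R):
                return addr, delta               # -> (225)
    # Steps (215)-(223): long chain, starting from each Delta-block index node.
    for kind, addr in nodes:
        if kind != "delta":
            continue                             # (215): data block, skip via (223)
        stack: List[Addr] = []
        while store[addr][0] == "delta":         # (216)-(219): walk to the chain end
            stack.append(addr)
            addr = store[addr][1]
        d0, ref_addr, length = store[addr][1], addr, 1
        while True:
            delta = delta_encode(d0, target)     # (220)
            if ratio_ok(target, delta, R) and length <= L:
                return ref_addr, delta           # -> (225): store as a Delta block
            if not stack:                        # (221): chain exhausted
                break                            # -> (223): next index node
            ref_addr = stack.pop()               # (222): decode one step back up the chain
            d0 = delta_decode(d0, store[ref_addr][2])
            length += 1
    return None                                  # -> (224): store as a plain data block
```

Since `length` only grows as the stack is popped, an implementation could stop the inner loop as soon as `length` exceeds L; the sketch keeps the step-by-step structure of the text instead.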
As shown in Figure 3, the container recovery algorithm comprises the following steps in sequence:
(301) Initialize: create an empty container buffer in memory to hold containers read from the container storage pool on disk; create an empty stack in memory, denoted Stack.
(302) Receive a read command: receive a read-container command sent by the upper-layer block-level data deduplication storage system and extract the container identifier from it, denoted cid; create an empty container in the upper-layer format in memory, denoted the upper-layer container.
(303) Read the container: read the container with identifier cid from the container storage pool on disk, denote it the working container, and write it into the container buffer.
(304) Recover metadata: according to the types and format requirements of upper-layer container metadata, read the corresponding metadata from the metadata area of the working container and write it into the metadata area of the upper-layer container.
(305) Prepare to process the data area: set a read pointer P1 to the first object in the data area of the working container.
(306) Check the object: if the object pointed to by P1 is a data block, denote it D_r and go to step (312); otherwise it is a Delta block, so proceed to the next step.
Steps (307) to (311) below restore the Delta block pointed to by P1 to a data block. The operation consists of two processes: the first traverses the Delta chain, and the second applies the inverse operations along the Delta chain.
Steps (307), (308), and (309) below traverse the Delta chain: starting from the Delta block pointed to by P1, they read each Delta block along the direction of the Delta chain until the data block at the end of the chain is reached.
Steps (310) and (311) below perform the inverse operation of the Delta chain: inverse Delta operations are applied against the direction of the chain, finally restoring the Delta block pointed to by P1 to a data block.
(307) Push onto the stack: push the Delta block onto Stack and read the reference block address from the Delta header of this Delta block. Denote the address <cid1, offset1>, where cid1 is a container identifier and offset1 is the position of the reference block in the data area of container cid1.
(308) Read the reference block: if container cid1 is in the container buffer, read reference block <cid1, offset1> from the container buffer; otherwise, first read container cid1 from the container storage pool into the container buffer, then read reference block <cid1, offset1>.
(309) Check the reference block: if reference block <cid1, offset1> is a Delta block, go to step (307); otherwise it is a data block: store it in variable D and proceed to the next step.
(310) Pop the stack: pop a Delta block from Stack, denote it △, apply the inverse Delta operation to D and △, and store the result in variable D.
(311) Check the stack: if Stack is non-empty, go to step (310); otherwise, denote the content of D as data block D_r and proceed to the next step.
(312) Copy the data block: if the data area of the upper-layer container is non-empty, append D_r after the existing data in that data area; otherwise, write D_r at the start of that data area.
(313) Check the data area: advance P1 by one to the next object in the data area of the working container; if P1 is not null, go to step (306); otherwise the data area has been fully processed, so proceed to the next step.
(314) Check for end of run: send the completed upper-layer container to the upper-layer block-level data deduplication storage system; if no end-of-run command has been received, go to step (302); otherwise, proceed to the next step.
(315) Terminate: destroy the container buffer and Stack, and exit.
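The Delta-chain restoration of steps (305) to (313) can be sketched in Python as follows. It assumes an in-memory block map keyed by <cid, offset> as a stand-in for the container buffer, with `("data", bytes)` for a data block and `("delta", reference_address, delta_bytes)` for a Delta block. The prefix/suffix codec is a toy, hypothetical replacement for a real Delta tool, and `restore_block` and `restore_container` are illustrative names.

```python
import struct
from typing import Dict, List, Tuple

Addr = Tuple[int, int]  # (container id, offset)

# Toy prefix/suffix delta codec (hypothetical stand-in for vdelta/xdelta/zdelta).
def delta_encode(ref: bytes, target: bytes) -> bytes:
    n = min(len(ref), len(target))
    p = 0
    while p < n and ref[p] == target[p]:
        p += 1
    s = 0
    while s < n - p and ref[-1 - s] == target[-1 - s]:
        s += 1
    return struct.pack(">II", p, s) + target[p:len(target) - s]

def delta_decode(ref: bytes, delta: bytes) -> bytes:
    p, s = struct.unpack(">II", delta[:8])
    return ref[:p] + delta[8:] + ref[len(ref) - s:]

def restore_block(addr: Addr, store: Dict[Addr, tuple]) -> bytes:
    stack: List[Addr] = []
    while store[addr][0] == "delta":     # (306)-(309): follow references, pushing Delta blocks
        stack.append(addr)
        addr = store[addr][1]
    d = store[addr][1]                   # (309): base data block at the end of the chain
    while stack:                         # (310)-(311): inverse Delta, against the chain direction
        d = delta_decode(d, store[stack.pop()][2])
    return d

def restore_container(objects: List[Addr], store: Dict[Addr, tuple]) -> bytes:
    # (305), (312)-(313): restore every object of the data area, in order
    return b"".join(restore_block(a, store) for a in objects)
```

The stack is what lets the traversal run along the chain while decoding runs against it: the Delta block closest to the base data block is popped, and therefore decoded, first.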
In implementations of the container storage algorithm and container recovery algorithm above, the Delta operation and inverse Delta operation can use a Delta compression tool such as vdelta, xdelta, or zdelta; these tools are well-established prior art.
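To make the Delta operation, the inverse Delta operation, and the compression-ratio test against R concrete, the sketch below implements a simple copy/insert delta with Python's standard `difflib.SequenceMatcher`. It is not vdelta, xdelta, or zdelta, and the byte-size estimate in `encoded_size` assumes a hypothetical wire format; only the acceptance rule (data block size divided by Delta block size must be at least R) comes from the text.

```python
import difflib
from typing import List, Optional, Tuple, Union

Op = Union[Tuple[str, int, int], Tuple[str, bytes]]

def delta_encode(ref: bytes, target: bytes) -> List[Op]:
    """Delta operation: describe `target` as COPY ranges of `ref` plus literal INSERTs."""
    ops: List[Op] = []
    sm = difflib.SequenceMatcher(None, ref, target, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:                      # "replace" or "insert": ship the new bytes
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those ref bytes are simply not copied
    return ops

def delta_decode(ref: bytes, ops: List[Op]) -> bytes:
    """Inverse Delta operation: rebuild the target from the reference and the ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += ref[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

def encoded_size(ops: List[Op]) -> int:
    # Hypothetical format: COPY = 1-byte tag + two 4-byte ints; INSERT = tag + 4-byte length + data.
    return sum(9 if op[0] == "copy" else 5 + len(op[1]) for op in ops)

def try_delta(ref: bytes, target: bytes, R: float) -> Tuple[Optional[List[Op]], float]:
    ops = delta_encode(ref, target)
    ratio = len(target) / max(1, encoded_size(ops))
    return (ops if ratio >= R else None), ratio
```

`SequenceMatcher` can be quadratic in the worst case, so a production system would use one of the dedicated tools named above; the sketch only shows where the ratio test plugs in.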
In addition to the write-container and read-container commands sent by the upper-layer block-level data deduplication storage system, in one implementation the Delta compression storage component also executes read-container-metadata commands sent by that system, including commands to read container fingerprints. When executing a read-container-metadata command, the Delta compression storage component reads the metadata area of the specified container from the container storage pool on disk and sends the specified metadata to the upper-layer block-level data deduplication storage system.

Claims (6)

1. A Delta compression storage component based on block-level data deduplication, characterized in that: the Delta compression storage component comprises a container access module; the container access module uses a similarity index, a similarity buffer, and a container buffer as data structures, and runs a container storage algorithm and a container recovery algorithm. The container storage algorithm receives write-container commands sent by the upper-layer block-level data deduplication storage system, Delta-compresses the container, and writes the Delta-compressed container into the container storage pool on a disk device. The container recovery algorithm receives read-container commands sent by the upper-layer block-level data deduplication storage system, reads the specified container from the container storage pool on disk via the container index, restores the container, and returns it to the upper-layer block-level data deduplication storage system. The Delta compression storage component also receives read-container-metadata commands sent by the upper-layer block-level data deduplication storage system, reads the metadata of the specified container from the container storage pool on the disk device, and sends the specified metadata to the upper-layer block-level data deduplication storage system;
The similarity buffer is an in-memory hash table. The hash table contains a bucket array; each bucket in the array has a number, and a hash function maps similarity signatures to bucket numbers. Similarity signatures mapped to a bucket are stored in similarity signature nodes. Each similarity signature node stores one unique similarity signature and is associated with an index node linked list; each index node in that list stores the information of one data block or Delta block having that similarity signature. The similarity signature of a Delta block is the similarity signature of the data block the Delta block corresponds to. A similarity signature node consists of a similarity signature field, an overflow-chain pointer field, and an index-node-list pointer field: the similarity signature field stores a unique similarity signature; the overflow-chain pointer field, used to handle hash collisions, stores the address of another similarity signature node mapped to the same bucket; the index-node-list pointer field stores the head address of the index node linked list associated with the similarity signature node. An index node consists of a type field, a container identifier field, an offset field, and a linked-list pointer field: the type field stores the data block flag or the Delta block flag; the container identifier field and offset field give the address of the data block or Delta block; the linked-list pointer field stores the address of the next index node in the index node linked list.
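The bucket array, overflow chains, and index node linked lists of the similarity buffer can be modelled directly. In this Python sketch the hash function (`hash(sign) % n_buckets`), the class names, and head insertion into both the overflow chain and the index node list are assumptions; the claim fixes only the fields each node carries.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class IndexNode:
    type: str                            # data block flag or Delta block flag
    cid: int                             # container identifier field
    offset: int                          # offset field
    next: Optional["IndexNode"] = None   # linked-list pointer field

@dataclass
class SignatureNode:
    sign: int                                   # similarity signature field
    overflow: Optional["SignatureNode"] = None  # overflow-chain pointer field
    head: Optional[IndexNode] = None            # index-node-list pointer field

class SimilarityBufferTable:
    def __init__(self, n_buckets: int = 1024) -> None:
        self.buckets: List[Optional[SignatureNode]] = [None] * n_buckets

    def _bucket(self, sign: int) -> int:
        return hash(sign) % len(self.buckets)   # assumed hash function

    def find(self, sign: int) -> Optional[SignatureNode]:
        node = self.buckets[self._bucket(sign)]
        while node is not None and node.sign != sign:
            node = node.overflow                # follow the overflow chain on collisions
        return node

    def insert(self, sign: int, idx: IndexNode) -> None:
        sig = self.find(sign)
        if sig is None:                         # new signature: chain it at the bucket head
            b = self._bucket(sign)
            sig = SignatureNode(sign, overflow=self.buckets[b])
            self.buckets[b] = sig
        idx.next, sig.head = sig.head, idx      # prepend to the index node list

    def index_nodes(self, sign: int) -> List[IndexNode]:
        sig = self.find(sign)
        out: List[IndexNode] = []
        cur = sig.head if sig is not None else None
        while cur is not None:                  # walk the index node linked list
            out.append(cur)
            cur = cur.next
        return out
```

Prepending keeps insertion O(1), but it also means the most recently stored block is tried first as a reference; appending would prefer the oldest.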
2. The Delta compression storage component based on block-level data deduplication according to claim 1, characterized in that the container storage algorithm comprises the following steps in sequence:
(201) Initialize:
First: read parameters S, R, Sr, and L from the configuration file. The configuration file resides on the disk device and records the configuration information of the system;
Parameter S is a preset positive integer giving the maximum number of similar containers read into memory from the container storage pool on disk during Delta compression; similar containers are containers with similar content;
Parameter R is a preset positive number giving the minimum allowed Delta compression ratio. The Delta compression ratio is the ratio of the data block size to the Delta block size after a data block is Delta-compressed against a reference block to produce a Delta block;
Parameter Sr is a preset positive integer; 1/Sr is the hook signature sampling rate;
Parameter L is a preset positive integer giving the maximum Delta chain length;
Then: if this is the initial configuration stage of the system, create an empty similarity index in memory; otherwise, load the backed-up similarity index from the configuration file into memory;
Finally: create an empty similarity buffer in memory; create an empty container buffer in memory to hold containers read from the container storage pool on disk; create an empty stack in memory, denoted Stack;
(202) Receive a container: receive a write-container command sent by the upper-layer block-level data deduplication storage system, extract the container to be written from the command, and denote it the upper-layer container; create an empty native-format container in memory, denote it the working container, and write it into the container buffer; empty the similarity buffer so it is ready for use. The upper-layer container is a container as used by the upper-layer block-level data deduplication storage system; the native-format container is the container used by the container access module of the Delta compression storage component;
(203) Copy fingerprints: read the container identifier from the metadata area of the upper-layer container and write it into the container identifier field of the container header of the working container; read the data block fingerprints from the metadata area of the upper-layer container and write them, in their original order, into the fingerprint area of the working container;
(204) Compute similarity signatures: compute in turn the similarity signature of each data block in the data area of the upper-layer container; create a similarity signature block for each similarity signature and write the signature into the similarity signature field of that block; write the similarity signature blocks into the similarity signature area of the working container in the order of their corresponding data blocks;
(205) Extract hook signatures: sample all similarity signatures in the working container at a rate of 1/Sr, use the sampled similarity signatures as hook signatures, and write them in order into the hook signature area of the working container; take the smallest of all similarity signatures in the working container as the similarity signature of the container and write it into the container signature field of the container header of the working container;
(206) Update the similarity index:
First: assign the container identifier of the working container to variable cid;
Second: for each hook signature in the hook signature area of the working container, assign the hook signature to variable hook, create a mapping <hook, cid>, and insert <hook, cid> into the similarity index;
(207) Find similar containers: query the similarity index for the containers that share hook signatures with the working container. If none is found, go to step (228). Otherwise, choose at most S of the found containers in descending order of the number of hook signatures they share with the working container, and designate these containers as the similar containers of the working container. Look up these similar containers in the container buffer, read those not in the container buffer from the container storage pool into the container buffer, and go to step (208);
(208) Write the similarity buffer:
Scan in turn the similarity signature area of each similar container of the working container in the container buffer and read all similarity signature blocks in that area. For each similarity signature block read: create a similarity buffer index node in memory, denoted Node; copy the type field and offset field values of the similarity signature block into the type and offset fields of Node; write the identifier of the container holding the similarity signature block into the container identifier field of Node; read the similarity signature field of the block, denote the signature sign, and insert <sign, Node> into the similarity buffer;
(209) Prepare to process data blocks: set a read pointer P1 to the first similarity signature block in the similarity signature area of the working container, and set a read pointer P2 to the first data block in the data area of the upper-layer container;
(210) Read a data block: read the data block pointed to by P2 from the data area of the upper-layer container, denoted D_r; read the similarity signature block pointed to by P1 from the similarity signature area of the working container, denoted Block; read the value of the similarity signature field of Block, denoted sign1;
(211) Look up the similarity buffer: search the similarity buffer for the similarity signature node whose similarity signature field equals sign1. If it is not found, go to step (224); otherwise, set a read pointer P3 to the first index node of the index node linked list of the node just found and go to step (212);
(212) Check for a data block: if the type field of the index node pointed to by P3 holds the Delta block flag, go to step (214); otherwise it holds the data block flag, so go to step (213);
(213) Short-chain Delta operation: assign the values of the container identifier field and offset field of the index node pointed to by P3 to variables cid0 and offset0 respectively, and read a data block from address <cid0, offset0> in the container buffer, denoted D_0. Perform a Delta operation on D_r with D_0 as the reference block to produce Delta block △_{0,r}. If the Delta compression ratio is greater than or equal to R, compression succeeds; go to step (225). Otherwise compression fails; proceed to the next step;
(214) Skip the Delta block: advance P3 by one to the next index node; if P3 is not null, go to step (212); otherwise the tail of the index node linked list has been reached, so point P3 again at the first index node of the list and proceed to the next step;
(215) Check for a Delta block: if the type field of the index node pointed to by P3 holds the data block flag, go to step (223); otherwise it holds the Delta block flag, so proceed to the next step;
(216) Read the Delta block: the values of the container identifier field and offset field of the index node pointed to by P3 give the address of the Delta block in the container buffer; read the Delta block from that address;
(217) Push onto the stack: push the Delta block onto Stack, read the reference block address from the Delta header of this Delta block, and store it in variable <cid1, offset1>, where cid1 is a container identifier and offset1 is the position of the reference block in the data area of container cid1;
(218) Read the reference block: if container cid1 is in the container buffer, read reference block <cid1, offset1> from the container buffer; otherwise, first read container cid1 from the container storage pool into the container buffer, then read reference block <cid1, offset1>;
(219) Check the reference block: if reference block <cid1, offset1> is a Delta block, go to step (217); otherwise it is a data block: store its content in variable D_0, assign <cid1, offset1> to variable <cid0, offset0>, assign 1 to variable length, and proceed to the next step;
(220) Long-chain Delta operation: perform a Delta operation on D_r with D_0 as the reference block to produce Delta block △_{0,r}. If the Delta compression ratio is greater than or equal to R and length is less than or equal to L, compression succeeds; go to step (225). Otherwise compression fails; proceed to the next step;
(221) Check the stack: if Stack is empty, go to step (223); otherwise, proceed to the next step;
(222) Pop the stack: pop a Delta block from Stack, denote it △, and store the address of △ in variable <cid0, offset0>; apply the inverse Delta operation to D_0 and △, store the result in variable D_0, increase length by 1, and go to step (220);
(223) Skip the data block: advance P3 by one to the next index node; if P3 is not null, go to step (215); otherwise the tail of the index node linked list has been reached, so proceed to the next step;
(224) Store the data block: prepend a data block header to data block D_r and write the data block flag and the size of D_r into it. If the data area of the working container is non-empty, append D_r (with its header) after the existing data in that data area; otherwise, write it at the start of that data area. Write the data block flag into the type field of Block, write the position of D_r in the working container's data area into the offset field of Block, and go to step (226);
(225) Store the Delta block: prepend a Delta header to Delta block △_{0,r} and write into it the Delta block flag, the size of △_{0,r}, and the address <cid0, offset0> of the reference block of △_{0,r}. If the data area of the working container is non-empty, append △_{0,r} (with its header) after the existing data in that data area; otherwise, write it at the start of that data area. Write the Delta block flag into the type field of Block, write the position of △_{0,r} in the working container's data area into the offset field of Block, and empty Stack;
(226) Update the similarity buffer: create a similarity buffer index node in memory, denoted Node1; copy the type field and offset field values of Block into the type and offset fields of Node1; write the container identifier of the working container into the container identifier field of Node1; insert <sign1, Node1> into the similarity buffer;
(227) Check whether all data blocks have been processed: advance P1 by one to the next similarity signature block in the similarity signature area of the working container, and advance P2 by one to the next data block in the data area of the upper-layer container. If P2 is null, all data blocks in the upper-layer container have been processed; go to step (229). Otherwise, go to step (210);
(228) Store the container without Delta compression: First: starting from the first data block, process each data block in the data area of the upper-layer container in turn. Prepend a data block header to the data block and write the data block flag and the size of the block into it. If the data area of the working container is non-empty, append the block (with its header) after the existing data in that data area; otherwise, write it at the start of that data area. Then update the similarity signature block that corresponds to this data block in the similarity signature area of the working container: write the data block flag into its type field and the position of the data block in the working container's data area into its offset field;
Second: discard the upper-layer container; compute the size of the working container and write it into the container header of the working container. If the container storage pool is non-empty, append the working container after the existing data in the pool; otherwise, write the working container at the start of the pool;
Finally: write the container identifier of the working container just written to the container storage pool, together with its position in the pool, into the container index, and go to step (230);
(229) Store the new container:
First: discard the upper-layer container; compute the size of the working container and write it into the container header of the working container. If the container storage pool is non-empty, append the working container after the existing data in the pool; otherwise, write the working container at the start of the pool;
Second: write the container identifier of the working container just written to the container storage pool, together with its position in the pool, into the container index;
(230) Check for end of run: if no end-of-run instruction has been received, go to step (202); otherwise, proceed to the next step;
(231) Terminate:
First: stop accepting containers sent by the upper-layer block-level data deduplication storage system;
Then: write the in-memory similarity index to the configuration file;
Finally: destroy the in-memory similarity index, container buffer, similarity buffer, and Stack, and exit.
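Steps (204) and (205) do not say how a similarity signature is computed. The Python sketch below assumes a min-hash style feature (the smallest BLAKE2b hash over all 8-byte windows of a block) and samples hook signatures by value (`sign % Sr == 0`), so that containers with shared content tend to select the same hooks and roughly 1/Sr of the signatures are kept on average. Both choices, and the function names, are hypothetical; taking the smallest signature as the container signature follows the text.

```python
import hashlib
from typing import List, Optional, Tuple

def _h64(data: bytes) -> int:
    # 64-bit BLAKE2b hash of a byte string
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def similarity_signature(block: bytes, window: int = 8) -> int:
    """Hypothetical min-hash feature: the smallest hash over all `window`-byte shingles."""
    if len(block) <= window:
        return _h64(block)
    return min(_h64(block[i:i + window]) for i in range(len(block) - window + 1))

def extract_hooks(signs: List[int], Sr: int) -> List[int]:
    """Step (205): keep about 1/Sr of the signatures, chosen by value (an assumption)."""
    return [s for s in signs if s % Sr == 0]

def sign_container(blocks: List[bytes], Sr: int) -> Tuple[List[int], List[int], Optional[int]]:
    signs = [similarity_signature(b) for b in blocks]   # step (204), in data block order
    hooks = extract_hooks(signs, Sr)                    # step (205): hook signature area
    container_sign = min(signs) if signs else None      # smallest signature -> container header
    return signs, hooks, container_sign
```

Sampling by value rather than at random matters here: two containers that hold the same block compute the same signature and therefore agree on whether it becomes a hook, which is what makes the hook lookup in step (207) find them.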
3. the Delta based on block grade data deduplication compresses storage assembly as described in claim 1, it is characterised in that: the appearance Device recovery algorithms in turn include the following steps:
(301), it initializes: generating an empty vessel buffers area in memory, for temporary from the container storage pond on disk In read in memory in container;An empty storehouse is generated in memory, is denoted as Stack;
(302), it receives read command: receiving the reading container sended over from the block grade data deduplication storage on upper layer a life It enables, from extraction vessel identifier in container order is read, is denoted as cid;An empty upper layer format container is generated in memory, is denoted as Upper layer container;
(303) Read container: read the container whose container identifier is cid from the container storage pool on disk, denote it as the work container, and write the work container into the container buffer;
(304) Metadata recovery: according to the type and format required for the upper-layer container's metadata, read the corresponding metadata from the metadata area of the work container and write it into the metadata area of the upper-layer container;
(305) Prepare to process the data area: set a read pointer P1 pointing to the first object in the data area of the work container;
(306) Judge object: if the object pointed to by P1 is a data block, denote it as Dr and go to step (312); otherwise, it is a Delta block; go to the next step;
(307) Push onto the stack: push the Delta block onto Stack and read the reference block address from the Delta header of the Delta block. Denote the reference block address as <cid1, offset1>, where cid1 is a container identifier and offset1 is the position of the reference block in the data area of container cid1;
(308) Read reference block: if container cid1 is in the container buffer, read reference block <cid1, offset1> from the container buffer; otherwise, first read container cid1 from the container storage pool into the container buffer, then read reference block <cid1, offset1>;
(309) Judge reference block: if reference block <cid1, offset1> is a Delta block, go to step (307); otherwise, it is a data block; store it in variable D and go to the next step;
(310) Pop the stack: pop a Delta block from Stack, denote it as Δ, apply the inverse Delta operation to D and Δ, and store the result in variable D;
(311) Judge stack: if Stack is non-empty, go to step (310); otherwise, denote the content of variable D as data block Dr and go to the next step;
(312) Copy data block: if the data area of the upper-layer container is non-empty, append data block Dr after the existing data in that data area; otherwise, write data block Dr at the start of the upper-layer container's data area;
(313) Judge data area: move read pointer P1 forward one step to the next object in the work container's data area; if P1 is non-null, go to step (306); otherwise, the data area has been fully processed; go to the next step;
(314) Check for end of run: send the processed upper-layer container to the upper-layer block-level data deduplication storage; if no end-of-run command has been received, go to step (302); otherwise, go to the next step;
(315) End: destroy the container buffer and the stack Stack, then exit.
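Steps (303) and (305) to (313) amount to a chain walk: a Delta block may reference another Delta block, so references are followed (pushing each Delta block onto Stack) until a plain data block is reached, and the Deltas are then undone in reverse order. The Python sketch below illustrates that loop under stated assumptions. The patent does not define the delta encoding, so the copy/insert instruction format, the `DataBlock`/`DeltaBlock` classes, the function names, and modeling offset1 as an index into a container's list of objects are all hypothetical. Metadata recovery (304) and the hand-off in (314) are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical delta instruction format (not specified by the patent):
# ("copy", start, length) copies a slice of the reference block,
# ("insert", data) appends literal bytes.
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


@dataclass
class DataBlock:
    data: bytes


@dataclass
class DeltaBlock:
    ref: Tuple[int, int]  # <cid1, offset1>; offset1 modeled as a list index (assumption)
    ops: List[Op]


Obj = Union[DataBlock, DeltaBlock]
Pool = Dict[int, List[Obj]]  # container identifier -> data-area objects


def apply_delta(base: bytes, delta: DeltaBlock) -> bytes:
    """Inverse Delta operation: rebuild a block from reference data D and a Delta block."""
    out = bytearray()
    for op in delta.ops:
        if op[0] == "copy":
            _, start, length = op
            out += base[start:start + length]
        else:
            out += op[1]
    return bytes(out)


def read_container(cid: int, buffer: Pool, pool: Pool) -> List[Obj]:
    """Step (308): serve from the container buffer, loading from the pool on a miss."""
    if cid not in buffer:
        buffer[cid] = pool[cid]
    return buffer[cid]


def restore_container(cid: int, buffer: Pool, pool: Pool) -> List[bytes]:
    """Steps (303) and (305)-(313): rebuild the upper-layer container's data area."""
    work = read_container(cid, buffer, pool)        # (303) read the work container
    upper_data: List[bytes] = []
    for obj in work:                                 # (305)/(313) read pointer P1
        if isinstance(obj, DataBlock):               # (306) plain data block Dr
            upper_data.append(obj.data)              # (312) copy into the upper layer
            continue
        stack: List[DeltaBlock] = []
        cur: Obj = obj
        while isinstance(cur, DeltaBlock):           # (307)-(309) follow references
            stack.append(cur)
            cid1, offset1 = cur.ref
            cur = read_container(cid1, buffer, pool)[offset1]
        d = cur.data                                 # (309) reached a data block D
        while stack:                                 # (310)-(311) undo Deltas in reverse
            d = apply_delta(d, stack.pop())
        upper_data.append(d)                         # (312) Dr = D
    return upper_data
```

For example, if container 1 holds `DataBlock(b"hello world")` and container 2 holds a Delta block against <1, 0> followed by a Delta block against <2, 0>, the second object is reached through a two-level chain, and `restore_container(2, {}, pool)` returns `[b"hello there", b"hello there!"]` for the instruction lists used in the test below. The explicit stack mirrors the claim's Stack and avoids recursion depth limits on long reference chains.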
4. The Delta compression storage assembly based on block-level data deduplication according to claim 1, characterized in that: the similarity index is an in-memory hash table; the in-memory hash table comprises a bucket array in which each bucket corresponds to a number, and a hash function establishes the mapping between hook signatures and bucket numbers; a hook signature mapped into a bucket is stored in a hook signature node; each hook signature node stores one unique hook signature and is associated with a container identifier queue, which stores the identifiers of the containers that contain that hook signature; a hook signature node consists of a hook signature field, an overflow-chain pointer field, and a container identifier queue field; the hook signature field stores one unique hook signature; the overflow-chain pointer field is used to handle hash collisions by storing the address of another hook signature node mapped into the same bucket; the container identifier queue pointer field stores the first address of the container identifier queue associated with the hook signature node.
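The node layout in claim 4 (signature field, overflow-chain pointer, container identifier queue) is a separately chained hash table. A minimal Python sketch follows, assuming hook signatures are byte strings; the class and method names, the bucket hash (first 8 bytes modulo the bucket count), and the choice not to enqueue the same container twice are illustrative assumptions, not details taken from the patent.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class HookNode:
    signature: bytes                                        # hook signature field
    next: Optional["HookNode"] = None                       # overflow-chain pointer field
    containers: Deque[int] = field(default_factory=deque)   # container identifier queue


class SimilarityIndex:
    """In-memory similarity index: bucket array plus overflow chains (sketch)."""

    def __init__(self, num_buckets: int = 1024):
        self.buckets: List[Optional[HookNode]] = [None] * num_buckets

    def _bucket(self, signature: bytes) -> int:
        # Hypothetical hash function; the patent does not name one.
        return int.from_bytes(signature[:8], "big") % len(self.buckets)

    def _find(self, signature: bytes) -> Optional[HookNode]:
        node = self.buckets[self._bucket(signature)]
        while node is not None and node.signature != signature:
            node = node.next                                # walk the overflow chain
        return node

    def insert(self, signature: bytes, cid: int) -> None:
        node = self._find(signature)
        if node is None:
            b = self._bucket(signature)
            node = HookNode(signature, next=self.buckets[b])  # prepend on collision
            self.buckets[b] = node
        if cid not in node.containers:                      # assumption: no duplicate cids
            node.containers.append(cid)

    def lookup(self, signature: bytes) -> List[int]:
        """Return identifiers of containers known to contain this hook signature."""
        node = self._find(signature)
        return list(node.containers) if node else []
```

With `num_buckets=1`, every signature collides, so distinct signatures end up on one overflow chain while each keeps its own container queue. Prepending new nodes keeps insertion O(1) regardless of chain length.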
5. The Delta compression storage assembly based on block-level data deduplication according to claim 1, characterized in that: the container buffer is an ordinary in-memory linked list, and containers read into the container buffer are linked into this list; when the container buffer is full, the well-known least recently used (LRU) replacement algorithm is used to delete some containers from the list; the work container and its similar containers remain resident in the container buffer until a new work container and its similar containers are read into the container buffer, after which the old work container and its similar containers may be selected by the replacement algorithm and deleted from the container buffer.
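Claim 5 combines plain LRU replacement with a rule that the current work container and its similar containers stay resident. One way to sketch that in Python is an `OrderedDict` used as the LRU list plus a "pinned" set; the class name, the `load` callback, the capacity counted in containers, and the pinning mechanism are assumptions made for illustration. If the pinned set alone exceeds the capacity, this sketch lets the buffer grow rather than evict pinned containers.

```python
from collections import OrderedDict
from typing import Callable, Iterable, Optional, Set


class ContainerBuffer:
    """LRU container buffer that keeps the current work set resident (sketch)."""

    def __init__(self, capacity: int, load: Callable[[int], object]):
        self.capacity = capacity            # maximum number of buffered containers
        self.load = load                    # hypothetical loader: reads a container from the pool
        self.cache: "OrderedDict[int, object]" = OrderedDict()  # front = least recently used
        self.pinned: Set[int] = set()       # current work container + its similar containers

    def set_work_set(self, work_cid: int, similar: Iterable[int]) -> None:
        # Replacing the pinned set turns the old work set into ordinary LRU candidates.
        self.pinned = {work_cid, *similar}
        for cid in self.pinned:
            self.get(cid)

    def get(self, cid: int) -> object:
        if cid in self.cache:
            self.cache.move_to_end(cid)     # mark as most recently used
            return self.cache[cid]
        container = self.load(cid)
        self.cache[cid] = container
        self._evict(keep=cid)
        return container

    def _evict(self, keep: Optional[int] = None) -> None:
        # Walk from the LRU end, skipping pinned containers and the one just loaded.
        for cid in list(self.cache):
            if len(self.cache) <= self.capacity:
                break
            if cid not in self.pinned and cid != keep:
                del self.cache[cid]
```

With capacity 3 and work set {1, 2}, loading containers 3 and 4 evicts 3 (the least recently used unpinned entry). Switching the work set to {5, 6} then lets 1 and 2 be evicted in turn, matching the claim's description of the old work set becoming replaceable.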
6. The Delta compression storage assembly based on block-level data deduplication according to claim 1, characterized in that: the container storage pool resides on a disk device and is used to store containers; the container index resides on a disk device and is used to establish the mapping between a container identifier and the position, in the container storage pool, of the container with that identifier.
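Claim 6 only requires that both the pool and the index live on disk and that the index map a container identifier to the container's position in the pool. The Python sketch below is one hypothetical realization: containers are treated as opaque byte strings appended to a single pool file, and the index is a JSON file mapping each identifier to an (offset, length) pair. File names, the serialization, and the class name are assumptions, not part of the patent.

```python
import json
import os
from typing import Dict, Tuple


class ContainerStore:
    """Container storage pool and container index, both kept as files on disk (sketch)."""

    def __init__(self, pool_path: str, index_path: str):
        self.pool_path = pool_path          # hypothetical pool file: containers appended back to back
        self.index_path = index_path        # hypothetical index file: JSON {cid: [offset, length]}
        self.index: Dict[int, Tuple[int, int]] = {}
        if os.path.exists(index_path):
            with open(index_path) as f:
                self.index = {int(k): (v[0], v[1]) for k, v in json.load(f).items()}

    def write(self, cid: int, container: bytes) -> None:
        with open(self.pool_path, "ab") as pool:
            offset = pool.seek(0, os.SEEK_END)   # position where this container starts
            pool.write(container)
        self.index[cid] = (offset, len(container))
        with open(self.index_path, "w") as f:
            json.dump(self.index, f)             # persist the cid -> position mapping

    def read(self, cid: int) -> bytes:
        offset, length = self.index[cid]         # container index lookup
        with open(self.pool_path, "rb") as pool:
            pool.seek(offset)
            return pool.read(length)
```

Rewriting the whole index file on every write keeps the sketch short; a real system would more likely update the index incrementally or in batches.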

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811259886.6A CN109445703B (en) 2018-10-26 2018-10-26 A kind of Delta compression storage assembly based on block grade data deduplication

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811259886.6A CN109445703B (en) 2018-10-26 2018-10-26 A kind of Delta compression storage assembly based on block grade data deduplication

Publications (2)

Publication Number Publication Date
CN109445703A (en) 2019-03-08
CN109445703B (en) 2019-10-25

Family

ID=65548437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811259886.6A Active CN109445703B (en) 2018-10-26 2018-10-26 A kind of Delta compression storage assembly based on block grade data deduplication

Country Status (1)

Country Link
CN (1) CN109445703B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112104725B (en) * 2020-09-09 2022-05-27 中国联合网络通信集团有限公司 Container mirror image duplicate removal method, system, computer equipment and storage medium
CN112817962B (en) * 2021-03-16 2022-02-18 广州鼎甲计算机科技有限公司 Data storage method and device based on object storage and computer equipment
CN115617770B (en) * 2022-11-17 2023-03-28 达芬骑动力科技(北京)有限公司 Data disk storage management method for vehicle state signal data storage
CN116501709B (en) * 2023-06-25 2023-09-05 深圳市双合电气股份有限公司 IEC61850 data service function-based data storage method and device

Citations (10)

Publication number Priority date Publication date Assignee Title
CN101084499A (en) * 2004-09-15 2007-12-05 迪利根特技术公司 Systems and methods for searching and storage of data
EP2042979A2 (en) * 2007-09-26 2009-04-01 Hitachi, Ltd. Power efficient data storage with data de-duplication
CN103379160A (en) * 2012-04-25 2013-10-30 上海咏云信息技术有限公司 Difference synchronizing method for oversized file
CN103902686A (en) * 2014-03-25 2014-07-02 华为技术有限公司 Data duplicate removing method and data duplicate removing device
CN104035949A (en) * 2013-12-10 2014-09-10 南京信息工程大学 Similarity data retrieval method based on locality sensitive hashing (LASH) improved algorithm
CN104050103A (en) * 2014-06-06 2014-09-17 华中科技大学 Cache replacement method and system for data recovery
CN105069111A (en) * 2015-08-10 2015-11-18 广东工业大学 Similarity based data-block-grade data duplication removal method for cloud storage
CN107025070A (en) * 2016-01-15 2017-08-08 三星电子株式会社 versioned storage device and method
CN107678892A (en) * 2017-11-07 2018-02-09 黄淮学院 Recover the continuous data protection method of chain based on jump
CN108415669A (en) * 2018-03-15 2018-08-17 深信服科技股份有限公司 The data duplicate removal method and device of storage system, computer installation and storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8051050B2 (en) * 2009-07-16 2011-11-01 Lsi Corporation Block-level data de-duplication using thinly provisioned data storage volumes
US20180095985A1 (en) * 2016-09-30 2018-04-05 Cubistolabs, Inc. Physical Location Scrambler for Hashed Data De-Duplicating Content-Addressable Redundant Data Storage Clusters

Non-Patent Citations (2)

Title
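A block-level database backup system based on differential data; 李自尊; 《四川大学学报(自然科学版)》 (Journal of Sichuan University, Natural Science Edition); 2012-07-31; pp. 783-789 *
Research on data deduplication technology in network backup; 杨天明; 《中国博士学位论文全文数据库 信息科技辑》 (China Doctoral Dissertations Full-text Database, Information Science and Technology); 2011-07-15; main text pp. 7-17, 24-50, 60-72, 81-91 *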

Also Published As

Publication number Publication date
CN109445703A (en) 2019-03-08

Similar Documents

Publication Publication Date Title
CN109445703B (en) A kind of Delta compression storage assembly based on block grade data deduplication
CN109358987B (en) A kind of backup cluster based on two-stage data deduplication
CN104978151B (en) Data reconstruction method in the data de-duplication storage system perceived based on application
CN109445702B (en) block-level data deduplication storage system
US9317218B1 (en) Memory efficient sanitization of a deduplicated storage system using a perfect hash function
CN102779180B (en) The operation processing method of data-storage system, data-storage system
US11182256B2 (en) Backup item metadata including range information
US7434015B2 (en) Efficient data storage system
CN103870514B (en) Data de-duplication method and device
CN107391774B (en) The rubbish recovering method of log file system based on data de-duplication
CN106201771B (en) Data-storage system and data read-write method
CN102323958A (en) Data de-duplication method
US8271456B2 (en) Efficient backup data retrieval
US20110040763A1 (en) Data processing apparatus and method of processing data
CN107544873A (en) A kind of standby system and method for depositing Backup Data
CN110399310A (en) A kind of recovery method and device of memory space
CN102436408A (en) Data storage cloud and cloud backup method based on Map/Dedup
CN108415671B (en) Method and system for deleting repeated data facing green cloud computing
CN110569245A (en) Fingerprint index prefetching method based on reinforcement learning in data de-duplication system
CN107368545B (en) A kind of De-weight method and device based on Merkle Tree deformation algorithm
CN102024034A (en) Fragment processing method for high-definition media-oriented embedded file system
CN104050103A (en) Cache replacement method and system for data recovery
CN105493080B (en) The method and apparatus of data de-duplication based on context-aware
CN105677238A (en) Method for distributed storage based data deduplication on virtual machine system disk
CN106648991A (en) Duplicated data deletion method in data recovery system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210805

Address after: 401336 building 10, No.1 Jiangxia Road, Nan'an District, Chongqing

Patentee after: Chongqing Lihe Technology Innovation Center Co.,Ltd.

Address before: 510700 room 801, No. 85, Kefeng Road, Huangpu District, Guangzhou City, Guangdong Province

Patentee before: Yami Technology (Guangzhou) Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20210805

Address after: 510700 room 801, No. 85, Kefeng Road, Huangpu District, Guangzhou City, Guangdong Province

Patentee after: Yami Technology (Guangzhou) Co.,Ltd.

Address before: 463000 Huanghuai college, No. 6, Kaiyuan Avenue, Yicheng District, Zhumadian City, Henan Province

Patentee before: HUANGHUAI University