Summary of the invention
The object of the present invention is to provide a Delta compression storage component based on block-level data deduplication that eliminates byte-level duplicate data and improves the deduplication ratio and storage space utilization.
To achieve the above object, the technical solution adopted by the present invention is as follows. The invention discloses a Delta compression storage component based on block-level data deduplication. The Delta compression storage component includes a container access module. The container access module runs a container storage algorithm and a container restore algorithm using three data structures: a similarity index, a similarity buffer and a container buffer. The container storage algorithm receives write-container commands sent by the upper-layer block-level deduplication storage system, Delta-compresses the container, and writes the Delta-compressed container into the container storage pool on the disk device. The container restore algorithm receives read-container commands sent by the upper-layer block-level deduplication storage system, reads the specified container from the container storage pool on disk by way of the container index, restores the container that was read, and returns it to the upper-layer block-level deduplication storage system. The Delta compression storage component also receives read-container-metadata commands sent by the upper-layer block-level deduplication storage system, reads the metadata of the specified container from the container storage pool on the disk device, and sends that metadata to the upper-layer block-level deduplication storage system.
The container storage algorithm comprises the following steps in order:
(201) Initialization:
First: read the parameters S, R, Sr and L from the configuration file. The configuration file resides on the disk device and records the configuration information of the system.
The parameter S is a preset positive integer giving the maximum number of similar containers read into memory from the container storage pool on disk during Delta compression; similar containers are containers with similar content.
The parameter R is a preset positive number giving the minimum allowed Delta compression ratio. The Delta compression ratio is the ratio of the data block size to the Delta block size after a data block has been Delta-compressed against a reference block to produce a Delta block.
The parameter Sr is a preset positive integer; 1/Sr is the hook signature sampling rate.
The parameter L is a preset positive integer giving the maximum Delta chain length.
Then: determine whether the system is in its initial configuration stage. If so, create an empty similarity index in memory; if not, read the backed-up similarity index from the configuration file into memory.
Last: create an empty similarity buffer in memory; create an empty container buffer in memory for temporarily holding containers read into memory from the container storage pool on disk; create an empty stack in memory, denoted Stack.
(202) Receive container: receive a write-container command sent by the upper-layer block-level deduplication storage system and extract the container to be written from the command, denoted the upper-layer container; create an empty format container in memory, denoted the working container, and write the working container into the container buffer; clear the similarity buffer so that it is ready for use. The upper-layer container is the container used by the upper-layer block-level deduplication storage system; the format container is the container used by the container access module of the Delta compression storage component.
(203) Copy fingerprints: read the container identifier from the metadata area of the upper-layer container and write it into the container identifier field of the container header of the working container; read the data block fingerprints from the metadata area of the upper-layer container and write them into the fingerprint area of the working container in their original order.
(204) Compute similarity signatures: compute in turn the similarity signature of each data block in the data area of the upper-layer container; for each similarity signature, generate a similarity signature block and write the signature into the similarity signature field of that block; write the similarity signature blocks into the similarity signature area of the working container in the order of their corresponding data blocks.
(205) Extract hook signatures: sample all similarity signatures contained in the working container at a rate of 1/Sr, take the sampled similarity signatures as hook signatures, and write the hook signatures in order into the hook signature area of the working container; take the smallest of all similarity signatures contained in the working container as the similarity signature of the container, and write it into the container signature field of the container header of the working container.
(206) Update the similarity index:
First: assign the container identifier of the working container to the variable cid.
Second: for each hook signature contained in the hook signature area of the working container, do the following: assign the hook signature to the variable hook, generate a mapping <hook, cid>, and insert <hook, cid> into the similarity index.
(207) Search for similar containers: query the similarity index to find the containers that share hook signatures with the working container. If none is found, go to step (228). Otherwise, from the containers found, select at most S containers in descending order of the number of hook signatures shared with the working container, and take the selected containers as the similar containers of the working container; look up these similar containers in the container buffer, read those that are not in the container buffer from the container storage pool into the container buffer, and proceed to step (208).
(208) Write the similarity buffer:
Scan in turn the similarity signature area of each similar container of the working container in the container buffer and read all similarity signature blocks in that area. For each similarity signature block read, do the following: create a similarity buffer index node in memory, denoted Node; copy the type field value and offset field value of the similarity signature block to the type field and offset field of Node respectively; write the container identifier of the container to which the similarity signature block belongs into the container identifier field of Node; read the similarity signature field of the similarity signature block, denote the signature read as sign, and insert <sign, Node> into the similarity buffer.
(209) Prepare to process data blocks: set a read pointer P1 to point to the first similarity signature block in the similarity signature area of the working container, and set a read pointer P2 to point to the first data block in the data area of the upper-layer container.
(210) Read data block: read the data block pointed to by P2 from the data area of the upper-layer container, denoted Dr; read the similarity signature block pointed to by P1 from the similarity signature area of the working container, denoted Block; read the value of the similarity signature field of Block, denoted sign1.
(211) Search the similarity buffer: search the similarity buffer for the similarity signature node whose similarity signature field value is sign1. If it is not found, go to step (224); otherwise, set a read pointer P3 to point to the first index node of the index node linked list of the similarity signature node just found, and proceed to step (212).
(212) Judge data block: if the type field value of the index node pointed to by P3 is the Delta block flag, go to step (214); otherwise it is the data block flag, so proceed to step (213).
(213) Short-chain Delta operation: assign the values of the container identifier field and the offset field of the index node pointed to by P3 to the variables cid0 and offset0 respectively, and read a data block from address <cid0, offset0> in the container buffer, denoted D0; perform the Delta operation on Dr with D0 as the reference block, generating the Delta block Δ0,r. If the Delta compression ratio is greater than or equal to R, compression succeeds and control goes to step (225); otherwise compression fails and control goes to the next step.
(214) Skip Delta block: advance P3 one step so that it points to the next index node. If P3 is non-null, go to step (212); otherwise the tail of the index node linked list has been reached, so point P3 back at the first index node of the list and go to the next step.
(215) Judge Delta block: if the type field value of the index node pointed to by P3 is the data block flag, go to step (223); otherwise it is the Delta block flag, so go to the next step.
(216) Read Delta block: the values of the container identifier field and the offset field of the index node pointed to by P3 specify the address of a Delta block in the container buffer; read a Delta block from that address.
(217) Push onto the stack: push the Delta block onto Stack, read the reference block address from the Delta block header of the Delta block, and store the address read in the variable <cid1, offset1>, where cid1 is a container identifier and offset1 is the position of the reference block in the data area of container cid1.
(218) Read reference block: if container cid1 is in the container buffer, read the reference block <cid1, offset1> from the container buffer; otherwise, first read container cid1 from the container storage pool into the container buffer and then read the reference block <cid1, offset1>.
(219) Judge reference block: if the reference block <cid1, offset1> is a Delta block, go to step (217); otherwise it is a data block, so store its content in the variable D0, assign <cid1, offset1> to the variable <cid0, offset0>, assign 1 to the variable length, and go to the next step.
(220) Long-chain Delta operation: perform the Delta operation on Dr with D0 as the reference block, generating the Delta block Δ0,r. If the Delta compression ratio is greater than or equal to R and length is less than or equal to L, compression succeeds and control goes to step (225); otherwise compression fails and control goes to the next step.
(221) Check the stack: if Stack is empty, go to step (223); otherwise go to the next step.
(222) Pop the stack: pop a Delta block from Stack, denoted Δ, and store the address of Δ in the variable <cid0, offset0>; perform the inverse Delta operation on D0 and Δ and store the result in the variable D0; increase the value of the variable length by 1 and go to step (220).
(223) Skip data block: advance P3 one step so that it points to the next index node. If P3 is non-null, go to step (215); otherwise the tail of the index node linked list has been reached, so go to the next step.
(224) Store data block: prepend a data block header to the data block Dr and write the data block flag and the size of Dr into that header. If the data area of the working container is non-empty, append Dr, with its header, after the existing data in the data area of the working container; otherwise write Dr, with its header, at the start of the data area of the working container. Write the data block flag into the type field of Block and the position of Dr in the data area of the working container into the offset field of Block, then go to step (226).
(225) Store Delta block: prepend a Delta block header to the Delta block Δ0,r and write into that header the Delta block flag, the size of Δ0,r and the reference block address <cid0, offset0> of Δ0,r. If the data area of the working container is non-empty, append Δ0,r, with its header, after the existing data in the data area of the working container; otherwise write Δ0,r, with its header, at the start of the data area of the working container. Write the Delta block flag into the type field of Block and the position of Δ0,r in the data area of the working container into the offset field of Block; empty the stack Stack.
(226) Update the similarity buffer: create a similarity buffer index node in memory, denoted Node1; copy the type field value and offset field value of Block to the type field and offset field of Node1 respectively; write the container identifier of the working container into the container identifier field of Node1; insert <sign1, Node1> into the similarity buffer.
(227) Check whether all data blocks have been processed: advance P1 one step so that it points to the next similarity signature block in the similarity signature area of the working container, and advance P2 one step so that it points to the next data block in the data area of the upper-layer container. If P2 is null, all data blocks in the upper-layer container have been processed, so go to step (229); otherwise go to step (210).
(228) Store the original container:
First: starting from the first data block, process each data block in the data area of the upper-layer container in turn. Prepend a data block header to the data block and write the data block flag and the size of the data block into that header. If the data area of the working container is non-empty, append the data block, with its header, after the existing data in the data area of the working container; otherwise write the data block, with its header, at the start of the data area of the working container. For the similarity signature block in the similarity signature area of the working container that corresponds to this data block, write the data block flag into its type field and the position of the data block in the data area of the working container into its offset field.
Second: discard the upper-layer container; compute the size of the working container and write it into the container header of the working container. If the container storage pool is non-empty, append the working container after the existing data in the container storage pool; otherwise write the working container at the start of the container storage pool.
Last: write into the container index the container identifier of the working container just written to the container storage pool together with the position of that working container in the container storage pool, then go to step (230).
(229) Store the new container:
First: discard the upper-layer container; compute the size of the working container and write it into the container header of the working container. If the container storage pool is non-empty, append the working container after the existing data in the container storage pool; otherwise write the working container at the start of the container storage pool.
Second: write into the container index the container identifier of the working container just written to the container storage pool together with the position of that working container in the container storage pool.
(230) Check for end of run: determine whether an end-of-run instruction has been received. If not, go to step (202); if so, go to the next step.
(231) Terminate:
First: stop receiving containers sent by the upper-layer block-level deduplication storage system.
Then: write the similarity index in memory to the configuration file.
Last: destroy the similarity index, the container buffer, the similarity buffer and the stack Stack in memory, then exit.
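Steps (205) to (207) can be illustrated with a minimal Python sketch. The text does not say how the 1/Sr sampling is performed, so the rule used here (keep signatures divisible by Sr) is an assumption, as are the function names and the use of a plain dict in place of the similarity index. The sketch also excludes the working container's own identifier from the candidates, which is an assumption made because step (206) inserts that identifier before step (207) queries the index.

```python
from collections import Counter

def extract_hooks(signatures, sr):
    """Sample similarity signatures at roughly 1/sr.

    Assumption: a signature is kept when it is divisible by sr; the
    source only states that sampling happens at a rate of 1/Sr.
    """
    return [s for s in signatures if s % sr == 0]

def container_signature(signatures):
    """Step (205): the smallest similarity signature represents the container."""
    return min(signatures)

def find_similar_containers(index, hooks, own_cid, s):
    """Step (207): pick at most s containers sharing the most hook signatures.

    index maps hook signature -> iterable of container identifiers
    (a plain dict standing in for the similarity index).
    """
    shared = Counter()
    for hook in hooks:
        for cid in index.get(hook, ()):
            if cid != own_cid:  # assumption: skip the working container itself
                shared[cid] += 1
    return [cid for cid, _ in shared.most_common(s)]
```

An empty result from find_similar_containers corresponds to the "none is found" branch that leads to step (228).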
The container restore algorithm comprises the following steps in order:
(301) Initialization: create an empty container buffer in memory for temporarily holding containers read into memory from the container storage pool on disk; create an empty stack in memory, denoted Stack.
(302) Receive read command: receive a read-container command sent by the upper-layer block-level deduplication storage system and extract the container identifier from the command, denoted cid; create an empty container in the upper-layer format in memory, denoted the upper-layer container.
(303) Read container: read the container whose identifier is cid from the container storage pool on disk, denoted the working container, and write the working container into the container buffer.
(304) Restore metadata: according to the type and format requirements of the upper-layer container metadata, read the corresponding metadata from the metadata area of the working container and write the metadata read into the metadata area of the upper-layer container.
(305) Prepare to process the data area: set a read pointer P1 to point to the first object in the data area of the working container.
(306) Judge object: if the object pointed to by P1 is a data block, denote it Dr and go to step (312); otherwise it is a Delta block, so go to the next step.
(307) Push onto the stack: push the Delta block onto Stack, read the reference block address from the Delta block header of the Delta block, and denote the address read as <cid1, offset1>, where cid1 is a container identifier and offset1 is the position of the reference block in the data area of container cid1.
(308) Read reference block: if container cid1 is in the container buffer, read the reference block <cid1, offset1> from the container buffer; otherwise, first read container cid1 from the container storage pool into the container buffer and then read the reference block <cid1, offset1>.
(309) Judge reference block: if the reference block <cid1, offset1> is a Delta block, go to step (307); otherwise it is a data block, so store it in the variable D and go to the next step.
(310) Pop the stack: pop a Delta block from Stack, denoted Δ; perform the inverse Delta operation on D and Δ and store the result in the variable D.
(311) Check the stack: if Stack is non-empty, go to step (310); otherwise denote the content of the variable D as the data block Dr and go to the next step.
(312) Copy data block: if the data area of the upper-layer container is non-empty, append the data block Dr after the existing data in the data area of the upper-layer container; otherwise write Dr at the start of the data area of the upper-layer container.
(313) Check the data area: advance the read pointer P1 one step so that it points to the next object in the data area of the working container. If P1 is non-null, go to step (306); otherwise the data area has been fully processed, so go to the next step.
(314) Check for end of run: send the completed upper-layer container to the upper-layer block-level deduplication storage system. If no end-of-run command has been received, go to step (302); otherwise go to the next step.
(315) Terminate: destroy the container buffer and the stack Stack, then exit.
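The stack-based chain resolution of steps (306) to (311) can be sketched in Python as follows. The Delta encoding is not specified in the text, so the copy/insert operation list used here is a hypothetical toy format, and the dict named store is only a stand-in for reading blocks at <cid, offset> addresses through the container buffer.

```python
def apply_delta(reference: bytes, ops) -> bytes:
    """Inverse Delta operation for a toy format (an assumption, not the
    patent's encoding): ("copy", start, length) copies bytes from the
    reference block, ("insert", data) emits literal bytes."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += reference[start:start + length]
        else:
            out += op[1]
    return bytes(out)

def restore_block(store, addr):
    """Restore the data block stored at addr.

    store maps (cid, offset) -> ("data", bytes) or
    ("delta", reference_addr, ops).
    """
    stack = []
    obj = store[addr]
    # Steps (307)-(309): follow reference addresses, pushing every Delta
    # block, until a plain data block is reached.
    while obj[0] == "delta":
        stack.append(obj)
        obj = store[obj[1]]
    data = obj[1]
    # Steps (310)-(311): pop Delta blocks and apply them in reverse order.
    while stack:
        _, _, ops = stack.pop()
        data = apply_delta(data, ops)
    return data
```

The number of iterations of the first loop equals the Delta chain length, which is why the storage algorithm bounds that length with the parameter L.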
The similarity index is an in-memory hash table. The hash table contains an array of buckets; each bucket in the array has a number, and a hash function establishes the mapping between hook signatures and bucket numbers. A hook signature mapped to a bucket is stored in a hook signature node. Each hook signature node stores one unique hook signature and is associated with one container identifier queue, which stores the identifiers of the containers that contain that hook signature. The hook signature node consists of a hook signature field, an overflow chain pointer field and a container identifier queue pointer field. The hook signature field stores one unique hook signature. The overflow chain pointer field is used when handling hash collisions and stores the address of another hook signature node mapped to the same bucket. The container identifier queue pointer field stores the head address of the container identifier queue associated with the hook signature node.
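The structure described above can be sketched in Python as a chained hash table. This is only an illustration under stated assumptions: the class and method names, the bucket count and the use of Python's built-in hash as the hash function are all hypothetical, and a deque stands in for the container identifier queue.

```python
from collections import deque

class HookNode:
    """Hook signature node: signature, overflow pointer, identifier queue."""
    def __init__(self, hook):
        self.hook = hook           # hook signature field
        self.overflow = None       # overflow chain pointer field
        self.containers = deque()  # container identifier queue

class SimilarityIndex:
    def __init__(self, num_buckets=1024):
        self.buckets = [None] * num_buckets  # bucket array

    def _bucket(self, hook):
        # Assumption: Python's hash() stands in for the hash function.
        return hash(hook) % len(self.buckets)

    def _find(self, hook):
        node = self.buckets[self._bucket(hook)]
        while node is not None and node.hook != hook:
            node = node.overflow  # walk the overflow chain on collision
        return node

    def insert(self, hook, cid):
        """Insert the mapping <hook, cid>."""
        node = self._find(hook)
        if node is None:
            b = self._bucket(hook)
            node = HookNode(hook)
            node.overflow = self.buckets[b]
            self.buckets[b] = node
        if cid not in node.containers:
            node.containers.append(cid)

    def lookup(self, hook):
        """Return the identifiers of all containers holding this hook."""
        node = self._find(hook)
        return list(node.containers) if node else []
```

For example, inserting the same hook signature for two containers and then calling lookup returns both identifiers in insertion order.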
The similarity buffer is an in-memory hash table. The hash table contains an array of buckets; each bucket in the array has a number, and a hash function establishes the mapping between similarity signatures and bucket numbers. A similarity signature mapped to a bucket is stored in a similarity signature node. Each similarity signature node stores one unique similarity signature and is associated with one index node linked list. The list stores index nodes, each of which holds the information of one data block or Delta block having that similarity signature; the similarity signature of a Delta block is the similarity signature of the data block to which the Delta block corresponds. The similarity signature node consists of a similarity signature field, an overflow chain pointer field and an index node linked list pointer field. The similarity signature field stores one unique similarity signature. The overflow chain pointer field is used when handling hash collisions and stores the address of another similarity signature node mapped to the same bucket. The index node linked list pointer field stores the head address of the index node linked list associated with the similarity signature node. The index node consists of a type field, a container identifier field, an offset field and a linked list pointer field. The type field stores the data block flag or the Delta block flag; the container identifier field and the offset field give the address of the data block or Delta block; the linked list pointer field stores the address of the next index node in the index node linked list.
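A minimal Python sketch of the similarity buffer follows. The names are hypothetical, Python's built-in hash stands in for the hash function, and appending new index nodes at the tail of the list is an assumption, since the text does not state where new nodes are linked.

```python
class IndexNode:
    """Index node: describes one data block or Delta block."""
    def __init__(self, kind, cid, offset):
        self.kind = kind      # type field: "data" or "delta"
        self.cid = cid        # container identifier field
        self.offset = offset  # offset field
        self.next = None      # linked list pointer field

class SignatureNode:
    """Similarity signature node."""
    def __init__(self, sign):
        self.sign = sign      # similarity signature field
        self.overflow = None  # overflow chain pointer field
        self.head = None      # index node linked list pointer field

class SimilarityBuffer:
    def __init__(self, num_buckets=1024):
        self.buckets = [None] * num_buckets

    def _find(self, sign):
        node = self.buckets[hash(sign) % len(self.buckets)]
        while node is not None and node.sign != sign:
            node = node.overflow
        return node

    def insert(self, sign, index_node):
        """Insert <sign, index_node>; tail append is an assumption."""
        node = self._find(sign)
        if node is None:
            b = hash(sign) % len(self.buckets)
            node = SignatureNode(sign)
            node.overflow = self.buckets[b]
            self.buckets[b] = node
        if node.head is None:
            node.head = index_node
            return
        cur = node.head
        while cur.next is not None:
            cur = cur.next
        cur.next = index_node

    def chain(self, sign):
        """Yield the index nodes that share this similarity signature."""
        node = self._find(sign)
        cur = node.head if node else None
        while cur is not None:
            yield cur
            cur = cur.next

    def clear(self):
        """Empty the buffer, as done when a new container is received."""
        self.buckets = [None] * len(self.buckets)
```

Walking chain(sign) corresponds to advancing the read pointer P3 along the index node linked list in the container storage algorithm.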
The container buffer is a well-known in-memory linked list. Containers read into the container buffer are linked into this list; when the container buffer is full, the well-known least-recently-used replacement algorithm is used to delete containers from the list. The working container and its similar containers remain in the container buffer until a new working container and its similar containers are read into the container buffer, at which point the old working container and its similar containers may be selected by the replacement algorithm and deleted from the container buffer.
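The least-recently-used behaviour can be sketched in Python as below. Note the substitution: the text describes a linked list, whereas this sketch uses collections.OrderedDict for brevity. The capacity value and the load callback (standing in for a read from the container storage pool) are hypothetical.

```python
from collections import OrderedDict

class ContainerBuffer:
    """LRU container buffer; OrderedDict replaces the linked list of the text."""
    def __init__(self, capacity, load):
        self.capacity = capacity
        self.load = load            # hypothetical callback: cid -> container
        self.items = OrderedDict()  # least recently used entry comes first

    def put(self, cid, container):
        self.items[cid] = container
        self.items.move_to_end(cid)
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

    def get(self, cid):
        if cid in self.items:
            self.items.move_to_end(cid)     # mark as most recently used
            return self.items[cid]
        container = self.load(cid)          # miss: read from the pool
        self.put(cid, container)
        return container
```

Because the working container and its similar containers are touched constantly during compression, they stay at the recently used end and are only evicted after a new working set displaces them.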
The container storage pool resides on the disk device and stores containers. The container index also resides on the disk device and establishes the mapping between a container identifier and the position, in the container storage pool, of the container that has that identifier.
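A minimal Python sketch of an append-only pool with its index is given below. It makes several assumptions: io.BytesIO stands in for the disk device, a dict stands in for the on-disk container index, and the index also records the container size for simplicity, whereas the text stores the size in the container header.

```python
import io

class ContainerPool:
    def __init__(self):
        self.disk = io.BytesIO()  # stand-in for the disk device
        self.index = {}           # container index: cid -> (offset, size)

    def append(self, cid, payload: bytes) -> int:
        """Append a container after the existing data and index its position."""
        offset = self.disk.seek(0, io.SEEK_END)
        self.disk.write(payload)
        self.index[cid] = (offset, len(payload))
        return offset

    def read(self, cid) -> bytes:
        """Read a container back through the container index."""
        offset, size = self.index[cid]
        self.disk.seek(offset)
        return self.disk.read(size)
```

Writes always land at the end of the pool, which matches the sequential-append behaviour the text relies on to avoid small random disk writes.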
The present invention proposes a Delta compression technique based on block-level data deduplication. Applied at the back end of current mainstream block-level deduplication storage systems, it Delta-compresses similar data blocks, eliminates byte-level duplicate data, and further improves the deduplication ratio and storage space utilization. The invention identifies similar data blocks by computing and comparing the similarity signatures of data blocks: data blocks with the same similarity signature are similar data blocks. Processing is done in units of containers, and back-end operations such as container compression, storage and restoration are transparent to the upper-layer system, giving seamless integration with it. Data buffering and indexing techniques allow similar data to be found immediately and effectively improve the performance of Delta compression and of container reads and writes, so that the technique can meet the needs of large-scale, high-performance data backup. The specific advantages are as follows:
1. Delta compression is applied to similar data blocks, eliminating byte-level duplicate data and further improving the deduplication ratio and storage space utilization.
2. The invention can be used at the back end without modifying the existing block-level deduplication storage system.
3. Processing is done in units of containers, which preserves the redundancy locality of the data stream; together with the similarity index, the similarity buffer and the container buffer, this effectively improves back-end data processing performance.
4. Containers are appended sequentially to the container storage pool on disk, which avoids small random disk writes and gives high read and write performance.
Specific embodiment
The invention discloses a Delta compression storage component based on block-level data deduplication. As shown in Fig. 1, the Delta compression storage component includes a container access module. The container access module runs a container storage algorithm and a container restore algorithm using the similarity index, similarity buffer and container buffer data structures. The container storage algorithm receives write-container commands sent by the upper-layer block-level deduplication storage system, Delta-compresses the container, and writes the Delta-compressed container into the container storage pool on disk. The container restore algorithm receives read-container commands sent by the upper-layer block-level deduplication storage system, reads the specified container from the container storage pool on disk by way of the container index, restores the container that was read, and returns it to the upper-layer block-level deduplication storage system. The Delta compression storage component runs at the back end of the block-level deduplication storage system and Delta-compresses the containers sent by that system, further eliminating byte-level duplicate data between similar data blocks and thereby improving the deduplication ratio and storage space utilization. The invention identifies similar data blocks by computing and comparing the similarity signatures of data blocks and processes data in units of containers; back-end operations such as container compression, storage and restoration are transparent to the upper-layer system, giving seamless integration with it. The similarity index, similarity buffer and container buffer allow similar data to be found immediately and effectively improve the performance of Delta compression and of container reads and writes, so that the technique can meet the needs of large-scale, high-performance data backup.
As shown in Fig. 2, the container storage algorithm comprises the following steps in order:
(201) Initialization:
First: read the parameters S, R, Sr and L from the configuration file. The configuration file resides on the disk device and records the configuration information of the system.
The parameter S is a preset positive integer giving the maximum number of similar containers read into memory from the container storage pool on disk during Delta compression; similar containers are containers with similar content. Setting S too large reduces the Delta compression performance of data blocks, while setting it too small reduces the Delta compression ratio of data blocks. In implementation, S may be set to 2, 3 or 4.
The parameter R is a preset positive number giving the minimum allowed Delta compression ratio. The Delta compression ratio is the ratio of the data block size to the Delta block size after a data block has been Delta-compressed against a reference block to produce a Delta block. In implementation, R may be set to 2, 2.5 or 3.
The parameter Sr is a preset positive integer; 1/Sr is the hook signature sampling rate. The hook signature sampling rate is an important parameter. If it is too small, too few hook signatures are generated, which reduces the accuracy of finding similar data blocks; if it is too large, too many hook signatures are generated, which makes the similarity index too large and the memory overhead high. In implementation, Sr may be set to 64 or 32 depending on the scale of the system.
The parameter L is a preset positive integer giving the maximum Delta chain length. Setting L too small reduces the Delta compression effect; setting it too large reduces data read and write performance while yielding little additional compression benefit. In implementation, L may be set to 5, 6 or 7.
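The four parameters and the acceptance test they drive can be sketched in Python as follows. The class name, function names and default values are hypothetical; the defaults are merely picked from the example ranges given above. The acceptance rule combines the ratio check against R with the chain length check against L, as in the long-chain Delta step; the short-chain step checks only the ratio, which this sketch covers with a default chain length of 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeltaConfig:
    S: int = 3      # max similar containers read into memory (text: 2, 3 or 4)
    R: float = 2.0  # minimum allowed Delta compression ratio (text: 2, 2.5 or 3)
    Sr: int = 64    # hook signature sampling rate is 1/Sr (text: 64 or 32)
    L: int = 6      # maximum Delta chain length (text: 5, 6 or 7)

def delta_ratio(data_size: int, delta_size: int) -> float:
    """Delta compression ratio: data block size over Delta block size."""
    if delta_size <= 0:
        raise ValueError("delta_size must be positive")
    return data_size / delta_size

def compression_accepted(cfg: DeltaConfig, data_size: int,
                         delta_size: int, chain_length: int = 1) -> bool:
    """Accept a Delta block only if ratio >= R and chain length <= L."""
    return (delta_ratio(data_size, delta_size) >= cfg.R
            and chain_length <= cfg.L)
```

With the defaults above, an 8192-byte block that compresses to a 1024-byte Delta block (ratio 8) is accepted, while the same result on a chain of length 7 is rejected.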
Then: determine whether the system is in its initial configuration stage. If so, create an empty similarity index in memory; if not, read the backed-up similarity index from the configuration file into memory.
The similarity index is an in-memory hash table. As shown in Fig. 8, the hash table contains an array of buckets; each bucket in the array has a number, and a hash function establishes the mapping between hook signatures and bucket numbers. A hook signature mapped to a bucket is stored in a hook signature node. Each hook signature node stores one unique hook signature and is associated with one container identifier queue, which stores the identifiers of the containers that contain that hook signature. As shown in Fig. 9, the hook signature node consists of a hook signature field, an overflow chain pointer field and a container identifier queue pointer field. The hook signature field stores one unique hook signature. The overflow chain pointer field is used when handling hash collisions and stores the address of another hook signature node mapped to the same bucket. The container identifier queue pointer field stores the head address of the container identifier queue associated with the hook signature node.
The similarity index establishes the mapping between a hook signature and the containers that contain it. More than one container may contain the same hook signature, so querying the similarity index quickly finds the containers that share a hook signature. The present invention treats containers that share hook signatures as similar containers, that is, containers with similar content.
In this embodiment, the similarity index resides in memory to allow fast queries. The method of determining whether the system is in its initial configuration stage is mature prior art.
Last: create an empty similarity buffer in memory; create an empty container buffer in memory for temporarily holding containers read into memory from the container storage pool on disk; create an empty stack in memory, denoted Stack.
The similarity buffer is an in-memory hash table. As shown in Fig. 10, the hash table contains an array of buckets; each bucket in the array has a number, and a hash function establishes the mapping between similarity signatures and bucket numbers. A similarity signature mapped to a bucket is stored in a similarity signature node. Each similarity signature node stores one unique similarity signature and is associated with one index node linked list. The list stores index nodes, each of which holds the information of one data block or Delta block having that similarity signature; the similarity signature of a Delta block is the similarity signature of the data block to which the Delta block corresponds. As shown in Fig. 11, the similarity signature node consists of a similarity signature field, an overflow chain pointer field and an index node linked list pointer field. The similarity signature field stores one unique similarity signature. The overflow chain pointer field is used when handling hash collisions and stores the address of another similarity signature node mapped to the same bucket. The index node linked list pointer field stores the head address of the index node linked list associated with the similarity signature node. As shown in Fig. 12, the index node consists of a type field, a container identifier field, an offset field and a linked list pointer field. The type field stores the data block flag or the Delta block flag; the container identifier field and the offset field give the address of the data block or Delta block; the linked list pointer field stores the address of the next index node in the index node linked list.
In the similar buffer, each similar signature node is associated with one index node linked list; each index node in the list stores the index information of one data block or Delta block, and all data blocks and Delta blocks in the same list have the same similar signature.
In the present embodiment, the similar buffer is resident in memory, so that the reference block of a data block to be compressed can be found quickly during Delta compression.
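A minimal sketch of the similar buffer follows. It assumes that a dict of lists replaces the hash buckets and index node linked lists, and that the type flags are the integers 0 and 1; these choices and all names are illustrative assumptions. The lookup returns data block nodes before Delta block nodes, mirroring the preference that the description states for steps (212) to (214).

```python
from collections import defaultdict
from dataclasses import dataclass

DATA, DELTA = 0, 1  # hypothetical values for the type field


@dataclass(frozen=True)
class IndexNode:
    kind: int    # DATA or DELTA
    cid: int     # container identifier
    offset: int  # position inside the container's data area


class SimilarBuffer:
    """Maps a similar signature to the index nodes of blocks that carry it."""

    def __init__(self):
        self._lists = defaultdict(list)

    def insert(self, sign, node):
        """Insert the mapping <sign, node>."""
        self._lists[sign].append(node)

    def candidates(self, sign):
        """Potential reference blocks: data blocks first, then Delta blocks."""
        nodes = self._lists.get(sign, [])
        return sorted(nodes, key=lambda n: n.kind)  # stable sort keeps list order

    def clear(self):
        """Empty the buffer before a new working container is processed."""
        self._lists.clear()
```

An empty result from `candidates` corresponds to the case in which no similar signature node is found and the data block is stored as is.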
The container buffer is a well-known in-memory linked list. Containers read into the container buffer are linked into this list; when the container buffer is full, a container is removed from the list using the well-known least-recently-used (LRU) replacement algorithm. The working container and its similar containers stay in the container buffer until a new working container and its similar containers are read in; only then may the old working container and its similar containers be selected by the replacement algorithm and deleted from the container buffer.
The container buffer helps improve data read and write performance. Because a container preserves the redundancy locality of the data stream, the blocks similar to the data blocks of one container are very likely to reside together in one container. Reading a whole container from disk at a time therefore not only avoids small random write I/O on disk, but also raises the buffer hit rate and reduces the number of disk reads and writes.
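The replacement behaviour of the container buffer can be sketched as follows. The sketch assumes an `OrderedDict` in place of the in-memory linked list, a hypothetical `load` callback that reads a whole container from the container storage pool, and a `pinned` set as one possible way of keeping the working container and its similar containers resident; none of these names come from the embodiment.

```python
from collections import OrderedDict


class ContainerBuffer:
    """LRU buffer of containers keyed by container identifier."""

    def __init__(self, capacity, load):
        self.capacity = capacity
        self.load = load             # hypothetical callback: cid -> container
        self.pinned = set()          # working container and its similar containers
        self._items = OrderedDict()  # least recently used entry first

    def get(self, cid):
        if cid in self._items:
            self._items.move_to_end(cid)  # mark as most recently used
            return self._items[cid]
        container = self.load(cid)        # miss: read the whole container
        self._items[cid] = container
        self._evict(keep=cid)
        return container

    def _evict(self, keep):
        while len(self._items) > self.capacity:
            victim = next((c for c in self._items
                           if c != keep and c not in self.pinned), None)
            if victim is None:            # everything left is pinned
                break
            del self._items[victim]
```

Clearing `pinned` and refilling it when a new working container arrives would correspond to releasing the old working container and its similar containers to the replacement algorithm.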
(202), receive container: receive a write-container command sent by the upper-layer block-level data deduplication storage system, and extract the container to be written from the command, denoted the upper-layer container; generate an empty format container in memory, denoted the working container, and write the working container into the container buffer; empty the similar buffer for use. The upper-layer container is the container used by the upper-layer block-level data deduplication storage system; the format container is the container used by the container access module of the Delta compression storage assembly. As shown in Fig. 4, a container consists of a metadata area and a data area; the metadata area is read and written from top to bottom and the data area from bottom to top, and once written, the metadata area and the data area are joined and packed into a container. The metadata area consists of a container header, a fingerprint area, a similar signature area and a hook signature area. The container header consists of a container identifier field, a size field and a container signature field, which store the container identifier, the container size and the container similar signature respectively. The fingerprint area stores data block fingerprints; the similar signature area stores similar signature blocks. As shown in Fig. 5, a similar signature block consists of a similar signature field, a type field and an offset field: the similar signature field stores the similar signature of the corresponding data block; the type field stores a data block flag or a Delta block flag; the offset field stores the address of the corresponding data block or Delta block within the data area. The hook signature area stores hook signatures; the data area stores data blocks or Delta blocks. When a data block is stored in the data area, a data block header is prepended to it; as shown in Fig. 6, the data block header consists of a data block flag field and a data block size field. When a Delta block is stored in the data area, a Delta block header is prepended to it; as shown in Fig. 7, the Delta block header consists of a Delta block flag field, a Delta block size field and a reference block address field, and the reference block address field consists of a container identifier field and an offset field;
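The data block header and Delta block header described above can be illustrated with a minimal sketch. The field widths (a 1-byte flag, a 4-byte size, an 8-byte container identifier and a 4-byte offset), the little-endian layout and the flag values are assumptions made only for illustration, since the description does not fix them; the bottom-to-top write order of the data area is also not modelled.

```python
import struct

DATA_FLAG, DELTA_FLAG = 0, 1         # hypothetical flag values
DATA_HDR = struct.Struct("<BI")      # flag, block size
DELTA_HDR = struct.Struct("<BIQI")   # flag, block size, reference cid, reference offset


def pack_data_block(payload: bytes) -> bytes:
    """Prepend a data block header to a data block."""
    return DATA_HDR.pack(DATA_FLAG, len(payload)) + payload


def pack_delta_block(delta: bytes, ref_cid: int, ref_offset: int) -> bytes:
    """Prepend a Delta block header carrying the reference block address."""
    return DELTA_HDR.pack(DELTA_FLAG, len(delta), ref_cid, ref_offset) + delta


def unpack_block(area: bytes, offset: int = 0):
    """Return (kind, payload, reference address or None) for the block at offset."""
    if area[offset] == DATA_FLAG:
        _, size = DATA_HDR.unpack_from(area, offset)
        start = offset + DATA_HDR.size
        return "data", area[start:start + size], None
    _, size, cid, ref_offset = DELTA_HDR.unpack_from(area, offset)
    start = offset + DELTA_HDR.size
    return "delta", area[start:start + size], (cid, ref_offset)
```

Because the flag is the first byte of both headers, a reader can decide how to parse an object in the data area before it knows the object's size.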
(203), copy fingerprints: read the container identifier from the metadata area of the upper-layer container and write it into the container identifier field of the container header of the working container; read the data block fingerprints from the metadata area of the upper-layer container and write them, in their original order, into the fingerprint area of the working container;
(204), calculate similar signatures: calculate in turn the similar signature of each data block in the data area of the upper-layer container; generate a similar signature block for each similar signature and write the similar signature into the similar signature field of that block; write the similar signature blocks into the similar signature area of the working container in the order of their corresponding data blocks;
The method of calculating the similar signature of a data block is mature prior art, as follows: starting from the beginning of the data block, slide a window of fixed size over the data block; each time the window advances by one byte, compute the Rabin fingerprint of the data patch that falls within the window using the well-known Rabin fingerprint algorithm; take the smallest Rabin fingerprint over all data patches as the similar signature of the data block.
In the present embodiment, the window size is a predetermined constant, which may be 512 bytes, and the length of a Rabin fingerprint may be 4 bytes.
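The sliding-window minimum described above can be sketched as follows. Note the substitution: the sketch uses a simple polynomial rolling hash modulo 2^32 as a stand-in for the Rabin fingerprint, so it shows the structure of the calculation, not the Rabin algorithm itself. The base 257 and the handling of blocks shorter than the window are assumptions.

```python
MASK = 0xFFFFFFFF  # keep fingerprints to 4 bytes
BASE = 257         # assumed base of the stand-in rolling hash


def similar_signature(block: bytes, window: int = 512) -> int:
    """Smallest rolling fingerprint over all window-sized patches of block."""
    fingerprint = 0
    for byte in block[:window]:
        fingerprint = (fingerprint * BASE + byte) & MASK
    if len(block) <= window:
        return fingerprint  # assumed: a short block forms a single patch
    top = pow(BASE, window - 1, 1 << 32)  # weight of the byte leaving the window
    smallest = fingerprint
    for i in range(window, len(block)):
        fingerprint = (fingerprint - block[i - window] * top) & MASK  # drop oldest byte
        fingerprint = (fingerprint * BASE + block[i]) & MASK          # take in new byte
        smallest = min(smallest, fingerprint)
    return smallest
```

Two blocks that share the patch with the smallest fingerprint obtain the same similar signature, which is why blocks with largely identical content tend to collide on this value.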
(205), extract hook signatures: sample all similar signatures contained in the working container at a ratio of 1/Sr, take the sampled similar signatures as hook signatures, and write the hook signatures in order into the hook signature area of the working container; take the smallest of all similar signatures contained in the working container as the similar signature of the container, and write it into the container signature field of the container header of the working container;
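Step (205) can be illustrated with a minimal sketch. The description states only the sampling ratio 1/Sr; the rule used here, keeping the similar signatures that are divisible by Sr, is an assumption chosen so that two containers holding the same similar signature select the same hook. The function names are hypothetical.

```python
def extract_hooks(similar_signatures, sr):
    """Sample roughly 1/sr of the signatures (assumed rule: value divisible by sr)."""
    seen, hooks = set(), []
    for signature in similar_signatures:
        if signature % sr == 0 and signature not in seen:
            seen.add(signature)
            hooks.append(signature)  # keep the original order
    return hooks


def container_signature(similar_signatures):
    """The smallest similar signature serves as the container's similar signature."""
    return min(similar_signatures)
```

A positional rule, such as taking every Sr-th signature, would satisfy the stated ratio as well, but would make the chosen hooks depend on block order inside the container.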
(206), update similarity index:
First: assign the container identifier of the working container to the variable cid;
Secondly: for each hook signature contained in the hook signature area of the working container, proceed as follows: assign the hook signature to the variable hook, generate a mapping <hook, cid>, and insert <hook, cid> into the similarity index;
In the present embodiment, inserting <hook, cid> into the similarity index is equivalent to inserting <key, value> into an in-memory hash table, which is mature prior art.
(207), search for similar containers: query the similarity index to find the containers that share hook signatures with the working container; if none is found, go to step (228); otherwise, from the containers found, select at most S containers in descending order of the number of hook signatures shared with the working container, and treat the selected containers as the similar containers of the working container; look up these similar containers in the container buffer, and read those not in the container buffer from the container storage pool into the container buffer;
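The selection in step (207) can be sketched as follows, assuming the similarity index is represented by a plain dict from hook signature to container identifiers. Excluding the working container's own identifier is an interpretation: by step (206) its hooks are already in the index, so it would otherwise match itself. The function name is hypothetical.

```python
from collections import Counter


def select_similar_containers(index, hooks, own_cid, s):
    """Return at most s container identifiers, most shared hooks first."""
    shared = Counter()
    for hook in hooks:
        for cid in index.get(hook, ()):
            if cid != own_cid:  # assumed: a container is not similar to itself
                shared[cid] += 1
    return [cid for cid, _ in shared.most_common(s)]
```

An empty result corresponds to the branch in which no similar container is found and the algorithm goes to step (228).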
(208), write similar buffer:
Scan in turn the similar signature area of each similar container of the working container in the container buffer and read all similar signature blocks in it; for each similar signature block read, proceed as follows: generate a similar buffer index node in memory, denoted Node; copy the type field value and the offset field value of the similar signature block to the type field and the offset field of Node respectively; write the container identifier of the container to which the similar signature block belongs into the container identifier field of Node; read the similar signature field of the similar signature block, denote the similar signature read as sign, and insert <sign, Node> into the similar buffer;
In the present embodiment, inserting <sign, Node> into the similar buffer is equivalent to inserting <key, value> into an in-memory hash table, which is mature prior art.
(209), prepare to process data blocks: set a read pointer P1 pointing to the first similar signature block in the similar signature area of the working container, and set a read pointer P2 pointing to the first data block in the data area of the upper-layer container;
(210), read data block: read the data block pointed to by P2 from the data area of the upper-layer container, denoted Dr; read the similar signature block pointed to by P1 from the similar signature area of the working container, denoted Block; read the value of the similar signature field of Block, denoted sign1;
(211), search similar buffer: search the similar buffer for the similar signature node whose similar signature field value is sign1; if it is not found, go to step (224); otherwise, set a read pointer P3 pointing to the first index node of the index node linked list of the similar signature node just found, and go to the next step;
Before Delta compression is applied to data block Dr, a data block with content similar to Dr must be found to serve as the reference block. In the present embodiment, the similar signature sign1 of data block Dr is used as the key to search the similar buffer for the corresponding similar signature node. If it is not found, no reference block exists for Dr, and control goes to step (224) to store Dr as is; if it is found, the index node linked list of that similar signature node holds the information of all potential reference blocks of Dr, and the reference block of Dr is then sought by traversing the index node linked list.
In the operations of steps (212), (213) and (214) below, data block index nodes are examined first; Delta block index nodes are examined only when none of the data blocks pointed to by the data block index nodes is suitable as the reference block of Dr. This order effectively improves Delta compression performance, because a data block can serve directly as a reference block, whereas a Delta block must first be converted into a data block before it can serve as one; the conversion involves traversing the Delta chain, Delta inverse operations and similar work, and its time overhead is large.
(212), judge data block: if the type field value of the index node pointed to by P3 is the Delta block flag, go to step (214); otherwise it is the data block flag, go to the next step;
(213), short-chain Delta operation: assign the values of the container identifier field and the offset field of the index node pointed to by P3 to the variables cid0 and offset0 respectively, and read a data block from address <cid0, offset0> in the container buffer, denoted D0; perform the Delta operation on Dr with D0 as the reference block, generating the Delta block △0,r; if the Delta compression ratio is greater than or equal to R, compression succeeds, go to step (225); otherwise compression fails, go to the next step;
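The acceptance test of step (213) can be illustrated with a minimal sketch. Two assumptions are made: zlib compression with the reference block as a preset dictionary stands in for a Delta tool, and the Delta compression ratio is taken to be the size of Dr divided by the size of the Delta block, which this passage does not define. The function names are hypothetical.

```python
import random
import zlib


def delta_encode(reference: bytes, target: bytes) -> bytes:
    """Stand-in Delta operation: deflate target with reference as dictionary."""
    comp = zlib.compressobj(level=9, zdict=reference)
    return comp.compress(target) + comp.flush()


def delta_decode(reference: bytes, delta: bytes) -> bytes:
    """Stand-in Delta inverse operation."""
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(delta) + decomp.flush()


def try_delta(reference: bytes, target: bytes, r: float):
    """Return the Delta block if the assumed ratio reaches r, otherwise None."""
    delta = delta_encode(reference, target)
    ratio = len(target) / len(delta)  # assumed definition of the compression ratio
    return delta if ratio >= r else None


rng = random.Random(0)
d0 = bytes(rng.getrandbits(8) for _ in range(4096))  # reference block D0
dr = d0[:2000] + b"edited" + d0[2000:]               # Dr: D0 with a small insertion
delta = try_delta(d0, dr, r=2.0)
```

With a similar reference block the Delta block is far smaller than Dr and the test succeeds; with an unrelated reference block the function returns None, which corresponds to moving on to the next index node.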
(214), skip Delta block: advance P3 by one step to point to the next index node; if P3 is not null, go to step (212); otherwise the tail of the index node linked list has been reached: point P3 back to the first index node of the list and go to the next step;
Delta block index nodes are examined further only when none of the data blocks pointed to by the data block index nodes is suitable as the reference block of Dr. The operations of steps (215) to (223) below examine and process each Delta block index node in the index node linked list in turn, until either a suitable reference block is found, Delta compression of Dr succeeds and control goes to step (225) to store the Delta block of Dr, or no suitable reference block can be found and control goes to step (224) to store Dr as is.
For any Delta block index node in the index node linked list, examination and processing consist of two processes: the first traverses the Delta chain, and the second tests reference blocks.
Steps (216) to (219) below are the Delta chain traversal process: starting from the Delta block pointed to by the Delta block index node, each Delta block on the Delta chain is read along the direction of the chain until the data block at the end of the chain is reached.
Steps (220) to (222) below are the reference block testing process. The process first performs Delta compression on Dr with the data block at the end of the Delta chain as the reference block; if compression succeeds, control goes to step (225) to store the Delta block of Dr. Otherwise, Delta inverse operations are performed against the direction of the Delta chain, and Delta compression is performed on Dr with each data block produced by a Delta inverse operation as the reference block, until either Delta compression succeeds and control goes to step (225) to store the Delta block of Dr, or all Delta blocks in the Delta chain have been tested without finding a suitable reference block; in the latter case control goes to step (223) to examine and process the next Delta block index node in the index node linked list.
(215), judge Delta block: if the type field value of the index node pointed to by P3 is the data block flag, go to step (223); otherwise it is the Delta block flag, go to the next step;
(216), read Delta block: the values of the container identifier field and the offset field of the index node pointed to by P3 give the address of the Delta block in the container buffer; read a Delta block from that address;
(217), push onto stack: push the Delta block onto Stack, read the reference block address from the Delta block header of the Delta block, and store the address read in the variable <cid1, offset1>, where cid1 is a container identifier and offset1 is the position of the reference block in the data area of container cid1;
(218), read reference block: if container cid1 is in the container buffer, read the reference block <cid1, offset1> from the container buffer; otherwise, first read container cid1 from the container storage pool into the container buffer, then read the reference block <cid1, offset1>;
(219), judge reference block: if the reference block <cid1, offset1> is a Delta block, go to step (217); otherwise it is a data block: store the content of the data block in the variable D0, assign <cid1, offset1> to the variable <cid0, offset0>, assign 1 to the variable length, and go to the next step;
(220), long-chain Delta operation: perform the Delta operation on Dr with D0 as the reference block, generating the Delta block △0,r; if the Delta compression ratio is greater than or equal to R and length is less than or equal to L, compression succeeds, go to step (225); otherwise compression fails, go to the next step;
(221), judge stack: if Stack is empty, go to step (223); otherwise go to the next step;
(222), pop stack: pop a Delta block from Stack, denoted △, and store the address of △ in the variable <cid0, offset0>; perform the Delta inverse operation on D0 and △, store the result in the variable D0, increase the value of the variable length by 1, and go to step (220);
(223), skip data block: advance P3 by one step to point to the next index node; if P3 is not null, go to step (215); otherwise the tail of the index node linked list has been reached, go to the next step;
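The traversal and testing order of steps (216) to (222) can be sketched as a generator of candidate reference blocks. The sketch assumes that blocks are held in a dict keyed by <cid, offset>, that the stack holds addresses instead of the Delta blocks themselves, and that `decode` is any Delta inverse operation supplied by the caller; all names are hypothetical.

```python
from typing import Callable, Dict, Iterator, NamedTuple, Optional, Tuple

Address = Tuple[int, int]  # <container identifier, offset>


class StoredBlock(NamedTuple):
    is_delta: bool
    payload: bytes
    ref: Optional[Address] = None  # reference block address, Delta blocks only


def chain_candidates(
    blocks: Dict[Address, StoredBlock],
    start: Address,
    decode: Callable[[bytes, bytes], bytes],
) -> Iterator[Tuple[Address, bytes, int]]:
    """Yield (address, reconstructed data, length) from the chain end backwards."""
    stack = []
    addr = start
    while blocks[addr].is_delta:  # steps (216) to (219): walk along the chain
        stack.append(addr)
        addr = blocks[addr].ref
    d0, length = blocks[addr].payload, 1
    yield addr, d0, length        # step (220) tests the chain-end data block first
    while stack:                  # steps (221) and (222): undo one Delta per pop
        addr = stack.pop()
        d0 = decode(d0, blocks[addr].payload)
        length += 1
        yield addr, d0, length
```

A caller would stop at the first candidate for which the Delta compression ratio is at least R and `length` is at most L, and would store the yielded address as the reference block address of the new Delta block.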
(224), store data block: prepend a data block header to data block Dr, and write the data block flag and the size of Dr into the data block header; if the data area of the working container is not empty, append Dr with its header after the existing data in the data area of the working container; otherwise write Dr with its header at the start position of the data area of the working container; write the data block flag into the type field of Block, write the position of Dr in the data area of the working container into the offset field of Block, and go to step (226);
(225), store Delta block: prepend a Delta block header to Delta block △0,r, and write into the Delta block header the Delta block flag, the size of △0,r and the reference block address <cid0, offset0> of △0,r; if the data area of the working container is not empty, append △0,r with its header after the existing data in the data area of the working container; otherwise write △0,r with its header at the start position of the data area of the working container; write the Delta block flag into the type field of Block, and write the position of △0,r in the data area of the working container into the offset field of Block; empty the stack Stack;
(226), update similar buffer: generate a similar buffer index node in memory, denoted Node1; copy the type field value and the offset field value of Block to the type field and the offset field of Node1 respectively; write the container identifier of the working container into the container identifier field of Node1; insert <sign1, Node1> into the similar buffer;
In the present embodiment, inserting <sign1, Node1> into the similar buffer is equivalent to inserting <key, value> into an in-memory hash table, which is mature prior art.
(227), judge whether data blocks are finished: advance P1 by one step to point to the next similar signature block in the similar signature area of the working container, and advance P2 by one step to point to the next data block in the data area of the upper-layer container; if P2 is null, all data blocks in the upper-layer container have been processed, go to step (229); otherwise go to step (210);
(228), store original container: First: starting from the first data block, process in turn each data block in the data area of the upper-layer container: prepend a data block header to the data block, and write the data block flag and the size of the data block into the data block header; if the data area of the working container is not empty, append the data block with its header after the existing data in the data area of the working container; otherwise write the data block with its header at the start position of the data area of the working container; process the similar signature block corresponding to the data block in the similar signature area of the working container: write the data block flag into the type field of the similar signature block, and write the position of the data block in the data area of the working container into the offset field of the similar signature block;
Secondly: discard the upper-layer container; calculate the size of the working container and write the size into the container header of the working container; if the container storage pool is not empty, append the working container after the existing data in the container storage pool; otherwise write the working container at the start position of the container storage pool;
Finally: write the container identifier of the working container just written into the container storage pool, and the position of the working container in the container storage pool, into the container index, and go to step (230);
(229), store new container:
First: discard the upper-layer container; calculate the size of the working container and write the size into the container header of the working container; if the container storage pool is not empty, append the working container after the existing data in the container storage pool; otherwise write the working container at the start position of the container storage pool;
Secondly: write the container identifier of the working container just written into the container storage pool, and the position of the working container in the container storage pool, into the container index;
The container storage pool is resident on the disk device and is used to store containers.
The container index is resident on the disk device and establishes the mapping between a container identifier and the position, in the container storage pool, of the container with that identifier.
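The relationship between the container storage pool and the container index can be illustrated with a minimal sketch. An in-memory `io.BytesIO` stands in for the pool on the disk device and a dict stands in for the on-disk container index; storing the size next to the position is an added convenience, since in the description the size is kept in the container header. The names are hypothetical.

```python
import io


class ContainerStore:
    """Append-only container storage pool plus container index."""

    def __init__(self):
        self.pool = io.BytesIO()  # stand-in for the pool on the disk device
        self.index = {}           # cid -> (position in pool, size)

    def append(self, cid, container: bytes):
        """Write the container after the existing data and record its position."""
        position = self.pool.seek(0, io.SEEK_END)
        self.pool.write(container)
        self.index[cid] = (position, len(container))

    def read(self, cid) -> bytes:
        """Locate the container through the index and read it whole."""
        position, size = self.index[cid]
        self.pool.seek(position)
        return self.pool.read(size)
```

When the pool is empty, seeking to its end yields position 0, so the first container lands at the start position of the pool without a separate branch.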
(230), judge end of run: judge whether an end-of-run instruction has been received; if not, go to step (202); if so, go to the next step;
(231), end:
First: stop receiving containers sent by the upper-layer block-level data deduplication storage system;
Then: write the similarity index in memory into the configuration file;
Finally: destroy the similarity index, the container buffer, the similar buffer and the stack Stack in memory, then exit.
As shown in Fig. 3, the container recovery algorithm comprises the following steps in order:
(301), initialize: generate an empty container buffer in memory, used to hold in memory the containers read from the container storage pool on disk; generate an empty stack in memory, denoted Stack;
(302), receive read command: receive a read-container command sent by the upper-layer block-level data deduplication storage system, and extract the container identifier from the command, denoted cid; generate an empty container in the upper-layer format in memory, denoted the upper-layer container;
(303), read container: read the container whose identifier is cid from the container storage pool on disk, denoted the working container, and write the working container into the container buffer;
(304), restore metadata: according to the type and format requirements of the upper-layer container metadata, read the corresponding metadata from the metadata area of the working container and write the metadata read into the metadata area of the upper-layer container;
(305), prepare to process data area: set a read pointer P1 pointing to the first object in the data area of the working container;
(306), judge object: if the object pointed to by P1 is a data block, denote the data block Dr and go to step (312); otherwise it is a Delta block, go to the next step;
Steps (307) to (311) below restore the Delta block pointed to by P1 to a data block. The operation consists of two processes: the first traverses the Delta chain, and the second performs the Delta chain inverse operation.
Steps (307), (308) and (309) below are the Delta chain traversal process: starting from the Delta block pointed to by P1, each Delta block on the Delta chain is read along the direction of the chain until the data block at the end of the chain is reached.
Steps (310) and (311) below perform the Delta chain inverse operation, that is, Delta inverse operations are performed against the direction of the Delta chain, and the Delta block pointed to by P1 is finally restored to a data block.
(307), push onto stack: push the Delta block onto Stack, read the reference block address from the Delta block header of the Delta block, and denote the address read <cid1, offset1>, where cid1 is a container identifier and offset1 is the position of the reference block in the data area of container cid1;
(308), read reference block: if container cid1 is in the container buffer, read the reference block <cid1, offset1> from the container buffer; otherwise, first read container cid1 from the container storage pool into the container buffer, then read the reference block <cid1, offset1>;
(309), judge reference block: if the reference block <cid1, offset1> is a Delta block, go to step (307); otherwise it is a data block: store the data block in the variable D and go to the next step;
(310), pop stack: pop a Delta block from Stack, denoted △; perform the Delta inverse operation on D and △, and store the result in the variable D;
(311), judge stack: if Stack is not empty, go to step (310); otherwise denote the content of the variable D as data block Dr and go to the next step;
(312), copy data block: if the data area of the upper-layer container is not empty, append data block Dr after the existing data in the data area of the upper-layer container; otherwise write Dr at the start position of the data area of the upper-layer container;
(313), judge data area: advance read pointer P1 by one step to point to the next object in the data area of the working container; if P1 is not null, go to step (306); otherwise the data area has been processed, go to the next step;
(314), judge end of run: send the processed upper-layer container to the upper-layer block-level data deduplication storage system; if no end-of-run command has been received, go to step (302); otherwise go to the next step;
(315), end: destroy the container buffer and the stack Stack, then exit.
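The data area loop of steps (305) to (313) can be sketched as follows. The sketch assumes that each object in the data area is given as a tuple (is_delta, payload, reference address), that `fetch` returns the object stored at a reference block address (reading through the container buffer in a real implementation), and that `decode` is the Delta inverse operation; these names are hypothetical.

```python
def restore_data_area(objects, fetch, decode):
    """Restore every object of a working container's data area to a data block."""
    restored = []
    for is_delta, payload, ref in objects:  # steps (305), (306) and (313)
        stack = []
        while is_delta:                     # steps (307) to (309): traverse the chain
            stack.append(payload)
            is_delta, payload, ref = fetch(ref)
        data = payload                      # data block at the end of the chain
        while stack:                        # steps (310) and (311): inverse operations
            data = decode(data, stack.pop())
        restored.append(data)               # step (312): copy to the upper-layer container
    return restored
```

Because the stack is emptied for every object, a plain data block passes through both inner loops untouched and is copied as is.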
In implementing the above container storage algorithm and container recovery algorithm, the Delta operation and the Delta inverse operation may use Delta compression tools such as vdelta, xdelta and zdelta, all of which are mature prior art.
In implementation, besides executing the write-container commands and read-container commands sent by the upper-layer block-level data deduplication storage system, the Delta compression storage assembly also executes read-container-metadata commands sent by that system, including the command to read container fingerprints; when executing a read-container-metadata command, the Delta compression storage assembly reads the metadata area of the specified container from the container storage pool on disk and sends the specified metadata to the upper-layer block-level data deduplication storage system.