CN106407226A - Data processing method, backup server and storage system - Google Patents
Data processing method, backup server and storage system
- Publication number
- CN106407226A (application number CN201510468057.9A)
- Authority
- CN
- China
- Prior art keywords
- fingerprint
- index
- stored
- probability
- data block
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1748—De-duplication implemented within the file system, e.g. based on file segments
- G06F16/1752—De-duplication implemented within the file system, e.g. based on file segments based on file chunks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/1666—Error detection or correction of the data by redundancy in hardware where the redundant component is memory or memory area
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1748—De-duplication implemented within the file system, e.g. based on file segments
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Collating Specific Patterns (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The invention discloses a data processing method, a backup server and a storage system, which address the low data storage efficiency caused by fingerprint comparison consuming a large amount of I/O resources. The data processing method comprises the following steps: determining a first fingerprint set according to the index fingerprints in a fingerprint index table and the fingerprint of a data block to be stored, the first fingerprint set including a first index fingerprint and a second index fingerprint; obtaining, according to the first index fingerprint, a first probability that a first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtaining, according to the second index fingerprint, a second probability that a second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored; determining a second fingerprint set according to the first and second probabilities; and obtaining a matching result between the plurality of fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a data processing method, a backup server and a storage system.
Background
In the field of data storage, data deduplication is a key technique for saving storage space. It detects and eliminates redundant data so that only one copy of identical data is kept. This not only saves a considerable amount of disk space but also improves write performance and saves network bandwidth, and the technique is widely used in file backup, online storage services, e-mail services and similar fields.
In the prior art, a storage system stores a fingerprint table, which holds the fingerprints of the data stored in the storage system. On receiving a data storage request, the storage system compares the fingerprint of the data block to be stored with the fingerprints in the fingerprint table to determine whether the data block to be stored is duplicate data, and thereby determines how the data block is to be stored.
However, comparing the fingerprint of the data block to be stored with the fingerprints in the fingerprint table requires reading the fingerprints in the table into the memory of the backup server. Because the fingerprint table holds a massive number of fingerprints, comparing against the entire table moves a large amount of data, consumes substantial input/output (I/O) resources and takes a long time, so the storage system stores data inefficiently.
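The prior-art flow described above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, SHA-256 is an assumed fingerprint function (the text does not name one), and an in-memory set stands in for a fingerprint table that would in practice have to be read from disk, which is where the I/O cost arises.

```python
import hashlib

def fingerprint(block: bytes) -> str:
    # SHA-256 is an assumption; the text does not name a hash function.
    return hashlib.sha256(block).hexdigest()

def store_block(block: bytes, fingerprint_table: set, storage: list) -> bool:
    """Return True if the block was new and stored, False if it was a duplicate."""
    fp = fingerprint(block)
    # Lookup against the whole table: the step that becomes costly when the
    # table is too large to keep in the backup server's memory.
    if fp in fingerprint_table:
        return False
    fingerprint_table.add(fp)
    storage.append(block)
    return True

table, storage = set(), []
results = [store_block(b, table, storage) for b in (b"alpha", b"beta", b"alpha")]
```

Here the third block repeats the first, so only two blocks end up in `storage`.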
Summary of the invention
Embodiments of the present invention provide a data processing method, a backup server and a storage system, to solve the problem of low data storage efficiency caused by fingerprint comparison consuming a large amount of I/O resources.
In a first aspect, an embodiment of the present invention provides a data processing method. The method is performed by a backup server in a storage system; the storage system includes the backup server and a plurality of memories, a plurality of fingerprint tables are stored in the storage system, and the plurality of fingerprint tables record the fingerprints of the data blocks stored in the plurality of memories. The method includes:
determining a first fingerprint set according to the index fingerprints in a fingerprint index table and the fingerprint of a data block to be stored, wherein the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint represents a plurality of fingerprints in a first fingerprint table, the second index fingerprint represents a plurality of fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint range of the plurality of fingerprints represented by the first index fingerprint and within the fingerprint range of the plurality of fingerprints represented by the second index fingerprint;
obtaining, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtaining, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, wherein the first probability is determined according to the plurality of fingerprints represented by the first index fingerprint, and the second probability is determined according to the plurality of fingerprints represented by the second index fingerprint;
determining a second fingerprint set according to the first probability and the second probability, wherein the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold; and
obtaining a matching result between the plurality of fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.
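The four steps of the first aspect can be sketched end to end as below. Everything here is illustrative rather than the patented implementation: the `IndexFingerprint` class, the `low`/`high` range encoding of an index fingerprint, and the toy estimator `low_byte_estimate` are assumptions introduced only to make the control flow concrete. The text itself specifies only that each probability is determined from the fingerprints an index fingerprint represents.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

@dataclass
class IndexFingerprint:
    table_id: str            # fingerprint table holding the represented group
    low: int                 # assumed range encoding: smallest fingerprint
    high: int                # assumed range encoding: largest fingerprint
    fingerprints: Set[int]   # the group itself, kept inline only for this sketch

Estimator = Callable[[IndexFingerprint, int], float]

def first_set(index_table: List[IndexFingerprint], fp: int) -> List[IndexFingerprint]:
    # Step 1: keep index fingerprints whose range covers the new fingerprint.
    return [e for e in index_table if e.low <= fp <= e.high]

def second_set(candidates: List[IndexFingerprint], fp: int,
               estimate: Estimator, threshold: float) -> List[IndexFingerprint]:
    # Steps 2 and 3: obtain a probability per candidate, keep those not below the threshold.
    return [e for e in candidates if estimate(e, fp) >= threshold]

def is_duplicate(index_table: List[IndexFingerprint], fp: int,
                 estimate: Estimator, threshold: float) -> bool:
    # Step 4: exact matching only against the surviving groups.
    survivors = second_set(first_set(index_table, fp), fp, estimate, threshold)
    return any(fp in e.fingerprints for e in survivors)

def low_byte_estimate(entry: IndexFingerprint, fp: int) -> float:
    # Toy estimator (assumption): share of the group whose lowest 8 bits equal those of fp.
    if not entry.fingerprints:
        return 0.0
    hits = sum(1 for g in entry.fingerprints if g & 0xFF == fp & 0xFF)
    return hits / len(entry.fingerprints)

index_table = [
    IndexFingerprint("table1", 0x1000, 0x1FFF, {0x1010, 0x1A22, 0x1F10}),
    IndexFingerprint("table2", 0x1800, 0x2FFF, {0x1900, 0x2A33}),
]
```

With a threshold of 0.3, the fingerprint `0x1A22` falls in both ranges, but only the `table1` group survives the probability filter, so exact matching touches one group instead of two.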
With reference to the first aspect, in a first possible implementation of the first aspect, the first fingerprint table is stored in a first memory of the plurality of memories, and the second fingerprint table is stored in a second memory of the plurality of memories. Obtaining, according to the first index fingerprint, the first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored includes:
sending the fingerprint of the data block to be stored and the first index fingerprint to the first memory; and
receiving the first probability returned by the first memory, wherein the first probability represents the probability that the plurality of fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
Obtaining, according to the second index fingerprint, the second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored includes:
sending the fingerprint of the data block to be stored and the second index fingerprint to the second memory; and
receiving the second probability returned by the second memory, wherein the second probability represents the probability that the plurality of fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
With reference to the first aspect, in a second possible implementation of the first aspect, the backup server includes additional storage, and the first fingerprint table and the second fingerprint table are stored in the additional storage.
Obtaining, according to the first index fingerprint, the first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtaining, according to the second index fingerprint, the second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, includes:
sending the fingerprint of the data block to be stored, the first index fingerprint and the second index fingerprint to the additional storage; and
receiving, from the additional storage, the first probability that the plurality of fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored, and the second probability that the plurality of fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
With reference to any one of the first aspect, the first possible implementation of the first aspect and the second possible implementation of the first aspect, in a third possible implementation of the first aspect, each fingerprint in the first fingerprint table comprises M bits, each M-bit fingerprint comprises N intervals, each of the N intervals consists of S consecutive bits of the M bits, no two of the N intervals overlap, the bit counts of the N intervals sum to M, N is a natural number greater than or equal to 2, and S is a natural number.
The storage system stores a first statistical table, which contains statistical information on the values of the N intervals of the plurality of fingerprints represented by the first index fingerprint. The first probability is determined as follows:
determining, according to the first statistical table, the number of occurrences t_i of a_i among the values of the i-th interval of the plurality of fingerprints represented by the first index fingerprint, wherein a_i is the value of the i-th interval of the fingerprint of the data block to be stored, and i ranges from 1 to N; and
determining the first probability according to the minimum of the obtained t_1 to t_N.
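The statistical-table lookup above can be sketched as follows. The function names are hypothetical, intervals are assumed to be taken from the most significant bits downward, and dividing the minimum count by the group size is an assumption: the text says only that the first probability is determined according to the minimum of t_1 to t_N.

```python
from collections import Counter
from typing import List

def split_intervals(fp: int, m_bits: int, n: int) -> List[int]:
    """Split an m_bits-wide fingerprint into n equal, non-overlapping intervals,
    most significant interval first (assumes n divides m_bits)."""
    s = m_bits // n                      # S bits per interval
    mask = (1 << s) - 1
    return [(fp >> (m_bits - s * (i + 1))) & mask for i in range(n)]

def build_stat_table(group: List[int], m_bits: int, n: int) -> List[Counter]:
    # One counter per interval: value of that interval -> number of occurrences.
    table = [Counter() for _ in range(n)]
    for fp in group:
        for i, value in enumerate(split_intervals(fp, m_bits, n)):
            table[i][value] += 1
    return table

def first_probability(stat_table: List[Counter], fp: int,
                      m_bits: int, group_size: int) -> float:
    values = split_intervals(fp, m_bits, len(stat_table))        # a_1 .. a_N
    t = [stat_table[i][a_i] for i, a_i in enumerate(values)]     # t_1 .. t_N
    # Normalising by the group size is an assumption made for this sketch.
    return min(t) / group_size

group = [0x1A2B, 0x1A3C, 0x4A2B]          # fingerprints represented by one index fingerprint
stats = build_stat_table(group, m_bits=16, n=4)
```

A minimum of zero means at least one interval value never occurs in the group, so the fingerprint cannot be present and the group can be skipped. A non-zero minimum does not guarantee a match: `0x1A2C` is not in `group`, yet every one of its interval values occurs there, which is why an exact comparison still follows.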
With reference to any one of the first aspect, the first possible implementation of the first aspect and the second possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the storage system stores a first statistical table. The first statistical table contains statistical information on the values of a first interval of the plurality of fingerprints represented by the first index fingerprint, and statistical information on the values of a second interval of the plurality of fingerprints represented by the first index fingerprint. The first interval runs from the h-th bit to the i-th bit of each fingerprint, and the second interval runs from the j-th bit to the k-th bit of each fingerprint, wherein h, i, j and k are natural numbers, h is not greater than i, j is not greater than k, and the first interval and the second interval do not overlap. The first probability is determined as follows:
determining, according to the first statistical table, the number of occurrences t_1 of a among the values of the first interval of the plurality of fingerprints represented by the first index fingerprint, and the number of occurrences t_2 of b among the values of the second interval of the plurality of fingerprints represented by the first index fingerprint, wherein a is the value of the h-th to i-th bits of the fingerprint of the data block to be stored, and b is the value of the j-th to k-th bits of the fingerprint of the data block to be stored; and
determining the first probability according to the minimum of t_1 and t_2.
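The two-interval variant differs from the previous one in that the intervals are arbitrary bit ranges that need not cover the whole fingerprint. A short sketch, with hypothetical names, bit positions assumed to be 1-based and counted from the most significant bit, and the division by group size again an assumption:

```python
from collections import Counter
from typing import List, Tuple

def bit_range(fp: int, m_bits: int, start: int, end: int) -> int:
    """Value of bits start..end, 1-based and inclusive, counted from the
    most significant bit (the numbering convention is an assumption)."""
    width = end - start + 1
    return (fp >> (m_bits - end)) & ((1 << width) - 1)

def build_two_interval_table(group: List[int], m_bits: int,
                             first: Tuple[int, int],
                             second: Tuple[int, int]) -> Tuple[Counter, Counter]:
    return (Counter(bit_range(fp, m_bits, *first) for fp in group),
            Counter(bit_range(fp, m_bits, *second) for fp in group))

def first_probability_two(stats: Tuple[Counter, Counter], fp: int, m_bits: int,
                          first: Tuple[int, int], second: Tuple[int, int],
                          group_size: int) -> float:
    t1 = stats[0][bit_range(fp, m_bits, *first)]    # occurrences of a
    t2 = stats[1][bit_range(fp, m_bits, *second)]   # occurrences of b
    return min(t1, t2) / group_size                 # normalisation is an assumption

group2 = [0x1A2B, 0x1A3C, 0x4A2B]
first_iv, second_iv = (1, 4), (9, 16)               # h..i and j..k, non-overlapping
stats2 = build_two_interval_table(group2, 16, first_iv, second_iv)
```

Tracking only two intervals keeps the statistical table smaller than the N-interval form, at the cost of a coarser filter.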
In a second aspect, an embodiment of the present invention provides a backup server. The backup server is applied in a storage system; the storage system includes the backup server and a plurality of memories, a plurality of fingerprint tables are stored in the storage system, and the plurality of fingerprint tables record the fingerprints of the data blocks stored in the plurality of memories. The backup server includes:
a determining module, configured to determine a first fingerprint set according to the index fingerprints in a fingerprint index table and the fingerprint of a data block to be stored, wherein the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint represents a plurality of fingerprints in a first fingerprint table, the second index fingerprint represents a plurality of fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint range of the plurality of fingerprints represented by the first index fingerprint and within the fingerprint range of the plurality of fingerprints represented by the second index fingerprint;
an obtaining module, configured to obtain, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and to obtain, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, wherein the first probability is determined according to the plurality of fingerprints represented by the first index fingerprint, and the second probability is determined according to the plurality of fingerprints represented by the second index fingerprint;
the determining module being further configured to determine a second fingerprint set according to the first probability and the second probability, wherein the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold; and
a processing module, configured to obtain a matching result between the plurality of fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.
With reference to the second aspect, in a first possible implementation of the second aspect, the first fingerprint table is stored in a first memory of the plurality of memories, and the second fingerprint table is stored in a second memory of the plurality of memories.
The obtaining module is specifically configured to: send the fingerprint of the data block to be stored and the first index fingerprint to the first memory; and receive the first probability returned by the first memory, wherein the first probability represents the probability that the plurality of fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored; and
send the fingerprint of the data block to be stored and the second index fingerprint to the second memory; and receive the second probability returned by the second memory, wherein the second probability represents the probability that the plurality of fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
With reference to the second aspect, in a second possible implementation of the second aspect, the backup server further includes:
additional storage, configured to store the first fingerprint table and the second fingerprint table.
The obtaining module is specifically configured to: send the fingerprint of the data block to be stored, the first index fingerprint and the second index fingerprint to the additional storage; and receive, from the additional storage, the first probability that the plurality of fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored, and the second probability that the plurality of fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, each fingerprint in the first fingerprint table comprises M bits, each M-bit fingerprint comprises N intervals, each of the N intervals consists of S consecutive bits of the M bits, no two of the N intervals overlap, the bit counts of the N intervals sum to M, N is a natural number greater than or equal to 2, and S is a natural number. The additional storage is further configured to store a first statistical table, which contains statistical information on the values of the N intervals of the plurality of fingerprints represented by the first index fingerprint.
The additional storage is further configured to: determine, according to the first statistical table, the number of occurrences t_i of a_i among the values of the i-th interval of the plurality of fingerprints represented by the first index fingerprint, wherein a_i is the value of the i-th interval of the fingerprint of the data block to be stored, and i ranges from 1 to N; and determine the first probability according to the minimum of t_1 to t_N.
With reference to the second possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the additional storage is further configured to store a first statistical table. The first statistical table contains statistical information on the values of a first interval of the plurality of fingerprints represented by the first index fingerprint, and statistical information on the values of a second interval of the plurality of fingerprints represented by the first index fingerprint. The first interval runs from the h-th bit to the i-th bit of each fingerprint, and the second interval runs from the j-th bit to the k-th bit of each fingerprint, wherein h, i, j and k are natural numbers, h is not greater than i, j is not greater than k, and the first interval and the second interval do not overlap.
The additional storage is further configured to: determine, according to the first statistical table, the number of occurrences t_1 of a among the values of the first interval of the plurality of fingerprints represented by the first index fingerprint, and the number of occurrences t_2 of b among the values of the second interval of the plurality of fingerprints represented by the first index fingerprint, wherein a is the value of the h-th to i-th bits of the fingerprint of the data block to be stored, and b is the value of the j-th to k-th bits of the fingerprint of the data block to be stored; and determine the first probability according to the minimum of t_1 and t_2.
In a third aspect, an embodiment of the present invention provides a storage system, including a backup server and a plurality of memories. A plurality of fingerprint tables are stored in the storage system, and the plurality of fingerprint tables record the fingerprints of the data blocks stored in the plurality of memories.
The backup server is configured to:
determine a first fingerprint set according to the index fingerprints in a fingerprint index table and the fingerprint of a data block to be stored, wherein the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint represents a plurality of fingerprints in a first fingerprint table, the second index fingerprint represents a plurality of fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint range of the plurality of fingerprints represented by the first index fingerprint and within the fingerprint range of the plurality of fingerprints represented by the second index fingerprint;
obtain, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtain, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, wherein the first probability is determined according to the plurality of fingerprints represented by the first index fingerprint, and the second probability is determined according to the plurality of fingerprints represented by the second index fingerprint;
determine a second fingerprint set according to the first probability and the second probability, wherein the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold; and
obtain a matching result between the plurality of fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.
With reference to the third aspect, in a first possible implementation of the third aspect, the first fingerprint table is stored in a first memory of the plurality of memories, and the second fingerprint table is stored in a second memory of the plurality of memories. The backup server is specifically configured to:
send the fingerprint of the data block to be stored and the first index fingerprint to the first memory, and receive the first probability returned by the first memory, wherein the first probability represents the probability that the plurality of fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored; and
send the fingerprint of the data block to be stored and the second index fingerprint to the second memory, and receive the second probability returned by the second memory, wherein the second probability represents the probability that the plurality of fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
The first memory is specifically configured to: receive the first index fingerprint and the fingerprint of the data block to be stored sent by the backup server, determine the first probability that the plurality of fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored, and send the first probability to the backup server.
The second memory is specifically configured to: receive the second index fingerprint and the fingerprint of the data block to be stored sent by the backup server, determine the second probability that the plurality of fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored, and send the second probability to the backup server.
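The division of work between the backup server and the memories can be sketched as a simple request and response exchange. The class `MemoryNode`, the method `handle_query`, the string keys used for index fingerprints and the toy estimator are all hypothetical; direct method calls stand in for whatever transport a real system would use.

```python
from typing import Callable, Dict, List, Tuple

Estimator = Callable[[List[int], int], float]

class MemoryNode:
    """Stands in for a memory that holds one fingerprint table locally."""

    def __init__(self, groups: Dict[str, List[int]], estimate: Estimator):
        self.groups = groups        # index fingerprint -> fingerprints it represents
        self.estimate = estimate

    def handle_query(self, index_fp: str, fp: int) -> float:
        # The memory evaluates the probability next to its own data,
        # so only a single number travels back to the backup server.
        return self.estimate(self.groups[index_fp], fp)

def query_probabilities(routes: List[Tuple[str, MemoryNode]], fp: int) -> Dict[str, float]:
    # Backup-server side: send (fingerprint, index fingerprint) to each memory.
    return {index_fp: node.handle_query(index_fp, fp) for index_fp, node in routes}

def top_nibble_share(group: List[int], fp: int) -> float:
    # Toy estimator (assumption): share of 16-bit fingerprints with the same top 4 bits.
    return sum(1 for g in group if g >> 12 == fp >> 12) / len(group)

first_memory = MemoryNode({"idx1": [0x1A2B, 0x1A3C, 0x4A2B]}, top_nibble_share)
second_memory = MemoryNode({"idx2": [0x7000, 0x8000]}, top_nibble_share)
probs = query_probabilities([("idx1", first_memory), ("idx2", second_memory)], 0x1FFF)
```

In this arrangement the fingerprint groups never leave the memories during the filtering stage, which matches the stated aim of reducing data movement.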
With reference to the third aspect or the first possible implementation of the third aspect, in a second possible implementation of the third aspect, each fingerprint in the first fingerprint table comprises M bits, each M-bit fingerprint comprises N intervals, each of the N intervals consists of S consecutive bits of the M bits, no two of the N intervals overlap, the bit counts of the N intervals sum to M, N is a natural number greater than or equal to 2, and S is a natural number. The first memory stores a first statistical table, which contains statistical information on the values of the N intervals of the plurality of fingerprints represented by the first index fingerprint.
The first memory is specifically configured to: determine, according to the first statistical table, the number of occurrences t_i of a_i among the values of the i-th interval of the plurality of fingerprints represented by the first index fingerprint, wherein a_i is the value of the i-th interval of the fingerprint of the data block to be stored, and i ranges from 1 to N; and
determine the first probability according to the minimum of t_1 to t_N.
With reference to the third aspect or the first possible implementation of the third aspect, in a third possible implementation of the third aspect, the first memory stores a first statistical table. The first statistical table contains statistical information on the values of a first interval of the plurality of fingerprints represented by the first index fingerprint, and statistical information on the values of a second interval of the plurality of fingerprints represented by the first index fingerprint. The first interval runs from the h-th bit to the i-th bit of each fingerprint, and the second interval runs from the j-th bit to the k-th bit of each fingerprint, wherein h, i, j and k are natural numbers, h is not greater than i, j is not greater than k, and the first interval and the second interval do not overlap.
The first memory is specifically configured to:
determine, according to the first statistical table, the number of occurrences t_1 of a among the values of the first interval of the plurality of fingerprints represented by the first index fingerprint, and the number of occurrences t_2 of b among the values of the second interval of the plurality of fingerprints represented by the first index fingerprint, wherein a is the value of the h-th to i-th bits of the fingerprint of the data block to be stored, and b is the value of the j-th to k-th bits of the fingerprint of the data block to be stored; and
determine the first probability according to the minimum of t_1 and t_2.
In the embodiments of the present invention, the backup server first determines, according to the index fingerprints in the fingerprint index table and the fingerprint of the data block to be stored, a first fingerprint set that may contain the fingerprint of the data block to be stored. The backup server then obtains, according to the first index fingerprint in the first fingerprint set, the first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtains, according to the second index fingerprint in the first fingerprint set, the second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored. It next determines the second fingerprint set according to those of the obtained first and second probabilities that are greater than the preset threshold, and matches the plurality of fingerprints represented by the index fingerprints in the second fingerprint set against the fingerprint of the data block to be stored to obtain a matching result. With the data processing method provided by the embodiments of the present invention, fingerprint matching only needs to compare the fingerprint of the data block to be stored with the plurality of fingerprints represented by the index fingerprints in the second fingerprint set, rather than with the fingerprints of all data in the fingerprint library. This reduces the amount of data moved during fingerprint comparison and improves the efficiency of data processing.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings described below show only some embodiments of the present invention.
Fig. 1 is a schematic structural diagram of a storage system according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a fingerprint table according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a data processing method according to an embodiment of the present invention;
Fig. 4 is a schematic structural block diagram of a storage system according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a fingerprint index table according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the index fingerprints of a fingerprint table according to an embodiment of the present invention;
Fig. 7 is a schematic structural block diagram of another storage system according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of one implementation of a statistical table according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of another implementation of a statistical table according to an embodiment of the present invention;
Fig. 10 is a schematic structural block diagram of a backup server according to an embodiment of the present invention.
Detailed Description
To address the problem that a storage system consumes a large amount of I/O resources during fingerprint comparison, which lowers the efficiency of data storage, the embodiments of the present invention provide a data processing method, a backup server, and a storage system. The technical solutions of the present invention are described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the embodiments and the specific features in them are detailed descriptions of the technical solutions rather than limitations on them, and that the embodiments and their technical features may be combined with one another where no conflict arises.
To make the technical solutions provided by the embodiments easier to understand, an application scenario of the embodiments is introduced first. Fig. 1 is a schematic structural diagram of a storage system according to an embodiment of the present invention. The storage system 10 includes a backup server 11 and multiple memories 12. The memories 12 are used to store data; the backup server 11 is used to determine whether a data block to be stored is duplicate data and to schedule the memories 12 to store the data block to be stored. The storage system 10 stores a fingerprint (FP) table. Fig. 2 is a schematic diagram of the fingerprint table, which records the fingerprint of each piece of data stored in the memories 12 together with the storage location of that data.
After the storage system 10 receives a request to store data, the backup server 11 compares the fingerprint of the data block to be stored with the fingerprints in the fingerprint table. If a fingerprint identical to that of the data block to be stored is found in the fingerprint table, the data block is duplicate data: the storage system does not need to store it again and only needs to update its reference relationship. Conversely, if no identical fingerprint is found, the data block is new data, and the storage system allocates storage space to store it.
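The baseline deduplication flow just described can be illustrated with a minimal sketch. The class name `DedupStore`, the in-memory dictionaries, and the use of SHA-256 as the fingerprint function are all assumptions made for illustration; the source does not specify how fingerprints are computed or how reference relationships are recorded.

```python
import hashlib


class DedupStore:
    """Toy in-memory stand-in for storage system 10 (hypothetical)."""

    def __init__(self):
        self.fingerprint_table = {}  # fingerprint -> storage location
        self.ref_counts = {}         # fingerprint -> number of references
        self.blocks = []             # stands in for the memories 12

    def store(self, block: bytes) -> bool:
        """Store a block; return True if it was duplicate data."""
        # Assumed fingerprint function; the source does not name one.
        fp = hashlib.sha256(block).hexdigest()
        if fp in self.fingerprint_table:
            # Duplicate: only the reference relationship is updated.
            self.ref_counts[fp] += 1
            return True
        # New data: allocate space and record fingerprint and location.
        self.blocks.append(block)
        self.fingerprint_table[fp] = len(self.blocks) - 1
        self.ref_counts[fp] = 1
        return False
```

In this sketch every lookup consults the whole fingerprint table, which is the cost that the method of steps 101 to 104 is meant to reduce.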
The technical solution for fingerprint comparison provided by the embodiments is described below with reference to the accompanying drawings.
Fig. 3 is a schematic flowchart of the data processing method provided by an embodiment of the present invention. The method may include:
Step 101: Determine a first fingerprint set according to the index fingerprints in the fingerprint index table and the fingerprint of the data block to be stored. The first fingerprint set includes a first index fingerprint and a second index fingerprint; the first index fingerprint represents multiple fingerprints in a first fingerprint table, and the second index fingerprint represents multiple fingerprints in a second fingerprint table. The fingerprint of the data block to be stored falls within the fingerprint range of the multiple fingerprints represented by the first index fingerprint and within the fingerprint range of the multiple fingerprints represented by the second index fingerprint;
Step 102: Obtain, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtain, according to the second index fingerprint, a second probability that the second fingerprint table contains such a fingerprint. The first probability is determined according to the multiple fingerprints represented by the first index fingerprint, and the second probability is determined according to the multiple fingerprints represented by the second index fingerprint;
Step 103: Determine a second fingerprint set according to the first probability and the second probability. The second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold;
Step 104: Obtain the result of matching the multiple fingerprints represented by the first index fingerprint against the fingerprint of the data block to be stored.
In the embodiments of the present invention, the data processing method of steps 101 to 104 may be implemented in at least two ways, which are described separately below.
Embodiment 1
Fig. 4 is a schematic structural diagram of the storage system 20 corresponding to Embodiment 1. The storage system 20 includes a backup server 21 and multiple memories 22. In Embodiment 1, steps 101 to 104 are executed by the backup server 21.
The backup server 21 includes a processor 211, an internal memory 212, and an auxiliary storage 213. The auxiliary storage 213 includes a processing unit 214 and therefore has computing capability. The fingerprint tables of the storage system are stored in the auxiliary storage 213; in practice, there may be one, two, or more auxiliary storages 213. The backup server 21 stores a fingerprint index table, which may be kept in the internal memory 212, in the auxiliary storage 213, or in another storage unit of the backup server 21.
Fig. 5 is a schematic diagram of the fingerprint index table. The fingerprint index table holds at least two index fingerprints for each fingerprint table. Each index fingerprint represents multiple fingerprints in one fingerprint table, and the fingerprints represented by all the index fingerprints of a fingerprint table together make up all the fingerprints in that table. In Fig. 5, Ti denotes the i-th fingerprint table and Li denotes the i-th index fingerprint of a fingerprint table. For example, FP14 is the 2nd index fingerprint of the 1st fingerprint table and represents the fingerprints in the 1st fingerprint table from FP14 up to but not including FP17, namely FP14, FP15, and FP16.
In practice, the fingerprints in each fingerprint table may be sorted by fingerprint value. Fig. 6 is a schematic diagram of the index fingerprints corresponding to fingerprint table 1. The fingerprints FP1 to FP9 in Fig. 6 increase in order, so the fingerprints of fingerprint table 1 can be divided into three parts, FP1~FP3, FP4~FP6, and FP7~FP9, with FP1, FP4, and FP7 serving as the three index fingerprints of this fingerprint table. In the fingerprint index table, FP1 represents FP1~FP3 in the fingerprint table, FP4 represents FP4~FP6, and FP7 represents FP7~FP9. Because FP1 to FP9 increase in order, the fingerprint ranges of the fingerprints represented by FP1, FP4, and FP7 in the fingerprint index table do not overlap.
In the embodiments of the present invention, the fingerprint range of the multiple fingerprints represented by each index fingerprint can be determined from the fingerprint index table in either of two ways. First, the first fingerprint of a run of consecutive fingerprints in a fingerprint table is used as the index fingerprint of that run, so the fingerprint range represented by an index fingerprint can be determined from that index fingerprint and the next adjacent index fingerprint of the same fingerprint table. Continuing with fingerprint table 1 of Fig. 6, the adjacent index fingerprints FP1 and FP4 determine that the fingerprint range of the fingerprints represented by FP1 is [FP1, FP4). Second, the fingerprint index table includes a fingerprint range attribute, which stores, for each index fingerprint, the fingerprint range of the multiple fingerprints it represents.
In step 101, the processor 211 of the backup server 21 uses the value of the fingerprint of the data block to be stored to select, from the fingerprint index table, the index fingerprints whose represented fingerprint range includes that fingerprint. The set of selected index fingerprints is the first fingerprint set.
For ease of description, the embodiments use a first fingerprint table and a second fingerprint table as examples. This does not limit the storage system to only a first and a second fingerprint table, nor does it limit step 101 to the index fingerprints of only those two tables in the fingerprint index table. In practice, step 101 is performed on the index fingerprints of every fingerprint table. Because the fingerprint ranges of the index fingerprints of one fingerprint table do not overlap, at most one index fingerprint can be selected from the index fingerprints of each fingerprint table.
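Step 101 as described above can be sketched with a binary search over each table's sorted index fingerprints. This is a minimal illustration under assumptions: fingerprints are modelled as integers, the fingerprint index table is a dictionary from a hypothetical table identifier to a sorted list of index fingerprints, and ranges are derived from adjacent index fingerprints, so the last range of each table is open-ended.

```python
import bisect


def locate_index_fingerprint(index_fps, target_fp):
    """Return the index fingerprint whose range [start, next_start)
    contains target_fp, or None if target_fp precedes all of them."""
    pos = bisect.bisect_right(index_fps, target_fp) - 1
    return index_fps[pos] if pos >= 0 else None


def first_fingerprint_set(fingerprint_index_table, target_fp):
    """Select at most one index fingerprint per fingerprint table."""
    selected = {}
    for table_id, index_fps in fingerprint_index_table.items():
        hit = locate_index_fingerprint(index_fps, target_fp)
        if hit is not None:
            selected[table_id] = hit
    return selected


# Hypothetical fingerprint index table with two fingerprint tables.
index_table = {"T1": [10, 40, 70], "T2": [25, 55]}
```

For a target fingerprint of 42, this selects 40 from T1 and 25 from T2. Because the ranges within one table do not overlap, the search can never return more than one index fingerprint per table.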
Next, step 102 is executed: the backup server 21 determines, for each fingerprint table, the probability that the table contains a fingerprint identical to that of the data block to be stored. Step 101 has already selected, for each fingerprint table, the index fingerprint whose fingerprint range includes the fingerprint of the data block to be stored, so the fingerprints represented by the other index fingerprints of that table cannot contain it. The probability that the first fingerprint table contains a fingerprint identical to that of the data block to be stored is therefore, in essence, the probability that the fingerprints represented by the first index fingerprint selected in step 101 contain it. Likewise, the probability that the second fingerprint table contains such a fingerprint is, in essence, the probability that the fingerprints represented by the second index fingerprint selected in step 101 contain it.
In a specific implementation, the processor 211 sends the fingerprint of the data block to be stored and the index fingerprints in the first fingerprint set to the auxiliary storage 213. Through its processing unit 214, the auxiliary storage 213 determines, for each index fingerprint in the first fingerprint set, the probability of finding the fingerprint of the data block to be stored among the fingerprints that index fingerprint represents, and then returns the determined probability values to the processor 211. The auxiliary storage 213 may determine this probability from statistical information about the fingerprints represented by the received index fingerprint, such as the distribution of fingerprints near the value of the fingerprint of the data block to be stored, or frequency statistics of fingerprint values in each fingerprint dimension.
When the backup server 21 includes two or more auxiliary storages 213, the processor may send to each auxiliary storage 213 only those index fingerprints in the first fingerprint set that relate to the fingerprint tables it holds. For example, suppose the backup server 21 includes a first auxiliary storage holding the first fingerprint table and a second auxiliary storage holding the second fingerprint table. The processor 211 sends the fingerprint of the data block to be stored and the first index fingerprint to the first auxiliary storage, and sends the fingerprint of the data block to be stored and the second index fingerprint to the second auxiliary storage. The first auxiliary storage determines the first probability through its processing unit and returns it to the processor 211; the second auxiliary storage determines the second probability through its processing unit and returns it to the processor 211.
Next, step 103 is executed: the backup server 21 determines the second fingerprint set from the received probabilities of finding the fingerprint of the data block to be stored in each fingerprint table. The second fingerprint set contains the index fingerprints in the first fingerprint set that meet a preset condition: the probability of finding the fingerprint of the data block to be stored among the fingerprints represented by the index fingerprint (that is, the probability, determined in step 102, that the fingerprint table storing those fingerprints contains a fingerprint identical to that of the data block to be stored) is greater than a preset threshold. The preset threshold may be 0; in that case the second fingerprint set is formed by removing from the first fingerprint set those index fingerprints for which this probability is 0.
For example, suppose the second probability returned by the auxiliary storage 213 to the processor 211, for finding the fingerprint of the data block to be stored among the fingerprints represented by the second index fingerprint, is less than the preset threshold, while the first probability, for the fingerprints represented by the first index fingerprint, is greater than the preset threshold. The first index fingerprint is then included in the second fingerprint set and the second index fingerprint is not.
Next, step 104 is executed: the backup server 21 performs fingerprint comparison among the fingerprints represented by the index fingerprints in the second fingerprint set, obtains the comparison result, and thereby determines whether the data block to be stored is duplicate data. There are two specific implementations. First, the processor 211 reads the fingerprints represented by the index fingerprints in the second fingerprint set from the fingerprint tables held in the auxiliary storage 213 into the internal memory 212 and performs the comparison there. Second, the auxiliary storage 213 performs the comparison with its own processing unit 214: the processing unit 214 reads the fingerprints represented by the index fingerprints in the second fingerprint set from the fingerprint tables the auxiliary storage holds into the buffer of the auxiliary storage 213 and compares them there. The buffer of the auxiliary storage 213 may be a random access memory (RAM) or a cache. Because no data has to be moved out of the auxiliary storage 213 when it performs the comparison itself, the time spent on data movement is reduced.
In practice, the preset threshold in step 103 may also be a probability value greater than 0. The backup server 21 first performs fingerprint comparison among the fingerprints represented by the index fingerprints in the second fingerprint set determined with that threshold. If the result cannot confirm whether the data block to be stored is duplicate data, the backup server 21 then performs fingerprint comparison among the fingerprints represented by the index fingerprints whose probability values lie in the interval (0, preset threshold). In other words, the backup server compares against the fingerprints represented by the higher-probability index fingerprints first and falls back to those of the lower-probability index fingerprints only when duplication cannot be confirmed, which effectively reduces the time spent on fingerprint comparison.
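The two-stage comparison with a threshold greater than 0 can be sketched as follows. The function name and the representation of candidates as (probability, fingerprint group) pairs are hypothetical. The source describes the fallback interval as (0, preset threshold); placing a probability exactly equal to the threshold in the second stage is an assumption of this sketch.

```python
def two_stage_lookup(target_fp, candidates, threshold):
    """candidates: list of (probability, set_of_fingerprints) pairs,
    one per index fingerprint in the first fingerprint set.
    Returns (found, number_of_groups_read)."""
    first_stage = [fps for p, fps in candidates if p > threshold]
    # Assumption: a probability equal to the threshold is treated as
    # part of the lower-probability fallback stage.
    second_stage = [fps for p, fps in candidates if 0 < p <= threshold]
    groups_read = 0
    for stage in (first_stage, second_stage):
        for fps in stage:
            groups_read += 1
            if target_fp in fps:
                return True, groups_read
    # Groups with probability 0 are never read.
    return False, groups_read


candidates = [(0.9, {1, 2}), (0.1, {3}), (0.0, {4})]
```

With a threshold of 0.5, a lookup for fingerprint 1 reads only the high-probability group, while a lookup for fingerprint 3 falls back to the second stage after the first stage fails.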
Another implementation of fingerprint comparison is as follows. In step 103 the preset threshold is 0, so the index fingerprints whose probability values are greater than 0 are the elements of the second fingerprint set. The backup server 21 may then sort the index fingerprints in the second fingerprint set by their probability values, and the processor 211 reads the fingerprints they represent into the internal memory for retrieval in that order. That is, it first reads the fingerprints represented by the index fingerprint with the largest probability value; if these cannot establish that the data block to be stored is duplicate data, it reads the fingerprints represented by the index fingerprint with the second-largest probability value into the internal memory and compares again. This continues until a fingerprint identical to that of the data block to be stored is found, or until it is determined that none of the fingerprints represented by the index fingerprints with probability values greater than 0 is identical to it. In the former case the data block to be stored is duplicate data; in the latter case it is new data.
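The probability-ordered reading described above can be sketched as follows. The names `ordered_lookup` and `load_group` are hypothetical; `load_group` stands in for reading the fingerprints represented by one index fingerprint into the internal memory, and fingerprints are modelled as integers.

```python
def ordered_lookup(target_fp, probabilities, load_group):
    """probabilities: index fingerprint -> probability from step 102.
    load_group: callable returning the fingerprints represented by
    an index fingerprint (stands in for a read into internal memory).
    Returns (is_duplicate, matching_index_fingerprint)."""
    # Step 103 with threshold 0: keep only probabilities above 0.
    second_set = [idx for idx, p in probabilities.items() if p > 0]
    # Read groups in descending order of probability.
    second_set.sort(key=lambda idx: probabilities[idx], reverse=True)
    for index_fp in second_set:
        if target_fp in load_group(index_fp):
            return True, index_fp  # duplicate data
    return False, None  # new data


groups = {10: {10, 11}, 40: {40, 42}, 70: {70}}
probs = {10: 0.2, 40: 0.7, 70: 0.0}
loaded = []


def load(index_fp):
    loaded.append(index_fp)  # record the read order for inspection
    return groups[index_fp]
```

Calling `ordered_lookup(42, probs, load)` reads only the group of index fingerprint 40, since it has the largest probability and already contains the target; the group with probability 0 is never read.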
In the above technical solution, the backup server 21 uses the fingerprint index table to determine, in each fingerprint table, the fingerprints represented by one index fingerprint whose fingerprint range includes the fingerprint of the data block to be stored, and performs the subsequent fingerprint comparison only on the fingerprints so determined from each table, which reduces the amount of fingerprint data that must be moved. Moreover, for each fingerprint table, the probability of finding the fingerprint of the data block to be stored among the fingerprints so determined is calculated, and fingerprint comparison is then performed only on the fingerprints determined from those fingerprint tables whose probability is not less than the preset threshold. This further reduces the amount of fingerprint data moved during comparison, shortens the comparison time, and improves the efficiency of data storage.
Embodiment 2
Fig. 7 is a schematic structural diagram of the storage system 30 corresponding to Embodiment 2. The storage system 30 includes a backup server 31 and multiple memories 32. In Embodiment 2, steps 101 to 104 are executed by the backup server 31.
The memories 32 are used to store data blocks and fingerprint tables; for the form of the fingerprint table, refer to Fig. 5. The fingerprint table stored in a memory 32 may consist of the fingerprint information of the data blocks stored in that memory, or of the fingerprint information of other data blocks unrelated to the data blocks that memory stores. The backup server 31 includes a processor 311 and an internal memory 312. The backup server 31 stores a fingerprint index table, which may be kept in the internal memory 312 or in another storage unit of the backup server 31.
In step 101, the processor 311 determines the first fingerprint set in the same way as the processor 211 described above, so the details are not repeated here.
Next, the backup server 31 executes step 102 and obtains, for each fingerprint table, the probability of finding the fingerprint of the data block to be stored in that table (more precisely, among the fingerprints represented by its index fingerprint in the first fingerprint set). There are two specific implementations. First, the memory 32 that stores the fingerprint table includes a processing unit 321. Similar to how the backup server 21 obtains probability values, the backup server 31 sends the fingerprint of the data block to be stored and the index fingerprints in the first fingerprint set to the memory 32; the memory 32 determines the probability value using its processing unit 321 and the fingerprint statistics it holds, and sends the probability value to the backup server 31. Second, the processor 311 reads the statistical information of the fingerprints stored in the memory 32 into the internal memory 312 and itself determines the probability value from the statistical information held in the internal memory 312.
Next, the backup server 31 executes step 103 in the same way as the backup server 21 described above.
Next, the backup server 31 executes step 104 and obtains the result of comparing the fingerprint of the data block to be stored against the fingerprints represented by the index fingerprints in the second fingerprint set. Similar to step 104 as executed by the backup server 21, the processor 311 may read the fingerprints represented by the index fingerprints in the second fingerprint set from the memory 32 into the internal memory and compare them there, so that the backup server 31 performs the comparison itself. Alternatively, the backup server 31 sends the index fingerprints in the second fingerprint set and the fingerprint of the data block to be stored to the memory 32, and the memory 32 performs the comparison locally with its own processing unit 321. This approach reduces data movement, and multiple memories 32 can perform fingerprint comparison in parallel, which improves the efficiency of fingerprint comparison.
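The second approach, in which each memory compares locally and several memories work in parallel, can be illustrated with a thread pool standing in for independent processing units. The `Memory` class, its `compare_locally` method, and the request format are hypothetical; a real system would issue remote requests to the memories 32 rather than call in-process methods.

```python
from concurrent.futures import ThreadPoolExecutor


class Memory:
    """Hypothetical stand-in for a memory 32 with a processing unit 321."""

    def __init__(self, groups):
        # index fingerprint -> fingerprints it represents
        self.groups = groups

    def compare_locally(self, index_fp, target_fp):
        """Compare inside the memory; only a boolean leaves it."""
        return target_fp in self.groups.get(index_fp, ())


def parallel_compare(requests, target_fp):
    """requests: list of (memory, index_fp) pairs taken from the
    second fingerprint set. Returns True if any memory finds a match."""
    with ThreadPoolExecutor(max_workers=max(1, len(requests))) as pool:
        results = pool.map(
            lambda req: req[0].compare_locally(req[1], target_fp),
            requests,
        )
        return any(results)


first_memory = Memory({10: {10, 11, 12}})
second_memory = Memory({20: {20, 25}})
requests = [(first_memory, 10), (second_memory, 20)]
```

Only the index fingerprint, the target fingerprint, and a boolean result cross the boundary between the backup server and each memory, which is the reduction in data movement that the text describes.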
One implementation of fingerprint comparison according to steps 103 and 104 is as follows. In step 103 the preset threshold is 0, so the index fingerprints whose probability values are greater than 0 are the elements of the second fingerprint set. The backup server 31 may then sort the index fingerprints in the second fingerprint set by their probability values, and the processor 311 reads the fingerprints they represent into the internal memory for retrieval in that order. That is, it first reads the fingerprints represented by the index fingerprint with the largest probability value; if these cannot establish that the data block to be stored is duplicate data, it reads the fingerprints represented by the index fingerprint with the second-largest probability value into the internal memory and compares again. This continues until a fingerprint identical to that of the data block to be stored is found, or until it is determined that none of the fingerprints represented by the index fingerprints with probability values greater than 0 is identical to it. In the former case the data block to be stored is duplicate data; in the latter case it is new data.
In both implementations of steps 101 to 104 above, fingerprint matching only needs to compare the fingerprint of the data block to be stored with the fingerprints represented by the index fingerprints in the obtained second fingerprint set, rather than with the fingerprints of all data in the fingerprint library. This reduces the amount of data moved during fingerprint comparison and improves the efficiency of data processing.
Optionally, when the backup server 21 in the storage system 20 includes two or more auxiliary storages 213, the fingerprint index table may also include location information of the first fingerprint table to which the first index fingerprint belongs, that is, information indicating which auxiliary storage holds the first fingerprint table. When the backup server 21 executes step 102, the processor 211 only needs to locate the auxiliary storage holding the first fingerprint table according to the location information corresponding to the first index fingerprint, and send the first index fingerprint and the fingerprint of the data block to be stored to that auxiliary storage, so that it determines the probability that the first fingerprint table contains a fingerprint identical to that of the data block to be stored.
Optionally, in the storage system 30, the fingerprint index table may also include location information of the first fingerprint table to which the first index fingerprint belongs, that is, the identifier of the first memory that holds the first fingerprint table. When the backup server 31 executes step 102, the processor 311 only needs to locate the first memory according to the location information corresponding to the first index fingerprint, and send the first index fingerprint and the fingerprint of the data block to be stored to the first memory, so that the first memory determines the probability that the first fingerprint table contains a fingerprint identical to that of the data block to be stored.
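The optional location information can be pictured as an extra field on each entry of the fingerprint index table, which lets the processor group its step 102 requests by destination. The `IndexEntry` type, its field names, and the string identifiers below are hypothetical and serve only to illustrate the routing.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    index_fp: int   # the index fingerprint
    table_id: str   # fingerprint table it belongs to
    location: str   # hypothetical identifier of the storage holding it


def route_requests(first_set, target_fp):
    """Group probability requests by the storage that holds each
    fingerprint table, so each storage receives only its own entries."""
    batches = defaultdict(list)
    for entry in first_set:
        batches[entry.location].append(
            (entry.table_id, entry.index_fp, target_fp)
        )
    return dict(batches)


first_set = [
    IndexEntry(index_fp=40, table_id="T1", location="storage-A"),
    IndexEntry(index_fp=25, table_id="T2", location="storage-B"),
]
batches = route_requests(first_set, 42)
```

Each storage then receives one batch containing the fingerprint of the data block to be stored and only the index fingerprints of the fingerprint tables it holds.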
In one case, the first fingerprint table is stored in a first memory among the multiple memories, and the second fingerprint table is stored in a second memory among the multiple memories.
In step 102, obtaining, according to the first index fingerprint, the first probability that the first fingerprint table contains a fingerprint identical to that of the data block to be stored includes the following steps:
Send the fingerprint of the data block to be stored and the first index fingerprint to the first memory;
Receive the first probability returned by the first memory, where the first probability represents the probability that the multiple fingerprints represented by the first index fingerprint contain a fingerprint identical to that of the data block to be stored.
In step 102, obtaining, according to the second index fingerprint, the second probability that the second fingerprint table contains a fingerprint identical to that of the data block to be stored includes the following steps: send the fingerprint of the data block to be stored and the second index fingerprint to the second memory; receive the second probability returned by the second memory, where the second probability represents the probability that the multiple fingerprints represented by the second index fingerprint contain a fingerprint identical to that of the data block to be stored.
Specifically, the above corresponds to the case in which the backup server 31 executes step 102 in Embodiment 2. The first memory and the second memory are two of the memories 32 and can perform fingerprint comparison with the processing units 321 they contain. The specific implementation has been described in detail in Embodiment 2 and is not repeated here.
In another case, the backup server includes an auxiliary storage, and the first fingerprint table and the second fingerprint table are stored in the auxiliary storage.
In step 102, obtaining the first probability according to the first index fingerprint and obtaining the second probability according to the second index fingerprint include the following steps:
Send the fingerprint of the data block to be stored, the first index fingerprint, and the second index fingerprint to the auxiliary storage; receive, from the auxiliary storage, the first probability that the multiple fingerprints represented by the first index fingerprint contain a fingerprint identical to that of the data block to be stored, and the second probability that the multiple fingerprints represented by the second index fingerprint contain a fingerprint identical to that of the data block to be stored.
Specifically, the above corresponds to the case in which the backup server 21 executes step 102 in Embodiment 1. The auxiliary storage here is the auxiliary storage 213 of Embodiment 1, which can perform fingerprint comparison with the processing unit 214 it contains. The specific implementation has been described in detail in Embodiment 1 and is not repeated here.
Optionally, in this embodiment of the present invention, each fingerprint in the first fingerprint table comprises M bits, and each M-bit fingerprint comprises N intervals. Each of the N intervals consists of S consecutive bits of the M bits, no two of the N intervals overlap, and the bit counts of the N intervals sum to M, where N is a natural number greater than or equal to 2 and S is a natural number. With this arrangement, each fingerprint in the fingerprint table is divided into N intervals, and each interval is equivalent to one fingerprint dimension. For example, a 64-bit fingerprint can be divided into four 16-bit dimensions: bits 1 to 16 form the first dimension, bits 17 to 32 the second dimension, bits 33 to 48 the third dimension, and bits 49 to 64 the fourth dimension. In practice, the number of bits occupied by each dimension is not limited, and the dimensions are not required to occupy the same number of bits.
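The division of a fingerprint into dimensions can be illustrated with a minimal Python sketch. The function name `split_fingerprint` and the convention that the first interval holds the most significant bits are assumptions made for illustration; the specification does not fix a bit order.

```python
from typing import List, Sequence


def split_fingerprint(fp: int, widths: Sequence[int]) -> List[int]:
    """Split a fingerprint into consecutive, non-overlapping bit intervals.

    widths[i] is the bit count of the (i+1)-th interval; the widths sum to M.
    Assumption: the first interval holds the most significant bits.
    """
    total_bits = sum(widths)
    if fp < 0 or fp >= 1 << total_bits:
        raise ValueError("fingerprint does not fit in the given widths")
    values = []
    remaining = total_bits
    for width in widths:
        remaining -= width  # bits still to the right of this interval
        values.append((fp >> remaining) & ((1 << width) - 1))
    return values


# A 64-bit fingerprint split into four 16-bit dimensions.
dims_64 = split_fingerprint(0x0001000200030004, [16, 16, 16, 16])
# Unequal widths are allowed as long as they sum to M (here 8 + 24 = 32).
dims_uneven = split_fingerprint(0x01020504, [8, 24])
```

Passing the widths as a list keeps the sketch neutral on whether all dimensions occupy the same number of bits.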
A first statistical table is stored in the storage system. The first statistical table contains statistical information on the values, in the N intervals, of the multiple fingerprints represented by the first index fingerprint. Fig. 8 is a schematic diagram of one first statistical table. Suppose the three fingerprints represented by the first index fingerprint are 01020504H, 01030504H, and 02030102H, and each fingerprint is divided into 4 dimensions. Taking 01020504H as an example, the values of its four fingerprint dimensions are 1, 2, 5, and 4, respectively. For each possible value in each dimension, the first statistical table records how many times that value occurs in the corresponding dimension of the multiple fingerprints represented by the first index fingerprint. For example, the value "1" occurs 2 times in the first dimension, the value "2" occurs 1 time in the first dimension, and the values "3", "4", and "5" each occur 0 times in the first dimension.
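As an illustration of how such a statistical table could be built, the sketch below counts value occurrences per dimension for the three example fingerprints. The name `build_stat_table`, the use of `collections.Counter`, and the 8-bit dimension width are assumptions; the specification only describes the table's contents.

```python
from collections import Counter
from typing import List, Sequence


def build_stat_table(fingerprints: Sequence[int], widths: Sequence[int]) -> List[Counter]:
    """Return one Counter per dimension: value -> number of occurrences."""
    total_bits = sum(widths)
    table = [Counter() for _ in widths]
    for fp in fingerprints:
        remaining = total_bits
        for dim, width in enumerate(widths):
            remaining -= width
            value = (fp >> remaining) & ((1 << width) - 1)
            table[dim][value] += 1
    return table


# The three fingerprints represented by the first index fingerprint in the example.
stat_table = build_stat_table([0x01020504, 0x01030504, 0x02030102], [8, 8, 8, 8])
first_dimension = stat_table[0]  # Counter({1: 2, 2: 1}); absent values count as 0
```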
The first probability is determined as follows: determine, according to the first statistical table, the frequency t_i with which a_i occurs among the values of the i-th interval of the multiple fingerprints represented by the first index fingerprint, where a_i is the value of the i-th interval of the fingerprint of the data block to be stored and i ranges from 1 to N; and determine the first probability according to the minimum of t_1 to t_N.
Specifically, in the aforementioned storage system 20, the first statistical table may be stored on the auxiliary storage 213 that holds the first fingerprint table, and the first probability is determined by the auxiliary storage 213 through its own processing unit 214. For example, suppose the fingerprint of the data block to be stored is 01020404H, whose values in the four fingerprint dimensions are 1, 2, 4, and 4, respectively. The probability level of retrieving this fingerprint in the table block shown in Fig. 5 is calculated as follows: the frequency found in the table for the value 1 in the first dimension is 2, the frequency for the value 2 in the second dimension is 1, the frequency for the value 4 in the third dimension is 0, and the frequency for the value 4 in the fourth dimension is 2, so the probability level is the minimum of these frequencies, namely 0.
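The worked example can be reproduced with a short sketch. The function name `min_frequency` and the list-of-`Counter` layout of the statistical table are assumptions chosen for illustration; the counts are those of the three example fingerprints 01020504H, 01030504H, and 02030102H.

```python
from collections import Counter
from typing import List, Sequence


def min_frequency(stat_table: List[Counter], query_values: Sequence[int]) -> int:
    """Look up t_i for each dimension value a_i and return the minimum."""
    return min(stat_table[i][value] for i, value in enumerate(query_values))


# Per-dimension occurrence counts for 01020504H, 01030504H, 02030102H.
stat_table = [
    Counter({1: 2, 2: 1}),  # first dimension
    Counter({2: 1, 3: 2}),  # second dimension
    Counter({5: 2, 1: 1}),  # third dimension
    Counter({4: 2, 2: 1}),  # fourth dimension
]

# Fingerprint to be stored: 01020404H -> dimension values 1, 2, 4, 4.
level = min_frequency(stat_table, [1, 2, 4, 4])  # frequencies 2, 1, 0, 2 -> 0
```

A level of 0 means at least one dimension value never occurs among the represented fingerprints, so none of them can equal the queried fingerprint.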
In the aforementioned storage system 30, the first statistical table may be stored in the memory 32 that holds the first fingerprint table, and the first probability is determined by the memory 32 through its processing unit 321. The specific determination method is the same as the manner, described above, in which processing unit 214 determines the first probability according to the first statistical table.
In practice, the second probability may also be determined in the above manner, using a statistical table of fingerprints; this is not repeated here.
In the above technical solution, the first statistical table is used to determine the first probability of retrieving the fingerprint of the data block to be stored among the multiple fingerprints represented by the first index fingerprint. The implementation is simple, requires little computation, takes little time, and gives an accurate result.
Optionally, as another embodiment, a first statistical table is stored in the storage system, and Fig. 9 is a schematic diagram of another first statistical table. The first statistical table contains statistical information on the values of a first interval of the multiple fingerprints represented by the first index fingerprint, and statistical information on the values of a second interval of those fingerprints. The first interval runs from the h-th bit to the i-th bit of each fingerprint, and the second interval runs from the j-th bit to the k-th bit of each fingerprint, where h, i, j, and k are natural numbers, h is not greater than i, j is not greater than k, and the first interval and the second interval do not overlap.
The first probability is determined as follows: determine, according to the first statistical table, the frequency t_1 with which a occurs among the values of the first interval of the multiple fingerprints represented by the first index fingerprint, and the frequency t_2 with which b occurs among the values of the second interval of those fingerprints, where a is the value of the h-th to i-th bits of the fingerprint of the data block to be stored and b is the value of the j-th to k-th bits of that fingerprint; and determine the first probability according to the minimum of t_1 and t_2.
In practice, the main purpose of determining the first probability is to exclude some index fingerprints from the first fingerprint set: for each excluded index fingerprint, the probability that the multiple fingerprints it represents include a fingerprint identical to the fingerprint of the data block to be stored is 0. In this embodiment of the present invention, referring to Fig. 9, the first statistical table contains statistical information for only some of the fingerprint dimensions. Continuing the example in which the first index fingerprint represents the three fingerprints 01020504H, 01030504H, and 02030102H, the first statistical table may store statistical information for only the first dimension and the third dimension. Suppose the fingerprint of the data block to be stored is 01020404H, whose third dimension is "4". According to the first statistical table of Fig. 9, the frequency with which "4" occurs in the third dimension is 0, which shows that the multiple fingerprints represented by the first index fingerprint do not include the fingerprint of the data block to be stored. Similarly, the first statistical table in this embodiment may be stored in the auxiliary storage 213 of storage system 20, in which case the first probability is determined by the auxiliary storage 213 through its own processing unit 214. The first statistical table may also be stored in the memory 32 of storage system 30, in which case the first probability is determined by the processing unit 321 of the memory 32 itself.
In the above technical solution, statistical information for only some of the fingerprint dimensions is used to determine the first probability of retrieving the fingerprint of the data block to be stored among the multiple fingerprints represented by the first index fingerprint, so that index fingerprints whose corresponding probability is 0 are removed from the first fingerprint set. This reduces the amount of data transferred during fingerprint comparison; in addition, the implementation is simple, requires little computation, and takes little time.
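The exclusion of index fingerprints using statistics for only some dimensions can be sketched as follows. The names `prune_candidates`, `index_A`, and `index_B`, and the dictionary layout (dimension number mapped to value counts), are hypothetical; `index_A` carries the first- and third-dimension counts of the three example fingerprints, while `index_B` is an invented second candidate added only to show a surviving entry.

```python
from typing import Dict, Sequence


def prune_candidates(
    partial_stats: Dict[str, Dict[int, Dict[int, int]]],
    query_values: Sequence[int],
) -> Dict[str, int]:
    """Keep index fingerprints whose minimum frequency over the stored dimensions is non-zero.

    partial_stats maps an index fingerprint to {dimension: {value: count}},
    holding counts only for the dimensions that were recorded.
    """
    kept = {}
    for index_fp, stats in partial_stats.items():
        level = min(counts.get(query_values[dim], 0) for dim, counts in stats.items())
        if level > 0:
            kept[index_fp] = level
    return kept


partial_stats = {
    # First and third dimensions (0-based 0 and 2) of 01020504H, 01030504H, 02030102H.
    "index_A": {0: {1: 2, 2: 1}, 2: {5: 2, 1: 1}},
    # Hypothetical second candidate, not taken from the specification.
    "index_B": {0: {1: 1}, 2: {4: 3}},
}

# Fingerprint to be stored: 01020404H -> dimension values 1, 2, 4, 4.
remaining = prune_candidates(partial_stats, [1, 2, 4, 4])
```

Storing fewer dimensions makes the table smaller, at the cost that a non-zero result is a weaker indication of a match than with all dimensions.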
Optionally, as another embodiment, when the first fingerprint set is an empty set, backup server 21 or backup server 31 determines that the data block to be stored is new data.
Optionally, as another embodiment, when the preset threshold is 0 and the second fingerprint set is an empty set, backup server 21 or backup server 31 determines that the data block to be stored is new data.
Optionally, as another embodiment, before performing step 101, the backup server further performs the following step: performing fingerprint filtering on the fingerprint of the data block to be stored, and determining that fingerprint filtering cannot establish whether the fingerprint of the data block to be stored is a duplicate fingerprint.
Specifically, before retrieving the fingerprint of the data block to be stored from the fingerprint tables, the backup server may pre-judge the fingerprint to be retrieved using a fingerprint filtering technique. The pre-judgment has three possible results: first, it is determined that the fingerprint of the data block to be stored exists in a fingerprint table, so the data block to be stored is duplicate data; second, it is determined that the fingerprint tables do not contain the fingerprint of the data block to be stored, so the data block to be stored is new data; third, it cannot be determined whether the fingerprint tables contain the fingerprint of the data block to be stored. Only in the third case does the backup server perform steps 101 to 104.
In a specific implementation, a fingerprint filtering technique such as a Bloom filter or locality-preserved caching (LPC) may be used, or a combination of two or more fingerprint filtering techniques. For example, the fingerprint is first filtered using the Bloom filter; if the Bloom filter cannot determine whether the fingerprint of the data block to be stored is a duplicate fingerprint, the fingerprint is further filtered using the LPC technique. For specific implementations of the fingerprint filtering techniques, refer to the prior art; they are not described in detail in the embodiments of the present invention.
In the above technical solution, the backup server first pre-judges the fingerprint of the data block to be stored using a fingerprint filtering technique, and performs steps 101 to 104 only when the fingerprint to be retrieved cannot be pre-judged. Fingerprint filtering can significantly shorten the comparison time for some fingerprints and improve the performance of the backup server.
Optionally, in storage system 20, the fingerprint tables may be maintained as follows: a new fingerprint table is created in real time in the internal memory 212. When it is determined that the currently stored fingerprint tables do not contain the fingerprint of the data block to be stored, the data block to be stored is determined to be new data, and its fingerprint is added to the fingerprint table being created in real time in the internal memory. After the number of fingerprints in the in-memory fingerprint table reaches a set value, the fingerprint table is stored on the auxiliary storage 213. In addition, each index fingerprint corresponding to the fingerprint table is added to the fingerprint index table. Furthermore, for each index fingerprint corresponding to the fingerprint table, the statistical table of the dimension values of the multiple fingerprints that the index fingerprint represents is stored on the auxiliary storage 213.
When the fingerprint tables are maintained in the above manner, the backup server, when performing fingerprint comparison, first compares against the fingerprint table being created in the internal memory. Only when that comparison cannot determine whether the fingerprint of the data block to be stored is a duplicate fingerprint does the backup server perform steps 101 to 104 and compare against the fingerprint tables stored outside the internal memory.
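The maintenance flow for storage system 20 might be sketched as follows. The class name `InMemoryFingerprintTable` and the two callbacks are hypothetical: `lookup_stored` stands in for steps 101 to 104 against the tables on auxiliary storage, and `flush` stands in for writing the full table, its index fingerprints, and its statistical tables to auxiliary storage.

```python
from typing import Callable, List


class InMemoryFingerprintTable:
    """Fingerprint table built in internal memory and flushed once it is full."""

    def __init__(
        self,
        capacity: int,
        lookup_stored: Callable[[int], bool],
        flush: Callable[[List[int]], None],
    ):
        self.capacity = capacity            # the "set value" of fingerprints per table
        self.lookup_stored = lookup_stored  # placeholder for steps 101 to 104
        self.flush = flush                  # placeholder for the write to auxiliary storage
        self.table = set()

    def is_duplicate(self, fp: int) -> bool:
        if fp in self.table:        # compare against the in-memory table first
            return True
        if self.lookup_stored(fp):  # then against the tables outside internal memory
            return True
        self.table.add(fp)          # new data: record its fingerprint
        if len(self.table) >= self.capacity:
            self.flush(sorted(self.table))
            self.table = set()      # start building the next table
        return False


# Toy stand-ins for auxiliary storage, for illustration only.
stored = set()
flushed = []


def flush_to_storage(fingerprints: List[int]) -> None:
    flushed.append(fingerprints)
    stored.update(fingerprints)


table = InMemoryFingerprintTable(2, lambda fp: fp in stored, flush_to_storage)
results = [table.is_duplicate(fp) for fp in (1, 1, 2, 2)]
```

In this toy run the second lookup of 1 is answered from internal memory, while the second lookup of 2 is answered from the flushed table.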
Optionally, in storage system 30, the fingerprint tables may be maintained as follows: when the data block to be stored is determined to be new data, the fingerprint of the data block is added to the fingerprint table of the memory 32 that stores the data block, and the statistical table of that fingerprint table is then updated.
It should be noted that each of the above processor 211, processor 311, processing unit 214, and processing unit 321 may be a single independent processor or a collective term for multiple processing elements. For example, processor 211, processor 311, processing unit 214, and processing unit 321 may each be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention, for example, one or more digital signal processors (DSP) or one or more field programmable gate arrays (FPGA).
Based on the same inventive concept, an embodiment of the present invention provides a backup server 40 applied in a storage system. The storage system includes the backup server and multiple memories, multiple fingerprint tables are stored in the storage system, and the multiple fingerprint tables record the fingerprints of the data blocks stored in the multiple memories. Fig. 10 is a schematic structural block diagram of backup server 40. Backup server 40 includes:
A determining module 41, configured to determine a first fingerprint set according to the index fingerprints in the fingerprint index table and the fingerprint of the data block to be stored, where the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint represents multiple fingerprints in a first fingerprint table, the second index fingerprint represents multiple fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint range of the multiple fingerprints represented by the first index fingerprint and within the fingerprint range of the multiple fingerprints represented by the second index fingerprint;
An obtaining module 42, configured to obtain, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and to obtain, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, where the first probability is determined according to the multiple fingerprints represented by the first index fingerprint, and the second probability is determined according to the multiple fingerprints represented by the second index fingerprint;
The determining module 41 is further configured to determine a second fingerprint set according to the first probability and the second probability, where the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold;
A processing module 43, configured to obtain the result of matching the multiple fingerprints represented by the first index fingerprint against the fingerprint of the data block to be stored.
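The selection performed by the determining module and the subsequent matching can be illustrated with a minimal sketch. The function names, the index fingerprint labels `idx1` and `idx2`, and the dictionary layout are hypothetical; the comparison uses "not less than", following the wording of the module description above.

```python
from typing import Dict, List, Sequence


def select_second_set(probabilities: Dict[str, int], threshold: int) -> List[str]:
    """Keep the index fingerprints whose probability is not less than the threshold."""
    return [index_fp for index_fp, p in probabilities.items() if p >= threshold]


def match(
    fp: int,
    second_set: Sequence[str],
    represented: Dict[str, Sequence[int]],
) -> bool:
    """Compare fp only against the fingerprints represented by the selected index fingerprints."""
    return any(fp in represented[index_fp] for index_fp in second_set)


# Hypothetical probabilities returned for two index fingerprints.
probabilities = {"idx1": 2, "idx2": 0}
represented = {
    "idx1": [0x01020504, 0x01030504, 0x02030102],
    "idx2": [0x0A0B0C0D],
}

second_set = select_second_set(probabilities, threshold=1)
is_duplicate = match(0x01030504, second_set, represented)
```

Only the fingerprints behind `idx1` are compared, which is the reduction in fingerprint transfer that the technical solution aims at.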
Optionally, in this embodiment of the present invention, the first fingerprint table is stored in a first memory of the multiple memories, and the second fingerprint table is stored in a second memory of the multiple memories;
The obtaining module 42 is specifically configured to: send the fingerprint of the data block to be stored and the first index fingerprint to the first memory, and receive the first probability returned by the first memory, where the first probability represents the probability that the multiple fingerprints represented by the first index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored; and
send the fingerprint of the data block to be stored and the second index fingerprint to the second memory, and receive the second probability returned by the second memory, where the second probability represents the probability that the multiple fingerprints represented by the second index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored.
Optionally, in this embodiment of the present invention, backup server 40 further includes:
An auxiliary storage, configured to store the first fingerprint table and the second fingerprint table;
The obtaining module 42 is specifically configured to: send the fingerprint of the data block to be stored, the first index fingerprint, and the second index fingerprint to the auxiliary storage; and receive, from the auxiliary storage, the first probability that the multiple fingerprints represented by the first index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored, and the second probability that the multiple fingerprints represented by the second index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored.
Optionally, in this embodiment of the present invention, the auxiliary storage is further configured to store a first statistical table. The first statistical table contains statistical information on the values of a first interval of the multiple fingerprints represented by the first index fingerprint, and statistical information on the values of a second interval of those fingerprints. The first interval runs from the h-th bit to the i-th bit of each fingerprint, and the second interval runs from the j-th bit to the k-th bit of each fingerprint, where h, i, j, and k are natural numbers, h is not greater than i, j is not greater than k, and the first interval and the second interval do not overlap;
The auxiliary storage is further configured to: determine, according to the first statistical table, the frequency t_1 with which a occurs among the values of the first interval of the multiple fingerprints represented by the first index fingerprint, and the frequency t_2 with which b occurs among the values of the second interval of those fingerprints, where a is the value of the h-th to i-th bits of the fingerprint of the data block to be stored and b is the value of the j-th to k-th bits of that fingerprint; and determine the first probability according to the minimum of t_1 and t_2.
Optionally, in this embodiment of the present invention, each fingerprint in the first fingerprint table comprises M bits, and each M-bit fingerprint comprises N intervals. Each of the N intervals consists of S consecutive bits of the M bits, no two of the N intervals overlap, and the bit counts of the N intervals sum to M, where N is a natural number greater than or equal to 2 and S is a natural number. The auxiliary storage is further configured to store a first statistical table, which contains statistical information on the values of the N intervals of the multiple fingerprints represented by the first index fingerprint;
The auxiliary storage is further configured to: determine, according to the first statistical table, the frequency t_i with which a_i occurs among the values of the i-th interval of the multiple fingerprints represented by the first index fingerprint, where a_i is the value of the i-th interval of the fingerprint of the data block to be stored and i ranges from 1 to N; and determine the first probability according to the minimum of t_1 to t_N.
Backup server 40 in this embodiment and the data processing method corresponding to Fig. 3 are two aspects of the same inventive concept. The implementation of the method has been described in detail above, so a person skilled in the art can clearly understand the structure and implementation of backup server 40 in this embodiment from the foregoing description. For brevity of the specification, details are not repeated here.
Based on the same inventive concept, an embodiment of the present invention provides a storage system including a backup server and multiple memories. Multiple fingerprint tables are stored in the storage system, and the multiple fingerprint tables record the fingerprints of the data blocks stored in the multiple memories.
The backup server is configured to:
Determine a first fingerprint set according to the index fingerprints in the fingerprint index table and the fingerprint of the data block to be stored, where the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint represents multiple fingerprints in a first fingerprint table, the second index fingerprint represents multiple fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint range of the multiple fingerprints represented by the first index fingerprint and within the fingerprint range of the multiple fingerprints represented by the second index fingerprint;
Obtain, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtain, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, where the first probability is determined according to the multiple fingerprints represented by the first index fingerprint, and the second probability is determined according to the multiple fingerprints represented by the second index fingerprint;
Determine a second fingerprint set according to the first probability and the second probability, where the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold;
Obtain the result of matching the multiple fingerprints represented by the first index fingerprint against the fingerprint of the data block to be stored.
Optionally, in this embodiment of the present invention, the first fingerprint table is stored in a first memory of the multiple memories, and the second fingerprint table is stored in a second memory of the multiple memories. The backup server is specifically configured to:
Send the fingerprint of the data block to be stored and the first index fingerprint to the first memory, and receive the first probability returned by the first memory, where the first probability represents the probability that the multiple fingerprints represented by the first index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored;
Send the fingerprint of the data block to be stored and the second index fingerprint to the second memory, and receive the second probability returned by the second memory, where the second probability represents the probability that the multiple fingerprints represented by the second index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored;
The first memory is specifically configured to: receive the first index fingerprint and the fingerprint of the data block to be stored sent by the backup server, determine the first probability that the multiple fingerprints represented by the first index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored, and send the first probability to the backup server;
The second memory is specifically configured to: receive the second index fingerprint and the fingerprint of the data block to be stored sent by the backup server, determine the second probability that the multiple fingerprints represented by the second index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored, and send the second probability to the backup server.
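The division of work between the backup server and the memories can be sketched as follows. The class `MemoryNode`, the function `gather_probabilities`, the labels `idx1` and `idx2`, and the fixed 8-bit dimension widths are assumptions for illustration; in-process method calls stand in for whatever transport the storage system actually uses. Each memory computes the probability locally from its own statistical table, so only the index fingerprint, the queried fingerprint, and one number cross the link.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

WIDTHS = (8, 8, 8, 8)  # assumed dimension widths for 32-bit example fingerprints


def dimension_values(fp: int) -> List[int]:
    remaining = sum(WIDTHS)
    values = []
    for width in WIDTHS:
        remaining -= width
        values.append((fp >> remaining) & ((1 << width) - 1))
    return values


class MemoryNode:
    """A memory with its own processing unit and per-index statistical tables."""

    def __init__(self) -> None:
        self.stats: Dict[str, List[Counter]] = {}

    def store(self, index_fp: str, fingerprints: Sequence[int]) -> None:
        table = [Counter() for _ in WIDTHS]
        for fp in fingerprints:
            for dim, value in enumerate(dimension_values(fp)):
                table[dim][value] += 1
        self.stats[index_fp] = table

    def probability(self, index_fp: str, fp: int) -> int:
        # Minimum per-dimension frequency, computed where the table is stored.
        table = self.stats[index_fp]
        return min(table[dim][value] for dim, value in enumerate(dimension_values(fp)))


def gather_probabilities(requests: Sequence[Tuple[MemoryNode, str]], fp: int) -> Dict[str, int]:
    """Backup-server side: send (index fingerprint, fp) to each memory, collect the results."""
    return {index_fp: node.probability(index_fp, fp) for node, index_fp in requests}


first_memory, second_memory = MemoryNode(), MemoryNode()
first_memory.store("idx1", [0x01020504, 0x01030504, 0x02030102])
second_memory.store("idx2", [0x01020404])

result = gather_probabilities([(first_memory, "idx1"), (second_memory, "idx2")], 0x01020404)
```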
Optionally, in this embodiment of the present invention, a first statistical table is stored in the first memory. The first statistical table contains statistical information on the values of a first interval of the multiple fingerprints represented by the first index fingerprint, and statistical information on the values of a second interval of those fingerprints. The first interval runs from the h-th bit to the i-th bit of each fingerprint, and the second interval runs from the j-th bit to the k-th bit of each fingerprint, where h, i, j, and k are natural numbers, h is not greater than i, j is not greater than k, and the first interval and the second interval do not overlap;
The first memory is specifically configured to:
Determine, according to the first statistical table, the frequency t_1 with which a occurs among the values of the first interval of the multiple fingerprints represented by the first index fingerprint, and the frequency t_2 with which b occurs among the values of the second interval of those fingerprints, where a is the value of the h-th to i-th bits of the fingerprint of the data block to be stored and b is the value of the j-th to k-th bits of that fingerprint;
Determine the first probability according to the minimum of t_1 and t_2.
Optionally, in this embodiment of the present invention, each fingerprint in the first fingerprint table comprises M bits, and each M-bit fingerprint comprises N intervals. Each of the N intervals consists of S consecutive bits of the M bits, no two of the N intervals overlap, and the bit counts of the N intervals sum to M, where N is a natural number greater than or equal to 2 and S is a natural number. A first statistical table is stored in the first memory, and the first statistical table contains statistical information on the values of the N intervals of the multiple fingerprints represented by the first index fingerprint;
The first memory is specifically configured to: determine, according to the first statistical table, the frequency t_i with which a_i occurs among the values of the i-th interval of the multiple fingerprints represented by the first index fingerprint, where a_i is the value of the i-th interval of the fingerprint of the data block to be stored and i ranges from 1 to N;
Determine the first probability according to the minimum of t_1 to t_N.
The storage system in this embodiment and the data processing method corresponding to Fig. 3 are two aspects of the same inventive concept. The implementation of the method has been described in detail above, so a person skilled in the art can clearly understand the structure and implementation of the storage system in this embodiment from the foregoing description. For brevity of the specification, details are not repeated here.
The one or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:
In the above technical solution, the backup server uses the fingerprint index table to determine, in each fingerprint table, the multiple fingerprints represented by one index fingerprint, where the fingerprint range of those fingerprints includes the fingerprint of the data block to be stored. The next fingerprint comparison step is then performed on the multiple fingerprints determined from each fingerprint table, which reduces the amount of fingerprint transfer. Moreover, for each fingerprint table, the probability of retrieving the fingerprint of the data block to be stored among the multiple fingerprints determined in that table is calculated, and fingerprint comparison is performed only on the multiple fingerprints determined in the fingerprint tables whose probability is not less than the preset threshold. This reduces the amount of fingerprints transferred during comparison and the time consumed by comparison; that is, it shortens the time needed to determine whether the data block to be stored is a duplicate data block and improves data storage efficiency.
A person skilled in the art should understand that the terms "first", "second", "third", "fourth", and the like (if present) in the specification, the claims, and the above drawings are used to distinguish similar objects and are not used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate, so that the embodiments of the invention described herein can, for example, be implemented in an order other than the one illustrated or described here. In addition, the terms "comprise" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
An embodiment of the present invention further provides a computer program product for data processing, including a computer-readable storage medium storing program code, where the instructions included in the program code are used to perform the method procedure described in any one of the foregoing method embodiments. A person of ordinary skill in the art will understand that the foregoing storage medium includes various non-transitory machine-readable media that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a random-access memory (RAM), a solid state disk (SSD), or a non-volatile memory.
It should be noted that the embodiments provided herein are only illustrative. A person skilled in the art will clearly understand that, for convenience and brevity of description, the descriptions of the above embodiments each have their own emphasis; for a part not described in detail in one embodiment, refer to the related description in another embodiment. The features disclosed in the embodiments of the present invention, the claims, and the drawings may exist independently or in combination. A feature described in the form of hardware in the embodiments of the present invention may be implemented by software, and vice versa. No limitation is imposed here.
Claims (14)
1. A data processing method, characterized in that the method is performed by a backup server in a storage system, the storage system includes the backup server and multiple memories, multiple fingerprint tables are stored in the storage system, the multiple fingerprint tables record the fingerprints of the data blocks stored in the multiple memories, and the method includes:
determining a first fingerprint set according to the index fingerprints in a fingerprint index table and the fingerprint of a data block to be stored, wherein the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint represents multiple fingerprints in a first fingerprint table, the second index fingerprint represents multiple fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint range of the multiple fingerprints represented by the first index fingerprint and within the fingerprint range of the multiple fingerprints represented by the second index fingerprint;
obtaining, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtaining, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, wherein the first probability is determined according to the multiple fingerprints represented by the first index fingerprint, and the second probability is determined according to the multiple fingerprints represented by the second index fingerprint;
According to described first probability and second determine the probability the second fingerprint set, wherein, described second fingerprint collection
Include at least in conjunction and have described first index fingerprint, the first probability according to the described first index fingerprint determination is not
Less than predetermined threshold value;
Obtain the multiple fingerprints representated by described first index fingerprint and the fingerprint of described data block to be stored
Matching result.
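For illustration only, and not as part of the claims, the flow recited in claim 1 might be sketched in Python as follows. The names `IndexEntry`, `first_set`, `second_set`, and `find_duplicate` are hypothetical, as is the assumption that each index fingerprint covers a closed integer range `[low, high]`. The probability estimate and the fingerprint-table lookup are passed in as callables because claim 1 does not fix how they are computed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class IndexEntry:
    """One row of a hypothetical fingerprint index table."""
    index_fp: str  # label of the index fingerprint
    low: int       # smallest fingerprint it represents (assumed range model)
    high: int      # largest fingerprint it represents


def first_set(index_table: Iterable[IndexEntry], fp: int) -> List[IndexEntry]:
    """Index fingerprints whose represented range contains the new fingerprint."""
    return [e for e in index_table if e.low <= fp <= e.high]


def second_set(candidates: Iterable[IndexEntry], fp: int,
               probability_of: Callable[[IndexEntry, int], float],
               threshold: float) -> List[IndexEntry]:
    """Keep only candidates whose estimated hit probability reaches the threshold."""
    return [e for e in candidates if probability_of(e, fp) >= threshold]


def find_duplicate(index_table: Iterable[IndexEntry], fp: int,
                   probability_of: Callable[[IndexEntry, int], float],
                   fingerprints_of: Callable[[IndexEntry], Iterable[int]],
                   threshold: float) -> Optional[str]:
    """Exact matching is done only against the surviving candidates."""
    survivors = second_set(first_set(index_table, fp), fp, probability_of, threshold)
    for entry in survivors:
        if fp in fingerprints_of(entry):
            return entry.index_fp  # duplicate found under this index fingerprint
    return None  # no match: the data block would be treated as new
```

In this sketch the threshold acts as a filter in front of the exact comparison, so fingerprint tables that are unlikely to contain the fingerprint are never read.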
2. The method according to claim 1, wherein the first fingerprint table is stored in a first memory of the plurality of memories, and the second fingerprint table is stored in a second memory of the plurality of memories; the obtaining, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored comprises:
sending the fingerprint of the data block to be stored and the first index fingerprint to the first memory; and
receiving the first probability returned by the first memory, wherein the first probability represents the probability that the plurality of fingerprints represented by the first index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored;
and the obtaining, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored comprises:
sending the fingerprint of the data block to be stored and the second index fingerprint to the second memory; and
receiving the second probability returned by the second memory, wherein the second probability represents the probability that the plurality of fingerprints represented by the second index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored.
3. The method according to claim 1, wherein the backup server comprises an auxiliary memory, and the first fingerprint table and the second fingerprint table are stored in the auxiliary memory;
the obtaining, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtaining, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored comprises:
sending the fingerprint of the data block to be stored, the first index fingerprint, and the second index fingerprint to the auxiliary memory; and
receiving, from the auxiliary memory, the first probability that the plurality of fingerprints represented by the first index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored, and the second probability that the plurality of fingerprints represented by the second index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored.
4. The method according to any one of claims 1 to 3, wherein each fingerprint in the first fingerprint table comprises M bits, each M-bit fingerprint comprises N intervals, each of the N intervals comprises S consecutive bits of the M bits, no two of the N intervals overlap, the sum of the numbers of bits of the N intervals is M, N is a natural number greater than or equal to 2, and S is a natural number;
a first statistical table is stored in the storage system, the first statistical table includes statistical information on the values of the N intervals of the plurality of fingerprints represented by the first index fingerprint, and the first probability is determined by:
determining, according to the first statistical table, the number of occurrences t_i of a_i among the values of the i-th interval of the plurality of fingerprints represented by the first index fingerprint, wherein a_i is the value of the i-th interval of the fingerprint of the data block to be stored, and i ranges from 1 to N; and
determining the first probability according to the minimum of the obtained t_1 to t_N.
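For illustration only, and not as part of the claims, the interval statistics of claim 4 might be sketched in Python as follows. The function names are hypothetical. Taking interval 1 to be the most significant S bits, and turning the minimum count into a probability by dividing by the number of represented fingerprints, are both assumptions: the claim only states that the first probability is determined according to the minimum of t_1 to t_N.

```python
from collections import Counter
from typing import Iterable, List


def split_intervals(fp: int, m_bits: int, s_bits: int) -> List[int]:
    """Split an M-bit fingerprint into N = M / S non-overlapping S-bit values."""
    if m_bits % s_bits != 0:
        raise ValueError("S must divide M so that the N intervals cover all M bits")
    n = m_bits // s_bits
    mask = (1 << s_bits) - 1
    # Interval 1 is assumed to be the most significant S bits.
    return [(fp >> (m_bits - (i + 1) * s_bits)) & mask for i in range(n)]


def build_stat_table(fps: Iterable[int], m_bits: int, s_bits: int) -> List[Counter]:
    """Per-interval value counts over the fingerprints one index fingerprint represents."""
    table = [Counter() for _ in range(m_bits // s_bits)]
    for fp in fps:
        for i, value in enumerate(split_intervals(fp, m_bits, s_bits)):
            table[i][value] += 1
    return table


def estimate_probability(table: List[Counter], total: int, fp: int,
                         m_bits: int, s_bits: int) -> float:
    """Estimate from min(t_1..t_N); dividing by `total` is an assumed normalization."""
    if total == 0:
        return 0.0
    values = split_intervals(fp, m_bits, s_bits)
    t_min = min(table[i][a_i] for i, a_i in enumerate(values))
    return t_min / total
```

A minimum of zero means that some interval value of the new fingerprint never occurs among the represented fingerprints, so none of them can be identical to it. A non-zero minimum is only an estimate and can be a false positive, which is why the exact matching step is still needed.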
5. The method according to any one of claims 1 to 3, wherein a first statistical table is stored in the storage system, the first statistical table includes statistical information on the values of a first interval of the plurality of fingerprints represented by the first index fingerprint, and statistical information on the values of a second interval of the plurality of fingerprints represented by the first index fingerprint, the first interval is the interval from the h-th bit to the i-th bit of each fingerprint, the second interval is the interval from the j-th bit to the k-th bit of each fingerprint, h, i, j, and k are natural numbers, the value of h is not greater than the value of i, the value of j is not greater than the value of k, and the first interval and the second interval do not overlap; the first probability is determined by:
determining, according to the first statistical table, the number of occurrences t_1 of a among the values of the first interval of the plurality of fingerprints represented by the first index fingerprint, and the number of occurrences t_2 of b among the values of the second interval of the plurality of fingerprints represented by the first index fingerprint, wherein a is the value of the h-th bit to the i-th bit of the fingerprint of the data block to be stored, and b is the value of the j-th bit to the k-th bit of the fingerprint of the data block to be stored; and
determining the first probability according to the minimum of t_1 and t_2.
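For illustration only, and not as part of the claims, the two-interval variant of claim 5 might be sketched in Python as follows. The function names are hypothetical, and counting bit positions from 1 starting at the most significant bit is an assumption, since the claim does not fix the bit numbering. The sketch returns the minimum count itself and leaves the mapping from that count to a probability open.

```python
from collections import Counter
from typing import Iterable, Tuple

BitRange = Tuple[int, int]  # (start, end), 1-based and inclusive


def bit_range(fp: int, m_bits: int, start: int, end: int) -> int:
    """Value of bits start..end of an M-bit fingerprint (bit 1 = most significant)."""
    if not 1 <= start <= end <= m_bits:
        raise ValueError("need 1 <= start <= end <= M")
    width = end - start + 1
    return (fp >> (m_bits - end)) & ((1 << width) - 1)


def build_two_range_stats(fps: Iterable[int], m_bits: int,
                          first: BitRange, second: BitRange) -> Tuple[Counter, Counter]:
    """Value counts for the first (h..i) and second (j..k) intervals."""
    if first[0] <= second[1] and second[0] <= first[1]:
        raise ValueError("the two intervals must not overlap")
    c1, c2 = Counter(), Counter()
    for fp in fps:
        c1[bit_range(fp, m_bits, *first)] += 1
        c2[bit_range(fp, m_bits, *second)] += 1
    return c1, c2


def min_frequency(stats: Tuple[Counter, Counter], fp: int, m_bits: int,
                  first: BitRange, second: BitRange) -> int:
    """min(t_1, t_2) for the fingerprint of the data block to be stored."""
    a = bit_range(fp, m_bits, *first)   # value of bits h..i
    b = bit_range(fp, m_bits, *second)  # value of bits j..k
    return min(stats[0][a], stats[1][b])
```

Unlike the equal-width split of claim 4, the two intervals here may have different widths and need not cover the whole fingerprint.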
6. A backup server, wherein the backup server is applied in a storage system, the storage system comprises the backup server and a plurality of memories, a plurality of fingerprint tables are stored in the storage system, and the plurality of fingerprint tables record fingerprints of data blocks stored in the plurality of memories, the backup server comprising:
a determining module, configured to determine a first fingerprint set according to index fingerprints in a fingerprint index table and a fingerprint of a data block to be stored, wherein the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint is used to represent a plurality of fingerprints in a first fingerprint table, the second index fingerprint is used to represent a plurality of fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint range of the plurality of fingerprints represented by the first index fingerprint and of the plurality of fingerprints represented by the second index fingerprint;
an obtaining module, configured to obtain, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and to obtain, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, wherein the first probability is determined according to the plurality of fingerprints represented by the first index fingerprint, and the second probability is determined according to the plurality of fingerprints represented by the second index fingerprint;
the determining module being further configured to determine a second fingerprint set according to the first probability and the second probability, wherein the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold; and
a processing module, configured to obtain a matching result between the plurality of fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.
7. The backup server according to claim 6, wherein the first fingerprint table is stored in a first memory of the plurality of memories, and the second fingerprint table is stored in a second memory of the plurality of memories;
the obtaining module is specifically configured to: send the fingerprint of the data block to be stored and the first index fingerprint to the first memory; and receive the first probability returned by the first memory, wherein the first probability represents the probability that the plurality of fingerprints represented by the first index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored; and
send the fingerprint of the data block to be stored and the second index fingerprint to the second memory; and receive the second probability returned by the second memory, wherein the second probability represents the probability that the plurality of fingerprints represented by the second index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored.
8. The backup server according to claim 6, wherein the backup server further comprises:
an auxiliary memory, configured to store the first fingerprint table and the second fingerprint table;
the obtaining module is specifically configured to: send the fingerprint of the data block to be stored, the first index fingerprint, and the second index fingerprint to the auxiliary memory; and receive, from the auxiliary memory, the first probability that the plurality of fingerprints represented by the first index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored, and the second probability that the plurality of fingerprints represented by the second index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored.
9. The backup server according to claim 8, wherein:
each fingerprint in the first fingerprint table comprises M bits, each M-bit fingerprint comprises N intervals, each of the N intervals comprises S consecutive bits of the M bits, no two of the N intervals overlap, the sum of the numbers of bits of the N intervals is M, N is a natural number greater than or equal to 2, and S is a natural number; the auxiliary memory is further configured to store a first statistical table, and the first statistical table includes statistical information on the values of the N intervals of the plurality of fingerprints represented by the first index fingerprint; and
the auxiliary memory is further configured to: determine, according to the first statistical table, the number of occurrences t_i of a_i among the values of the i-th interval of the plurality of fingerprints represented by the first index fingerprint, wherein a_i is the value of the i-th interval of the fingerprint of the data block to be stored, and i ranges from 1 to N; and determine the first probability according to the minimum of t_1 to t_N.
10. The backup server according to claim 8, wherein the auxiliary memory is further configured to store a first statistical table, the first statistical table includes statistical information on the values of a first interval of the plurality of fingerprints represented by the first index fingerprint, and statistical information on the values of a second interval of the plurality of fingerprints represented by the first index fingerprint, the first interval is the interval from the h-th bit to the i-th bit of each fingerprint, the second interval is the interval from the j-th bit to the k-th bit of each fingerprint, h, i, j, and k are natural numbers, the value of h is not greater than the value of i, the value of j is not greater than the value of k, and the first interval and the second interval do not overlap; and
the auxiliary memory is further configured to: determine, according to the first statistical table, the number of occurrences t_1 of a among the values of the first interval of the plurality of fingerprints represented by the first index fingerprint, and the number of occurrences t_2 of b among the values of the second interval of the plurality of fingerprints represented by the first index fingerprint, wherein a is the value of the h-th bit to the i-th bit of the fingerprint of the data block to be stored, and b is the value of the j-th bit to the k-th bit of the fingerprint of the data block to be stored; and determine the first probability according to the minimum of t_1 and t_2.
11. A storage system, comprising a backup server and a plurality of memories, wherein a plurality of fingerprint tables are stored in the storage system, and the plurality of fingerprint tables record fingerprints of data blocks stored in the plurality of memories;
the backup server is configured to:
determine a first fingerprint set according to index fingerprints in a fingerprint index table and a fingerprint of a data block to be stored, wherein the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint is used to represent a plurality of fingerprints in a first fingerprint table, the second index fingerprint is used to represent a plurality of fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint range of the plurality of fingerprints represented by the first index fingerprint and of the plurality of fingerprints represented by the second index fingerprint;
obtain, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtain, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, wherein the first probability is determined according to the plurality of fingerprints represented by the first index fingerprint, and the second probability is determined according to the plurality of fingerprints represented by the second index fingerprint;
determine a second fingerprint set according to the first probability and the second probability, wherein the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold; and
obtain a matching result between the plurality of fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.
12. The storage system according to claim 11, wherein the first fingerprint table is stored in a first memory of the plurality of memories, and the second fingerprint table is stored in a second memory of the plurality of memories; the backup server is specifically configured to:
send the fingerprint of the data block to be stored and the first index fingerprint to the first memory; and receive the first probability returned by the first memory, wherein the first probability represents the probability that the plurality of fingerprints represented by the first index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored; and
send the fingerprint of the data block to be stored and the second index fingerprint to the second memory; and receive the second probability returned by the second memory, wherein the second probability represents the probability that the plurality of fingerprints represented by the second index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored;
the first memory is specifically configured to: receive the first index fingerprint and the fingerprint of the data block to be stored that are sent by the backup server, determine the first probability that the plurality of fingerprints represented by the first index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored, and send the first probability to the backup server; and
the second memory is specifically configured to: receive the second index fingerprint and the fingerprint of the data block to be stored that are sent by the backup server, determine the second probability that the plurality of fingerprints represented by the second index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored, and send the second probability to the backup server.
13. The storage system according to claim 11 or 12, wherein each fingerprint in the first fingerprint table comprises M bits, each M-bit fingerprint comprises N intervals, each of the N intervals comprises S consecutive bits of the M bits, no two of the N intervals overlap, the sum of the numbers of bits of the N intervals is M, N is a natural number greater than or equal to 2, and S is a natural number; a first statistical table is stored in the first memory, and the first statistical table includes statistical information on the values of the N intervals of the plurality of fingerprints represented by the first index fingerprint;
the first memory is specifically configured to: determine, according to the first statistical table, the number of occurrences t_i of a_i among the values of the i-th interval of the plurality of fingerprints represented by the first index fingerprint, wherein a_i is the value of the i-th interval of the fingerprint of the data block to be stored, and i ranges from 1 to N; and
determine the first probability according to the minimum of t_1 to t_N.
14. The storage system according to claim 11 or 12, wherein a first statistical table is stored in the first memory, the first statistical table includes statistical information on the values of a first interval of the plurality of fingerprints represented by the first index fingerprint, and statistical information on the values of a second interval of the plurality of fingerprints represented by the first index fingerprint, the first interval is the interval from the h-th bit to the i-th bit of each fingerprint, the second interval is the interval from the j-th bit to the k-th bit of each fingerprint, h, i, j, and k are natural numbers, the value of h is not greater than the value of i, the value of j is not greater than the value of k, and the first interval and the second interval do not overlap;
the first memory is specifically configured to:
determine, according to the first statistical table, the number of occurrences t_1 of a among the values of the first interval of the plurality of fingerprints represented by the first index fingerprint, and the number of occurrences t_2 of b among the values of the second interval of the plurality of fingerprints represented by the first index fingerprint, wherein a is the value of the h-th bit to the i-th bit of the fingerprint of the data block to be stored, and b is the value of the j-th bit to the k-th bit of the fingerprint of the data block to be stored; and
determine the first probability according to the minimum of t_1 and t_2.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510468057.9A CN106407226B (en) | 2015-07-31 | 2015-07-31 | Data processing method, backup server and storage system |
PCT/CN2016/091054 WO2017020735A1 (en) | 2015-07-31 | 2016-07-22 | Data processing method, backup server and storage system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510468057.9A CN106407226B (en) | 2015-07-31 | 2015-07-31 | Data processing method, backup server and storage system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106407226A true CN106407226A (en) | 2017-02-15 |
CN106407226B CN106407226B (en) | 2019-09-13 |
Family
ID=57942441
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510468057.9A Active CN106407226B (en) | Data processing method, backup server and storage system | 2015-07-31 | 2015-07-31 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106407226B (en) |
WO (1) | WO2017020735A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107317723A (en) * | 2017-05-27 | 2017-11-03 | 北京金山安全软件有限公司 | Data processing method and server |
CN110582091A (en) * | 2018-06-11 | 2019-12-17 | 中国移动通信集团浙江有限公司 | method and apparatus for locating wireless quality problems |
CN111427871A (en) * | 2019-01-09 | 2020-07-17 | 阿里巴巴集团控股有限公司 | Data processing method, device and equipment |
TWI700905B (en) * | 2018-01-18 | 2020-08-01 | 香港商阿里巴巴集團服務有限公司 | Data processing method, device and equipment |
CN115988002A (en) * | 2023-02-16 | 2023-04-18 | 荣耀终端有限公司 | Data transmission method and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103019887A (en) * | 2012-12-12 | 2013-04-03 | 华为技术有限公司 | Data backup method and device |
CN103514250A (en) * | 2013-06-20 | 2014-01-15 | 易乐天 | Method and system for deleting global repeating data and storage device |
CN103678293A (en) * | 2012-08-29 | 2014-03-26 | 百度在线网络技术(北京)有限公司 | Data storage method and device |
US20150088899A1 (en) * | 2013-09-23 | 2015-03-26 | Spotify Ab | System and method for identifying a segment of a file that includes target content |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005050620A1 (en) * | 2003-11-18 | 2005-06-02 | Koninklijke Philips Electronics N.V. | Matching data objects by matching derived fingerprints |
CN101477523B (en) * | 2008-11-24 | 2011-07-20 | 北京邮电大学 | Index structure and retrieval method for ultra-large fingerprint base |
CN103235791B (en) * | 2013-03-29 | 2019-03-26 | 厦门雅迅网络股份有限公司 | A kind of fingerprint matching optimum position method based on rank |
- 2015-07-31: CN application CN201510468057.9A, patent CN106407226B, status: Active
- 2016-07-22: WO application PCT/CN2016/091054, publication WO2017020735A1, status: Application Filing
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107317723A (en) * | 2017-05-27 | 2017-11-03 | 北京金山安全软件有限公司 | Data processing method and server |
TWI700905B (en) * | 2018-01-18 | 2020-08-01 | 香港商阿里巴巴集團服務有限公司 | Data processing method, device and equipment |
CN110582091A (en) * | 2018-06-11 | 2019-12-17 | 中国移动通信集团浙江有限公司 | method and apparatus for locating wireless quality problems |
CN111427871A (en) * | 2019-01-09 | 2020-07-17 | 阿里巴巴集团控股有限公司 | Data processing method, device and equipment |
CN111427871B (en) * | 2019-01-09 | 2024-03-29 | 阿里巴巴集团控股有限公司 | Data processing method, device and equipment |
CN115988002A (en) * | 2023-02-16 | 2023-04-18 | 荣耀终端有限公司 | Data transmission method and electronic equipment |
CN115988002B (en) * | 2023-02-16 | 2023-08-15 | 荣耀终端有限公司 | Data transmission method and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
WO2017020735A1 (en) | 2017-02-09 |
CN106407226B (en) | 2019-09-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106407226A (en) | Data processing method, backup server and storage system | |
CN104461390B (en) | Write data into the method and device of imbricate magnetic recording SMR hard disks | |
CN104679778B (en) | A kind of generation method and device of search result | |
CN104346458B (en) | Date storage method and storage device | |
CN110110006A (en) | Data managing method and Related product | |
CN104133661A (en) | Multi-core parallel hash partitioning optimizing method based on column storage | |
CN103914363B (en) | A kind of internal memory monitoring method and relevant apparatus | |
CN106407224A (en) | Method and device for file compaction in KV (Key-Value)-Store system | |
CN110471900A (en) | Data processing method and terminal device | |
CN102508872A (en) | Data processing method and system of online processing system based on memory | |
CN104102549B (en) | A kind of method, apparatus and chip for realizing multithreading mutually exclusive operation | |
CN110008246A (en) | Metadata management method and device | |
CN104615684A (en) | Mass data communication concurrent processing method and system | |
CN107682395A (en) | A kind of big data cloud computing runtime and method | |
CN107122354A (en) | Affairs perform method, apparatus and system | |
CN102722450A (en) | Storage method for redundancy deletion block device based on location-sensitive hash | |
CN201804331U (en) | Date deduplication system based on co-processor | |
CN110119396A (en) | Data managing method and Related product | |
CN104050189B (en) | The page shares processing method and processing device | |
CN107346342A (en) | A kind of file call method calculated based on storage and system | |
CN104298614B (en) | Data block storage method and storage device in storage device | |
CN104598171B (en) | Array method for reconstructing and device based on metadata | |
CN106775450B (en) | A kind of data distribution method in mixing storage system | |
CN104077282B (en) | The method and apparatus of processing data | |
CN108710606A (en) | A kind of Task Progress monitoring method, computer readable storage medium and terminal device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |