CN106407226A - Data processing method, backup server and storage system - Google Patents

Publication number
CN106407226A
Authority
CN
China
Prior art keywords
fingerprint
index
stored
probability
data block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510468057.9A
Other languages
Chinese (zh)
Other versions
CN106407226B (en)
Inventor
吴晨涛
黄洵松
薛常亮
王元钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Shanghai Jiaotong University
Original Assignee
Huawei Technologies Co Ltd
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd, Shanghai Jiaotong University
Priority to CN201510468057.9A (patent CN106407226B)
Priority to PCT/CN2016/091054 (patent WO2017020735A1)
Publication of CN106407226A
Application granted
Publication of CN106407226B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1748 De-duplication implemented within the file system, e.g. based on file segments
    • G06F16/1752 De-duplication implemented within the file system, e.g. based on file segments based on file chunks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1666 Error detection or correction of the data by redundancy in hardware where the redundant component is memory or memory area
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1748 De-duplication implemented within the file system, e.g. based on file segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Collating Specific Patterns (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention discloses a data processing method, a backup server and a storage system, which aim to solve the problem of low data storage efficiency caused by fingerprint comparison consuming a large amount of I/O resources. The data processing method comprises the following steps: determining a first fingerprint set, which includes a first index fingerprint and a second index fingerprint, according to the index fingerprints in a fingerprint index table and the fingerprint of a data block to be stored; obtaining, according to the first index fingerprint, a first probability that a first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtaining, according to the second index fingerprint, a second probability that a second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored; determining a second fingerprint set according to the first and second probabilities; and obtaining a matching result between the plurality of fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.

Description

Data processing method, backup server and storage system
Technical field
The present invention relates to the field of computer technology, and in particular to a data processing method, a backup server and a storage system.
Background
In the field of data storage, data deduplication is a key technique for saving storage space. It detects and eliminates data redundancy by keeping only one copy of identical data. This not only saves a considerable amount of disk space but also improves write performance and saves network bandwidth, and the technique is widely used in fields such as file backup, online storage services and e-mail services.
In the prior art, a storage system stores a fingerprint table that holds the fingerprints of the data already stored in the storage system. On receiving a data storage request, the storage system compares the fingerprint of the data block to be stored with the fingerprints in the fingerprint table to determine whether the data block to be stored is duplicate data, and thereby determines how the data block to be stored is to be stored.
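As a point of reference, the prior-art lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the use of SHA-1 as the fingerprint function, the in-memory `set` standing in for the fingerprint table, and the names `fingerprint` and `store_block` are all hypothetical.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Compute a fingerprint for a data block (SHA-1 is an assumed choice)."""
    return hashlib.sha1(block).hexdigest()


def store_block(block: bytes, fingerprint_table: set, storage: dict):
    """Store a block only if its fingerprint is not already in the table.

    Returns (fingerprint, written), where written is False for duplicate data.
    """
    fp = fingerprint(block)
    if fp in fingerprint_table:
        # Duplicate: keep the single existing copy and write nothing.
        return fp, False
    fingerprint_table.add(fp)
    storage[fp] = block
    return fp, True
```

In this sketch every lookup consults the whole fingerprint table, which is cheap only because the table is a small in-memory set; with a table too large for main memory, each lookup implies reading fingerprints from storage.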
However, to compare the fingerprint of the data block to be stored with the fingerprints in the fingerprint table, the fingerprints in the fingerprint table must be read into the main memory of the backup server. Because the fingerprint table holds a massive number of fingerprints, comparing against the entire fingerprint table moves a large amount of data, consumes a large amount of input/output (I/O) resources and takes a long time, which makes data storage in the storage system inefficient.
Summary of the invention
Embodiments of the present invention provide a data processing method, a backup server and a storage system, to solve the problem that data storage is inefficient because fingerprint comparison consumes a large amount of I/O resources.
In a first aspect, an embodiment of the present invention provides a data processing method. The method is performed by a backup server in a storage system, the storage system includes the backup server and multiple memories, multiple fingerprint tables are stored in the storage system, and the fingerprint tables record the fingerprints of the data blocks stored in the multiple memories. The method includes:
determining a first fingerprint set according to the index fingerprints in a fingerprint index table and the fingerprint of a data block to be stored, where the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint represents multiple fingerprints in a first fingerprint table, the second index fingerprint represents multiple fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint range of the multiple fingerprints represented by the first index fingerprint and within the fingerprint range of the multiple fingerprints represented by the second index fingerprint;
obtaining, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtaining, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, where the first probability is determined according to the multiple fingerprints represented by the first index fingerprint, and the second probability is determined according to the multiple fingerprints represented by the second index fingerprint;
determining a second fingerprint set according to the first probability and the second probability, where the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold; and
obtaining a matching result between the multiple fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.
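The four steps of the first aspect can be sketched as follows. This is an illustrative outline only: the `IndexFingerprint` class, its representation of a fingerprint range as a `[low, high]` pair, and the caller-supplied `probability_of` and `load_fingerprints` callables are assumptions introduced here, since this section does not fix how an index fingerprint encodes its range or how fingerprints are loaded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexFingerprint:
    """Hypothetical index entry: represents the fingerprints of one
    fingerprint table whose values lie in [low, high]."""
    table_id: str
    low: int
    high: int


def first_fingerprint_set(index_table, fp):
    """Step 1: keep index fingerprints whose range covers the new fingerprint."""
    return [e for e in index_table if e.low <= fp <= e.high]


def second_fingerprint_set(candidates, fp, probability_of, threshold):
    """Steps 2 and 3: keep candidates whose probability is not less than the threshold."""
    return [e for e in candidates if probability_of(e, fp) >= threshold]


def match(selected, fp, load_fingerprints):
    """Step 4: compare only against fingerprints represented by the selected entries."""
    return any(fp in load_fingerprints(e) for e in selected)
```

Only the entries that survive the probability filter trigger `load_fingerprints`, which is where the saving in data movement would come from in this sketch.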
With reference to the first aspect, in a first possible implementation of the first aspect, the first fingerprint table is stored in a first memory among the multiple memories, and the second fingerprint table is stored in a second memory among the multiple memories. The obtaining, according to the first index fingerprint, of the first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored includes:
sending the fingerprint of the data block to be stored and the first index fingerprint to the first memory; and
receiving the first probability returned by the first memory, where the first probability represents the probability that the multiple fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
The obtaining, according to the second index fingerprint, of the second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored includes:
sending the fingerprint of the data block to be stored and the second index fingerprint to the second memory; and
receiving the second probability returned by the second memory, where the second probability represents the probability that the multiple fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
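The exchange in this implementation can be pictured with the sketch below, in which each memory evaluates the probability locally and returns a single number. The `MemoryNode` class, the string identifiers for index fingerprints, the 16-bit fingerprints and, in particular, the shared-high-byte estimator are all stand-ins chosen for illustration; they are not the estimator the patent defines.

```python
class MemoryNode:
    """Hypothetical memory that holds the fingerprints behind its index fingerprints."""

    def __init__(self, represented):
        # Maps an index-fingerprint id to the fingerprints it represents.
        self._represented = represented

    def probability(self, index_id, fp):
        """Return an estimate computed next to the data, so no fingerprints
        leave the memory. Stand-in estimator: share of represented 16-bit
        fingerprints whose high byte equals that of fp."""
        fps = self._represented.get(index_id, [])
        if not fps:
            return 0.0
        same_prefix = sum(1 for f in fps if (f >> 8) == (fp >> 8))
        return same_prefix / len(fps)


def collect_probabilities(fp, routes):
    """Backup-server side: send (fp, index id) to each memory, gather the replies."""
    return {index_id: node.probability(index_id, fp) for index_id, node in routes}
```

The point of the design, as far as this section states it, is that the backup server sends a fingerprint and an index fingerprint and receives a probability, rather than reading the fingerprint table itself.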
With reference to the first aspect, in a second possible implementation of the first aspect, the backup server includes an additional storage, and the first fingerprint table and the second fingerprint table are stored in the additional storage.
The obtaining, according to the first index fingerprint, of the first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and the obtaining, according to the second index fingerprint, of the second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, include:
sending the fingerprint of the data block to be stored, the first index fingerprint and the second index fingerprint to the additional storage; and
receiving, from the additional storage, the first probability that the multiple fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored, and the second probability that the multiple fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
With reference to any one of the first aspect, the first possible implementation of the first aspect and the second possible implementation of the first aspect, in a third possible implementation of the first aspect, each fingerprint in the first fingerprint table contains M bits, and each M-bit fingerprint contains N intervals. Each of the N intervals consists of S consecutive bits of the M bits, no two of the N intervals overlap, the numbers of bits of the N intervals sum to M, N is a natural number greater than or equal to 2, and S is a natural number.
A first statistical table is stored in the storage system. The first statistical table includes statistical information on the values of the N intervals of the multiple fingerprints represented by the first index fingerprint. The first probability is determined as follows:
determining, according to the first statistical table, the number of occurrences t_i of a_i among the values of the i-th interval of the multiple fingerprints represented by the first index fingerprint, where a_i is the value of the i-th interval of the fingerprint of the data block to be stored, and i ranges from 1 to N; and
determining the first probability according to the minimum of the obtained t_1 to t_N.
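A small sketch of this interval-based estimate follows. It assumes that interval 1 is the most significant S bits, that the statistical table is kept as one counter per interval, and that the minimum count is turned into a probability by dividing by the number of represented fingerprints. The text above says only that the probability is determined according to the minimum, so that normalization, like the function names, is a hypothetical choice.

```python
from collections import Counter


def split_intervals(fp, m_bits, n):
    """Split an M-bit fingerprint into N non-overlapping intervals of S = M / N bits."""
    if m_bits % n != 0:
        raise ValueError("M must be divisible by N")
    s = m_bits // n
    mask = (1 << s) - 1
    # Interval 1 is taken to be the most significant S bits (an assumption).
    return [(fp >> (m_bits - s * (i + 1))) & mask for i in range(n)]


def build_stat_table(fingerprints, m_bits, n):
    """Count, per interval, how often each value occurs among the fingerprints."""
    table = [Counter() for _ in range(n)]
    for fp in fingerprints:
        for i, value in enumerate(split_intervals(fp, m_bits, n)):
            table[i][value] += 1
    return table


def estimate_probability(table, total, fp, m_bits, n):
    """Look up t_i for each a_i and derive a score from min(t_1..t_N)."""
    if total == 0:
        return 0.0
    t = [table[i][a_i] for i, a_i in enumerate(split_intervals(fp, m_bits, n))]
    # Hypothetical normalization: minimum count over number of fingerprints.
    return min(t) / total
```

If any t_i is zero, no represented fingerprint can equal the new fingerprint, so the estimate is zero and the corresponding fingerprints need not be read. A non-zero estimate is only a hint: in this sketch a fingerprint whose interval values each occur somewhere in the table scores above zero even if the exact combination was never stored.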
With reference to any one of the first aspect, the first possible implementation of the first aspect and the second possible implementation of the first aspect, in a fourth possible implementation of the first aspect, a first statistical table is stored in the storage system. The first statistical table contains statistical information on the values of a first interval of the multiple fingerprints represented by the first index fingerprint, and statistical information on the values of a second interval of the multiple fingerprints represented by the first index fingerprint. The first interval runs from the h-th bit to the i-th bit of each fingerprint, and the second interval runs from the j-th bit to the k-th bit of each fingerprint, where h, i, j and k are natural numbers, h is not greater than i, j is not greater than k, and the first interval and the second interval do not overlap. The first probability is determined as follows:
determining, according to the first statistical table, the number of occurrences t_1 of a among the values of the first interval of the multiple fingerprints represented by the first index fingerprint, and the number of occurrences t_2 of b among the values of the second interval of the multiple fingerprints represented by the first index fingerprint, where a is the value of the h-th to i-th bits of the fingerprint of the data block to be stored, and b is the value of the j-th to k-th bits of the fingerprint of the data block to be stored; and
determining the first probability according to the minimum of t_1 and t_2.
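The two-interval variant can be sketched in the same spirit. Here bit positions are assumed to be 1-indexed from the most significant bit, the two intervals need not cover the whole fingerprint, and dividing the minimum count by the number of represented fingerprints is again a hypothetical normalization rather than something this section specifies.

```python
from collections import Counter


def bit_range(fp, m_bits, lo, hi):
    """Value of bits lo..hi of an M-bit fingerprint (1-indexed from the MSB, inclusive)."""
    width = hi - lo + 1
    return (fp >> (m_bits - hi)) & ((1 << width) - 1)


def build_two_interval_table(fingerprints, m_bits, first, second):
    """Count the values seen in the first (h, i) and second (j, k) bit intervals."""
    (h, i), (j, k) = first, second
    if not (h <= i and j <= k and (i < j or k < h)):
        raise ValueError("intervals must be well-formed and non-overlapping")
    counts_1 = Counter(bit_range(fp, m_bits, h, i) for fp in fingerprints)
    counts_2 = Counter(bit_range(fp, m_bits, j, k) for fp in fingerprints)
    return counts_1, counts_2


def estimate_probability(table, total, fp, m_bits, first, second):
    """Look up t_1 for a and t_2 for b, then derive a score from min(t_1, t_2)."""
    if total == 0:
        return 0.0
    counts_1, counts_2 = table
    a = bit_range(fp, m_bits, *first)
    b = bit_range(fp, m_bits, *second)
    t_1, t_2 = counts_1[a], counts_2[b]
    # Hypothetical normalization: minimum count over number of fingerprints.
    return min(t_1, t_2) / total
```

Tracking only two intervals keeps the statistical table smaller than the N-interval form, at the cost of a coarser filter.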
In a second aspect, an embodiment of the present invention provides a backup server. The backup server is applied in a storage system, the storage system includes the backup server and multiple memories, multiple fingerprint tables are stored in the storage system, and the fingerprint tables record the fingerprints of the data blocks stored in the multiple memories. The backup server includes:
a determining module, configured to determine a first fingerprint set according to the index fingerprints in a fingerprint index table and the fingerprint of a data block to be stored, where the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint represents multiple fingerprints in a first fingerprint table, the second index fingerprint represents multiple fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint range of the multiple fingerprints represented by the first index fingerprint and within the fingerprint range of the multiple fingerprints represented by the second index fingerprint;
an obtaining module, configured to obtain, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and to obtain, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, where the first probability is determined according to the multiple fingerprints represented by the first index fingerprint, and the second probability is determined according to the multiple fingerprints represented by the second index fingerprint;
the determining module being further configured to determine a second fingerprint set according to the first probability and the second probability, where the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold; and
a processing module, configured to obtain a matching result between the multiple fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.
With reference to the second aspect, in a first possible implementation of the second aspect, the first fingerprint table is stored in a first memory among the multiple memories, and the second fingerprint table is stored in a second memory among the multiple memories.
The obtaining module is specifically configured to: send the fingerprint of the data block to be stored and the first index fingerprint to the first memory; and receive the first probability returned by the first memory, where the first probability represents the probability that the multiple fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored; and
send the fingerprint of the data block to be stored and the second index fingerprint to the second memory; and receive the second probability returned by the second memory, where the second probability represents the probability that the multiple fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
With reference to the second aspect, in a second possible implementation of the second aspect, the backup server further includes:
an additional storage, configured to store the first fingerprint table and the second fingerprint table.
The obtaining module is specifically configured to: send the fingerprint of the data block to be stored, the first index fingerprint and the second index fingerprint to the additional storage; and receive, from the additional storage, the first probability that the multiple fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored, and the second probability that the multiple fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, each fingerprint in the first fingerprint table contains M bits, and each M-bit fingerprint contains N intervals. Each of the N intervals consists of S consecutive bits of the M bits, no two of the N intervals overlap, the numbers of bits of the N intervals sum to M, N is a natural number greater than or equal to 2, and S is a natural number. The additional storage is further configured to store a first statistical table, and the first statistical table contains statistical information on the values of the N intervals of the multiple fingerprints represented by the first index fingerprint.
The additional storage is further configured to: determine, according to the first statistical table, the number of occurrences t_i of a_i among the values of the i-th interval of the multiple fingerprints represented by the first index fingerprint, where a_i is the value of the i-th interval of the fingerprint of the data block to be stored, and i ranges from 1 to N; and determine the first probability according to the minimum of t_1 to t_N.
With reference to the second possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the additional storage is further configured to store a first statistical table. The first statistical table contains statistical information on the values of a first interval of the multiple fingerprints represented by the first index fingerprint, and statistical information on the values of a second interval of the multiple fingerprints represented by the first index fingerprint. The first interval runs from the h-th bit to the i-th bit of each fingerprint, and the second interval runs from the j-th bit to the k-th bit of each fingerprint, where h, i, j and k are natural numbers, h is not greater than i, j is not greater than k, and the first interval and the second interval do not overlap.
The additional storage is further configured to: determine, according to the first statistical table, the number of occurrences t_1 of a among the values of the first interval of the multiple fingerprints represented by the first index fingerprint, and the number of occurrences t_2 of b among the values of the second interval of the multiple fingerprints represented by the first index fingerprint, where a is the value of the h-th to i-th bits of the fingerprint of the data block to be stored, and b is the value of the j-th to k-th bits of the fingerprint of the data block to be stored; and determine the first probability according to the minimum of t_1 and t_2.
In a third aspect, an embodiment of the present invention provides a storage system, including a backup server and multiple memories. Multiple fingerprint tables are stored in the storage system, and the fingerprint tables record the fingerprints of the data blocks stored in the multiple memories.
The backup server is configured to:
determine a first fingerprint set according to the index fingerprints in a fingerprint index table and the fingerprint of a data block to be stored, where the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint represents multiple fingerprints in a first fingerprint table, the second index fingerprint represents multiple fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint range of the multiple fingerprints represented by the first index fingerprint and within the fingerprint range of the multiple fingerprints represented by the second index fingerprint;
obtain, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtain, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, where the first probability is determined according to the multiple fingerprints represented by the first index fingerprint, and the second probability is determined according to the multiple fingerprints represented by the second index fingerprint;
determine a second fingerprint set according to the first probability and the second probability, where the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold; and
obtain a matching result between the multiple fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.
With reference to the third aspect, in a first possible implementation of the third aspect, the first fingerprint table is stored in a first memory among the multiple memories, and the second fingerprint table is stored in a second memory among the multiple memories. The backup server is specifically configured to:
send the fingerprint of the data block to be stored and the first index fingerprint to the first memory; and receive the first probability returned by the first memory, where the first probability represents the probability that the multiple fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored; and
send the fingerprint of the data block to be stored and the second index fingerprint to the second memory; and receive the second probability returned by the second memory, where the second probability represents the probability that the multiple fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
The first memory is specifically configured to: receive the first index fingerprint and the fingerprint of the data block to be stored that are sent by the backup server, determine the first probability that the multiple fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored, and send the first probability to the backup server.
The second memory is specifically configured to: receive the second index fingerprint and the fingerprint of the data block to be stored that are sent by the backup server, determine the second probability that the multiple fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored, and send the second probability to the backup server.
With reference to the third aspect or the first possible implementation of the third aspect, in a second possible implementation of the third aspect, each fingerprint in the first fingerprint table contains M bits, and each M-bit fingerprint contains N intervals. Each of the N intervals consists of S consecutive bits of the M bits, no two of the N intervals overlap, the numbers of bits of the N intervals sum to M, N is a natural number greater than or equal to 2, and S is a natural number. A first statistical table is stored in the first memory, and the first statistical table contains statistical information on the values of the N intervals of the multiple fingerprints represented by the first index fingerprint.
The first memory is specifically configured to: determine, according to the first statistical table, the number of occurrences t_i of a_i among the values of the i-th interval of the multiple fingerprints represented by the first index fingerprint, where a_i is the value of the i-th interval of the fingerprint of the data block to be stored, and i ranges from 1 to N; and
determine the first probability according to the minimum of t_1 to t_N.
With reference to the third aspect or the first possible implementation of the third aspect, in a third possible implementation of the third aspect, a first statistical table is stored in the first memory. The first statistical table contains statistical information on the values of a first interval of the multiple fingerprints represented by the first index fingerprint, and statistical information on the values of a second interval of the multiple fingerprints represented by the first index fingerprint. The first interval runs from the h-th bit to the i-th bit of each fingerprint, and the second interval runs from the j-th bit to the k-th bit of each fingerprint, where h, i, j and k are natural numbers, h is not greater than i, j is not greater than k, and the first interval and the second interval do not overlap.
The first memory is specifically configured to:
determine, according to the first statistical table, the number of occurrences t_1 of a among the values of the first interval of the multiple fingerprints represented by the first index fingerprint, and the number of occurrences t_2 of b among the values of the second interval of the multiple fingerprints represented by the first index fingerprint, where a is the value of the h-th to i-th bits of the fingerprint of the data block to be stored, and b is the value of the j-th to k-th bits of the fingerprint of the data block to be stored; and
determine the first probability according to the minimum of t_1 and t_2.
In the embodiments of the present invention, the backup server first determines, according to the index fingerprints in the fingerprint index table and the fingerprint of the data block to be stored, a first fingerprint set that may contain the fingerprint of the data block to be stored. The backup server then obtains, according to the first index fingerprint in the first fingerprint set, the first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtains, according to the second index fingerprint in the first fingerprint set, the second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored. Next, the backup server determines a second fingerprint set from those of the obtained first and second probabilities that are greater than the preset threshold, and matches the multiple fingerprints represented by the index fingerprints in the second fingerprint set against the fingerprint of the data block to be stored to obtain a matching result. With the data processing method provided in the embodiments of the present invention, the fingerprint of the data block to be stored needs to be matched only against the multiple fingerprints represented by the index fingerprints in the second fingerprint set, and not against the fingerprints of all data in the fingerprint library. This reduces the amount of data moved during fingerprint comparison and improves the efficiency of data processing.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the accompanying drawings in the following description show only some embodiments of the present invention.
Fig. 1 is a schematic structural diagram of a storage system according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a fingerprint table according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a data processing method according to an embodiment of the present invention;
Fig. 4 is a schematic structural block diagram of a storage system according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a fingerprint index table according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the index fingerprints of a fingerprint table according to an embodiment of the present invention;
Fig. 7 is a schematic structural block diagram of another storage system according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of an implementation of a statistical table according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of an implementation of another statistical table according to an embodiment of the present invention;
Fig. 10 is a schematic structural block diagram of a backup server according to an embodiment of the present invention.
Detailed Description of the Embodiments
To address the problem that a storage system consumes a large amount of I/O resources during fingerprint comparison, which lowers the efficiency of data storage, the embodiments of the present invention provide a data processing method, a backup server, and a storage system. The technical solutions of the present invention are described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the embodiments of the present invention and the specific features in the embodiments are detailed descriptions of the technical solutions of the present invention rather than limitations on them, and that, where no conflict arises, the embodiments and the technical features in the embodiments may be combined with one another.
To make the technical solutions provided in the embodiments of the present invention easier to understand, an application scenario of the embodiments is introduced first. Fig. 1 is a schematic structural diagram of a storage system according to an embodiment of the present invention. The storage system 10 includes a backup server 11 and multiple memories 12. The memories 12 are used to store data. The backup server 11 is used to determine whether a data block to be stored is duplicate data and to schedule the memories 12 to store the data block to be stored. A fingerprint (FP) table is stored in the storage system 10. Fig. 2 is a schematic diagram of the fingerprint table, which holds the fingerprints of the data stored in the memories 12 and the storage locations of that data.
After the storage system 10 receives a request to store data, the backup server 11 compares the fingerprint of the data block to be stored with the fingerprints in the fingerprint table. If a fingerprint identical to the fingerprint of the data block to be stored is found in the fingerprint table, the data block to be stored is duplicate data; the storage system does not need to store the data block again and only needs to update its reference relationship. Conversely, if no fingerprint identical to the fingerprint of the data block to be stored is found in the fingerprint table, the data block to be stored is new data, and the storage system allocates storage space to store it.
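The store-or-reference decision described above can be sketched as follows. This is a minimal illustration only: the function name `store_block`, the dictionary layout (fingerprint mapped to a location and a reference count), and the list standing in for a memory are all assumptions, not structures defined in this document.

```python
def store_block(fingerprint_table, storage, fingerprint, block):
    """Store a block unless its fingerprint is already known.

    fingerprint_table maps fingerprint -> [location, reference_count] and
    storage is a list standing in for a memory (both layouts are assumed).
    Returns (location, was_duplicate).
    """
    entry = fingerprint_table.get(fingerprint)
    if entry is not None:
        # Duplicate data: only the reference relationship is updated.
        entry[1] += 1
        return entry[0], True
    # New data: allocate space, store the block, and record its fingerprint.
    storage.append(block)
    location = len(storage) - 1
    fingerprint_table[fingerprint] = [location, 1]
    return location, False


table, memory = {}, []
first = store_block(table, memory, 0x01020504, b"payload")
second = store_block(table, memory, 0x01020504, b"payload")
```

The second call finds the fingerprint and does not write the block again, which is the behaviour the remainder of the description tries to make cheaper.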
The technical solution for fingerprint comparison provided in the embodiments of the present invention is described below with reference to the accompanying drawings.
Fig. 3 is a schematic flowchart of a data processing method according to an embodiment of the present invention. The method may include:
Step 101: Determine a first fingerprint set according to the index fingerprints in a fingerprint index table and the fingerprint of a data block to be stored, where the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint represents multiple fingerprints in a first fingerprint table, the second index fingerprint represents multiple fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint range of the multiple fingerprints represented by the first index fingerprint and within the fingerprint range of the multiple fingerprints represented by the second index fingerprint;
Step 102: Obtain, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtain, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, where the first probability is determined according to the multiple fingerprints represented by the first index fingerprint, and the second probability is determined according to the multiple fingerprints represented by the second index fingerprint;
Step 103: Determine a second fingerprint set according to the first probability and the second probability, where the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold;
Step 104: Obtain a matching result between the multiple fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.
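Steps 101 to 104 can be read as a small filtering pipeline. The sketch below is illustrative only and rests on several assumptions: each index entry carries an explicit half-open range `[low, high)`, the probability source is passed in as a callable, and the names `find_duplicate`, `probability_of`, and `load_group` are hypothetical. The fixed probabilities in the demo are placeholders, not values produced by the method.

```python
def find_duplicate(fp, index_entries, probability_of, load_group, threshold):
    # Step 101: keep index fingerprints whose range [low, high) contains fp.
    first_set = [e for e in index_entries if e["low"] <= fp < e["high"]]
    # Step 102: obtain a probability for each candidate index fingerprint.
    scored = [(probability_of(e, fp), e) for e in first_set]
    # Step 103: keep candidates whose probability is not less than the threshold.
    second_set = [e for p, e in scored if p >= threshold]
    # Step 104: compare only against the fingerprints those candidates represent.
    return any(fp in load_group(e) for e in second_set)


# Hypothetical data: two fingerprint tables, one index entry each.
groups = {"T1": [10, 12, 15], "T2": [8, 11, 13]}
index_entries = [
    {"table": "T1", "low": 10, "high": 20},
    {"table": "T2", "low": 8, "high": 16},
]
loaded = []  # records which groups were actually read


def load_group(entry):
    loaded.append(entry["table"])
    return groups[entry["table"]]


def probability_of(entry, fp):
    # Placeholder values standing in for the result of step 102.
    return 0.9 if entry["table"] == "T1" else 0.1


is_duplicate = find_duplicate(12, index_entries, probability_of, load_group, threshold=0.5)
```

In this run only the group from table T1 is read, because the T2 candidate falls below the assumed threshold.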
In the embodiments of the present invention, the data processing method of steps 101 to 104 can be implemented in at least two ways, which are described separately below.
Embodiment 1
Fig. 4 is a schematic structural diagram of the storage system 20 corresponding to Embodiment 1. The storage system 20 includes a backup server 21 and multiple memories 22. In Embodiment 1, steps 101 to 104 are performed by the backup server 21.
The backup server 21 includes a processor 211, an internal memory 212, and an auxiliary storage 213. The auxiliary storage 213 includes a processing unit 214 and therefore has computing capability. The fingerprint tables of the storage system are stored in the auxiliary storage 213. In practice, the number of auxiliary storages 213 may be one, two, or more. A fingerprint index table is stored in the backup server 21. The fingerprint index table may be kept in the internal memory 212, in the auxiliary storage 213, or in another storage unit of the backup server 21.
Fig. 5 is a schematic diagram of the fingerprint index table. The fingerprint index table holds at least two index fingerprints for each fingerprint table. Each index fingerprint represents multiple fingerprints in one fingerprint table, and the fingerprints represented by all the index fingerprints of a fingerprint table together make up all the fingerprints contained in that fingerprint table. In Fig. 5, Ti denotes the i-th fingerprint table and Li denotes the i-th index fingerprint of a fingerprint table. For example, FP14 is the 2nd index fingerprint of the 1st fingerprint table; it represents the fingerprints in the 1st fingerprint table from FP14 up to but not including FP17, namely FP14, FP15, and FP16.
In practice, the fingerprints in each fingerprint table can be arranged in order of fingerprint size. Fig. 6 is a schematic diagram of the index fingerprints corresponding to fingerprint table 1. The fingerprints FP1 to FP9 in Fig. 6 increase successively, so the fingerprints of fingerprint table 1 can be divided into three parts, FP1~FP3, FP4~FP6, and FP7~FP9, with FP1, FP4, and FP7 serving as the three index fingerprints of this fingerprint table. FP1 in the fingerprint index table represents FP1~FP3 in the fingerprint table, FP4 in the fingerprint index table represents FP4~FP6 in the fingerprint table, and FP7 in the fingerprint index table represents FP7~FP9 in the fingerprint table. Because FP1 to FP9 increase successively, the fingerprint ranges of the multiple fingerprints represented by FP1, FP4, and FP7 in the fingerprint index table do not overlap.
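The partitioning just described can be sketched in a few lines. The helper name `build_index` and the fixed group size are assumptions made for illustration; the document does not prescribe how many fingerprints each index fingerprint represents.

```python
def build_index(fingerprints, group_size):
    """Return the first fingerprint of each consecutive group of sorted fingerprints."""
    ordered = sorted(fingerprints)
    # Each index fingerprint stands for the group_size fingerprints starting at it.
    return [ordered[i] for i in range(0, len(ordered), group_size)]


# Integers 1..9 stand in for FP1..FP9 of fingerprint table 1.
table_1 = [7, 2, 9, 4, 1, 6, 3, 8, 5]
index_fingerprints = build_index(table_1, group_size=3)
```

With nine fingerprints and groups of three, the index fingerprints are the stand-ins for FP1, FP4, and FP7, matching the example of Fig. 6.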
In the embodiments of the present invention, the fingerprint range of the multiple fingerprints represented by each index fingerprint can be determined from the fingerprint index table in either of two ways. First, when the first fingerprint of a run of consecutive fingerprints in a fingerprint table is used as the index fingerprint of that run, the fingerprint range of the multiple fingerprints represented by the earlier of two adjacent index fingerprints of the fingerprint table can be determined from those two adjacent index fingerprints. Continuing with fingerprint table 1 of Fig. 6, the adjacent index fingerprints FP1 and FP4 determine that the fingerprint range of the multiple fingerprints represented by FP1 is [FP1, FP4). Second, the fingerprint index table may contain a fingerprint range attribute, so that for each index fingerprint the fingerprint range of the multiple fingerprints it represents is stored.
In step 101, the processor 211 of the backup server 21 determines, according to the size of the fingerprint of the data block to be stored, those index fingerprints in the fingerprint index table whose represented fingerprints have a fingerprint range that includes the fingerprint of the data block to be stored. The set of index fingerprints so determined is the first fingerprint set.
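A minimal sketch of step 101 is given below, using binary search over each table's sorted index fingerprints. It assumes that each table also records an exclusive upper bound for its last group (in the spirit of the fingerprint range attribute mentioned above); the function name `first_fingerprint_set` and the tuple layout are hypothetical.

```python
from bisect import bisect_right


def first_fingerprint_set(index_table, fp):
    """Pick, per fingerprint table, the index fingerprint whose range contains fp.

    index_table maps table_id -> (sorted index fingerprints, exclusive upper bound).
    Returns a list of (table_id, low, high) with low <= fp < high.
    """
    selected = []
    for table_id, (index_fps, upper_bound) in index_table.items():
        # Position of the last index fingerprint that is <= fp.
        pos = bisect_right(index_fps, fp) - 1
        if pos < 0 or fp >= upper_bound:
            continue  # fp lies outside every range of this table
        # The next index fingerprint (or the table bound) closes the range.
        high = index_fps[pos + 1] if pos + 1 < len(index_fps) else upper_bound
        selected.append((table_id, index_fps[pos], high))
    return selected


index_table = {
    "T1": ([1, 4, 7], 10),
    "T2": ([20, 30], 40),
    "T3": ([3, 6], 9),
}
candidates = first_fingerprint_set(index_table, 5)
```

Because the ranges within one table do not overlap, each table contributes at most one index fingerprint, so the first fingerprint set never holds more entries than there are fingerprint tables.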
For ease of description, the embodiments of the present invention are illustrated using a first fingerprint table and a second fingerprint table as examples. This does not mean that the storage system in the embodiments includes only the first fingerprint table and the second fingerprint table, nor that the operation of step 101 is performed only on the index fingerprints corresponding to the first fingerprint table and the second fingerprint table in the fingerprint index table. In practice, the operation of step 101 is performed on the index fingerprints corresponding to every fingerprint table. Because the fingerprint ranges corresponding to the index fingerprints of a given fingerprint table do not overlap, at most one index fingerprint can be determined from the index fingerprints of each fingerprint table.
Step 102 is then performed: the backup server 21 determines the probability that each fingerprint table contains a fingerprint identical to that of the data block to be stored. In step 101, the index fingerprint whose fingerprint range includes the fingerprint of the data block to be stored has already been determined for each fingerprint table, so the fingerprints represented by the other index fingerprints of that fingerprint table cannot contain the fingerprint of the data block to be stored. Therefore, the probability that the first fingerprint table contains a fingerprint identical to that of the data block to be stored is in essence the probability that the multiple fingerprints represented by the first index fingerprint determined in step 101 contain such a fingerprint. Likewise, the probability that the second fingerprint table contains a fingerprint identical to that of the data block to be stored is in essence the probability that the multiple fingerprints represented by the second index fingerprint determined in step 101 contain such a fingerprint.
In a specific implementation, the processor 211 sends the fingerprint of the data block to be stored and the index fingerprints in the first fingerprint set to the auxiliary storage 213. The auxiliary storage 213 uses its processing unit 214 to determine, for each index fingerprint in the first fingerprint set, the probability of retrieving the fingerprint of the data block to be stored among the multiple fingerprints represented by that index fingerprint, and then returns the determined probability values to the processor 211. The auxiliary storage 213 may determine the probability that the multiple fingerprints represented by a received index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored from statistical information about those fingerprints, such as the distribution of fingerprints near the value of the fingerprint of the data block to be stored, or frequency statistics of fingerprint values in each fingerprint dimension.
When the backup server 21 includes two or more auxiliary storages 213, the processor may send to each auxiliary storage 213 only those index fingerprints in the first fingerprint set that relate to the fingerprint table held by that auxiliary storage. For example, the backup server 21 includes a first auxiliary storage, which holds the first fingerprint table, and a second auxiliary storage, which holds the second fingerprint table. The processor 211 sends the fingerprint of the data block to be stored and the first index fingerprint to the first auxiliary storage, and sends the fingerprint of the data block to be stored and the second index fingerprint to the second auxiliary storage. The first auxiliary storage determines the first probability using its processing unit and returns it to the processor 211; the second auxiliary storage determines the second probability using its processing unit and returns it to the processor 211.
Step 103 is then performed: the backup server 21 determines the second fingerprint set according to the received probabilities of retrieving the fingerprint of the data block to be stored from each fingerprint table. The second fingerprint set contains the index fingerprints in the first fingerprint set that satisfy a preset condition. The preset condition is that the probability of retrieving the fingerprint of the data block to be stored among the multiple fingerprints represented by the index fingerprint (that is, the probability, determined in step 102, that the fingerprint table storing those fingerprints contains a fingerprint identical to the fingerprint of the data block to be stored) is greater than a preset threshold. The preset threshold may be 0, in which case the second fingerprint set is formed by removing from the first fingerprint set those index fingerprints for which the probability of retrieving the data block to be stored among the fingerprints they represent is 0.
For example, suppose that the second probability returned by the auxiliary storage 213 to the processor 211, namely the probability of retrieving the fingerprint of the data block to be stored among the multiple fingerprints represented by the second index fingerprint, is less than the preset threshold, while the first probability of retrieving the fingerprint of the data block to be stored among the multiple fingerprints represented by the first index fingerprint is greater than the preset threshold. The first index fingerprint is then included in the second fingerprint set, and the second index fingerprint is not.
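The selection in step 103 amounts to a threshold filter over the returned probabilities. The sketch below assumes the probabilities arrive as a mapping from index fingerprint to value; the function name and the sample values are hypothetical.

```python
def second_fingerprint_set(probabilities, threshold=0.0):
    """Keep the index fingerprints whose probability is greater than the threshold."""
    return [index_fp for index_fp, p in probabilities.items() if p > threshold]


# Hypothetical probabilities returned for the two index fingerprints.
returned = {"first index fingerprint": 0.6, "second index fingerprint": 0.0}
kept = second_fingerprint_set(returned)
```

With the threshold at 0, only index fingerprints whose represented fingerprints certainly cannot contain the queried fingerprint are dropped.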
Step 104 is then performed: the backup server 21 carries out fingerprint comparison among the multiple fingerprints represented by the index fingerprints in the second fingerprint set, obtains a fingerprint comparison result, and thereby determines whether the data block to be stored is duplicate data. Two implementations are possible. First, the processor 211 reads into the internal memory 212 the multiple fingerprints, held in the fingerprint tables in the auxiliary storage 213, that are represented by the index fingerprints in the second fingerprint set, and the fingerprint comparison is then carried out in the internal memory 212. Second, the auxiliary storage 213 completes the fingerprint comparison using its own processing unit 214; that is, the auxiliary storage 213 uses its processing unit 214 to read the multiple fingerprints, held in its own fingerprint tables, that are represented by the index fingerprints in the second fingerprint set into a buffer of the auxiliary storage 213 and carries out the fingerprint comparison there. The buffer of the auxiliary storage 213 may be a random access memory (RAM) or a cache. Because no data needs to be moved out of the auxiliary storage 213 when it performs the fingerprint comparison itself, the time spent on moving data is reduced.
In practice, the preset threshold in step 103 may also be a probability value greater than 0. The backup server 21 first carries out fingerprint comparison among the multiple fingerprints represented by the index fingerprints in the second fingerprint set determined according to the preset threshold greater than 0. If the comparison result cannot confirm whether the data block to be stored is duplicate data, the backup server 21 then carries out fingerprint comparison among the multiple fingerprints represented by the index fingerprints whose probability values lie in the interval (0, preset threshold). In other words, the backup server first compares against the fingerprints represented by the index fingerprints with higher probability and, only when that comparison cannot confirm whether the data block to be stored is a duplicate, compares against the fingerprints represented by the index fingerprints with lower probability. This can effectively reduce the time spent on fingerprint comparison.
Another implementation of fingerprint comparison is as follows. In step 103, the preset threshold is 0, and the index fingerprints whose probability values are greater than 0 are determined to be the elements of the second fingerprint set. The backup server 21 can then sort the index fingerprints in the second fingerprint set by their probability values. When the processor 211 reads the multiple fingerprints represented by the index fingerprints in the second fingerprint set into the internal memory for fingerprint retrieval, the order in which the fingerprints are read is determined by the probability ranking of the index fingerprints. That is, the multiple fingerprints represented by the index fingerprint with the largest probability value are read first; if it cannot be determined from these fingerprints whether the data block to be stored is duplicate data, the multiple fingerprints represented by the index fingerprint with the second largest probability value are read into the internal memory for fingerprint comparison, and so on, until either a fingerprint identical to the fingerprint of the data block to be stored is retrieved, or it is determined that none of the fingerprints represented by the index fingerprints whose probability values are greater than 0 is identical to the fingerprint of the data block to be stored. The former case indicates that the data block to be stored is duplicate data; the latter indicates that it is new data.
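The probability-ordered retrieval above can be sketched as a loop with early exit. The names `ordered_lookup` and `load_group`, and the representation of candidates as (probability, index fingerprint) pairs, are assumptions made for this example.

```python
def ordered_lookup(fp, candidates, load_group):
    """Search groups in descending probability order, stopping at the first hit.

    candidates is a list of (probability, index_id); load_group(index_id)
    returns the fingerprints represented by that index fingerprint.
    Returns (is_duplicate, number_of_groups_read).
    """
    reads = 0
    for probability, index_id in sorted(candidates, key=lambda c: c[0], reverse=True):
        if probability <= 0:
            break  # remaining candidates cannot contain fp
        reads += 1
        if fp in load_group(index_id):
            return True, reads  # duplicate data
    return False, reads  # new data


groups = {"A": {1, 2, 3}, "B": {5, 6}, "C": {9}}
candidates = [(0.2, "A"), (0.7, "B"), (0.0, "C")]
hit = ordered_lookup(5, candidates, groups.__getitem__)
miss = ordered_lookup(4, candidates, groups.__getitem__)
```

In the hit case a single group is read; in the miss case the two groups with non-zero probability are read and the zero-probability group is never touched.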
In the foregoing technical solution, the backup server 21 uses the fingerprint index table to determine, in each fingerprint table, the multiple fingerprints represented by one index fingerprint, namely the index fingerprint whose fingerprint range includes the fingerprint of the data block to be stored. The subsequent fingerprint comparison is based only on the fingerprints so determined from each fingerprint table, which reduces the amount of fingerprint data that has to be moved. Moreover, for each fingerprint table, the probability of retrieving the fingerprint of the data block to be stored among its determined fingerprints is calculated, and fingerprint comparison is carried out only among the determined fingerprints of those fingerprint tables whose probability is not less than the preset threshold. This reduces the amount of fingerprint data moved during fingerprint comparison, shortens the time spent on fingerprint comparison, and improves the efficiency of data storage.
Embodiment 2
Fig. 7 is a schematic structural diagram of the storage system 30 corresponding to Embodiment 2. The storage system 30 includes a backup server 31 and multiple memories 32. In Embodiment 2, steps 101 to 104 are performed by the backup server 31.
The memories 32 are used to store data blocks and fingerprint tables. For the form of the fingerprint table, refer to Fig. 5. The fingerprint table stored in a memory 32 may consist of the fingerprint information of the data blocks stored on that memory, or it may hold the fingerprint information of other data blocks unrelated to the data blocks stored on that memory 32. The backup server 31 includes a processor 311 and an internal memory 312. A fingerprint index table is stored in the backup server 31. The fingerprint index table may be kept in the internal memory 312 or in another storage unit of the backup server 31.
In step 101, the processor 311 determines the first fingerprint set in the same way as the processor 211 described above, and the details are not repeated here.
The backup server 31 then performs step 102 and obtains, for each fingerprint table, the probability of retrieving the fingerprint of the data block to be stored among the multiple fingerprints represented by the corresponding index fingerprint in the first fingerprint set. Two implementations are possible. First, the memory 32 that stores the fingerprint table includes a processing unit 321. Similar to the way the backup server 21 obtains the probability values, the backup server 31 can send the fingerprint of the data block to be stored and the index fingerprints in the first fingerprint set to the memory 32; the memory 32 determines the probability values using its own processing unit 321 and the fingerprint statistics it holds, and then sends the probability values to the backup server 31. Second, the processor 311 reads the statistical information of the fingerprints stored on the memory 32 into the internal memory 312 and determines the probability values itself according to the statistical information held in the internal memory 312.
The backup server 31 then performs step 103 in the same way as the backup server 21 described above.
The backup server 31 then performs step 104 and obtains the result of comparing the fingerprint of the data block to be stored against the multiple fingerprints represented by the index fingerprints in the second fingerprint set. Similar to the way the backup server 21 performs step 104, the backup server 31 can use the processor 311 to read the multiple fingerprints, stored on the memories 32, that are represented by the index fingerprints in the second fingerprint set into the internal memory and carry out the fingerprint comparison there; that is, the backup server 31 completes the fingerprint comparison itself. Alternatively, the backup server 31 sends the index fingerprints in the second fingerprint set and the fingerprint of the data block to be stored to the memories 32, and each memory 32 carries out the fingerprint comparison locally using its own processing unit 321. This approach reduces the amount of data moved, and because multiple memories 32 can carry out fingerprint comparison in parallel, it can improve the efficiency of fingerprint comparison.
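The parallel, memory-local comparison can be imitated in software with a thread pool, where each worker stands in for the processing unit of one memory. This is only an analogy under stated assumptions: the function names and the in-process sets standing in for fingerprint groups are hypothetical, and real memories would perform the comparison on their own hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def compare_on_memory(group, fp):
    """Stand-in for one memory comparing fp against its local fingerprints."""
    return fp in group


def parallel_compare(fp, groups_by_memory):
    """Dispatch the comparison to every memory and collect the per-memory results."""
    workers = max(1, len(groups_by_memory))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            memory_id: pool.submit(compare_on_memory, group, fp)
            for memory_id, group in groups_by_memory.items()
        }
        return {memory_id: f.result() for memory_id, f in futures.items()}


groups_by_memory = {"memory-1": {1, 2}, "memory-2": {5, 6}}
results = parallel_compare(5, groups_by_memory)
is_duplicate = any(results.values())
```

Only a boolean per memory travels back to the backup server in this sketch, which mirrors the point that the fingerprints themselves need not be moved.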
One implementation of fingerprint comparison according to steps 103 and 104 is as follows. In step 103, the preset threshold is 0, and the index fingerprints whose probability values are greater than 0 are determined to be the elements of the second fingerprint set. The backup server 31 can then sort the index fingerprints in the second fingerprint set by their probability values. When the processor 311 reads the multiple fingerprints represented by the index fingerprints in the second fingerprint set into the internal memory for fingerprint retrieval, the order in which the fingerprints are read is determined by the probability ranking of the index fingerprints. That is, the multiple fingerprints represented by the index fingerprint with the largest probability value are read first; if it cannot be determined from these fingerprints whether the data block to be stored is duplicate data, the multiple fingerprints represented by the index fingerprint with the second largest probability value are read into the internal memory for fingerprint comparison, and so on, until either a fingerprint identical to the fingerprint of the data block to be stored is retrieved, or it is determined that none of the fingerprints represented by the index fingerprints whose probability values are greater than 0 is identical to the fingerprint of the data block to be stored. The former case indicates that the data block to be stored is duplicate data; the latter indicates that it is new data.
In both of the foregoing implementations of steps 101 to 104, the fingerprint of the data block to be stored only needs to be matched against the multiple fingerprints represented by the index fingerprints in the obtained second fingerprint set, rather than against the fingerprints of all data in the fingerprint library. This reduces the amount of data moved during fingerprint comparison and improves the efficiency of data processing.
Optionally, when the backup server 21 in the storage system 20 includes two or more auxiliary storages 213, the fingerprint index table may also contain location information of the first fingerprint table to which the first index fingerprint belongs, that is, information indicating which auxiliary storage holds the first fingerprint table. When the backup server 21 performs step 102, the processor 211 only needs to locate the auxiliary storage that stores the first fingerprint table according to the location information corresponding to the first index fingerprint, and send the first index fingerprint and the fingerprint of the data block to be stored to that auxiliary storage, so that the auxiliary storage determines the probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored.
Optionally, in the storage system 30, the fingerprint index table may also contain location information of the first fingerprint table to which the first index fingerprint belongs, that is, the identifier of the first memory that holds the first fingerprint table. When the backup server 31 performs step 102, the processor 311 only needs to locate the first memory according to the location information corresponding to the first index fingerprint, and send the first index fingerprint and the fingerprint of the data block to be stored to the first memory, so that the first memory determines the probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored.
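Routing by location information can be sketched as grouping the candidate index fingerprints by the device that holds their fingerprint table. The mapping `location_of`, the device names, and the function name `route_requests` are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def route_requests(fp, first_set, location_of):
    """Group (index fingerprint, fp) requests by the device holding each table."""
    requests = defaultdict(list)
    for index_fp in first_set:
        # location_of is the location information kept in the fingerprint index table.
        requests[location_of[index_fp]].append((index_fp, fp))
    return dict(requests)


location_of = {"FP1": "memory-1", "FP14": "memory-1", "FP40": "memory-2"}
requests = route_requests(0x2A, ["FP14", "FP40"], location_of)
```

Each device then receives only the index fingerprints that refer to fingerprint tables it actually stores.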
In one case, the first fingerprint table is stored in a first memory among the multiple memories, and the second fingerprint table is stored in a second memory among the multiple memories.
In step 102, obtaining, according to the first index fingerprint, the first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored includes the following steps:
sending the fingerprint of the data block to be stored and the first index fingerprint to the first memory;
receiving the first probability returned by the first memory, where the first probability represents the probability that the multiple fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
In step 102, obtaining, according to the second index fingerprint, the second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored includes the following steps: sending the fingerprint of the data block to be stored and the second index fingerprint to the second memory; and receiving the second probability returned by the second memory, where the second probability represents the probability that the multiple fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
Specifically, the foregoing manner corresponds to the case in which the backup server 31 performs step 102 in Embodiment 2. The first memory and the second memory are two of the memories 32 and can carry out fingerprint comparison using their own processing units 321. The specific implementation has been described in detail in Embodiment 2 and is not repeated here.
In another case, the backup server includes an auxiliary storage, and the first fingerprint table and the second fingerprint table are stored in the auxiliary storage.
In step 102, obtaining, according to the first index fingerprint, the first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtaining, according to the second index fingerprint, the second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, includes the following steps:
sending the fingerprint of the data block to be stored, the first index fingerprint, and the second index fingerprint to the auxiliary storage; and receiving, from the auxiliary storage, the first probability that the multiple fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored, and the second probability that the multiple fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
Specifically, the foregoing manner corresponds to the case in which the backup server 21 performs step 102 in Embodiment 1. The auxiliary storage in this case is the auxiliary storage 213 of Embodiment 1, which can carry out fingerprint comparison using its own processing unit 214. The specific implementation has been described in detail in Embodiment 1 and is not repeated here.
Optionally, in the embodiments of the present invention, each fingerprint in the first fingerprint table contains M bits, and each M-bit fingerprint contains N intervals. Each of the N intervals consists of S consecutive bits of the M bits, no two of the N intervals overlap, the total number of bits in the N intervals is M, N is a natural number greater than or equal to 2, and S is a natural number. With this arrangement, each fingerprint in the fingerprint table is divided into N intervals, and each interval corresponds to one fingerprint dimension. For example, a 64-bit fingerprint can be divided into four 16-bit dimension values: bits 1 to 16 form the first dimension, bits 17 to 32 the second dimension, bits 33 to 48 the third dimension, and bits 49 to 64 the fourth dimension. In practice, the number of bits occupied by each dimension is not limited, and the dimensions are not required to occupy the same number of bits.
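Splitting a fingerprint into its interval values is a matter of shifting and masking. The sketch below assumes that bit 1 is the most significant bit, which the text does not state, and the function name `split_fingerprint` is hypothetical. Unequal widths are allowed, as the text permits.

```python
def split_fingerprint(fp, total_bits, widths):
    """Split an integer fingerprint into consecutive intervals of the given widths.

    Assumes the first interval occupies the most significant bits.
    """
    if sum(widths) != total_bits:
        raise ValueError("interval widths must add up to the fingerprint length")
    values = []
    shift = total_bits
    for width in widths:
        shift -= width
        # Move the interval to the low end, then mask off everything above it.
        values.append((fp >> shift) & ((1 << width) - 1))
    return values


dimensions = split_fingerprint(0x0001000200050004, 64, [16, 16, 16, 16])
```

For the 64-bit value shown, the four 16-bit dimension values are 1, 2, 5, and 4.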
A first statistical table is stored in the storage system. The first statistical table includes statistical information about the values, in the N intervals, of the multiple fingerprints represented by the first index fingerprint. Fig. 8 is a schematic diagram of one first statistical table. Suppose that the first index fingerprint represents three fingerprints, 01020504H, 01030504H, and 02030102H, and that each fingerprint is divided into 4 dimensions; taking 01020504H as an example, the values of its four fingerprint dimensions are 1, 2, 5, and 4, respectively. For each possible value in each dimension, the first statistical table records the number of times that value occurs in the corresponding dimension of the multiple fingerprints represented by the first index fingerprint. For example, the value "1" occurs 2 times in the first dimension, the value "2" occurs 1 time in the first dimension, and the values "3", "4", and "5" each occur 0 times in the first dimension.
The first probability is determined as follows: according to the first statistical table, determine the number of occurrences t_i of a_i among the values of the i-th interval of the multiple fingerprints represented by the first index fingerprint, where a_i is the value of the i-th interval of the fingerprint of the data block to be stored and i ranges from 1 to N; then determine the first probability according to the minimum of the obtained t_1 to t_N.
Specifically, in the foregoing storage system 20, the first statistical table may be stored on the auxiliary storage 213 that holds the first fingerprint table, and the first probability is determined by the auxiliary storage 213 using its own processing unit 214. For example, suppose that the fingerprint of the data block to be stored is 01020404H, whose values in the four fingerprint dimensions are 1, 2, 4, and 4, respectively. The probability indicator of retrieving this fingerprint in the table block shown in Fig. 5 is calculated as follows: the table gives an occurrence count of 2 for the value 1 in the first dimension, 1 for the value 2 in the second dimension, 0 for the value 4 in the third dimension, and 2 for the value 4 in the fourth dimension, so the probability indicator is the minimum of these counts, namely 0.
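The worked example above can be reproduced with a short Python sketch. The names `build_stat_table` and `probability_indicator`, the use of `collections.Counter` as the table representation, and the fixed 8-bit dimensions are illustrative assumptions, not the layout of Fig. 8.

```python
from collections import Counter

WIDTHS = (8, 8, 8, 8)  # assumption: 32-bit fingerprints, four 8-bit dimensions


def split(fp: int, widths=WIDTHS) -> list[int]:
    """Return the dimension values of fp, most significant dimension first."""
    shift = sum(widths)
    values = []
    for width in widths:
        shift -= width
        values.append((fp >> shift) & ((1 << width) - 1))
    return values


def build_stat_table(fingerprints, widths=WIDTHS) -> list[Counter]:
    """One Counter per dimension: value -> number of occurrences."""
    table = [Counter() for _ in widths]
    for fp in fingerprints:
        for dim, value in enumerate(split(fp, widths)):
            table[dim][value] += 1
    return table


def probability_indicator(table, fp: int, widths=WIDTHS) -> int:
    """Minimum over all dimensions of the count t_i of the value a_i."""
    return min(table[dim][value] for dim, value in enumerate(split(fp, widths)))


represented = [0x01020504, 0x01030504, 0x02030102]
table = build_stat_table(represented)
print(probability_indicator(table, 0x01020404))  # third dimension value 4 never occurs
```

An indicator of 0 proves that the fingerprint is absent from the represented fingerprints. A non-zero indicator does not prove presence, since the matching values may come from different fingerprints, which is why a full comparison still follows.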
In the foregoing storage system 30, the first statistical table may be stored in the memory 32 that holds the first fingerprint table, and the first probability is determined by that memory using its processing unit 321. The specific determination method is the same as the manner, described above, in which the processing unit 214 determines the first probability according to the first statistical table.
In practice, the second probability may also be determined in the foregoing manner, using a statistical table of fingerprints; details are not repeated here.
In the foregoing technical solution, the first statistical table is used to determine the first probability of retrieving the fingerprint of the data block to be stored among the multiple fingerprints represented by the first index fingerprint. This is simple to implement, requires little computation, takes little time, and gives an accurate result.
Optionally, in another embodiment, a first statistical table is stored in the storage system, and Fig. 9 is a schematic diagram of another first statistical table. The first statistical table comprises statistical information about the values of a first interval of the multiple fingerprints represented by the first index fingerprint, and statistical information about the values of a second interval of the multiple fingerprints represented by the first index fingerprint. The first interval runs from the h-th bit to the i-th bit of each fingerprint, and the second interval runs from the j-th bit to the k-th bit of each fingerprint, where h, i, j, and k are natural numbers, h is not greater than i, j is not greater than k, and the first interval and the second interval do not overlap.
The first probability is determined as follows: according to the first statistical table, determine the number of occurrences t_1 of a among the values of the first interval of the multiple fingerprints represented by the first index fingerprint, and the number of occurrences t_2 of b among the values of the second interval of the multiple fingerprints represented by the first index fingerprint, where a is the value of the h-th to i-th bits of the fingerprint of the data block to be stored and b is the value of the j-th to k-th bits of the fingerprint of the data block to be stored; then determine the first probability according to the minimum of t_1 and t_2.
In practice, the main purpose of determining the first probability is to exclude some index fingerprints from the first fingerprint set: for each excluded index fingerprint, the probability that the multiple fingerprints it represents contain a fingerprint identical to the fingerprint of the data block to be stored is 0. In this embodiment, referring to Fig. 9, the first statistical table contains statistical information for only some of the fingerprint dimensions. Continuing the example in which the first index fingerprint represents the three fingerprints 01020504H, 01030504H, and 02030102H, the first statistical table may store statistical information for only the first dimension and the third dimension. Suppose that the fingerprint of the data block to be stored is 01020404H, whose third-dimension value is "4". According to the first statistical table of Fig. 9, "4" occurs 0 times in the third dimension, which shows that the multiple fingerprints represented by the first index fingerprint do not contain the fingerprint of the data block to be stored. Similarly, the first statistical table in this embodiment may be stored in the auxiliary storage 213 of storage system 20, in which case the first probability is determined by the auxiliary storage 213 using its own processing unit 214. The first statistical table may also be stored in the memory 32 of storage system 30, in which case the first probability is determined by the processing unit 321 of the memory 32 itself.
In the foregoing technical solution, statistical information for only some of the fingerprint dimensions is used to determine the first probability of retrieving the fingerprint of the data block to be stored among the multiple fingerprints represented by the first index fingerprint, so that index fingerprints whose probability value is 0 are removed from the first fingerprint set. This reduces the amount of data moved during fingerprint comparison, is simple to implement, requires little computation, and takes little time.
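The partial-dimension variant can be sketched in Python as follows. The helper names, the 32-bit fingerprint width, and the convention that bit 1 is the most significant bit are assumptions for illustration. The two stored intervals are bits 1 to 8 (the first dimension) and bits 17 to 24 (the third dimension), matching the example above.

```python
from collections import Counter

FP_BITS = 32  # assumption: 32-bit fingerprints, as in the 01020504H example


def bits(fp: int, start: int, end: int, total_bits: int = FP_BITS) -> int:
    """Value of bits start..end of fp (1-indexed from the most significant bit)."""
    width = end - start + 1
    return (fp >> (total_bits - end)) & ((1 << width) - 1)


def build_partial_table(fingerprints, ranges):
    """Map each stored bit range (h, i) to a Counter of the values seen there."""
    return {r: Counter(bits(fp, *r) for fp in fingerprints) for r in ranges}


def first_probability_indicator(table, fp: int) -> int:
    """Minimum of t_1 and t_2 (or of however many ranges the table stores)."""
    return min(counter[bits(fp, *r)] for r, counter in table.items())


represented = [0x01020504, 0x01030504, 0x02030102]
table = build_partial_table(represented, [(1, 8), (17, 24)])

# Third-dimension value 4 never occurs, so this index fingerprint is excluded.
print(first_probability_indicator(table, 0x01020404))
# A fingerprint that is absent but matches both stored dimensions is not excluded.
print(first_probability_indicator(table, 0x01FF05FF))
```

Storing fewer dimensions makes the table smaller and the lookup cheaper, at the cost of excluding fewer index fingerprints; a result of 0 remains a reliable exclusion either way.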
Optionally, in another embodiment, when the first fingerprint set is an empty set, backup server 21 or backup server 31 determines that the data block to be stored is new data.
Optionally, in another embodiment, when the preset threshold is 0 and the second fingerprint set is an empty set, backup server 21 or backup server 31 determines that the data block to be stored is new data.
Optionally, in another embodiment, before performing step 101, the backup server further performs the following step: performing fingerprint filtering on the fingerprint of the data block to be stored, and determining that fingerprint filtering cannot decide whether the fingerprint of the data block to be stored is a duplicate fingerprint.
Specifically, before retrieving the fingerprint of the data block to be stored from the fingerprint tables, the backup server may pre-judge the fingerprint to be retrieved using a fingerprint filtering technique. The pre-judgment has three possible results. First, it is determined that the fingerprint of the data block to be stored exists in a fingerprint table, so the data block to be stored is duplicate data. Second, it is determined that no fingerprint table contains the fingerprint of the data block to be stored, so the data block to be stored is new data. Third, it cannot be determined whether a fingerprint table contains the fingerprint of the data block to be stored; only in this case does the backup server perform steps 101 to 104.
In a specific implementation, a fingerprint filtering technique such as a Bloom filter or Locality Preserved Caching (LPC) may be used, or two or more fingerprint filtering techniques may be combined. For example, the fingerprint is first filtered with a Bloom filter; if the Bloom filter cannot decide whether the fingerprint of the data block to be stored is a duplicate fingerprint, the fingerprint is further filtered using LPC. For the specific implementation of fingerprint filtering techniques, refer to the prior art; details are not described in this embodiment of the present invention.
In the foregoing technical solution, the backup server first pre-judges the fingerprint of the data block to be stored using a fingerprint filtering technique, and performs steps 101 to 104 only when the fingerprint to be retrieved cannot be pre-judged. Fingerprint filtering can significantly shorten the comparison time for some fingerprints and thus improves the performance of the backup server.
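The three-outcome pre-judgment can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the `BloomFilter` class is a toy implementation, a plain `set` stands in for the LPC cache (real LPC manages cached fingerprints by locality and is not reproduced here), and the names `Verdict` and `prefilter` are hypothetical.

```python
import hashlib
from enum import Enum


class Verdict(Enum):
    DUPLICATE = "duplicate"   # fingerprint known to exist
    NEW = "new"               # fingerprint known to be absent
    UNKNOWN = "unknown"       # cannot decide; steps 101 to 104 are needed


class BloomFilter:
    """Toy Bloom filter: no false negatives, possible false positives."""

    def __init__(self, n_bits: int = 1024, n_hashes: int = 3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, fp: bytes):
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(bytes([seed]) + fp).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, fp: bytes) -> None:
        for pos in self._positions(fp):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fp: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(fp))


def prefilter(fp: bytes, bloom: BloomFilter, cache: set) -> Verdict:
    """Bloom filter first, then the cache (a stand-in for LPC)."""
    if not bloom.might_contain(fp):
        return Verdict.NEW            # a Bloom filter miss is definitive
    if fp in cache:
        return Verdict.DUPLICATE      # a cache hit is definitive
    return Verdict.UNKNOWN            # fall through to the index-based lookup


bloom, cache = BloomFilter(), set()
bloom.add(b"fp-A"); cache.add(b"fp-A")   # stored and cached
bloom.add(b"fp-B")                       # stored but not cached
print(prefilter(b"fp-A", bloom, cache), prefilter(b"fp-B", bloom, cache))
```

Only the `UNKNOWN` verdict leads to the index-based lookup of steps 101 to 104, which is where the time saving described above comes from.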
Optionally, in storage system 20, the fingerprint tables may be maintained as follows. A new fingerprint table is created in real time in the internal memory 212. When it is determined that none of the currently stored fingerprint tables contains the fingerprint of the data block to be stored, the data block to be stored is determined to be new data, and its fingerprint is added to the fingerprint table being created in the internal memory. After the number of fingerprints in the fingerprint table in the internal memory reaches a set value, that fingerprint table is stored on the auxiliary storage 213. In addition, each index fingerprint corresponding to that fingerprint table is added to the fingerprint index table. Furthermore, the statistical table of dimension values of the multiple fingerprints corresponding to each index fingerprint of that fingerprint table is stored on the auxiliary storage 213.
When the fingerprint tables are maintained in this manner, the backup server, when performing fingerprint comparison, first compares against the fingerprint table being created in the internal memory. Only when that comparison cannot determine whether the fingerprint of the data block to be stored is a duplicate fingerprint does the backup server perform steps 101 to 104 and compare against the fingerprint tables stored outside the internal memory.
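The maintenance scheme for storage system 20 can be sketched in Python as follows. The class name `FingerprintTableWriter`, the `flush_limit` parameter, and the list that stands in for the auxiliary storage 213 are illustrative assumptions; generation of the index fingerprints and statistical tables on flush is omitted because it depends on structures defined in the earlier embodiments.

```python
class FingerprintTableWriter:
    """Builds a fingerprint table in memory and flushes it when it is full."""

    def __init__(self, flush_limit: int):
        self.flush_limit = flush_limit   # the "set value" for table size
        self.in_memory = set()           # table being created in internal memory
        self.flushed_tables = []         # stand-in for the auxiliary storage

    def lookup_in_memory(self, fp: int) -> bool:
        """First comparison stage: check the table being created in memory."""
        return fp in self.in_memory

    def add_new(self, fp: int) -> None:
        """Record the fingerprint of a block that was determined to be new data."""
        self.in_memory.add(fp)
        if len(self.in_memory) >= self.flush_limit:
            # Store the finished table (sorted here by assumption) and start a new one.
            self.flushed_tables.append(sorted(self.in_memory))
            self.in_memory = set()


writer = FingerprintTableWriter(flush_limit=2)
for fp in (0x01020504, 0x01030504, 0x02030102):
    if not writer.lookup_in_memory(fp):
        writer.add_new(fp)
print(len(writer.flushed_tables), len(writer.in_memory))
```

In a fuller version, a miss in `lookup_in_memory` would trigger steps 101 to 104 against the flushed tables before the fingerprint is treated as new.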
Optionally, in storage system 30, the fingerprint tables may be maintained as follows: when the data block to be stored is determined to be new data, the fingerprint of the data block is added to the fingerprint table of the memory 32 that stores the data block, and the statistical table of that fingerprint table is then updated.
It should be noted that each of the foregoing processor 211, processor 311, processing unit 214, and processing unit 321 may be a single independent processor or a collective term for multiple processing elements. For example, the processor 211, processor 311, processing unit 214, and processing unit 321 may each be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention, for example, one or more digital signal processors (DSPs) or one or more field programmable gate arrays (FPGAs).
Based on the same inventive concept, an embodiment of the present invention provides a backup server 40 applied in a storage system. The storage system includes the backup server and multiple memories, multiple fingerprint tables are stored in the storage system, and the multiple fingerprint tables record the fingerprints of the data blocks stored in the multiple memories. Fig. 10 is a schematic structural block diagram of the backup server 40. The backup server 40 includes:
a determining module 41, configured to determine a first fingerprint set according to the index fingerprints in a fingerprint index table and the fingerprint of a data block to be stored, where the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint represents multiple fingerprints in a first fingerprint table, the second index fingerprint represents multiple fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint region of the multiple fingerprints represented by the first index fingerprint and within the fingerprint region of the multiple fingerprints represented by the second index fingerprint;
an obtaining module 42, configured to obtain, according to the first index fingerprint, a first probability that the first fingerprint table includes a fingerprint identical to the fingerprint of the data block to be stored, and to obtain, according to the second index fingerprint, a second probability that the second fingerprint table includes a fingerprint identical to the fingerprint of the data block to be stored, where the first probability is determined according to the multiple fingerprints represented by the first index fingerprint and the second probability is determined according to the multiple fingerprints represented by the second index fingerprint;
the determining module 41, further configured to determine a second fingerprint set according to the first probability and the second probability, where the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold; and
a processing module 43, configured to obtain a matching result between the multiple fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.
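The cooperation of modules 41, 42, and 43 can be sketched end to end in Python. Everything structural here is an assumption made for illustration: an index fingerprint is modeled as an `IndexEntry` holding explicit `low` and `high` bounds of its fingerprint region plus the fingerprints it represents, the probability is the minimum occurrence count over four 8-bit dimensions, and the threshold is set to 1. The actual index structure is the one defined in the earlier embodiments.

```python
from collections import Counter
from dataclasses import dataclass, field

WIDTHS = (8, 8, 8, 8)  # assumption: 32-bit fingerprints, four 8-bit dimensions


def split(fp: int) -> list[int]:
    shift, values = sum(WIDTHS), []
    for width in WIDTHS:
        shift -= width
        values.append((fp >> shift) & ((1 << width) - 1))
    return values


@dataclass
class IndexEntry:
    """Hypothetical model of one index fingerprint and what it represents."""
    low: int
    high: int
    fingerprints: list
    stats: list = field(init=False)

    def __post_init__(self):
        # Per-dimension occurrence counts (the statistical table).
        self.stats = [Counter() for _ in WIDTHS]
        for fp in self.fingerprints:
            for dim, value in enumerate(split(fp)):
                self.stats[dim][value] += 1


def determine_first_set(index, fp):            # determining module 41
    return [e for e in index if e.low <= fp <= e.high]


def obtain_probability(entry, fp):             # obtaining module 42
    return min(entry.stats[d][v] for d, v in enumerate(split(fp)))


def determine_second_set(first_set, fp, threshold):   # determining module 41
    return [e for e in first_set if obtain_probability(e, fp) >= threshold]


def is_duplicate(index, fp, threshold=1):      # processing module 43
    second_set = determine_second_set(determine_first_set(index, fp), fp, threshold)
    return any(fp in e.fingerprints for e in second_set)


index = [
    IndexEntry(0x01000000, 0x02FFFFFF, [0x01020504, 0x01030504, 0x02030102]),
    IndexEntry(0x01000000, 0x03FFFFFF, [0x01020304, 0x01020404, 0x03000000]),
]
print(is_duplicate(index, 0x01020404), is_duplicate(index, 0x01020405))
```

For 01020404H, both entries are in the first set, but only the second entry survives into the second set, so the full comparison touches one table instead of two.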
Optionally, in this embodiment of the present invention, the first fingerprint table is stored in a first memory of the multiple memories, and the second fingerprint table is stored in a second memory of the multiple memories;
the obtaining module 42 is specifically configured to: send the fingerprint of the data block to be stored and the first index fingerprint to the first memory; and receive the first probability returned by the first memory, where the first probability represents the probability that the multiple fingerprints represented by the first index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored; and
send the fingerprint of the data block to be stored and the second index fingerprint to the second memory; and receive the second probability returned by the second memory, where the second probability represents the probability that the multiple fingerprints represented by the second index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored.
Optionally, in this embodiment of the present invention, the backup server 40 further includes:
an auxiliary storage, configured to store the first fingerprint table and the second fingerprint table;
the obtaining module 42 is specifically configured to: send the fingerprint of the data block to be stored, the first index fingerprint, and the second index fingerprint to the auxiliary storage; and receive, from the auxiliary storage, the first probability that the multiple fingerprints represented by the first index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored, and the second probability that the multiple fingerprints represented by the second index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored.
Optionally, in this embodiment of the present invention, the auxiliary storage is further configured to store a first statistical table. The first statistical table comprises statistical information about the values of a first interval of the multiple fingerprints represented by the first index fingerprint, and statistical information about the values of a second interval of the multiple fingerprints represented by the first index fingerprint. The first interval runs from the h-th bit to the i-th bit of each fingerprint, and the second interval runs from the j-th bit to the k-th bit of each fingerprint, where h, i, j, and k are natural numbers, h is not greater than i, j is not greater than k, and the first interval and the second interval do not overlap;
the auxiliary storage is further configured to: determine, according to the first statistical table, the number of occurrences t_1 of a among the values of the first interval of the multiple fingerprints represented by the first index fingerprint, and the number of occurrences t_2 of b among the values of the second interval of the multiple fingerprints represented by the first index fingerprint, where a is the value of the h-th to i-th bits of the fingerprint of the data block to be stored and b is the value of the j-th to k-th bits of the fingerprint of the data block to be stored; and determine the first probability according to the minimum of t_1 and t_2.
Optionally, in this embodiment of the present invention, each fingerprint in the first fingerprint table comprises M bits, and each M-bit fingerprint comprises N intervals. Each of the N intervals consists of S consecutive bits of the M bits, no two of the N intervals overlap, and the bit widths of the N intervals sum to M, where N is a natural number greater than or equal to 2 and S is a natural number. The auxiliary storage is further configured to store a first statistical table, and the first statistical table comprises statistical information about the values of the N intervals of the multiple fingerprints represented by the first index fingerprint;
the auxiliary storage is further configured to: determine, according to the first statistical table, the number of occurrences t_i of a_i among the values of the i-th interval of the multiple fingerprints represented by the first index fingerprint, where a_i is the value of the i-th interval of the fingerprint of the data block to be stored and i ranges from 1 to N; and determine the first probability according to the minimum of t_1 to t_N.
The backup server 40 in this embodiment and the data processing method corresponding to Fig. 3 are two aspects of the same inventive concept. Because the implementation of the method has been described in detail above, a person skilled in the art can clearly understand the structure and implementation of the backup server 40 in this embodiment from the foregoing description. For brevity, details are not repeated here.
Based on the same inventive concept, an embodiment of the present invention provides a storage system that includes a backup server and multiple memories. Multiple fingerprint tables are stored in the storage system, and the multiple fingerprint tables record the fingerprints of the data blocks stored in the multiple memories.
The backup server is configured to:
determine a first fingerprint set according to the index fingerprints in a fingerprint index table and the fingerprint of a data block to be stored, where the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint represents multiple fingerprints in a first fingerprint table, the second index fingerprint represents multiple fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint region of the multiple fingerprints represented by the first index fingerprint and within the fingerprint region of the multiple fingerprints represented by the second index fingerprint;
obtain, according to the first index fingerprint, a first probability that the first fingerprint table includes a fingerprint identical to the fingerprint of the data block to be stored, and obtain, according to the second index fingerprint, a second probability that the second fingerprint table includes a fingerprint identical to the fingerprint of the data block to be stored, where the first probability is determined according to the multiple fingerprints represented by the first index fingerprint and the second probability is determined according to the multiple fingerprints represented by the second index fingerprint;
determine a second fingerprint set according to the first probability and the second probability, where the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold; and
obtain a matching result between the multiple fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.
Optionally, in this embodiment of the present invention, the first fingerprint table is stored in a first memory of the multiple memories, and the second fingerprint table is stored in a second memory of the multiple memories. The backup server is specifically configured to:
send the fingerprint of the data block to be stored and the first index fingerprint to the first memory, and receive the first probability returned by the first memory, where the first probability represents the probability that the multiple fingerprints represented by the first index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored; and
send the fingerprint of the data block to be stored and the second index fingerprint to the second memory, and receive the second probability returned by the second memory, where the second probability represents the probability that the multiple fingerprints represented by the second index fingerprint include a fingerprint identical to the fingerprint of the data block to be stored.
The first memory is specifically configured to: receive the first index fingerprint and the fingerprint of the data block to be stored that are sent by the backup server, determine the first probability that the multiple fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored, and send the first probability to the backup server.
The second memory is specifically configured to: receive the second index fingerprint and the fingerprint of the data block to be stored that are sent by the backup server, determine the second probability that the multiple fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored, and send the second probability to the backup server.
Optionally, in this embodiment of the present invention, a first statistical table is stored in the first memory. The first statistical table comprises statistical information about the values of a first interval of the multiple fingerprints represented by the first index fingerprint, and statistical information about the values of a second interval of the multiple fingerprints represented by the first index fingerprint. The first interval runs from the h-th bit to the i-th bit of each fingerprint, and the second interval runs from the j-th bit to the k-th bit of each fingerprint, where h, i, j, and k are natural numbers, h is not greater than i, j is not greater than k, and the first interval and the second interval do not overlap.
The first memory is specifically configured to:
determine, according to the first statistical table, the number of occurrences t_1 of a among the values of the first interval of the multiple fingerprints represented by the first index fingerprint, and the number of occurrences t_2 of b among the values of the second interval of the multiple fingerprints represented by the first index fingerprint, where a is the value of the h-th to i-th bits of the fingerprint of the data block to be stored and b is the value of the j-th to k-th bits of the fingerprint of the data block to be stored; and
determine the first probability according to the minimum of t_1 and t_2.
Optionally, in this embodiment of the present invention, each fingerprint in the first fingerprint table comprises M bits, and each M-bit fingerprint comprises N intervals. Each of the N intervals consists of S consecutive bits of the M bits, no two of the N intervals overlap, and the bit widths of the N intervals sum to M, where N is a natural number greater than or equal to 2 and S is a natural number. A first statistical table is stored in the first memory, and the first statistical table comprises statistical information about the values of the N intervals of the multiple fingerprints represented by the first index fingerprint.
The first memory is specifically configured to: determine, according to the first statistical table, the number of occurrences t_i of a_i among the values of the i-th interval of the multiple fingerprints represented by the first index fingerprint, where a_i is the value of the i-th interval of the fingerprint of the data block to be stored and i ranges from 1 to N; and
determine the first probability according to the minimum of t_1 to t_N.
The storage system in this embodiment and the data processing method corresponding to Fig. 3 are two aspects of the same inventive concept. Because the implementation of the method has been described in detail above, a person skilled in the art can clearly understand the structure and implementation of the storage system in this embodiment from the foregoing description. For brevity, details are not repeated here.
The one or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:
In the foregoing technical solution, the backup server uses the fingerprint index table to determine, in each fingerprint table, the multiple fingerprints represented by one index fingerprint, where the fingerprint region of those fingerprints contains the fingerprint of the data block to be stored. The subsequent fingerprint comparison is then performed only on the fingerprints determined from each fingerprint table, which reduces the amount of fingerprint data that must be moved. Moreover, for each fingerprint table, the probability of retrieving the fingerprint of the data block to be stored among the fingerprints determined in that table is calculated, and fingerprint comparison is performed only on the fingerprints determined in those fingerprint tables whose probability is not less than the preset threshold. This reduces the amount of fingerprint data moved during comparison and the time spent on comparison, that is, the time needed to determine whether the data block to be stored is a duplicate data block, and thus improves data storage efficiency.
A person skilled in the art should understand that the terms "first", "second", "third", "fourth", and the like (if present) in the specification, claims, and accompanying drawings of the present invention are used to distinguish between similar objects and are not used to describe a specific order or sequence. It should be understood that data so termed are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
An embodiment of the present invention further provides a computer program product for data processing, including a computer-readable storage medium that stores program code, where the instructions included in the program code are used to perform the method flow described in any one of the foregoing method embodiments. A person of ordinary skill in the art will understand that the foregoing storage medium includes any non-transitory machine-readable medium that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a random-access memory (RAM), a solid state disk (SSD), or another non-volatile memory.
It should be noted that the embodiments provided in this application are merely illustrative. A person skilled in the art can clearly understand that, for convenience and brevity of description, the descriptions of the foregoing embodiments each have their own emphasis; for a part that is not described in detail in one embodiment, refer to the related descriptions in other embodiments. The features disclosed in the embodiments, claims, and accompanying drawings of the present invention may exist independently or in combination. Features described in the form of hardware in the embodiments of the present invention may be implemented by software, and vice versa. No limitation is imposed here.

Claims (14)

1. A data processing method, characterized in that the method is performed by a backup server in a storage system, the storage system comprises the backup server and a plurality of memories, a plurality of fingerprint tables are stored in the storage system, and the plurality of fingerprint tables record fingerprints of data blocks stored in the plurality of memories, the method comprising:
determining a first fingerprint set according to the index fingerprints in a fingerprint index table and the fingerprint of a data block to be stored, wherein the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint represents a plurality of fingerprints in a first fingerprint table, the second index fingerprint represents a plurality of fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint range of the plurality of fingerprints represented by the first index fingerprint and within the fingerprint range of the plurality of fingerprints represented by the second index fingerprint;
obtaining, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtaining, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, wherein the first probability is determined according to the plurality of fingerprints represented by the first index fingerprint, and the second probability is determined according to the plurality of fingerprints represented by the second index fingerprint;
determining a second fingerprint set according to the first probability and the second probability, wherein the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold;
obtaining a matching result between the plurality of fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.
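As an illustrative note on claim 1 (not part of the claims), the four steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `IndexEntry` layout, the `find_duplicate` function, and the `probability_of` and `contains` callbacks are hypothetical names introduced here; the claim does not prescribe how ranges are stored or how candidates are ordered.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical index-table row: one index fingerprint stands for the
    fingerprints in [low, high] of a single fingerprint table."""
    index_fp: int
    low: int
    high: int
    table_id: str


def find_duplicate(
    fp: int,
    index_table: List[IndexEntry],
    probability_of: Callable[[IndexEntry, int], float],
    contains: Callable[[IndexEntry, int], bool],
    threshold: float,
) -> Optional[str]:
    """Return the id of a fingerprint table that holds fp, or None."""
    # Step 1: first fingerprint set = index fingerprints whose range covers fp.
    first_set = [e for e in index_table if e.low <= fp <= e.high]
    # Step 2: one probability per candidate index fingerprint.
    scored = [(probability_of(e, fp), e) for e in first_set]
    # Step 3: second fingerprint set = candidates not below the threshold.
    second_set = [e for p, e in scored if p >= threshold]
    # Step 4: exact fingerprint matching only against the surviving candidates.
    for entry in second_set:
        if contains(entry, fp):
            return entry.table_id
    return None
```

The point of the threshold in step 3 is that exact matching, which is the expensive lookup, is attempted only for index fingerprints whose probability is high enough.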
2. The method according to claim 1, characterized in that the first fingerprint table is stored in a first memory of the plurality of memories, and the second fingerprint table is stored in a second memory of the plurality of memories; the obtaining, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored comprises:
sending the fingerprint of the data block to be stored and the first index fingerprint to the first memory;
receiving the first probability returned by the first memory, wherein the first probability represents the probability that the plurality of fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored;
the obtaining, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored comprises:
sending the fingerprint of the data block to be stored and the second index fingerprint to the second memory;
receiving the second probability returned by the second memory, wherein the second probability represents the probability that the plurality of fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
3. The method according to claim 1, characterized in that the backup server comprises an auxiliary memory, and the first fingerprint table and the second fingerprint table are stored in the auxiliary memory;
the obtaining, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtaining, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored comprises:
sending the fingerprint of the data block to be stored, the first index fingerprint, and the second index fingerprint to the auxiliary memory;
receiving, from the auxiliary memory, the first probability that the plurality of fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored, and the second probability that the plurality of fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
4. The method according to any one of claims 1 to 3, characterized in that each fingerprint in the first fingerprint table comprises M bits, each M-bit fingerprint comprises N intervals, each of the N intervals comprises S consecutive bits of the M bits, no two of the N intervals overlap, the total number of bits of the N intervals is M, N is a natural number greater than or equal to 2, and S is a natural number;
a first statistical table is stored in the storage system, the first statistical table includes statistical information on the values of the N intervals of the plurality of fingerprints represented by the first index fingerprint, and the first probability is determined by:
determining, according to the first statistical table, the number of occurrences t_i of a_i among the values of the i-th interval of the plurality of fingerprints represented by the first index fingerprint, wherein a_i is the value of the i-th interval of the fingerprint of the data block to be stored, and i ranges from 1 to N;
determining the first probability according to the minimum of the obtained t_1 to t_N.
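As an illustrative note on claim 4 (not part of the claims), the interval statistics can be sketched in Python as follows. The function names are hypothetical, intervals are assumed to be numbered from the most significant bits, and dividing the minimum count by the number of represented fingerprints is an assumption made here: the claim only states that the probability is determined from the minimum of t_1 to t_N.

```python
from collections import Counter
from typing import List


def split_intervals(fp: int, m_bits: int, n: int) -> List[int]:
    """Split an M-bit fingerprint into N non-overlapping S-bit values
    (S = M // N), most significant interval first."""
    s = m_bits // n
    mask = (1 << s) - 1
    return [(fp >> (m_bits - s * (i + 1))) & mask for i in range(n)]


def build_stat_table(fps: List[int], m_bits: int, n: int) -> List[Counter]:
    """Statistical table: for each interval, how often each value occurs
    among the fingerprints represented by one index fingerprint."""
    table = [Counter() for _ in range(n)]
    for fp in fps:
        for i, value in enumerate(split_intervals(fp, m_bits, n)):
            table[i][value] += 1
    return table


def first_probability(fp: int, stat_table: List[Counter], m_bits: int, total: int) -> float:
    """Estimate from the minimum occurrence count t_i (assumed normalisation)."""
    if total == 0:
        return 0.0
    n = len(stat_table)
    t = [stat_table[i][a_i] for i, a_i in enumerate(split_intervals(fp, m_bits, n))]
    return min(t) / total
```

The minimum is a natural choice because the number of stored fingerprints that agree with the query on every interval cannot exceed the smallest t_i; in particular, if some interval value never occurs (t_i = 0), no stored fingerprint can be identical to the query.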
5. The method according to any one of claims 1 to 3, characterized in that a first statistical table is stored in the storage system, the first statistical table includes statistical information on the values of a first interval of the plurality of fingerprints represented by the first index fingerprint and statistical information on the values of a second interval of the plurality of fingerprints represented by the first index fingerprint, the first interval is the interval from the h-th bit to the i-th bit of each fingerprint, the second interval is the interval from the j-th bit to the k-th bit of each fingerprint, wherein h, i, j, and k are natural numbers, the value of h is not greater than the value of i, the value of j is not greater than the value of k, and the first interval and the second interval do not overlap; the first probability is determined by:
determining, according to the first statistical table, the number of occurrences t_1 of a among the values of the first interval of the plurality of fingerprints represented by the first index fingerprint, and the number of occurrences t_2 of b among the values of the second interval of the plurality of fingerprints represented by the first index fingerprint, wherein a is the value of the h-th bit to the i-th bit of the fingerprint of the data block to be stored, and b is the value of the j-th bit to the k-th bit of the fingerprint of the data block to be stored;
determining the first probability according to the minimum of t_1 and t_2.
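As an illustrative note on claim 5 (not part of the claims), the two-interval variant can be sketched in Python as follows. The names are hypothetical; bit positions are assumed to be 1-based and counted from the most significant bit, and the division by the number of represented fingerprints is again an assumption, since the claim only requires that the probability be determined from the minimum of t_1 and t_2.

```python
from collections import Counter
from typing import List, Tuple


def bit_range(fp: int, m_bits: int, start: int, end: int) -> int:
    """Value of bits start..end (1-based, inclusive, counted from the most
    significant bit) of an M-bit fingerprint."""
    width = end - start + 1
    return (fp >> (m_bits - end)) & ((1 << width) - 1)


def build_two_interval_table(
    fps: List[int], m_bits: int, first: Tuple[int, int], second: Tuple[int, int]
) -> Tuple[Counter, Counter]:
    """Count the values of the first interval (h..i) and second interval (j..k)."""
    (h, i), (j, k) = first, second
    counts_1 = Counter(bit_range(f, m_bits, h, i) for f in fps)
    counts_2 = Counter(bit_range(f, m_bits, j, k) for f in fps)
    return counts_1, counts_2


def two_interval_probability(
    fp: int,
    table: Tuple[Counter, Counter],
    m_bits: int,
    first: Tuple[int, int],
    second: Tuple[int, int],
    total: int,
) -> float:
    """Estimate from min(t_1, t_2) (assumed normalisation by total)."""
    if total == 0:
        return 0.0
    t_1 = table[0][bit_range(fp, m_bits, *first)]   # occurrences of a
    t_2 = table[1][bit_range(fp, m_bits, *second)]  # occurrences of b
    return min(t_1, t_2) / total
```

Compared with the N-interval scheme, only two counters need to be stored per index fingerprint, at the cost of ignoring the bits outside the two intervals.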
6. A backup server, characterized in that the backup server is applied in a storage system, the storage system comprises the backup server and a plurality of memories, a plurality of fingerprint tables are stored in the storage system, and the plurality of fingerprint tables record fingerprints of data blocks stored in the plurality of memories, the backup server comprising:
a determining module, configured to determine a first fingerprint set according to the index fingerprints in a fingerprint index table and the fingerprint of a data block to be stored, wherein the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint represents a plurality of fingerprints in a first fingerprint table, the second index fingerprint represents a plurality of fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint range of the plurality of fingerprints represented by the first index fingerprint and within the fingerprint range of the plurality of fingerprints represented by the second index fingerprint;
an obtaining module, configured to obtain, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and to obtain, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, wherein the first probability is determined according to the plurality of fingerprints represented by the first index fingerprint, and the second probability is determined according to the plurality of fingerprints represented by the second index fingerprint;
the determining module being further configured to determine a second fingerprint set according to the first probability and the second probability, wherein the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold;
a processing module, configured to obtain a matching result between the plurality of fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.
7. The backup server according to claim 6, characterized in that the first fingerprint table is stored in a first memory of the plurality of memories, and the second fingerprint table is stored in a second memory of the plurality of memories;
the obtaining module is specifically configured to: send the fingerprint of the data block to be stored and the first index fingerprint to the first memory; and receive the first probability returned by the first memory, wherein the first probability represents the probability that the plurality of fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored; and
send the fingerprint of the data block to be stored and the second index fingerprint to the second memory; and receive the second probability returned by the second memory, wherein the second probability represents the probability that the plurality of fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
8. The backup server according to claim 6, characterized in that the backup server further comprises:
an auxiliary memory, configured to store the first fingerprint table and the second fingerprint table;
the obtaining module is specifically configured to: send the fingerprint of the data block to be stored, the first index fingerprint, and the second index fingerprint to the auxiliary memory; and receive, from the auxiliary memory, the first probability that the plurality of fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored, and the second probability that the plurality of fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored.
9. The backup server according to claim 8, characterized in that:
each fingerprint in the first fingerprint table comprises M bits, each M-bit fingerprint comprises N intervals, each of the N intervals comprises S consecutive bits of the M bits, no two of the N intervals overlap, the total number of bits of the N intervals is M, N is a natural number greater than or equal to 2, and S is a natural number; the auxiliary memory is further configured to store a first statistical table, and the first statistical table includes statistical information on the values of the N intervals of the plurality of fingerprints represented by the first index fingerprint;
the auxiliary memory is further configured to: determine, according to the first statistical table, the number of occurrences t_i of a_i among the values of the i-th interval of the plurality of fingerprints represented by the first index fingerprint, wherein a_i is the value of the i-th interval of the fingerprint of the data block to be stored, and i ranges from 1 to N; and determine the first probability according to the minimum of t_1 to t_N.
10. The backup server according to claim 8, characterized in that the auxiliary memory is further configured to store a first statistical table, the first statistical table includes statistical information on the values of a first interval of the plurality of fingerprints represented by the first index fingerprint and statistical information on the values of a second interval of the plurality of fingerprints represented by the first index fingerprint, the first interval is the interval from the h-th bit to the i-th bit of each fingerprint, the second interval is the interval from the j-th bit to the k-th bit of each fingerprint, wherein h, i, j, and k are natural numbers, the value of h is not greater than the value of i, the value of j is not greater than the value of k, and the first interval and the second interval do not overlap;
the auxiliary memory is further configured to: determine, according to the first statistical table, the number of occurrences t_1 of a among the values of the first interval of the plurality of fingerprints represented by the first index fingerprint, and the number of occurrences t_2 of b among the values of the second interval of the plurality of fingerprints represented by the first index fingerprint, wherein a is the value of the h-th bit to the i-th bit of the fingerprint of the data block to be stored, and b is the value of the j-th bit to the k-th bit of the fingerprint of the data block to be stored; and determine the first probability according to the minimum of t_1 and t_2.
11. A storage system, characterized by comprising a backup server and a plurality of memories, wherein a plurality of fingerprint tables are stored in the storage system, and the plurality of fingerprint tables record fingerprints of data blocks stored in the plurality of memories;
the backup server is configured to:
determine a first fingerprint set according to the index fingerprints in a fingerprint index table and the fingerprint of a data block to be stored, wherein the first fingerprint set includes a first index fingerprint and a second index fingerprint, the first index fingerprint represents a plurality of fingerprints in a first fingerprint table, the second index fingerprint represents a plurality of fingerprints in a second fingerprint table, and the fingerprint of the data block to be stored falls within the fingerprint range of the plurality of fingerprints represented by the first index fingerprint and within the fingerprint range of the plurality of fingerprints represented by the second index fingerprint;
obtain, according to the first index fingerprint, a first probability that the first fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, and obtain, according to the second index fingerprint, a second probability that the second fingerprint table contains a fingerprint identical to the fingerprint of the data block to be stored, wherein the first probability is determined according to the plurality of fingerprints represented by the first index fingerprint, and the second probability is determined according to the plurality of fingerprints represented by the second index fingerprint;
determine a second fingerprint set according to the first probability and the second probability, wherein the second fingerprint set includes at least the first index fingerprint, and the first probability determined according to the first index fingerprint is not less than a preset threshold;
obtain a matching result between the plurality of fingerprints represented by the first index fingerprint and the fingerprint of the data block to be stored.
12. The storage system according to claim 11, characterized in that the first fingerprint table is stored in a first memory of the plurality of memories, and the second fingerprint table is stored in a second memory of the plurality of memories; the backup server is specifically configured to:
send the fingerprint of the data block to be stored and the first index fingerprint to the first memory; and receive the first probability returned by the first memory, wherein the first probability represents the probability that the plurality of fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored;
send the fingerprint of the data block to be stored and the second index fingerprint to the second memory; and receive the second probability returned by the second memory, wherein the second probability represents the probability that the plurality of fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored;
the first memory is specifically configured to: receive the first index fingerprint and the fingerprint of the data block to be stored that are sent by the backup server, determine the first probability that the plurality of fingerprints represented by the first index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored, and send the first probability to the backup server;
the second memory is specifically configured to: receive the second index fingerprint and the fingerprint of the data block to be stored that are sent by the backup server, determine the second probability that the plurality of fingerprints represented by the second index fingerprint contain a fingerprint identical to the fingerprint of the data block to be stored, and send the second probability to the backup server.
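As an illustrative note on claim 12 (not part of the claims), the exchange between the backup server and the memories can be sketched in Python as follows. Everything here is hypothetical: the `MemoryNode` and `BackupServer` classes, the use of in-process method calls in place of network messages, and the simple estimator based on the leading bits of each fingerprint, which merely stands in for whichever statistic a memory actually keeps.

```python
from collections import Counter
from typing import Dict, List


class MemoryNode:
    """Hypothetical memory that owns one fingerprint table, grouped by
    index fingerprint, and answers probability requests locally."""

    def __init__(self, groups: Dict[int, List[int]], m_bits: int = 16, prefix_bits: int = 8):
        self.groups = groups
        self.shift = m_bits - prefix_bits
        # Per index fingerprint: how often each leading-bit value occurs.
        self.stats = {
            idx: Counter(fp >> self.shift for fp in fps)
            for idx, fps in groups.items()
        }

    def handle_probability_request(self, index_fp: int, fp: int) -> float:
        """Receive (index fingerprint, fingerprint) and return a probability."""
        fps = self.groups.get(index_fp, [])
        if not fps:
            return 0.0
        return self.stats[index_fp][fp >> self.shift] / len(fps)


class BackupServer:
    """Hypothetical backup server that routes each index fingerprint to the
    memory holding the corresponding fingerprint table."""

    def __init__(self, routes: Dict[int, MemoryNode]):
        self.routes = routes

    def collect_probabilities(self, fp: int, index_fps: List[int]) -> Dict[int, float]:
        # One request per index fingerprint; each memory computes its own answer.
        return {
            idx: self.routes[idx].handle_probability_request(idx, fp)
            for idx in index_fps
        }
```

Computing the probability on the memory that holds the fingerprint table means only a single number travels back to the backup server, rather than the fingerprints themselves.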
13. The storage system according to claim 11 or 12, characterized in that each fingerprint in the first fingerprint table comprises M bits, each M-bit fingerprint comprises N intervals, each of the N intervals comprises S consecutive bits of the M bits, no two of the N intervals overlap, the total number of bits of the N intervals is M, N is a natural number greater than or equal to 2, and S is a natural number; a first statistical table is stored in the first memory, and the first statistical table includes statistical information on the values of the N intervals of the plurality of fingerprints represented by the first index fingerprint;
the first memory is specifically configured to: determine, according to the first statistical table, the number of occurrences t_i of a_i among the values of the i-th interval of the plurality of fingerprints represented by the first index fingerprint, wherein a_i is the value of the i-th interval of the fingerprint of the data block to be stored, and i ranges from 1 to N;
determine the first probability according to the minimum of t_1 to t_N.
14. The storage system according to claim 11 or 12, characterized in that a first statistical table is stored in the first memory, the first statistical table includes statistical information on the values of a first interval of the plurality of fingerprints represented by the first index fingerprint and statistical information on the values of a second interval of the plurality of fingerprints represented by the first index fingerprint, the first interval is the interval from the h-th bit to the i-th bit of each fingerprint, the second interval is the interval from the j-th bit to the k-th bit of each fingerprint, wherein h, i, j, and k are natural numbers, the value of h is not greater than the value of i, the value of j is not greater than the value of k, and the first interval and the second interval do not overlap;
the first memory is specifically configured to:
determine, according to the first statistical table, the number of occurrences t_1 of a among the values of the first interval of the plurality of fingerprints represented by the first index fingerprint, and the number of occurrences t_2 of b among the values of the second interval of the plurality of fingerprints represented by the first index fingerprint, wherein a is the value of the h-th bit to the i-th bit of the fingerprint of the data block to be stored, and b is the value of the j-th bit to the k-th bit of the fingerprint of the data block to be stored;
determine the first probability according to the minimum of t_1 and t_2.
CN201510468057.9A 2015-07-31 2015-07-31 Data processing method, backup server and storage system Active CN106407226B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510468057.9A CN106407226B (en) 2015-07-31 2015-07-31 Data processing method, backup server and storage system
PCT/CN2016/091054 WO2017020735A1 (en) 2015-07-31 2016-07-22 Data processing method, backup server and storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510468057.9A CN106407226B (en) 2015-07-31 2015-07-31 Data processing method, backup server and storage system

Publications (2)

Publication Number Publication Date
CN106407226A true CN106407226A (en) 2017-02-15
CN106407226B CN106407226B (en) 2019-09-13

Family

ID=57942441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510468057.9A Active CN106407226B (en) 2015-07-31 2015-07-31 Data processing method, backup server and storage system

Country Status (2)

Country Link
CN (1) CN106407226B (en)
WO (1) WO2017020735A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103019887A (en) * 2012-12-12 2013-04-03 华为技术有限公司 Data backup method and device
CN103514250A (en) * 2013-06-20 2014-01-15 易乐天 Method and system for deleting global repeating data and storage device
CN103678293A (en) * 2012-08-29 2014-03-26 百度在线网络技术(北京)有限公司 Data storage method and device
US20150088899A1 (en) * 2013-09-23 2015-03-26 Spotify Ab System and method for identifying a segment of a file that includes target content

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005050620A1 (en) * 2003-11-18 2005-06-02 Koninklijke Philips Electronics N.V. Matching data objects by matching derived fingerprints
CN101477523B (en) * 2008-11-24 2011-07-20 北京邮电大学 Index structure and retrieval method for ultra-large fingerprint base
CN103235791B (en) * 2013-03-29 2019-03-26 厦门雅迅网络股份有限公司 A kind of fingerprint matching optimum position method based on rank

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107317723A (en) * 2017-05-27 2017-11-03 北京金山安全软件有限公司 Data processing method and server
TWI700905B (en) * 2018-01-18 2020-08-01 香港商阿里巴巴集團服務有限公司 Data processing method, device and equipment
CN110582091A (en) * 2018-06-11 2019-12-17 中国移动通信集团浙江有限公司 method and apparatus for locating wireless quality problems
CN111427871A (en) * 2019-01-09 2020-07-17 阿里巴巴集团控股有限公司 Data processing method, device and equipment
CN111427871B (en) * 2019-01-09 2024-03-29 阿里巴巴集团控股有限公司 Data processing method, device and equipment
CN115988002A (en) * 2023-02-16 2023-04-18 荣耀终端有限公司 Data transmission method and electronic equipment
CN115988002B (en) * 2023-02-16 2023-08-15 荣耀终端有限公司 Data transmission method and electronic equipment

Also Published As

Publication number Publication date
WO2017020735A1 (en) 2017-02-09
CN106407226B (en) 2019-09-13

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant