CN104169917B - Server-based method for searching for a data stream cut-point, and server - Google Patents
- Publication number
- CN104169917B (application number CN201480000347.4A)
- Authority
- CN
- China
- Prior art keywords
- window
- point
- data
- predetermined condition
- potential cut
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present invention provide a server-based method for searching for a data stream cut-point. For a potential cut-point, M windows are determined, and the method judges whether at least part of the data in each window meets a predetermined condition. When at least part of the data in one of the windows does not meet its predetermined condition, the search skips a length of N × U to obtain the next potential cut-point, which improves the efficiency of searching for data stream cut-points.
Description
Technical field
The present invention relates to the field of information technologies, and in particular to a server-based method for searching for a data stream cut-point, and to a server.
Background
Data volumes keep growing, and providing sufficient data storage has become a severe challenge for the storage field. One current way of meeting this challenge exploits the redundancy of the data to be stored: data deduplication technology is used to reduce the amount of data that is actually stored.
In the prior art, a deduplication algorithm based on content defined chunking (Content Defined Chunk, CDC) first divides the data stream to be stored into many data blocks. Dividing the stream requires finding suitable cut-points in it; the data between two adjacent data stream cut-points forms one data block. An eigenvalue of each data block is calculated and used to look for a data block with the same eigenvalue; if one is found, duplicate data is considered to exist. Specifically, CDC-based deduplication applies the sliding window technique (Sliding Window Technique) to find cut-points based on file content, that is, it determines data stream cut-points by calculating the Rabin fingerprint of the data in the window. Assume that the search runs from left to right along the data stream. At each step, the fingerprint of the data in the sliding window is calculated, reduced modulo a given integer K, and compared with a given remainder R. If the two are equal, the right end of the window is a data stream cut-point; otherwise the window slides one byte to the right, and the calculation and comparison repeat until the end of the data stream is reached. Searching for data stream cut-points in this way consumes a large amount of computing resources and therefore becomes the bottleneck in improving deduplication performance.
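The byte-by-byte search described in the background can be sketched as follows. This is a minimal illustration under stated assumptions, not the prior-art algorithm itself: a simple polynomial rolling hash stands in for the Rabin fingerprint, and the function name, the window size, `divisor` (playing the role of the integer K) and `remainder` (playing the role of R) are hypothetical values chosen for the example.

```python
def find_cut_points_baseline(data: bytes, window: int = 16,
                             divisor: int = 64, remainder: int = 0) -> list[int]:
    """Return every offset whose preceding window has fingerprint % divisor == remainder."""
    base, mod = 257, (1 << 61) - 1    # assumed parameters of the stand-in rolling hash
    if len(data) < window:
        return []
    top = pow(base, window - 1, mod)  # weight of the byte that leaves the window
    fp = 0
    for b in data[:window]:           # fingerprint of the first window
        fp = (fp * base + b) % mod
    cuts = []
    end = window                      # exclusive right edge of the current window
    while True:
        if fp % divisor == remainder:
            cuts.append(end)          # the right end of the window is a cut-point
        if end == len(data):
            return cuts
        # Slide one byte to the right: drop the oldest byte, append the next one.
        fp = ((fp - data[end - window] * top) * base + data[end]) % mod
        end += 1
```

Every position of the stream is visited once, which is the cost that the skipping scheme in the summary aims to reduce.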
Summary of the invention
According to a first aspect, an embodiment of the present invention provides a server-based method for searching for a data stream cut-point. A rule is preset on the server. The rule is: for a potential cut-point k, determine M points p_x, a window W_x[p_x - A_x, p_x + B_x] corresponding to each point p_x, and a predetermined condition C_x corresponding to the window W_x[p_x - A_x, p_x + B_x], where x takes the consecutive natural numbers from 1 to M, M ≥ 2, and A_x and B_x are integers. The method includes:
a) determining, according to the rule, a point p_iz for the current potential cut-point k_i and the window W_iz[p_iz - A_z, p_iz + B_z] corresponding to the point p_iz, where i and z are integers and 1 ≤ z ≤ M;
b) judging whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] meets the predetermined condition C_z;
when at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] does not meet the predetermined condition C_z, jumping N minimum units U of searching for a data stream cut-point from the point p_iz along the cut-point search direction, where N*U is not greater than ‖B_z‖ + max_x(‖A_x‖ + ‖k_i - p_ix‖), to obtain a new potential cut-point, and performing step a);
c) when at least part of the data in each window W_ix[p_ix - A_x, p_ix + B_x] of the M windows of the current potential cut-point k_i meets the predetermined condition C_x, the current potential cut-point k_i is a data stream cut-point.
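Steps a) to c) can be sketched as below. This is an illustrative reading rather than the claimed method itself: the names `PointRule` and `find_cut_point` are hypothetical, each point is stored as its distance `offset = k - p_x` behind the potential cut-point, a window W[p - A, p + B] is taken to be the bytes `data[p - A : p + B]`, the windows are checked in list order, and the new potential cut-point is assumed to be `p_iz + N*U`.

```python
from typing import Callable, NamedTuple, Optional, Sequence

class PointRule(NamedTuple):
    offset: int                    # k - p_x (assumed representation of the point p_x)
    a: int                         # A_x
    b: int                         # B_x
    cond: Callable[[bytes], bool]  # predetermined condition C_x

def find_cut_point(data: bytes, rules: Sequence[PointRule],
                   start: int, n: int, u: int = 1) -> Optional[int]:
    """Return the first cut-point found from `start`, or None if the data runs out."""
    reach = max(abs(r.a) + abs(r.offset) for r in rules)  # max_x(|A_x| + |k - p_x|)
    k = start
    while True:
        if min(k - r.offset - r.a for r in rules) < 0:
            raise ValueError("a window starts before the beginning of the data")
        if max(k - r.offset + r.b for r in rules) > len(data):
            return None
        # Steps a) and b): find the first window whose data fails its condition.
        failed = next((r for r in rules
                       if not r.cond(data[k - r.offset - r.a:k - r.offset + r.b])), None)
        if failed is None:
            return k               # step c): all M windows meet their conditions
        if n * u > abs(failed.b) + reach:
            raise ValueError("N*U exceeds |B_z| + max_x(|A_x| + |k - p_x|)")
        new_k = (k - failed.offset) + n * u  # jump N*U from the point p_iz
        if new_k <= k:
            raise ValueError("the jump does not advance the search")
        k = new_k
```

With M identical windows spaced one unit apart and N = M, every candidate that is skipped shares the failed window, so no cut-point is lost; other rule choices trade that guarantee against speed.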
With reference to the first aspect, in a first possible implementation, the rule further includes: at least two points p_e and p_f satisfy A_e = A_f, B_e = B_f, and C_e = C_f.
With reference to the first possible implementation of the first aspect, in a second possible implementation, the rule further includes: relative to the potential cut-point k, the at least two points p_e and p_f lie in the direction opposite to the cut-point search direction.
With reference to the first or second possible implementation of the first aspect, in a third possible implementation, the rule further includes: the distance between the at least two points p_e and p_f is 1 U.
With reference to the first aspect, or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation, judging whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] meets the predetermined condition C_z specifically includes:
using a random function to judge whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] meets the predetermined condition C_z.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, using a random function to judge whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] meets the predetermined condition C_z is specifically: using a hash function to judge whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] meets the predetermined condition C_z.
With reference to the first aspect, or any one of the first to fifth possible implementations of the first aspect, in a sixth possible implementation, when at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] does not meet the predetermined condition C_z, N minimum units U of searching for a data stream cut-point are jumped from the point p_iz along the cut-point search direction to obtain the new potential cut-point. According to the rule, the left boundary of the window W_ic[p_ic - A_c, p_ic + B_c] corresponding to the point p_ic determined for the new potential cut-point either coincides with the right boundary of the window W_iz[p_iz - A_z, p_iz + B_z] or lies within the range of the window W_iz[p_iz - A_z, p_iz + B_z]. Here, the point p_ic is the point ranked first when the M points determined, according to the rule, for the new potential cut-point are ordered along the data stream search direction.
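The boundary condition of the sixth implementation can be turned into a range of admissible jumps. The sketch below rests on assumptions that the text does not spell out: the search runs left to right, the point "ranked first" is the one furthest behind the potential cut-point, the new potential cut-point is `p_iz + N*U`, and each point is given as a hypothetical tuple `(offset, A, B)` with `offset = k - p_x`. The numbers in the usage note are illustrative only.

```python
def aligned_jump_range(points, z):
    """Return (smallest, largest) jump N*U, measured from the failed point p_z, that
    keeps the left boundary of the new first window inside the failed window."""
    _, a_z, b_z = points[z]
    # Assumed "first" point along the search direction: largest offset behind k.
    off_c, a_c, _ = max(points, key=lambda p: p[0])
    # New k = p_z + jump, so the new first window starts at p_z + jump - off_c - a_c.
    smallest = off_c + a_c - a_z  # left boundary lands on p_z - A_z
    largest = off_c + a_c + b_z   # left boundary lands on p_z + B_z
    return smallest, largest
```

For example, with 11 points at offsets 0 to 10, A = 169 and B = 0, `aligned_jump_range(points, 4)` gives `(10, 179)`; the upper end equals the bound ‖B_z‖ + max_x(‖A_x‖ + ‖k_i - p_ix‖) for these values.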
With reference to the fourth possible implementation of the first aspect, in a seventh possible implementation, using a random function to judge whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] meets the predetermined condition C_z specifically includes:
selecting F bytes in the window W_iz[p_iz - A_z, p_iz + B_z] and using the F bytes repeatedly H times, to obtain F*H bytes in total. Each byte consists of 8 bits, denoted a_m,1 … a_m,8, which represent the 1st to 8th bits of the m-th byte of the F*H bytes. The bits of the F*H bytes can be expressed as the matrix (a_m,n) with F*H rows (m = 1 … F*H) and 8 columns (n = 1 … 8). When a_m,n = 1, V_am,n = 1; when a_m,n = 0, V_am,n = -1, where a_m,n denotes any one of a_m,1 … a_m,8. Applying this conversion between a_m,n and V_am,n to every bit gives the matrix V_a = (V_am,n), with the same F*H rows and 8 columns. F*H*8 random numbers are selected from random numbers that follow a normal distribution to form the matrix R = (h_m,n), also with F*H rows and 8 columns. The entries of row m of the matrix V_a are multiplied by the random numbers in row m of the matrix R and the products are summed to obtain one value, specifically S_am = V_am,1*h_m,1 + V_am,2*h_m,2 + … + V_am,8*h_m,8. S_a1, S_a2, …, S_aF*H are obtained in the same way. The number K of values among S_a1, S_a2, …, S_aF*H that are greater than 0 is counted; when K is even, at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] meets the predetermined condition C_z.
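The condition of the seventh implementation can be sketched as follows. Several details are assumptions made only for illustration: the F bytes are taken from the start of the window, bit 1 is read as the most significant bit, the normal distribution is taken to be standard normal, and the helper names `make_r` and `window_meets_condition` are hypothetical. The matrix R must be generated once and reused, otherwise identical data would not give identical results.

```python
import random

def make_r(f: int, h: int, seed: int = 0) -> list[list[float]]:
    """Matrix R: F*H rows of 8 normally distributed numbers h_m,n (fixed by the seed)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(f * h)]

def window_meets_condition(window: bytes, r: list[list[float]], f: int, h: int) -> bool:
    """Return True when the number K of positive row sums S_am is even."""
    if len(window) < f:
        raise ValueError("the window holds fewer than F bytes")
    repeated = window[:f] * h                # F bytes used repeatedly H times
    k = 0
    for m, byte in enumerate(repeated):
        s = 0.0
        for n in range(8):
            bit = (byte >> (7 - n)) & 1      # a_m,n (assumed MSB-first order)
            v = 1 if bit else -1             # V_am,n
            s += v * r[m][n]                 # accumulate V_am,n * h_m,n
        if s > 0:
            k += 1
    return k % 2 == 0
```

Because K is even or odd with roughly equal likelihood for unrelated data, one such window passes about half of the time, which is why several windows are combined for each potential cut-point.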
According to a second aspect, an embodiment of the present invention provides a server-based method for searching for a data stream cut-point. A rule is preset on the server. The rule is: for a potential cut-point k, determine M windows W_x[k - A_x, k + B_x] and a predetermined condition C_x corresponding to each window W_x[k - A_x, k + B_x], where x takes the consecutive natural numbers from 1 to M, M ≥ 2, and A_x and B_x are integers.
The method includes:
a) determining, according to the rule, the corresponding window W_iz[k_i - A_z, k_i + B_z] for the current potential cut-point k_i, where i and z are integers and 1 ≤ z ≤ M;
b) judging whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] meets the predetermined condition C_z;
when at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] does not meet the predetermined condition C_z, jumping N minimum units U of searching for a data stream cut-point from the current potential cut-point k_i along the cut-point search direction, where N*U is not greater than ‖B_z‖ + max_x(‖A_x‖), to obtain a new potential cut-point, and performing step a);
c) when at least part of the data in each window W_ix[k_i - A_x, k_i + B_x] of the M windows of the current potential cut-point k_i meets the predetermined condition C_x, the current potential cut-point k_i is a data stream cut-point.
With reference to the second aspect, in a first possible implementation, the rule further includes: at least two windows W_ie[k_i - A_e, k_i + B_e] and W_if[k_i - A_f, k_i + B_f] satisfy |A_e + B_e| = |A_f + B_f| and C_e = C_f.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the rule further includes: A_e and A_f are positive integers.
With reference to the first or second possible implementation of the second aspect, in a third possible implementation, the rule further includes: A_e - 1 = A_f and B_e + 1 = B_f.
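The three rule constraints above describe windows of equal length, each shifted by one unit relative to its neighbour. A minimal sketch of such a rule and of a search that uses it is given below. The helper names, the single shared condition `cond`, U = 1 byte, and in particular the jump N = M - z + 1 after window z fails are illustrative choices that happen to fit these windows; they are assumptions, not taken from the text.

```python
from typing import Callable, Optional

def shifted_windows(m: int, length: int) -> list[tuple[int, int]]:
    """(A_x, B_x) for x = 1..M, describing W_x[k - A_x, k + B_x] with
    A_x = length + x - 1 and B_x = -(x - 1): equal length, one unit apart."""
    return [(length + x, -x) for x in range(m)]

def find_cut_point_windows(data: bytes, windows: list[tuple[int, int]],
                           cond: Callable[[bytes], bool], start: int) -> Optional[int]:
    """Search from `start`; return the first cut-point found or None."""
    if start < max(a for a, _ in windows):
        raise ValueError("a window starts before the beginning of the data")
    m = len(windows)
    k = start
    while max(k + b for _, b in windows) <= len(data):
        for z, (a, b) in enumerate(windows, start=1):
            if not cond(data[k - a:k + b]):
                # Candidates k+1 .. k+(M-z) would contain this same failed window.
                k += m - z + 1
                break
        else:
            return k                # every window met the condition
    return None
```

For `shifted_windows(3, 4)` the result is `[(4, 0), (5, -1), (6, -2)]`, so each consecutive pair satisfies A_e - 1 = A_f and B_e + 1 = B_f, and every jump stays within ‖B_z‖ + max_x(‖A_x‖).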
With reference to the second aspect, or any one of the first to third possible implementations of the second aspect, in a fourth possible implementation, judging whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] meets the predetermined condition C_z specifically includes:
using a random function to judge whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] meets the predetermined condition C_z.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, using a random function to judge whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] meets the predetermined condition C_z is specifically: using a hash function to judge whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] meets the predetermined condition C_z.
With reference to the second aspect, or any one of the first to fifth possible implementations of the second aspect, in a sixth possible implementation, when at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] does not meet the predetermined condition C_z, N minimum units U of searching for a data stream cut-point are jumped from the current potential cut-point k_i along the cut-point search direction to obtain the new potential cut-point. According to the rule, the left boundary of the window W_ic[k_i - A_c, k_i + B_c] determined for the new potential cut-point either coincides with the right boundary of the window W_iz[k_i - A_z, k_i + B_z] or lies within the range of the window W_iz[k_i - A_z, k_i + B_z]. Here, the window W_ic[k_i - A_c, k_i + B_c] is the window ranked first when the M windows determined, according to the rule, for the new potential cut-point are ordered along the data stream search direction.
With reference to the fourth possible implementation of the second aspect, in a seventh possible implementation, using a random function to judge whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] meets the predetermined condition C_z specifically includes:
selecting F bytes in the window W_iz[k_i - A_z, k_i + B_z] and using the F bytes repeatedly H times, to obtain F*H bytes in total. Each byte consists of 8 bits, denoted a_m,1 … a_m,8, which represent the 1st to 8th bits of the m-th byte of the F*H bytes. The bits of the F*H bytes can be expressed as the matrix (a_m,n) with F*H rows (m = 1 … F*H) and 8 columns (n = 1 … 8). When a_m,n = 1, V_am,n = 1; when a_m,n = 0, V_am,n = -1, where a_m,n denotes any one of a_m,1 … a_m,8. Applying this conversion between a_m,n and V_am,n to every bit gives the matrix V_a = (V_am,n), with the same F*H rows and 8 columns. F*H*8 random numbers are selected from random numbers that follow a normal distribution to form the matrix R = (h_m,n), also with F*H rows and 8 columns. The entries of row m of the matrix V_a are multiplied by the random numbers in row m of the matrix R and the products are summed to obtain one value, specifically S_am = V_am,1*h_m,1 + V_am,2*h_m,2 + … + V_am,8*h_m,8. S_a1, S_a2, …, S_aF*H are obtained in the same way. The number K of values among S_a1, S_a2, …, S_aF*H that are greater than 0 is counted; when K is even, at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] meets the predetermined condition C_z.
According to a third aspect, an embodiment of the present invention provides a server for searching for a data stream cut-point. The server includes a central processing unit and a main memory, and the central processing unit communicates with the main memory. A rule is preset on the server. The rule is: for a potential cut-point k, determine M points p_x, a window W_x[p_x - A_x, p_x + B_x] corresponding to each point p_x, and a predetermined condition C_x corresponding to the window W_x[p_x - A_x, p_x + B_x], where x takes the consecutive natural numbers from 1 to M, M ≥ 2, and A_x and B_x are integers.
The main memory is configured to store executable instructions, and the central processing unit executes the executable instructions to perform the following steps:
a) determining, according to the rule, a point p_iz for the current potential cut-point k_i and the window W_iz[p_iz - A_z, p_iz + B_z] corresponding to the point p_iz, where i and z are integers and 1 ≤ z ≤ M;
b) judging whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] meets the predetermined condition C_z;
when at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] does not meet the predetermined condition C_z, jumping N minimum units U of searching for a data stream cut-point from the point p_iz along the cut-point search direction, where N*U is not greater than ‖B_z‖ + max_x(‖A_x‖ + ‖k_i - p_ix‖), to obtain a new potential cut-point, and performing step a);
c) when at least part of the data in each window W_ix[p_ix - A_x, p_ix + B_x] of the M windows of the current potential cut-point k_i meets the predetermined condition C_x, the current potential cut-point k_i is a data stream cut-point.
With reference to the third aspect, in a first possible implementation, the rule further includes: at least two points p_e and p_f satisfy A_e = A_f, B_e = B_f, and C_e = C_f.
With reference to the first possible implementation of the third aspect, in a second possible implementation, the rule further includes: relative to the potential cut-point k, the at least two points p_e and p_f lie in the direction opposite to the cut-point search direction.
With reference to the first or second possible implementation of the third aspect, in a third possible implementation, the rule further includes: the distance between the at least two points p_e and p_f is 1 U.
With reference to the third aspect, or any one of the first to third possible implementations, in a fourth possible implementation, the central processing unit is specifically configured to:
use a random function to judge whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] meets the predetermined condition C_z.
With reference to the fourth possible implementation of the third aspect, in a fifth possible implementation, the central processing unit is specifically configured to use a hash function to judge whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] meets the predetermined condition C_z.
With reference to the third aspect, or any one of the first to fifth possible implementations, in a sixth possible implementation, when at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] does not meet the predetermined condition C_z, N minimum units U of searching for a data stream cut-point are jumped from the point p_iz along the cut-point search direction to obtain the new potential cut-point. According to the rule, the left boundary of the window W_ic[p_ic - A_c, p_ic + B_c] corresponding to the point p_ic determined for the new potential cut-point either coincides with the right boundary of the window W_iz[p_iz - A_z, p_iz + B_z] or lies within the range of the window W_iz[p_iz - A_z, p_iz + B_z]. Here, the point p_ic is the point ranked first when the M points determined, according to the rule, for the new potential cut-point are ordered along the data stream search direction.
With reference to the fourth possible implementation of the third aspect, in a seventh possible implementation, the central processing unit using a random function to judge whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] meets the predetermined condition C_z specifically includes:
selecting F bytes in the window W_iz[p_iz - A_z, p_iz + B_z] and using the F bytes repeatedly H times, to obtain F*H bytes in total. Each byte consists of 8 bits, denoted a_m,1 … a_m,8, which represent the 1st to 8th bits of the m-th byte of the F*H bytes. The bits of the F*H bytes can be expressed as the matrix (a_m,n) with F*H rows (m = 1 … F*H) and 8 columns (n = 1 … 8). When a_m,n = 1, V_am,n = 1; when a_m,n = 0, V_am,n = -1, where a_m,n denotes any one of a_m,1 … a_m,8. Applying this conversion between a_m,n and V_am,n to every bit gives the matrix V_a = (V_am,n), with the same F*H rows and 8 columns. F*H*8 random numbers are selected from random numbers that follow a normal distribution to form the matrix R = (h_m,n), also with F*H rows and 8 columns. The entries of row m of the matrix V_a are multiplied by the random numbers in row m of the matrix R and the products are summed to obtain one value, specifically S_am = V_am,1*h_m,1 + V_am,2*h_m,2 + … + V_am,8*h_m,8. S_a1, S_a2, …, S_aF*H are obtained in the same way. The number K of values among S_a1, S_a2, …, S_aF*H that are greater than 0 is counted; when K is even, at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] meets the predetermined condition C_z.
According to a fourth aspect, an embodiment of the present invention provides a server for searching for a data stream cut-point. The server includes a central processing unit and a main memory, and the central processing unit communicates with the main memory. A rule is preset on the server. The rule is: for a potential cut-point k, determine M windows W_x[k - A_x, k + B_x] and a predetermined condition C_x corresponding to each window W_x[k - A_x, k + B_x], where x takes the consecutive natural numbers from 1 to M, M ≥ 2, and A_x and B_x are integers.
The main memory is configured to store executable instructions, and the central processing unit executes the executable instructions to perform the following steps:
a) determining, according to the rule, the corresponding window W_iz[k_i - A_z, k_i + B_z] for the current potential cut-point k_i, where i and z are integers and 1 ≤ z ≤ M;
b) judging whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] meets the predetermined condition C_z;
when at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] does not meet the predetermined condition C_z, jumping N minimum units U of searching for a data stream cut-point from the current potential cut-point k_i along the cut-point search direction, where N*U is not greater than ‖B_z‖ + max_x(‖A_x‖), to obtain a new potential cut-point, and performing step a);
c) when at least part of the data in each window W_ix[k_i - A_x, k_i + B_x] of the M windows of the current potential cut-point k_i meets the predetermined condition C_x, the current potential cut-point k_i is a data stream cut-point.
With reference to the fourth aspect, in a first possible implementation, the rule further includes: at least two windows W_ie[k_i - A_e, k_i + B_e] and W_if[k_i - A_f, k_i + B_f] satisfy |A_e + B_e| = |A_f + B_f| and C_e = C_f.
With reference to the first possible implementation of the fourth aspect, in a second possible implementation, the rule further includes: A_e and A_f are positive integers.
With reference to the first or second possible implementation of the fourth aspect, in a third possible implementation, the rule further includes: A_e - 1 = A_f and B_e + 1 = B_f.
With reference to the fourth aspect, or any one of the first to third possible implementations, in a fourth possible implementation, the central processing unit is specifically configured to:
use a random function to judge whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] meets the predetermined condition C_z.
With reference to the fourth possible implementation of the fourth aspect, in a fifth possible implementation, the central processing unit is specifically configured to use a hash function to judge whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] meets the predetermined condition C_z.
With reference to the fourth aspect, or any one of the first to fifth possible implementations, in a sixth possible implementation, when at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] does not meet the predetermined condition C_z, N minimum units U of searching for a data stream cut-point are jumped from the current potential cut-point k_i along the cut-point search direction to obtain the new potential cut-point. According to the rule, the left boundary of the window W_ic[k_i - A_c, k_i + B_c] determined for the new potential cut-point either coincides with the right boundary of the window W_iz[k_i - A_z, k_i + B_z] or lies within the range of the window W_iz[k_i - A_z, k_i + B_z]. Here, the window W_ic[k_i - A_c, k_i + B_c] is the window ranked first when the M windows determined, according to the rule, for the new potential cut-point are ordered along the data stream search direction.
With reference to the fourth possible implementation of the fourth aspect, in a seventh possible implementation, the central processing unit using a random function to judge whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] meets the predetermined condition C_z specifically includes:
selecting F bytes in the window W_iz[k_i - A_z, k_i + B_z] and using the F bytes repeatedly H times, to obtain F*H bytes in total. Each byte consists of 8 bits, denoted a_m,1 … a_m,8, which represent the 1st to 8th bits of the m-th byte of the F*H bytes. The bits of the F*H bytes can be expressed as the matrix (a_m,n) with F*H rows (m = 1 … F*H) and 8 columns (n = 1 … 8). When a_m,n = 1, V_am,n = 1; when a_m,n = 0, V_am,n = -1, where a_m,n denotes any one of a_m,1 … a_m,8. Applying this conversion between a_m,n and V_am,n to every bit gives the matrix V_a = (V_am,n), with the same F*H rows and 8 columns. F*H*8 random numbers are selected from random numbers that follow a normal distribution to form the matrix R = (h_m,n), also with F*H rows and 8 columns. The entries of row m of the matrix V_a are multiplied by the random numbers in row m of the matrix R and the products are summed to obtain one value, specifically S_am = V_am,1*h_m,1 + V_am,2*h_m,2 + … + V_am,8*h_m,8. S_a1, S_a2, …, S_aF*H are obtained in the same way. The number K of values among S_a1, S_a2, …, S_aF*H that are greater than 0 is counted; when K is even, at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] meets the predetermined condition C_z.
According to a fifth aspect, an embodiment of the present invention provides a server for searching for a data stream cut-point. A rule is preset on the server. The rule is: for a potential cut-point k, determine M points p_x, a window W_x[p_x - A_x, p_x + B_x] corresponding to each point p_x, and a predetermined condition C_x corresponding to the window W_x[p_x - A_x, p_x + B_x], where x takes the consecutive natural numbers from 1 to M, M ≥ 2, and A_x and B_x are integers.
The server includes: a determining unit, configured to perform step a):
a) determining, according to the rule, a point p_iz for the current potential cut-point k_i and the window W_iz[p_iz - A_z, p_iz + B_z] corresponding to the point p_iz, where i and z are integers and 1 ≤ z ≤ M;
and a judging and processing unit, configured to judge whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] meets the predetermined condition C_z;
when at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] does not meet the predetermined condition C_z, N minimum units U of searching for a data stream cut-point are jumped from the point p_iz along the cut-point search direction, where N*U is not greater than ‖B_z‖ + max_x(‖A_x‖ + ‖k_i - p_ix‖), to obtain a new potential cut-point, and the determining unit then performs step a) for the new potential cut-point;
when at least part of the data in each window W_ix[p_ix - A_x, p_ix + B_x] of the M windows of the current potential cut-point k_i meets the predetermined condition C_x, the current potential cut-point k_i is a data stream cut-point.
With reference to the fifth aspect, in a first possible implementation, the rule further includes: at least two points p_e and p_f satisfy A_e = A_f, B_e = B_f, and C_e = C_f.
With reference to the first possible implementation of the fifth aspect, in a second possible implementation, the rule further includes: relative to the potential cut-point k, the at least two points p_e and p_f lie in the direction opposite to the cut-point search direction.
With reference to the first or second possible implementation of the fifth aspect, in a third possible implementation, the rule further includes: the distance between the at least two points p_e and p_f is 1 U.
With reference to the fifth aspect, or any one of the first to third possible implementations, in a fourth possible implementation, the judging and processing unit specifically uses a random function to judge whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] meets the predetermined condition C_z.
With reference to the fourth possible implementation of the fifth aspect, in a fifth possible implementation, the judging and processing unit is specifically configured to use a hash function to judge whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] meets the predetermined condition C_z.
With reference to the fifth aspect, or any one of the first to fifth possible implementations, in a sixth possible implementation, the judging and processing unit is configured to: when at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] does not meet the predetermined condition C_z, jump N minimum units U of searching for a data stream cut-point from the point p_iz along the cut-point search direction to obtain the new potential cut-point. The determining unit performs step a) for the new potential cut-point. According to the rule, the left boundary of the window W_ic[p_ic - A_c, p_ic + B_c] corresponding to the point p_ic determined for the new potential cut-point either coincides with the right boundary of the window W_iz[p_iz - A_z, p_iz + B_z] or lies within the range of the window W_iz[p_iz - A_z, p_iz + B_z]. Here, the point p_ic corresponding to the window W_ic[p_ic - A_c, p_ic + B_c] is the point ranked first when the M points determined, according to the rule, for the new potential cut-point are ordered along the data stream search direction.
With reference to the fourth possible implementation manner of the fifth aspect, in a seventh possible implementation manner, that the judging and processing unit is specifically configured to determine, by using a random function, whether at least a part of data in the window W_iz[p_iz-A_z, p_iz+B_z] meets the preset condition C_z specifically includes:
F bytes are selected in the window W_iz[p_iz-A_z, p_iz+B_z] and used repeatedly H times, giving F*H bytes in total. Each byte consists of 8 bits, denoted a_m,1 ... a_m,8, which represent the 1st to the 8th bits of the m-th byte of the F*H bytes. The bits corresponding to the F*H bytes can be expressed as:

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & & \vdots \\ a_{F*H,1} & a_{F*H,2} & \cdots & a_{F*H,8} \end{pmatrix}$$

When a_m,n = 1, V_am,n = 1; when a_m,n = 0, V_am,n = -1, where a_m,n represents any one of a_m,1 ... a_m,8. Applying this conversion between a_m,n and V_am,n to the bits corresponding to the F*H bytes gives a matrix V_a, expressed as:

$$V_a = \begin{pmatrix} V_{a1,1} & V_{a1,2} & \cdots & V_{a1,8} \\ V_{a2,1} & V_{a2,2} & \cdots & V_{a2,8} \\ \vdots & \vdots & & \vdots \\ V_{aF*H,1} & V_{aF*H,2} & \cdots & V_{aF*H,8} \end{pmatrix}$$

F*H*8 random numbers are selected from random numbers that follow a normal distribution to form a matrix R, expressed as:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & & \vdots \\ h_{F*H,1} & h_{F*H,2} & \cdots & h_{F*H,8} \end{pmatrix}$$

The entries in the m-th row of the matrix V_a are multiplied by the random numbers in the m-th row of the matrix R, and the products are summed to obtain one value, expressed as S_am = V_am,1*h_m,1 + V_am,2*h_m,2 + ... + V_am,8*h_m,8. S_a1, S_a2, ..., S_aF*H are obtained in the same way, and the number K of values greater than 0 among S_a1, S_a2, ..., S_aF*H is counted. When K is even, at least a part of data in the window W_iz[p_iz-A_z, p_iz+B_z] meets the preset condition C_z.
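The random-function check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `window_meets_condition`, the choice of the first F bytes of the window, the most-significant-bit-first bit order, the standard normal distribution, and the fixed seed (used so that the matrix R is identical for every window) are all assumptions that the text does not specify.

```python
import random


def window_meets_condition(window: bytes, F: int, H: int, seed: int = 0) -> bool:
    """Return True when the number K of positive row sums S_am is even."""
    if len(window) < F:
        raise ValueError("window is shorter than F bytes")
    selected = window[:F]      # assumption: use the first F bytes of the window
    repeated = selected * H    # use the F bytes repeatedly H times: F*H bytes
    rng = random.Random(seed)  # fixed seed: the same matrix R for every call
    k = 0
    for byte in repeated:
        # Row m of V_a: bit 1 maps to +1, bit 0 maps to -1 (MSB first, assumed).
        row_v = [1 if (byte >> (7 - n)) & 1 else -1 for n in range(8)]
        # Row m of R: 8 normally distributed random numbers h_m,1 ... h_m,8.
        row_r = [rng.gauss(0.0, 1.0) for _ in range(8)]
        s_am = sum(v * h for v, h in zip(row_v, row_r))
        if s_am > 0:
            k += 1
    return k % 2 == 0
```

Because each S_am is positive or negative with roughly equal probability for typical data, a check of this form passes about half the time; other pass probabilities would need a different rule on K, which this sketch does not attempt.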
According to a sixth aspect, an embodiment of the present invention provides a server for searching for a data stream dividing point. A rule is preset on the server, and the rule is: for a potential dividing point k, determine M windows W_x[k-A_x, k+B_x] and a preset condition C_x corresponding to the window W_x[k-A_x, k+B_x], where x takes each consecutive natural number from 1 to M, M ≥ 2, and A_x and B_x are integers.
The server includes a determining unit, configured to perform step a):
a) determining, according to the rule, a corresponding window W_iz[k_i-A_z, k_i+B_z] for a current potential dividing point k_i, where i and z are integers and 1 ≤ z ≤ M;
and a judging and processing unit, configured to determine whether at least a part of data in the window W_iz[k_i-A_z, k_i+B_z] meets the preset condition C_z;
when at least a part of data in the window W_iz[k_i-A_z, k_i+B_z] does not meet the preset condition C_z, skip N minimum units U for searching for a data stream dividing point from the current potential dividing point k_i along the direction of searching for a data stream dividing point, where N*U is not greater than ‖B_z‖ + max_x(‖A_x‖), so as to obtain a new potential dividing point, for which step a) is performed; and
when at least a part of data in each window W_ix[k_i-A_x, k_i+B_x] of the M windows of the current potential dividing point k_i meets the preset condition C_x, the current potential dividing point k_i is a data stream dividing point.
With reference to the sixth aspect, in a first possible implementation manner, the rule further includes: at least two windows W_ie[k_i-A_e, k_i+B_e] and W_if[k_i-A_f, k_i+B_f] satisfy the conditions |A_e+B_e| = |A_f+B_f| and C_e = C_f.
With reference to the first possible implementation manner of the sixth aspect, in a second possible implementation manner, the rule further includes: A_e and A_f are positive integers.
With reference to the first or the second possible implementation manner of the sixth aspect, in a third possible implementation manner, the rule further includes: A_e - 1 = A_f and B_e + 1 = B_f.
With reference to the sixth aspect, or any one of the first to the third possible implementation manners, in a fourth possible implementation manner, the judging and processing unit is specifically configured to determine, by using a random function, whether at least a part of data in the window W_iz[k_i-A_z, k_i+B_z] meets the preset condition C_z.
With reference to the fourth possible implementation manner of the sixth aspect, in a fifth possible implementation manner, the judging and processing unit is specifically configured to determine, by using a hash function, whether at least a part of data in the window W_iz[k_i-A_z, k_i+B_z] meets the preset condition C_z.
With reference to the sixth aspect, or any one of the first to the fifth possible implementation manners, in a sixth possible implementation manner, the judging and processing unit is configured to: when at least a part of data in the window W_iz[k_i-A_z, k_i+B_z] does not meet the preset condition C_z, skip N minimum units U for searching for a data stream dividing point from the current potential dividing point k_i along the direction of searching for a data stream dividing point, so as to obtain the new potential dividing point. The determining unit performs step a) for the new potential dividing point. According to the rule, the left boundary of a window W_ic[k_i-A_c, k_i+B_c] determined for the new potential dividing point either coincides with the right boundary of the window W_iz[k_i-A_z, k_i+B_z] or falls within the range of the window W_iz[k_i-A_z, k_i+B_z]. Here, the window W_ic[k_i-A_c, k_i+B_c] determined for the new potential dividing point is the window ranked first in the sequence obtained by ordering, along the direction of searching for a data stream dividing point, the M windows determined for the new potential dividing point according to the rule.
With reference to the fourth possible implementation manner of the sixth aspect, in a seventh possible implementation manner, that the judging and processing unit determines, by using a random function, whether at least a part of data in the window W_iz[k_i-A_z, k_i+B_z] meets the preset condition C_z specifically includes:
F bytes are selected in the window W_iz[k_i-A_z, k_i+B_z] and used repeatedly H times, giving F*H bytes in total. Each byte consists of 8 bits, denoted a_m,1 ... a_m,8, which represent the 1st to the 8th bits of the m-th byte of the F*H bytes. The bits corresponding to the F*H bytes can be expressed as:

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & & \vdots \\ a_{F*H,1} & a_{F*H,2} & \cdots & a_{F*H,8} \end{pmatrix}$$

When a_m,n = 1, V_am,n = 1; when a_m,n = 0, V_am,n = -1, where a_m,n represents any one of a_m,1 ... a_m,8. Applying this conversion between a_m,n and V_am,n to the bits corresponding to the F*H bytes gives a matrix V_a, expressed as:

$$V_a = \begin{pmatrix} V_{a1,1} & V_{a1,2} & \cdots & V_{a1,8} \\ V_{a2,1} & V_{a2,2} & \cdots & V_{a2,8} \\ \vdots & \vdots & & \vdots \\ V_{aF*H,1} & V_{aF*H,2} & \cdots & V_{aF*H,8} \end{pmatrix}$$

F*H*8 random numbers are selected from random numbers that follow a normal distribution to form a matrix R, expressed as:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & & \vdots \\ h_{F*H,1} & h_{F*H,2} & \cdots & h_{F*H,8} \end{pmatrix}$$

The entries in the m-th row of the matrix V_a are multiplied by the random numbers in the m-th row of the matrix R, and the products are summed to obtain one value, expressed as S_am = V_am,1*h_m,1 + V_am,2*h_m,2 + ... + V_am,8*h_m,8. S_a1, S_a2, ..., S_aF*H are obtained in the same way, and the number K of values greater than 0 among S_a1, S_a2, ..., S_aF*H is counted. When K is even, at least a part of data in the window W_iz[k_i-A_z, k_i+B_z] meets the preset condition C_z.
In the embodiments of the present invention, a data stream dividing point is searched for by determining whether at least a part of data in a window of M windows meets a preset condition. When at least a part of data in a window does not meet the preset condition, a length of N*U is skipped to obtain the next potential dividing point, which improves the efficiency of searching for a data stream dividing point.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of an application scenario of an embodiment of the present invention;
Fig. 2 is a schematic diagram of a data stream dividing point;
Fig. 3 is a schematic diagram of searching for a data stream dividing point;
Fig. 4 is a schematic diagram of a method according to an embodiment of the present invention;
Fig. 5 and Fig. 6 are schematic diagrams of an implementation manner of searching for a data stream dividing point;
Fig. 7 and Fig. 8 are schematic diagrams of an implementation manner of searching for a data stream dividing point;
Fig. 9 and Fig. 10 are schematic diagrams of an implementation manner of searching for a data stream dividing point;
Fig. 11, Fig. 12, and Fig. 13 are schematic diagrams of an implementation manner of searching for a data stream dividing point;
Fig. 14 and Fig. 15 are schematic diagrams of an implementation manner of searching for a data stream dividing point;
Fig. 16 and Fig. 17 are schematic diagrams of determining whether at least a part of data in a window meets a preset condition;
Fig. 18 is a structural diagram of a deduplication server;
Fig. 19 is a structural diagram of a deduplication server;
Fig. 20 is a schematic diagram of a method according to an embodiment of the present invention;
Fig. 21 and Fig. 22 are schematic diagrams of an implementation manner of searching for a data stream dividing point;
Fig. 23 and Fig. 24 are schematic diagrams of an implementation manner of searching for a data stream dividing point;
Fig. 25 and Fig. 26 are schematic diagrams of an implementation manner of searching for a data stream dividing point;
Fig. 27, Fig. 28, and Fig. 29 are schematic diagrams of an implementation manner of searching for a data stream dividing point;
Fig. 30 and Fig. 31 are schematic diagrams of an implementation manner of searching for a data stream dividing point;
Fig. 32 and Fig. 33 are schematic diagrams of determining whether at least a part of data in a window meets a preset condition.
Detailed Description of the Embodiments
With the continuous progress of storage technologies, the amount of generated data keeps growing, and this large amount of data places ever higher demands on storage capacity. Increasing storage capacity also increases the cost of purchasing IT equipment. To relieve the conflict between data volume and storage capacity and to reduce IT equipment purchasing costs, data deduplication technology has been introduced in the field of data storage.
One usage scenario of the embodiments of the present invention is data backup. Data backup is the process of backing up data to another storage medium through a backup server, to prevent data loss from any cause. The architecture of a data backup system is shown in Fig. 1. The data backup system includes clients (101a, 101b, ..., 101n), a backup server 102, a data deduplication server (deduplication server for short) 103, and storage devices (104a, 104b, ..., 104n). The clients (101a, 101b, ..., 101n) may be application servers, workstations, or the like. The backup server 102 backs up the data generated by the clients. The deduplication server 103 performs the deduplication task on the backup data. The storage devices (104a, 104b, ..., 104n) serve as storage media for the deduplicated data and may be disk arrays, tape libraries, or other storage media. The clients (101a, 101b, ..., 101n), the backup server 102, the deduplication server 103, and the storage devices (104a, 104b, ..., 104n) may be connected through a switch, a local area network, the Internet, optical fiber, or the like. These devices may be located at the same site or at different sites. In a specific implementation, the backup server 102, the deduplication server 103, and the storage devices (104a, 104b, ..., 104n) may be independent physical devices or may be physically integrated into one unit; alternatively, the backup server 102 may be integrated with the deduplication server 103, or the deduplication server 103 may be integrated with the storage devices (104a, 104b, ..., 104n).
The deduplication server 103 performs a deduplication operation on the data stream of the backup data, which generally includes the following steps:
1) Search for data stream dividing points: data stream dividing points are searched for in the data stream according to a specific algorithm;
2) Divide the data stream into data blocks according to the data stream dividing points found;
3) Calculate the feature value of each data block: the feature value of a data block is calculated and serves as the feature that identifies the data block; the calculated feature value is added to the data block feature list of the file corresponding to the data stream; the SHA-1 or MD5 algorithm is typically used to calculate the feature value of a data block;
4) Detect identical data blocks: the calculated feature value of a data block is compared with the feature values already present in the data block feature list to determine whether an identical data block exists;
5) Delete duplicate data blocks: if identical-block detection finds that the data block feature list already contains a feature value identical to that of the data block, the data block need not be stored again, or whether to store the data block is decided according to the number of stored duplicates of the data block specified by the backup policy.
It can be seen from the steps by which the deduplication server 103 performs the deduplication operation on the data stream of the backup data that searching for data stream dividing points is the key step of the deduplication operation and directly determines deduplication performance.
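Steps 2) to 5) above can be sketched as follows, assuming the dividing points from step 1) are already known. The function name `deduplicate`, the in-memory dictionary used as the block store, and the use of hexadecimal SHA-1 digests as feature values are illustrative assumptions; backup-policy handling of duplicate counts is left out.

```python
import hashlib
from typing import Dict, List, Tuple


def deduplicate(stream: bytes, dividing_points: List[int]) -> Tuple[List[str], Dict[str, bytes]]:
    """Split the stream at the given offsets and store each distinct block once."""
    store: Dict[str, bytes] = {}     # feature value -> block (hypothetical block store)
    feature_list: List[str] = []     # feature list of the file, in stream order
    start = 0
    for end in list(dividing_points) + [len(stream)]:
        if end <= start:
            continue                               # ignore empty blocks
        block = stream[start:end]                  # step 2: divide into blocks
        feature = hashlib.sha1(block).hexdigest()  # step 3: feature value
        if feature not in store:                   # step 4: identical-block detection
            store[feature] = block                 # step 5: store only new blocks
        feature_list.append(feature)
        start = end
    return feature_list, store
```

For example, `deduplicate(b"abcabcxyz", [3, 6])` yields three feature values but stores only two blocks, and joining `store[f]` for each `f` in the feature list reproduces the original stream.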
In the embodiments of the present invention, the deduplication server 103 receives a backup file sent by the backup server 102 and performs deduplication on the file. A backup file to be processed is usually presented in the deduplication server 103 in the form of a data stream. When searching for a dividing point in the data stream, the deduplication server 103 must determine the minimum unit for searching for a data stream dividing point. Specifically, as shown in Fig. 2, a potential dividing point k_1 lies between the two consecutive minimum units for searching for a data stream dividing point whose sequence numbers are 1 and 2. A potential dividing point is a point for which it must be determined whether it is a data stream dividing point. When the point k_1 is a data stream dividing point, the search continues along the direction of searching for a data stream dividing point, shown by the arrow in Fig. 2, to the next potential dividing point k_7, which lies between the two consecutive minimum units whose sequence numbers are 7 and 8. When the potential dividing point k_7 is a data stream dividing point, the data between the two adjacent data stream dividing points k_1 and k_7 forms one data block. The minimum unit for searching for a data stream dividing point can be chosen according to actual needs; 1 byte is used as an example here, that is, the minimum units with sequence numbers 1, 2, 7, and 8 each have a size of 1 byte. The direction of searching for a data stream dividing point shown in Fig. 2 usually runs from the file header to the file trailer, or from the file trailer to the file header; this embodiment uses a search from the file header to the file trailer as an example.
In a deduplication scenario, the smaller the data block, the higher the deduplication rate and the easier it is to find duplicate data blocks, but the larger the amount of metadata generated. Moreover, once the data block is small enough, the deduplication rate no longer increases while the amount of metadata increases sharply. The data block size therefore has to be controlled. In practice, a minimum data block size is usually set, for example 4 KB (4096 bytes); with the deduplication rate in mind, a maximum data block size is also set, for example 12 KB (12288 bytes), which a data block may not exceed. In one specific implementation, shown in Fig. 3, the deduplication server 103 searches for a data stream dividing point along the direction shown by the arrow. k_a is the data stream dividing point found most recently, and the next potential dividing point is searched for from k_a along the direction of searching for a data stream dividing point. To satisfy the minimum data block requirement, the search usually skips the minimum data block size from the data stream dividing point along the search direction and starts from the end position of the minimum data block; that is, the end position of the minimum data block serves as the next potential dividing point k_i. In the embodiments of the present invention, a minimum data block of 4 KB, that is, 4*1024 = 4096 bytes, may first be skipped from the point k_a along the search direction. Skipping 4096 bytes from k_a yields the point k_i at the end of the 4096th byte, which serves as a potential dividing point; for example, k_i lies between the two consecutive minimum units for searching for a data stream dividing point whose sequence numbers are 4096 and 4097. Still using Fig. 3 as an example, k_a is the data stream dividing point found most recently, and the next data stream dividing point is searched for along the direction shown in Fig. 3. If no next data stream dividing point has been found when the maximum data block size is exceeded, the point k_z at which the maximum data block size is reached from k_a in the search direction is used as the next data stream dividing point, and a forced split is performed.
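The minimum and maximum block-size handling described for Fig. 3 can be sketched as follows. This is a unit-by-unit baseline without the skipping described later; the function name `next_dividing_point` and the predicate `is_dividing_point` (standing in for whatever window test is applied at a potential dividing point) are hypothetical names introduced for illustration.

```python
from typing import Callable

MIN_BLOCK = 4 * 1024    # minimum data block size from the example (4096 bytes)
MAX_BLOCK = 12 * 1024   # maximum data block size from the example (12288 bytes)


def next_dividing_point(k_a: int, stream_len: int,
                        is_dividing_point: Callable[[int], bool]) -> int:
    """Return the next dividing point after k_a, with U = 1 byte."""
    k = k_a + MIN_BLOCK                        # skip the minimum block: first k_i
    limit = min(k_a + MAX_BLOCK, stream_len)   # k_z, or the end of the stream
    while k < limit:
        if is_dividing_point(k):
            return k
        k += 1                                 # move on by one minimum unit U
    return limit                               # forced split at the maximum size
```

With a predicate that never succeeds, `next_dividing_point(0, 100000, lambda k: False)` returns 12288, the forced split at the maximum block size.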
An embodiment of the present invention provides a method for searching for a data stream dividing point based on a deduplication server. As shown in Fig. 4, the method includes:
A rule is preset on the deduplication server 103. The rule is: for a potential dividing point k, determine M points p_x, a window W_x[p_x-A_x, p_x+B_x] corresponding to each point p_x, and a preset condition C_x corresponding to the window W_x[p_x-A_x, p_x+B_x], where x takes each consecutive natural number from 1 to M, M ≥ 2, and A_x and B_x are integers. The distance between p_x and the potential dividing point k is d_x minimum units for searching for a data stream dividing point; this minimum unit is denoted U, and in this embodiment U = 1 byte. As for the value of M in the implementation shown in Fig. 3, in one implementation M*U is not greater than the preset maximum distance between two adjacent data stream dividing points, that is, the preset maximum data block length. It is determined whether at least a part of data in the window W_z[p_z-A_z, p_z+B_z] corresponding to a point p_z meets the preset condition C_z, where z is an integer, 1 ≤ z ≤ M, and (p_z-A_z) and (p_z+B_z) are the two boundaries of the window W_z. When at least a part of data in the window W_z[p_z-A_z, p_z+B_z] of any point p_z does not meet the preset condition C_z, N bytes are skipped along the direction of searching for a data stream dividing point from the point p_z corresponding to the window that does not meet the preset condition, where N ≤ ‖B_z‖ + max_x(‖A_x‖ + ‖(k-p_x)‖). Here, ‖(k-p_x)‖ is the distance between any one of the M points p_x and the potential dividing point k; max_x(‖A_x‖ + ‖(k-p_x)‖) is the maximum, over the M points p_x, of the sum of that distance and the absolute value of the A_x corresponding to the point; and ‖B_z‖ is the absolute value of B_z in W_z[p_z-A_z, p_z+B_z]. The principle for choosing N is described in detail in the embodiments below. When at least a part of data in each window W_x[p_x-A_x, p_x+B_x] of the M windows meets the preset condition C_x, the potential dividing point k is a data stream dividing point.
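As a worked instance of this bound, take the values used in the Fig. 5 implementation described later: A_x = 169 and B_x = 0 for all x, and the points p_1 to p_11 lie 0 to 10 bytes from k. The symbols follow the rule above; only the arithmetic is added here.

```latex
N \le \lVert B_z \rVert + \max_x\bigl(\lVert A_x \rVert + \lVert k - p_x \rVert\bigr)
  = 0 + (169 + 10) = 179
```

The maximum is reached at p_11, the point farthest from k, so any skip of at most 179 bytes satisfies the rule for these values.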
Specifically, the following steps are performed for a current potential dividing point k_i according to the rule:
Step 401: According to the rule, determine a point p_iz and a window W_iz[p_iz-A_z, p_iz+B_z] corresponding to the point p_iz for the current potential dividing point k_i, where i and z are integers and 1 ≤ z ≤ M.
Step 402: Determine whether at least a part of data in the window W_iz[p_iz-A_z, p_iz+B_z] meets the preset condition C_z.
When at least a part of data in the window W_iz[p_iz-A_z, p_iz+B_z] does not meet the preset condition C_z, N minimum units U for searching for a data stream dividing point are skipped from the point p_iz along the direction of searching for a data stream dividing point, where N*U is not greater than ‖B_z‖ + max_x(‖A_x‖ + ‖(k_i-p_ix)‖), so as to obtain a new potential dividing point, and step 401 is performed.
When at least a part of data in each window W_ix[p_ix-A_x, p_ix+B_x] of the M windows of the current potential dividing point k_i meets the preset condition C_x, the current potential dividing point k_i is a data stream dividing point.
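Steps 401 and 402 can be sketched as the loop below, with U = 1 byte. The names `find_dividing_point`, `rule`, `meets`, and `limit` are hypothetical; the rule is passed as (d_x, A_x, B_x) tuples with every point at or behind k, each window is taken as the byte slice from p_x - A_x up to p_x + B_x, and the points are checked in list order. These are assumptions of the sketch, since the text does not fix them here. The sketch also requires N to exceed the largest d_x so that every skip moves forward.

```python
from typing import Callable, Optional, Sequence, Tuple


def find_dividing_point(stream: bytes, k: int,
                        rule: Sequence[Tuple[int, int, int]],
                        meets: Callable[[int, bytes], bool],
                        n_skip: int, limit: int) -> int:
    """Search from potential dividing point k; return the dividing point found."""
    max_reach = max(abs(a) + d for d, a, _ in rule)   # max_x(|A_x| + |k - p_x|)
    if n_skip <= max(d for d, _, _ in rule):
        raise ValueError("n_skip must exceed the largest d_x to make progress")
    while k < limit:
        failed_point: Optional[int] = None
        for x, (d, a, b) in enumerate(rule, start=1):  # step 401: point and window
            p = k - d                                  # p_x lies d_x bytes behind k
            window = stream[max(p - a, 0):p + b]
            if not meets(x, window):                   # step 402: test condition C_x
                if n_skip > abs(b) + max_reach:
                    raise ValueError("n_skip exceeds the bound for this window")
                failed_point = p
                break
        if failed_point is None:
            return k                    # all M windows passed: k is a dividing point
        k = failed_point + n_skip       # skip N units from the failing point
    return limit                        # nothing found before the limit
```

With three points at distances 0, 1, and 2, A_x = 2, B_x = 0, N = 3, and a toy condition that the last byte of the window equals 1, the search over `bytes([0,0,0,0,1,0,1,1,1,0,0])` starting at k = 4 tests only k = 4, 7, and 9 and returns 9, the same answer a byte-by-byte scan would give.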
Further, the rule also includes: at least two points p_e and p_f satisfy the conditions A_e = A_f, B_e = B_f, and C_e = C_f.
The rule also includes: relative to the potential dividing point k, the at least two points p_e and p_f lie in the direction opposite to the direction of searching for a data stream dividing point.
The rule also includes: the distance between the at least two points p_e and p_f is 1 U.
Determining whether at least a part of data in the window W_iz[p_iz-A_z, p_iz+B_z] meets the preset condition C_z specifically includes:
determining, by using a random function, whether at least a part of data in the window W_iz[p_iz-A_z, p_iz+B_z] meets the preset condition C_z.
Determining, by using a random function, whether at least a part of data in the window W_iz[p_iz-A_z, p_iz+B_z] meets the preset condition C_z is specifically: determining, by using a hash function, whether at least a part of data in the window W_iz[p_iz-A_z, p_iz+B_z] meets the preset condition C_z.
When at least a part of data in the window W_iz[p_iz-A_z, p_iz+B_z] does not meet the preset condition C_z, N minimum units U for searching for a data stream dividing point are skipped from the point p_iz along the direction of searching for a data stream dividing point, so as to obtain the new potential dividing point. According to the rule, the left boundary of the window W_ic[p_ic-A_c, p_ic+B_c] corresponding to a point p_ic determined for the new potential dividing point either coincides with the right boundary of the window W_iz[p_iz-A_z, p_iz+B_z] or falls within the range of the window W_iz[p_iz-A_z, p_iz+B_z]. Here, the point p_ic determined for the new potential dividing point is the point ranked first in the sequence obtained by ordering, along the direction of searching for a data stream dividing point, the M points determined for the new potential dividing point according to the rule.
In the embodiments of the present invention, a data stream dividing point is searched for by determining whether at least a part of data in a window of M windows meets a preset condition. When at least a part of data in a window does not meet the preset condition, a length of N*U is skipped, where N*U is not greater than ‖B_z‖ + max_x(‖A_x‖ + ‖(k_i-p_ix)‖), to obtain the next potential dividing point, which improves the efficiency of searching for a data stream dividing point.
During deduplication, to keep data block sizes uniform, the average data block (also called average chunk) size is taken into account; that is, in addition to satisfying the minimum and maximum data block size limits, the average data block size is also determined so that the resulting data blocks are of uniform size. Two factors determine the probability of finding a data stream dividing point, which is expressed through P(n): the number M of points p_x, and the probability that at least a part of data in the window W_x[p_x-A_x, p_x+B_x] corresponding to a point p_x meets the preset condition C_x. The former affects the skip length and the latter affects the skip probability; together they determine the average chunk size. In general, for a fixed average chunk size, as the number M of points p_x increases, the probability that at least a part of data in the window W_x[p_x-A_x, p_x+B_x] of a single point p_x meets the preset condition C_x also increases. For example, one rule preset on the deduplication server 103 is: 11 points p_x are determined for a potential dividing point k, x takes each consecutive natural number from 1 to 11, and the probability that at least a part of data in the window W_x[p_x-A_x, p_x+B_x] corresponding to any one of the 11 points meets the preset condition C_x is 1/2. Another rule preset on the deduplication server 103 is: 24 points p_x are selected for a potential dividing point k, x takes each consecutive natural number from 1 to 24, and the probability that at least a part of data in the window W_x[p_x-A_x, p_x+B_x] corresponding to any one of the 24 points meets the preset condition C_x is 3/4. For how this probability is set, see the description of determining whether at least a part of data in the window W_x[p_x-A_x, p_x+B_x] meets the preset condition C_x. These two factors determine P(n), where P(n) is the probability that no data stream dividing point has been found after n minimum units for searching for a data stream dividing point have been searched from the start of the data stream or from the previous data stream dividing point. The calculation of P(n) from these two factors is in effect a multi-step Fibonacci sequence and is described in detail later. Once P(n) is obtained, 1-P(n) is the distribution function of the data stream dividing point, and (1-P(n))-(1-P(n-1)) = P(n-1)-P(n) is the probability that a data stream dividing point is found at the n-th point, that is, the density function of the data stream dividing point. From this density function, the following can be evaluated:

$$\sum_{n=4 \times 1024}^{12 \times 1024} n \times \bigl(P(n-1) - P(n)\bigr)$$

This gives the expected length of the data stream dividing point, that is, the average chunk size, where 4*1024 (bytes) is the minimum data block length and 12*1024 (bytes) is the maximum data block length.
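The expected chunk size can be evaluated numerically once P(n) is available. The sketch below only implements the expression stated above; the function name `expected_chunk_size` and the callable `p_not_found` (standing in for P(n), whose actual recurrence is described later in the document) are assumptions introduced for illustration.

```python
from typing import Callable


def expected_chunk_size(p_not_found: Callable[[int], float],
                        n_min: int = 4 * 1024, n_max: int = 12 * 1024) -> float:
    """Sum n * (P(n-1) - P(n)) for n from n_min to n_max inclusive."""
    return sum(n * (p_not_found(n - 1) - p_not_found(n))
               for n in range(n_min, n_max + 1))
```

As a sanity check, if a dividing point is always found at the first position after the minimum block (P(n) = 1 for n < 4096 and 0 otherwise), the function returns 4096.0, the minimum block length.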
Building on the search for a data stream dividing point shown in Fig. 3, in the implementation shown in Fig. 5 a rule is preset on the deduplication server 103. The rule is: for a potential dividing point k, determine 11 points p_x, a window W_x[p_x-A_x, p_x+B_x] (window W_x for short) corresponding to each point p_x, and a preset condition C_x corresponding to the window W_x[p_x-A_x, p_x+B_x], where A_1 = A_2 = A_3 = A_4 = A_5 = A_6 = A_7 = A_8 = A_9 = A_10 = A_11 = 169, B_1 = B_2 = B_3 = B_4 = B_5 = B_6 = B_7 = B_8 = B_9 = B_10 = B_11 = 0, and C_1 = C_2 = C_3 = C_4 = C_5 = C_6 = C_7 = C_8 = C_9 = C_10 = C_11. The distance between the point p_x and the potential dividing point k is d_x bytes. Specifically, the distances between the points p_1, p_2, p_3, p_4, p_5, p_6, p_7, p_8, p_9, p_10, and p_11 and the potential dividing point k are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 bytes respectively, and, relative to the potential dividing point k, the points p_2, p_3, p_4, p_5, p_6, p_7, p_8, p_9, p_10, and p_11 all lie in the direction opposite to the direction of searching for a data stream dividing point.
k_a is a data stream dividing point, and the direction of searching for a data stream dividing point shown in Fig. 5 is from left to right. After a minimum data block of 4 KB is skipped from the data stream dividing point k_a, the end position of the 4 KB minimum data block serves as the next potential dividing point k_i, and points p_ix are determined for the potential dividing point k_i. In this embodiment, according to the rule preset on the deduplication server 103, x takes each consecutive natural number from 1 to 11. In the implementation shown in Fig. 5, the 11 points determined for the potential dividing point k_i are p_i1, p_i2, p_i3, p_i4, p_i5, p_i6, p_i7, p_i8, p_i9, p_i10, and p_i11, and the corresponding windows are W_i1[p_i1-169, p_i1], W_i2[p_i2-169, p_i2], W_i3[p_i3-169, p_i3], W_i4[p_i4-169, p_i4], W_i5[p_i5-169, p_i5], W_i6[p_i6-169, p_i6], W_i7[p_i7-169, p_i7], W_i8[p_i8-169, p_i8], W_i9[p_i9-169, p_i9], W_i10[p_i10-169, p_i10], and W_i11[p_i11-169, p_i11], referred to below as W_i1 to W_i11. The distance between the point p_ix and the potential dividing point k_i is d_x bytes. Specifically, the distances between p_i1, p_i2, p_i3, p_i4, p_i5, p_i6, p_i7, p_i8, p_i9, p_i10, and p_i11 and k_i are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 bytes respectively, and, relative to the potential dividing point k_i, the points p_i2 to p_i11 all lie in the direction opposite to the direction of searching for a data stream dividing point. For each x from 1 to 11, it is determined whether at least a part of data in W_ix[p_ix-169, p_ix] meets the preset condition C_x. When at least a part of data in each of the windows W_i1 to W_i11 meets the corresponding preset condition C_1 to C_11, the current potential dividing point k_i is a data stream dividing point.
When at least a part of data in any one of the 11 windows does not meet the corresponding preset condition, for example, as shown in Fig. 6, at least a part of data in W_i5[p_i5-169, p_i5] does not meet the corresponding preset condition C_5, N bytes are skipped from the point p_i5 along the direction of searching for a data stream dividing point, where N is not greater than ‖B_5‖ + max_x(‖A_x‖ + ‖(k_i-p_ix)‖). In the implementation shown in Fig. 6, N is not greater than 179 bytes; in this embodiment, N = 11. The next potential dividing point is thereby obtained; to distinguish it from the potential dividing point k_i, the new potential dividing point is denoted k_j. According to the rule preset on the deduplication server 103 in the implementation shown in Fig. 5, the 11 points determined for the potential dividing point k_j are p_j1, p_j2, p_j3, p_j4, p_j5, p_j6, p_j7, p_j8, p_j9, p_j10, and p_j11, and the corresponding windows are W_j1[p_j1-169, p_j1], W_j2[p_j2-169, p_j2], W_j3[p_j3-169, p_j3], W_j4[p_j4-169, p_j4], W_j5[p_j5-169, p_j5], W_j6[p_j6-169, p_j6], W_j7[p_j7-169, p_j7], W_j8[p_j8-169, p_j8], W_j9[p_j9-169, p_j9], W_j10[p_j10-169, p_j10], and W_j11[p_j11-169, p_j11]. The distance between p_jx and the potential dividing point k_j is d_x bytes. Specifically, the distances between p_j1, p_j2, p_j3, p_j4, p_j5, p_j6, p_j7, p_j8, p_j9, p_j10, and p_j11 and k_j are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 bytes respectively, and, relative to the potential dividing point k_j, the points p_j2 to p_j11 all lie in the direction opposite to the direction of searching for a data stream dividing point.
In the implementation shown in Fig. 6, consider the 11th window W_j11[p_j11-169, p_j11] determined for the potential dividing point k_j. To ensure that the entire range between the potential dividing point k_i and the potential dividing point k_j is covered by the determination, in this implementation the left boundary of the window W_j11[p_j11-169, p_j11] must either coincide with the right boundary p_i5 of W_i5[p_i5-169, p_i5] or fall within the range of W_i5[p_i5-169, p_i5], where the point p_j11 determined for the potential dividing point k_j is the point ranked first in the sequence obtained by ordering, along the direction of searching for a data stream dividing point, the M points determined for the potential dividing point k_j according to the rule. Under this constraint, when at least a part of data in W_i5[p_i5-169, p_i5] does not meet the preset condition C_5, the distance skipped from p_i5 along the direction of searching for a data stream dividing point is not greater than ‖B_5‖ + max_x(‖A_x‖ + ‖(k_i-p_ix)‖), where M = 11 and 11*U is not greater than max_x(‖A_x‖ + ‖(k_i-p_ix)‖); the distance skipped from p_i5 along the direction of searching for a data stream dividing point is therefore not greater than 179. It is determined, for the window W_j1[p_j1-169, p_j1], whether at least a part of data meets the preset condition C_1; for the window W_j2[p_j2-169, p_j2], whether at least a part of data meets the preset condition C_2; and, for the window W_j3
[pj3-169,pj3In], whether at least part of data meet predetermined condition C3, judge Wj4[pj4-169,
pj4In], whether at least part of data meet predetermined condition C4, judge Wj5[pj5-169,pj5In] extremely
Whether small part data meet predetermined condition C5, judge Wj6[pj6-169,pj6In] at least partly
Whether data meet predetermined condition C6, judge Wj7[pj7-169,pj7In], at least part of data are
No meet predetermined condition C7, judge Wj8[pj8-169,pj8In], whether at least part of data meet
Predetermined condition C8, judge Wj9[pj9-169,pj9In], whether at least part of data meet predetermined bar
Part C9, judge Wj10[pj10-169,pj10In], whether at least part of data meet predetermined condition C10
With judge Wj11[pj11-169,pj11In], whether at least part of data meet predetermined condition C11.When
The most in embodiments of the present invention, it is judged that potential cut-point kaWhether it is also to abide by during data flow point cutpoint
Follow this rule, implement and no longer describe, be referred to judge potential cut-point kiDescription.
When judging window Wj1In at least partly data meet predetermined condition C1, window Wj2In at least partly
Data meet predetermined condition C2, window Wj3In at least partly data meet predetermined condition C3, window
Mouth Wj4In at least partly data meet predetermined condition C4, window Wj5In at least partly data meet
Predetermined condition C5, window Wj6In at least partly data meet predetermined condition C6, window Wj7In extremely
Small part data meet predetermined condition C7, window Wj8In at least partly data meet predetermined condition
C8, window Wj9In at least partly data meet predetermined condition C9, window Wj10In at least partly count
According to meeting predetermined condition C10With window Wj11In at least partly data meet predetermined condition C11Time, then
Current potential cut-point kjFor data flow point cutpoint, kjWith kaBetween data constitute 1 data
Block, simultaneously according to kaIdentical mode skips minimum piecemeal size 4KB, it is thus achieved that next latent
At cut-point, and according to the rule preset on duplicate removal server 103, it is judged that next potential
Whether cut-point is data flow point cutpoints.When judging potential cut-point kjIt not data flow point cutpoints
Time, according to kiJump 11 bytes of identical mode obtain next potential cut-points, and press
Impinge upon the rule preset on duplicate removal server 103 and said method judges next potential cut-point
Whether it is data flow point cutpoints.When the maximum data block exceeding setting does not the most find data stream
During cut-point, then from the end position of maximum data block as force-splitting point.
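The search loop described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the predicate `window_ok` stands in for the predetermined condition Cx (whose definition is given elsewhere in the description), the window is taken as the 169 bytes ending at the point, and `MAX_CHUNK` is an assumed 12 KB maximum block size. All function and constant names are hypothetical.

```python
from typing import Callable

MIN_CHUNK = 4096             # minimum chunk size skipped after each cut point
MAX_CHUNK = 12 * 1024        # assumed maximum block size (forced cut)
WINDOW = 169                 # A_x = 169 and B_x = 0 for every window
OFFSETS = tuple(range(11))   # d_x = 0..10, so p_x = k - d_x
JUMP = 11                    # N, bytes skipped from the failing point


def find_cut_point(data: bytes, start: int,
                   window_ok: Callable[[bytes], bool]) -> int:
    """Return the next cut point after `start` (a previous cut point)."""
    limit = min(start + MAX_CHUNK, len(data))
    k = start + MIN_CHUNK                      # first potential cut point
    while k < limit:
        failed_point = None
        for d in OFFSETS:                      # judge W_1 .. W_11 in order
            p = k - d
            if not window_ok(data[p - WINDOW:p]):
                failed_point = p
                break
        if failed_point is None:
            return k                           # all 11 windows satisfied
        k = failed_point + JUMP                # skip N bytes from the failing point
    return limit                               # forced cut at the maximum block size
```

With `JUMP = 11`, the earliest point of the new candidate sits one byte after the failing point, so no position between the two candidates is left unjudged.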
In the embodiment shown in Fig. 5, according to the rule preset on the deduplication server 103, judgment starts with whether at least part of the data in Wi1[pi1-169, pi1] satisfies predetermined condition C1. Suppose that at least part of the data in Wi1[pi1-169, pi1] satisfies C1, in Wi2[pi2-169, pi2] satisfies C2, in Wi3[pi3-169, pi3] satisfies C3 and in Wi4[pi4-169, pi4] satisfies C4, but at least part of the data in Wi5[pi5-169, pi5] fails predetermined condition C5, and that 10 bytes are then skipped from point pi5 along the cut-point search direction. The end position of the 10th byte gives a new potential cut point which, to distinguish it from the other potential cut points, is denoted kg here.
According to the rule preset on the deduplication server 103, 11 points pgx are determined for potential cut point kg, where x takes the consecutive natural numbers 1 to 11, namely pg1, pg2, pg3, pg4, pg5, pg6, pg7, pg8, pg9, pg10 and pg11, and the windows corresponding to these points are Wgx[pgx-169, pgx], that is, Wg1[pg1-169, pg1] through Wg11[pg11-169, pg11]. Point pgx is dx bytes from potential cut point kg; specifically, pg1 is 0 bytes from kg, pg2 is 1 byte, pg3 is 2 bytes, pg4 is 3 bytes, pg5 is 4 bytes, pg6 is 5 bytes, pg7 is 6 bytes, pg8 is 7 bytes, pg9 is 8 bytes, pg10 is 9 bytes and pg11 is 10 bytes from kg, and pg2 through pg11 all lie in the direction opposite to the cut-point search direction relative to kg. It is then judged, for each x from 1 to 11, whether at least part of the data in Wgx[pgx-169, pgx] satisfies predetermined condition Cx.
In this case, point pg11 of potential cut point kg coincides with point pi5 of potential cut point ki, the window Wg11[pg11-169, pg11] corresponding to pg11 coincides with the window Wi5[pi5-169, pi5] corresponding to pi5, and C5 = C11. Consequently, for potential cut point ki, when at least part of the data in Wi5[pi5-169, pi5] is judged to fail predetermined condition C5, the potential cut point kg obtained by skipping 10 bytes from pi5 along the cut-point search direction still cannot satisfy the conditions for a data stream cut point. Skipping 10 bytes from pi5 therefore causes repeated computation, whereas skipping 11 bytes from pi5 along the cut-point search direction reduces repeated computation and is more efficient, which increases the speed of finding data stream cut points. When, in the preset rule, the probability that at least part of the data in the window Wx[px-Ax, px+Bx] corresponding to point px satisfies predetermined condition Cx is 1/2, that is, when a jump is performed with probability 1/2, at most 179 bytes can be skipped.
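The comparison between skipping 10 and 11 bytes can be checked with a few lines of arithmetic. The sketch below only tracks point positions under the rule dx = 0 to 10; the helper name `points` and the concrete value of `k_i` are illustrative assumptions.

```python
OFFSETS = list(range(11))          # d_x for x = 1..11, p_x = k - d_x


def points(k: int) -> list:
    """Positions p_1..p_11 determined for potential cut point k."""
    return [k - d for d in OFFSETS]


k_i = 4096
p_i5 = points(k_i)[4]              # point whose window failed (x = 5)

k_g = p_i5 + 10                    # candidate after skipping 10 bytes
k_j = p_i5 + 11                    # candidate after skipping 11 bytes

# Points of the new candidate that fall on or before the failed point.
rejudged_after_10 = [p for p in points(k_g) if p <= p_i5]
rejudged_after_11 = [p for p in points(k_j) if p <= p_i5]
```

After a 10-byte skip, the list contains exactly `p_i5` (it is pg11), so the window already known to fail would be judged again; after an 11-byte skip the list is empty.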
In the present embodiment, the preset rule is: 11 points px are determined for potential cut point k, together with the window Wx[px-Ax, px+Bx] corresponding to each point px and the predetermined condition Cx corresponding to each window Wx[px-Ax, px+Bx], where x takes the consecutive natural numbers 1 to 11. The probability that at least part of the data in the window Wx[px-Ax, px+Bx] corresponding to point px satisfies the predetermined condition is 1/2. From these two factors, the number of points and this probability, P(n) can be calculated, where P(n) is the probability that, up to the n-th point, there are no 11 consecutive points whose windows all satisfy the predetermined condition, that is, that no data stream cut point has been found. In this rule, A1 = A2 = A3 = A4 = A5 = A6 = A7 = A8 = A9 = A10 = A11 = 169, B1 = B2 = B3 = B4 = B5 = B6 = B7 = B8 = B9 = B10 = B11 = 0, and C1 = C2 = C3 = C4 = C5 = C6 = C7 = C8 = C9 = C10 = C11. Point px is dx bytes from potential cut point k; specifically, p1 is 0 bytes from k, p2 is 1 byte, p3 is 2 bytes, p4 is 3 bytes, p5 is 4 bytes, p6 is 5 bytes, p7 is 6 bytes, p8 is 7 bytes, p9 is 8 bytes, p10 is 9 bytes and p11 is 10 bytes from k, and p2, p3, p4, p5, p6, p7, p8, p9, p10 and p11 all lie in the direction opposite to the cut-point search direction relative to potential cut point k. Whether potential cut point k is a data stream cut point is therefore determined by whether there are 11 consecutive points such that at least part of the data in each corresponding window satisfies predetermined condition Cx.
After the minimum chunk length of 4096 bytes is skipped from the start of the data stream or from the previous data stream cut point, 10 bytes are rolled back in the direction opposite to the cut-point search direction to reach the 4086th point. No data stream cut point can exist at this point, so P(4086) = 1, and likewise P(4087) = 1, ..., P(4095) = 1. At the 4096th point, that is, at the minimum chunk size, at least part of the data in each of the windows corresponding to these 11 points satisfies predetermined condition Cx with probability (1/2)^11. A data stream cut point therefore exists with probability (1/2)^11 and does not exist with probability 1-(1/2)^11, so P(4096) = 1-(1/2)^11.
At the n-th point, P(n) can be derived recursively by distinguishing 12 cases.
Case 1: with probability 1/2, at least part of the data in the window corresponding to the n-th point fails the predetermined condition. In that case, with probability P(n-1), the n-1 points before the n-th point contain no 11 consecutive points whose windows each satisfy the predetermined condition, so P(n) includes the term 1/2*P(n-1). The situation in which the window of the n-th point fails the predetermined condition but the n-1 points before it do contain 11 consecutive points whose windows each satisfy the predetermined condition does not contribute to P(n).
Case 2: with probability 1/2, at least part of the data in the window corresponding to the n-th point satisfies the predetermined condition, and with probability 1/2, at least part of the data in the window corresponding to the (n-1)-th point fails the predetermined condition. In that case, with probability P(n-2), the n-2 points before the (n-1)-th point contain no 11 consecutive points whose windows each satisfy the predetermined condition, so P(n) includes the term 1/2*1/2*P(n-2). The situation in which the window of the n-th point satisfies the predetermined condition, the window of the (n-1)-th point fails it, and the n-2 points before the (n-1)-th point do contain 11 consecutive points whose windows each satisfy the predetermined condition does not contribute to P(n).
Continuing in the same way, Case 11: with probability (1/2)^10, at least part of the data in each of the windows corresponding to the n-th through (n-9)-th points satisfies the predetermined condition, and with probability 1/2, at least part of the data in the window corresponding to the (n-10)-th point fails the predetermined condition. In that case, with probability P(n-11), the n-11 points before the (n-10)-th point contain no 11 consecutive points whose windows each satisfy the predetermined condition, so P(n) includes the term (1/2)^10*1/2*P(n-11). The situation in which the windows of the n-th through (n-9)-th points all satisfy the predetermined condition, the window of the (n-10)-th point fails it, and the n-11 points before the (n-10)-th point do contain 11 consecutive points whose windows each satisfy the predetermined condition does not contribute to P(n).
Case 12: with probability (1/2)^11, at least part of the data in each of the windows corresponding to the n-th through (n-10)-th points satisfies the predetermined condition. This case does not contribute to P(n).
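The case analysis can be sanity-checked on a scaled-down version of the rule. The sketch below is illustrative only: it replaces 11 consecutive points by a small run length m, treats each window as an independent fair coin flip (an assumption matching the stated probability of 1/2), and compares an exhaustive enumeration with the recursion obtained from the cases. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def no_run_bruteforce(trials: int, m: int) -> Fraction:
    """Probability that `trials` fair flips contain no run of m successes."""
    good = 0
    for outcome in product((0, 1), repeat=trials):
        run = longest = 0
        for bit in outcome:
            run = run + 1 if bit else 0
            longest = max(longest, run)
        if longest < m:
            good += 1
    return Fraction(good, 2 ** trials)


def no_run_recursive(trials: int, m: int) -> Fraction:
    """Same probability via P(n) = sum over j = 1..m of (1/2)^j * P(n - j)."""
    half = Fraction(1, 2)
    p = []
    for n in range(trials + 1):
        if n < m:
            p.append(Fraction(1))          # too few points for a full run
        else:
            p.append(sum(half ** j * p[n - j] for j in range(1, m + 1)))
    return p[trials]
```

For example, with m = 3 both functions give 7/8 for three trials and 13/16 for four trials, and they agree for every trial count that is small enough to enumerate.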
Therefore, P(n) = 1/2*P(n-1) + (1/2)^2*P(n-2) + ... + (1/2)^11*P(n-11).
Another preset rule is: 24 points px are determined for potential cut point k, together with the window Wx[px-Ax, px+Bx] corresponding to each point px and the predetermined condition Cx corresponding to each window Wx[px-Ax, px+Bx], where x takes the consecutive natural numbers 1 to 24. The probability that at least part of the data in the window Wx[px-Ax, px+Bx] corresponding to point px satisfies predetermined condition Cx is 3/4, and from these two factors P(n) can be calculated. In this rule, A1 = A2 = A3 = A4 = A5 = A6 = A7 = A8 = A9 = A10 = A11 = 169, B1 = B2 = B3 = B4 = B5 = B6 = B7 = B8 = B9 = B10 = B11 = 0, and C1 = C2 = C3 = C4 = C5 = C6 = C7 = C8 = C9 = ... = C22 = C23 = C24. Point px is dx bytes from potential cut point k; specifically, p1 is 0 bytes from k, p2 is 1 byte, p3 is 2 bytes, p4 is 3 bytes, p5 is 4 bytes, p6 is 5 bytes, p7 is 6 bytes, p8 is 7 bytes, p9 is 8 bytes, ..., p22 is 21 bytes, p23 is 22 bytes and p24 is 23 bytes from k, and p2, p3, p4, p5, p6, p7, p8, p9, ..., p22, p23 and p24 all lie in the direction opposite to the cut-point search direction relative to potential cut point k. Whether potential cut point k is a data stream cut point is therefore determined by whether there are 24 consecutive points such that at least part of the data in each corresponding window satisfies predetermined condition Cx, and P(n) can be calculated by the following formulas:
P(4073) = 1, P(4074) = 1, ..., P(4095) = 1, P(4096) = 1-(3/4)^24,
P(n) = 1/4*P(n-1) + 1/4*(3/4)*P(n-2) + ... + 1/4*(3/4)^23*P(n-24).
Calculation gives P(5*1024) = 0.78, P(11*1024) = 0.17 and P(12*1024) = 0.13. That is, after 12 KB has been searched from the start of the data stream or from the previous data stream cut point, there is a 13% probability that no data stream cut point has been found, in which case a forced cut is made. From this probability the density function of the data stream cut point is obtained, and integration shows that on average a data stream cut point is found after about 7.6 KB has been searched from the start of the data stream or from the previous data stream cut point, that is, the average chunk length is about 7.6 KB. This differs from a traditional CDC algorithm: instead of requiring that at least part of the data in each of the windows corresponding to 11 consecutive points satisfy the predetermined condition with probability 1/2, a traditional CDC algorithm reaches an average chunk length of 7.6 KB only when a single window satisfies its condition with probability 1/2^12.
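The two recursions can be evaluated numerically with one helper. This is a hedged sketch rather than the calculation actually used for the quoted figures: the function `survival` and its parameters are hypothetical, window outcomes are assumed independent, and P(n) is taken to be 1 for every n below the minimum chunk size of 4096.

```python
def survival(n_max: int, num_points: int, p_ok: float,
             min_chunk: int = 4096) -> list:
    """P[n] = probability that no cut point has been found up to point n."""
    q = 1.0 - p_ok
    # Weight of "last failing window is m points back": q * p_ok^(m-1).
    weights = [q * p_ok ** (m - 1) for m in range(1, num_points + 1)]
    P = [1.0] * (n_max + 1)
    for n in range(min_chunk, n_max + 1):
        P[n] = sum(w * P[n - m] for m, w in enumerate(weights, start=1))
    return P


P11 = survival(12 * 1024, num_points=11, p_ok=0.5)    # 11 points, probability 1/2
P24 = survival(12 * 1024, num_points=24, p_ok=0.75)   # 24 points, probability 3/4
```

Under these assumptions, `P11[5 * 1024]`, `P11[11 * 1024]` and `P11[12 * 1024]` come out close to the quoted 0.78, 0.17 and 0.13, and `P24[4096]` equals 1-(3/4)^24 as stated for the 24-point rule.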
On the basis of the data stream cut-point search shown in Fig. 3, in the embodiment shown in Fig. 7 a rule is preset on the deduplication server 103. The rule is: 11 points px are determined for potential cut point k, together with the window Wx[px-Ax, px+Bx] corresponding to each point px and the predetermined condition Cx corresponding to each window Wx[px-Ax, px+Bx], where x takes the consecutive natural numbers 1 to 11. The probability that at least part of the data in the window Wx[px-Ax, px+Bx] corresponding to point px satisfies predetermined condition Cx is 1/2, and A1 = A2 = A3 = A4 = A5 = A6 = A7 = A8 = A9 = A10 = A11 = 169, B1 = B2 = B3 = B4 = B5 = B6 = B7 = B8 = B9 = B10 = B11 = 0, and C1 = C2 = C3 = C4 = C5 = C6 = C7 = C8 = C9 = C10 = C11. Point px is dx bytes from potential cut point k; specifically, p1 is 2 bytes from k, p2 is 3 bytes, p3 is 4 bytes, p4 is 5 bytes, p5 is 6 bytes, p6 is 7 bytes, p7 is 8 bytes, p8 is 9 bytes, p9 is 10 bytes, p10 is 1 byte and p11 is 0 bytes from k, and p1, p2, p3, p4, p5, p6, p7, p8, p9 and p10 all lie in the direction opposite to the cut-point search direction relative to potential cut point k.
ka is a data stream cut point, and the cut-point search direction shown in Fig. 7 is from left to right. After the minimum data block of 4 KB is skipped from data stream cut point ka, the end position of the minimum data block is taken as the next potential cut point ki, and points pix are determined for ki. In the present embodiment, according to the rule preset on the deduplication server 103, x takes the consecutive natural numbers 1 to 11. In the embodiment shown in Fig. 7, according to the preset rule, 11 points are determined for potential cut point ki, namely pi1, pi2, pi3, pi4, pi5, pi6, pi7, pi8, pi9, pi10 and pi11, and the windows corresponding to these points are Wix[pix-169, pix], that is, Wi1[pi1-169, pi1] through Wi11[pi11-169, pi11]. Point pix is dix bytes from potential cut point ki; specifically, pi1 is 2 bytes from ki, pi2 is 3 bytes, pi3 is 4 bytes, pi4 is 5 bytes, pi5 is 6 bytes, pi6 is 7 bytes, pi7 is 8 bytes, pi8 is 9 bytes, pi9 is 10 bytes, pi10 is 1 byte and pi11 is 0 bytes from ki, and pi1 through pi10 all lie in the direction opposite to the cut-point search direction relative to ki.
It is judged, for each x from 1 to 11, whether at least part of the data in Wix[pix-169, pix] satisfies predetermined condition Cx. When at least part of the data in each of windows Wi1 through Wi11 satisfies the corresponding predetermined condition C1 through C11, the current potential cut point ki is a data stream cut point. When at least part of the data in any one of the 11 windows fails the corresponding predetermined condition, the procedure is described using the example shown in Fig. 8, in which at least part of the data in Wi3[pi3-169, pi3] fails predetermined condition C3 and 11 bytes are skipped from point pi3 along the cut-point search direction. As shown in Fig. 8, when W3 is judged to fail the predetermined condition, N bytes are skipped along the cut-point search direction starting from p3, where N is not greater than ‖B3‖ + max_x(‖Ax‖ + ‖ki - pix‖). As in the embodiment shown in Fig. 6, N is specifically not greater than 179 bytes; in the present embodiment, N = 11. The end position of the 11th byte gives the next potential cut point which, to distinguish it from potential cut point ki, is denoted kj here.
According to the rule preset on the deduplication server 103, 11 points are determined for potential cut point kj, namely pj1, pj2, pj3, pj4, pj5, pj6, pj7, pj8, pj9, pj10 and pj11, and the windows corresponding to these points are Wjx[pjx-169, pjx], that is, Wj1[pj1-169, pj1] through Wj11[pj11-169, pj11]. Point pjx is dx bytes from potential cut point kj; specifically, pj1 is 2 bytes from kj, pj2 is 3 bytes, pj3 is 4 bytes, pj4 is 5 bytes, pj5 is 6 bytes, pj6 is 7 bytes, pj7 is 8 bytes, pj8 is 9 bytes, pj9 is 10 bytes, pj10 is 1 byte and pj11 is 0 bytes from kj, and pj1 through pj10 all lie in the direction opposite to the cut-point search direction relative to kj. It is then judged, for each x from 1 to 11, whether at least part of the data in Wjx[pjx-169, pjx] satisfies predetermined condition Cx. In the embodiments of the present invention, whether potential cut point ka is a data stream cut point is of course judged by the same principle; the details are not described again, and reference may be made to the description of judging potential cut point ki.
When at least part of the data in each of windows Wj1 through Wj11 satisfies the corresponding predetermined condition C1 through C11, the current potential cut point kj is a data stream cut point, and the data between kj and ka forms one data block. The minimum chunk size of 4 KB is then skipped in the same way as from ka to obtain the next potential cut point, and whether that potential cut point is a data stream cut point is judged according to the rule preset on the deduplication server 103. When potential cut point kj is judged not to be a data stream cut point, 11 bytes are skipped in the same way as for ki to obtain the next potential cut point, and whether that potential cut point is a data stream cut point is judged according to the rule preset on the deduplication server 103 and the method described above. When no data stream cut point has been found by the time the set maximum data block size is exceeded, the end position of the maximum data block is used as a forced cut point. The implementation of this method is of course constrained by the maximum data block length and by the size of the file that makes up the data stream, which is not described further here.
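The point layout of the rule in Fig. 7 and the jump bound can be written out as a short calculation. The sketch below is illustrative: the list `D` transcribes the distances dx given above, the helper names are hypothetical, and it is assumed that windows are judged in the order x = 1 to 11.

```python
A = [169] * 11                              # A_1..A_11
B = [0] * 11                                # B_1..B_11
D = [2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 0]      # distance of p_x behind k, x = 1..11
N = 11                                      # bytes skipped in this embodiment


def points(k: int) -> list:
    """Positions p_1..p_11 for potential cut point k."""
    return [k - d for d in D]


def jump_bound(failed_x: int) -> int:
    """Upper bound on N: ||B_z|| + max_x(||A_x|| + ||k - p_x||)."""
    return abs(B[failed_x - 1]) + max(a + d for a, d in zip(A, D))


def next_candidate(k: int, failed_x: int, n: int = N) -> int:
    """New potential cut point after the window of p_(failed_x) fails."""
    return points(k)[failed_x - 1] + n


k_i = 4096
p_i3 = points(k_i)[2]                       # failing point in the Fig. 8 example
k_j = next_candidate(k_i, failed_x=3)
```

Here `jump_bound(3)` evaluates to 179, matching the stated limit, and the earliest point of `k_j` (its p9, 10 bytes back) lands one byte after `p_i3`.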
On the basis of the data flow point cutpoint shown in Fig. 3 is searched, at the embodiment shown in Fig. 9
In, duplicate removal server 103 is preset with rule, described rule is: true for potential cut-point k
Fixed 11 some px, some pxCorresponding window Wx[px-Ax,px+Bx] and window Wx[px-Ax,px+Bx]
Corresponding predetermined condition Cx, wherein A1=A2=A3=A4=A5=A6=A7=A8=A9=A10=A11=169,
B1=B2=B3=B4=B5=B6=B7=B8=B9=B10=B11=0, and C1=C2=C3=C4=C5=C6=
C7=C8=C9=C10=C11.Wherein, pxSpacing d with potential cut-point kxIndividual byte, tool
Body, p1With 3 bytes of spacing of potential cut-point k, p2With 2 bytes of spacing of k,
p3With 1 byte of spacing of k, p4With 0 byte of spacing of k, p5Spacing 1 with k
Individual byte, p6With 2 bytes of spacing of k, p7With 3 bytes of spacing of k, p8With k it
4 bytes of spacing, p9With 5 bytes of spacing of k, p10With 6 bytes of spacing of k,
p11With 7 bytes of spacing of k, and p5、p6、p7、p8、p9、p10And p11Relative to
Potential cut-point k is respectively positioned on data flow point cutpoint and searches opposite direction, p1、p2And p3Relative to latent
It is respectively positioned on data flow point cutpoint search direction at cut-point k.kaFor data flow point cutpoint, Fig. 9
Shown in data flow point cutpoint search direction be from left to right, from data flow point cutpoint kaSkip
After small data block 4KB, minimum data block 4KB end position is as next potential cut-point ki,
For potential cut-point kiDetermine a pix, in the present embodiment, according on duplicate removal server 103
The rule preset, x is respectively 1 to 11 continuous print natural numbers.In the embodiment shown in Fig. 9,
For potential cut-point kiThe point determined is 11, respectively pi1、pi2、pi3、pi4、pi5、pi6、
pi7、pi8、pi9、pi10And pi11, put pi1、pi2、pi3、pi4、pi5、pi6、pi7、pi8、
pi9、pi10And pi11Corresponding window is respectively Wi1[pi1-169,pi1]、Wi2[pi2-169,pi2]、Wi3
[pi3-169,pi3]、Wi4[pi4-169,pi4]、Wi5[pi5-169,pi5]、Wi6[pi6-169,pi6]、Wi7
[pi7-169,pi7]、Wi8[pi8-169,pi8]、Wi9[pi9-169,pi9]、Wi10[pi10-169,pi10] and Wi11
[pi11-169,pi11].Wherein, pixWith potential cut-point kiSpacing dxIndividual byte, concrete,
pi1With kiSpacing 3 bytes, pi2With kiSpacing 2 bytes, pi3With kiSpacing 1 byte, pi4
With kiSpacing 0 byte, pi5With kiSpacing 1 byte, pi6With kiSpacing 2 bytes, pi7With
kiSpacing 3 bytes, pi8With kiSpacing 4 bytes, pi9With kiSpacing 5 bytes, pi10With ki
6 bytes of spacing, pi11With ki7 bytes of spacing, and pi5、pi6、pi7、pi8、pi9、pi10
And pi11Relative to potential cut-point kiIt is respectively positioned on data flow point cutpoint and searches opposite direction, pi1、pi2With
pi3Relative to potential cut-point kiIt is respectively positioned on data flow point cutpoint search direction.Judge Wi1[pi1
-169,pi1In], whether at least part of data meet predetermined condition C1, judge Wi2[pi2-169,pi2]
In at least partly data whether meet predetermined condition C2, judge Wi3[pi3-169,pi3At least portion in]
Whether divided data meets predetermined condition C3, judge Wi4[pi4-169,pi4In], at least part of data are
No meet predetermined condition C4, judge Wi5[pi5-169,pi5In], whether at least part of data meet pre-
Fixed condition C5, judge Wi6[pi6-169,pi6In], whether at least part of data meet predetermined condition C6、
Judge Wi7[pi7-169,pi7In], whether at least part of data meet predetermined condition C7, judge Wi8
[pi8-169,pi8In], whether at least part of data meet predetermined condition C8, judge Wi9[pi9-169,
pi9In], whether at least part of data meet predetermined condition C9, judge Wi10[pi10-169,pi10In] extremely
Whether small part data meet predetermined condition C10With judge Wi11[pi11-169,pi11In] at least partly
Whether data meet predetermined condition C11.When judging window Wi1In at least partly data meet predetermined
Condition C1, window Wi2In at least partly data meet predetermined condition C2, window Wi3In at least portion
Divided data meets predetermined condition C3, window Wi4In at least partly data meet predetermined condition C4、
Window Wi5In at least partly data meet predetermined condition C5, window Wi6In at least partly data full
Foot predetermined condition C6, window Wi7In at least partly data meet predetermined condition C7, window Wi8In
At least partly data meet predetermined condition C8, window Wi9In at least partly data meet predetermined bar
Part C9, window Wi10In at least partly data meet predetermined condition C10With window Wi11In at least portion
Divided data meets predetermined condition C11Time, the most current potential cut-point kiFor data flow point cutpoint.When
at least part of the data in any one of the 11 windows does not meet the corresponding predetermined
condition, for example, as shown in Figure 10, at least part of the data in Wi7[pi7-169, pi7] does
not meet the corresponding predetermined condition, N bytes are jumped from point pi7 along the data
stream cut-point search direction, where N is not greater than ‖B4‖ + max_x(‖Ax‖ + ‖(ki-pix)‖). In
the embodiment shown in Figure 10, N is specifically not greater than 179 bytes, and in this
embodiment N = 8 is used. A new potential cut-point is thereby obtained; to distinguish it from
potential cut-point ki, the new potential cut-point is denoted kj here.
According to the rule preset on deduplication server 103 in the embodiment shown in Fig. 9, 11 points
are determined for potential cut-point kj, namely pj1, pj2, pj3, pj4, pj5, pj6, pj7, pj8, pj9, pj10
and pj11, and the windows corresponding to these points are Wj1[pj1-169, pj1], Wj2[pj2-169, pj2],
Wj3[pj3-169, pj3], Wj4[pj4-169, pj4], Wj5[pj5-169, pj5], Wj6[pj6-169, pj6], Wj7[pj7-169, pj7],
Wj8[pj8-169, pj8], Wj9[pj9-169, pj9], Wj10[pj10-169, pj10] and Wj11[pj11-169, pj11], respectively.
The distance between pjx and potential cut-point kj is dx bytes. Specifically, pj1 is 3 bytes from kj,
pj2 is 2 bytes from kj, pj3 is 1 byte from kj, pj4 is 0 bytes from kj, pj5 is 1 byte from kj, pj6 is
2 bytes from kj, pj7 is 3 bytes from kj, pj8 is 4 bytes from kj, pj9 is 5 bytes from kj, pj10 is
6 bytes from kj and pj11 is 7 bytes from kj; pj5, pj6, pj7, pj8, pj9, pj10 and pj11 all lie in the
direction opposite to the data stream cut-point search direction relative to kj, and pj1, pj2 and pj3
all lie in the data stream cut-point search direction relative to kj.
It is then judged whether at least part of the data in each window meets the corresponding
predetermined condition: Wj1[pj1-169, pj1] against C1, Wj2[pj2-169, pj2] against C2, Wj3[pj3-169, pj3]
against C3, Wj4[pj4-169, pj4] against C4, Wj5[pj5-169, pj5] against C5, Wj6[pj6-169, pj6] against C6,
Wj7[pj7-169, pj7] against C7, Wj8[pj8-169, pj8] against C8, Wj9[pj9-169, pj9] against C9,
Wj10[pj10-169, pj10] against C10 and Wj11[pj11-169, pj11] against C11. Of course, in the embodiments
of the present invention, judging whether potential cut-point ka is a data stream cut-point follows
the same principle; the details are not repeated here, and reference may be made to the description
of judging potential cut-point ki.
When it is judged that at least part of the data in each of windows Wj1, Wj2, Wj3, Wj4, Wj5, Wj6, Wj7,
Wj8, Wj9, Wj10 and Wj11 meets the corresponding predetermined condition C1, C2, C3, C4, C5, C6, C7, C8,
C9, C10 and C11, respectively, the current potential cut-point kj is a data stream cut-point, and the
data between kj and ka constitutes 1 data block. At the same time, in the same manner as for ka, the
minimum chunk size of 4 KB is skipped to obtain the next potential cut-point, and whether that
potential cut-point is a data stream cut-point is judged according to the rule preset on
deduplication server 103. When potential cut-point kj is judged not to be a data stream cut-point,
8 bytes are jumped in the same manner as for ki to obtain the next potential cut-point, and whether
that potential cut-point is a data stream cut-point is judged according to the rule preset on
deduplication server 103 and the method described above. When no data stream cut-point has been found
after the set maximum data block size is exceeded, the end position of the maximum data block is used
as a forced cut-point.
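The search procedure of this embodiment can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: the function name `find_cut_points`, the callback `window_ok` (standing in for the test of condition Cx on window Wx), the parameter `max_chunk`, the order in which the 11 windows are tested, and the reading of Wx[px-169, px] as the 169 bytes ending at px are all assumptions introduced here. The offsets, the 4 KB minimum chunk and N = 8 are taken from the text.

```python
from typing import Callable, List, Sequence

MIN_CHUNK = 4 * 1024      # minimum chunk size skipped after each cut-point (from the text)
WINDOW = 169              # bytes examined before each point p_x (A_x = 169, B_x = 0)
# Signed offsets of p_1..p_11 from candidate k: positive values lie along the
# search direction, negative values against it (distances 3,2,1,0,1,...,7).
OFFSETS = (3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7)
JUMP = 8                  # N = 8 bytes, as chosen in this embodiment


def find_cut_points(data: bytes,
                    window_ok: Callable[[bytes, int], bool],
                    max_chunk: int,
                    offsets: Sequence[int] = OFFSETS,
                    jump: int = JUMP) -> List[int]:
    """Scan left to right and return the positions chosen as cut-points."""
    cuts: List[int] = []
    start = 0                                   # previous cut-point k_a
    while start + MIN_CHUNK <= len(data):
        k = start + MIN_CHUNK                   # first candidate k_i
        forced = start + max_chunk              # end of the maximum data block
        cut = None
        while k <= forced and k + max(offsets) <= len(data):
            failed_point = None
            for x, d in enumerate(offsets, start=1):
                p = k + d
                if not window_ok(data[p - WINDOW:p], x):
                    failed_point = p            # window W_x failed condition C_x
                    break
            if failed_point is None:
                cut = k                         # all 11 windows passed
                break
            k = failed_point + jump             # jump N bytes from the failed point
        if cut is None:
            if forced > len(data):
                break                           # not enough data left for a forced cut
            cut = forced                        # forced cut-point
        cuts.append(cut)
        start = cut
    return cuts
```

In this sketch, jumping 8 bytes from the failed point places the new candidate so that its farthest-back point (7 bytes behind it) falls one byte past the failed point, so the window that just failed is not examined again at the same position.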
On the basis of the data stream cut-point search shown in Fig. 3, in the implementation shown in
Figure 11, a rule is preset on deduplication server 103. The rule is: for a potential cut-point k,
determine 11 points px, a window Wx[px-Ax, px+Bx] corresponding to each point px, and a predetermined
condition Cx corresponding to each window Wx[px-Ax, px+Bx], where A1=A2=A3=A4=A5=A6=A7=A8=A9=A10=169,
A11=182, B1=B2=B3=B4=B5=B6=B7=B8=B9=B10=B11=0, and C1=C2=C3=C4=C5=C6=C7=C8=C9=C10≠C11. The distance
between px and potential cut-point k is dx bytes. Specifically, p1 is 0 bytes from k, p2 is 1 byte
from k, p3 is 2 bytes from k, p4 is 3 bytes from k, p5 is 4 bytes from k, p6 is 5 bytes from k, p7 is
6 bytes from k, p8 is 7 bytes from k, p9 is 8 bytes from k, p10 is 1 byte from k and p11 is 3 bytes
from k; p2, p3, p4, p5, p6, p7, p8 and p9 all lie in the direction opposite to the data stream
cut-point search direction relative to k, and p10 and p11 both lie in the data stream cut-point
search direction relative to k.
ka is a data stream cut-point, and the data stream cut-point search direction shown in Figure 11 is
from left to right. After the minimum data block of 4 KB is skipped from data stream cut-point ka,
the end position of the 4 KB minimum data block serves as the next potential cut-point ki, and points
pix are determined for potential cut-point ki. In this embodiment, according to the rule preset on
deduplication server 103, x takes the consecutive natural numbers from 1 to 11. In the embodiment
shown in Figure 11, 11 points are determined for potential cut-point ki, namely pi1, pi2, pi3, pi4,
pi5, pi6, pi7, pi8, pi9, pi10 and pi11, and the windows corresponding to these points are
Wi1[pi1-169, pi1], Wi2[pi2-169, pi2], Wi3[pi3-169, pi3], Wi4[pi4-169, pi4], Wi5[pi5-169, pi5],
Wi6[pi6-169, pi6], Wi7[pi7-169, pi7], Wi8[pi8-169, pi8], Wi9[pi9-169, pi9], Wi10[pi10-169, pi10] and
Wi11[pi11-182, pi11], respectively. The distance between pix and potential cut-point ki is dx bytes.
Specifically, pi1 is 0 bytes from ki, pi2 is 1 byte from ki, pi3 is 2 bytes from ki, pi4 is 3 bytes
from ki, pi5 is 4 bytes from ki, pi6 is 5 bytes from ki, pi7 is 6 bytes from ki, pi8 is 7 bytes from
ki, pi9 is 8 bytes from ki, pi10 is 1 byte from ki and pi11 is 3 bytes from ki; pi2, pi3, pi4, pi5,
pi6, pi7, pi8 and pi9 all lie in the direction opposite to the data stream cut-point search direction
relative to ki, and pi10 and pi11 both lie in the data stream cut-point search direction relative to
ki.
It is then judged whether at least part of the data in each window meets the corresponding
predetermined condition: Wi1[pi1-169, pi1] against C1, Wi2[pi2-169, pi2] against C2, Wi3[pi3-169, pi3]
against C3, Wi4[pi4-169, pi4] against C4, Wi5[pi5-169, pi5] against C5, Wi6[pi6-169, pi6] against C6,
Wi7[pi7-169, pi7] against C7, Wi8[pi8-169, pi8] against C8, Wi9[pi9-169, pi9] against C9,
Wi10[pi10-169, pi10] against C10 and Wi11[pi11-182, pi11] against C11. When it is judged that at least
part of the data in each of windows Wi1, Wi2, Wi3, Wi4, Wi5, Wi6, Wi7, Wi8, Wi9, Wi10 and Wi11 meets
the corresponding predetermined condition C1, C2, C3, C4, C5, C6, C7, C8, C9, C10 and C11,
respectively, the current potential cut-point ki is a data stream cut-point. When it is judged that
at least part of the data in window Wi11 does not meet
predetermined condition C11, 1 byte is jumped from potential cut-point ki along the data stream
cut-point search direction to obtain a new potential cut-point; to distinguish it from potential
cut-point ki, the new potential cut-point is denoted kj here. When at least part of the data in any
one of the 10 windows Wi1, Wi2, Wi3, Wi4, Wi5, Wi6, Wi7, Wi8, Wi9 and Wi10 does not meet the
corresponding predetermined condition, for example, as shown in Figure 12, Wi4[pi4-169, pi4], N bytes
are jumped from point pi4 along the data stream cut-point search direction, where N is not greater
than ‖B4‖ + max_x(‖Ax‖ + ‖(ki-pix)‖). In the embodiment shown in Figure 12, N is specifically not
greater than 179, and in this embodiment N = 9 is used. A new potential cut-point is thereby
obtained; to distinguish it from potential cut-point ki, the new potential cut-point is denoted kj
here.
According to the rule preset on deduplication server 103 in the embodiment shown in Figure 11,
11 points are determined for potential cut-point kj, namely pj1, pj2, pj3, pj4, pj5, pj6, pj7, pj8,
pj9, pj10 and pj11, and the windows corresponding to these points are Wj1[pj1-169, pj1],
Wj2[pj2-169, pj2], Wj3[pj3-169, pj3], Wj4[pj4-169, pj4], Wj5[pj5-169, pj5], Wj6[pj6-169, pj6],
Wj7[pj7-169, pj7], Wj8[pj8-169, pj8], Wj9[pj9-169, pj9], Wj10[pj10-169, pj10] and
Wj11[pj11-182, pj11], respectively. The distance between pjx and potential cut-point kj is dx bytes.
Specifically, pj1 is 0 bytes from kj, pj2 is 1 byte from kj, pj3 is 2 bytes from kj, pj4 is 3 bytes
from kj, pj5 is 4 bytes from kj, pj6 is 5 bytes from kj, pj7 is 6 bytes from kj, pj8 is 7 bytes from
kj, pj9 is 8 bytes from kj, pj10 is 1 byte from kj and pj11 is 3 bytes from kj; pj2, pj3, pj4, pj5,
pj6, pj7, pj8 and pj9 all lie in the direction opposite to the data stream cut-point search direction
relative to kj, and pj10 and pj11 both lie in the data stream cut-point search direction relative to
kj.
It is then judged whether at least part of the data in each window meets the corresponding
predetermined condition: Wj1[pj1-169, pj1] against C1, Wj2[pj2-169, pj2] against C2, Wj3[pj3-169, pj3]
against C3, Wj4[pj4-169, pj4] against C4, Wj5[pj5-169, pj5] against C5, Wj6[pj6-169, pj6] against C6,
Wj7[pj7-169, pj7] against C7, Wj8[pj8-169, pj8] against C8, Wj9[pj9-169, pj9] against C9,
Wj10[pj10-169, pj10] against C10 and Wj11[pj11-182, pj11] against C11. Of course, in the embodiments
of the present invention, judging whether potential cut-point ka is a data stream cut-point follows
the same principle; the details are not repeated here, and reference may be made to the description
of judging potential cut-point ki.
When it is judged that at least part of the data in each of windows Wj1, Wj2, Wj3, Wj4, Wj5, Wj6, Wj7,
Wj8, Wj9, Wj10 and Wj11 meets the corresponding predetermined condition C1, C2, C3, C4, C5, C6, C7, C8,
C9, C10 and C11, respectively, the current potential cut-point kj is a data stream cut-point, and the
data between kj and ka constitutes 1 data block. At the same time, in the same manner as for ka, the
minimum chunk size of 4 KB is skipped to obtain the next potential cut-point, and whether that
potential cut-point is a data stream cut-point is judged according to the rule preset on
deduplication server 103. When potential cut-point kj is judged not to be a data stream cut-point,
the next potential cut-point is obtained in the same manner as for ki, and whether that potential
cut-point is a data stream cut-point is judged according to the rule preset on deduplication server
103 and the method described above. When no data stream cut-point has been found after the set
maximum data block size is exceeded, the end position of the maximum data block is used as a forced
cut-point.
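The distinguishing feature of the Figure 11 rule, namely that the size of the jump depends on which window failed, can be illustrated with a small sketch. The names `first_failed_window`, `next_candidate`, `cond_1_to_10` and `cond_11` are hypothetical, the order in which the windows are tested (1 to 11) is an assumption, since the text does not state it, and Wx[px-Ax, px] is read here as the Ax bytes ending at px. The offsets, window lengths and N = 9 come from the text.

```python
from typing import Callable, Optional

# Signed offsets of p_1..p_11 from candidate k in the Figure 11 rule:
# p_1 at k, p_2..p_9 one to eight bytes against the search direction,
# p_10 and p_11 one and three bytes along it.
OFFSETS = {1: 0, 2: -1, 3: -2, 4: -3, 5: -4, 6: -5, 7: -6, 8: -7, 9: -8, 10: 1, 11: 3}
WINDOW_LEN = {x: 169 for x in range(1, 11)}
WINDOW_LEN[11] = 182      # A_11 = 182
JUMP_N = 9                # N = 9, as chosen in this embodiment


def first_failed_window(data: bytes, k: int,
                        cond_1_to_10: Callable[[bytes], bool],
                        cond_11: Callable[[bytes], bool]) -> Optional[int]:
    """Return the index x of the first window that fails, or None if all pass."""
    for x in range(1, 12):
        p = k + OFFSETS[x]
        window = data[p - WINDOW_LEN[x]:p]
        cond = cond_11 if x == 11 else cond_1_to_10   # C_1..C_10 are equal, C_11 differs
        if not cond(window):
            return x
    return None


def next_candidate(k: int, failed_x: int) -> int:
    """Position of the next potential cut-point after window failed_x fails at k."""
    if failed_x == 11:
        return k + 1                         # jump 1 byte from k
    return k + OFFSETS[failed_x] + JUMP_N    # jump N bytes from point p_x
```

For example, if W4 fails at candidate k, the point p4 lies 3 bytes behind k, so `next_candidate(k, 4)` returns k + 6.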
On the basis of the data stream cut-point search shown in Fig. 3, in the implementation shown in
Figure 13, the rule preset on deduplication server 103 is: for a potential cut-point k, determine
11 points px, a window Wx[px-Ax, px+Bx] corresponding to each point px, and a predetermined condition
Cx corresponding to each window Wx[px-Ax, px+Bx], where x takes the consecutive natural numbers from
1 to 11. The probability that at least part of the data in the window Wx[px-Ax, px+Bx] corresponding
to point px meets the predetermined condition is 1/2, and A1=A2=A3=A4=A5=A6=A7=A8=A9=A10=A11=169,
B1=B2=B3=B4=B5=B6=B7=B8=B9=B10=B11=0, and C1=C2=C3=C4=C5=C6=C7=C8=C9=C10=C11. The distance between px
and potential cut-point k is dx bytes. Specifically, p1 is 0 bytes from k, p2 is 2 bytes from k, p3
is 4 bytes from k, p4 is 6 bytes from k, p5 is 8 bytes from k, p6 is 10 bytes from k, p7 is 12 bytes
from k, p8 is 14 bytes from k, p9 is 16 bytes from k, p10 is 18 bytes from k and p11 is 20 bytes
from k; p2, p3, p4, p5, p6, p7, p8, p9, p10 and p11 all lie in the direction opposite to the data
stream cut-point search direction relative to k.
ka is a data stream cut-point, and the data stream cut-point search direction shown in Figure 13 is
from left to right. After the minimum data block of 4 KB is skipped from data stream cut-point ka,
the end position of the 4 KB minimum data block serves as the next potential cut-point ki, and points
pix are determined for potential cut-point ki. In this embodiment, according to the rule preset on
deduplication server 103, x takes the consecutive natural numbers from 1 to 11. In the embodiment
shown in Figure 13, according to the preset rule, 11 points are determined for potential cut-point
ki, namely pi1, pi2, pi3, pi4, pi5, pi6, pi7, pi8, pi9, pi10 and pi11, and the windows corresponding
to these points are Wi1[pi1-169, pi1], Wi2[pi2-169, pi2], Wi3[pi3-169, pi3], Wi4[pi4-169, pi4],
Wi5[pi5-169, pi5], Wi6[pi6-169, pi6], Wi7[pi7-169, pi7], Wi8[pi8-169, pi8], Wi9[pi9-169, pi9],
Wi10[pi10-169, pi10] and Wi11[pi11-169, pi11], respectively. The distance between pix and potential
cut-point ki is dx bytes. Specifically, pi1 is 0 bytes from ki, pi2 is 2 bytes from ki, pi3 is
4 bytes from ki, pi4 is 6 bytes from ki, pi5 is 8 bytes from ki, pi6 is 10 bytes from ki, pi7 is
12 bytes from ki, pi8 is 14 bytes from ki, pi9 is 16 bytes from ki, pi10 is 18 bytes from ki and pi11
is 20 bytes from ki; pi2, pi3, pi4, pi5, pi6, pi7, pi8, pi9, pi10 and pi11 all lie in the direction
opposite to the data stream cut-point search direction relative to ki.
It is then judged whether at least part of the data in each window meets the corresponding
predetermined condition: Wi1[pi1-169, pi1] against C1, Wi2[pi2-169, pi2] against C2, Wi3[pi3-169, pi3]
against C3, Wi4[pi4-169, pi4] against C4, Wi5[pi5-169, pi5] against C5, Wi6[pi6-169, pi6] against C6,
Wi7[pi7-169, pi7] against C7, Wi8[pi8-169, pi8] against C8, Wi9[pi9-169, pi9] against C9,
Wi10[pi10-169, pi10] against C10 and Wi11[pi11-169, pi11] against C11. When it is judged that at least
part of the data in each of windows Wi1, Wi2, Wi3, Wi4, Wi5, Wi6, Wi7, Wi8, Wi9, Wi10 and Wi11 meets
the corresponding predetermined condition C1, C2, C3, C4, C5, C6, C7, C8, C9, C10 and C11,
respectively, the current potential cut-point ki is a data stream cut-point. When at least part of
the data in any one of the 11 windows does not meet the corresponding predetermined condition, for
example, as shown in Figure 14, at least part of the data in Wi4[pi4-169, pi4] does not meet
predetermined condition C4, the next potential cut-point is selected; to distinguish it from
potential cut-point ki, it is denoted kj here. kj is located to the right of ki, and kj is 1 byte
from ki. As shown in Figure 14, according to the rule preset on deduplication server 103,
11 points are determined for potential cut-point kj, namely pj1, pj2, pj3, pj4, pj5, pj6, pj7, pj8,
pj9, pj10 and pj11, and the windows corresponding to these points are Wj1[pj1-169, pj1],
Wj2[pj2-169, pj2], Wj3[pj3-169, pj3], Wj4[pj4-169, pj4], Wj5[pj5-169, pj5], Wj6[pj6-169, pj6],
Wj7[pj7-169, pj7], Wj8[pj8-169, pj8], Wj9[pj9-169, pj9], Wj10[pj10-169, pj10] and
Wj11[pj11-169, pj11], respectively, where A1=A2=A3=A4=A5=A6=A7=A8=A9=A10=A11=169,
B1=B2=B3=B4=B5=B6=B7=B8=B9=B10=B11=0, and C1=C2=C3=C4=C5=C6=C7=C8=C9=C10=C11. The distance between
pjx and potential cut-point kj is dx bytes. Specifically, pj1 is 0 bytes from kj, pj2 is 2 bytes from
kj, pj3 is 4 bytes from kj, pj4 is 6 bytes from kj, pj5 is 8 bytes from kj, pj6 is 10 bytes from kj,
pj7 is 12 bytes from kj, pj8 is 14 bytes from kj, pj9 is 16 bytes from kj, pj10 is 18 bytes from kj
and pj11 is 20 bytes from kj; pj2, pj3, pj4, pj5, pj6, pj7, pj8, pj9, pj10 and pj11 all lie in the
direction opposite to the data stream cut-point search direction relative to kj.
It is then judged whether at least part of the data in each window meets the corresponding
predetermined condition: Wj1[pj1-169, pj1] against C1, Wj2[pj2-169, pj2] against C2, Wj3[pj3-169, pj3]
against C3, Wj4[pj4-169, pj4] against C4, Wj5[pj5-169, pj5] against C5, Wj6[pj6-169, pj6] against C6,
Wj7[pj7-169, pj7] against C7, Wj8[pj8-169, pj8] against C8, Wj9[pj9-169, pj9] against C9,
Wj10[pj10-169, pj10] against C10 and Wj11[pj11-169, pj11] against C11. When it is judged that at least
part of the data in each of windows Wj1, Wj2, Wj3, Wj4, Wj5, Wj6, Wj7, Wj8, Wj9, Wj10 and Wj11 meets
the corresponding predetermined condition C1, C2, C3, C4, C5, C6, C7, C8, C9, C10 and C11,
respectively, the current potential cut-point kj is a data stream cut-point. When it is judged that
at least part of the data in any one of windows Wj1, Wj2, Wj3, Wj4, Wj5, Wj6, Wj7, Wj8, Wj9, Wj10 and
Wj11 does not meet the predetermined condition, for example, as shown in Figure 15, at least part of
the data in Wj3[pj3-169, pj3] does not meet predetermined condition C3, point pi4 lies to the left of
point pj3 relative to the data stream cut-point search direction, and 21 bytes are jumped from point
pi4 along the data stream cut-point search direction to obtain the next potential cut-point; to
distinguish it from potential cut-points ki and kj, it is denoted kl. According to the rule preset on
deduplication server 103 in the embodiment shown in Figure 13, for
potential cut-point kl, 11 points are determined, namely pl1, pl2, pl3, pl4, pl5, pl6, pl7, pl8, pl9,
pl10 and pl11, and the windows corresponding to these points are Wl1[pl1-169, pl1],
Wl2[pl2-169, pl2], Wl3[pl3-169, pl3], Wl4[pl4-169, pl4], Wl5[pl5-169, pl5], Wl6[pl6-169, pl6],
Wl7[pl7-169, pl7], Wl8[pl8-169, pl8], Wl9[pl9-169, pl9], Wl10[pl10-169, pl10] and
Wl11[pl11-169, pl11], respectively. The distance between plx and potential cut-point kl is dx bytes.
Specifically, pl1 is 0 bytes from kl, pl2 is 2 bytes from kl, pl3 is 4 bytes from kl, pl4 is 6 bytes
from kl, pl5 is 8 bytes from kl, pl6 is 10 bytes from kl, pl7 is 12 bytes from kl, pl8 is 14 bytes
from kl, pl9 is 16 bytes from kl, pl10 is 18 bytes from kl and pl11 is 20 bytes from kl; pl2, pl3,
pl4, pl5, pl6, pl7, pl8, pl9, pl10 and pl11 all lie in the direction opposite to the data stream
cut-point search direction relative to kl.
It is then judged whether at least part of the data in each window meets the corresponding
predetermined condition: Wl1[pl1-169, pl1] against C1, Wl2[pl2-169, pl2] against C2, Wl3[pl3-169, pl3]
against C3, Wl4[pl4-169, pl4] against C4, Wl5[pl5-169, pl5] against C5, Wl6[pl6-169, pl6] against C6,
Wl7[pl7-169, pl7] against C7, Wl8[pl8-169, pl8] against C8, Wl9[pl9-169, pl9] against C9,
Wl10[pl10-169, pl10] against C10 and Wl11[pl11-169, pl11] against C11. When it is judged that at least
part of the data in each of windows Wl1, Wl2, Wl3, Wl4, Wl5, Wl6, Wl7, Wl8, Wl9, Wl10 and Wl11 meets
the corresponding predetermined condition C1, C2, C3, C4, C5, C6, C7, C8, C9, C10 and C11,
respectively, the current potential cut-point kl is a data stream cut-point. When at least part of
the data in any one of windows Wl1, Wl2, Wl3, Wl4, Wl5, Wl6, Wl7, Wl8, Wl9, Wl10 and Wl11 does not
meet the predetermined condition, the next potential cut-point is selected; to distinguish it from
potential cut-points ki, kj and kl, it is denoted km. km is located to the right of kl, and km is
1 byte from kl.
According to the rule preset on deduplication server 103 in the embodiment shown in Figure 13,
11 points are determined for potential cut-point km, namely pm1, pm2, pm3, pm4, pm5, pm6, pm7, pm8,
pm9, pm10 and pm11, and the windows corresponding to these points are Wm1[pm1-169, pm1],
Wm2[pm2-169, pm2], Wm3[pm3-169, pm3], Wm4[pm4-169, pm4], Wm5[pm5-169, pm5], Wm6[pm6-169, pm6],
Wm7[pm7-169, pm7], Wm8[pm8-169, pm8], Wm9[pm9-169, pm9], Wm10[pm10-169, pm10] and
Wm11[pm11-169, pm11], respectively. The distance between pmx and potential cut-point km is dx bytes.
Specifically, pm1 is 0 bytes from km, pm2 is 2 bytes from km, pm3 is 4 bytes from km, pm4 is 6 bytes
from km, pm5 is 8 bytes from km, pm6 is 10 bytes from km, pm7 is 12 bytes from km, pm8 is 14 bytes
from km, pm9 is 16 bytes from km, pm10 is 18 bytes from km and pm11 is 20 bytes from km; pm2, pm3,
pm4, pm5, pm6, pm7, pm8, pm9, pm10 and pm11 all lie in the direction opposite to the data stream
cut-point search direction relative to km.
It is then judged whether at least part of the data in each window meets the corresponding
predetermined condition: Wm1[pm1-169, pm1] against C1, Wm2[pm2-169, pm2] against C2, Wm3[pm3-169, pm3]
against C3, Wm4[pm4-169, pm4] against C4, Wm5[pm5-169, pm5] against C5, Wm6[pm6-169, pm6] against C6,
Wm7[pm7-169, pm7] against C7, Wm8[pm8-169, pm8] against C8, Wm9[pm9-169, pm9] against C9,
Wm10[pm10-169, pm10] against C10 and Wm11[pm11-169, pm11] against C11. When it is judged that at least
part of the data in each of windows Wm1, Wm2, Wm3, Wm4, Wm5, Wm6, Wm7, Wm8, Wm9, Wm10 and Wm11 meets
the corresponding predetermined condition C1, C2, C3, C4, C5, C6, C7, C8, C9, C10 and C11,
respectively, the current potential cut-point km is a data stream cut-point. When at least part of
the data in any one window does not meet the predetermined condition, a jump is performed according
to the scheme described above to obtain the next potential cut-point and to determine whether it is
a data stream cut-point.
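The window test on which all of the above embodiments rely is described in the following paragraph; as a reading aid, it can be sketched roughly as below. This is a minimal sketch under stated assumptions rather than the embodiment's implementation: the names `make_matrix` and `window_meets_condition` and the seed value are hypothetical, the 5 selected bytes are assumed to sit at indices 0, 42, 84, 126 and 168 of a 169-byte window, bits are assumed to be numbered from the most significant bit, and the normal distribution is assumed to be standard normal.

```python
import random
from typing import List

WINDOW = 169          # window size in bytes
STRIDE = 42           # spacing between selected bytes
ROWS, BITS = 255, 8   # 5 selected bytes reused 51 times, 8 bits per byte


def make_matrix(seed: int) -> List[List[float]]:
    """Build the fixed 255 x 8 matrix R of normally distributed numbers h[m][n]."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(BITS)] for _ in range(ROWS)]


def window_meets_condition(window: bytes, R: List[List[float]]) -> bool:
    """Return True when the count K of positive row sums S_m is even."""
    if len(window) != WINDOW:
        raise ValueError("window must be exactly 169 bytes")
    selected = window[0:WINDOW:STRIDE]              # 5 bytes: indices 0, 42, 84, 126, 168
    expanded = selected * (ROWS // len(selected))   # reuse cyclically: 5 * 51 = 255 bytes
    k = 0
    for m, byte in enumerate(expanded):
        s = 0.0
        for n in range(BITS):
            bit = (byte >> (BITS - 1 - n)) & 1      # n-th bit, most significant first
            s += R[m][n] if bit else -R[m][n]       # V = +1 for bit 1, -1 for bit 0
        if s > 0:
            k += 1
    return k % 2 == 0
```

The matrix is built once, for example with `R = make_matrix(2014)`, and then reused unchanged for every window, so that identical window contents always give the same result.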
An embodiment of the present invention provides a method for judging whether at least part of the data
in window Wiz[piz-Az, piz+Bz] meets predetermined condition Cz. In this embodiment, a random function
is used to judge whether at least part of the data in window Wiz[piz-Az, piz+Bz] meets predetermined
condition Cz. Taking the implementation shown in Fig. 5 as an example, according to the rule preset
on deduplication server 103, point pi1 and the window Wi1[pi1-169, pi1] corresponding to point pi1
are determined for potential cut-point ki, and it is judged whether at least part of the data in
Wi1[pi1-169, pi1] meets predetermined condition C1. As shown in Figure 16, Wi1 denotes window
Wi1[pi1-169, pi1]. To judge whether at least part of the data in Wi1[pi1-169, pi1] meets
predetermined condition C1, 5 bytes are selected; in Figure 16, "■" denotes 1 selected byte, and two
adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused cyclically 51 times,
giving 255 bytes in total, to increase randomness. Each byte consists of 8 bits, denoted
am,1 ... am,8, which are the 1st to 8th bits of the m-th of the 255 bytes. The bits corresponding to
the 255 bytes can therefore be expressed as:

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & & \vdots \\ a_{255,1} & a_{255,2} & \cdots & a_{255,8} \end{pmatrix}$$

When am,n = 1, Vam,n = 1; when am,n = 0, Vam,n = -1, where am,n denotes any one of am,1 ... am,8.
Applying this conversion between am,n and Vam,n to the bits corresponding to the 255 bytes gives
matrix Va, which can be expressed as:

$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}} \\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}} \\ \vdots & \vdots & & \vdots \\ V_{a_{255,1}} & V_{a_{255,2}} & \cdots & V_{a_{255,8}} \end{pmatrix}$$

A large number of random numbers are chosen to form a matrix; once formed, the matrix of random
numbers remains unchanged. For example, 255*8 random numbers are selected from random numbers obeying
a specific distribution (a normal distribution is taken as the example here) to form matrix R:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,8} \end{pmatrix}$$

The elements of the m-th row of matrix Va are multiplied by the random numbers of the m-th row of
matrix R and the products are summed to obtain one value, specifically
Sam = Vam,1*hm,1 + Vam,2*hm,2 + ... + Vam,8*hm,8. Sa1, Sa2, ... to Sa255 are obtained in this way, and
the number K of values among Sa1, Sa2, ... to Sa255 that meet a specific condition (greater than 0 is
taken as the example here) is counted. Since matrix R obeys a normal distribution, Sam, like matrix
R, still obeys a normal distribution. According to probability theory, the probability that a
normally distributed random number is greater than 0 is 1/2, so each of Sa1, Sa2, ... to Sa255 is
greater than 0 with probability 1/2, and K therefore follows a binomial distribution:

$$P(K = k) = \binom{255}{k} \left(\frac{1}{2}\right)^{k} \left(\frac{1}{2}\right)^{255-k}$$

According to the statistical result, it is judged whether the number K of values among Sa1, Sa2, ...
to Sa255 that are greater than 0 is even. The probability that a binomially distributed random
number is even is 1/2, so K meets the condition with probability 1/2. When K is even, at least part
of the data in Wi1[pi1-169, pi1] meets predetermined condition C1; when K is odd, at least part of
the data in Wi1[pi1-169, pi1] does not meet predetermined condition C1. C1 here means that the number
K of values among Sa1, Sa2, ... to Sa255, obtained in the manner described above, that are greater
than 0 is even.
In the embodiment shown in Fig. 5, windows Wi1[pi1-169, pi1], Wi2[pi2-169, pi2], Wi3[pi3-169, pi3],
Wi4[pi4-169, pi4], Wi5[pi5-169, pi5], Wi6[pi6-169, pi6], Wi7[pi7-169, pi7], Wi8[pi8-169, pi8],
Wi9[pi9-169, pi9], Wi10[pi10-169, pi10] and Wi11[pi11-169, pi11] all have the same size, namely
169 bytes, and the manner of judging whether at least part of the data in a window meets the
predetermined condition is also the same for every window; see the description above of judging
whether at least part of the data in Wi1[pi1-169, pi1] meets predetermined condition C1. Therefore,
as shown in Figure 16, a different marker denotes 1 byte selected when judging whether at least part
of the data in window Wi2[pi2-169, pi2] meets predetermined condition C2, and two adjacent selected
bytes are 42 bytes apart. The 5 selected bytes are reused cyclically 51 times, giving 255 bytes in
total, to increase randomness. Each byte consists of 8 bits, denoted bm,1 ... bm,8, which are the 1st
to 8th bits of the m-th of the 255 bytes. The bits corresponding to the 255 bytes can therefore be
expressed as:

$$\begin{pmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,8} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,8} \\ \vdots & \vdots & & \vdots \\ b_{255,1} & b_{255,2} & \cdots & b_{255,8} \end{pmatrix}$$

When bm,n = 1, Vbm,n = 1; when bm,n = 0, Vbm,n = -1, where bm,n denotes any one of bm,1 ... bm,8.
Applying this conversion between bm,n and Vbm,n to the bits corresponding to the 255 bytes gives
matrix Vb, which can be expressed as:

$$V_b = \begin{pmatrix} V_{b_{1,1}} & V_{b_{1,2}} & \cdots & V_{b_{1,8}} \\ V_{b_{2,1}} & V_{b_{2,2}} & \cdots & V_{b_{2,8}} \\ \vdots & \vdots & & \vdots \\ V_{b_{255,1}} & V_{b_{255,2}} & \cdots & V_{b_{255,8}} \end{pmatrix}$$

The manner of judging whether at least part of the data in Wi1[pi1-169, pi1] meets the predetermined
condition is the same as the manner of judging whether at least part of the data in window
Wi2[pi2-169, pi2] meets the predetermined condition, so the same matrix R is used:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,8} \end{pmatrix}$$

The elements of the m-th row of matrix Vb are multiplied by the random numbers of the m-th row of
matrix R and the products are summed to obtain one value, specifically
Sbm = Vbm,1*hm,1 + Vbm,2*hm,2 + ... + Vbm,8*hm,8. Sb1, Sb2, ... to Sb255 are obtained in this way, and
the number K of values among Sb1, Sb2, ... to Sb255 that meet a specific condition (greater than 0 is
taken as the example here) is counted. Since matrix R obeys a normal distribution, Sbm, like matrix
R, still obeys a normal distribution. According to probability theory, the probability that a
normally distributed random number is greater than 0 is 1/2, so each of Sb1, Sb2, ... to Sb255 is
greater than 0 with probability 1/2, and K therefore follows a binomial distribution:

$$P(K = k) = \binom{255}{k} \left(\frac{1}{2}\right)^{k} \left(\frac{1}{2}\right)^{255-k}$$

According to the statistical result,
Judge Sb1、Sb2... to Sb255Value more than 0 number K whether be even number, binomial distribution with
Machine number be the probability of even number for for 1/2, so K meets condition with the probability of 1/2.When K is even number
Time, show Wi2[pi2-169,pi2In], at least part of data meet predetermined condition C2;When K is strange
During number, show Wi2[pi2-169,pi2In], at least part of data are unsatisfactory for predetermined condition C2, C here2
I.e. refer to the S obtained according to aforesaid wayb1、Sb2... to Sb255Value more than 0 number K be even number.
In embodiment shown in Fig. 3, Wi2[pi2-169,pi2In], at least part of data meet predetermined condition
C2。
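The judgment described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names, the seed, the mapping of window $[p-169, p]$ to the slice `data[p-169:p]`, and the convention that bit 1 is the least significant bit are all assumptions introduced here.

```python
import random

STRIDE = 42        # distance between adjacent selected bytes
N_SELECTED = 5     # bytes selected per window
REPEAT = 51        # the 5 bytes are reused 51 times -> 255 bytes


def make_matrix_r(seed=0, rows=N_SELECTED * REPEAT, cols=8):
    """Build the fixed 255x8 matrix R of normally distributed numbers.

    The seed is a hypothetical parameter; the text only requires that R
    stays unchanged once it has been formed.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def select_bytes(data, p):
    """Select 5 bytes, 42 apart, from the 169-byte window ending at offset p.

    Assumption: the window [p-169, p] is the slice data[p-169:p].
    """
    return [data[p - 1 - i * STRIDE] for i in range(N_SELECTED)]


def count_positive(selected, matrix_r):
    """Return K, the number of row sums S_m that are greater than 0."""
    expanded = selected * REPEAT          # 255 bytes in total
    k = 0
    for m, byte in enumerate(expanded):
        # Bit value 1 maps to +1 and bit value 0 maps to -1 (matrix V).
        s_m = sum((1 if (byte >> n) & 1 else -1) * matrix_r[m][n]
                  for n in range(8))
        if s_m > 0:
            k += 1
    return k


def window_meets_condition(data, p, matrix_r):
    """The window meets its predetermined condition when K is even."""
    return count_positive(select_bytes(data, p), matrix_r) % 2 == 0
```

Because R is generated once and then reused, the same window content always produces the same K, which is what allows the same judgment to be repeated consistently for every window.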
Therefore, as shown in Figure 16, the corresponding symbol denotes 1 byte selected when judging whether at least part of the data in window $W_{i3}[p_{i3}-169, p_{i3}]$ meets predetermined condition $C_3$, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused cyclically 51 times to obtain 255 bytes in total, which increases randomness. Then the method used for judging windows $W_{i1}[p_{i1}-169, p_{i1}]$ and $W_{i2}[p_{i2}-169, p_{i2}]$ is applied to judge whether at least part of the data in $W_{i3}[p_{i3}-169, p_{i3}]$ meets predetermined condition $C_3$. In the embodiment shown in Fig. 5, at least part of the data in $W_{i3}[p_{i3}-169, p_{i3}]$ meets predetermined condition $C_3$.
As shown in Figure 16, the corresponding symbol denotes 1 byte selected when judging whether at least part of the data in window $W_{i4}[p_{i4}-169, p_{i4}]$ meets predetermined condition $C_4$, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused cyclically 51 times to obtain 255 bytes in total, which increases randomness. Then the method used for judging windows $W_{i1}[p_{i1}-169, p_{i1}]$, $W_{i2}[p_{i2}-169, p_{i2}]$ and $W_{i3}[p_{i3}-169, p_{i3}]$ is applied to judge whether at least part of the data in $W_{i4}[p_{i4}-169, p_{i4}]$ meets predetermined condition $C_4$. In the embodiment shown in Fig. 5, at least part of the data in $W_{i4}[p_{i4}-169, p_{i4}]$ meets predetermined condition $C_4$.
As shown in Figure 16, the corresponding symbol denotes 1 byte selected when judging whether at least part of the data in window $W_{i5}[p_{i5}-169, p_{i5}]$ meets predetermined condition $C_5$, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused cyclically 51 times to obtain 255 bytes in total, which increases randomness. Then the method used for judging windows $W_{i1}[p_{i1}-169, p_{i1}]$, $W_{i2}[p_{i2}-169, p_{i2}]$, $W_{i3}[p_{i3}-169, p_{i3}]$ and $W_{i4}[p_{i4}-169, p_{i4}]$ is applied to judge whether at least part of the data in $W_{i5}[p_{i5}-169, p_{i5}]$ meets predetermined condition $C_5$. In the embodiment shown in Fig. 5, at least part of the data in $W_{i5}[p_{i5}-169, p_{i5}]$ does not meet predetermined condition $C_5$.
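The byte selection can be illustrated with a small sketch. The description gives the Figure 16 sequence numbers 169, 127, 85, 43 and 1 for the bytes selected in $W_{i1}$, and 170, 128, 86, 44 and 2 for those selected in $W_{i2}$. The function below reproduces these two lists; extending the same one-byte shift to the remaining windows is an assumption made here for illustration, and the function name is hypothetical.

```python
def selected_sequence_numbers(z, window=169, stride=42, count=5):
    """Return the 1-based sequence numbers of the bytes selected for window z.

    Assumption: window z (z = 1, 2, ...) is shifted by z - 1 bytes relative
    to window 1, as the numbers given for the first two windows suggest.
    """
    last = window + (z - 1)              # sequence number of the last byte
    return [last - i * stride for i in range(count)]


for z in (1, 2):
    print(z, selected_sequence_numbers(z))
```

With a 169-byte window and a spacing of 42 bytes, the 5 selected bytes span the whole window, from its last byte back to its first.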
When at least part of the data in $W_{i5}[p_{i5}-169, p_{i5}]$ does not meet predetermined condition $C_5$, 11 bytes are skipped from point $p_{i5}$ along the search direction for data stream cut points, and the next potential cut point $k_j$ is obtained at the end position of the 11th byte. As shown in Fig. 6, according to the rule preset on deduplication server 103, point $p_{j1}$ is determined for potential cut point $k_j$, and point $p_{j1}$ corresponds to window $W_{j1}[p_{j1}-169, p_{j1}]$. The manner of judging whether at least part of the data in window $W_{j1}[p_{j1}-169, p_{j1}]$ meets predetermined condition $C_1$ is the same as the manner of judging whether at least part of the data in window $W_{i1}[p_{i1}-169, p_{i1}]$ meets predetermined condition $C_1$. Therefore, as shown in Figure 17, $W_{j1}$ denotes window $W_{j1}[p_{j1}-169, p_{j1}]$; to judge whether at least part of the data in $W_{j1}[p_{j1}-169, p_{j1}]$ meets predetermined condition $C_1$, 5 bytes are selected. In Figure 17, "■" denotes 1 selected byte, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused cyclically 51 times to obtain 255 bytes in total, which increases randomness. Each byte consists of 8 bits, denoted $a'_{m,1}, \ldots, a'_{m,8}$, which represent bit 1 to bit 8 of the m-th of the 255 bytes. The bits corresponding to the 255 bytes can therefore be expressed as:

$$\begin{pmatrix} a'_{1,1} & a'_{1,2} & \cdots & a'_{1,8} \\ a'_{2,1} & a'_{2,2} & \cdots & a'_{2,8} \\ \vdots & \vdots & & \vdots \\ a'_{255,1} & a'_{255,2} & \cdots & a'_{255,8} \end{pmatrix}$$

When $a'_{m,n} = 1$, $V'_{am,n} = 1$; when $a'_{m,n} = 0$, $V'_{am,n} = -1$, where $a'_{m,n}$ denotes any one of $a'_{m,1}, \ldots, a'_{m,8}$. Converting the bits of the 255 bytes according to this relationship between $a'_{m,n}$ and $V'_{am,n}$ yields the matrix $V'_a$, which can be expressed as:

$$V'_a = \begin{pmatrix} V'_{a1,1} & V'_{a1,2} & \cdots & V'_{a1,8} \\ V'_{a2,1} & V'_{a2,2} & \cdots & V'_{a2,8} \\ \vdots & \vdots & & \vdots \\ V'_{a255,1} & V'_{a255,2} & \cdots & V'_{a255,8} \end{pmatrix}$$

The manner of judging whether at least part of the data in window $W_{j1}[p_{j1}-169, p_{j1}]$ meets the predetermined condition is the same as the manner of judging whether at least part of the data in window $W_{i1}[p_{i1}-169, p_{i1}]$ meets the predetermined condition, so the same matrix $R = (h_{m,n})$, $m = 1, \ldots, 255$, $n = 1, \ldots, 8$, is used. Row m of matrix $V'_a$ is multiplied element by element with the random numbers in row m of matrix R, and the products are summed to obtain one value, specifically $S'_{am} = V'_{am,1} h_{m,1} + V'_{am,2} h_{m,2} + \cdots + V'_{am,8} h_{m,8}$. In this way $S'_{a1}, S'_{a2}, \ldots, S'_{a255}$ are obtained, and the number K of values among them that meet the specific condition (greater than 0 in this example) is counted. Because matrix R obeys the normal distribution, $S'_{am}$, like matrix R, also obeys the normal distribution. According to probability theory, a normally distributed random number (with mean 0) is greater than 0 with probability 1/2, so each of $S'_{a1}, S'_{a2}, \ldots, S'_{a255}$ is greater than 0 with probability 1/2, and K obeys the binomial distribution:

$$P(K = k) = \binom{255}{k} \left(\tfrac{1}{2}\right)^{k} \left(\tfrac{1}{2}\right)^{255-k}, \quad k = 0, 1, \ldots, 255$$

Based on the counting result, it is determined whether the number K of values greater than 0 among $S'_{a1}, S'_{a2}, \ldots, S'_{a255}$ is even. The probability that a random number obeying this binomial distribution is even is 1/2, so K meets the condition with probability 1/2. When K is even, at least part of the data in $W_{j1}[p_{j1}-169, p_{j1}]$ meets predetermined condition $C_1$; when K is odd, at least part of the data in $W_{j1}[p_{j1}-169, p_{j1}]$ does not meet predetermined condition $C_1$.
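The search with an 11-byte jump can be sketched as the loop below. It is only an illustration under assumptions: `window_ok` is a hypothetical callback that stands for any of the window judgments in this description, and the placement of the 11 points relative to the potential cut point (one byte apart, with the last point at the cut point itself) is assumed here because the preset rule on the deduplication server is defined elsewhere.

```python
def find_cut_point(data, start, window_ok, n_windows=11, window=169, jump=11):
    """Return the first potential cut point at or after `start` for which
    all windows meet their conditions, or None if the data is exhausted.

    window_ok(data, p, z) is a hypothetical callback that judges window z,
    i.e. the slice data[p - window:p], against predetermined condition C_z.
    """
    if start < window + n_windows - 1:
        raise ValueError("start must leave room for every window")
    k = start
    while k <= len(data):
        failed_point = None
        for z in range(1, n_windows + 1):
            p = k - (n_windows - z)      # assumed placement of point p_z
            if not window_ok(data, p, z):
                failed_point = p
                break
        if failed_point is None:
            return k                     # every window met its condition
        # Skip 11 bytes from the failing point, in the search direction,
        # to obtain the next potential cut point.
        k = failed_point + jump
    return None
```

Under the assumed placement, a failure at window z moves the candidate forward by z bytes, so the loop always makes progress and terminates.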
The manner of judging whether at least part of the data in $W_{i2}[p_{i2}-169, p_{i2}]$ meets predetermined condition $C_2$ is the same as the manner of judging whether at least part of the data in $W_{j2}[p_{j2}-169, p_{j2}]$ meets predetermined condition $C_2$. Therefore, as shown in Figure 17, the corresponding symbol denotes 1 byte selected when judging whether at least part of the data in window $W_{j2}[p_{j2}-169, p_{j2}]$ meets predetermined condition $C_2$, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused cyclically 51 times to obtain 255 bytes in total, which increases randomness. Each byte consists of 8 bits, denoted $b'_{m,1}, \ldots, b'_{m,8}$, which represent bit 1 to bit 8 of the m-th of the 255 bytes. The bits corresponding to the 255 bytes can therefore be expressed as:

$$\begin{pmatrix} b'_{1,1} & b'_{1,2} & \cdots & b'_{1,8} \\ b'_{2,1} & b'_{2,2} & \cdots & b'_{2,8} \\ \vdots & \vdots & & \vdots \\ b'_{255,1} & b'_{255,2} & \cdots & b'_{255,8} \end{pmatrix}$$

When $b'_{m,n} = 1$, $V'_{bm,n} = 1$; when $b'_{m,n} = 0$, $V'_{bm,n} = -1$, where $b'_{m,n}$ denotes any one of $b'_{m,1}, \ldots, b'_{m,8}$. Converting the bits of the 255 bytes according to this relationship between $b'_{m,n}$ and $V'_{bm,n}$ yields the matrix $V'_b$, which can be expressed as:

$$V'_b = \begin{pmatrix} V'_{b1,1} & V'_{b1,2} & \cdots & V'_{b1,8} \\ V'_{b2,1} & V'_{b2,2} & \cdots & V'_{b2,8} \\ \vdots & \vdots & & \vdots \\ V'_{b255,1} & V'_{b255,2} & \cdots & V'_{b255,8} \end{pmatrix}$$

Because windows $W_{i2}[p_{i2}-169, p_{i2}]$ and $W_{j2}[p_{j2}-169, p_{j2}]$ are judged in the same manner, the same matrix $R = (h_{m,n})$, $m = 1, \ldots, 255$, $n = 1, \ldots, 8$, is still used. Row m of matrix $V'_b$ is multiplied element by element with the random numbers in row m of matrix R, and the products are summed to obtain one value, specifically $S'_{bm} = V'_{bm,1} h_{m,1} + V'_{bm,2} h_{m,2} + \cdots + V'_{bm,8} h_{m,8}$. In this way $S'_{b1}, S'_{b2}, \ldots, S'_{b255}$ are obtained, and the number K of values among them that meet the specific condition (greater than 0 in this example) is counted. Because matrix R obeys the normal distribution, $S'_{bm}$, like matrix R, also obeys the normal distribution. According to probability theory, a normally distributed random number (with mean 0) is greater than 0 with probability 1/2, so each of $S'_{b1}, S'_{b2}, \ldots, S'_{b255}$ is greater than 0 with probability 1/2, and K obeys the binomial distribution:

$$P(K = k) = \binom{255}{k} \left(\tfrac{1}{2}\right)^{k} \left(\tfrac{1}{2}\right)^{255-k}, \quad k = 0, 1, \ldots, 255$$

Based on the counting result, it is determined whether the number K of values greater than 0 among $S'_{b1}, S'_{b2}, \ldots, S'_{b255}$ is even. The probability that a random number obeying this binomial distribution is even is 1/2, so K meets the condition with probability 1/2. When K is even, at least part of the data in $W_{j2}[p_{j2}-169, p_{j2}]$ meets predetermined condition $C_2$; when K is odd, at least part of the data in $W_{j2}[p_{j2}-169, p_{j2}]$ does not meet predetermined condition $C_2$. Similarly, the manner of judging whether at least part of the data in $W_{i3}[p_{i3}-169, p_{i3}]$ meets predetermined condition $C_3$ is the same as the manner of judging whether at least part of the data in $W_{j3}[p_{j3}-169, p_{j3}]$ meets predetermined condition $C_3$. Likewise, whether at least part of the data in $W_{jz}[p_{jz}-169, p_{jz}]$ meets predetermined condition $C_z$ is judged in the same way for $z = 4, 5, \ldots, 11$, and details are not repeated here.
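The statement that K is even with probability 1/2 can be checked with a short derivation. It assumes, as the binomial model in the text implies, that the n = 255 events "the m-th sum is greater than 0" are independent and that each occurs with probability p. Adding the binomial expansions of $((1-p)+p)^n$ and $((1-p)-p)^n$ cancels the odd terms:

$$P(K \text{ even}) = \sum_{k \text{ even}} \binom{n}{k} p^{k} (1-p)^{n-k} = \frac{\big((1-p)+p\big)^{n} + \big((1-p)-p\big)^{n}}{2} = \frac{1 + (1-2p)^{n}}{2}$$

With $p = 1/2$ the term $(1-2p)^n$ vanishes for every $n \ge 1$, so $P(K \text{ even}) = 1/2$, in agreement with the value used above for n = 255.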
Still taking the embodiment shown in Fig. 5 as an example, a method is provided for judging whether at least part of the data in window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ meets predetermined condition $C_z$. In this embodiment, a random function is used to make this judgment. According to the rule preset on deduplication server 103, point $p_{i1}$ and its corresponding window $W_{i1}[p_{i1}-169, p_{i1}]$ are determined for potential cut point $k_i$, and it is judged whether at least part of the data in $W_{i1}[p_{i1}-169, p_{i1}]$ meets predetermined condition $C_1$. As shown in Figure 16, $W_{i1}$ denotes window $W_{i1}[p_{i1}-169, p_{i1}]$; to judge whether at least part of the data in $W_{i1}[p_{i1}-169, p_{i1}]$ meets predetermined condition $C_1$, 5 bytes are selected. In Figure 16, "■" denotes 1 selected byte, and adjacent selected bytes "■" are 42 bytes apart. One implementation applies a hash function to the 5 selected bytes. The values computed by the hash function follow a fixed uniform distribution. If the value computed by the hash function is even, it is determined that at least part of the data in $W_{i1}[p_{i1}-169, p_{i1}]$ meets predetermined condition $C_1$; that is, $C_1$ means that the value computed by the hash function in the manner described above is even. Therefore, the probability that at least part of the data in $W_{i1}[p_{i1}-169, p_{i1}]$ meets the predetermined condition is 1/2. In the embodiment shown in Fig. 5, the hash function is likewise used to judge whether at least part of the data in $W_{i2}[p_{i2}-169, p_{i2}]$ meets predetermined condition $C_2$, whether at least part of the data in $W_{i3}[p_{i3}-169, p_{i3}]$ meets predetermined condition $C_3$, whether at least part of the data in $W_{i4}[p_{i4}-169, p_{i4}]$ meets predetermined condition $C_4$, and whether at least part of the data in $W_{i5}[p_{i5}-169, p_{i5}]$ meets predetermined condition $C_5$. For the specific implementation, refer to the description, for the embodiment shown in Fig. 5, of using the hash function to judge whether at least part of the data in $W_{i1}[p_{i1}-169, p_{i1}]$ meets predetermined condition $C_1$; details are not repeated here.
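The hash-based judgment can be sketched as follows. The text does not name a particular hash function, so SHA-256 from Python's standard `hashlib` module is used purely as a stand-in; the function names and the mapping of window $[p-169, p]$ to `data[p-169:p]` are likewise assumptions.

```python
import hashlib

STRIDE = 42        # distance between adjacent selected bytes
N_SELECTED = 5     # bytes selected per window


def select_bytes(data: bytes, p: int) -> bytes:
    """Select 5 bytes, 42 apart, from the 169-byte window ending at offset p."""
    return bytes(data[p - 1 - i * STRIDE] for i in range(N_SELECTED))


def hash_is_even(selected: bytes) -> bool:
    """Hash the selected bytes and test whether the hash value is even.

    SHA-256 is a stand-in for the unspecified hash function. The parity of
    the digest read as a big-endian integer is the parity of its last byte.
    """
    digest = hashlib.sha256(selected).digest()
    return digest[-1] % 2 == 0


def window_meets_condition(data: bytes, p: int) -> bool:
    """The window meets its predetermined condition when the hash is even."""
    return hash_is_even(select_bytes(data, p))
```

Since the hash output is close to uniformly distributed, roughly half of all windows would be expected to pass, which matches the probability of 1/2 stated above.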
When at least part of the data in $W_{i5}[p_{i5}-169, p_{i5}]$ does not meet predetermined condition $C_5$, 11 bytes are skipped from point $p_{i5}$ along the search direction for data stream cut points, and the current potential cut point $k_j$ is obtained at the end position of the 11th byte. As shown in Fig. 6, according to the rule preset on deduplication server 103, point $p_{j1}$ is determined for potential cut point $k_j$, and point $p_{j1}$ corresponds to window $W_{j1}[p_{j1}-169, p_{j1}]$. The manner of judging whether at least part of the data in window $W_{j1}[p_{j1}-169, p_{j1}]$ meets predetermined condition $C_1$ is the same as the manner of judging whether at least part of the data in window $W_{i1}[p_{i1}-169, p_{i1}]$ meets predetermined condition $C_1$. Therefore, as shown in Figure 17, $W_{j1}$ denotes window $W_{j1}[p_{j1}-169, p_{j1}]$; to judge whether at least part of the data in $W_{j1}[p_{j1}-169, p_{j1}]$ meets predetermined condition $C_1$, 5 bytes are selected. In Figure 17, "■" denotes 1 selected byte, and adjacent selected bytes "■" are 42 bytes apart. The hash function is applied to the 5 bytes selected from window $W_{j1}[p_{j1}-169, p_{j1}]$; if the value obtained is even, at least part of the data in $W_{j1}[p_{j1}-169, p_{j1}]$ meets predetermined condition $C_1$.
In Figure 17, the manner of judging whether at least part of the data in $W_{i2}[p_{i2}-169, p_{i2}]$ meets predetermined condition $C_2$ is the same as the manner of judging whether at least part of the data in $W_{j2}[p_{j2}-169, p_{j2}]$ meets predetermined condition $C_2$. Therefore, as shown in Figure 17, the corresponding symbol denotes 1 byte selected when judging whether at least part of the data in window $W_{j2}[p_{j2}-169, p_{j2}]$ meets predetermined condition $C_2$, and adjacent selected bytes are 42 bytes apart. The hash function is applied to the 5 selected bytes; if the value obtained is even, at least part of the data in $W_{j2}[p_{j2}-169, p_{j2}]$ meets predetermined condition $C_2$.
In Figure 17, the manner of judging whether at least part of the data in $W_{i3}[p_{i3}-169, p_{i3}]$ meets predetermined condition $C_3$ is the same as the manner of judging whether at least part of the data in $W_{j3}[p_{j3}-169, p_{j3}]$ meets predetermined condition $C_3$. Therefore, as shown in Figure 17, the corresponding symbol denotes 1 byte selected when judging whether at least part of the data in window $W_{j3}[p_{j3}-169, p_{j3}]$ meets predetermined condition $C_3$, and adjacent selected bytes are 42 bytes apart. The hash function is applied to the 5 selected bytes; if the value obtained is even, at least part of the data in $W_{j3}[p_{j3}-169, p_{j3}]$ meets predetermined condition $C_3$.
In Figure 17, the manner of judging whether at least part of the data in $W_{j4}[p_{j4}-169, p_{j4}]$ meets predetermined condition $C_4$ is the same as the manner of judging whether at least part of the data in window $W_{i4}[p_{i4}-169, p_{i4}]$ meets predetermined condition $C_4$. Therefore, as shown in Figure 17, the corresponding symbol denotes 1 byte selected when judging whether at least part of the data in window $W_{j4}[p_{j4}-169, p_{j4}]$ meets predetermined condition $C_4$, and adjacent selected bytes are 42 bytes apart. The hash function is applied to the 5 selected bytes; if the value obtained is even, at least part of the data in $W_{j4}[p_{j4}-169, p_{j4}]$ meets predetermined condition $C_4$.
Using the same method, it is judged whether at least part of the data in $W_{jz}[p_{jz}-169, p_{jz}]$ meets predetermined condition $C_z$ for $z = 5, 6, \ldots, 11$; details are not repeated here.
Taking the embodiment shown in Fig. 5 as an example, a method is provided for judging whether at least part of the data in window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ meets predetermined condition $C_z$. In this embodiment, a random function is used to make this judgment. According to the rule preset on deduplication server 103, point $p_{i1}$ and its corresponding window $W_{i1}[p_{i1}-169, p_{i1}]$ are determined for potential cut point $k_i$, and it is judged whether at least part of the data in $W_{i1}[p_{i1}-169, p_{i1}]$ meets predetermined condition $C_1$. As shown in Figure 16, $W_{i1}$ denotes window $W_{i1}[p_{i1}-169, p_{i1}]$; to judge whether at least part of the data in $W_{i1}[p_{i1}-169, p_{i1}]$ meets predetermined condition $C_1$, 5 bytes are selected. In Figure 16, the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 each denote 1 selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43 and 1 are each converted into a decimal value, denoted $a_1$, $a_2$, $a_3$, $a_4$ and $a_5$ respectively. Because 1 byte consists of 8 bits, each byte "■" taken as a numerical value satisfies $0 \le a_r \le 255$ for any $a_r$ among $a_1$, $a_2$, $a_3$, $a_4$ and $a_5$. The values $a_1$, $a_2$, $a_3$, $a_4$ and $a_5$ form a 1*5 matrix. 256*5 random numbers are selected from random numbers that obey the binomial distribution to form matrix R, expressed as:

$$R = \begin{pmatrix} h_{0,1} & h_{0,2} & \cdots & h_{0,5} \\ h_{1,1} & h_{1,2} & \cdots & h_{1,5} \\ \vdots & \vdots & & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,5} \end{pmatrix}$$

The corresponding value is looked up in matrix R according to the value of $a_1$ and the column in which it is located; for example, if $a_1 = 36$ and $a_1$ is in column 1, the value of $h_{36,1}$ is looked up. Likewise, if $a_2 = 48$ and $a_2$ is in column 2, the value of $h_{48,2}$ is looked up; if $a_3 = 26$ and $a_3$ is in column 3, the value of $h_{26,3}$ is looked up; if $a_4 = 26$ and $a_4$ is in column 4, the value of $h_{26,4}$ is looked up; and if $a_5 = 88$ and $a_5$ is in column 5, the value of $h_{88,5}$ is looked up. Then $S_1 = h_{36,1} + h_{48,2} + h_{26,3} + h_{26,4} + h_{88,5}$. Because matrix R obeys the binomial distribution, $S_1$ also obeys the binomial distribution. When $S_1$ is even, at least part of the data in $W_{i1}[p_{i1}-169, p_{i1}]$ meets predetermined condition $C_1$; when $S_1$ is odd, at least part of the data in $W_{i1}[p_{i1}-169, p_{i1}]$ does not meet predetermined condition $C_1$. The probability that $S_1$ is even is 1/2, and $C_1$ means that $S_1$ calculated in the manner described above is even. In the embodiment shown in Fig. 5, at least part of the data in $W_{i1}[p_{i1}-169, p_{i1}]$ meets predetermined condition $C_1$.
As shown in Figure 16, the corresponding symbol denotes 1 byte selected when judging whether at least part of the data in window $W_{i2}[p_{i2}-169, p_{i2}]$ meets predetermined condition $C_2$; in Figure 16 these bytes have sequence numbers 170, 128, 86, 44 and 2, and adjacent selected bytes are 42 bytes apart. The bytes with sequence numbers 170, 128, 86, 44 and 2 are each converted into a decimal value, denoted $b_1$, $b_2$, $b_3$, $b_4$ and $b_5$ respectively. Because 1 byte consists of 8 bits, each byte taken as a numerical value satisfies $0 \le b_r \le 255$ for any $b_r$ among $b_1$, $b_2$, $b_3$, $b_4$ and $b_5$. The values $b_1$, $b_2$, $b_3$, $b_4$ and $b_5$ form a 1*5 matrix. In this embodiment, the manner of judging whether at least part of the data meets the predetermined condition is the same for $W_{i1}$ and $W_{i2}$, so matrix R is still used. The corresponding value is looked up in matrix R according to the value of $b_1$ and the column in which it is located; for example, if $b_1 = 66$ and $b_1$ is in column 1, the value of $h_{66,1}$ is looked up. Likewise, if $b_2 = 48$ and $b_2$ is in column 2, the value of $h_{48,2}$ is looked up; if $b_3 = 99$ and $b_3$ is in column 3, the value of $h_{99,3}$ is looked up; if $b_4 = 26$ and $b_4$ is in column 4, the value of $h_{26,4}$ is looked up; and if $b_5 = 90$ and $b_5$ is in column 5, the value of $h_{90,5}$ is looked up. Then $S_2 = h_{66,1} + h_{48,2} + h_{99,3} + h_{26,4} + h_{90,5}$. Because matrix R obeys the binomial distribution, $S_2$ also obeys the binomial distribution. When $S_2$ is even, at least part of the data in $W_{i2}[p_{i2}-169, p_{i2}]$ meets predetermined condition $C_2$; when $S_2$ is odd, at least part of the data in $W_{i2}[p_{i2}-169, p_{i2}]$ does not meet predetermined condition $C_2$. The probability that $S_2$ is even is 1/2. In the embodiment shown in Fig. 5, at least part of the data in $W_{i2}[p_{i2}-169, p_{i2}]$ meets predetermined condition $C_2$. Using the same rule, it is judged whether at least part of the data in $W_{iz}[p_{iz}-169, p_{iz}]$ meets predetermined condition $C_z$ for each of $z = 3, 4, \ldots, 11$.
In the embodiment shown in Fig. 5, at least part of the data in $W_{i5}[p_{i5}-169, p_{i5}]$ does not meet predetermined condition $C_5$, so 11 bytes are skipped from point $p_{i5}$ along the search direction for data stream cut points, and the current potential cut point $k_j$ is obtained at the end position of the 11th byte. As shown in Fig. 6, according to the rule preset on deduplication server 103, point $p_{j1}$ is determined for potential cut point $k_j$, and point $p_{j1}$ corresponds to window $W_{j1}[p_{j1}-169, p_{j1}]$. The manner of judging whether at least part of the data in window $W_{j1}[p_{j1}-169, p_{j1}]$ meets predetermined condition $C_1$ is the same as the manner of judging whether at least part of the data in window $W_{i1}[p_{i1}-169, p_{i1}]$ meets predetermined condition $C_1$. Therefore, as shown in Figure 17, $W_{j1}$ denotes window $W_{j1}[p_{j1}-169, p_{j1}]$. To judge whether at least part of the data in $W_{j1}[p_{j1}-169, p_{j1}]$ meets predetermined condition $C_1$, 5 bytes are selected: in Figure 17, the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 each denote 1 selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43 and 1 are each converted into a decimal value, denoted $a'_1$, $a'_2$, $a'_3$, $a'_4$ and $a'_5$ respectively. Because 1 byte consists of 8 bits, each byte "■" taken as a numerical value satisfies $0 \le a'_r \le 255$ for any $a'_r$ among $a'_1$, $a'_2$, $a'_3$, $a'_4$ and $a'_5$. The values $a'_1$, $a'_2$, $a'_3$, $a'_4$ and $a'_5$ form a 1*5 matrix. Because window $W_{j1}[p_{j1}-169, p_{j1}]$ is judged against predetermined condition $C_1$ in the same manner as window $W_{i1}[p_{i1}-169, p_{i1}]$, the same matrix $R = (h_{v,c})$, $v = 0, \ldots, 255$, $c = 1, \ldots, 5$, is still used.
The corresponding value is looked up in matrix R according to the value of $a'_1$ and the column in which it is located; for example, if $a'_1 = 16$ and $a'_1$ is in column 1, the value of $h_{16,1}$ is looked up. Likewise, if $a'_2 = 98$ and $a'_2$ is in column 2, the value of $h_{98,2}$ is looked up; if $a'_3 = 56$ and $a'_3$ is in column 3, the value of $h_{56,3}$ is looked up; if $a'_4 = 36$ and $a'_4$ is in column 4, the value of $h_{36,4}$ is looked up; and if $a'_5 = 99$ and $a'_5$ is in column 5, the value of $h_{99,5}$ is looked up. Then $S'_1 = h_{16,1} + h_{98,2} + h_{56,3} + h_{36,4} + h_{99,5}$. Because matrix R obeys the binomial distribution, $S'_1$ also obeys the binomial distribution. When $S'_1$ is even, at least part of the data in $W_{j1}[p_{j1}-169, p_{j1}]$ meets predetermined condition $C_1$; when $S'_1$ is odd, at least part of the data in $W_{j1}[p_{j1}-169, p_{j1}]$ does not meet predetermined condition $C_1$. The probability that $S'_1$ is even is 1/2.
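The table lookup described above can be sketched as follows. This is a minimal illustration under assumptions: the function names, the seed and the number of trials behind each binomially distributed entry are hypothetical, and the columns are indexed from 0 in the code, whereas the text numbers them from 1.

```python
import random


def make_table(seed=0, rows=256, cols=5, trials=16):
    """Build a fixed 256x5 table of binomially distributed integers.

    Each entry counts the successes in `trials` fair coin flips. The seed
    and the number of trials are hypothetical parameters.
    """
    rng = random.Random(seed)
    return [[sum(rng.random() < 0.5 for _ in range(trials))
             for _ in range(cols)]
            for _ in range(rows)]


def lookup_sum(selected, table):
    """Return S: the byte in column c selects row `value` of column c."""
    return sum(table[value][col] for col, value in enumerate(selected))


def window_meets_condition(selected, table):
    """The window meets its predetermined condition when S is even."""
    return lookup_sum(selected, table) % 2 == 0


R = make_table(seed=7)
a = [36, 48, 26, 26, 88]          # example byte values a1..a5 from the text
print(lookup_sum(a, R), window_meets_condition(a, R))
```

A lookup replaces the per-bit multiplications of the earlier method with five table reads and four additions per window, and because the table is fixed, identical bytes always give the same result.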
The manner of judging whether at least part of the data in $W_{i2}[p_{i2}-169, p_{i2}]$ meets predetermined condition $C_2$ is the same as the manner of judging whether at least part of the data in $W_{j2}[p_{j2}-169, p_{j2}]$ meets predetermined condition $C_2$. Therefore, as shown in Figure 17, the corresponding symbol denotes 1 byte selected when judging whether at least part of the data in window $W_{j2}[p_{j2}-169, p_{j2}]$ meets predetermined condition $C_2$; these bytes have sequence numbers 170, 128, 86, 44 and 2, and adjacent selected bytes are 42 bytes apart. The bytes with sequence numbers 170, 128, 86, 44 and 2 are each converted into a decimal value, denoted $b'_1$, $b'_2$, $b'_3$, $b'_4$ and $b'_5$ respectively. Because 1 byte consists of 8 bits, each byte taken as a numerical value satisfies $0 \le b'_r \le 255$ for any $b'_r$ among $b'_1$, $b'_2$, $b'_3$, $b'_4$ and $b'_5$. The values $b'_1$, $b'_2$, $b'_3$, $b'_4$ and $b'_5$ form a 1*5 matrix. The same matrix R is used as for judging whether at least part of the data in window $W_{i2}[p_{i2}-169, p_{i2}]$ meets predetermined condition $C_2$. The corresponding value is looked up in matrix R according to the value of $b'_1$ and the column in which it is located; for example, if $b'_1 = 210$ and $b'_1$ is in column 1, the value of $h_{210,1}$ is looked up. Likewise, if $b'_2 = 156$ and $b'_2$ is in column 2, the value of $h_{156,2}$ is looked up; if $b'_3 = 144$ and $b'_3$ is in column 3, the value of $h_{144,3}$ is looked up; if $b'_4 = 60$ and $b'_4$ is in column 4, the value of $h_{60,4}$ is looked up; and if $b'_5 = 90$ and $b'_5$ is in column 5, the value of $h_{90,5}$ is looked up. Then $S'_2 = h_{210,1} + h_{156,2} + h_{144,3} + h_{60,4} + h_{90,5}$. The judgment condition is the same as for $S_2$: when $S'_2$ is even, at least part of the data in $W_{j2}[p_{j2}-169, p_{j2}]$ meets predetermined condition $C_2$; when $S'_2$ is odd, at least part of the data in $W_{j2}[p_{j2}-169, p_{j2}]$ does not meet predetermined condition $C_2$. The probability that $S'_2$ is even is 1/2.
Similarly, the manner of judging whether at least part of the data in $W_{i3}[p_{i3}-169, p_{i3}]$ meets predetermined condition $C_3$ is the same as the manner of judging whether at least part of the data in $W_{j3}[p_{j3}-169, p_{j3}]$ meets predetermined condition $C_3$. Likewise, whether at least part of the data in $W_{jz}[p_{jz}-169, p_{jz}]$ meets predetermined condition $C_z$ is judged in the same way for $z = 4, 5, \ldots, 11$, and details are not repeated here.
As a example by the embodiment shown in Fig. 5, it is provided that one judges window Wiz[piz-Az,piz
+BzIn], whether at least part of data meet predetermined condition CzMethod, in the present embodiment use
Random function judges window Wiz[piz-Az,piz+BzIn], whether at least part of data meet predetermined
Condition Cz, according to the rule preset on duplicate removal server 103, for potential cut-point kiDetermine
Point pi1And pi1Corresponding window Wi1[pi1-169,pi1], it is judged that Wi1[pi1-169,pi1In] at least partly
Whether data meet predetermined condition C1, as shown in figure 16, Wi1Represent window Wi1[pi1-169,pi1],
For judging Wi1[pi1-169,pi1In], whether at least part of data meet predetermined condition C1, select 5
Byte, in Figure 16, the byte " ■ " of serial number 169,127,85,43 and 1 represents selection respectively
1 byte, adjacent two select bytes between differ 42 bytes.By serial number 169,
127, the byte " ■ " of 85,43 and 1 is converted into a decimal value respectively, represents respectively
For a1、a2、a3、a4And a5.Because 1 byte is formed by 8, so each byte " ■ "
As numerical value, then an a1、a2、a3、a4And a5In any one asIt is satisfied by 0≤as≤255。
a1、a2、a3、a4And a5The matrix of composition 1*5.Select from the random number obeying binomial distribution
Select 256*5 random number, form matrix R, be expressed as: From clothes
From the random number of binomial distribution, select 256*5 random number, form matrix G, be expressed as:
According to the value of a_1 and the column in which it is located, for example a_1 = 36 in column 1, the value h_{36,1} is looked up in matrix R and the value g_{36,1} is looked up in matrix G. Likewise, for a_2 = 48 in column 2, h_{48,2} and g_{48,2} are looked up; for a_3 = 26 in column 3, h_{26,3} and g_{26,3}; for a_4 = 26 in column 4, h_{26,4} and g_{26,4}; and for a_5 = 88 in column 5, h_{88,5} and g_{88,5}. S_1h = h_{36,1} + h_{48,2} + h_{26,3} + h_{26,4} + h_{88,5}; because matrix R obeys a binomial distribution, S_1h also obeys a binomial distribution. S_1g = g_{36,1} + g_{48,2} + g_{26,3} + g_{26,4} + g_{88,5}; because matrix G obeys a binomial distribution, S_1g also obeys a binomial distribution. When at least one of S_1h and S_1g is even, at least part of the data in W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1; when S_1h and S_1g are both odd, at least part of the data in W_i1[p_i1-169, p_i1] does not satisfy predetermined condition C_1. C_1 thus denotes that at least one of S_1h and S_1g obtained by the above method is even. Because S_1h and S_1g both obey a binomial distribution, the probability that S_1h is even is 1/2 and the probability that S_1g is even is 1/2, so the probability that at least one of S_1h and S_1g is even is 1 - 1/4 = 3/4. Therefore, the probability that at least part of the data in W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1 is 3/4. In the embodiment shown in Fig. 5,
at least part of the data in W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1. In the embodiment shown in Fig. 5, the windows W_i1[p_i1-169, p_i1], W_i2[p_i2-169, p_i2], W_i3[p_i3-169, p_i3], W_i4[p_i4-169, p_i4], W_i5[p_i5-169, p_i5], W_i6[p_i6-169, p_i6], W_i7[p_i7-169, p_i7], W_i8[p_i8-169, p_i8], W_i9[p_i9-169, p_i9], W_i10[p_i10-169, p_i10] and W_i11[p_i11-169, p_i11] all have the same size, namely 169 bytes, and the manner of judging whether at least part of the data in a window satisfies the predetermined condition is also the same for every window; see the description above of judging whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1. Therefore, as shown in Fig. 16, a separate marker denotes each byte selected when judging whether at least part of the data in window W_i2[p_i2-169, p_i2] satisfies predetermined condition C_2; in Fig. 16 these bytes are numbered 170, 128, 86, 44 and 2, and adjacent selected bytes are 42 bytes apart. The bytes numbered 170, 128, 86, 44 and 2 are each converted into a decimal value, denoted b_1, b_2, b_3, b_4 and b_5 respectively. Because 1 byte consists of 8 bits, each byte is treated as one value, so any b_s among b_1, b_2, b_3, b_4 and b_5 satisfies 0 ≤ b_s ≤ 255. b_1, b_2, b_3, b_4 and b_5 form a 1*5 matrix. In this embodiment the manner of judging is the same for every window, so the same matrices R and G are used. According to the value of b_1 and the column in which it is located, for example b_1 = 66 in column 1, the value h_{66,1} is looked up in matrix R and the value g_{66,1} is looked up in matrix G. Likewise, for b_2 = 48 in column 2, h_{48,2} and g_{48,2} are looked up; for b_3 = 99 in column 3, h_{99,3} and g_{99,3}; for b_4 = 26 in column 4, h_{26,4} and g_{26,4}; and for b_5 = 90 in column 5, h_{90,5} and g_{90,5}. S_2h = h_{66,1} + h_{48,2} + h_{99,3} + h_{26,4} + h_{90,5}; because matrix R obeys a binomial distribution, S_2h also obeys a binomial distribution. S_2g = g_{66,1} + g_{48,2} + g_{99,3} + g_{26,4} + g_{90,5}; because matrix G obeys a binomial distribution, S_2g also obeys a binomial distribution. When at least one of S_2h and S_2g is even, at least part of the data in W_i2[p_i2-169, p_i2] satisfies predetermined condition C_2; when S_2h and S_2g are both odd, at least part of the data in W_i2[p_i2-169, p_i2] does not satisfy predetermined condition C_2. The probability that at least one of S_2h and S_2g is even is 3/4. In the embodiment shown in Fig. 5, at least part of the data in W_i2[p_i2-169, p_i2] satisfies predetermined condition C_2. The same rule is used to judge, in turn, whether at least part of the data in each of W_i3[p_i3-169, p_i3] through W_i11[p_i11-169, p_i11] satisfies the corresponding predetermined condition C_3 through C_11. In the embodiment shown in Fig. 5, at least part of the data in W_i5[p_i5-169, p_i5] does not satisfy predetermined condition C_5, so 11 bytes are skipped from point p_i5 along the data stream split point search direction, and the current potential split point k_j is obtained at the end position of the 11th byte, as shown in Fig. 6. According to the rule preset on deduplication server 103,
point p_j1 and its corresponding window W_j1[p_j1-169, p_j1] are determined for potential split point k_j. The manner of judging whether at least part of the data in window W_j1[p_j1-169, p_j1] satisfies predetermined condition C_1 is the same as the manner of judging whether at least part of the data in window W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1. Therefore, as shown in Fig. 17, W_j1 denotes window W_j1[p_j1-169, p_j1]. To judge whether at least part of the data in W_j1[p_j1-169, p_j1] satisfies predetermined condition C_1, the bytes "■" numbered 169, 127, 85, 43 and 1 in Fig. 17 each denote one selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" numbered 169, 127, 85, 43 and 1 are each converted into a decimal value, denoted a_1', a_2', a_3', a_4' and a_5' respectively. Because 1 byte consists of 8 bits, each byte "■" is treated as one value, so any a_s' among a_1', a_2', a_3', a_4' and a_5' satisfies 0 ≤ a_s' ≤ 255. a_1', a_2', a_3', a_4' and a_5' form a 1*5 matrix. The same matrices R and G are used as when judging whether at least part of the data in window W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1. According to the value of a_1' and the column in which it is located, for example a_1' = 16 in column 1, the value h_{16,1} is looked up in matrix R and the value g_{16,1} is looked up in matrix G. Likewise, for a_2' = 98 in column 2, h_{98,2} and g_{98,2} are looked up; for a_3' = 56 in column 3, h_{56,3} and g_{56,3}; for a_4' = 36 in column 4, h_{36,4} and g_{36,4}; and for a_5' = 99 in column 5, h_{99,5} and g_{99,5}. S_1h' = h_{16,1} + h_{98,2} + h_{56,3} + h_{36,4} + h_{99,5}; because matrix R obeys a binomial distribution, S_1h' also obeys a binomial distribution. S_1g' = g_{16,1} + g_{98,2} + g_{56,3} + g_{36,4} + g_{99,5}; because matrix G obeys a binomial distribution, S_1g' also obeys a binomial distribution. When at least one of S_1h' and S_1g' is even, at least part of the data in W_j1[p_j1-169, p_j1] satisfies predetermined condition C_1; when S_1h' and S_1g' are both odd, at least part of the data in W_j1[p_j1-169, p_j1] does not satisfy predetermined condition C_1. The probability that at least one of S_1h' and S_1g' is even is 3/4.
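The table-lookup check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text says the entries of R and G obey a binomial distribution but gives no parameters, so the `trials` count, the success probability 0.5, the seeds, and the helper names (`make_table`, `selected_bytes`, `condition_met`) are all hypothetical. Columns are indexed 0 to 4 in the code, where the text counts 1 to 5.

```python
import random

WINDOW = 169        # window size in bytes, as in the Fig. 5 embodiment
STRIDE = 42         # distance between adjacent selected bytes
NUM_SELECTED = 5    # bytes numbered 169, 127, 85, 43 and 1

def make_table(seed, rows=256, cols=NUM_SELECTED, trials=32):
    """Build a rows x cols table of binomially distributed integers.

    `seed`, `trials` and p = 0.5 are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    return [[sum(rng.random() < 0.5 for _ in range(trials)) for _ in range(cols)]
            for _ in range(rows)]

# The tables are generated once and reused for every window.
R = make_table(seed=1)
G = make_table(seed=2)

def selected_bytes(window):
    """Return the bytes numbered 169, 127, 85, 43 and 1 (1-based numbering)."""
    return [window[WINDOW - 1 - s * STRIDE] for s in range(NUM_SELECTED)]

def condition_met(window):
    """True when at least one of the two lookup sums is even."""
    values = selected_bytes(window)
    s_h = sum(R[v][col] for col, v in enumerate(values))
    s_g = sum(G[v][col] for col, v in enumerate(values))
    return s_h % 2 == 0 or s_g % 2 == 0
```

Because R and G are fixed, two windows holding the same bytes always give the same answer, which is what lets the check at k_j reproduce the check at k_i.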
The manner of judging whether at least part of the data in W_i2[p_i2-169, p_i2] satisfies predetermined condition C_2 is the same as the manner of judging whether at least part of the data in W_j2[p_j2-169, p_j2] satisfies predetermined condition C_2. Therefore, as shown in Fig. 17, a separate marker denotes each byte selected when judging whether at least part of the data in window W_j2[p_j2-169, p_j2] satisfies predetermined condition C_2. In Fig. 17 these bytes are numbered 170, 128, 86, 44 and 2, and adjacent selected bytes are 42 bytes apart. The bytes numbered 170, 128, 86, 44 and 2 are each converted into a decimal value, denoted b_1', b_2', b_3', b_4' and b_5' respectively. Because 1 byte consists of 8 bits, each byte is treated as one value, so any b_s' among b_1', b_2', b_3', b_4' and b_5' satisfies 0 ≤ b_s' ≤ 255. b_1', b_2', b_3', b_4' and b_5' form a 1*5 matrix. The same matrices R and G are used as when judging whether at least part of the data in window W_i2[p_i2-169, p_i2] satisfies predetermined condition C_2. According to the value of b_1' and the column in which it is located, for example b_1' = 210 in column 1, the value h_{210,1} is looked up in matrix R and the value g_{210,1} is looked up in matrix G. Likewise, for b_2' = 156 in column 2, h_{156,2} and g_{156,2} are looked up; for b_3' = 144 in column 3, h_{144,3} and g_{144,3}; for b_4' = 60 in column 4, h_{60,4} and g_{60,4}; and for b_5' = 90 in column 5, h_{90,5} and g_{90,5}. S_2h' = h_{210,1} + h_{156,2} + h_{144,3} + h_{60,4} + h_{90,5}, and S_2g' = g_{210,1} + g_{156,2} + g_{144,3} + g_{60,4} + g_{90,5}. When at least one of S_2h' and S_2g' is even, at least part of the data in W_j2[p_j2-169, p_j2] satisfies predetermined condition C_2; when S_2h' and S_2g' are both odd, at least part of the data in W_j2[p_j2-169, p_j2] does not satisfy predetermined condition C_2. The probability that at least one of S_2h' and S_2g' is even is 3/4.
Similarly, the manner of judging whether at least part of the data in W_i3[p_i3-169, p_i3] satisfies predetermined condition C_3 is the same as the manner of judging whether at least part of the data in W_j3[p_j3-169, p_j3] satisfies predetermined condition C_3. The same applies to judging whether at least part of the data in W_j4[p_j4-169, p_j4] satisfies C_4, in W_j5[p_j5-169, p_j5] satisfies C_5, in W_j6[p_j6-169, p_j6] satisfies C_6, in W_j7[p_j7-169, p_j7] satisfies C_7, in W_j8[p_j8-169, p_j8] satisfies C_8, in W_j9[p_j9-169, p_j9] satisfies C_9, in W_j10[p_j10-169, p_j10] satisfies C_10, and in W_j11[p_j11-169, p_j11] satisfies C_11. Details are not repeated here.
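The 3/4 figure used above, and what it implies when all 11 windows of the Fig. 5 embodiment must pass, can be checked with exact fractions. The sketch assumes that the two sums are independent and that the 11 window checks are independent of one another; the text does not state the second assumption.

```python
from fractions import Fraction

p_even = Fraction(1, 2)               # probability that one sum is even (from the text)
p_window = 1 - (1 - p_even) ** 2      # at least one of the two sums is even
p_all = p_window ** 11                # all 11 windows satisfied, if independent

print(p_window)   # 3/4
print(p_all)      # 177147/4194304, roughly 0.042
```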
Taking the embodiment shown in Fig. 5 as an example, a method is provided for judging whether at least part of the data in window W_iz[p_iz-A_z, p_iz+B_z] satisfies predetermined condition C_z. In this embodiment a random function is used for the judgement. According to the rule preset on deduplication server 103, point p_i1 and its corresponding window W_i1[p_i1-169, p_i1] are determined for potential split point k_i, and it is judged whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1. As shown in Fig. 16, W_i1 denotes window W_i1[p_i1-169, p_i1]. To make the judgement, 5 bytes are selected: in Fig. 16 the bytes "■" numbered 169, 127, 85, 43 and 1 each denote one selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" numbered 169, 127, 85, 43 and 1 are regarded in turn as 40 bits, denoted a_1, a_2, a_3, a_4 ... a_40. For any a_t among a_1, a_2, a_3, a_4 ... a_40, V_at = -1 when a_t = 0 and V_at = 1 when a_t = 1; V_a1, V_a2, V_a3, V_a4 ... V_a40 are generated according to this correspondence between a_t and V_at.
40 random numbers are selected from random numbers that obey a normal distribution, denoted h_1, h_2, h_3, h_4 ... h_40. S_a = V_a1*h_1 + V_a2*h_2 + V_a3*h_3 + V_a4*h_4 + ... + V_a40*h_40. Because h_1, h_2, h_3, h_4 ... h_40 obey a normal distribution, S_a also obeys a normal distribution. When S_a is positive, at least part of the data in W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1; when S_a is negative or 0, at least part of the data in W_i1[p_i1-169, p_i1] does not satisfy predetermined condition C_1. The probability that S_a is positive is 1/2. In the embodiment shown in Fig. 5, at least part of the data in W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1. As shown in Fig. 16, a separate marker denotes each byte selected when judging whether at least part of the data in window W_i2[p_i2-169, p_i2] satisfies predetermined condition C_2; in Fig. 16 these bytes are numbered 170, 128, 86, 44 and 2, and adjacent selected bytes are 42 bytes apart. The bytes numbered 170, 128, 86, 44 and 2 are regarded in turn as 40 bits, denoted b_1, b_2, b_3, b_4 ... b_40. For any b_t among b_1, b_2, b_3, b_4 ... b_40, V_bt = -1 when b_t = 0 and V_bt = 1 when b_t = 1; V_b1, V_b2, V_b3, V_b4 ... V_b40 are generated according to this correspondence between b_t and V_bt. The manner of judging whether at least part of the data in window W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1 is the same as the manner of judging whether at least part of the data in window W_i2[p_i2-169, p_i2] satisfies predetermined condition C_2; therefore the same random numbers h_1, h_2, h_3, h_4 ... h_40 are used, and S_b = V_b1*h_1 + V_b2*h_2 + V_b3*h_3 + V_b4*h_4 + ... + V_b40*h_40.
Because h_1, h_2, h_3, h_4 ... h_40 obey a normal distribution, S_b also obeys a normal distribution. When S_b is positive, at least part of the data in W_i2[p_i2-169, p_i2] satisfies predetermined condition C_2; when S_b is negative or 0, at least part of the data in W_i2[p_i2-169, p_i2] does not satisfy predetermined condition C_2. The probability that S_b is positive is 1/2. In the embodiment shown in Fig. 5, at least part of the data in W_i2[p_i2-169, p_i2] satisfies predetermined condition C_2. The same rule is used to judge, in turn, whether at least part of the data in each of W_i3[p_i3-169, p_i3] through W_i11[p_i11-169, p_i11] satisfies the corresponding predetermined condition C_3 through C_11. In the embodiment shown in Fig. 5, at least part of the data in W_i5[p_i5-169, p_i5] does not satisfy predetermined condition C_5, so 11 bytes are skipped from point p_i5 along the data stream split point search direction, and the current potential split point k_j is obtained at the end position of the 11th byte, as shown in Fig. 6. According to the rule preset on deduplication server 103, point p_j1 and its corresponding window W_j1[p_j1-169, p_j1] are determined for potential split point k_j. The manner of judging whether at least part of the data in window W_j1[p_j1-169, p_j1] satisfies predetermined condition C_1 is the same as the manner of judging whether at least part of the data in window W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1. Therefore, as shown in Fig. 17, W_j1 denotes window W_j1[p_j1-169, p_j1]. To judge whether at least part of the data in W_j1[p_j1-169, p_j1] satisfies predetermined condition C_1, 5 bytes are selected: in Fig. 17 the bytes "■" numbered 169, 127, 85, 43 and 1 each denote one selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" numbered 169, 127, 85, 43 and 1 are regarded in turn as 40 bits, denoted a_1', a_2', a_3', a_4' ... a_40'. For any a_t' among them, V_at' = -1 when a_t' = 0 and V_at' = 1 when a_t' = 1; V_a1', V_a2', V_a3', V_a4' ... V_a40' are generated according to this correspondence between a_t' and V_at'. Because the manner of judging W_j1[p_j1-169, p_j1] against C_1 is the same as the manner of judging W_i1[p_i1-169, p_i1] against C_1, the same random numbers h_1, h_2, h_3, h_4 ... h_40 are used. S_a' = V_a1'*h_1 + V_a2'*h_2 + V_a3'*h_3 + V_a4'*h_4 + ... + V_a40'*h_40. Because h_1, h_2, h_3, h_4 ... h_40 obey a normal distribution, S_a' also obeys a normal distribution. When S_a' is positive, at least part of the data in W_j1[p_j1-169, p_j1] satisfies predetermined condition C_1; when S_a' is negative or 0, at least part of the data in W_j1[p_j1-169, p_j1] does not satisfy predetermined condition C_1. The probability that S_a' is positive is 1/2.
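A minimal Python sketch of the signed-sum check described above is given below. It assumes a standard normal distribution, a fixed seed, and most-significant-bit-first ordering when the 5 selected bytes are expanded into 40 bits; none of these details is specified in the text, and the helper names are hypothetical.

```python
import random

_rng = random.Random(2014)                      # hypothetical fixed seed
H = [_rng.gauss(0.0, 1.0) for _ in range(40)]   # h_1 .. h_40, chosen once and reused

def bits_of(selected):
    """Expand 5 selected bytes into 40 bits (MSB first, an assumption)."""
    return [(byte >> (7 - i)) & 1 for byte in selected for i in range(8)]

def condition_met(selected):
    """True when S = sum(V_t * h_t) is positive; zero or negative means not satisfied."""
    signs = [1 if bit else -1 for bit in bits_of(selected)]   # bit 1 -> +1, bit 0 -> -1
    s = sum(v * h for v, h in zip(signs, H))
    return s > 0
```

Flipping every bit negates every term of the sum, so an input and its bitwise complement give opposite answers unless the sum is exactly 0. That symmetry is one way to see why the probability of a positive sum is 1/2 for uniformly random input bits.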
The manner of judging whether at least part of the data in W_i2[p_i2-169, p_i2] satisfies predetermined condition C_2 is the same as the manner of judging whether at least part of the data in W_j2[p_j2-169, p_j2] satisfies predetermined condition C_2. Therefore, as shown in Fig. 17, a separate marker denotes each byte selected when judging whether at least part of the data in window W_j2[p_j2-169, p_j2] satisfies predetermined condition C_2. In Fig. 17 these bytes are numbered 170, 128, 86, 44 and 2, and adjacent selected bytes are 42 bytes apart. The bytes numbered 170, 128, 86, 44 and 2 are regarded in turn as 40 bits, denoted b_1', b_2', b_3', b_4' ... b_40'. For any b_t' among them, V_bt' = -1 when b_t' = 0 and V_bt' = 1 when b_t' = 1; V_b1', V_b2', V_b3', V_b4' ... V_b40' are generated according to this correspondence between b_t' and V_bt'. Because the two judgements are made in the same manner, the same random numbers h_1, h_2, h_3, h_4 ... h_40 are used, and S_b' = V_b1'*h_1 + V_b2'*h_2 + V_b3'*h_3 + V_b4'*h_4 + ... + V_b40'*h_40. Because h_1, h_2, h_3, h_4 ... h_40 obey a normal distribution, S_b' also obeys a normal distribution. When S_b' is positive, at least part of the data in W_j2[p_j2-169, p_j2] satisfies predetermined condition C_2; when S_b' is negative or 0, at least part of the data in W_j2[p_j2-169, p_j2] does not satisfy predetermined condition C_2. The probability that S_b' is positive is 1/2.
Similarly, the manner of judging whether at least part of the data in W_i3[p_i3-169, p_i3] satisfies predetermined condition C_3 is the same as the manner of judging whether at least part of the data in W_j3[p_j3-169, p_j3] satisfies predetermined condition C_3. The same applies to judging whether at least part of the data in W_j4[p_j4-169, p_j4] satisfies C_4, in W_j5[p_j5-169, p_j5] satisfies C_5, in W_j6[p_j6-169, p_j6] satisfies C_6, in W_j7[p_j7-169, p_j7] satisfies C_7, in W_j8[p_j8-169, p_j8] satisfies C_8, in W_j9[p_j9-169, p_j9] satisfies C_9, in W_j10[p_j10-169, p_j10] satisfies C_10, and in W_j11[p_j11-169, p_j11] satisfies C_11. Details are not repeated here.
Still taking the embodiment shown in Fig. 5 as an example, a method is provided for judging whether at least part of the data in window W_iz[p_iz-A_z, p_iz+B_z] satisfies predetermined condition C_z. In this embodiment a random function is used for the judgement. According to the rule preset on deduplication server 103, point p_i1 and its corresponding window W_i1[p_i1-169, p_i1] are determined for potential split point k_i, and it is judged whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1. As shown in Fig. 16, W_i1 denotes window W_i1[p_i1-169, p_i1]. To make the judgement, 5 bytes are selected: in Fig. 16 the bytes "■" numbered 169, 127, 85, 43 and 1 each denote one selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" numbered 169, 127, 85, 43 and 1 are converted into one decimal number in the range 0 to 2^40-1. A uniform random number generator is used to generate one designated value for each decimal number in the range 0 to 2^40-1, and the correspondence R between each decimal number in that range and its designated value is recorded. Once assigned, the designated value corresponding to a decimal number no longer changes, and the designated values obey a uniform distribution. If the designated value is even, at least part of the data in W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1; if the designated value is odd, at least part of the data in W_i1[p_i1-169, p_i1] does not satisfy predetermined condition C_1. C_1 thus denotes that the designated value obtained by the above method is even. Because the probability that a uniformly distributed random number is even is 1/2, the probability that at least part of the data in W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1 is 1/2. In the embodiment shown in Fig. 5, the same rule is used to judge whether at least part of the data in W_i2[p_i2-169, p_i2] satisfies predetermined condition C_2, whether at least part of the data in W_i3[p_i3-169, p_i3] satisfies C_3, whether at least part of the data in W_i4[p_i4-169, p_i4] satisfies C_4, and whether at least part of the data in W_i5[p_i5-169, p_i5] satisfies C_5. Details are not repeated here.
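The fixed correspondence R between 40-bit numbers and designated values can be sketched as below. Storing 2^40 entries is impractical for an illustration, so this sketch fills the table lazily and derives each value from a generator seeded with the number itself; that substitution, the big-endian byte order, and the 32-bit value range are assumptions, not details from the text.

```python
import random

_designated = {}   # lazily filled part of the correspondence R

def to_number(selected):
    """Interpret the 5 selected bytes as one integer in 0 .. 2**40 - 1."""
    return int.from_bytes(bytes(selected), "big")

def designated_value(number):
    """Return the fixed value assigned to `number`, creating it on first use."""
    if number not in _designated:
        # Seeding with the number keeps the value stable across lookups and runs.
        _designated[number] = random.Random(number).randrange(2**32)
    return _designated[number]

def condition_met(selected):
    """True when the designated value for the selected bytes is even."""
    return designated_value(to_number(selected)) % 2 == 0
```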
When at least part of the data in W_i5[p_i5-169, p_i5] does not satisfy predetermined condition C_5, 11 bytes are skipped from point p_i5 along the data stream split point search direction, and the current potential split point k_j is obtained at the end position of the 11th byte, as shown in Fig. 6. According to the rule preset on deduplication server 103, point p_j1 and its corresponding window W_j1[p_j1-169, p_j1] are determined for potential split point k_j. The manner of judging whether at least part of the data in window W_j1[p_j1-169, p_j1] satisfies predetermined condition C_1 is the same as the manner of judging whether at least part of the data in window W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1; therefore the same correspondence R between each decimal number in the range 0 to 2^40-1 and its designated value is used. As shown in Fig. 17, W_j1 denotes window W_j1[p_j1-169, p_j1]. To judge whether at least part of the data in W_j1[p_j1-169, p_j1] satisfies predetermined condition C_1, 5 bytes are selected: in Fig. 17 each "■" denotes one selected byte, and adjacent selected bytes "■" are 42 bytes apart. The bytes "■" numbered 169, 127, 85, 43 and 1 are converted into one decimal number, and the designated value corresponding to this decimal number is looked up in R. If the designated value is even, at least part of the data in W_j1[p_j1-169, p_j1] satisfies predetermined condition C_1; if the designated value is odd, at least part of the data in W_j1[p_j1-169, p_j1] does not satisfy predetermined condition C_1. Because the probability that a uniformly distributed random number is even is 1/2, the probability that at least part of the data in W_j1[p_j1-169, p_j1] satisfies predetermined condition C_1 is 1/2. Similarly, the manner of judging W_i2[p_i2-169, p_i2] against C_2 is the same as the manner of judging W_j2[p_j2-169, p_j2] against C_2, and the manner of judging W_i3[p_i3-169, p_i3] against C_3 is the same as the manner of judging W_j3[p_j3-169, p_j3] against C_3. The same applies to judging whether at least part of the data in W_j4[p_j4-169, p_j4] satisfies C_4, in W_j5[p_j5-169, p_j5] satisfies C_5, in W_j6[p_j6-169, p_j6] satisfies C_6, in W_j7[p_j7-169, p_j7] satisfies C_7, in W_j8[p_j8-169, p_j8] satisfies C_8, in W_j9[p_j9-169, p_j9] satisfies C_9, in W_j10[p_j10-169, p_j10] satisfies C_10, and in W_j11[p_j11-169, p_j11] satisfies C_11. Details are not repeated here.
The deduplication server 103 in the embodiment of the present invention shown in Fig. 1 refers to a device capable of implementing the technical solutions described in the embodiments of the present invention. As shown in Fig. 18, it generally includes a central processing unit, a main memory and an input/output interface, which communicate with one another. The main memory stores executable instructions, and the central processing unit executes the executable instructions stored in the main memory to perform specific functions, such as finding data stream split points as described in the embodiments of the present invention with reference to Figs. 4 to 17. Accordingly, as shown in Fig. 19 and in line with the embodiments of the present invention shown in Figs. 4 to 17, a rule is preset on deduplication server 103. The rule is: for a potential split point k, determine M points p_x, the window W_x[p_x-A_x, p_x+B_x] corresponding to each point p_x, and the predetermined condition C_x corresponding to window W_x[p_x-A_x, p_x+B_x], where x takes the consecutive natural numbers from 1 to M, M ≥ 2, and A_x and B_x are integers. Deduplication server 103 includes a determining unit 1901 and a judging processing unit 1902. The determining unit 1901 is configured to perform step a): a) determine, according to the rule, point p_iz and the window W_iz[p_iz-A_z, p_iz+B_z] corresponding to point p_iz for the current potential split point k_i, where i and z are integers and 1 ≤ z ≤ M. The judging processing unit 1902 is configured to judge whether at least part of the data in window W_iz[p_iz-A_z, p_iz+B_z] satisfies predetermined condition C_z;
when at least part of the data in window W_iz[p_iz-A_z, p_iz+B_z] does not satisfy predetermined condition C_z, N minimum units U for searching for a data stream split point are skipped from point p_iz along the data stream split point search direction, where N*U is not greater than ‖B_z‖ + max_x(‖A_x‖ + ‖(k_i - p_ix)‖), to obtain a new potential split point, and the determining unit then performs step a) for the new potential split point; and when at least part of the data in each window W_ix[p_ix-A_x, p_ix+B_x] of the M windows of the current potential split point k_i satisfies predetermined condition C_x, the current potential split point k_i is a data stream split point.
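The search procedure stated above can be sketched as a short loop. The sketch makes several assumptions that go beyond this passage: U is 1 byte, every window is [p - a, p] with B = 0, point p_x lies x - 1 bytes before the candidate k, and `condition(window, x)` is a caller-supplied stand-in for the check of C_x. With the Fig. 5 values (M = 11, A = 169, a jump of 11 bytes) the jump stays well inside the bound ‖B_z‖ + max_x(‖A_x‖ + ‖k_i - p_ix‖) = 179.

```python
def find_split_point(data, k, condition, m=11, a=169, jump=11):
    """Return the first split point at or after candidate k, or None.

    Hypothetical layout for this sketch: p_x = k - (x - 1), window = data[p_x - a : p_x].
    """
    if jump < m:
        raise ValueError("jump must be at least m so that the candidate always advances")
    if k < a + m - 1:
        raise ValueError("candidate too close to the start of the data")
    while k <= len(data):
        for x in range(1, m + 1):
            p = k - (x - 1)
            if not condition(data[p - a:p], x):
                k = p + jump        # skip N * U bytes from the failing point p_z
                break
        else:
            return k                # all M windows satisfied their conditions
    return None
```

For example, if the first candidate is k = 200 and the window of the fifth point fails, the failing point is 196 and the next candidate is 207, which mirrors the jump from p_i5 to k_j in Figs. 5 and 6.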
Further, the rule also includes: at least two points p_e and p_f satisfy the conditions A_e = A_f, B_e = B_f and C_e = C_f. Further, the rule also includes: relative to the potential split point k, the at least two points p_e and p_f lie in the direction opposite to the data stream split point search direction. Further, the rule also includes: the distance between the at least two points p_e and p_f is 1 U.
Further, the judging processing unit 1902 is specifically configured to use a random function to judge whether at least part of the data in window W_iz[p_iz-A_z, p_iz+B_z] satisfies predetermined condition C_z. Specifically, the judging processing unit 1902 is configured to use a hash function to judge whether at least part of the data in window W_iz[p_iz-A_z, p_iz+B_z] satisfies predetermined condition C_z. Specifically, the judging processing unit 1902 using a random function to judge whether at least part of the data in window W_iz[p_iz-A_z, p_iz+B_z] satisfies predetermined condition C_z includes:
F bytes are selected in window W_iz[p_iz-A_z, p_iz+B_z] and reused H times, giving F*H bytes in total. Each byte consists of 8 bits, denoted a_{m,1} ... a_{m,8}, which are bits 1 to 8 of the m-th of the F*H bytes, so the bits of the F*H bytes can be written as a matrix with F*H rows and 8 columns whose entry in row m and column n is a_{m,n}. When a_{m,n} = 1, Va_{m,n} = 1, and when a_{m,n} = 0, Va_{m,n} = -1, where a_{m,n} is any one of a_{m,1} ... a_{m,8}. Converting the bits of the F*H bytes according to this relationship between a_{m,n} and Va_{m,n} yields matrix V_a, whose entry in row m and column n is Va_{m,n}. F*H*8 random numbers are selected from random numbers that obey a normal distribution to form matrix R, whose entry in row m and column n is h_{m,n}. The entries in row m of matrix V_a are multiplied by the random numbers in row m of matrix R and the products are summed to give one value, namely S_am = Va_{m,1}*h_{m,1} + Va_{m,2}*h_{m,2} + ... + Va_{m,8}*h_{m,8}. In the same way S_a1, S_a2 ... S_aF*H are obtained. The number K of values among S_a1, S_a2 ... S_aF*H that are greater than 0 is counted; when K is even, at least part of the data in window W_iz[p_iz-A_z, p_iz+B_z] satisfies predetermined condition C_z.
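The matrix form of the check can be sketched as follows. The values of F and H, the seed, the order in which the F bytes are repeated, and the bit order within a byte are hypothetical choices made for the illustration, since the passage leaves them open.

```python
import random

F, H = 5, 4                      # hypothetical values; the text leaves F and H open
_rng = random.Random(7)          # hypothetical fixed seed
# Matrix R: F*H rows of 8 normally distributed numbers, fixed once and reused.
R = [[_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(F * H)]

def condition_met(selected):
    """selected: the F bytes chosen from the window."""
    reused = list(selected) * H              # the F bytes reused H times
    k = 0                                    # number of row sums greater than 0
    for m, byte in enumerate(reused):
        # Row m of V_a: bit 1 -> +1, bit 0 -> -1 (MSB first, an assumption).
        v_row = [1 if (byte >> (7 - n)) & 1 else -1 for n in range(8)]
        s_m = sum(v * h for v, h in zip(v_row, R[m]))
        if s_m > 0:
            k += 1
    return k % 2 == 0                        # satisfied when K is even
```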
Further, the judging processing unit 1902 is configured to: when at least part of the data in window W_iz[p_iz-A_z, p_iz+B_z] does not satisfy predetermined condition C_z, skip N minimum units U for searching for a data stream split point from point p_iz along the data stream split point search direction, to obtain the new potential split point. The determining unit 1901 performs step a) for the new potential split point. According to the rule, the left boundary of the window W_ic[p_ic-A_c, p_ic+B_c] corresponding to point p_ic determined for the new potential split point either coincides with the right boundary of window W_iz[p_iz-A_z, p_iz+B_z] or lies within the range of window W_iz[p_iz-A_z, p_iz+B_z]. Here, the window W_ic[p_ic-A_c, p_ic+B_c] determined for the new potential split point is the window corresponding to the point that ranks first, along the data stream search direction, among the M points determined for the new potential split point according to the rule.
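The boundary condition above is a simple interval test, sketched here. The function name and the sample coordinates are hypothetical, and the example assumes the same point layout as the Fig. 5 embodiment, with U equal to 1 byte and point p_x lying x - 1 bytes before the candidate; that layout is an assumption, not stated in this passage.

```python
def leaves_no_gap(p_z, a_z, b_z, p_c, a_c):
    """True if the left boundary of W_c[p_c - a_c, ...] coincides with the
    right boundary of W_z[p_z - a_z, p_z + b_z] or lies inside W_z."""
    left_c = p_c - a_c
    return p_z - a_z <= left_c <= p_z + b_z

# Fig. 5 style numbers: W_i5 fails at p_i5 = 1000, the search jumps 11 bytes to
# k_j = 1011, and the first point along the search direction is p_j11 = 1001.
print(leaves_no_gap(p_z=1000, a_z=169, b_z=0, p_c=1001, a_c=169))   # True
```

Read this way, the condition presumably ensures that no byte between the failed window and the next examined window is skipped without being covered by some window.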
In the server-based method for finding data stream split points provided by the embodiments of the present invention shown in Figs. 4 to 17, point p_ix and its window W_ix[p_ix-A_x, p_ix+B_x] are determined for potential split point k_i, where x takes the consecutive natural numbers from 1 to M and M ≥ 2. Whether at least part of the data in each of the M windows satisfies predetermined condition C_x may be judged in parallel, or the windows may be judged one after another. Alternatively, window W_i1[p_i1-A_1, p_i1+B_1] may be judged first and, once at least part of its data is found to satisfy predetermined condition C_1, W_i2[p_i2-A_2, p_i2+B_2] is judged against predetermined condition C_2, and so on until at least part of the data in W_iM[p_iM-A_M, p_iM+B_M] is judged to satisfy predetermined condition C_M. The judgement of the other windows in the embodiments is the same and is not repeated.
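The two evaluation orders mentioned above, one window after another with an early stop versus all windows at once, can be contrasted in a short sketch. The function names and the thread-pool approach are illustrative choices only; the text does not say how a parallel judgement would be implemented.

```python
from concurrent.futures import ThreadPoolExecutor

def all_windows_sequential(windows, conditions):
    """Check C_1, C_2, ... in order and stop at the first failure."""
    for x, (window, cond) in enumerate(zip(windows, conditions), start=1):
        if not cond(window):
            return False, x          # x is the index z of the failing window
    return True, None

def all_windows_parallel(windows, conditions, workers=4):
    """Evaluate every window concurrently, then report the first failure."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda wc: wc[1](wc[0]), zip(windows, conditions)))
    for x, ok in enumerate(results, start=1):
        if not ok:
            return False, x
    return True, None
```

The sequential form avoids work after the first failing window, while the parallel form does all M checks but can finish sooner when each check is expensive. Both report the same failing index z, which is what determines the jump to the next candidate.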
In addition, according to the embodiments of the present invention shown in Figs. 4 to 17, in actual application a rule is preset on deduplication server 103. The rule is: for a potential split point k, determine M points p_x, the window W_x[p_x-A_x, p_x+B_x] corresponding to each point p_x, and the predetermined condition C_x corresponding to window W_x[p_x-A_x, p_x+B_x], where x takes the consecutive natural numbers from 1 to M and M ≥ 2. In this preset rule, A_1, A_2, A_3 ... A_M need not all be equal, B_1, B_2, B_3 ... B_M need not all be equal, and C_1, C_2, C_3 ... C_M need not all be identical. In the embodiment shown in Fig. 5, the windows W_i1[p_i1-169, p_i1], W_i2[p_i2-169, p_i2], W_i3[p_i3-169, p_i3], W_i4[p_i4-169, p_i4], W_i5[p_i5-169, p_i5], W_i6[p_i6-169, p_i6], W_i7[p_i7-169, p_i7], W_i8[p_i8-169, p_i8], W_i9[p_i9-169, p_i9], W_i10[p_i10-169, p_i10] and W_i11[p_i11-169, p_i11] all have the same size, namely 169 bytes, and the manner of judging whether at least part of the data in a window satisfies the predetermined condition is also the same for every window; see the description above of judging whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies predetermined condition C_1. In the embodiment shown in Fig. 11, however, the sizes of the windows W_i1[p_i1-169, p_i1], W_i2[p_i2-169, p_i2], W_i3[p_i3-169, p_i3], W_i4[p_i4-169, p_i4], W_i5[p_i5-169, p_i5], W_i6[p_i6-169, p_i6], W_i7[p_i7-169, p_i7], W_i8[p_i8-169, p_i8], W_i9[p_i9-169, p_i9], W_i10[p_i10-169, p_i10] and W_i11[p_i11-182, p_i11] may differ, and the manner of judging whether at least part of the data in a window satisfies the predetermined condition may also differ. In all embodiments, according to the rule preset on deduplication server 103, the manner of judging whether at least part of the data in window W_i1 satisfies predetermined condition C_1 is necessarily the same as the manner of judging whether at least part of the data in window W_j1 satisfies predetermined condition C_1, the manner of judging W_i2 against C_2 is necessarily the same as the manner of judging W_j2 against C_2, and so on, up to the manner of judging window W_iM against predetermined condition C_M, which is necessarily the same as the manner of judging window W_jM against predetermined condition C_M. Details are not repeated here.
Simultaneously according to the embodiment of the present invention shown in Fig. 4 to Figure 17, although all as a example by M=11, but root
According to being actually needed, the value of M is not limited to 11, and those skilled in the art implement according to the present invention
Description in example, determines the value of M.
According to the embodiments of the present invention shown in Fig. 4 to Fig. 17, a rule is preset on the deduplication server 103; k_a, k_i, k_j, k_l and k_m are potential cut points obtained when searching for a cut point along the data stream cut point search direction, and all of them follow this rule. The window W_x[p_x-A_x, p_x+B_x] in the embodiments of the present invention represents a specific range, from which data is selected to judge whether the data satisfies predetermined condition C_x. Specifically, part of the data in the specific range may be selected, or all of the data in the range may be selected, to judge whether the data satisfies predetermined condition C_x. For the concept of a window as used in the embodiments of the present invention, refer to window W_x[p_x-A_x, p_x+B_x]; details are not repeated here.
According to the embodiments of the present invention shown in Fig. 4 to Fig. 17, in the window W_x[p_x-A_x, p_x+B_x], (p_x-A_x) and (p_x+B_x) represent the two boundaries of the window W_x[p_x-A_x, p_x+B_x], where (p_x-A_x) represents the boundary of the window that lies, relative to the point p_x, in the direction opposite to the data stream cut point search direction, and (p_x+B_x) represents the boundary of the window that lies, relative to the point p_x, in the data stream cut point search direction. Specifically, in the embodiments of the present invention, the data stream cut point search direction shown in Fig. 3 to Fig. 15 is from left to right, so (p_x-A_x) represents the boundary of the window W_x[p_x-A_x, p_x+B_x] that lies, relative to the point p_x, in the direction opposite to the search direction (that is, the left boundary), and (p_x+B_x) represents the boundary that lies, relative to the point p_x, in the search direction (that is, the right boundary).
If the data stream cut point search direction shown in Fig. 3 to Fig. 15 is from right to left, (p_x-A_x) represents the boundary of the window W_x[p_x-A_x, p_x+B_x] that lies, relative to the point p_x, in the direction opposite to the search direction (that is, the right boundary), and (p_x+B_x) represents the boundary that lies, relative to the point p_x, in the search direction (that is, the left boundary).
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in the embodiments of the present invention, and the key features of the embodiments, may be combined with other technologies and presented in more complex forms that still contain the key features of the present invention. A backup cut point may be used in a real environment. For example, in one embodiment, according to the rule preset on the deduplication server 103, 11 points p_x are determined for the potential cut point k_i, where x takes the consecutive natural numbers from 1 to 11, together with the window W_x[p_x-A_x, p_x+B_x] corresponding to each p_x and the predetermined condition C_x corresponding to each window W_x[p_x-A_x, p_x+B_x]. When at least part of the data in each window W_x[p_x-A_x, p_x+B_x] of the 11 windows satisfies predetermined condition C_x, the potential cut point k_i is a data stream cut point. When the set maximum data block is exceeded and no cut point has been found, a backup preset rule may be used. The backup preset rule is similar to the rule preset on the deduplication server 103 and is as follows: for the potential cut point k_i, determine 10 points p_x, where x takes the consecutive natural numbers from 1 to 10, together with the window W_x[p_x-A_x, p_x+B_x] corresponding to each p_x and the predetermined condition C_x corresponding to each window W_x[p_x-A_x, p_x+B_x]. When at least part of the data in each window W_x[p_x-A_x, p_x+B_x] of the 10 windows satisfies predetermined condition C_x, the potential cut point k_i is a data stream cut point. When the set maximum data block is exceeded and a data stream cut point has still not been found, the end position of the maximum data block is used as a forced cut point.
A rule is preset on the deduplication server 103, under which M points are determined for a potential cut point k. A potential cut point k does not necessarily have to exist first; the potential cut point k may instead be determined from the M determined points.
An embodiment of the present invention provides a method for searching for a data stream cut point based on a deduplication server. As shown in Fig. 20, the method includes:
A rule is preset on the deduplication server 103. The rule is: for a potential cut point k, determine M windows W_x[k-A_x, k+B_x] and a predetermined condition C_x corresponding to each window W_x[k-A_x, k+B_x], where x takes the consecutive natural numbers from 1 to M, M ≥ 2, and A_x and B_x are integers. In the implementation shown in Fig. 3, regarding the value of M, in one implementation M*U (U being the minimum unit for searching for a data stream cut point) is not greater than the preset maximum distance between two adjacent data stream cut points, that is, the preset maximum data block length. Judge whether at least part of the data in a window W_z[k-A_z, k+B_z] satisfies predetermined condition C_z, where z is an integer, 1 ≤ z ≤ M, and (k-A_z) and (k+B_z) represent the two boundaries of the window W_z. When at least part of the data in any one window W_z[k-A_z, k+B_z] is judged not to satisfy predetermined condition C_z, jump N bytes from the potential cut point k along the data stream cut point search direction, where N ≤ ‖B_z‖ + max_x(‖A_x‖). Here ‖B_z‖ represents the absolute value of B_z in W_z[k-A_z, k+B_z], and max_x(‖A_x‖) represents the maximum absolute value of A_x over the M windows; the principle behind the value of N is described in detail in the embodiments below. When at least part of the data in each window W_x[k-A_x, k+B_x] of the M windows is judged to satisfy predetermined condition C_x, the potential cut point k is a data stream cut point.
Specifically, for the current potential cut point k_i, the following steps are performed according to the rule:
Step 2001: According to the rule, determine the corresponding window W_iz[k_i-A_z, k_i+B_z] for the current potential cut point k_i, where i and z are integers and 1 ≤ z ≤ M.
Step 2002: Judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies predetermined condition C_z.
When at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] does not satisfy the predetermined condition C_z, jump N minimum units U for searching for a data stream cut point from the current potential cut point k_i along the data stream cut point search direction, where N*U is not greater than ‖B_z‖ + max_x(‖A_x‖), to obtain a new potential cut point, and perform step 2001.
When at least part of the data in each window W_ix[k_i-A_x, k_i+B_x] of the M windows of the current potential cut point k_i satisfies predetermined condition C_x, the current potential cut point k_i is a data stream cut point.
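The two steps above can be sketched as a simple loop. This is a minimal illustration under stated assumptions rather than the patented implementation: the minimum search unit U is taken to be one byte, the search direction is left to right, the condition test `satisfies` and the jump policy `jump` are supplied by the caller, and the names `find_cut_point`, `satisfies` and `jump` are hypothetical.

```python
from typing import Callable, Sequence


def find_cut_point(
    data: bytes,
    start: int,
    max_end: int,
    A: Sequence[int],
    B: Sequence[int],
    satisfies: Callable[[bytes, int], bool],
    jump: Callable[[int], int],
) -> int:
    """Return the first offset k >= start at which all M windows pass,
    or max_end (forced cut) if no such offset is found.

    Assumes start is large enough that k - A[z] never goes below 0.
    """
    max_abs_a = max(abs(a) for a in A)
    k = start
    while k <= max_end:
        failed = None
        for z in range(len(A)):
            # Step 2001: window W_z[k - A_z, k + B_z] for the current k.
            window = data[k - A[z]: k + B[z]]
            # Step 2002: does the window data satisfy condition C_z?
            if not satisfies(window, z):
                failed = z
                break
        if failed is None:
            return k  # every window passed: k is a cut point
        n = jump(failed)
        # The jump must respect N*U <= ||B_z|| + max_x ||A_x|| (U = 1 byte here).
        assert 1 <= n <= abs(B[failed]) + max_abs_a
        k += n
    return max_end  # no cut point found within the maximum block
```

A caller would typically pass `start` as the end of the minimum data block and `max_end` as the end of the maximum data block, so that the forced cut at `max_end` mirrors the behaviour described for the case where no cut point is found.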
Further, the rule also includes: at least two windows W_ie[k_i-A_e, k_i+B_e] and W_if[k_i-A_f, k_i+B_f] satisfy the conditions |A_e+B_e| = |A_f+B_f| and C_e = C_f. Further, the rule also includes: A_e and A_f are positive integers. Further, the rule also includes: A_e-1 = A_f and B_e+1 = B_f. Here |A_e+B_e| represents the size of window W_ie, and |A_f+B_f| represents the size of window W_if.
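As a quick check of these constraints, the sketch below tests them against the 11-window parameter set that this document gives for the Fig. 21 implementation (A_1 = 169, B_1 = 0 through A_11 = 179, B_11 = -10). The helper names `same_size` and `shifted_by_one` are hypothetical, and list index 0 stands for window 1.

```python
# Window parameters of the Fig. 21 example: A_x = 168 + x, B_x = -(x - 1).
A = [169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179]
B = [0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10]


def same_size(e: int, f: int) -> bool:
    """|A_e + B_e| == |A_f + B_f|: both windows cover the same number of bytes."""
    return abs(A[e] + B[e]) == abs(A[f] + B[f])


def shifted_by_one(e: int, f: int) -> bool:
    """A_e - 1 == A_f and B_e + 1 == B_f: window f is window e moved one byte
    toward the search direction."""
    return A[e] - 1 == A[f] and B[e] + 1 == B[f]


# Every pair of neighbouring windows (e = x + 1, f = x) satisfies both constraints.
pairs_ok = all(same_size(e, e - 1) and shifted_by_one(e, e - 1) for e in range(1, 11))
```

In this parameter set every window has size 169, and each window is the previous one shifted by a single byte, which is what makes the overlap argument for the jump length work.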
Further, judging whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z specifically includes: using a random function to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z. Further, using a random function to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z is specifically: using a hash function to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z.
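One way to picture a hash-based condition is the sketch below. It is an assumption made for illustration, not the condition defined by the patent: the window bytes are hashed with SHA-256 and the condition is said to hold when the hash value falls in a chosen fraction of its range, so the pass probability is roughly `numerator / denominator` (for example 1/2 or 3/4). The function name `satisfies` and its parameters are hypothetical.

```python
import hashlib


def satisfies(window: bytes, numerator: int = 1, denominator: int = 2) -> bool:
    """Hypothetical condition C: hash the window and accept a fixed fraction
    of hash values, giving a pass probability of about numerator/denominator."""
    digest = hashlib.sha256(window).digest()
    value = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
    return value % denominator < numerator
```

Because the result depends only on the window contents, two identical windows always give the same answer, which is the property the later overlap argument relies on.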
When at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] does not satisfy the predetermined condition C_z, N minimum units U for searching for a data stream cut point are jumped from the current potential cut point k_i along the data stream cut point search direction to obtain the new potential cut point. According to the rule, the left boundary of the window W_ic[k_i-A_c, k_i+B_c] determined for the new potential cut point either coincides with the right boundary of the window W_iz[k_i-A_z, k_i+B_z] or lies within the range of the window W_iz[k_i-A_z, k_i+B_z]. Here, the window W_ic[k_i-A_c, k_i+B_c] determined for the new potential cut point is the window that ranks first in the sequence obtained by ordering, along the data stream search direction, the M windows determined for the new potential cut point according to the rule.
The embodiment of the present invention searches for a data stream cut point by judging whether at least part of the data in a window among the M windows satisfies a predetermined condition. When at least part of the data in a window does not satisfy the predetermined condition, a length of N*U is skipped, where N*U is not greater than ‖B_z‖ + max_x(‖A_x‖), to obtain the next potential cut point, which improves the efficiency of searching for data stream cut points.
During data deduplication, to keep data block sizes uniform, an average data block (also called average chunk) size may be considered; that is, while the minimum data block size and maximum data block size limits are satisfied, an average data block size may also be determined so that the resulting data blocks are uniform in size. Two factors, the number M of windows W_x[k-A_x, k+B_x] and the probability that at least part of the data in a window W_x[k-A_x, k+B_x] satisfies the preset condition, determine the probability of finding a data stream cut point (expressed through P(n)). The former affects the jump length and the latter affects the jump probability, and together they affect the average chunk size. In general, for a fixed average chunk size, as the number of windows W_x[k-A_x, k+B_x] increases, the probability that at least part of the data in a single window W_x[k-A_x, k+B_x] satisfies the predetermined condition also increases. For example, one rule preset on the deduplication server 103 is: determine 11 windows W_x[k-A_x, k+B_x] for the potential cut point k, where x takes the consecutive natural numbers from 1 to 11, and the probability that at least part of the data in any one of the 11 windows W_x[k-A_x, k+B_x] satisfies the preset condition is 1/2. Another rule preset on the deduplication server 103 is: determine 24 windows W_x[k-A_x, k+B_x] for the potential cut point k, where x takes the consecutive natural numbers from 1 to 24, and the probability that at least part of the data in any one of the 24 windows W_x[k-A_x, k+B_x] satisfies the preset condition is 3/4. For how the probability that at least part of the data in a window W_x[k-A_x, k+B_x] satisfies the preset condition is set, refer to the description of judging whether at least part of the data in a window W_x[k-A_x, k+B_x] satisfies the preset condition. The number M of windows W_x[k-A_x, k+B_x] and the probability that at least part of the data in a window W_x[k-A_x, k+B_x] satisfies the preset condition determine P(n), where P(n) represents the probability that no data stream cut point is found after n minimum units for searching for a data stream cut point have been searched from the data stream start position or from the previous data stream cut point. The calculation of P(n) from these two factors is in effect a multi-step Fibonacci sequence, which is described in detail later. After P(n) is obtained, 1-P(n) is the distribution function of the data stream cut point, and (1-P(n))-(1-P(n-1)) = P(n-1)-P(n) is the probability that a data stream cut point is found at the n-th minimum search unit, that is, the density function of the data stream cut point. Integrating according to the density function of the data stream cut point gives the expected length of the data stream cut point, that is, the average chunk size, where 4*1024 (bytes) represents the minimum data block length and 12*1024 (bytes) represents the maximum data block length.
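The expected block length can be sketched numerically once P(n) is available. The function below is an illustration under assumptions that the source does not spell out: `P[n]` is indexed by the number of search units past the minimum block (so `P[0] = 1`), the last index corresponds to the maximum block, and a block that reaches the maximum without a cut point is counted at the maximum length (forced cut). The name `mean_block_length` is hypothetical.

```python
from typing import Sequence


def mean_block_length(P: Sequence[float], min_len: int) -> float:
    """Expected block length given P[n], the probability that no cut point has
    been found after n search units past the minimum block length.

    P[n-1] - P[n] is the probability (density) of finding the cut point at
    unit n; blocks that reach the last index are cut there by force.
    """
    span = len(P) - 1
    found = sum(n * (P[n - 1] - P[n]) for n in range(1, span + 1))
    forced = span * P[span]
    return min_len + found + forced
```

For instance, with `P = [1.0, 0.5, 0.25]` and `min_len = 4`, the density contributes 1*0.5 + 2*0.25 and the forced cut contributes 2*0.25, so the mean is 5.5.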
On the basis of the search for a data stream cut point shown in Fig. 3, in the implementation shown in Fig. 21, a rule is preset on the deduplication server 103. The rule is: for a potential cut point k, determine 11 windows W_x[k-A_x, k+B_x] and a predetermined condition C_x corresponding to each window W_x[k-A_x, k+B_x], where x takes the consecutive natural numbers from 1 to 11, and A_x and B_x are integers. Here A_1=169, B_1=0; A_2=170, B_2=-1; A_3=171, B_3=-2; A_4=172, B_4=-3; A_5=173, B_5=-4; A_6=174, B_6=-5; A_7=175, B_7=-6; A_8=176, B_8=-7; A_9=177, B_9=-8; A_10=178, B_10=-9; A_11=179, B_11=-10, and C_1=C_2=C_3=C_4=C_5=C_6=C_7=C_8=C_9=C_10=C_11, so the 11 windows are W_1[k-169, k], W_2[k-170, k-1], W_3[k-171, k-2], W_4[k-172, k-3], W_5[k-173, k-4], W_6[k-174, k-5], W_7[k-175, k-6], W_8[k-176, k-7], W_9[k-177, k-8], W_10[k-178, k-9] and W_11[k-179, k-10]. k_a is a data stream cut point, and the data stream cut point search direction shown in Fig. 21 is from left to right. After the minimum data block of 4KB is skipped from the data stream cut point k_a, the end position of the minimum data block of 4KB is used as the next potential cut point k_i. According to the rule preset on the deduplication server 103, windows W_ix[k_i-A_x, k_i+B_x] are determined for the potential cut point k_i, where in this embodiment x takes the consecutive natural numbers from 1 to 11. In the implementation shown in Fig. 21, 11 windows are determined for the potential cut point k_i: W_i1[k_i-169, k_i], W_i2[k_i-170, k_i-1], W_i3[k_i-171, k_i-2], W_i4[k_i-172, k_i-3], W_i5[k_i-173, k_i-4], W_i6[k_i-174, k_i-5], W_i7[k_i-175, k_i-6], W_i8[k_i-176, k_i-7], W_i9[k_i-177, k_i-8], W_i10[k_i-178, k_i-9] and W_i11[k_i-179, k_i-10]. Judge whether at least part of the data in W_i1[k_i-169, k_i], W_i2[k_i-170, k_i-1], W_i3[k_i-171, k_i-2], W_i4[k_i-172, k_i-3], W_i5[k_i-173, k_i-4], W_i6[k_i-174, k_i-5], W_i7[k_i-175, k_i-6], W_i8[k_i-176, k_i-7], W_i9[k_i-177, k_i-8], W_i10[k_i-178, k_i-9] and W_i11[k_i-179, k_i-10] satisfies predetermined conditions C_1, C_2, C_3, C_4, C_5, C_6, C_7, C_8, C_9, C_10 and C_11, respectively. When at least part of the data in windows W_i1, W_i2, W_i3, W_i4, W_i5, W_i6, W_i7, W_i8, W_i9, W_i10 and W_i11 is judged to satisfy predetermined conditions C_1, C_2, C_3, C_4, C_5, C_6, C_7, C_8, C_9, C_10 and C_11, respectively, the current potential cut point k_i is a data stream cut point.
When at least part of the data in any one of the 11 windows does not satisfy the corresponding predetermined condition, for example W_i5[k_i-173, k_i-4] as shown in Fig. 22, N bytes are jumped from the potential cut point k_i along the data stream cut point search direction, where the N bytes are not greater than ‖B_5‖ + max_x(‖A_x‖). In the implementation shown in Fig. 22, the N bytes jumped are not greater than 183 bytes, and in this embodiment N=7. A new potential cut point is obtained; to distinguish it from the potential cut point k_i, the new potential cut point is denoted k_j here. According to the rule preset on the deduplication server 103 in the implementation shown in Fig. 21, windows W_jx[k_j-A_x, k_j+B_x] are determined for the potential cut point k_j, where in this embodiment x takes the consecutive natural numbers from 1 to 11. The 11 windows determined for the potential cut point k_j are W_j1[k_j-169, k_j], W_j2[k_j-170, k_j-1], W_j3[k_j-171, k_j-2], W_j4[k_j-172, k_j-3], W_j5[k_j-173, k_j-4], W_j6[k_j-174, k_j-5], W_j7[k_j-175, k_j-6], W_j8[k_j-176, k_j-7], W_j9[k_j-177, k_j-8], W_j10[k_j-178, k_j-9] and W_j11[k_j-179, k_j-10]. As shown in Fig. 22, for the 11th window W_j11[k_j-179, k_j-10] determined for the potential cut point, to ensure that the range between the potential cut point k_i and the potential cut point k_j is entirely within the judged range, in this implementation it must be ensured that the left boundary of window W_j11[k_j-179, k_j-10] coincides with the right boundary (k_i-4) of window W_i5[k_i-173, k_i-4] or lies within the range of window W_i5[k_i-173, k_i-4]. The window W_j11[k_j-179, k_j-10] is the window that ranks first in the sequence obtained by ordering, along the data stream search direction, the M windows determined for the potential cut point k_j according to the rule. Therefore, under this restriction, when at least part of the data in window W_i5[k_i-173, k_i-4] does not satisfy predetermined condition C_5, the distance jumped from the potential cut point k_i along the data stream cut point search direction is not greater than ‖B_5‖ + max_x(‖A_x‖).
Judge whether at least part of the data in W_j1[k_j-169, k_j], W_j2[k_j-170, k_j-1], W_j3[k_j-171, k_j-2], W_j4[k_j-172, k_j-3], W_j5[k_j-173, k_j-4], W_j6[k_j-174, k_j-5], W_j7[k_j-175, k_j-6], W_j8[k_j-176, k_j-7], W_j9[k_j-177, k_j-8], W_j10[k_j-178, k_j-9] and W_j11[k_j-179, k_j-10] satisfies predetermined conditions C_1, C_2, C_3, C_4, C_5, C_6, C_7, C_8, C_9, C_10 and C_11, respectively. When at least part of the data in windows W_j1, W_j2, W_j3, W_j4, W_j5, W_j6, W_j7, W_j8, W_j9, W_j10 and W_j11 is judged to satisfy predetermined conditions C_1, C_2, C_3, C_4, C_5, C_6, C_7, C_8, C_9, C_10 and C_11, respectively, the current potential cut point k_j is a data stream cut point, and the data between k_j and k_a forms one data block. Meanwhile, the minimum chunk size of 4KB is skipped in the same manner as for k_a to obtain the next potential cut point, and whether the next potential cut point is a data stream cut point is judged according to the rule preset on the deduplication server 103. When the potential cut point k_j is judged not to be a data stream cut point, the next potential cut point is obtained in the same manner as for k_i, and whether the next potential cut point is a data stream cut point is judged according to the rule preset on the deduplication server 103 and the foregoing method. When the set maximum data block is exceeded and a data stream cut point has still not been found, the end position of the maximum data block is used as a forced cut point.
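For the Fig. 22 example, the bound on the jump and the resulting window positions can be written out explicitly. This is only a worked restatement of the numbers already given above, assuming the minimum search unit U is one byte:

```latex
\[
N \cdot U \;\le\; \lVert B_5 \rVert + \max_x \lVert A_x \rVert \;=\; 4 + 179 \;=\; 183,
\qquad U = 1\ \text{byte},\quad N = 7 \le 183 .
\]
\[
k_j = k_i + 7
\;\Longrightarrow\;
W_{j11}[k_j - 179,\ k_j - 10] = [k_i - 172,\ k_i - 3],
\qquad
k_i - 173 \;\le\; k_i - 172 \;\le\; k_i - 4 .
\]
```

The second line shows that the left boundary of W_j11 falls inside W_i5[k_i-173, k_i-4], so no position between k_i and k_j is left outside the judged range.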
In the implementation shown in Fig. 21, according to the rule preset on the deduplication server 103, judging starts with whether at least part of the data in W_i1[k_i-169, k_i] satisfies predetermined condition C_1. When at least part of the data in W_i1[k_i-169, k_i], W_i2[k_i-170, k_i-1], W_i3[k_i-171, k_i-2] and W_i4[k_i-172, k_i-3] is judged to satisfy predetermined conditions C_1, C_2, C_3 and C_4, respectively, and at least part of the data in W_i5[k_i-173, k_i-4] is judged not to satisfy predetermined condition C_5, then if 6 bytes are jumped from the potential cut point k_i along the data stream cut point search direction, a new potential cut point is obtained at the end position of the 6th byte; to distinguish it from other potential cut points, it is denoted k_g here. According to the rule preset on the deduplication server 103, 11 windows are determined for the potential cut point k_g: W_g1[k_g-169, k_g], W_g2[k_g-170, k_g-1], W_g3[k_g-171, k_g-2], W_g4[k_g-172, k_g-3], W_g5[k_g-173, k_g-4], W_g6[k_g-174, k_g-5], W_g7[k_g-175, k_g-6], W_g8[k_g-176, k_g-7], W_g9[k_g-177, k_g-8], W_g10[k_g-178, k_g-9] and W_g11[k_g-179, k_g-10]. Judge whether at least part of the data in W_g1[k_g-169, k_g], W_g2[k_g-170, k_g-1], W_g3[k_g-171, k_g-2], W_g4[k_g-172, k_g-3], W_g5[k_g-173, k_g-4], W_g6[k_g-174, k_g-5], W_g7[k_g-175, k_g-6], W_g8[k_g-176, k_g-7], W_g9[k_g-177, k_g-8], W_g10[k_g-178, k_g-9] and W_g11[k_g-179, k_g-10] satisfies predetermined conditions C_1, C_2, C_3, C_4, C_5, C_6, C_7, C_8, C_9, C_10 and C_11, respectively. Window W_g11[k_g-179, k_g-10] coincides with window W_i5[k_i-173, k_i-4], and C_5=C_11. Therefore, when at least part of the data in W_i5[k_i-173, k_i-4] is judged not to satisfy predetermined condition C_5, the potential cut point k_g obtained by jumping 6 bytes from the potential cut point k_i along the data stream cut point search direction still does not meet the condition for being a data stream cut point. Thus jumping 6 bytes from the potential cut point k_i along the data stream cut point search direction leads to repeated calculation, whereas jumping 7 bytes from the potential cut point k_i along the data stream cut point search direction reduces repeated calculation and is more efficient, which increases the speed of searching for data stream cut points. When the probability that at least part of the data in a window W_x[k-A_x, k+B_x] satisfies the predetermined condition C_x under the preset rule is 1/2, that is, a jump is performed with probability 1/2, at most ‖B_11‖ + ‖A_11‖ = 189 bytes can be jumped.
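The 6-byte versus 7-byte argument can be checked mechanically. The sketch below is an illustration that assumes byte offsets and the Fig. 21 window layout; `windows` and `smallest_useful_jump` are hypothetical helper names. It looks for the smallest jump after which none of the new candidate's windows is identical to the window that just failed (identical windows with identical conditions must fail again).

```python
M = 11
A = [169 + x for x in range(M)]   # A_1..A_11 = 169..179
B = [-x for x in range(M)]        # B_1..B_11 = 0..-10


def windows(k: int) -> list:
    """Boundaries (k - A_x, k + B_x) of the M windows of candidate k."""
    return [(k - a, k + b) for a, b in zip(A, B)]


def smallest_useful_jump(k: int, z: int) -> int:
    """Smallest jump n such that candidate k + n has no window identical to
    the failed window W_z of candidate k (z is 1-based)."""
    failed = windows(k)[z - 1]
    n = 1
    while failed in windows(k + n):
        n += 1
    return n
```

With this layout, `smallest_useful_jump(4096, 5)` returns 7: after a jump of 1 to 6 bytes, one of windows 6 to 11 of the new candidate coincides with the failed W_i5, and only at 7 bytes does that stop being the case. A failure in window 1 allows a jump of 11, while a failure in window 11 allows only 1.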
In this embodiment, the preset rule is: for a potential cut point k, determine 11 windows W_x[k-A_x, k+B_x] and the preset condition C_x to be satisfied by at least part of the data in each window W_x[k-A_x, k+B_x], where the probability that at least part of the data in a window W_x[k-A_x, k+B_x] satisfies the preset condition C_x is 1/2, x takes the consecutive natural numbers from 1 to 11, and A_x and B_x are integers. Here A_1=169, B_1=0; A_2=170, B_2=-1; A_3=171, B_3=-2; A_4=172, B_4=-3; A_5=173, B_5=-4; A_6=174, B_6=-5; A_7=175, B_7=-6; A_8=176, B_8=-7; A_9=177, B_9=-8; A_10=178, B_10=-9; A_11=179, B_11=-10, and C_1=C_2=C_3=C_4=C_5=C_6=C_7=C_8=C_9=C_10=C_11. Eleven windows are selected for the potential cut point k, and they are 11 consecutive windows; P(n) can be calculated from these two factors. The manner of selecting the 11 windows and the judgment of whether at least part of the data in each of the 11 windows satisfies predetermined condition C_x follow the rule preset on the deduplication server 103, so whether there are 11 consecutive windows in each of which at least part of the data satisfies predetermined condition C_x determines whether the potential cut point k is a data stream cut point. The gap between two bytes is called a point. P(n) represents the probability that n consecutive windows do not contain 11 consecutive windows satisfying the condition, that is, the probability that no data stream cut point exists. After the minimum chunk size of 4KB is skipped from the file header or the previous cut point, and 10 bytes are stepped back in the direction opposite to the data stream cut point search direction, the 4086th point is reached; no data stream cut point exists at this point, so P(4086)=1, and likewise P(4087)=1, ..., P(4095)=1. At the 4096th point, that is, at the minimum chunk size, at least part of the data in each of the 11 windows satisfies predetermined condition C_x with probability (1/2)^11, so a data stream cut point exists with probability (1/2)^11 and does not exist with probability 1-(1/2)^11; therefore P(4096)=1-(1/2)^11.
At the n-th window, P(n) can be derived recursively by distinguishing 12 cases.
Case 1: At least part of the data in the n-th window fails to satisfy the predetermined condition with probability 1/2. In this case, the n-1 windows before the n-th window contain, with probability P(n-1), no 11 consecutive windows in each of which at least part of the data satisfies the predetermined condition, so P(n) includes 1/2*P(n-1). The situation in which at least part of the data in the n-th window fails to satisfy the predetermined condition while the n-1 windows before the n-th window do contain 11 consecutive windows in each of which at least part of the data satisfies the predetermined condition is unrelated to P(n).
Case 2: At least part of the data in the n-th window satisfies the predetermined condition with probability 1/2, and at least part of the data in the (n-1)-th window fails to satisfy the predetermined condition with probability 1/2. In this case, the n-2 windows before the (n-1)-th window contain, with probability P(n-2), no 11 consecutive windows in each of which at least part of the data satisfies the predetermined condition, so P(n) includes 1/2*1/2*P(n-2). The situation in which at least part of the data in the n-th window satisfies the predetermined condition, at least part of the data in the (n-1)-th window fails to satisfy the predetermined condition, and the n-2 windows before the (n-1)-th window do contain 11 consecutive windows in each of which at least part of the data satisfies the predetermined condition is unrelated to P(n).
Following the foregoing description, Case 11: At least part of the data in the n-th to (n-9)-th windows satisfies the predetermined condition with probability (1/2)^10, and at least part of the data in the (n-10)-th window fails to satisfy the predetermined condition with probability 1/2. In this case, the n-11 windows before the (n-10)-th window contain, with probability P(n-11), no 11 consecutive windows in each of which at least part of the data satisfies the predetermined condition, so P(n) includes (1/2)^10*1/2*P(n-11). The situation in which at least part of the data in the n-th to (n-9)-th windows satisfies the predetermined condition, at least part of the data in the (n-10)-th window fails to satisfy the predetermined condition, and the n-11 windows before the (n-10)-th window do contain 11 consecutive windows in each of which at least part of the data satisfies the predetermined condition is unrelated to P(n).
Case 12: At least part of the data in the n-th to (n-10)-th windows satisfies the predetermined condition with probability (1/2)^11; this case is unrelated to P(n).
Therefore, P(n) = 1/2*P(n-1) + (1/2)^2*P(n-2) + ... + (1/2)^11*P(n-11).
Another preset rule is: for a potential cut point k, determine 24 windows W_x[k-A_x, k+B_x] and the predetermined condition C_x corresponding to each window W_x[k-A_x, k+B_x], where x takes the consecutive natural numbers from 1 to 24, A_1=169, B_1=0; A_2=170, B_2=-1; A_3=171, B_3=-2; A_4=172, B_4=-3; A_5=173, B_5=-4; A_6=174, B_6=-5; A_7=175, B_7=-6; A_8=176, B_8=-7; A_9=177, B_9=-8; A_10=178, B_10=-9; A_11=179, B_11=-10; ...; A_24=192, B_24=-23, and C_1=C_2=C_3=C_4=C_5=C_6=C_7=C_8=C_9=...=C_24, and the probability that at least part of the data in a window W_x[k-A_x, k+B_x] satisfies predetermined condition C_x is 3/4. P(n) can be calculated from these two factors. Whether there are 24 consecutive windows in each of which at least part of the data satisfies predetermined condition C_x determines whether the potential cut point k is a data stream cut point, and P(n) can be calculated by the following formulas:
P(1)=1, P(2)=1, ..., P(23)=1, P(24)=1-(3/4)^24,
P(n) = 1/4*P(n-1) + 1/4*(3/4)*P(n-2) + ... + 1/4*(3/4)^23*P(n-24).
By calculation, P(5*1024)=0.78, P(11*1024)=0.17 and P(12*1024)=0.13; that is, after 12KB has been searched from the data stream start position or the previous data stream cut point, there is a 13% probability that no data stream cut point has been found and a forced cut is made. From this probability, the density function of the data stream cut point is obtained, and integration shows that on average a data stream cut point is found after about 7.6KB has been searched from the data stream start position or the previous data stream cut point, that is, the average chunk length is about 7.6KB. In contrast to requiring that at least part of the data in each of 11 consecutive windows satisfy the predetermined condition with probability 1/2, a conventional CDC algorithm reaches an average chunk length of 7.6KB when a single window satisfies the condition with probability 1/2^12.
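The two recurrences above share one form, which the sketch below evaluates for any window count m and per-window pass probability p. It is an illustration under the model used in the text, namely that each window passes independently with probability p; the index n here counts windows from the first one examined, which is an assumption about how the reported indices such as 5*1024 map onto n. The function name `no_cut_probability` is hypothetical.

```python
def no_cut_probability(n_max: int, m: int, p: float) -> list:
    """Return P[0..n_max], where P[n] is the probability that n consecutive
    windows contain no run of m consecutive passing windows.

    Recurrence: P(n) = sum_{i=1..m} (1 - p) * p**(i - 1) * P(n - i),
    with P(n) = 1 for n < m (too few windows for a full run).
    """
    q = 1.0 - p
    P = [1.0] * (n_max + 1)
    for n in range(m, n_max + 1):
        P[n] = sum(q * p ** (i - 1) * P[n - i] for i in range(1, m + 1))
    return P


# Rule with 24 windows, each passing with probability 3/4.
P24 = no_cut_probability(8 * 1024, 24, 0.75)
# Rule with 11 windows, each passing with probability 1/2.
P11 = no_cut_probability(8 * 1024, 11, 0.5)
```

Entries such as `P24[1024]` and `P24[8 * 1024]` can then be compared with the values reported above for 5*1024 and 12*1024, bearing in mind that the exact index offset relative to the 4KB minimum block is an assumption of this sketch.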
On the basis of the data stream cut-point search shown in Fig. 3, in the embodiment shown in
Fig. 23 a rule is preset on deduplication server 103. The rule is: for a potential cut-point
k, determine 11 windows W_x[k-A_x, k+B_x] and the predetermined condition C_x corresponding
to each window W_x[k-A_x, k+B_x], where x is a consecutive natural number from 1 to 11 and
A_x and B_x are integers. The probability that at least part of the data in window
W_x[k-A_x, k+B_x] meets predetermined condition C_x is 1/2; A_1=171, B_1=-2; A_2=172,
B_2=-3; A_3=173, B_3=-4; A_4=174, B_4=-5; A_5=175, B_5=-6; A_6=176, B_6=-7; A_7=177,
B_7=-8; A_8=178, B_8=-9; A_9=179, B_9=-10; A_10=170, B_10=-1; A_11=169, B_11=0; and
C_1=C_2=C_3=C_4=C_5=C_6=C_7=C_8=C_9=C_10=C_11. k_a is a data stream cut-point, and the data
stream cut-point search direction shown in Fig. 23 is from left to right. After the minimum
data block of 4 KB is skipped from data stream cut-point k_a, the end position of the 4 KB
minimum data block is taken as the next potential cut-point k_i. According to the rule
preset on deduplication server 103, windows W_ix[k_i-A_x, k_i+B_x] and the corresponding
predetermined conditions C_x are determined for potential cut-point k_i, where x is a
consecutive natural number from 1 to 11. The 11 windows determined are W_i1[k_i-171, k_i-2],
W_i2[k_i-172, k_i-3], W_i3[k_i-173, k_i-4], W_i4[k_i-174, k_i-5], W_i5[k_i-175, k_i-6],
W_i6[k_i-176, k_i-7], W_i7[k_i-177, k_i-8], W_i8[k_i-178, k_i-9], W_i9[k_i-179, k_i-10],
W_i10[k_i-170, k_i-1] and W_i11[k_i-169, k_i]. For each of these windows, from W_i1 to
W_i11, it is judged whether at least part of the data in window W_ix meets the corresponding
predetermined condition C_x (x = 1 to 11). When it is judged that at least part of the data
in every one of windows W_i1 to W_i11 meets its corresponding predetermined condition C_1 to
C_11, the current potential cut-point k_i is a data stream cut-point. The case in which at
least part of the data in any one of the 11 windows does not meet the corresponding
predetermined condition is described by taking as an example the situation shown in Fig. 24,
in which at least part of the data in W_i3[p_i3-169, p_i3] (that is, W_i3[k_i-173, k_i-4])
does not meet predetermined condition C_3 and point p_i3 jumps 11 bytes along the data
stream cut-point search direction. As shown in Fig. 24, when it is judged that W_i3 does not
meet predetermined condition C_3, N bytes are jumped along the data stream cut-point search
direction with k_i as the starting point, where N is not greater than
‖B_3‖ + max_x(‖A_x‖); in this embodiment, N=7. The next potential cut-point is obtained at
the end position of the 7th byte; to distinguish it from potential cut-point k_i, the new
potential cut-point is denoted k_j here. According to the rule preset on deduplication
server 103, 11 windows W_jx[k_j-A_x, k_j+B_x] are determined for potential cut-point k_j,
namely W_j1[k_j-171, k_j-2], W_j2[k_j-172, k_j-3], W_j3[k_j-173, k_j-4],
W_j4[k_j-174, k_j-5], W_j5[k_j-175, k_j-6], W_j6[k_j-176, k_j-7], W_j7[k_j-177, k_j-8],
W_j8[k_j-178, k_j-9], W_j9[k_j-179, k_j-10], W_j10[k_j-170, k_j-1] and W_j11[k_j-169, k_j].
For each of these windows, from W_j1 to W_j11, it is judged whether at least part of the
data in window W_jx meets the corresponding predetermined condition C_x (x = 1 to 11).
Certainly, in this embodiment of the present invention, the same principle is followed when
judging whether potential cut-point k_a is a data stream cut-point; the specific
implementation is not described again, and reference may be made to the description of
judging potential cut-point k_i. When it is judged that at least part of the data in every
one of windows W_j1 to W_j11 meets its corresponding predetermined condition C_1 to C_11,
the current potential cut-point k_j is a data stream cut-point, and the data between k_j and
k_a forms one data block. The minimum block size of 4 KB is then skipped in the same manner
as for k_a to obtain the next potential cut-point, and whether that potential cut-point is a
data stream cut-point is judged according to the rule preset on deduplication server 103.
When it is judged that potential cut-point k_j is not a data stream cut-point, the next
potential cut-point is obtained in the same manner as for k_i, and whether it is a data
stream cut-point is judged according to the rule preset on deduplication server 103 and the
foregoing method. When no data stream cut-point has been found once the set maximum data
block is exceeded, the end position of the maximum data block is used as a forced cut-point.
Certainly, implementation of the method is constrained by the maximum data block length and
by the size of the file forming the data stream; details are not repeated here.
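The search procedure of the Fig. 23 embodiment can be sketched as a loop. This is a minimal illustration under stated assumptions, not the patented implementation: `meets_condition` is a hypothetical stand-in (a CRC parity test that holds for roughly half of all windows) for the predetermined condition, the 12 KB maximum block is borrowed from the probability discussion, and a cut-point k is treated as a byte index so that a block is `data[prev:k]`.

```python
import zlib

# (A_x, B_x) for x = 1..11 as listed for the Fig. 23 rule.
OFFSETS = [(171, -2), (172, -3), (173, -4), (174, -5), (175, -6), (176, -7),
           (177, -8), (178, -9), (179, -10), (170, -1), (169, 0)]
MIN_BLOCK = 4 * 1024   # minimum data block skipped after each cut-point
MAX_BLOCK = 12 * 1024  # assumed maximum block; forces a split when reached
JUMP = 7               # N in the text


def meets_condition(window: bytes) -> bool:
    # Hypothetical stand-in for C_x: true for roughly half of all windows.
    return zlib.crc32(window) & 1 == 0


def first_failing_window(data: bytes, k: int):
    """Return the 1-based index x of the first window W_x[k-A_x, k+B_x]
    that does not meet the condition, or None if all 11 windows meet it."""
    for x, (a, b) in enumerate(OFFSETS, start=1):
        if not meets_condition(data[k - a:k + b + 1]):
            return x
    return None


def find_cut_points(data: bytes):
    cuts, prev = [], 0
    while len(data) - prev > MIN_BLOCK:
        k = prev + MIN_BLOCK                      # first potential cut-point
        limit = min(prev + MAX_BLOCK, len(data))  # forced cut-point position
        while k < limit and first_failing_window(data, k) is not None:
            k += JUMP                             # jump N bytes and retry
        k = min(k, limit)
        cuts.append(k)
        prev = k
    return cuts
```

Every block produced this way is at least `MIN_BLOCK` and at most `MAX_BLOCK` bytes long, except possibly a final tail shorter than the minimum, which is left as the last block.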
On the basis of the data stream cut-point search shown in Fig. 3, in the embodiment shown in
Fig. 25 a rule is preset on deduplication server 103. The rule is: for a potential cut-point
k, determine 11 windows W_x[k-A_x, k+B_x] and the predetermined condition C_x corresponding
to each window W_x[k-A_x, k+B_x], where x is a consecutive natural number from 1 to 11,
A_1=166, B_1=3; A_2=167, B_2=2; A_3=168, B_3=1; A_4=169, B_4=0; A_5=170, B_5=-1; A_6=171,
B_6=-2; A_7=172, B_7=-3; A_8=173, B_8=-4; A_9=174, B_9=-5; A_10=175, B_10=-6; A_11=176,
B_11=-7; and C_1=C_2=C_3=C_4=C_5=C_6=C_7=C_8=C_9=C_10=C_11. The 11 windows are therefore
W_1[k-166, k+3], W_2[k-167, k+2], W_3[k-168, k+1], W_4[k-169, k], W_5[k-170, k-1],
W_6[k-171, k-2], W_7[k-172, k-3], W_8[k-173, k-4], W_9[k-174, k-5], W_10[k-175, k-6] and
W_11[k-176, k-7]. k_a is a data stream cut-point, and the data stream cut-point search
direction shown in Fig. 25 is from left to right. After the minimum data block of 4 KB is
skipped from data stream cut-point k_a, the end position of the 4 KB minimum data block is
taken as the next potential cut-point k_i. In this embodiment, according to the rule preset
on deduplication server 103, 11 windows W_ix[k_i-A_x, k_i+B_x] and the corresponding
predetermined conditions C_x are determined for potential cut-point k_i, where x is a
consecutive natural number from 1 to 11. In the embodiment shown in Fig. 25, the 11 windows
determined for potential cut-point k_i are W_i1[k_i-166, k_i+3], W_i2[k_i-167, k_i+2],
W_i3[k_i-168, k_i+1], W_i4[k_i-169, k_i], W_i5[k_i-170, k_i-1], W_i6[k_i-171, k_i-2],
W_i7[k_i-172, k_i-3], W_i8[k_i-173, k_i-4], W_i9[k_i-174, k_i-5], W_i10[k_i-175, k_i-6] and
W_i11[k_i-176, k_i-7]. For each of these windows, from W_i1 to W_i11, it is judged whether
at least part of the data in window W_ix meets the corresponding predetermined condition C_x
(x = 1 to 11). When it is judged that at least part of the data in every one of windows W_i1
to W_i11 meets its corresponding predetermined condition C_1 to C_11, the current potential
cut-point k_i is a data stream cut-point. When at least part of the data in any one of the
11 windows does not meet the corresponding predetermined condition, for example
W_i7[k_i-172, k_i-3] as shown in Fig. 26, N bytes are jumped from potential cut-point k_i
along the data stream cut-point search direction, where N is not greater than
‖B_7‖ + max_x(‖A_x‖). In the embodiment shown in Fig. 26, the N bytes jumped are not more
than 185 bytes; in this embodiment, N=5. A new potential cut-point is thereby obtained; to
distinguish it from potential cut-point k_i, the new potential cut-point is denoted k_j
here. According to the rule preset on deduplication server 103 in the embodiment shown in
Fig. 25, 11 windows are determined for potential cut-point k_j, namely
W_j1[k_j-166, k_j+3], W_j2[k_j-167, k_j+2], W_j3[k_j-168, k_j+1], W_j4[k_j-169, k_j],
W_j5[k_j-170, k_j-1], W_j6[k_j-171, k_j-2], W_j7[k_j-172, k_j-3], W_j8[k_j-173, k_j-4],
W_j9[k_j-174, k_j-5], W_j10[k_j-175, k_j-6] and W_j11[k_j-176, k_j-7]. For each of these
windows, from W_j1 to W_j11, it is judged whether at least part of the data in window W_jx
meets the corresponding predetermined condition C_x (x = 1 to 11). Certainly, in this
embodiment of the present invention, the same principle is followed when judging whether
potential cut-point k_a is a data stream cut-point; the specific implementation is not
described again, and reference may be made to the description of judging potential
cut-point k_i. When it is judged that at least part of the data in every one of windows W_j1
to W_j11 meets its corresponding predetermined condition C_1 to C_11, the current potential
cut-point k_j is a data stream cut-point, and the data between k_j and k_a forms one data
block. The minimum block size of 4 KB is then skipped in the same manner as for k_a to
obtain the next potential cut-point, and whether that potential cut-point is a data stream
cut-point is judged according to the rule preset on deduplication server 103. When it is
judged that potential cut-point k_j is not a data stream cut-point, the next potential
cut-point is obtained in the same manner as for k_i, and whether it is a data stream
cut-point is judged according to the rule preset on deduplication server 103 and the
foregoing method. When no data stream cut-point has been found once the set maximum data
block is exceeded, the end position of the maximum data block is used as a forced cut-point.
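Unlike the Fig. 23 rule, the Fig. 25 rule uses positive values for B_1 to B_3, so three of its windows reach past the potential cut-point. The short sketch below only derives the window ranges from the listed offsets; the names `OFFSETS_FIG25`, `windows_for` and `lookahead` are illustrative, and treating the ranges as inclusive byte positions is an assumption.

```python
# (A_x, B_x) for x = 1..11 as listed for the Fig. 25 rule.
OFFSETS_FIG25 = [(166, 3), (167, 2), (168, 1), (169, 0), (170, -1), (171, -2),
                 (172, -3), (173, -4), (174, -5), (175, -6), (176, -7)]


def windows_for(k, offsets=OFFSETS_FIG25):
    """Return the inclusive ranges [k - A_x, k + B_x] for x = 1..11."""
    return [(k - a, k + b) for a, b in offsets]


def lookahead(offsets=OFFSETS_FIG25):
    """Number of bytes after k that must be available to evaluate all windows."""
    return max(0, max(b for _, b in offsets))


if __name__ == "__main__":
    for x, (start, end) in enumerate(windows_for(4096), start=1):
        print(f"W_{x}: [{start}, {end}] width {end - start + 1}")
    print("lookahead:", lookahead())
```

Under these assumptions the 11 windows all have the same width and each is shifted by one byte relative to its neighbour, so together they cover positions k-176 through k+3.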
On the basis of the data stream cut-point search shown in Fig. 3, in the embodiment shown in
Fig. 27 a rule is preset on deduplication server 103. The rule is: for a potential cut-point
k, determine 11 windows W_x[k-A_x, k+B_x] and the predetermined condition C_x corresponding
to each window W_x[k-A_x, k+B_x], where x is a consecutive natural number from 1 to 11,
A_1=169, B_1=0; A_2=170, B_2=-1; A_3=171, B_3=-2; A_4=172, B_4=-3; A_5=173, B_5=-4; A_6=174,
B_6=-5; A_7=175, B_7=-6; A_8=176, B_8=-7; A_9=177, B_9=-8; A_10=168, B_10=1; A_11=179,
B_11=3; and C_1=C_2=C_3=C_4=C_5=C_6=C_7=C_8=C_9=C_10≠C_11. The 11 windows are therefore
W_1[k-169, k], W_2[k-170, k-1], W_3[k-171, k-2], W_4[k-172, k-3], W_5[k-173, k-4],
W_6[k-174, k-5], W_7[k-175, k-6], W_8[k-176, k-7], W_9[k-177, k-8], W_10[k-168, k+1] and
W_11[k-179, k+3]. k_a is a data stream cut-point, and the data stream cut-point search
direction shown in Fig. 27 is from left to right. After the minimum data block of 4 KB is
skipped from data stream cut-point k_a, the end position of the 4 KB minimum data block is
taken as the next potential cut-point k_i. In this embodiment, according to the rule preset
on deduplication server 103, windows W_ix[k_i-A_x, k_i+B_x] are determined for potential
cut-point k_i, where x is a consecutive natural number from 1 to 11. In the embodiment shown
in Fig. 27, the 11 windows determined for potential cut-point k_i are W_i1[k_i-169, k_i],
W_i2[k_i-170, k_i-1], W_i3[k_i-171, k_i-2], W_i4[k_i-172, k_i-3], W_i5[k_i-173, k_i-4],
W_i6[k_i-174, k_i-5], W_i7[k_i-175, k_i-6], W_i8[k_i-176, k_i-7], W_i9[k_i-177, k_i-8],
W_i10[k_i-168, k_i+1] and W_i11[k_i-179, k_i+3]. For each of these windows, from W_i1 to
W_i11, it is judged whether at least part of the data in window W_ix meets the corresponding
predetermined condition C_x (x = 1 to 11). When it is judged that at least part of the data
in every one of windows W_i1 to W_i11 meets its corresponding predetermined condition C_1 to
C_11, the current potential cut-point k_i is a data stream cut-point. When it is judged that
at least part of the data in window W_i11 does not meet predetermined condition C_11, 1 byte
is jumped from potential cut-point k_i along the data stream cut-point search direction to
obtain a new potential cut-point; to distinguish it from potential cut-point k_i, the new
potential cut-point is denoted k_j here. When at least part of the data in any one of the 10
windows W_i1, W_i2, W_i3, W_i4, W_i5, W_i6, W_i7, W_i8, W_i9 and W_i10 does not meet the
corresponding predetermined condition, for example W_i4[k_i-172, k_i-3] as shown in Fig. 28,
N bytes are jumped from point k_i along the data stream cut-point search direction, where N
is not greater than ‖B_4‖ + max_x(‖A_x‖). In the embodiment shown in Fig. 28, the N bytes
jumped are not more than 182 bytes; in this embodiment, N=6. A new potential cut-point is
thereby obtained; to distinguish it from potential cut-point k_i, the new potential
cut-point is denoted k_j here. According to the rule preset on deduplication server 103 in
the embodiment shown in Fig. 27, the windows determined for potential cut-point k_j are
W_j1[k_j-169, k_j], W_j2[k_j-170, k_j-1], W_j3[k_j-171, k_j-2], W_j4[k_j-172, k_j-3],
W_j5[k_j-173, k_j-4], W_j6[k_j-174, k_j-5], W_j7[k_j-175, k_j-6], W_j8[k_j-176, k_j-7],
W_j9[k_j-177, k_j-8], W_j10[k_j-168, k_j+1] and W_j11[k_j-179, k_j+3]. For each of these
windows, from W_j1 to W_j11, it is judged whether at least part of the data in window W_jx
meets the corresponding predetermined condition C_x (x = 1 to 11). Certainly, in this
embodiment of the present invention, the same principle is followed when judging whether
potential cut-point k_a is a data stream cut-point; the specific implementation is not
described again, and reference may be made to the description of judging potential
cut-point k_i. When it is judged that at least part of the data in every one of windows W_j1
to W_j11 meets its corresponding predetermined condition C_1 to C_11, the current potential
cut-point k_j is a data stream cut-point, and the data between k_j and k_a forms one data
block. The minimum block size of 4 KB is then skipped in the same manner as for k_a to
obtain the next potential cut-point, and whether that potential cut-point is a data stream
cut-point is judged according to the rule preset on deduplication server 103. When it is
judged that potential cut-point k_j is not a data stream cut-point, the next potential
cut-point is obtained in the same manner as for k_i, and whether it is a data stream
cut-point is judged according to the rule preset on deduplication server 103 and the
foregoing method. When no data stream cut-point has been found once the set maximum data
block is exceeded, the end position of the maximum data block is used as a forced cut-point.
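The Fig. 27 embodiment uses two different jumps: 1 byte when the eleventh window fails, and N bytes, bounded by ‖B_x‖ + max_x(‖A_x‖), when one of the first ten windows fails. The sketch below illustrates only that selection. It reads ‖·‖ as the absolute value, which is an assumption, and the names `OFFSETS_FIG27`, `jump_bound` and `jump_after_failure` are introduced here for illustration.

```python
# (A_x, B_x) for x = 1..11 as listed for the Fig. 27 rule.
OFFSETS_FIG27 = [(169, 0), (170, -1), (171, -2), (172, -3), (173, -4), (174, -5),
                 (175, -6), (176, -7), (177, -8), (168, 1), (179, 3)]


def jump_bound(x, offsets=OFFSETS_FIG27):
    """Upper bound |B_x| + max over all x of |A_x| for the jump after window x fails."""
    max_a = max(abs(a) for a, _ in offsets)
    return abs(offsets[x - 1][1]) + max_a


def jump_after_failure(x, n=6):
    """Jump length after window x (1-based) fails.

    Window 11 has its own condition C_11 and triggers a 1-byte jump;
    any of windows 1..10 triggers a jump of n bytes, which must respect the bound.
    """
    if x == 11:
        return 1
    if n > jump_bound(x):
        raise ValueError("jump exceeds the bound for this window")
    return n


if __name__ == "__main__":
    print(jump_bound(4))            # bound for the Fig. 28 example (W_4 fails)
    print(jump_after_failure(4))    # N in the text for that example
    print(jump_after_failure(11))   # single-byte jump when W_11 fails
```

For the Fig. 28 example, in which the fourth window fails, the bound evaluates to 3 + 179 = 182, which agrees with the 182 bytes stated in the text.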
On the basis of the data flow point cutpoint shown in Fig. 3 is searched, the embodiment party shown in Figure 29
In formula, being preset with rule on duplicate removal server 103, described rule is: for potential cut-point k
Determine 11 window Wx[px-Ax,px+Bx] and window Wx[px-Ax,px+Bx] corresponding making a reservation for
Condition Cx, x is respectively 1 to 11 continuous print natural numbers, wherein, window Wx[px-Ax,px+Bx]
In at least partly data meet the probability of predetermined condition is 1/2, A1=169, B1=0;A2=171, B2
=-2;A3=173, B3=-4;A4=175, B4=-6;A5=177, B5=-8;A6=179, B6=-10;A7=181,
B7=-12;A8=183, B8=-14;A9=185, B9=-16;A10=187, B10=-18;A11=189, B11=-20;
And C1=C2=C3=C4=C5=C6=C7=C8=C9=C10=C11, then 11 windows are respectively W1
[k-169,k]、W2[k-171,k-2]、W3[k-173,k-4]、W4[k-175,k-6]、W5[k-177,
k-8]、W6[k-179,k-10]、W7[k-181,k-12]、W8[k-183,k-14]、W9[k-185,k-16]、
W10[k-187, k-18] and W11[k-189,k-20]。kaFor data flow point cutpoint, shown in Figure 29
Data flow point cutpoint search direction is from left to right, from data flow point cutpoint kaSkip minimum data
After block 4KB, at minimum data block 4KB end position as next potential cut-point ki, for
Potential cut-point kiDetermine a pix, in the present embodiment, according to pre-on duplicate removal server 103
If rule, x is respectively 1 to 11 continuous print natural numbers.In the embodiment shown in Figure 29,
According to pre-defined rule, for potential cut-point ki11 windows determined are respectively Wi1[ki-169,
ki]、Wi2[ki-171,ki-2]、Wi3[ki-173,ki-4]、Wi4[ki-175,ki-6]、Wi5[ki-177,
ki-8]、Wi6[ki-179,ki-10]、Wi7[ki-181,ki-12]、Wi8[ki-183,ki-14]、Wi9
[ki-185,ki-16]、Wi10[ki-187,ki-18] and Wi11[ki-189,ki-20].Judge Wi1[ki
-169,kiIn], whether at least part of data meet predetermined condition C1, judge Wi2[ki-171,ki-2]
In at least partly data whether meet predetermined condition C2, judge Wi3[ki-173,ki-4] at least
Whether part data meet predetermined condition C3, judge Wi4[ki-175,ki-6] at least partly count in
According to whether meeting predetermined condition C4, judge Wi5[ki-177,ki-8] in, whether at least part of data
Meet predetermined condition C5, judge Wi6[ki-179,ki-10] in, whether at least part of data meet pre-
Fixed condition C6, judge Wi7[ki-181,ki-12] in, whether at least part of data meet predetermined condition
C7, judge Wi8[ki-183,ki-14] in, whether at least part of data meet predetermined condition C8, sentence
Disconnected Wi9[ki-185,ki-16] in, whether at least part of data meet predetermined condition C9, judge Wi10
[ki-187,ki-18] in, whether at least part of data meet predetermined condition C10With judge Wi11[ki
-189,ki-20] in, whether at least part of data meet predetermined condition C11.When judging window Wi1In
At least partly data meet predetermined condition C1, window Wi2In at least partly data meet predetermined bar
Part C2, window Wi3In at least partly data meet predetermined condition C3, window Wi4In at least partly
Data meet predetermined condition C4, window Wi5In at least partly data meet predetermined condition C5, window
Mouth Wi6In at least partly data meet predetermined condition C6, window Wi7In at least partly data meet
Predetermined condition C7, window Wi8In at least partly data meet predetermined condition C8, window Wi9In extremely
Small part data meet predetermined condition C9, window Wi10In at least partly data meet predetermined condition
C10With window Wi11In at least partly data meet predetermined condition C11Time, the most current potential cut-point
kiFor data flow point cutpoint.When data at least part of in any one window in 11 windows are unsatisfactory for
During corresponding predetermined condition, as shown in figure 30, Wi4[ki-175,ki-6] at least part of data in
It is unsatisfactory for predetermined condition C4, then select next potential cut-point, for potential cut-point kiDistrict
Not, here shown as kj, kjIt is positioned at kiThe right, and kjWith ki1 byte of spacing.Such as figure
Shown in 30, according to the rule preset for duplicate removal server 103, for potential cut-point kjDetermine 11
Window is respectively Wj1[kj-169,kj]、Wj2[kj-171,kj-2]、Wj3[kj-173,kj-4]、
Wj4[kj-175,kj-6]、Wj5[kj-177,kj-8]、Wj6[kj-179,kj-10]、Wj7[kj-181,
kj-12]、Wj8[kj-183,kj-14]、Wj9[kj-185,kj-16]、Wj10[kj-187,kj-18]
And Wj11[kj-189,kj, and C-20]1=C2=C3=C4=C5=C6=C7=C8=C9=C10=C11。
Judge Wj1[kj-169,kjIn], whether at least part of data meet predetermined condition C1, judge Wj2
[kj-171,kj-2] in, whether at least part of data meet predetermined condition C2, judge Wj3[kj-173,
kj-4] in, whether at least part of data meet predetermined condition C3, judge Wj4[kj-175,kj-6] in
At least partly whether data meet predetermined condition C4, judge Wj5[kj-177,kj-8] at least portion in
Whether divided data meets predetermined condition C5, judge Wj6[kj-179,kj-10] at least part of data in
Whether meet predetermined condition C6, judge Wj7[kj-181,kj-12] in, at least part of data are the fullest
Foot predetermined condition C7, judge Wj8[kj-183,kj-14] in, whether at least part of data meet predetermined
Condition C8, judge Wj9[kj-185,kj-16] in, whether at least part of data meet predetermined condition C9、
Judge Wj10[kj-187,kj-18] in, whether at least part of data meet predetermined condition C10And judgement
Wj11[kj-189,kj-20] in, whether at least part of data meet predetermined condition C11.When judging window
Mouth Wj1In at least partly data meet predetermined condition C1, window Wj2In at least partly data meet
Predetermined condition C2, window Wj3In at least partly data meet predetermined condition C3, window Wj4In extremely
Small part data meet predetermined condition C4, window Wj5In at least partly data meet predetermined condition
C5, window Wj6In at least partly data meet predetermined condition C6, window Wj7In at least partly count
According to meeting predetermined condition C7, window Wj8In at least partly data meet predetermined condition C8, window
Wi9In at least partly data meet predetermined condition C9, window Wj10In at least partly data meet
Predetermined condition C10With window Wj11In at least partly data meet predetermined condition C11Time, the most currently
Potential cut-point kjFor data flow point cutpoint.When judging window Wj1、Wj2、Wj3、Wj4、Wj5、
Wj6、Wj7、Wj8、Wj9、Wj10And Wj11In in any one window at least partly data be discontented with
During foot predetermined condition, as shown in figure 31, Wj3[kj-173,kj-4] in, at least part of data are discontented with
Foot predetermined condition C3Time, kjIt is positioned at kiThe right is from kiJump along data flow point cutpoint search direction
N number of byte, the most N number of byte is not more than ‖ B4‖+maxx(‖Ax‖), shown in Figure 28
In embodiment, N number of byte is not more than 195 bytes, in the present embodiment, N=15, obtains
Next potential cut-point, for potential cut-point ki、kjDistinguish, be expressed as kl.Root
According to Figure 29 institute embodiment being the default rule of duplicate removal server 103, for potential cut-point kl
Determine that 11 windows are respectively Wl1[kl-169,kl]、Wl2[kl-171,kl-2]、Wl3[kl-173,
kl-4]、Wl4[kl-175,kl-6]、Wl5[kl-177,kl-8]、Wl6[kl-179,kl-10]、Wl7
[kl-181,kl-12]、Wl8[kl-183,kl-14]、Wl9[kl-185,kl-16]、Wl10[kl-187,
kl-18] and Wl11[kl-189,kl-20].Judge Wl1[kl-169,klIn], whether at least part of data
Meet predetermined condition C1, judge Wl2[kl-171,kl-2] in, whether at least part of data meet pre-
Fixed condition C2, judge Wl3[kl-173,kl-4] in, whether at least part of data meet predetermined condition
C3, judge whether at least part of the data in Wl4[kl-175, kl-6] meets predetermined condition C4, judge whether at least part of the data in Wl5[kl-177, kl-8] meets predetermined condition C5, judge whether at least part of the data in Wl6[kl-179, kl-10] meets predetermined condition C6, judge whether at least part of the data in Wl7[kl-181, kl-12] meets predetermined condition C7, judge whether at least part of the data in Wl8[kl-183, kl-14] meets predetermined condition C8, judge whether at least part of the data in Wl9[kl-185, kl-16] meets predetermined condition C9, judge whether at least part of the data in Wl10[kl-187, kl-18] meets predetermined condition C10, and judge whether at least part of the data in Wl11[kl-189, kl-20] meets predetermined condition C11. When it is judged that at least part of the data in window Wl1 meets predetermined condition C1, at least part of the data in window Wl2 meets predetermined condition C2, and so on for windows Wl3, Wl4, Wl5, Wl6, Wl7, Wl8, Wl9 and Wl10 with predetermined conditions C3 to C10 respectively, and at least part of the data in window Wl11 meets predetermined condition C11, the current potential cut point kl is a data stream cut point. When at least part of the data in any one of windows Wl1, Wl2, Wl3, Wl4, Wl5, Wl6, Wl7, Wl8, Wl9, Wl10 and Wl11 does not meet its predetermined condition, the next potential cut point is selected. To distinguish it from potential cut points ki, kj and kl, it is denoted km; km lies to the right of kl, at a distance of 1 byte from kl. According to the rule preset on duplicate removal server 103 in the embodiment shown in Figure 29, the 11 windows determined for potential cut point km are Wm1[km-169, km], Wm2[km-171, km-2], Wm3[km-173, km-4], Wm4[km-175, km-6], Wm5[km-177, km-8], Wm6[km-179, km-10], Wm7[km-181, km-12], Wm8[km-183, km-14], Wm9[km-185, km-16], Wm10[km-187, km-18] and Wm11[km-189, km-20]. Judge whether at least part of the data in Wm1[km-169, km] meets predetermined condition C1, in Wm2[km-171, km-2] meets predetermined condition C2, in Wm3[km-173, km-4] meets predetermined condition C3, in Wm4[km-175, km-6] meets predetermined condition C4, in Wm5[km-177, km-8] meets predetermined condition C5, in Wm6[km-179, km-10] meets predetermined condition C6, in Wm7[km-181, km-12] meets predetermined condition C7, in Wm8[km-183, km-14] meets predetermined condition C8, in Wm9[km-185, km-16] meets predetermined condition C9, in Wm10[km-187, km-18] meets predetermined condition C10, and in Wm11[km-189, km-20] meets predetermined condition C11. When it is judged that at least part of the data in each of windows Wm1 to Wm11 meets the corresponding predetermined condition C1 to C11, the current potential cut point km is a data stream cut point. When at least part of the data in any one window does not meet its predetermined condition, a jump is performed according to the scheme described above to obtain the next potential cut point, and whether that point is a data stream cut point is then determined.
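The all-windows test described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `windows_for`, `is_cut_point` and `Condition` are hypothetical, a window `[k-169, k]` is read as the 169 bytes ending at offset `k`, and a window that would start before the beginning of the data is simply treated as a failure. The predetermined conditions C1 to C11 are passed in as callables because the text defines them separately.

```python
from typing import Callable, List, Sequence, Tuple

WINDOW_LEN = 169   # each window spans [k - 169 - shift, k - shift]
NUM_WINDOWS = 11   # windows W1 .. W11
SHIFT = 2          # in this embodiment each window moves 2 bytes further left

Condition = Callable[[bytes], bool]


def windows_for(k: int) -> List[Tuple[int, int]]:
    """Return the (start, end) offsets of the 11 windows for potential cut point k."""
    return [(k - WINDOW_LEN - SHIFT * z, k - SHIFT * z) for z in range(NUM_WINDOWS)]


def is_cut_point(data: bytes, k: int, conditions: Sequence[Condition]) -> bool:
    """True only if every window W1..W11 passes its own condition C1..C11."""
    for (start, end), condition in zip(windows_for(k), conditions):
        # Assumption: a window reaching before the start of the data fails.
        if start < 0 or not condition(data[start:end]):
            return False
    return True
```

Because the loop returns at the first failing window, later windows are not evaluated, which matches the text's rule that a single failing window is enough to move on to the next potential cut point.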
An embodiment of the present invention provides a method for judging whether at least part of the data in window Wiz[ki-Az, ki+Bz] meets predetermined condition Cz. In this embodiment a random function is used to make this judgment. Taking the embodiment shown in Figure 21 as an example, according to the rule preset on duplicate removal server 103, window Wi1[ki-169, ki] is determined for potential cut point ki, and whether at least part of the data in Wi1[ki-169, ki] meets predetermined condition C1 is judged. As shown in Figure 32, Wi1 denotes window Wi1[ki-169, ki]. To judge whether at least part of the data in Wi1[ki-169, ki] meets predetermined condition C1, 5 bytes are selected; in Figure 32 each "■" denotes 1 selected byte, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total, to increase randomness. Each byte consists of 8 bits, denoted $a_{m,1}, \dots, a_{m,8}$, which are the 1st to 8th bits of the m-th of the 255 bytes, so the bits of the 255 bytes can be expressed as:

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & & \vdots \\ a_{255,1} & a_{255,2} & \cdots & a_{255,8} \end{pmatrix}$$

When $a_{m,n}=1$, $V_{am,n}=1$; when $a_{m,n}=0$, $V_{am,n}=-1$, where $a_{m,n}$ denotes any one of $a_{m,1}, \dots, a_{m,8}$. Applying this conversion between $a_{m,n}$ and $V_{am,n}$ to the bits of the 255 bytes gives matrix $V_a$, which can be expressed as:

$$V_a = \begin{pmatrix} V_{a1,1} & V_{a1,2} & \cdots & V_{a1,8} \\ V_{a2,1} & V_{a2,2} & \cdots & V_{a2,8} \\ \vdots & \vdots & & \vdots \\ V_{a255,1} & V_{a255,2} & \cdots & V_{a255,8} \end{pmatrix}$$

A large number of random numbers are chosen to form a matrix; once formed, the matrix remains unchanged. For example, 255*8 random numbers are selected from random numbers obeying a specific distribution (here, the normal distribution) to form matrix R:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,8} \end{pmatrix}$$

The m-th row of matrix $V_a$ is multiplied element by element with the random numbers in the m-th row of matrix R, and the products are summed to obtain one value, expressed as $S_{am} = V_{am,1} h_{m,1} + V_{am,2} h_{m,2} + \dots + V_{am,8} h_{m,8}$. In this way $S_{a1}, S_{a2}, \dots, S_{a255}$ are obtained, and the number K of values among $S_{a1}, S_{a2}, \dots, S_{a255}$ that satisfy a specific condition (here, being greater than 0) is counted. Because matrix R obeys a normal distribution, $S_{am}$, like matrix R, also obeys a normal distribution. According to probability theory, the probability that a normally distributed random number is greater than 0 is 1/2, so each of $S_{a1}, S_{a2}, \dots, S_{a255}$ is greater than 0 with probability 1/2, and K therefore obeys a binomial distribution:

$$P(K = k) = \binom{255}{k} \left(\tfrac{1}{2}\right)^{k} \left(\tfrac{1}{2}\right)^{255-k}$$

Based on the count, it is judged whether the number K of values greater than 0 among $S_{a1}, S_{a2}, \dots, S_{a255}$ is even. The probability that a binomially distributed random number is even is 1/2, so K satisfies the condition with probability 1/2. When K is even, at least part of the data in Wi1[ki-169, ki] meets predetermined condition C1; when K is odd, at least part of the data in Wi1[ki-169, ki] does not meet predetermined condition C1. Here C1 means that the number K of values greater than 0 among $S_{a1}, S_{a2}, \dots, S_{a255}$ obtained in the above manner is even. In the embodiment shown in Figure 21, windows Wi1[ki-169, ki], Wi2[ki-170, ki-1], Wi3[ki-171, ki-2], Wi4[ki-172, ki-3], Wi5[ki-173, ki-4], Wi6[ki-174, ki-5], Wi7[ki-175, ki-6], Wi8[ki-176, ki-7], Wi9[ki-177, ki-8], Wi10[ki-178, ki-9] and Wi11[ki-179, ki-10] all have the same size, namely 169 bytes, and the manner of judging whether at least part of the data in a window meets its predetermined condition is also the same for every window; see the description above of judging whether at least part of the data in Wi1[ki-169, ki] meets predetermined condition C1. Therefore, as shown in Figure 32, the corresponding marker denotes 1 byte selected when judging whether at least part of the data in window Wi2[ki-170, ki-1] meets predetermined condition C2, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total, to increase randomness. Each byte consists of 8 bits, denoted $b_{m,1}, \dots, b_{m,8}$, which are the 1st to 8th bits of the m-th of the 255 bytes, so the bits of the 255 bytes can be expressed as:

$$\begin{pmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,8} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,8} \\ \vdots & \vdots & & \vdots \\ b_{255,1} & b_{255,2} & \cdots & b_{255,8} \end{pmatrix}$$

When $b_{m,n}=1$, $V_{bm,n}=1$; when $b_{m,n}=0$, $V_{bm,n}=-1$, where $b_{m,n}$ denotes any one of $b_{m,1}, \dots, b_{m,8}$. Applying this conversion between $b_{m,n}$ and $V_{bm,n}$ to the bits of the 255 bytes gives matrix $V_b$, which can be expressed as:

$$V_b = \begin{pmatrix} V_{b1,1} & V_{b1,2} & \cdots & V_{b1,8} \\ V_{b2,1} & V_{b2,2} & \cdots & V_{b2,8} \\ \vdots & \vdots & & \vdots \\ V_{b255,1} & V_{b255,2} & \cdots & V_{b255,8} \end{pmatrix}$$
The manner of judging whether at least part of the data in Wi1[ki-169, ki] meets its predetermined condition is the same as the manner of judging whether at least part of the data in window Wi2[ki-170, ki-1] meets its predetermined condition, so the same matrix R is used:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,8} \end{pmatrix}$$

The m-th row of matrix $V_b$ is multiplied element by element with the random numbers in the m-th row of matrix R, and the products are summed to obtain one value, expressed as $S_{bm} = V_{bm,1} h_{m,1} + V_{bm,2} h_{m,2} + \dots + V_{bm,8} h_{m,8}$. In this way $S_{b1}, S_{b2}, \dots, S_{b255}$ are obtained, and the number K of values among them that satisfy a specific condition (here, being greater than 0) is counted. Because matrix R obeys a normal distribution, $S_{bm}$, like matrix R, also obeys a normal distribution. According to probability theory, the probability that a normally distributed random number is greater than 0 is 1/2, so each of $S_{b1}, S_{b2}, \dots, S_{b255}$ is greater than 0 with probability 1/2, and K therefore obeys a binomial distribution:

$$P(K = k) = \binom{255}{k} \left(\tfrac{1}{2}\right)^{k} \left(\tfrac{1}{2}\right)^{255-k}$$

Based on the count, it is judged whether the number K of values greater than 0 among $S_{b1}, S_{b2}, \dots, S_{b255}$ is even. The probability that a binomially distributed random number is even is 1/2, so K satisfies the condition with probability 1/2. When K is even, at least part of the data in Wi2[ki-170, ki-1] meets predetermined condition C2; when K is odd, at least part of the data in Wi2[ki-170, ki-1] does not meet predetermined condition C2. Here C2 means that the number K of values greater than 0 among $S_{b1}, S_{b2}, \dots, S_{b255}$ obtained in the above manner is even. In the embodiment shown in Figure 21, at least part of the data in Wi2[ki-170, ki-1] meets predetermined condition C2.
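The sign-projection test described for Wi1 and Wi2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the seed, the use of a standard normal distribution, the most-significant-bit-first bit order, and the convention that the byte numbered n is `data[k - n]` are all assumptions made for the example.

```python
import random
from typing import List

SPACING = 42        # adjacent selected bytes are 42 bytes apart
NUM_SELECTED = 5    # 5 bytes are selected per window
REPEAT = 51         # the 5 bytes are reused 51 times -> 255 bytes
BITS = 8


def make_matrix(seed: int = 0) -> List[List[float]]:
    """Build the fixed 255 x 8 matrix R of normally distributed numbers h[m][n]."""
    rng = random.Random(seed)  # fixed seed: R must stay unchanged once formed
    return [[rng.gauss(0.0, 1.0) for _ in range(BITS)]
            for _ in range(NUM_SELECTED * REPEAT)]


def select_bytes(data: bytes, k: int, z: int) -> bytes:
    """Pick the 5 bytes of window z (1-based); byte number n is data[k - n]."""
    numbers = [169 - SPACING * i + (z - 1) for i in range(NUM_SELECTED)]
    return bytes(data[k - n] for n in numbers)  # z=1 -> bytes 169,127,85,43,1


def bits_to_signs(byte: int) -> List[int]:
    """Map each bit to +1 (bit is 1) or -1 (bit is 0); MSB first (assumed)."""
    return [1 if (byte >> (BITS - 1 - n)) & 1 else -1 for n in range(BITS)]


def window_meets_condition(data: bytes, k: int, z: int,
                           R: List[List[float]]) -> bool:
    """Count rows with S_m > 0 and report whether that count K is even."""
    expanded = select_bytes(data, k, z) * REPEAT  # 255 bytes
    count = 0
    for m, byte in enumerate(expanded):
        s = sum(v * h for v, h in zip(bits_to_signs(byte), R[m]))
        if s > 0:
            count += 1
    return count % 2 == 0
```

A call such as `window_meets_condition(data, k, 2, make_matrix())` would then stand in for the judgment on window Wi2; the same `R` has to be reused for every window and every potential cut point.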
Therefore, as shown in Figure 32, the corresponding marker denotes 1 byte selected when judging whether at least part of the data in window Wi3[ki-171, ki-2] meets predetermined condition C3, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total, to increase randomness. The method used for windows Wi1[ki-169, ki] and Wi2[ki-170, ki-1] is then applied to judge whether at least part of the data in Wi3[ki-171, ki-2] meets predetermined condition C3. In the embodiment shown in Figure 21, at least part of the data in Wi3[ki-171, ki-2] meets predetermined condition C3. As shown in Figure 32, the corresponding marker denotes 1 byte selected when judging whether at least part of the data in window Wi4[ki-172, ki-3] meets predetermined condition C4, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total, to increase randomness. The method used for windows Wi1[ki-169, ki], Wi2[ki-170, ki-1] and Wi3[ki-171, ki-2] is then applied to judge whether at least part of the data in Wi4[ki-172, ki-3] meets predetermined condition C4. In the embodiment shown in Figure 21, at least part of the data in Wi4[ki-172, ki-3] meets predetermined condition C4. As shown in Figure 32, the corresponding marker denotes 1 byte selected when judging whether at least part of the data in window Wi5[ki-173, ki-4] meets predetermined condition C5, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total, to increase randomness. The method used for windows Wi1[ki-169, ki], Wi2[ki-170, ki-1], Wi3[ki-171, ki-2] and Wi4[ki-172, ki-3] is then applied to judge whether at least part of the data in Wi5[ki-173, ki-4] meets predetermined condition C5. In the embodiment shown in Figure 21, at least part of the data in Wi5[ki-173, ki-4] does not meet predetermined condition C5.
When at least part of the data in Wi5[ki-173, ki-4] does not meet predetermined condition C5, 7 bytes are jumped from point pi5 along the data stream cut point search direction, and the next potential cut point kj is obtained at the end position of the 7th byte. As shown in Figure 22, according to the rule preset on duplicate removal server 103, window Wj1[kj-169, kj] is determined for potential cut point kj. The manner of judging whether at least part of the data in window Wj1[kj-169, kj] meets predetermined condition C1 is the same as the manner of judging whether at least part of the data in window Wi1[ki-169, ki] meets predetermined condition C1. Therefore, as shown in Figure 33, Wj1 denotes the window; to judge whether at least part of the data in it meets predetermined condition C1, 5 bytes are selected. In Figure 33 each "■" denotes 1 selected byte, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total, to increase randomness. Each byte consists of 8 bits, denoted $a_{m,1}', \dots, a_{m,8}'$, which are the 1st to 8th bits of the m-th of the 255 bytes, so the bits of the 255 bytes can be expressed as:

$$\begin{pmatrix} a_{1,1}' & a_{1,2}' & \cdots & a_{1,8}' \\ a_{2,1}' & a_{2,2}' & \cdots & a_{2,8}' \\ \vdots & \vdots & & \vdots \\ a_{255,1}' & a_{255,2}' & \cdots & a_{255,8}' \end{pmatrix}$$

When $a_{m,n}'=1$, $V_{am,n}'=1$; when $a_{m,n}'=0$, $V_{am,n}'=-1$, where $a_{m,n}'$ denotes any one of $a_{m,1}', \dots, a_{m,8}'$. Applying this conversion between $a_{m,n}'$ and $V_{am,n}'$ to the bits of the 255 bytes gives matrix $V_a'$, which can be expressed as:

$$V_a' = \begin{pmatrix} V_{a1,1}' & V_{a1,2}' & \cdots & V_{a1,8}' \\ V_{a2,1}' & V_{a2,2}' & \cdots & V_{a2,8}' \\ \vdots & \vdots & & \vdots \\ V_{a255,1}' & V_{a255,2}' & \cdots & V_{a255,8}' \end{pmatrix}$$

The manner of judging whether at least part of the data in this window meets the predetermined condition is the same as that for window Wi1[ki-169, ki], so the same matrix R is used:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,8} \end{pmatrix}$$

The m-th row of matrix $V_a'$ is multiplied element by element with the random numbers in the m-th row of matrix R, and the products are summed to obtain one value, expressed as $S_{am}' = V_{am,1}' h_{m,1} + V_{am,2}' h_{m,2} + \dots + V_{am,8}' h_{m,8}$. In this way $S_{a1}', S_{a2}', \dots, S_{a255}'$ are obtained, and the number K of values among them that satisfy a specific condition (here, being greater than 0) is counted. Because matrix R obeys a normal distribution, $S_{am}'$, like matrix R, also obeys a normal distribution. According to probability theory, the probability that a normally distributed random number is greater than 0 is 1/2, so each of $S_{a1}', S_{a2}', \dots, S_{a255}'$ is greater than 0 with probability 1/2, and K therefore obeys a binomial distribution:

$$P(K = k) = \binom{255}{k} \left(\tfrac{1}{2}\right)^{k} \left(\tfrac{1}{2}\right)^{255-k}$$

Based on the count, it is judged whether the number K of values greater than 0 among $S_{a1}', S_{a2}', \dots, S_{a255}'$ is even. The probability that a binomially distributed random number is even is 1/2, so K satisfies the condition with probability 1/2. When K is even, at least part of the data in Wj1[kj-169, kj] meets predetermined condition C1; when K is odd, at least part of the data in Wj1[kj-169, kj] does not meet predetermined condition C1.
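The statement that a binomially distributed count is even with probability 1/2 can be checked with a short derivation. The sketch below assumes, as the text implicitly does, that the count K behaves as the sum of n independent trials, each succeeding with probability p; the symbols n, p and q are introduced here for illustration only.

```latex
\begin{align*}
P(K \text{ even})
  &= \sum_{k \text{ even}} \binom{n}{k} p^{k} q^{\,n-k}
   = \frac{(q+p)^{n} + (q-p)^{n}}{2}
   = \frac{1 + (1-2p)^{n}}{2}, \qquad q = 1 - p, \\
p = \tfrac{1}{2}
  &\;\Longrightarrow\; P(K \text{ even}) = \frac{1 + 0^{n}}{2} = \frac{1}{2}
  \qquad (n \ge 1).
\end{align*}
```

With n = 255 and p = 1/2 this gives exactly 1/2, in agreement with the probability stated for each window.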
The manner of judging whether at least part of the data in Wi2[ki-170, ki-1] meets predetermined condition C2 is the same as the manner of judging whether at least part of the data in Wj2[kj-170, kj-1] meets predetermined condition C2. Therefore, as shown in Figure 33, the corresponding marker denotes 1 byte selected when judging whether at least part of the data in window Wj2[kj-170, kj-1] meets predetermined condition C2, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total, to increase randomness. Each byte consists of 8 bits, denoted $b_{m,1}', \dots, b_{m,8}'$, which are the 1st to 8th bits of the m-th of the 255 bytes, so the bits of the 255 bytes can be expressed as:

$$\begin{pmatrix} b_{1,1}' & b_{1,2}' & \cdots & b_{1,8}' \\ b_{2,1}' & b_{2,2}' & \cdots & b_{2,8}' \\ \vdots & \vdots & & \vdots \\ b_{255,1}' & b_{255,2}' & \cdots & b_{255,8}' \end{pmatrix}$$

When $b_{m,n}'=1$, $V_{bm,n}'=1$; when $b_{m,n}'=0$, $V_{bm,n}'=-1$, where $b_{m,n}'$ denotes any one of $b_{m,1}', \dots, b_{m,8}'$. Applying this conversion between $b_{m,n}'$ and $V_{bm,n}'$ to the bits of the 255 bytes gives matrix $V_b'$, which can be expressed as:

$$V_b' = \begin{pmatrix} V_{b1,1}' & V_{b1,2}' & \cdots & V_{b1,8}' \\ V_{b2,1}' & V_{b2,2}' & \cdots & V_{b2,8}' \\ \vdots & \vdots & & \vdots \\ V_{b255,1}' & V_{b255,2}' & \cdots & V_{b255,8}' \end{pmatrix}$$

The manner of judging whether at least part of the data in window Wi2[ki-170, ki-1] meets predetermined condition C2 is the same as that for Wj2[kj-170, kj-1], so the same matrix R is still used:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,8} \end{pmatrix}$$

The m-th row of matrix $V_b'$ is multiplied element by element with the random numbers in the m-th row of matrix R, and the products are summed to obtain one value, expressed as $S_{bm}' = V_{bm,1}' h_{m,1} + V_{bm,2}' h_{m,2} + \dots + V_{bm,8}' h_{m,8}$. In this way $S_{b1}', S_{b2}', \dots, S_{b255}'$ are obtained, and the number K of values among them that satisfy a specific condition (here, being greater than 0) is counted. Because matrix R obeys a normal distribution, $S_{bm}'$, like matrix R, also obeys a normal distribution. According to probability theory, the probability that a normally distributed random number is greater than 0 is 1/2, so each of $S_{b1}', S_{b2}', \dots, S_{b255}'$ is greater than 0 with probability 1/2, and K therefore obeys a binomial distribution:

$$P(K = k) = \binom{255}{k} \left(\tfrac{1}{2}\right)^{k} \left(\tfrac{1}{2}\right)^{255-k}$$

Based on the count, it is judged whether the number K of values greater than 0 among $S_{b1}', S_{b2}', \dots, S_{b255}'$ is even. The probability that a binomially distributed random number is even is 1/2, so K satisfies the condition with probability 1/2. When K is even, at least part of the data in Wj2[kj-170, kj-1] meets predetermined condition C2; when K is odd, at least part of the data in Wj2[kj-170, kj-1] does not meet predetermined condition C2. Similarly, the manner of judging whether at least part of the data in Wi3[ki-171, ki-2] meets predetermined condition C3 is the same as the manner of judging whether at least part of the data in Wj3[kj-171, kj-2] meets predetermined condition C3. In the same way, it is judged whether at least part of the data in Wj4[kj-172, kj-3] meets predetermined condition C4, in Wj5[kj-173, kj-4] meets predetermined condition C5, in Wj6[kj-174, kj-5] meets predetermined condition C6, in Wj7[kj-175, kj-6] meets predetermined condition C7, in Wj8[kj-176, kj-7] meets predetermined condition C8, in Wj9[kj-177, kj-8] meets predetermined condition C9, in Wj10[kj-178, kj-9] meets predetermined condition C10, and in Wj11[kj-179, kj-10] meets predetermined condition C11; details are not repeated here.
In this embodiment a random function is used to judge whether at least part of the data in window Wiz[ki-Az, ki+Bz] meets predetermined condition Cz. Still taking the embodiment shown in Figure 21 as an example, according to the rule preset on duplicate removal server 103, window Wi1[ki-169, ki] is determined for potential cut point ki, and whether at least part of the data in Wi1[ki-169, ki] meets predetermined condition C1 is judged. As shown in Figure 32, Wi1 denotes window Wi1[ki-169, ki]. To judge whether at least part of the data in Wi1[ki-169, ki] meets predetermined condition C1, 5 bytes are selected; in Figure 32 each "■" denotes 1 selected byte, and adjacent selected bytes "■" are 42 bytes apart. One implementation is to apply a hash function to the 5 selected bytes. The values computed by the hash function follow a fixed uniform distribution. If the value computed by the hash function is even, at least part of the data in Wi1[ki-169, ki] is judged to meet predetermined condition C1; that is, C1 means that the value computed by the hash function in the above manner is even. The probability that at least part of the data in Wi1[ki-169, ki] meets the predetermined condition is therefore 1/2. In the embodiment shown in Figure 21, the hash function is likewise used to judge whether at least part of the data in Wi2[ki-170, ki-1] meets predetermined condition C2, whether at least part of the data in Wi3[ki-171, ki-2] meets predetermined condition C3, whether at least part of the data in Wi4[ki-172, ki-3] meets predetermined condition C4, and whether at least part of the data in Wi5[ki-173, ki-4] meets predetermined condition C5. For the specific implementation, refer to the description of using the hash function to judge whether at least part of the data in Wi1[ki-169, ki] meets predetermined condition C1 in the embodiment shown in Figure 21; details are not repeated here.
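The hash-based variant can be illustrated with a short sketch. The text does not say which hash function is used, so SHA-256 from Python's standard library is an assumption here, as are the function names and the convention that the byte numbered n is `data[k - n]`.

```python
import hashlib

SPACING = 42        # adjacent selected bytes are 42 bytes apart
NUM_SELECTED = 5    # 5 bytes are selected per window


def select_bytes(data: bytes, k: int, z: int) -> bytes:
    """Pick the 5 bytes of window z (1-based); byte number n is data[k - n]."""
    numbers = [169 - SPACING * i + (z - 1) for i in range(NUM_SELECTED)]
    return bytes(data[k - n] for n in numbers)


def hash_condition(data: bytes, k: int, z: int) -> bool:
    """Window z meets its condition when the hash of its 5 bytes is even."""
    digest = hashlib.sha256(select_bytes(data, k, z)).digest()  # assumed hash
    value = int.from_bytes(digest, "big")
    return value % 2 == 0
```

Only the parity of the hash value matters, so any hash with well-distributed low-order bits would serve the same purpose in this sketch.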
When at least part of the data in Wi5[ki-173, ki-4] does not meet predetermined condition C5, 7 bytes are jumped from potential cut point ki along the data stream cut point search direction, and the current potential cut point kj is obtained at the end position of the 7th byte. As shown in Figure 22, according to the rule preset on duplicate removal server 103, window Wj1[kj-169, kj] is determined for potential cut point kj. The manner of judging whether at least part of the data in window Wj1[kj-169, kj] meets predetermined condition C1 is the same as the manner of judging whether at least part of the data in window Wi1[ki-169, ki] meets predetermined condition C1. Therefore, as shown in Figure 33, Wj1 denotes window Wj1[kj-169, kj]. To judge whether at least part of the data in Wj1[kj-169, kj] meets predetermined condition C1, 5 bytes are selected; in Figure 33 each "■" denotes 1 selected byte, and adjacent selected bytes "■" are 42 bytes apart. The hash function is applied to the 5 bytes chosen from window Wj1[kj-169, kj]; if the value obtained is even, at least part of the data in Wj1[kj-169, kj] meets predetermined condition C1. In Figure 33, the manner of judging whether at least part of the data in Wi2[ki-170, ki-1] meets predetermined condition C2 is the same as the manner of judging whether at least part of the data in Wj2[kj-170, kj-1] meets predetermined condition C2. Therefore, as shown in Figure 33, the corresponding marker denotes 1 byte selected when judging whether at least part of the data in window Wj2[kj-170, kj-1] meets predetermined condition C2, and adjacent selected bytes are 42 bytes apart. The hash function is applied to the 5 selected bytes; if the value obtained is even, at least part of the data in Wj2[kj-170, kj-1] meets predetermined condition C2. In Figure 33, the manner of judging whether at least part of the data in Wi3[ki-171, ki-2] meets predetermined condition C3 is the same as the manner of judging whether at least part of the data in Wj3[kj-171, kj-2] meets predetermined condition C3. Therefore, as shown in Figure 33, the corresponding marker denotes 1 byte selected when judging whether at least part of the data in window Wj3[kj-171, kj-2] meets predetermined condition C3, and adjacent selected bytes are 42 bytes apart. The hash function is applied to the 5 selected bytes; if the value obtained is even, at least part of the data in Wj3[kj-171, kj-2] meets predetermined condition C3. In Figure 33, the manner of judging whether at least part of the data in Wj4[kj-172, kj-3] meets predetermined condition C4 is the same as the manner of judging whether at least part of the data in window Wi4[ki-172, ki-3] meets predetermined condition C4. Therefore, as shown in Figure 33, the corresponding marker denotes 1 byte selected when judging whether at least part of the data in window Wj4[kj-172, kj-3] meets predetermined condition C4, and adjacent selected bytes are 42 bytes apart. The hash function is applied to the 5 selected bytes; if the value obtained is even, at least part of the data in Wj4[kj-172, kj-3] meets predetermined condition C4. Using the same method, it is judged whether at least part of the data in Wj5[kj-173, kj-4] meets predetermined condition C5, in Wj6[kj-174, kj-5] meets predetermined condition C6, in Wj7[kj-175, kj-6] meets predetermined condition C7, in Wj8[kj-176, kj-7] meets predetermined condition C8, in Wj9[kj-177, kj-8] meets predetermined condition C9, in Wj10[kj-178, kj-9] meets predetermined condition C10, and in Wj11[kj-179, kj-10] meets predetermined condition C11; details are not repeated here.
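The search described above, in which a failing window triggers a 7-byte jump to the next potential cut point, can be sketched as a simple loop. This is a simplified illustration: the name `find_cut_point` and its parameters are hypothetical, the per-window test is passed in as a callable, and the sketch assumes the same 7-byte jump from the current potential cut point whichever window fails, whereas the text only walks through the case in which window 5 fails.

```python
from typing import Callable, Optional

WindowTest = Callable[[bytes, int, int], bool]  # (data, k, z) -> condition met?


def find_cut_point(data: bytes, start: int, window_ok: WindowTest,
                   jump: int = 7, num_windows: int = 11,
                   window_len: int = 169) -> Optional[int]:
    """Return the first potential cut point whose 11 windows all pass, else None."""
    # The leftmost window W11 starts at k - 169 - 10, so k must be at least 179.
    min_k = window_len + num_windows - 1
    if start < min_k:
        raise ValueError(f"start must be at least {min_k}")
    k = start
    while k <= len(data):
        # all() stops at the first failing window, as in the text.
        if all(window_ok(data, k, z) for z in range(1, num_windows + 1)):
            return k
        k += jump  # assumption: same 7-byte jump whichever window failed
    return None
```

Any per-window test with the signature `(data, k, z)` can be plugged in as `window_ok`, for example a hash-parity test on the 5 selected bytes of window z.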
In this embodiment a random function is used to judge whether at least part of the data in window Wiz[ki-Az, ki+Bz] meets predetermined condition Cz. Taking the embodiment shown in Figure 21 as an example, according to the rule preset on duplicate removal server 103, window Wi1[ki-169, ki] is determined for potential cut point ki, and whether at least part of the data in Wi1[ki-169, ki] meets predetermined condition C1 is judged. As shown in Figure 32, Wi1 denotes window Wi1[ki-169, ki]. To judge whether at least part of the data in Wi1[ki-169, ki] meets predetermined condition C1, 5 bytes are selected; in Figure 32 the bytes "■" numbered 169, 127, 85, 43 and 1 each denote 1 selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" numbered 169, 127, 85, 43 and 1 are each converted into a decimal value, denoted $a_1$, $a_2$, $a_3$, $a_4$ and $a_5$ respectively. Because 1 byte consists of 8 bits, each byte "■" taken as a numerical value satisfies $0 \le a_r \le 255$ for any $a_r$ among $a_1$, $a_2$, $a_3$, $a_4$ and $a_5$. $a_1$, $a_2$, $a_3$, $a_4$ and $a_5$ form a 1*5 matrix. From random numbers obeying a binomial distribution, 256*5 random numbers are selected to form matrix R, expressed as:

$$R = \begin{pmatrix} h_{0,1} & h_{0,2} & \cdots & h_{0,5} \\ h_{1,1} & h_{1,2} & \cdots & h_{1,5} \\ \vdots & \vdots & & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,5} \end{pmatrix}$$
According to the value of $a_1$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $a_1 = 36$ and $a_1$ is in the 1st column, the value of $h_{36,1}$ is looked up. According to the value of $a_2$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $a_2 = 48$ and $a_2$ is in the 2nd column, the value of $h_{48,2}$ is looked up. According to the value of $a_3$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $a_3 = 26$ and $a_3$ is in the 3rd column, the value of $h_{26,3}$ is looked up. According to the value of $a_4$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $a_4 = 26$ and $a_4$ is in the 4th column, the value of $h_{26,4}$ is looked up. According to the value of $a_5$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $a_5 = 88$ and $a_5$ is in the 5th column, the value of $h_{88,5}$ is looked up. Then $S_1 = h_{36,1} + h_{48,2} + h_{26,3} + h_{26,4} + h_{88,5}$. Because matrix R obeys a binomial distribution, $S_1$ also obeys a binomial distribution. When $S_1$ is even, at least part of the data in Wi1[ki-169, ki] meets predetermined condition C1; when $S_1$ is odd, at least part of the data in Wi1[ki-169, ki] does not meet predetermined condition C1. The probability that $S_1$ is even is 1/2, and C1 means that $S_1$ computed in the above manner is even. In the embodiment shown in Figure 21, at least part of the data in Wi1[ki-169, ki] meets predetermined condition C1. As shown in Figure 32, the corresponding marker denotes each byte selected when judging whether at least part of the data in window Wi2[ki-170, ki-1] meets predetermined condition C2; in Figure 32 these bytes are numbered 170, 128, 86, 44 and 2, and adjacent selected bytes are 42 bytes apart. The bytes numbered 170, 128, 86, 44 and 2 are each converted into a decimal value, denoted $b_1$, $b_2$, $b_3$, $b_4$ and $b_5$ respectively. Because 1 byte consists of 8 bits, each byte taken as a numerical value satisfies $0 \le b_r \le 255$ for any $b_r$ among $b_1$, $b_2$, $b_3$, $b_4$ and $b_5$. $b_1$, $b_2$, $b_3$, $b_4$ and $b_5$ form a 1*5 matrix. In this implementation, the manner of judging whether at least part of the data meets the predetermined condition is the same for Wi1 and Wi2, so matrix R is still used. According to the value of $b_1$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $b_1 = 66$ and $b_1$ is in the 1st column, the value of $h_{66,1}$ is looked up. According to the value of $b_2$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $b_2 = 48$ and $b_2$ is in the 2nd column, the value of $h_{48,2}$ is looked up. According to the value of $b_3$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $b_3 = 99$ and $b_3$ is in the 3rd column, the value of $h_{99,3}$ is looked up. According to the value of $b_4$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $b_4 = 26$ and $b_4$ is in the 4th column, the value of $h_{26,4}$ is looked up. According to the value of $b_5$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $b_5 = 90$ and $b_5$ is in the 5th column, the value of $h_{90,5}$ is looked up. Then $S_2 = h_{66,1} + h_{48,2} + h_{99,3} + h_{26,4} + h_{90,5}$. Because matrix R obeys a binomial distribution, $S_2$ also obeys a binomial distribution. When $S_2$ is even, at least part of the data in Wi2[ki-170, ki-1] meets predetermined condition C2; when $S_2$ is odd, at least part of the data in Wi2[ki-170, ki-1] does not meet predetermined condition C2. The probability that $S_2$ is even is 1/2. In the embodiment shown in Figure 21, at least part of the data in Wi2[ki-170, ki-1] meets predetermined condition C2. Using the same rule, it is judged whether at least part of the data in Wi3[ki-171, ki-2] meets predetermined condition C3, in Wi4[ki-172, ki-3] meets predetermined condition C4, in Wi5[ki-173, ki-4] meets predetermined condition C5, in Wi6[ki-174, ki-5] meets predetermined condition C6, in Wi7[ki-175, ki-6] meets predetermined condition C7, in Wi8[ki-176, ki-7] meets predetermined condition C8, in Wi9[ki-177, ki-8] meets predetermined condition C9, in Wi10[ki-178, ki-9] meets predetermined condition C10, and in Wi11[ki-179, ki-10] meets predetermined condition C11. In the embodiment shown in Figure 21, at least part of the data in Wi5[ki-173, ki-4] does not meet predetermined condition C5, so 7 bytes are jumped from potential cut point ki along the data stream cut point search direction, and the current potential cut point kj is obtained at the end position of the 7th byte. As shown in Figure 22, according to the rule preset on duplicate removal server 103, window Wj1[kj-169, kj] is determined for potential cut point kj. The manner of judging whether at least part of the data in window Wj1[kj-169, kj] meets predetermined condition C1 is the same as the manner of judging whether at least part of the data in window Wi1[ki-169, ki] meets predetermined condition C1. Therefore, as shown in Figure 33, Wj1 denotes window Wj1[kj-169, kj]. To judge whether at least part of the data in Wj1[kj-169, kj] meets predetermined condition C1, the bytes "■" numbered 169, 127, 85, 43 and 1 in Figure 33 each denote 1 selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" numbered 169, 127, 85, 43 and 1 are each converted into a decimal value, denoted $a_1'$, $a_2'$, $a_3'$, $a_4'$ and $a_5'$ respectively. Because 1 byte consists of 8 bits, each byte "■" taken as a numerical value satisfies $0 \le a_r' \le 255$ for any $a_r'$ among $a_1'$, $a_2'$, $a_3'$, $a_4'$ and $a_5'$. $a_1'$, $a_2'$, $a_3'$, $a_4'$ and $a_5'$ form a 1*5 matrix. The manner of judging whether at least part of the data in window Wj1[kj-169, kj] meets predetermined condition C1 is the same as the manner of judging whether at least part of the data in window Wi1[ki-169, ki] meets predetermined condition C1, so matrix R is still used, expressed as:

$$R = \begin{pmatrix} h_{0,1} & h_{0,2} & \cdots & h_{0,5} \\ h_{1,1} & h_{1,2} & \cdots & h_{1,5} \\ \vdots & \vdots & & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,5} \end{pmatrix}$$
According to the value of $a_1'$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $a_1' = 16$ and $a_1'$ is in the 1st column, the value of $h_{16,1}$ is looked up. According to the value of $a_2'$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $a_2' = 98$ and $a_2'$ is in the 2nd column, the value of $h_{98,2}$ is looked up. According to the value of $a_3'$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $a_3' = 56$ and $a_3'$ is in the 3rd column, the value of $h_{56,3}$ is looked up. According to the value of $a_4'$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $a_4' = 36$ and $a_4'$ is in the 4th column, the value of $h_{36,4}$ is looked up. According to the value of $a_5'$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $a_5' = 99$ and $a_5'$ is in the 5th column, the value of $h_{99,5}$ is looked up. Then $S_1' = h_{16,1} + h_{98,2} + h_{56,3} + h_{36,4} + h_{99,5}$. Because matrix R obeys a binomial distribution, $S_1'$ also obeys a binomial distribution. When $S_1'$ is even, at least part of the data in Wj1[kj-169, kj] meets predetermined condition C1; when $S_1'$ is odd, at least part of the data in Wj1[kj-169, kj] does not meet predetermined condition C1. The probability that $S_1'$ is even is 1/2.
The manner of judging whether at least part of the data in Wi2[ki-170, ki-1] meets predetermined condition C2 is the same as the manner of judging whether at least part of the data in Wj2[kj-170, kj-1] meets predetermined condition C2. Therefore, as shown in Figure 33, the corresponding marker denotes 1 byte selected when judging whether at least part of the data in window Wj2[kj-170, kj-1] meets predetermined condition C2; the selected bytes are numbered 170, 128, 86, 44 and 2, and adjacent selected bytes are 42 bytes apart. The bytes numbered 170, 128, 86, 44 and 2 are each converted into a decimal value, denoted $b_1'$, $b_2'$, $b_3'$, $b_4'$ and $b_5'$ respectively. Because 1 byte consists of 8 bits, each byte taken as a numerical value satisfies $0 \le b_r' \le 255$ for any $b_r'$ among $b_1'$, $b_2'$, $b_3'$, $b_4'$ and $b_5'$. $b_1'$, $b_2'$, $b_3'$, $b_4'$ and $b_5'$ form a 1*5 matrix. The same matrix R is used as for judging whether at least part of the data in window Wi2[ki-170, ki-1] meets predetermined condition C2. According to the value of $b_1'$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $b_1' = 210$ and $b_1'$ is in the 1st column, the value of $h_{210,1}$ is looked up. According to the value of $b_2'$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $b_2' = 156$ and $b_2'$ is in the 2nd column, the value of $h_{156,2}$ is looked up. According to the value of $b_3'$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $b_3' = 144$ and $b_3'$ is in the 3rd column, the value of $h_{144,3}$ is looked up. According to the value of $b_4'$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $b_4' = 60$ and $b_4'$ is in the 4th column, the value of $h_{60,4}$ is looked up. According to the value of $b_5'$ and the column in which it is located, the corresponding value is looked up in matrix R; for example, if $b_5' = 90$ and $b_5'$ is in the 5th column, the value of $h_{90,5}$ is looked up. Then $S_2' = h_{210,1} + h_{156,2} + h_{144,3} + h_{60,4} + h_{90,5}$. The judgment condition is the same as for $S_2$: when $S_2'$ is even, at least part of the data in Wj2[kj-170, kj-1] meets predetermined condition C2; when $S_2'$ is odd, at least part of the data in Wj2[kj-170, kj-1] does not meet predetermined condition C2. The probability that $S_2'$ is even is 1/2.
Similarly, the manner of judging whether at least part of the data in W_i3[k_i-171, k_i-2] meets predetermined condition C_3 is the same as the manner of judging whether at least part of the data in W_j3[k_j-171, k_j-2] meets predetermined condition C_3. The same applies to judging whether at least part of the data in W_j4[k_j-172, k_j-3] meets C_4, in W_j5[k_j-173, k_j-4] meets C_5, in W_j6[k_j-174, k_j-5] meets C_6, in W_j7[k_j-175, k_j-6] meets C_7, in W_j8[k_j-176, k_j-7] meets C_8, in W_j9[k_j-177, k_j-8] meets C_9, in W_j10[k_j-178, k_j-9] meets C_10, and in W_j11[k_j-179, k_j-10] meets C_11. Details are not repeated here.
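The lookup-and-parity test described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names, the seed, the binomial parameters (16 trials, p = 0.5) and the reading that the byte with sequence number n is the n-th byte before the potential cut-point are all hypothetical.

```python
import random

# Sequence numbers of the bytes selected for window 1 (from the description).
OFFSETS = (169, 127, 85, 43, 1)

def make_matrix(seed, rows=256, cols=5, trials=16, p=0.5):
    """Build a rows x cols table of binomially distributed integers.

    Each entry is a Binomial(trials, p) sample, drawn by summing Bernoulli
    trials. The parameters are assumptions; the text only says "binomial".
    """
    rng = random.Random(seed)
    return [[sum(rng.random() < p for _ in range(trials)) for _ in range(cols)]
            for _ in range(rows)]

def selected_bytes(data, k, z):
    """Return the 5 selected byte values for window z (1-based) at position k.

    Assumption: sequence number n means the n-th byte before k, and window z
    is shifted z - 1 bytes further back than window 1.
    """
    shift = z - 1
    return [data[k - (off + shift)] for off in OFFSETS]

def window_satisfies(data, k, z, R):
    """Condition holds when the sum of the looked-up entries is even."""
    values = selected_bytes(data, k, z)
    # The byte value picks the row, its position in the selection the column.
    s = sum(R[v][col] for col, v in enumerate(values))
    return s % 2 == 0
```

Because the same table R is reused for every window and every potential cut-point, identical byte content always yields the same decision, which is what makes the cut-points content-defined.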
In this embodiment, a random function is used to judge whether at least part of the data in window W_iz[k_i-A_z, k_i+B_z] meets predetermined condition C_z. Taking the embodiment shown in Figure 21 as an example, according to the rule preset on deduplication server 103, window W_i1[k_i-169, k_i] is determined for potential cut-point k_i, and it is judged whether at least part of the data in W_i1[k_i-169, k_i] meets predetermined condition C_1. As shown in Figure 32, W_i1 denotes window W_i1[k_i-169, k_i]. To judge whether at least part of the data in W_i1[k_i-169, k_i] meets predetermined condition C_1, 5 bytes are selected: in Figure 32, the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 each represent 1 selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43 and 1 are each converted into a decimal value, denoted a_1, a_2, a_3, a_4 and a_5 respectively. Because 1 byte consists of 8 bits, each byte "■" is treated as one value, so any a_s among a_1, a_2, a_3, a_4 and a_5 satisfies 0 ≤ a_s ≤ 255. a_1, a_2, a_3, a_4 and a_5 form a 1*5 matrix. 256*5 random numbers are selected from random numbers following a binomial distribution to form matrix R, whose element in row v and column s is denoted h_{v,s} (0 ≤ v ≤ 255, 1 ≤ s ≤ 5). Likewise, 256*5 random numbers are selected from random numbers following a binomial distribution to form matrix G, whose element in row v and column s is denoted g_{v,s}.

Values are looked up according to the value of each a_s and the column in which it is located. For example, a_1 = 36 is in the 1st column, so the value of h_{36,1} is looked up in matrix R and the value of g_{36,1} in matrix G; a_2 = 48 is in the 2nd column, so h_{48,2} is looked up in matrix R and g_{48,2} in matrix G; a_3 = 26 is in the 3rd column, so h_{26,3} is looked up in matrix R and g_{26,3} in matrix G; a_4 = 26 is in the 4th column, so h_{26,4} is looked up in matrix R and g_{26,4} in matrix G; a_5 = 88 is in the 5th column, so h_{88,5} is looked up in matrix R and g_{88,5} in matrix G. S_1h = h_{36,1} + h_{48,2} + h_{26,3} + h_{26,4} + h_{88,5}; because matrix R follows a binomial distribution, S_1h also follows a binomial distribution. S_1g = g_{36,1} + g_{48,2} + g_{26,3} + g_{26,4} + g_{88,5}; because matrix G follows a binomial distribution, S_1g also follows a binomial distribution. When at least one of S_1h and S_1g is even, at least part of the data in W_i1[k_i-169, k_i] meets predetermined condition C_1; when S_1h and S_1g are both odd, at least part of the data in W_i1[k_i-169, k_i] does not meet predetermined condition C_1. C_1 thus states that at least one of S_1h and S_1g obtained by the above method is even. Because S_1h and S_1g both follow a binomial distribution, the probability that S_1h is even is 1/2 and the probability that S_1g is even is 1/2, so the probability that at least one of S_1h and S_1g is even is 1 - 1/4 = 3/4. Therefore, the probability that at least part of the data in W_i1[k_i-169, k_i] meets predetermined condition C_1 is 3/4. In the embodiment shown in Figure 21, at least part of the data in W_i1[k_i-169, k_i] meets predetermined condition C_1.

In the embodiment shown in Figure 21, the windows W_i1[k_i-169, k_i], W_i2[k_i-170, k_i-1], W_i3[k_i-171, k_i-2], W_i4[k_i-172, k_i-3], W_i5[k_i-173, k_i-4], W_i6[k_i-174, k_i-5], W_i7[k_i-175, k_i-6], W_i8[k_i-176, k_i-7], W_i9[k_i-177, k_i-8], W_i10[k_i-178, k_i-9] and W_i11[k_i-179, k_i-10] all have the same size, namely 169 bytes, and the manner of judging whether at least part of the data in each window meets its predetermined condition is also the same; see the above description of judging whether at least part of the data in W_i1[k_i-169, k_i] meets predetermined condition C_1. Therefore, as shown in Figure 32, the bytes with sequence numbers 170, 128, 86, 44 and 2 are the bytes selected when judging whether at least part of the data in window W_i2[k_i-170, k_i-1] meets predetermined condition C_2, and adjacent selected bytes are 42 bytes apart. The bytes with sequence numbers 170, 128, 86, 44 and 2 are each converted into a decimal value, denoted b_1, b_2, b_3, b_4 and b_5 respectively. Because 1 byte consists of 8 bits, each byte is treated as one value, so any b_s among b_1, b_2, b_3, b_4 and b_5 satisfies 0 ≤ b_s ≤ 255. b_1, b_2, b_3, b_4 and b_5 form a 1*5 matrix. In this implementation, the manner of judging whether at least part of the data in each window meets its predetermined condition is the same, so the same matrices R and G are still used. For example, b_1 = 66 is in the 1st column, so h_{66,1} is looked up in matrix R and g_{66,1} in matrix G; b_2 = 48 is in the 2nd column, so h_{48,2} is looked up in matrix R and g_{48,2} in matrix G; b_3 = 99 is in the 3rd column, so h_{99,3} is looked up in matrix R and g_{99,3} in matrix G; b_4 = 26 is in the 4th column, so h_{26,4} is looked up in matrix R and g_{26,4} in matrix G; b_5 = 90 is in the 5th column, so h_{90,5} is looked up in matrix R and g_{90,5} in matrix G. S_2h = h_{66,1} + h_{48,2} + h_{99,3} + h_{26,4} + h_{90,5}; because matrix R follows a binomial distribution, S_2h also follows a binomial distribution. S_2g = g_{66,1} + g_{48,2} + g_{99,3} + g_{26,4} + g_{90,5}; because matrix G follows a binomial distribution, S_2g also follows a binomial distribution. When at least one of S_2h and S_2g is even, at least part of the data in W_i2[k_i-170, k_i-1] meets predetermined condition C_2; when S_2h and S_2g are both odd, at least part of the data in W_i2[k_i-170, k_i-1] does not meet predetermined condition C_2. The probability that at least one of S_2h and S_2g is even is 3/4. In the embodiment shown in Figure 21, at least part of the data in W_i2[k_i-170, k_i-1] meets predetermined condition C_2. Using the same rule, it is judged in turn whether at least part of the data in W_i3[k_i-171, k_i-2] meets C_3, in W_i4[k_i-172, k_i-3] meets C_4, in W_i5[k_i-173, k_i-4] meets C_5, in W_i6[k_i-174, k_i-5] meets C_6, in W_i7[k_i-175, k_i-6] meets C_7, in W_i8[k_i-176, k_i-7] meets C_8, in W_i9[k_i-177, k_i-8] meets C_9, in W_i10[k_i-178, k_i-9] meets C_10, and in W_i11[k_i-179, k_i-10] meets C_11.

In the embodiment shown in Figure 21, at least part of the data in W_i5[k_i-173, k_i-4] does not meet predetermined condition C_5, so 7 bytes are jumped from potential cut-point k_i along the data stream cut-point search direction, and the current potential cut-point k_j is obtained at the end position of the 7th byte. As shown in Figure 22, according to the rule preset on deduplication server 103, window W_j1[k_j-169, k_j] is determined for potential cut-point k_j. The manner of judging whether at least part of the data in window W_j1[k_j-169, k_j] meets predetermined condition C_1 is the same as the manner of judging whether at least part of the data in window W_i1[k_i-169, k_i] meets predetermined condition C_1. Therefore, as shown in Figure 33, W_j1 denotes window W_j1[k_j-169, k_j]; to judge whether at least part of the data in W_j1[k_j-169, k_j] meets predetermined condition C_1, the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 in Figure 33 each represent 1 selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43 and 1 are each converted into a decimal value, denoted a_1', a_2', a_3', a_4' and a_5' respectively. Because 1 byte consists of 8 bits, each byte "■" is treated as one value, so any a_s' among a_1', a_2', a_3', a_4' and a_5' satisfies 0 ≤ a_s' ≤ 255. a_1', a_2', a_3', a_4' and a_5' form a 1*5 matrix. The same matrices R and G are used as when judging whether at least part of the data in window W_i1[k_i-169, k_i] meets predetermined condition C_1. For example, a_1' = 16 is in the 1st column, so h_{16,1} is looked up in matrix R and g_{16,1} in matrix G; a_2' = 98 is in the 2nd column, so h_{98,2} is looked up in matrix R and g_{98,2} in matrix G; a_3' = 56 is in the 3rd column, so h_{56,3} is looked up in matrix R and g_{56,3} in matrix G; a_4' = 36 is in the 4th column, so h_{36,4} is looked up in matrix R and g_{36,4} in matrix G; a_5' = 99 is in the 5th column, so h_{99,5} is looked up in matrix R and g_{99,5} in matrix G. S_1h' = h_{16,1} + h_{98,2} + h_{56,3} + h_{36,4} + h_{99,5}; because matrix R follows a binomial distribution, S_1h' also follows a binomial distribution. S_1g' = g_{16,1} + g_{98,2} + g_{56,3} + g_{36,4} + g_{99,5}; because matrix G follows a binomial distribution, S_1g' also follows a binomial distribution. When at least one of S_1h' and S_1g' is even, at least part of the data in W_j1[k_j-169, k_j] meets predetermined condition C_1; when S_1h' and S_1g' are both odd, at least part of the data in W_j1[k_j-169, k_j] does not meet predetermined condition C_1. The probability that at least one of S_1h' and S_1g' is even is 3/4.
The manner of judging whether at least part of the data in W_i2[k_i-170, k_i-1] meets predetermined condition C_2 is the same as the manner of judging whether at least part of the data in W_j2[k_j-170, k_j-1] meets predetermined condition C_2. Therefore, as shown in Figure 33, the bytes with sequence numbers 170, 128, 86, 44 and 2 are the bytes selected when judging whether at least part of the data in window W_j2[k_j-170, k_j-1] meets predetermined condition C_2, and adjacent selected bytes are 42 bytes apart. The bytes with sequence numbers 170, 128, 86, 44 and 2 are each converted into a decimal value, denoted b_1', b_2', b_3', b_4' and b_5' respectively. Because 1 byte consists of 8 bits, each byte is treated as one value, so any b_s' among b_1', b_2', b_3', b_4' and b_5' satisfies 0 ≤ b_s' ≤ 255. b_1', b_2', b_3', b_4' and b_5' form a 1*5 matrix. The same matrices R and G are used as when judging whether at least part of the data in window W_i2[k_i-170, k_i-1] meets predetermined condition C_2. For example, b_1' = 210 is in the 1st column, so h_{210,1} is looked up in matrix R and g_{210,1} in matrix G; b_2' = 156 is in the 2nd column, so h_{156,2} is looked up in matrix R and g_{156,2} in matrix G; b_3' = 144 is in the 3rd column, so h_{144,3} is looked up in matrix R and g_{144,3} in matrix G; b_4' = 60 is in the 4th column, so h_{60,4} is looked up in matrix R and g_{60,4} in matrix G; b_5' = 90 is in the 5th column, so h_{90,5} is looked up in matrix R and g_{90,5} in matrix G. S_2h' = h_{210,1} + h_{156,2} + h_{144,3} + h_{60,4} + h_{90,5} and S_2g' = g_{210,1} + g_{156,2} + g_{144,3} + g_{60,4} + g_{90,5}. When at least one of S_2h' and S_2g' is even, at least part of the data in W_j2[k_j-170, k_j-1] meets predetermined condition C_2; when S_2h' and S_2g' are both odd, at least part of the data in W_j2[k_j-170, k_j-1] does not meet predetermined condition C_2. The probability that at least one of S_2h' and S_2g' is even is 3/4.
Similarly, the manner of judging whether at least part of the data in W_i3[k_i-171, k_i-2] meets predetermined condition C_3 is the same as the manner of judging whether at least part of the data in W_j3[k_j-171, k_j-2] meets predetermined condition C_3. The same applies to judging whether at least part of the data in W_j4[k_j-172, k_j-3] meets C_4, in W_j5[k_j-173, k_j-4] meets C_5, in W_j6[k_j-174, k_j-5] meets C_6, in W_j7[k_j-175, k_j-6] meets C_7, in W_j8[k_j-176, k_j-7] meets C_8, in W_j9[k_j-177, k_j-8] meets C_9, in W_j10[k_j-178, k_j-9] meets C_10, and in W_j11[k_j-179, k_j-10] meets C_11. Details are not repeated here.
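The two-matrix variant described above differs from the single-matrix one only in the final test: the condition holds when at least one of the two sums is even. A minimal Python sketch follows; the function names are hypothetical, and the tiny constant matrices in the usage lines are stand-ins for the binomially distributed tables R and G, chosen so that the parities are obvious.

```python
def column_sum(values, matrix):
    """Sum matrix[v][s] over the selected values.

    The byte value v picks the row; its position s in the selection picks
    the column (0-based here, 1-based in the description).
    """
    return sum(matrix[v][s] for s, v in enumerate(values))

def satisfies_two_matrices(values, R, G):
    """Condition holds when at least one of S_h and S_g is even."""
    s_h = column_sum(values, R)
    s_g = column_sum(values, G)
    return s_h % 2 == 0 or s_g % 2 == 0

# Usage with the example byte values a_1..a_5 from the text and toy tables.
ones = [[1] * 5 for _ in range(256)]   # every 5-term sum is 5 (odd)
twos = [[2] * 5 for _ in range(256)]   # every 5-term sum is 10 (even)
values = [36, 48, 26, 26, 88]
print(satisfies_two_matrices(values, ones, twos))  # one sum is even
print(satisfies_two_matrices(values, ones, ones))  # both sums are odd
```

If the two parities are independent and each is even with probability 1/2, the condition holds with probability 1 - 1/2 * 1/2 = 3/4, matching the figure given in the text. A higher per-window pass probability means fewer jumps at the cost of more windows being checked.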
In this embodiment, a random function is used to judge whether at least part of the data in window W_iz[k_i-A_z, k_i+B_z] meets predetermined condition C_z. Taking the embodiment shown in Figure 21 as an example, according to the rule preset on deduplication server 103, window W_i1[k_i-169, k_i] is determined for potential cut-point k_i, and it is judged whether at least part of the data in W_i1[k_i-169, k_i] meets predetermined condition C_1. As shown in Figure 32, W_i1 denotes window W_i1[k_i-169, k_i]. To judge whether at least part of the data in W_i1[k_i-169, k_i] meets predetermined condition C_1, 5 bytes are selected: in Figure 32, the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 each represent 1 selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43 and 1 are regarded in order as 40 bits, denoted a_1, a_2, a_3, a_4 ... a_40. For any a_t among a_1, a_2, a_3, a_4 ... a_40, V_at = -1 when a_t = 0 and V_at = 1 when a_t = 1; according to this correspondence between a_t and V_at, V_a1, V_a2, V_a3, V_a4 ... V_a40 are generated. 40 random numbers are selected from random numbers following a normal distribution, denoted h_1, h_2, h_3, h_4 ... h_40. S_a = V_a1*h_1 + V_a2*h_2 + V_a3*h_3 + V_a4*h_4 + ... + V_a40*h_40. Because h_1, h_2, h_3, h_4 ... h_40 follow a normal distribution, S_a also follows a normal distribution. When S_a is positive, at least part of the data in W_i1[k_i-169, k_i] meets predetermined condition C_1; when S_a is negative or 0, at least part of the data in W_i1[k_i-169, k_i] does not meet predetermined condition C_1. The probability that S_a is positive is 1/2. In the embodiment shown in Figure 21, at least part of the data in W_i1[k_i-169, k_i] meets predetermined condition C_1.

As shown in Figure 32, the bytes with sequence numbers 170, 128, 86, 44 and 2 are the bytes selected when judging whether at least part of the data in window W_i2[k_i-170, k_i-1] meets predetermined condition C_2, and adjacent selected bytes are 42 bytes apart. The bytes with sequence numbers 170, 128, 86, 44 and 2 are regarded in order as 40 bits, denoted b_1, b_2, b_3, b_4 ... b_40. For any b_t among b_1, b_2, b_3, b_4 ... b_40, V_bt = -1 when b_t = 0 and V_bt = 1 when b_t = 1; according to this correspondence between b_t and V_bt, V_b1, V_b2, V_b3, V_b4 ... V_b40 are generated. The manner of judging whether at least part of the data in window W_i1[k_i-169, k_i] meets predetermined condition C_1 is the same as the manner of judging whether at least part of the data in window W_i2[k_i-170, k_i-1] meets predetermined condition C_2; therefore, the same random numbers h_1, h_2, h_3, h_4 ... h_40 are used. S_b = V_b1*h_1 + V_b2*h_2 + V_b3*h_3 + V_b4*h_4 + ... + V_b40*h_40. Because h_1, h_2, h_3, h_4 ... h_40 follow a normal distribution, S_b also follows a normal distribution. When S_b is positive, at least part of the data in W_i2[k_i-170, k_i-1] meets predetermined condition C_2; when S_b is negative or 0, at least part of the data in W_i2[k_i-170, k_i-1] does not meet predetermined condition C_2. The probability that S_b is positive is 1/2. In the embodiment shown in Figure 21, at least part of the data in W_i2[k_i-170, k_i-1] meets predetermined condition C_2. Using the same rule, it is judged in turn whether at least part of the data in W_i3[k_i-171, k_i-2] meets C_3, in W_i4[k_i-172, k_i-3] meets C_4, in W_i5[k_i-173, k_i-4] meets C_5, in W_i6[k_i-174, k_i-5] meets C_6, in W_i7[k_i-175, k_i-6] meets C_7, in W_i8[k_i-176, k_i-7] meets C_8, in W_i9[k_i-177, k_i-8] meets C_9, in W_i10[k_i-178, k_i-9] meets C_10, and in W_i11[k_i-179, k_i-10] meets C_11.

In the embodiment shown in Figure 21, at least part of the data in W_i5[k_i-173, k_i-4] does not meet predetermined condition C_5, so 7 bytes are jumped from potential cut-point k_i along the data stream cut-point search direction, and the current potential cut-point k_j is obtained at the end position of the 7th byte. As shown in Figure 22, according to the rule preset on deduplication server 103, window W_j1[k_j-169, k_j] is determined for potential cut-point k_j. The manner of judging whether at least part of the data in window W_j1[k_j-169, k_j] meets predetermined condition C_1 is the same as the manner of judging whether at least part of the data in window W_i1[k_i-169, k_i] meets predetermined condition C_1. Therefore, as shown in Figure 33, W_j1 denotes window W_j1[k_j-169, k_j]; to judge whether at least part of the data in W_j1[k_j-169, k_j] meets predetermined condition C_1, 5 bytes are selected: in Figure 33, the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 each represent 1 selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43 and 1 are regarded in order as 40 bits, denoted a_1', a_2', a_3', a_4' ... a_40'. For any a_t' among a_1', a_2', a_3', a_4' ... a_40', V_at' = -1 when a_t' = 0 and V_at' = 1 when a_t' = 1; according to this correspondence between a_t' and V_at', V_a1', V_a2', V_a3', V_a4' ... V_a40' are generated. Because the judgment manner is the same as for window W_i1[k_i-169, k_i], the same random numbers h_1, h_2, h_3, h_4 ... h_40 are used. S_a' = V_a1'*h_1 + V_a2'*h_2 + V_a3'*h_3 + V_a4'*h_4 + ... + V_a40'*h_40. Because h_1, h_2, h_3, h_4 ... h_40 follow a normal distribution, S_a' also follows a normal distribution. When S_a' is positive, at least part of the data in W_j1[k_j-169, k_j] meets predetermined condition C_1; when S_a' is negative or 0, at least part of the data in W_j1[k_j-169, k_j] does not meet predetermined condition C_1. The probability that S_a' is positive is 1/2.
The manner of judging whether at least part of the data in W_i2[k_i-170, k_i-1] meets predetermined condition C_2 is the same as the manner of judging whether at least part of the data in W_j2[k_j-170, k_j-1] meets predetermined condition C_2. Therefore, as shown in Figure 33, the bytes with sequence numbers 170, 128, 86, 44 and 2 are the bytes selected when judging whether at least part of the data in window W_j2[k_j-170, k_j-1] meets predetermined condition C_2, and adjacent selected bytes are 42 bytes apart. The bytes with sequence numbers 170, 128, 86, 44 and 2 are regarded in order as 40 bits, denoted b_1', b_2', b_3', b_4' ... b_40'. For any b_t' among b_1', b_2', b_3', b_4' ... b_40', V_bt' = -1 when b_t' = 0 and V_bt' = 1 when b_t' = 1; according to this correspondence between b_t' and V_bt', V_b1', V_b2', V_b3', V_b4' ... V_b40' are generated. Because the judgment manner is the same as for W_i2[k_i-170, k_i-1], the same random numbers h_1, h_2, h_3, h_4 ... h_40 are used. S_b' = V_b1'*h_1 + V_b2'*h_2 + V_b3'*h_3 + V_b4'*h_4 + ... + V_b40'*h_40. Because h_1, h_2, h_3, h_4 ... h_40 follow a normal distribution, S_b' also follows a normal distribution. When S_b' is positive, at least part of the data in W_j2[k_j-170, k_j-1] meets predetermined condition C_2; when S_b' is negative or 0, at least part of the data in W_j2[k_j-170, k_j-1] does not meet predetermined condition C_2. The probability that S_b' is positive is 1/2.
Similarly, the manner of judging whether at least part of the data in W_i3[k_i-171, k_i-2] meets predetermined condition C_3 is the same as the manner of judging whether at least part of the data in W_j3[k_j-171, k_j-2] meets predetermined condition C_3. The same applies to judging whether at least part of the data in W_j4[k_j-172, k_j-3] meets C_4, in W_j5[k_j-173, k_j-4] meets C_5, in W_j6[k_j-174, k_j-5] meets C_6, in W_j7[k_j-175, k_j-6] meets C_7, in W_j8[k_j-176, k_j-7] meets C_8, in W_j9[k_j-177, k_j-8] meets C_9, in W_j10[k_j-178, k_j-9] meets C_10, and in W_j11[k_j-179, k_j-10] meets C_11. Details are not repeated here.
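The normal-distribution variant described above amounts to a signed sum: each of the 40 selected bits contributes +h_t or -h_t, and the condition holds when the total is positive. The Python sketch below illustrates this; the function names, the seed, the standard-normal parameters and the most-significant-bit-first ordering of the bits are assumptions, since the text does not fix them.

```python
import random

def make_weights(seed, n=40):
    """Draw n weights h_1..h_n from a normal distribution (mean 0, sd 1 assumed)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def bits_of(selected):
    """Expand the selected bytes into a flat list of bits, MSB first (assumed order)."""
    return [(byte >> (7 - i)) & 1 for byte in selected for i in range(8)]

def satisfies_signed_sum(selected, h):
    """Map bit 0 to -1 and bit 1 to +1, then test whether the weighted sum is positive."""
    signs = [1 if bit else -1 for bit in bits_of(selected)]
    s = sum(v * w for v, w in zip(signs, h))
    return s > 0  # a sum that is negative or exactly 0 does not satisfy the condition

# Usage: the same weights are reused for every window and every cut-point.
h = make_weights(seed=2014)
print(satisfies_signed_sum([36, 48, 26, 26, 88], h))
```

Since the weights are symmetric around zero, flipping every bit negates the sum, which is one way to see why the condition holds with probability 1/2 for random input.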
In this embodiment, a random function is used to judge whether at least part of the data in window W_iz[k_i-A_z, k_i+B_z] meets predetermined condition C_z. Still taking the embodiment shown in Figure 21 as an example, according to the rule preset on deduplication server 103, window W_i1[k_i-169, k_i] is determined for potential cut-point k_i, and it is judged whether at least part of the data in W_i1[k_i-169, k_i] meets predetermined condition C_1. As shown in Figure 32, W_i1 denotes window W_i1[k_i-169, k_i]. To judge whether at least part of the data in W_i1[k_i-169, k_i] meets predetermined condition C_1, 5 bytes are selected: in Figure 32, the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 each represent 1 selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43 and 1 are converted into 1 decimal number in the range 0 to (2^40-1). A uniform random number generator is used to generate 1 designated value for each decimal number in 0 to (2^40-1), and the correspondence R between each decimal number in 0 to (2^40-1) and its designated value is recorded. Once designated, the designated value corresponding to a decimal number remains unchanged, and the designated values follow a uniform distribution. If the designated value is even, at least part of the data in W_i1[k_i-169, k_i] meets predetermined condition C_1; if the designated value is odd, at least part of the data in W_i1[k_i-169, k_i] does not meet predetermined condition C_1. C_1 thus states that the designated value obtained by the above method is even. Because the probability that a uniformly distributed random number is even is 1/2, the probability that at least part of the data in W_i1[k_i-169, k_i] meets predetermined condition C_1 is 1/2. In the embodiment shown in Figure 21, the same rule is used to judge in turn whether at least part of the data in W_i2[k_i-170, k_i-1] meets C_2, in W_i3[k_i-171, k_i-2] meets C_3, in W_i4[k_i-172, k_i-3] meets C_4, and in W_i5[k_i-173, k_i-4] meets C_5. Details are not repeated here.
When at least part of the data in W_i5[k_i-173, k_i-4] does not meet predetermined condition C_5, 7 bytes are jumped from potential cut-point k_i along the data stream cut-point search direction, and the current potential cut-point k_j is obtained at the end position of the 7th byte. As shown in Figure 22, according to the rule preset on deduplication server 103, window W_j1[k_j-169, k_j] is determined for potential cut-point k_j. The manner of judging whether at least part of the data in window W_j1[k_j-169, k_j] meets predetermined condition C_1 is the same as the manner of judging whether at least part of the data in window W_i1[k_i-169, k_i] meets predetermined condition C_1; therefore, the same correspondence R between each decimal number in 0 to (2^40-1) and its designated value is used. As shown in Figure 33, W_j1 denotes window W_j1[k_j-169, k_j]. To judge whether at least part of the data in W_j1[k_j-169, k_j] meets predetermined condition C_1, 5 bytes are selected: in Figure 33, "■" represents 1 selected byte, and adjacent selected bytes "■" are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43 and 1 are converted into 1 decimal number, and the designated value corresponding to this decimal number is looked up in R. If the designated value is even, at least part of the data in W_j1[k_j-169, k_j] meets predetermined condition C_1; if the designated value is odd, at least part of the data in W_j1[k_j-169, k_j] does not meet predetermined condition C_1. Because the probability that a uniformly distributed random number is even is 1/2, the probability that at least part of the data in W_j1[k_j-169, k_j] meets predetermined condition C_1 is 1/2. Similarly, the manner of judging whether at least part of the data in W_i2[k_i-170, k_i-1] meets predetermined condition C_2 is the same as the manner of judging whether at least part of the data in W_j2[k_j-170, k_j-1] meets predetermined condition C_2, and the manner of judging whether at least part of the data in W_i3[k_i-171, k_i-2] meets predetermined condition C_3 is the same as the manner of judging whether at least part of the data in W_j3[k_j-171, k_j-2] meets predetermined condition C_3. The same applies to judging whether at least part of the data in W_j4[k_j-172, k_j-3] meets C_4, in W_j5[k_j-173, k_j-4] meets C_5, in W_j6[k_j-174, k_j-5] meets C_6, in W_j7[k_j-175, k_j-6] meets C_7, in W_j8[k_j-176, k_j-7] meets C_8, in W_j9[k_j-177, k_j-8] meets C_9, in W_j10[k_j-178, k_j-9] meets C_10, and in W_j11[k_j-179, k_j-10] meets C_11. Details are not repeated here.
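The designated-value variant described above can be sketched in Python as a lookup from a 40-bit number to a fixed random value. A full table with 2^40 entries is impractical to materialise, so this sketch assigns values lazily and remembers them; that laziness, the class and function names, the big-endian byte order and the 32-bit range of the designated values are all assumptions made for illustration.

```python
import random

class DesignatedValues:
    """Correspondence R: 40-bit number -> designated value, fixed once assigned."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self._table = {}

    def lookup(self, number):
        # Assign a uniformly distributed value on first use, then never change it.
        # Note: with lazy assignment the value depends on query order, so two
        # servers would need to share the table to agree on cut-points.
        if number not in self._table:
            self._table[number] = self._rng.randrange(2 ** 32)
        return self._table[number]

def to_number(selected):
    """Combine the 5 selected bytes into one integer in 0..2**40 - 1 (big-endian assumed)."""
    return int.from_bytes(bytes(selected), "big")

def satisfies_designated(selected, R):
    """Condition holds when the designated value of the 40-bit number is even."""
    return R.lookup(to_number(selected)) % 2 == 0

# Usage: the same correspondence R serves every window and every cut-point.
R = DesignatedValues(seed=103)
print(satisfies_designated([36, 48, 26, 26, 88], R))
```

In practice a deterministic function of the 40-bit number, such as a keyed hash, would give the same "fixed once designated" behaviour without storing a table; that substitution is a design suggestion here, not something the text states.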
Deduplication server 103 in the embodiment of the present invention shown in Figure 1 refers to a device capable of implementing the technical solutions described in the embodiments of the present invention. As shown in Figure 18, it generally includes a central processing unit, a main memory and an input/output interface. The central processing unit, the main memory and the input/output interface communicate with one another. The main memory stores executable instructions, and the central processing unit executes the executable instructions stored in the main memory to perform specific functions, so that deduplication server 103 has those functions, such as searching for a data stream cut-point as described in Figures 20 to 33 of the embodiments of the present invention. Therefore, as shown in Figure 19, according to the embodiments of the present invention shown in Figures 20 to 33, a rule is preset on deduplication server 103. The rule is: for a potential cut-point k, determine M windows W_x[k-A_x, k+B_x] and the predetermined condition C_x corresponding to each window W_x[k-A_x, k+B_x], where x takes consecutive natural numbers from 1 to M, M ≥ 2, and A_x and B_x are integers;
Deduplication server 103 includes a determining unit 1901 and a judging and processing unit 1902. The determining unit 1901 is configured to perform step a):
a) according to the rule, determine the corresponding window W_iz[k_i-A_z, k_i+B_z] for the current potential cut-point k_i, where i and z are integers and 1 ≤ z ≤ M;
The judging and processing unit 1902 is configured to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] meets predetermined condition C_z;
when at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] does not meet the predetermined condition C_z, jump N minimum units U for searching for a data stream cut-point from the current potential cut-point k_i along the data stream cut-point search direction, where N*U is not greater than ‖B_z‖ + max_x(‖A_x‖), to obtain a new potential cut-point, and the determining unit 1901 performs step a) for the new potential cut-point;
when at least part of the data in each window W_ix[k_i-A_x, k_i+B_x] of the M windows of the current potential cut-point k_i meets predetermined condition C_x, the current potential cut-point k_i is a data stream cut-point.
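The interplay of the determining unit and the judging and processing unit described above can be sketched in Python as a single search loop. This is a simplified illustration, not the server's implementation: the function name, the `check` callback signature, treating window bounds as slice boundaries, and a minimum search unit U of 1 byte are all assumptions.

```python
def find_cut_point(data, start, windows, check, jump_units, unit=1):
    """Search forward from `start` for a position where all M windows pass.

    windows: list of (A_x, B_x) pairs, window x covering data[k - A_x : k + B_x].
    check:   callable (data, lo, hi, z) -> bool, the test for condition C_z.
    Returns the cut-point position, or None if the data runs out.
    """
    max_a = max(abs(a) for a, _ in windows)
    max_b = max(b for _, b in windows)
    if start - max_a < 0:
        raise ValueError("start leaves no room for the widest window")
    k = start
    while k + max_b <= len(data):
        failed_b = None
        for z, (a, b) in enumerate(windows, start=1):
            if not check(data, k - a, k + b, z):
                failed_b = b  # remember B_z of the first failing window
                break
        if failed_b is None:
            return k  # every window met its condition: k is a cut-point
        step = jump_units * unit
        # The rule bounds the jump: N*U <= ||B_z|| + max_x(||A_x||).
        if step > abs(failed_b) + max_a:
            raise ValueError("jump exceeds ||B_z|| + max_x(||A_x||)")
        k += step
    return None

# Usage with a toy condition: a window passes when it contains no zero byte.
no_zero = lambda d, lo, hi, z: 0 not in d[lo:hi]
stream = bytes([1, 1, 0, 1, 1, 1, 1, 1, 1])
print(find_cut_point(stream, 4, [(3, 0), (4, -1)], no_zero, jump_units=1))  # prints 7
```

Stopping at the first failing window is what makes the method cheap: a position is rejected as soon as one window fails, and the bound on the jump ensures the windows of the new position still overlap or touch the failed one.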
Further, the rule also includes: at least two windows $W_{ie}[k_i-A_e, k_i+B_e]$ and $W_{if}[k_i-A_f, k_i+B_f]$ satisfy the conditions $|A_e+B_e| = |A_f+B_f|$ and $C_e = C_f$. Further, the rule also includes: $A_e$ and $A_f$ are positive integers. Further, the rule also includes: $A_e - 1 = A_f$ and $B_e + 1 = B_f$.
Further, the judging and processing unit 1902 is specifically configured to use a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ meets the predetermined condition $C_z$. Further, the judging and processing unit 1902 specifically uses a hash function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ meets the predetermined condition $C_z$.
Further, the judging and processing unit 1902 is configured to, when at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ does not meet the predetermined condition $C_z$, jump N minimum units U for searching for a data stream cut-point from the current potential cut-point $k_i$ along the data stream cut-point search direction to obtain the new potential cut-point, and the determining unit 1901 performs step a) for the new potential cut-point. According to the rule, the left boundary of the window $W_{ic}[k_i-A_c, k_i+B_c]$ determined for the new potential cut-point coincides with the right boundary of the window $W_{iz}[k_i-A_z, k_i+B_z]$, or the left boundary of the window $W_{ic}[k_i-A_c, k_i+B_c]$ determined for the new potential cut-point lies within the range of the window $W_{iz}[k_i-A_z, k_i+B_z]$. Here, the window $W_{ic}[k_i-A_c, k_i+B_c]$ determined for the new potential cut-point is the window ranked first in the sequence obtained by ordering, along the data stream search direction, the M windows determined for the new potential cut-point according to the rule.
Further, the judging and processing unit 1902 uses a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ meets the predetermined condition $C_z$, which specifically includes the following. F bytes are selected in the window $W_{iz}[k_i-A_z, k_i+B_z]$ and reused H times to obtain F*H bytes in total, where each byte consists of 8 bits, denoted $a_{m,1}, \ldots, a_{m,8}$, representing the 1st to 8th bits of the m-th byte of the F*H bytes. The bits corresponding to the F*H bytes can be expressed as:

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F*H,1} & a_{F*H,2} & \cdots & a_{F*H,8} \end{pmatrix}$$

When $a_{m,n} = 1$, $V_{am,n} = 1$; when $a_{m,n} = 0$, $V_{am,n} = -1$, where $a_{m,n}$ denotes any one of $a_{m,1}, \ldots, a_{m,8}$. Applying this conversion between $a_{m,n}$ and $V_{am,n}$ to the bits corresponding to the F*H bytes yields a matrix $V_a$, expressed as:

$$V_a = \begin{pmatrix} V_{a1,1} & V_{a1,2} & \cdots & V_{a1,8} \\ V_{a2,1} & V_{a2,2} & \cdots & V_{a2,8} \\ \vdots & \vdots & \ddots & \vdots \\ V_{aF*H,1} & V_{aF*H,2} & \cdots & V_{aF*H,8} \end{pmatrix}$$

F*H*8 random numbers are selected from random numbers that follow a normal distribution to form a matrix R, expressed as:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F*H,1} & h_{F*H,2} & \cdots & h_{F*H,8} \end{pmatrix}$$

The m-th row of the matrix $V_a$ and the m-th row of the matrix R are multiplied element by element and summed to obtain a value, expressed as $S_{am} = V_{am,1} h_{m,1} + V_{am,2} h_{m,2} + \cdots + V_{am,8} h_{m,8}$. In the same way, $S_{a1}, S_{a2}, \ldots, S_{aF*H}$ are obtained. The number K of values greater than 0 among $S_{a1}, S_{a2}, \ldots, S_{aF*H}$ is counted; when K is even, at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ meets the predetermined condition $C_z$.
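The random-function check above can be sketched as follows, under stated assumptions: the names `make_random_matrix` and `meets_condition` are hypothetical, bit 1 is taken to be the most significant bit of each byte, and the matrix R is generated once from a fixed seed so that the same data always gives the same decision. The text does not specify these details.

```python
import random
from typing import List, Sequence


def make_random_matrix(rows: int, seed: int = 0) -> List[List[float]]:
    """Build R: `rows` x 8 numbers drawn from a normal distribution (fixed seed)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(rows)]


def meets_condition(selected: bytes, h: int, r: Sequence[Sequence[float]]) -> bool:
    """Return True when the count K of positive row sums S_am is even."""
    reused = selected * h  # the F selected bytes reused H times: F*H bytes
    k = 0
    for m, byte in enumerate(reused):
        # Row m of V_a: bit value 1 maps to +1, bit value 0 maps to -1.
        v = [1 if (byte >> (7 - n)) & 1 else -1 for n in range(8)]
        # S_am = V_am,1 * h_m,1 + ... + V_am,8 * h_m,8
        s = sum(vn * hn for vn, hn in zip(v, r[m]))
        if s > 0:
            k += 1
    return k % 2 == 0
```

A caller would build R once, for example `r = make_random_matrix(F * H)`, and pass the same R to every window check that uses the same predetermined condition.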
In the server-based method for searching for a data stream cut-point provided by the embodiments of the present invention shown in Figures 20 to 33, windows $W_{ix}[k_i-A_x, k_i+B_x]$ are determined for the potential cut-point $k_i$, where x is each of the consecutive natural numbers from 1 to M and M ≥ 2. Whether at least part of the data in each of the M windows meets the predetermined condition $C_x$ may be judged in parallel, or the windows may be judged one after another. The judgment may also be chained: once at least part of the data in window $W_{i1}[k_i-A_1, k_i+B_1]$ is judged to meet the predetermined condition $C_1$, window $W_{i2}[k_i-A_2, k_i+B_2]$ is judged against the predetermined condition $C_2$, and so on until at least part of the data in $W_{im}[k_i-A_m, k_i+B_m]$ is judged to meet the predetermined condition $C_m$. The other windows in the embodiments are judged in the same way, and details are not repeated here.
In addition, in the embodiments of the present invention shown in Figures 20 to 33, a rule is preset on the deduplication server 103. The rule is: for a potential cut-point $k$, determine M windows $W_x[k-A_x, k+B_x]$ and the predetermined condition $C_x$ corresponding to each window $W_x[k-A_x, k+B_x]$, where x is each of the consecutive natural numbers from 1 to M and M ≥ 2. In this preset rule, $A_1, A_2, A_3, \ldots, A_m$ need not all be equal, $B_1, B_2, B_3, \ldots, B_m$ need not all be equal, and $C_1, C_2, C_3, \ldots, C_M$ need not all be identical.
In the embodiment shown in Figure 21, the windows $W_{i1}[k_i-169, k_i]$, $W_{i2}[k_i-170, k_i-1]$, $W_{i3}[k_i-171, k_i-2]$, $W_{i4}[k_i-172, k_i-3]$, $W_{i5}[k_i-173, k_i-4]$, $W_{i6}[k_i-174, k_i-5]$, $W_{i7}[k_i-175, k_i-6]$, $W_{i8}[k_i-176, k_i-7]$, $W_{i9}[k_i-177, k_i-8]$, $W_{i10}[k_i-178, k_i-9]$ and $W_{i11}[k_i-179, k_i-10]$ all have the same size, namely 169 bytes, and the manner of judging whether at least part of the data in a window meets the predetermined condition is also the same for every window; for details, see the foregoing description of judging whether at least part of the data in $W_{i1}[k_i-169, k_i]$ meets the predetermined condition $C_1$. However, in the embodiment shown in Figure 11, the sizes of the windows $W_{i1}[k_i-169, k_i]$, $W_{i2}[k_i-170, k_i-1]$, $W_{i3}[k_i-171, k_i-2]$, $W_{i4}[k_i-172, k_i-3]$, $W_{i5}[k_i-173, k_i-4]$, $W_{i6}[k_i-174, k_i-5]$, $W_{i7}[k_i-175, k_i-6]$, $W_{i8}[k_i-176, k_i-7]$, $W_{i9}[k_i-177, k_i-8]$, $W_{i10}[k_i-168, k_i+1]$ and $W_{i11}[k_i-179, k_i+3]$ may differ, and the manner of judging whether at least part of the data in a window meets the predetermined condition may also differ. In all embodiments, according to the rule preset for the deduplication server 103, the manner of judging whether at least part of the data in window $W_{i1}$ meets the predetermined condition $C_1$ is necessarily the same as the manner of judging whether at least part of the data in window $W_{j1}$ meets $C_1$; the manner of judging whether at least part of the data in $W_{i2}$ meets the predetermined condition $C_2$ is necessarily the same as that for $W_{j2}$ and $C_2$; and so on, up to window $W_{iM}$ and window $W_{jM}$ with the predetermined condition $C_M$. Details are not repeated here.
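As an illustration of the Figure 21 rule, the eleven window offsets can be generated and checked mechanically. This is a small sketch; the helper name `figure21_rule` and the `(A_x, B_x)` tuple layout are assumptions made for the example.

```python
from typing import List, Tuple


def figure21_rule(m: int = 11, size: int = 169) -> List[Tuple[int, int]]:
    """Return (A_x, B_x) for x = 1..m so that window x is [k - A_x, k + B_x]."""
    return [(size + x - 1, -(x - 1)) for x in range(1, m + 1)]


rule = figure21_rule()
# Every window spans the same distance |A_x + B_x| = 169.
assert all(abs(a + b) == 169 for a, b in rule)
# Neighbouring windows are shifted by one byte: A_e - 1 = A_f and B_e + 1 = B_f.
assert all(rule[x + 1][0] - 1 == rule[x][0] and rule[x + 1][1] + 1 == rule[x][1]
           for x in range(len(rule) - 1))
```

The first entry is (169, 0), matching $W_{i1}[k_i-169, k_i]$, and the last is (179, -10), matching $W_{i11}[k_i-179, k_i-10]$.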
In the embodiments of the present invention shown in Figures 20 to 33, a rule is preset on the deduplication server 103. $k_a$, $k_i$, $k_j$, $k_l$ and $k_m$ are potential cut-points obtained while searching for a cut-point along the data stream cut-point search direction, and all of them follow this rule. A window $W_x[k-A_x, k+B_x]$ in the embodiments of the present invention represents a specific range, within which data is selected to judge whether the data meets the predetermined condition $C_x$. Specifically, either part of the data or all of the data within this range may be selected for the judgment. Wherever the concept of a window is used in the embodiments of the present invention, it may be understood with reference to the window $W_x[k-A_x, k+B_x]$, and details are not repeated here.
In the window $W_x[k-A_x, k+B_x]$, $(k-A_x)$ and $(k+B_x)$ denote the two boundaries of the window, where $(k-A_x)$ denotes the boundary that lies, relative to the potential cut-point $k$, in the direction opposite to the data stream cut-point search direction, and $(k+B_x)$ denotes the boundary that lies, relative to the potential cut-point $k$, in the data stream cut-point search direction. Specifically, in the embodiments of the present invention shown in Figures 20 to 33, the data stream cut-point search direction is from left to right, so $(k-A_x)$ denotes the boundary opposite to the search direction (that is, the left boundary) and $(k+B_x)$ denotes the boundary in the search direction (that is, the right boundary). If the data stream cut-point search direction in Figures 20 to 33 were from right to left, $(k-A_x)$ would denote the boundary opposite to the search direction (that is, the right boundary) and $(k+B_x)$ would denote the boundary in the search direction (that is, the left boundary).
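The effect of the search direction on the two boundaries can be illustrated with a short sketch. It rests on an assumption that is not stated in the text: positions are absolute byte offsets counted from the left, so that "A units opposite to the search direction" means k − A when searching left to right and k + A when searching right to left. The function name `window_bounds` is hypothetical.

```python
from typing import Tuple


def window_bounds(k: int, a: int, b: int,
                  left_to_right: bool = True) -> Tuple[int, int]:
    """Return (left, right) absolute offsets of the window W[k - A, k + B]."""
    if left_to_right:
        # (k - A) is the left boundary, (k + B) is the right boundary.
        return (k - a, k + b)
    # Searching right to left: the boundary written (k - A) lies opposite to
    # the search direction, which is to the right of k in absolute offsets.
    return (k - b, k + a)
```

For example, with k = 1000, A = 169 and B = 0, a left-to-right search covers offsets 831 to 1000, while under this assumption a right-to-left search covers offsets 1000 to 1169.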
A person of ordinary skill in the art should appreciate that the units and algorithm steps of the examples described with reference to Figures 20 to 33, which are the key features of the embodiments of the present invention, may be combined with other technologies and presented in more complex forms that still contain the key features of the present invention. In a real environment, a standby cut-point may be used. For example, in one embodiment, according to the rule preset for the deduplication server 103, 11 windows $W_x[k-A_x, k+B_x]$ and the predetermined condition $C_x$ corresponding to each window are determined for the potential cut-point $k_i$, where x is each of the consecutive natural numbers from 1 to 11; when at least part of the data in each of the 11 windows meets the predetermined condition $C_x$, the potential cut-point $k_i$ is a data stream cut-point. When the set maximum data block size is exceeded and no cut-point has been found, a standby preset rule may be used. The standby preset rule is similar to the rule preset on the deduplication server 103: for the potential cut-point $k_i$, 10 windows $W_x[k-A_x, k+B_x]$ and the predetermined condition $C_x$ corresponding to each window are determined, where x is each of the consecutive natural numbers from 1 to 10; when at least part of the data in each of the 10 windows meets the predetermined condition $C_x$, the potential cut-point $k_i$ is a data stream cut-point. When the set maximum data block size is exceeded and a data stream cut-point still has not been found, the end position of the maximum data block is used as a forced cut-point.
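The fallback order described above (preset rule, then standby rule, then a forced cut-point at the end of the maximum data block) can be sketched as follows. This is only an illustration: the names `passes` and `cut_with_fallback` are hypothetical, and for brevity the sketch tests every position in the block rather than jumping after a failed window.

```python
from typing import Callable, List, Tuple

# Hypothetical rule entry: (A_x, B_x, C_x) for window W_x[k - A_x, k + B_x].
Window = Tuple[int, int, Callable[[bytes], bool]]


def passes(data: bytes, k: int, rule: List[Window]) -> bool:
    """True when every window of potential cut-point k fits and meets C_x."""
    for a, b, cond in rule:
        lo, hi = k - a, k + b
        if lo < 0 or hi >= len(data) or not cond(data[lo:hi + 1]):
            return False
    return True


def cut_with_fallback(data: bytes, block_start: int, max_block: int,
                      primary: List[Window], backup: List[Window]) -> int:
    """Try the preset rule, then the standby rule, then force a cut-point."""
    limit = min(block_start + max_block, len(data))
    for rule in (primary, backup):
        for k in range(block_start + 1, limit):
            if passes(data, k, rule):
                return k
    return limit  # forced cut-point at the end of the maximum data block
```

Trying the stricter rule over the whole block before the standby rule mirrors the order in the text, where the standby rule is consulted only after the maximum data block size has been exceeded.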
In the embodiments of the present invention shown in Figures 20 to 33, a rule is preset on the deduplication server 103. Although the rule determines M windows for a potential cut-point $k$, it is not necessarily required that a potential cut-point $k$ exist first; the potential cut-point $k$ may instead be determined from the M windows.
A person of ordinary skill in the art should appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered to go beyond the scope of the present invention.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be understood with reference to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable non-volatile storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a non-volatile storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned non-volatile storage medium includes various media that can store program code, such as a USB flash drive, a portable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, or an optical disc.
The foregoing is merely a specific implementation of the present invention, and the protection scope of the present invention is not limited thereto. Any change or replacement that a person familiar with the technical field could readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (48)
1. A method for searching for a data stream cut-point based on a server, characterized in that:
a rule is preset on the server, the rule being: for a potential cut-point $k$, determine M points $p_x$, a window $W_x[p_x-A_x, p_x+B_x]$ corresponding to each point $p_x$, and a predetermined condition $C_x$ corresponding to the window $W_x[p_x-A_x, p_x+B_x]$, where x is each of the consecutive natural numbers from 1 to M, M ≥ 2, and $A_x$ and $B_x$ are integers;
the method includes:
a) according to the rule, determining, for a current potential cut-point $k_i$, a point $p_{iz}$ and a window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ corresponding to the point $p_{iz}$, where i and z are integers and 1 ≤ z ≤ M;
b) judging whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ meets the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ does not meet the predetermined condition $C_z$, jumping N minimum units U for searching for a data stream cut-point from the point $p_{iz}$ along the data stream cut-point search direction, where $N*U$ is not greater than $\|B_z\| + \max_x(\|A_x\| + \|(k_i - p_{ix})\|)$, to obtain a new potential cut-point, and performing step a);
c) when at least part of the data in each window $W_{ix}[p_{ix}-A_x, p_{ix}+B_x]$ of the M windows of the current potential cut-point $k_i$ meets the predetermined condition $C_x$, the current potential cut-point $k_i$ is a data stream cut-point.
2. The method according to claim 1, characterized in that the rule further includes: at least two points $p_e$ and $p_f$ satisfy the conditions $A_e = A_f$, $B_e = B_f$, and $C_e = C_f$.
3. The method according to claim 2, characterized in that the rule further includes: the at least two points $p_e$ and $p_f$ lie, relative to the potential cut-point $k$, in the direction opposite to the data stream cut-point search direction.
4. The method according to claim 2 or 3, characterized in that the rule further includes: the distance between the at least two points $p_e$ and $p_f$ is 1 U.
5. The method according to any one of claims 1 to 3, characterized in that judging whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ meets the predetermined condition $C_z$ specifically includes:
using a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ meets the predetermined condition $C_z$.
6. The method according to claim 5, characterized in that using a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ meets the predetermined condition $C_z$ is specifically: using a hash function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ meets the predetermined condition $C_z$.
7. The method according to any one of claims 1 to 3, characterized in that, when at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ does not meet the predetermined condition $C_z$, N minimum units U for searching for a data stream cut-point are jumped from the point $p_{iz}$ along the data stream cut-point search direction to obtain the new potential cut-point; according to the rule, the left boundary of the window $W_{ic}[p_{ic}-A_c, p_{ic}+B_c]$ corresponding to the point $p_{ic}$ determined for the new potential cut-point coincides with the right boundary of the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$, or the left boundary of the window $W_{ic}[p_{ic}-A_c, p_{ic}+B_c]$ corresponding to the point $p_{ic}$ determined for the new potential cut-point lies within the range of the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$; where the point $p_{ic}$ determined for the new potential cut-point is the point ranked first in the sequence obtained by ordering, along the data stream search direction, the M points determined for the new potential cut-point according to the rule.
8. The method according to claim 5, characterized in that using a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ meets the predetermined condition $C_z$ specifically includes:
selecting F bytes in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ and reusing the F bytes H times to obtain F*H bytes in total, where each byte consists of 8 bits, denoted $a_{m,1}, \ldots, a_{m,8}$, representing the 1st to 8th bits of the m-th byte of the F*H bytes; the bits corresponding to the F*H bytes can be expressed as:

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F*H,1} & a_{F*H,2} & \cdots & a_{F*H,8} \end{pmatrix}$$

when $a_{m,n} = 1$, $V_{am,n} = 1$, and when $a_{m,n} = 0$, $V_{am,n} = -1$, where $a_{m,n}$ denotes any one of $a_{m,1}, \ldots, a_{m,8}$; applying this conversion between $a_{m,n}$ and $V_{am,n}$ to the bits corresponding to the F*H bytes yields a matrix $V_a$, expressed as:

$$V_a = \begin{pmatrix} V_{a1,1} & V_{a1,2} & \cdots & V_{a1,8} \\ V_{a2,1} & V_{a2,2} & \cdots & V_{a2,8} \\ \vdots & \vdots & \ddots & \vdots \\ V_{aF*H,1} & V_{aF*H,2} & \cdots & V_{aF*H,8} \end{pmatrix}$$

selecting F*H*8 random numbers from random numbers that follow a normal distribution to form a matrix R, expressed as:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F*H,1} & h_{F*H,2} & \cdots & h_{F*H,8} \end{pmatrix}$$

multiplying the m-th row of the matrix $V_a$ by the m-th row of the matrix R element by element and summing to obtain a value, expressed as $S_{am} = V_{am,1} h_{m,1} + V_{am,2} h_{m,2} + \cdots + V_{am,8} h_{m,8}$; obtaining $S_{a1}, S_{a2}, \ldots, S_{aF*H}$ in the same way; and counting the number K of values greater than 0 among $S_{a1}, S_{a2}, \ldots, S_{aF*H}$, where, when K is even, at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ meets the predetermined condition $C_z$.
9. A method for searching for a data stream cut-point based on a server, characterized in that:
a rule is preset on the server, the rule being: for a potential cut-point $k$, determine M windows $W_x[k-A_x, k+B_x]$ and a predetermined condition $C_x$ corresponding to each window $W_x[k-A_x, k+B_x]$, where x is each of the consecutive natural numbers from 1 to M, M ≥ 2, and $A_x$ and $B_x$ are integers;
the method includes:
a) according to the rule, determining, for a current potential cut-point $k_i$, the corresponding window $W_{iz}[k_i-A_z, k_i+B_z]$, where i and z are integers and 1 ≤ z ≤ M;
b) judging whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ meets the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ does not meet the predetermined condition $C_z$, jumping N minimum units U for searching for a data stream cut-point from the current potential cut-point $k_i$ along the data stream cut-point search direction, where $N*U$ is not greater than $\|B_z\| + \max_x(\|A_x\|)$, to obtain a new potential cut-point, and performing step a);
c) when at least part of the data in each window $W_{ix}[k_i-A_x, k_i+B_x]$ of the M windows of the current potential cut-point $k_i$ meets the predetermined condition $C_x$, the current potential cut-point $k_i$ is a data stream cut-point.
10. The method according to claim 9, characterized in that the rule further includes: at least two windows $W_{ie}[k_i-A_e, k_i+B_e]$ and $W_{if}[k_i-A_f, k_i+B_f]$ satisfy the conditions $|A_e+B_e| = |A_f+B_f|$ and $C_e = C_f$.
11. The method according to claim 10, characterized in that the rule further includes: $A_e$ and $A_f$ are positive integers.
12. The method according to claim 10 or 11, characterized in that the rule further includes: $A_e - 1 = A_f$ and $B_e + 1 = B_f$.
13. The method according to any one of claims 9 to 11, characterized in that judging whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ meets the predetermined condition $C_z$ specifically includes:
using a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ meets the predetermined condition $C_z$.
14. The method according to claim 13, characterized in that using a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ meets the predetermined condition $C_z$ is specifically: using a hash function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ meets the predetermined condition $C_z$.
15. The method according to any one of claims 9 to 11, characterized in that, when at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ does not meet the predetermined condition $C_z$, N minimum units U for searching for a data stream cut-point are jumped from the current potential cut-point $k_i$ along the data stream cut-point search direction to obtain the new potential cut-point; according to the rule, the left boundary of the window $W_{ic}[k_i-A_c, k_i+B_c]$ determined for the new potential cut-point coincides with the right boundary of the window $W_{iz}[k_i-A_z, k_i+B_z]$, or the left boundary of the window $W_{ic}[k_i-A_c, k_i+B_c]$ determined for the new potential cut-point lies within the range of the window $W_{iz}[k_i-A_z, k_i+B_z]$; where the window $W_{ic}[k_i-A_c, k_i+B_c]$ determined for the new potential cut-point is the window ranked first in the sequence obtained by ordering, along the data stream search direction, the M windows determined for the new potential cut-point according to the rule.
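16. The method according to claim 13, characterized in that using a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ meets the predetermined condition $C_z$ specifically includes:
selecting F bytes in the window $W_{iz}[k_i-A_z, k_i+B_z]$ and reusing the F bytes H times to obtain F*H bytes in total, where each byte consists of 8 bits, denoted $a_{m,1}, \ldots, a_{m,8}$, representing the 1st to 8th bits of the m-th byte of the F*H bytes; the bits corresponding to the F*H bytes can be expressed as:

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F*H,1} & a_{F*H,2} & \cdots & a_{F*H,8} \end{pmatrix}$$

when $a_{m,n} = 1$, $V_{am,n} = 1$, and when $a_{m,n} = 0$, $V_{am,n} = -1$, where $a_{m,n}$ denotes any one of $a_{m,1}, \ldots, a_{m,8}$; applying this conversion between $a_{m,n}$ and $V_{am,n}$ to the bits corresponding to the F*H bytes yields a matrix $V_a$, expressed as:

$$V_a = \begin{pmatrix} V_{a1,1} & V_{a1,2} & \cdots & V_{a1,8} \\ V_{a2,1} & V_{a2,2} & \cdots & V_{a2,8} \\ \vdots & \vdots & \ddots & \vdots \\ V_{aF*H,1} & V_{aF*H,2} & \cdots & V_{aF*H,8} \end{pmatrix}$$

selecting F*H*8 random numbers from random numbers that follow a normal distribution to form a matrix R, expressed as:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F*H,1} & h_{F*H,2} & \cdots & h_{F*H,8} \end{pmatrix}$$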
17. A server for searching for a data stream cut-point, characterized in that the server includes a central processing unit and a main storage, the central processing unit communicates with the main storage, and a rule is preset on the server, the rule being: for a potential cut-point $k$, determine M points $p_x$, a window $W_x[p_x-A_x, p_x+B_x]$ corresponding to each point $p_x$, and a predetermined condition $C_x$ corresponding to the window $W_x[p_x-A_x, p_x+B_x]$, where x is each of the consecutive natural numbers from 1 to M, M ≥ 2, and $A_x$ and $B_x$ are integers;
the main storage is configured to store executable instructions, and the central processing unit executes the executable instructions to perform the following steps:
a) according to the rule, determining, for a current potential cut-point $k_i$, a point $p_{iz}$ and a window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ corresponding to the point $p_{iz}$, where i and z are integers and 1 ≤ z ≤ M;
b) judging whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ meets the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ does not meet the predetermined condition $C_z$, jumping N minimum units U for searching for a data stream cut-point from the point $p_{iz}$ along the data stream cut-point search direction, where $N*U$ is not greater than $\|B_z\| + \max_x(\|A_x\| + \|(k_i - p_{ix})\|)$, to obtain a new potential cut-point, and performing step a);
c) when at least part of the data in each window $W_{ix}[p_{ix}-A_x, p_{ix}+B_x]$ of the M windows of the current potential cut-point $k_i$ meets the predetermined condition $C_x$, the current potential cut-point $k_i$ is a data stream cut-point.
18. The server according to claim 17, characterized in that the rule further includes: at least two points $p_e$ and $p_f$ satisfy the conditions $A_e = A_f$, $B_e = B_f$, and $C_e = C_f$.
19. The server according to claim 18, characterized in that the rule further includes: the at least two points $p_e$ and $p_f$ lie, relative to the potential cut-point $k$, in the direction opposite to the data stream cut-point search direction.
20. The server according to claim 18 or 19, characterized in that the rule further includes: the distance between the at least two points $p_e$ and $p_f$ is 1 U.
21. The server according to any one of claims 17 to 19, characterized in that the central processing unit is specifically configured to use a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ meets the predetermined condition $C_z$.
22. The server according to claim 21, characterized in that the central processing unit is specifically configured to use a hash function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ meets the predetermined condition $C_z$.
23. The server according to any one of claims 17 to 19, characterized in that, when at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ does not meet the predetermined condition $C_z$, N minimum units U for searching for a data stream cut-point are jumped from the point $p_{iz}$ along the data stream cut-point search direction to obtain the new potential cut-point; according to the rule, the left boundary of the window $W_{ic}[p_{ic}-A_c, p_{ic}+B_c]$ corresponding to the point $p_{ic}$ determined for the new potential cut-point coincides with the right boundary of the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$, or the left boundary of the window $W_{ic}[p_{ic}-A_c, p_{ic}+B_c]$ corresponding to the point $p_{ic}$ determined for the new potential cut-point lies within the range of the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$; where the point $p_{ic}$ determined for the new potential cut-point is the point ranked first in the sequence obtained by ordering, along the data stream search direction, the M points determined for the new potential cut-point according to the rule.
24. The server according to claim 21, characterized in that the central processing unit using a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ meets the predetermined condition $C_z$ specifically includes:
selecting F bytes in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ and reusing the F bytes H times to obtain F*H bytes in total, where each byte consists of 8 bits, denoted $a_{m,1}, \ldots, a_{m,8}$, representing the 1st to 8th bits of the m-th byte of the F*H bytes; the bits corresponding to the F*H bytes can be expressed as:

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F*H,1} & a_{F*H,2} & \cdots & a_{F*H,8} \end{pmatrix}$$

when $a_{m,n} = 1$, $V_{am,n} = 1$, and when $a_{m,n} = 0$, $V_{am,n} = -1$, where $a_{m,n}$ denotes any one of $a_{m,1}, \ldots, a_{m,8}$; applying this conversion between $a_{m,n}$ and $V_{am,n}$ to the bits corresponding to the F*H bytes yields a matrix $V_a$, expressed as:

$$V_a = \begin{pmatrix} V_{a1,1} & V_{a1,2} & \cdots & V_{a1,8} \\ V_{a2,1} & V_{a2,2} & \cdots & V_{a2,8} \\ \vdots & \vdots & \ddots & \vdots \\ V_{aF*H,1} & V_{aF*H,2} & \cdots & V_{aF*H,8} \end{pmatrix}$$

selecting F*H*8 random numbers from random numbers that follow a normal distribution to form a matrix R, expressed as:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F*H,1} & h_{F*H,2} & \cdots & h_{F*H,8} \end{pmatrix}$$

multiplying the m-th row of the matrix $V_a$ by the m-th row of the matrix R element by element and summing to obtain a value, expressed as $S_{am} = V_{am,1} h_{m,1} + V_{am,2} h_{m,2} + \cdots + V_{am,8} h_{m,8}$; obtaining $S_{a1}, S_{a2}, \ldots, S_{aF*H}$ in the same way; and counting the number K of values greater than 0 among $S_{a1}, S_{a2}, \ldots, S_{aF*H}$, where, when K is even, at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ meets the predetermined condition $C_z$.
25. A server for searching for a data stream cut-point, characterized in that the server includes a central processing unit and a main storage, the central processing unit communicates with the main storage, and a rule is preset on the server, the rule being: for a potential cut-point $k$, determine M windows $W_x[k-A_x, k+B_x]$ and a predetermined condition $C_x$ corresponding to each window $W_x[k-A_x, k+B_x]$, where x is each of the consecutive natural numbers from 1 to M, M ≥ 2, and $A_x$ and $B_x$ are integers;
the main storage is configured to store executable instructions, and the central processing unit executes the executable instructions to perform the following steps:
a) according to the rule, determining, for a current potential cut-point $k_i$, the corresponding window $W_{iz}[k_i-A_z, k_i+B_z]$, where i and z are integers and 1 ≤ z ≤ M;
b) judging whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ meets the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ does not meet the predetermined condition $C_z$, jumping N minimum units U for searching for a data stream cut-point from the current potential cut-point $k_i$ along the data stream cut-point search direction, where $N*U$ is not greater than $\|B_z\| + \max_x(\|A_x\|)$, to obtain a new potential cut-point, and performing step a);
c) when at least part of the data in each window $W_{ix}[k_i-A_x, k_i+B_x]$ of the M windows of the current potential cut-point $k_i$ meets the predetermined condition $C_x$, the current potential cut-point $k_i$ is a data stream cut-point.
26. The server according to claim 25, characterized in that the rule further includes: at least two windows $W_{ie}[k_i-A_e, k_i+B_e]$ and $W_{if}[k_i-A_f, k_i+B_f]$ satisfy the conditions $|A_e+B_e| = |A_f+B_f|$ and $C_e = C_f$.
27. The server according to claim 26, characterized in that the rule preset for the server further includes: $A_e$ and $A_f$ are positive integers.
28. The server according to claim 26 or 27, characterized in that the rule further includes: $A_e - 1 = A_f$ and $B_e + 1 = B_f$.
29. The server according to any one of claims 25 to 27, characterized in that the central processing unit is specifically configured to use a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ meets the predetermined condition $C_z$.
30. The server according to claim 29, characterized in that the central processing unit is specifically configured to use a hash function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ meets the predetermined condition $C_z$.
31. The server according to any one of claims 25 to 27, wherein, when at least part of the data in the window $W_{iz}[k_i - A_z, k_i + B_z]$ does not meet the predetermined condition $C_z$, N minimum data stream cut-point search units U are jumped from the current potential cut-point $k_i$ along the data stream cut-point search direction to obtain the new potential cut-point; according to the rule, the left boundary of the window $W_{ic}[k_i - A_c, k_i + B_c]$ determined for the new potential cut-point coincides with the right boundary of the window $W_{iz}[k_i - A_z, k_i + B_z]$, or the left boundary of the window $W_{ic}[k_i - A_c, k_i + B_c]$ determined for the new potential cut-point lies within the range of the window $W_{iz}[k_i - A_z, k_i + B_z]$; wherein the window $W_{ic}[k_i - A_c, k_i + B_c]$ determined for the new potential cut-point is the window ranked first in the sequence obtained by ordering, along the data stream search direction, the M windows determined for the new potential cut-point according to the rule.
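The first alternative in claim 31 (the left boundary coinciding with the right boundary of the failed window) fixes the jump length. In the derivation below, $k'$ denotes the new potential cut-point; this symbol is introduced here only for illustration, since the claim itself reuses $k_i$ for the new point.

```latex
k' = k_i + N \cdot U, \qquad
k' - A_c = k_i + B_z
\;\Longrightarrow\;
N \cdot U = A_c + B_z
```

When $A_c$ and $B_z$ are non-negative, $A_c + B_z \le \|B_z\| + \max_x(\|A_x\|)$, so this jump stays within the bound on N*U stated in claim 25.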
32. The server according to claim 29, wherein the central processing unit using a random function to judge whether at least part of the data in the window $W_{iz}[k_i - A_z, k_i + B_z]$ meets the predetermined condition $C_z$ specifically includes:
selecting F bytes in the window $W_{iz}[k_i - A_z, k_i + B_z]$ and reusing the F bytes H times to obtain F*H bytes in total, where each byte consists of 8 bits, denoted $a_{m,1}, \ldots, a_{m,8}$, representing the 1st to 8th bits of the m-th byte of the F*H bytes; the bits corresponding to the F*H bytes can be expressed as:

$$
\begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\
\vdots & \vdots & \ddots & \vdots \\
a_{F*H,1} & a_{F*H,2} & \cdots & a_{F*H,8}
\end{pmatrix}
$$

when $a_{m,n} = 1$, $V_{am,n} = 1$, and when $a_{m,n} = 0$, $V_{am,n} = -1$, where $a_{m,n}$ denotes any one of $a_{m,1}, \ldots, a_{m,8}$; from the bits corresponding to the F*H bytes, a matrix $V_a$ is obtained according to the conversion relationship between $a_{m,n}$ and $V_{am,n}$, the matrix $V_a$ being expressed as:

$$
V_a =
\begin{pmatrix}
V_{a1,1} & V_{a1,2} & \cdots & V_{a1,8} \\
V_{a2,1} & V_{a2,2} & \cdots & V_{a2,8} \\
\vdots & \vdots & \ddots & \vdots \\
V_{aF*H,1} & V_{aF*H,2} & \cdots & V_{aF*H,8}
\end{pmatrix}
$$

F*H*8 random numbers are selected from random numbers that follow a normal distribution to form a matrix R, the matrix R being expressed as:

$$
R =
\begin{pmatrix}
h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\
h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\
\vdots & \vdots & \ddots & \vdots \\
h_{F*H,1} & h_{F*H,2} & \cdots & h_{F*H,8}
\end{pmatrix}
$$

the entries of the m-th row of the matrix $V_a$ are multiplied by the random numbers of the m-th row of the matrix R and the products are summed to obtain a value, specifically $S_{am} = V_{am,1} h_{m,1} + V_{am,2} h_{m,2} + \cdots + V_{am,8} h_{m,8}$; in the same way, $S_{a1}, S_{a2}, \ldots, S_{aF*H}$ are obtained; the number K of values greater than 0 among $S_{a1}, S_{a2}, \ldots, S_{aF*H}$ is counted, and when K is even, at least part of the data in the window $W_{iz}[k_i - A_z, k_i + B_z]$ meets the predetermined condition $C_z$.
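The random-function test of claim 32 can be sketched in a few lines of Python. This is a hedged illustration, not the patented implementation: the function names, the use of a seeded generator so that the same matrix R is reused for every window, the choice of the first F bytes of the window, and the most-significant-bit-first ordering of $a_{m,1}, \ldots, a_{m,8}$ are all assumptions made here.

```python
import random
from typing import List, Sequence


def make_matrix(f: int, h: int, seed: int = 0) -> List[List[float]]:
    """Matrix R: F*H rows of 8 normally distributed numbers (seeded, so fixed)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(f * h)]


def bits_to_signs(byte: int) -> List[int]:
    """Map the 8 bits of a byte to +1 (bit set) or -1 (bit clear), MSB first."""
    return [1 if (byte >> (7 - n)) & 1 else -1 for n in range(8)]


def window_meets_condition(window: bytes, f: int, h: int,
                           r: Sequence[Sequence[float]]) -> bool:
    """True when the count K of positive row sums S_am is even."""
    if len(window) < f:
        raise ValueError("window is shorter than F bytes")
    repeated = window[:f] * h          # reuse the F selected bytes H times
    k = 0
    for m, byte in enumerate(repeated):
        # S_am = sum over n of V_am,n * h_m,n
        s = sum(v * x for v, x in zip(bits_to_signs(byte), r[m]))
        if s > 0:
            k += 1
    return k % 2 == 0
```

A caller would build R once, for example `r = make_matrix(f, h)`, and pass the same `r` for every window so that identical window contents always give the same verdict.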
33. A server for searching for a data stream cut-point, wherein a rule is preset on the server, the rule being: for a potential cut-point $k$, determine M points $p_x$, a window $W_x[p_x - A_x, p_x + B_x]$ corresponding to each point $p_x$, and a predetermined condition $C_x$ corresponding to the window $W_x[p_x - A_x, p_x + B_x]$, where $x$ takes the consecutive natural numbers from 1 to M, $M \ge 2$, and $A_x$ and $B_x$ are integers;
the server comprises: a determining unit, configured to perform step a): a) determining, according to the rule, a point $p_{iz}$ for the current potential cut-point $k_i$ and a window $W_{iz}[p_{iz} - A_z, p_{iz} + B_z]$ corresponding to the point $p_{iz}$, where $i$ and $z$ are integers and $1 \le z \le M$;
a judging and processing unit, configured to judge whether at least part of the data in the window $W_{iz}[p_{iz} - A_z, p_{iz} + B_z]$ meets the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[p_{iz} - A_z, p_{iz} + B_z]$ does not meet the predetermined condition $C_z$, jump N minimum data stream cut-point search units U from the point $p_{iz}$ along the data stream cut-point search direction, where N*U is not greater than $\|B_z\| + \max_x(\|A_x\| + \|(k_i - p_{ix})\|)$, to obtain a new potential cut-point, whereupon the determining unit performs step a) for the new potential cut-point;
when at least part of the data in each window $W_{ix}[p_{ix} - A_x, p_{ix} + B_x]$ of the M windows of the current potential cut-point $k_i$ meets the predetermined condition $C_x$, the current potential cut-point $k_i$ is a data stream cut-point.
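The jump bound in claim 33 differs from that of claim 25 in two ways: the jump starts from the point $p_{iz}$ rather than from $k_i$, and each term in the maximum also includes the distance between $k_i$ and $p_{ix}$. The sketch below illustrates the arithmetic only. It assumes, beyond what the claim states, that each point sits at a fixed offset from the potential cut-point ($p_x = k + \text{offset}_x$); the names `PointRule`, `max_jump` and `next_candidate` are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical rule entry (offset_x, A_x, B_x), with p_x = k + offset_x.
PointRule = Tuple[int, int, int]


def max_jump(rule: Sequence[PointRule], z: int, unit: int = 1) -> int:
    """Largest multiple of U not exceeding |B_z| + max_x(|A_x| + |k - p_x|)."""
    b_z = rule[z][2]
    bound = abs(b_z) + max(abs(a) + abs(offset) for offset, a, _ in rule)
    return (bound // unit) * unit


def next_candidate(k: int, rule: Sequence[PointRule], z: int,
                   unit: int = 1) -> int:
    """New potential cut-point after window z failed: jump from p_z, not from k."""
    p_z = k + rule[z][0]
    return p_z + max_jump(rule, z, unit)
```

With three points placed at offsets -2, -1 and 0 and windows of $A_x = B_x = 1$, the bound is $1 + (1 + 2) = 4$, so a failure at the first point of candidate 10 (that is, at $p = 8$) leads to candidate 12.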
34. The server according to claim 33, wherein the rule further includes: at least two points $p_e$ and $p_f$ meet the conditions $A_e = A_f$, $B_e = B_f$ and $C_e = C_f$.
35. The server according to claim 34, wherein the rule further includes: the at least two points $p_e$ and $p_f$ are located, relative to the potential cut-point $k$, in the direction opposite to the data stream cut-point search direction.
36. The server according to claim 34 or 35, wherein the rule further includes: the distance between the at least two points $p_e$ and $p_f$ is 1 U.
37. The server according to any one of claims 33 to 35, wherein the judging and processing unit is specifically configured to use a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz} - A_z, p_{iz} + B_z]$ meets the predetermined condition $C_z$.
38. The server according to claim 37, wherein the judging and processing unit is specifically configured to use a hash function to judge whether at least part of the data in the window $W_{iz}[p_{iz} - A_z, p_{iz} + B_z]$ meets the predetermined condition $C_z$.
39. The server according to any one of claims 33 to 35, wherein the judging and processing unit is configured to, when at least part of the data in the window $W_{iz}[p_{iz} - A_z, p_{iz} + B_z]$ does not meet the predetermined condition $C_z$, jump N minimum data stream cut-point search units U from the point $p_{iz}$ along the data stream cut-point search direction to obtain the new potential cut-point; the determining unit performs step a) for the new potential cut-point; according to the rule, the left boundary of the window $W_{ic}[p_{ic} - A_c, p_{ic} + B_c]$ corresponding to the point $p_{ic}$ determined for the new potential cut-point coincides with the right boundary of the window $W_{iz}[p_{iz} - A_z, p_{iz} + B_z]$, or the left boundary of the window $W_{ic}[p_{ic} - A_c, p_{ic} + B_c]$ determined for the new potential cut-point lies within the range of the window $W_{iz}[p_{iz} - A_z, p_{iz} + B_z]$; wherein the window $W_{ic}[p_{ic} - A_c, p_{ic} + B_c]$ determined for the new potential cut-point is the window corresponding to the point ranked first in the sequence obtained by ordering, along the data stream search direction, the M points determined for the new potential cut-point according to the rule.
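40. The server according to claim 37, wherein the judging and processing unit being specifically configured to use a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz} - A_z, p_{iz} + B_z]$ meets the predetermined condition $C_z$ specifically includes:
selecting F bytes in the window $W_{iz}[p_{iz} - A_z, p_{iz} + B_z]$ and reusing the F bytes H times to obtain F*H bytes in total, where each byte consists of 8 bits, denoted $a_{m,1}, \ldots, a_{m,8}$, representing the 1st to 8th bits of the m-th byte of the F*H bytes; the bits corresponding to the F*H bytes can be expressed as:

$$
\begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\
\vdots & \vdots & \ddots & \vdots \\
a_{F*H,1} & a_{F*H,2} & \cdots & a_{F*H,8}
\end{pmatrix}
$$

when $a_{m,n} = 1$, $V_{am,n} = 1$, and when $a_{m,n} = 0$, $V_{am,n} = -1$, where $a_{m,n}$ denotes any one of $a_{m,1}, \ldots, a_{m,8}$; from the bits corresponding to the F*H bytes, a matrix $V_a$ is obtained according to the conversion relationship between $a_{m,n}$ and $V_{am,n}$, the matrix $V_a$ being expressed as:

$$
V_a =
\begin{pmatrix}
V_{a1,1} & V_{a1,2} & \cdots & V_{a1,8} \\
V_{a2,1} & V_{a2,2} & \cdots & V_{a2,8} \\
\vdots & \vdots & \ddots & \vdots \\
V_{aF*H,1} & V_{aF*H,2} & \cdots & V_{aF*H,8}
\end{pmatrix}
$$

F*H*8 random numbers are selected from random numbers that follow a normal distribution to form a matrix R, the matrix R being expressed as:

$$
R =
\begin{pmatrix}
h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\
h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\
\vdots & \vdots & \ddots & \vdots \\
h_{F*H,1} & h_{F*H,2} & \cdots & h_{F*H,8}
\end{pmatrix}
$$

the entries of the m-th row of the matrix $V_a$ are multiplied by the random numbers of the m-th row of the matrix R and the products are summed to obtain a value, specifically $S_{am} = V_{am,1} h_{m,1} + V_{am,2} h_{m,2} + \cdots + V_{am,8} h_{m,8}$; in the same way, $S_{a1}, S_{a2}, \ldots, S_{aF*H}$ are obtained; the number K of values greater than 0 among $S_{a1}, S_{a2}, \ldots, S_{aF*H}$ is counted, and when K is even, at least part of the data in the window $W_{iz}[p_{iz} - A_z, p_{iz} + B_z]$ meets the predetermined condition $C_z$.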
41. A server for searching for a data stream cut-point, wherein a rule is preset on the server, the rule being: for a potential cut-point $k$, determine M windows $W_x[k - A_x, k + B_x]$ and a predetermined condition $C_x$ corresponding to each window $W_x[k - A_x, k + B_x]$, where $x$ takes the consecutive natural numbers from 1 to M, $M \ge 2$, and $A_x$ and $B_x$ are integers;
the server comprises: a determining unit, configured to perform step a):
a) determining, according to the rule, a corresponding window $W_{iz}[k_i - A_z, k_i + B_z]$ for the current potential cut-point $k_i$, where $i$ and $z$ are integers and $1 \le z \le M$;
a judging and processing unit, configured to judge whether at least part of the data in the window $W_{iz}[k_i - A_z, k_i + B_z]$ meets the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[k_i - A_z, k_i + B_z]$ does not meet the predetermined condition $C_z$, jump N minimum data stream cut-point search units U from the current potential cut-point $k_i$ along the data stream cut-point search direction, where N*U is not greater than $\|B_z\| + \max_x(\|A_x\|)$, to obtain a new potential cut-point, whereupon the determining unit performs step a) for the new potential cut-point;
when at least part of the data in each window $W_{ix}[k_i - A_x, k_i + B_x]$ of the M windows of the current potential cut-point $k_i$ meets the predetermined condition $C_x$, the current potential cut-point $k_i$ is a data stream cut-point.
42. The server according to claim 41, wherein the rule further includes: at least two windows $W_{ie}[k_i - A_e, k_i + B_e]$ and $W_{if}[k_i - A_f, k_i + B_f]$ meet the conditions $|A_e + B_e| = |A_f + B_f|$ and $C_e = C_f$.
43. The server according to claim 42, wherein the rule further includes: $A_e$ and $A_f$ are positive integers.
44. The server according to claim 42 or 43, wherein the rule further includes: $A_e - 1 = A_f$ and $B_e + 1 = B_f$.
45. The server according to any one of claims 41 to 43, wherein the judging and processing unit is specifically configured to use a random function to judge whether at least part of the data in the window $W_{iz}[k_i - A_z, k_i + B_z]$ meets the predetermined condition $C_z$.
46. The server according to claim 45, wherein the judging and processing unit specifically uses a hash function to judge whether at least part of the data in the window $W_{iz}[k_i - A_z, k_i + B_z]$ meets the predetermined condition $C_z$.
47. The server according to any one of claims 41 to 43, wherein the judging and processing unit is configured to, when at least part of the data in the window $W_{iz}[k_i - A_z, k_i + B_z]$ does not meet the predetermined condition $C_z$, jump N minimum data stream cut-point search units U from the current potential cut-point $k_i$ along the data stream cut-point search direction to obtain the new potential cut-point; the determining unit performs step a) for the new potential cut-point; according to the rule, the left boundary of the window $W_{ic}[k_i - A_c, k_i + B_c]$ determined for the new potential cut-point coincides with the right boundary of the window $W_{iz}[k_i - A_z, k_i + B_z]$, or the left boundary of the window $W_{ic}[k_i - A_c, k_i + B_c]$ determined for the new potential cut-point lies within the range of the window $W_{iz}[k_i - A_z, k_i + B_z]$; wherein the window $W_{ic}[k_i - A_c, k_i + B_c]$ determined for the new potential cut-point is the window ranked first in the sequence obtained by ordering, along the data stream search direction, the M windows determined for the new potential cut-point according to the rule.
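48. The server according to claim 46, wherein the judging and processing unit using a random function to judge whether at least part of the data in the window $W_{iz}[k_i - A_z, k_i + B_z]$ meets the predetermined condition $C_z$ specifically includes:
selecting F bytes in the window $W_{iz}[k_i - A_z, k_i + B_z]$ and reusing the F bytes H times to obtain F*H bytes in total, where each byte consists of 8 bits, denoted $a_{m,1}, \ldots, a_{m,8}$, representing the 1st to 8th bits of the m-th byte of the F*H bytes; the bits corresponding to the F*H bytes can be expressed as:

$$
\begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\
\vdots & \vdots & \ddots & \vdots \\
a_{F*H,1} & a_{F*H,2} & \cdots & a_{F*H,8}
\end{pmatrix}
$$

when $a_{m,n} = 1$, $V_{am,n} = 1$, and when $a_{m,n} = 0$, $V_{am,n} = -1$, where $a_{m,n}$ denotes any one of $a_{m,1}, \ldots, a_{m,8}$; from the bits corresponding to the F*H bytes, a matrix $V_a$ is obtained according to the conversion relationship between $a_{m,n}$ and $V_{am,n}$, the matrix $V_a$ being expressed as:

$$
V_a =
\begin{pmatrix}
V_{a1,1} & V_{a1,2} & \cdots & V_{a1,8} \\
V_{a2,1} & V_{a2,2} & \cdots & V_{a2,8} \\
\vdots & \vdots & \ddots & \vdots \\
V_{aF*H,1} & V_{aF*H,2} & \cdots & V_{aF*H,8}
\end{pmatrix}
$$

F*H*8 random numbers are selected from random numbers that follow a normal distribution to form a matrix R, the matrix R being expressed as:

$$
R =
\begin{pmatrix}
h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\
h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\
\vdots & \vdots & \ddots & \vdots \\
h_{F*H,1} & h_{F*H,2} & \cdots & h_{F*H,8}
\end{pmatrix}
$$

the entries of the m-th row of the matrix $V_a$ are multiplied by the random numbers of the m-th row of the matrix R and the products are summed to obtain a value, specifically $S_{am} = V_{am,1} h_{m,1} + V_{am,2} h_{m,2} + \cdots + V_{am,8} h_{m,8}$; in the same way, $S_{a1}, S_{a2}, \ldots, S_{aF*H}$ are obtained; the number K of values greater than 0 among $S_{a1}, S_{a2}, \ldots, S_{aF*H}$ is counted, and when K is even, at least part of the data in the window $W_{iz}[k_i - A_z, k_i + B_z]$ meets the predetermined condition $C_z$.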
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201480000347.4A CN104169917B (en) | 2014-02-14 | 2014-02-27 | Server-based method for searching for data flow break point, and server |
CN201610439783.2A CN106095971B (en) | 2014-02-14 | 2014-02-27 | A kind of method and server for searching data flow cut-point based on server |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2014072115 | 2014-02-14 | ||
CNPCT/CN2014/072115 | 2014-02-14 | ||
CN201480000347.4A CN104169917B (en) | 2014-02-14 | 2014-02-27 | Server-based method for searching for data flow break point, and server |
PCT/CN2014/072648 WO2015120645A1 (en) | 2014-02-14 | 2014-02-27 | Server-based method for searching for data flow break point, and server |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610439783.2A Division CN106095971B (en) | 2014-02-14 | 2014-02-27 | A kind of method and server for searching data flow cut-point based on server |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104169917A CN104169917A (en) | 2014-11-26 |
CN104169917B true CN104169917B (en) | 2016-08-24 |
Family
ID=51912349
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201480000347.4A Active CN104169917B (en) | 2014-02-14 | 2014-02-27 | Server-based method for searching for data flow break point, and server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104169917B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102214210A (en) * | 2011-05-16 | 2011-10-12 | 成都市华为赛门铁克科技有限公司 | Method, device and system for processing repeating data |
WO2012044366A1 (en) * | 2010-09-30 | 2012-04-05 | Commvault Systems, Inc. | Content aligned block-based deduplication |
-
2014
- 2014-02-27 CN CN201480000347.4A patent/CN104169917B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012044366A1 (en) * | 2010-09-30 | 2012-04-05 | Commvault Systems, Inc. | Content aligned block-based deduplication |
CN102214210A (en) * | 2011-05-16 | 2011-10-12 | 成都市华为赛门铁克科技有限公司 | Method, device and system for processing repeating data |
Non-Patent Citations (2)
Title |
---|
《Improving Duplicate Elimination in Storage Systems》;DEEPAK R. BOBBARJUNG等;《ACM Transactions on Storage》;20061130;第2卷(第4期);第424-448页 * |
《基于存储环境感知的重复数据删除算法优化》;周敬利等;《计算机科学》;20110228;第38卷(第2期);第63-67页 * |
Also Published As
Publication number | Publication date |
---|---|
CN104169917A (en) | 2014-11-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200285634A1 (en) | System for data sharing platform based on distributed data sharing environment based on block chain, method of searching for data in the system, and method of providing search index in the system | |
Pittel | Asymptotical growth of a class of random trees | |
CN104462609B (en) | RDF data storage and querying method with reference to star-like graph code | |
US20150358219A1 (en) | System and method for gathering information | |
CN106897409A (en) | Data point library storage method and device | |
Ou et al. | Order acceptance and scheduling with consideration of service level | |
Li et al. | ASLM: Adaptive single layer model for learned index | |
CN107038059A (en) | virtual machine deployment method and device | |
CN108415912A (en) | Data processing method based on MapReduce model and equipment | |
CN104182518A (en) | Collaborative filtering recommendation method and device | |
CN101551814B (en) | Method for data management and data search | |
EP3026585A1 (en) | Server-based method for searching for data flow break point, and server | |
Flores | Analysis of internal computer sorting | |
CN104169917B (en) | A kind of method based on whois lookup data flow point cutpoint and server | |
Street | Defining sets for block designs: an update | |
CN106095971B (en) | A kind of method and server for searching data flow cut-point based on server | |
CN105843859A (en) | Data processing method, device and equipment | |
JP7099316B2 (en) | Similarity arithmetic units, methods, and programs | |
Epstein et al. | Robust algorithms for total completion time | |
Tarjan et al. | Balancing applied to maximum network flow problems | |
CN106202503A (en) | Data processing method and device | |
WO2011016281A2 (en) | Information processing device and program for learning bayesian network structure | |
CN105373561B (en) | The method and apparatus for identifying the logging mode in non-relational database | |
CN114169488A (en) | Hybrid meta-heuristic algorithm-based vehicle path acquisition method with capacity constraint | |
Shibasaki et al. | Lagrangian bounds for large‐scale multicommodity network design: a comparison between Volume and Bundle methods |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20220118 Address after: 450046 Floor 9, building 1, Zhengshang Boya Plaza, Longzihu wisdom Island, Zhengdong New Area, Zhengzhou City, Henan Province Patentee after: xFusion Digital Technologies Co., Ltd. Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd. |