CN104169917B - Server-based method for searching for data stream cut points, and server - Google Patents


Info

Publication number
CN104169917B
Authority
CN
China
Prior art keywords
window
point
data
predetermined condition
potential cut
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201480000347.4A
Other languages
Chinese (zh)
Other versions
CN104169917A (en)
Inventor
于传帅
张程伟
徐林波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XFusion Digital Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201480000347.4A
Priority to CN201610439783.2A
Priority claimed from PCT/CN2014/072648 (WO2015120645A1)
Publication of CN104169917A
Application granted
Publication of CN104169917B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a server-based method for searching for data stream cut points. In the embodiments, a data stream cut point is found by judging, for windows among M windows, whether at least part of the data in a window satisfies a predetermined condition. When at least part of the data in a window does not satisfy the predetermined condition, a length of N × U is skipped to obtain the next potential cut point, which improves the efficiency of searching for data stream cut points.

Description

Server-based method for searching for data stream cut points, and server
Technical field
The present invention relates to the field of information technology, and in particular to a server-based method for searching for data stream cut points, and a server.
Background
The continuous growth of data volumes makes providing sufficient storage a severe challenge in the storage field. One current way to meet this challenge is to exploit the redundancy of the data to be stored by applying data deduplication, thereby reducing the amount of data that must be stored.
In the prior art, a deduplication algorithm based on content-defined chunking (CDC) first divides the data stream to be stored into many data blocks (chunks). Dividing a data stream into chunks requires finding suitable cut points in the stream; the data between two adjacent cut points forms one chunk. A feature value is calculated for each chunk and used to look up whether a chunk with the same feature value already exists; if one is found, the data is considered duplicate. Specifically, CDC-based deduplication uses the sliding window technique to find the cut points of the content-based chunks of a file, that is, it determines data stream cut points by calculating the Rabin fingerprint of the data in the window. Assume cut points are searched for from the left of the data stream to the right. Each time, the fingerprint of the data in the sliding window is calculated, reduced modulo a given integer K, and compared with a given remainder R. If they are equal, the right end of the window is a data stream cut point; otherwise the window slides one byte to the right, and calculation and comparison repeat until the end of the data stream is reached. In CDC-based deduplication, searching for data stream cut points consumes substantial computing resources and has therefore become the bottleneck in improving deduplication performance.
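The baseline procedure described above can be sketched as follows. This is a minimal illustration, not the code of any particular system. A polynomial (Rabin-Karp style) rolling hash stands in for the Rabin fingerprint, and the window width, base, modulus, divisor and remainder are all assumed values. The point to notice is that the hash is updated and compared once for every byte of the stream, which is the cost the invention aims to reduce.

```python
# Sketch of baseline sliding-window CDC. Assumption: a Rabin-Karp polynomial
# rolling hash replaces a true Rabin fingerprint; all constants are illustrative.
WINDOW = 48              # window width in bytes (assumed)
BASE = 257               # polynomial base (assumed)
MOD = (1 << 61) - 1      # prime modulus (assumed)
DIVISOR = 4096           # the "given integer K" (assumed value)
REMAINDER = 0            # the "given remainder R" (assumed value)


def find_cut_points(data: bytes, divisor: int = DIVISOR,
                    remainder: int = REMAINDER) -> list[int]:
    """Return offsets `end` such that the window data[end - WINDOW:end] hashes
    to `remainder` modulo `divisor`; the window's right end is a cut point."""
    cuts: list[int] = []
    if len(data) < WINDOW:
        return cuts
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte that leaves the window
    h = 0
    for b in data[:WINDOW]:            # hash of the first full window
        h = (h * BASE + b) % MOD
    end = WINDOW
    while True:
        if h % divisor == remainder:
            cuts.append(end)
        if end == len(data):           # reached the end of the data stream
            break
        # Slide right by one byte: remove data[end - WINDOW], append data[end].
        h = ((h - data[end - WINDOW] * top) * BASE + data[end]) % MOD
        end += 1
    return cuts
```

Every byte position costs one hash update and one comparison, regardless of the content.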
Summary of the invention
According to a first aspect, an embodiment of the present invention provides a server-based method for searching for a data stream cut point. A rule is preset on the server. The rule is: for a potential cut point $k$, determine $M$ points $p_x$, the window $W_x[p_x-A_x,\ p_x+B_x]$ corresponding to each point $p_x$, and the predetermined condition $C_x$ corresponding to the window $W_x[p_x-A_x,\ p_x+B_x]$, where $x$ takes the consecutive natural numbers from 1 to $M$, $M \ge 2$, and $A_x$ and $B_x$ are integers. The method includes:
a) According to the rule, determine for the current potential cut point $k_i$ a point $p_{iz}$ and the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ corresponding to $p_{iz}$, where $i$ and $z$ are integers and $1 \le z \le M$;
b) Judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ does not satisfy the predetermined condition $C_z$, jump from the point $p_{iz}$ by $N$ minimum cut point search units $U$ along the data stream cut point search direction, where $N \cdot U \le \|B_z\| + \max_x(\|A_x\| + \|k_i - p_{ix}\|)$, to obtain a new potential cut point, and perform step a);
c) When at least part of the data in each window $W_{ix}[p_{ix}-A_x,\ p_{ix}+B_x]$ of the $M$ windows of the current potential cut point $k_i$ satisfies the predetermined condition $C_x$, the current potential cut point $k_i$ is a data stream cut point.
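Steps a) to c) can be illustrated with a small sketch. It is one possible instantiation under stated assumptions, not the patented implementation: the minimum search unit U is one byte; the hypothetical rule uses M points p_x = k − (x − 1) that all share A, B and one condition C (consistent with the first to third implementations); C is an assumed CRC-32 parity test; windows are inclusive byte ranges; and the rightmost point is checked first. Under this rule, a window that fails C belongs to every potential cut point in [p, p + M − 1], so the search can jump N = M units from p_iz. That jump stays within the bound N·U ≤ ‖B_z‖ + max_x(‖A_x‖ + ‖k_i − p_ix‖) = A + B + M − 1.

```python
import zlib
from typing import Callable

# Hypothetical rule (assumed values): M points p_x = k - (x - 1), one byte
# apart and behind k, all sharing the same A, B and condition C; U = 1 byte.
M, A, B = 4, 8, 8


def default_condition(window: bytes) -> bool:
    """Assumed condition C: CRC-32 parity, true for about half of all windows."""
    return zlib.crc32(window) % 2 == 0


def window_at(data: bytes, p: int) -> bytes:
    """Bytes of the inclusive window [p - A, p + B]."""
    return data[p - A : p + B + 1]


def skip_search(data: bytes,
                cond: Callable[[bytes], bool] = default_condition) -> list[int]:
    cuts: list[int] = []
    k = (M - 1) + A                  # smallest k whose leftmost window fits
    last = len(data) - 1 - B         # largest k whose rightmost window fits
    while k <= last:
        for x in range(1, M + 1):    # rightmost point first (assumed order)
            p = k - (x - 1)          # step a): point p_iz and its window
            if not cond(window_at(data, p)):
                # Step b): this window belongs to every potential cut point in
                # [p, p + M - 1], so none of them can qualify; jump N = M from p.
                k = p + M
                break
        else:
            cuts.append(k)           # step c): all M windows satisfy C
            k += 1
    return cuts
```

Because this rule is simple enough to enumerate, the result can be checked against a brute-force test of every k. On random input the skipping version returns the same cut points while evaluating C fewer times.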
In a first possible implementation of the first aspect, the rule further includes: at least two points $p_e$ and $p_f$ satisfy $A_e = A_f$, $B_e = B_f$ and $C_e = C_f$.
With reference to the first possible implementation of the first aspect, in a second possible implementation, the rule further includes: the at least two points $p_e$ and $p_f$ are located, relative to the potential cut point $k$, in the direction opposite to the data stream cut point search direction.
With reference to the first or the second possible implementation of the first aspect, in a third possible implementation, the rule further includes: the distance between the at least two points $p_e$ and $p_f$ is 1 $U$.
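One way to see why the first to third implementations help: assume U is one byte and write $W_e(k)$ for the window of point $p_e$ when the potential cut point is $k$. Suppose $p_e = k - d$ and $p_f = k - d - U$ for some offset $d \ge 0$, so both lie against the search direction, 1 U apart. Then, using $A_e = A_f$ and $B_e = B_f$,

$$W_f(k+U) = [(k+U) - d - U - A_f,\ (k+U) - d - U + B_f] = [k - d - A_e,\ k - d + B_e] = W_e(k).$$

Because $C_e = C_f$, the window that $p_f$ selects for the potential cut point $k + U$ is tested exactly as the window that $p_e$ selects for $k$. A single window that fails the condition therefore rules out two adjacent potential cut points at once. This is one reason the jump in step b) can exceed one unit.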
With reference to the first aspect or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation, judging whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$ specifically includes:
using a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, using a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$ is specifically: using a hash function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
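A hash-based predicate for $C_z$ might look like the following sketch. BLAKE2b from Python's standard `hashlib`, the 8-byte digest and the divisor/remainder test are assumptions chosen for illustration; the text states only that a hash function is used. With a well-mixed hash, a window passes with probability of roughly 1/divisor. Raising the divisor therefore makes it less likely that all M windows pass together.

```python
import hashlib


def hash_condition(window: bytes, divisor: int = 4, remainder: int = 0) -> bool:
    """Hypothetical C_z: hash the window bytes and test the value modulo divisor."""
    digest = hashlib.blake2b(window, digest_size=8).digest()
    return int.from_bytes(digest, "little") % divisor == remainder
```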
With reference to the first aspect or any one of the first to fifth possible implementations of the first aspect, in a sixth possible implementation, when at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ does not satisfy the predetermined condition $C_z$, a jump of $N$ minimum cut point search units $U$ is made from the point $p_{iz}$ along the data stream cut point search direction to obtain the new potential cut point. According to the rule, the left boundary of the window $W_{ic}[p_{ic}-A_c,\ p_{ic}+B_c]$ corresponding to the point $p_{ic}$ determined for the new potential cut point either coincides with the right boundary of the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ or lies within the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$. Here, $p_{ic}$ is the point that ranks first when the $M$ points determined, according to the rule, for the new potential cut point are sorted along the data stream search direction.
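The constraints on a jump can be checked mechanically for a given rule. The sketch below is hypothetical and makes several assumptions: the search runs left to right; U is one byte by default; the new potential cut point is $p_{iz} + N \cdot U$; and each rule entry stores a point's offset from $k$ together with $A_x$ and $B_x$. Under those assumptions it computes the largest $N$ allowed by the bound in step b). It also tests whether the first window of the new potential cut point starts inside the failed window, or on its right boundary, as the sixth implementation requires.

```python
from typing import NamedTuple


class PointRule(NamedTuple):
    """Hypothetical rule entry: p_x = k + offset, window [p_x - a, p_x + b]."""
    offset: int
    a: int
    b: int


def max_jump_units(rule: list[PointRule], z: int, u: int = 1) -> int:
    """Largest N with N*U <= |B_z| + max_x(|A_x| + |k - p_x|); z is 0-based."""
    bound = abs(rule[z].b) + max(abs(r.a) + abs(r.offset) for r in rule)
    return bound // u


def first_window_starts_inside(rule: list[PointRule], z: int,
                               n_units: int, u: int = 1) -> bool:
    """Sixth implementation: does the first window of the new potential cut
    point start inside (or on the right boundary of) the failed window W_iz?"""
    failed = rule[z]
    first = min(rule, key=lambda r: r.offset)  # first point for a left-to-right search
    # Coordinates relative to the failed point p_iz (= 0); the new potential
    # cut point is assumed to be p_iz + N * U.
    left_edge = n_units * u + first.offset - first.a
    return -failed.a <= left_edge <= failed.b
```

For example, with four points at offsets 0, −1, −2 and −3 and A = B = 8, `max_jump_units(rule, 0)` returns 19. A jump of 19 from the rightmost point places the new first window's left boundary exactly on the failed window's right boundary, while a jump of 20 would leave a gap. The check only tests these two stated constraints; whether a jump skips no cut point also depends on the conditions $C_x$.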
With reference to the fourth possible implementation of the first aspect, in a seventh possible implementation, using a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$ specifically includes:
Select $F$ bytes in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ and reuse these $F$ bytes $H$ times, giving $F \cdot H$ bytes in total. Each byte consists of 8 bits, denoted $a_{m,1}, \dots, a_{m,8}$: the 1st to 8th bits of the $m$-th of the $F \cdot H$ bytes. The bits of the $F \cdot H$ bytes can be written as
$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F \cdot H,1} & a_{F \cdot H,2} & \cdots & a_{F \cdot H,8} \end{pmatrix}.$$
When $a_{m,n} = 1$, $V_{a_{m,n}} = 1$; when $a_{m,n} = 0$, $V_{a_{m,n}} = -1$, where $a_{m,n}$ denotes any one of $a_{m,1}, \dots, a_{m,8}$. Converting the bits of the $F \cdot H$ bytes according to this relation between $a_{m,n}$ and $V_{a_{m,n}}$ gives the matrix
$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}} \\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{a_{F \cdot H,1}} & V_{a_{F \cdot H,2}} & \cdots & V_{a_{F \cdot H,8}} \end{pmatrix}.$$
Select $F \cdot H \cdot 8$ random numbers that follow a normal distribution to form the matrix
$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F \cdot H,1} & h_{F \cdot H,2} & \cdots & h_{F \cdot H,8} \end{pmatrix}.$$
Multiply the $m$-th row of $V_a$ element-wise by the $m$-th row of $R$ and sum the products to obtain one value, $S_{a_m} = V_{a_{m,1}} h_{m,1} + V_{a_{m,2}} h_{m,2} + \cdots + V_{a_{m,8}} h_{m,8}$. Obtain $S_{a_1}, S_{a_2}, \dots, S_{a_{F \cdot H}}$ in the same way and count the number $K$ of these values that are greater than 0. When $K$ is even, at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
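The seventh implementation resembles a random-projection (SimHash-like) parity test. A minimal sketch follows. Several details are assumptions the text does not fix: which F bytes are selected, the bit order within a byte (most significant bit first here), and generating R once from a seeded `random.Random` so that every window is tested against the same matrix.

```python
import random


def bits_to_signs(byte: int) -> list[int]:
    """Map bits a_{m,1..8} (MSB first, assumed) to V = +1 for 1 and -1 for 0."""
    return [1 if (byte >> (7 - n)) & 1 else -1 for n in range(8)]


def make_projection(rows: int, seed: int = 0) -> list[list[float]]:
    """Matrix R: rows x 8 standard-normal random numbers h_{m,n}."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(rows)]


def satisfies_condition(window: bytes, positions: list[int], repeats: int,
                        proj: list[list[float]]) -> bool:
    """Return True when the count K of positive S_{a_m} values is even."""
    selected = bytes(window[i] for i in positions)   # the F selected bytes
    expanded = selected * repeats                     # F*H bytes
    if len(proj) < len(expanded):
        raise ValueError("R needs at least F*H rows")
    positives = 0                                     # K in the text
    for m, byte in enumerate(expanded):
        s = sum(v * h for v, h in zip(bits_to_signs(byte), proj[m]))  # S_{a_m}
        if s > 0:
            positives += 1
    return positives % 2 == 0
```

Keeping R fixed matters for deduplication. If R changed between windows, identical data could yield different cut points.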
According to a second aspect, an embodiment of the present invention provides a server-based method for searching for a data stream cut point. A rule is preset on the server. The rule is: for a potential cut point $k$, determine $M$ windows $W_x[k-A_x,\ k+B_x]$ and the predetermined condition $C_x$ corresponding to each window $W_x[k-A_x,\ k+B_x]$, where $x$ takes the consecutive natural numbers from 1 to $M$, $M \ge 2$, and $A_x$ and $B_x$ are integers.
The method includes:
a) According to the rule, determine for the current potential cut point $k_i$ the corresponding window $W_{iz}[k_i-A_z,\ k_i+B_z]$, where $i$ and $z$ are integers and $1 \le z \le M$;
b) Judge whether at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ does not satisfy the predetermined condition $C_z$, jump from the current potential cut point $k_i$ by $N$ minimum cut point search units $U$ along the data stream cut point search direction, where $N \cdot U \le \|B_z\| + \max_x(\|A_x\|)$, to obtain a new potential cut point, and perform step a);
c) When at least part of the data in each window $W_{ix}[k_i-A_x,\ k_i+B_x]$ of the $M$ windows of the current potential cut point $k_i$ satisfies the predetermined condition $C_x$, the current potential cut point $k_i$ is a data stream cut point.
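For the second aspect, where the windows are placed around $k_i$ itself, a similar sketch is possible. Every choice here is again an assumption. U is one byte, windows are inclusive, and C is a CRC-32 parity test. The hypothetical rule takes $A_x = A_0 - (x-1)$ and $B_x = B_0 + (x-1)$, which satisfies the first to third implementations of this aspect. With that rule, window $W_z$ of $k_i$ equals window $W_1$ of $k_i + z - 1$. When it fails, the potential cut points $k_i, \dots, k_i + z - 1$ can all be discarded, and the search jumps $N = z$ units from $k_i$. This respects $N \cdot U \le \|B_z\| + \max_x \|A_x\| = A_0 + B_0 + z - 1$.

```python
import zlib

# Hypothetical rule for the second aspect (all values assumed):
# window x of k is [k - A_x, k + B_x] with A_x = A0 - (x - 1), B_x = B0 + (x - 1).
M, A0, B0 = 4, 8, 8
RULE = [(A0 - (x - 1), B0 + (x - 1)) for x in range(1, M + 1)]


def window_ok(window: bytes) -> bool:
    """Assumed condition C shared by all windows: CRC-32 parity."""
    return zlib.crc32(window) % 2 == 0


def window_bytes(data: bytes, k: int, a: int, b: int) -> bytes:
    """Bytes of the inclusive window [k - a, k + b]."""
    return data[k - a : k + b + 1]


def search_second_aspect(data: bytes) -> list[int]:
    cuts: list[int] = []
    k = A0                                   # window 1 must start at or after 0
    last = len(data) - 1 - (B0 + M - 1)      # window M must end inside the data
    while k <= last:
        for z in range(M, 0, -1):            # check the rightmost window first
            a, b = RULE[z - 1]
            if not window_ok(window_bytes(data, k, a, b)):
                # W_z(k) == W_1(k + z - 1) failed, so k .. k + z - 1 are all
                # ruled out: jump N = z units from k.
                k += z
                break
        else:
            cuts.append(k)                   # step c): all M windows satisfy C
            k += 1
    return cuts
```

Checking the rightmost window first is a design choice in this sketch: a failure there yields the largest allowed jump, N = M.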
In a first possible implementation of the second aspect, the rule further includes: at least two windows $W_{ie}[k_i-A_e,\ k_i+B_e]$ and $W_{if}[k_i-A_f,\ k_i+B_f]$ satisfy $|A_e+B_e| = |A_f+B_f|$ and $C_e = C_f$.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the rule further includes: $A_e$ and $A_f$ are positive integers.
With reference to the first or the second possible implementation of the second aspect, in a third possible implementation, the rule further includes: $A_e - 1 = A_f$ and $B_e + 1 = B_f$.
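Under the third implementation, the two windows are shifted copies of each other. Assume U is one byte and write $W_e(k)$ for window $W_e$ of the potential cut point $k$. Then

$$W_f(k) = [k - A_f,\ k + B_f] = [k - (A_e - 1),\ k + (B_e + 1)] = [(k+1) - A_e,\ (k+1) + B_e] = W_e(k+1).$$

Both windows have length $A_e + B_e = A_f + B_f$, matching the first implementation. Because $C_e = C_f$, the test of $W_f$ at $k$ is exactly the test of $W_e$ at $k + 1$, so one evaluation can serve two adjacent potential cut points.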
With reference to the second aspect or any one of the first to third possible implementations of the second aspect, in a fourth possible implementation, judging whether at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$ specifically includes:
using a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, using a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$ is specifically: using a hash function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$.
With reference to the second aspect or any one of the first to fifth possible implementations of the second aspect, in a sixth possible implementation, when at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ does not satisfy the predetermined condition $C_z$, a jump of $N$ minimum cut point search units $U$ is made from the current potential cut point $k_i$ along the data stream cut point search direction to obtain the new potential cut point. According to the rule, the left boundary of the window $W_{ic}[k_i-A_c,\ k_i+B_c]$ determined for the new potential cut point either coincides with the right boundary of the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ or lies within the window $W_{iz}[k_i-A_z,\ k_i+B_z]$. Here, $W_{ic}[k_i-A_c,\ k_i+B_c]$ is the window that ranks first when the $M$ windows determined, according to the rule, for the new potential cut point are sorted along the data stream search direction.
With reference to the fourth possible implementation of the second aspect, in a seventh possible implementation, using a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$ specifically includes:
Select $F$ bytes in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ and reuse these $F$ bytes $H$ times, giving $F \cdot H$ bytes in total. Each byte consists of 8 bits, denoted $a_{m,1}, \dots, a_{m,8}$: the 1st to 8th bits of the $m$-th of the $F \cdot H$ bytes. The bits of the $F \cdot H$ bytes can be written as
$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F \cdot H,1} & a_{F \cdot H,2} & \cdots & a_{F \cdot H,8} \end{pmatrix}.$$
When $a_{m,n} = 1$, $V_{a_{m,n}} = 1$; when $a_{m,n} = 0$, $V_{a_{m,n}} = -1$, where $a_{m,n}$ denotes any one of $a_{m,1}, \dots, a_{m,8}$. Converting the bits of the $F \cdot H$ bytes according to this relation between $a_{m,n}$ and $V_{a_{m,n}}$ gives the matrix
$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}} \\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{a_{F \cdot H,1}} & V_{a_{F \cdot H,2}} & \cdots & V_{a_{F \cdot H,8}} \end{pmatrix}.$$
Select $F \cdot H \cdot 8$ random numbers that follow a normal distribution to form the matrix
$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F \cdot H,1} & h_{F \cdot H,2} & \cdots & h_{F \cdot H,8} \end{pmatrix}.$$
Multiply the $m$-th row of $V_a$ element-wise by the $m$-th row of $R$ and sum the products to obtain one value, $S_{a_m} = V_{a_{m,1}} h_{m,1} + V_{a_{m,2}} h_{m,2} + \cdots + V_{a_{m,8}} h_{m,8}$. Obtain $S_{a_1}, S_{a_2}, \dots, S_{a_{F \cdot H}}$ in the same way and count the number $K$ of these values that are greater than 0. When $K$ is even, at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$.
According to a third aspect, an embodiment of the present invention provides a server for searching for a data stream cut point. The server includes a central processing unit (CPU) and a main memory, and the CPU communicates with the main memory. A rule is preset on the server. The rule is: for a potential cut point $k$, determine $M$ points $p_x$, the window $W_x[p_x-A_x,\ p_x+B_x]$ corresponding to each point $p_x$, and the predetermined condition $C_x$ corresponding to the window $W_x[p_x-A_x,\ p_x+B_x]$, where $x$ takes the consecutive natural numbers from 1 to $M$, $M \ge 2$, and $A_x$ and $B_x$ are integers.
The main memory is configured to store executable instructions, and the CPU executes the executable instructions to perform the following steps:
a) According to the rule, determine for the current potential cut point $k_i$ a point $p_{iz}$ and the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ corresponding to $p_{iz}$, where $i$ and $z$ are integers and $1 \le z \le M$;
b) Judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ does not satisfy the predetermined condition $C_z$, jump from the point $p_{iz}$ by $N$ minimum cut point search units $U$ along the data stream cut point search direction, where $N \cdot U \le \|B_z\| + \max_x(\|A_x\| + \|k_i - p_{ix}\|)$, to obtain a new potential cut point, and perform step a);
c) When at least part of the data in each window $W_{ix}[p_{ix}-A_x,\ p_{ix}+B_x]$ of the $M$ windows of the current potential cut point $k_i$ satisfies the predetermined condition $C_x$, the current potential cut point $k_i$ is a data stream cut point.
In a first possible implementation of the third aspect, the rule further includes: at least two points $p_e$ and $p_f$ satisfy $A_e = A_f$, $B_e = B_f$ and $C_e = C_f$.
With reference to the first possible implementation of the third aspect, in a second possible implementation, the rule further includes: the at least two points $p_e$ and $p_f$ are located, relative to the potential cut point $k$, in the direction opposite to the data stream cut point search direction.
With reference to the first or the second possible implementation of the third aspect, in a third possible implementation, the rule further includes: the distance between the at least two points $p_e$ and $p_f$ is 1 $U$.
With reference to the third aspect or any one of the first to third possible implementations of the third aspect, in a fourth possible implementation, the CPU is specifically configured to:
use a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
With reference to the fourth possible implementation of the third aspect, in a fifth possible implementation, the CPU is specifically configured to use a hash function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
With reference to the third aspect or any one of the first to fifth possible implementations of the third aspect, in a sixth possible implementation, when at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ does not satisfy the predetermined condition $C_z$, a jump of $N$ minimum cut point search units $U$ is made from the point $p_{iz}$ along the data stream cut point search direction to obtain the new potential cut point. According to the rule, the left boundary of the window $W_{ic}[p_{ic}-A_c,\ p_{ic}+B_c]$ corresponding to the point $p_{ic}$ determined for the new potential cut point either coincides with the right boundary of the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ or lies within the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$. Here, $p_{ic}$ is the point that ranks first when the $M$ points determined, according to the rule, for the new potential cut point are sorted along the data stream search direction.
With reference to the fourth possible implementation of the third aspect, in a seventh possible implementation, the CPU using a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$ specifically includes:
Select $F$ bytes in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ and reuse these $F$ bytes $H$ times, giving $F \cdot H$ bytes in total. Each byte consists of 8 bits, denoted $a_{m,1}, \dots, a_{m,8}$: the 1st to 8th bits of the $m$-th of the $F \cdot H$ bytes. The bits of the $F \cdot H$ bytes can be written as
$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F \cdot H,1} & a_{F \cdot H,2} & \cdots & a_{F \cdot H,8} \end{pmatrix}.$$
When $a_{m,n} = 1$, $V_{a_{m,n}} = 1$; when $a_{m,n} = 0$, $V_{a_{m,n}} = -1$, where $a_{m,n}$ denotes any one of $a_{m,1}, \dots, a_{m,8}$. Converting the bits of the $F \cdot H$ bytes according to this relation between $a_{m,n}$ and $V_{a_{m,n}}$ gives the matrix
$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}} \\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{a_{F \cdot H,1}} & V_{a_{F \cdot H,2}} & \cdots & V_{a_{F \cdot H,8}} \end{pmatrix}.$$
Select $F \cdot H \cdot 8$ random numbers that follow a normal distribution to form the matrix
$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F \cdot H,1} & h_{F \cdot H,2} & \cdots & h_{F \cdot H,8} \end{pmatrix}.$$
Multiply the $m$-th row of $V_a$ element-wise by the $m$-th row of $R$ and sum the products to obtain one value, $S_{a_m} = V_{a_{m,1}} h_{m,1} + V_{a_{m,2}} h_{m,2} + \cdots + V_{a_{m,8}} h_{m,8}$. Obtain $S_{a_1}, S_{a_2}, \dots, S_{a_{F \cdot H}}$ in the same way and count the number $K$ of these values that are greater than 0. When $K$ is even, at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
According to a fourth aspect, an embodiment of the present invention provides a server for searching for a data stream cut point. The server includes a central processing unit (CPU) and a main memory, and the CPU communicates with the main memory. A rule is preset on the server. The rule is: for a potential cut point $k$, determine $M$ windows $W_x[k-A_x,\ k+B_x]$ and the predetermined condition $C_x$ corresponding to each window $W_x[k-A_x,\ k+B_x]$, where $x$ takes the consecutive natural numbers from 1 to $M$, $M \ge 2$, and $A_x$ and $B_x$ are integers.
The main memory is configured to store executable instructions, and the CPU executes the executable instructions to perform the following steps:
a) According to the rule, determine for the current potential cut point $k_i$ the corresponding window $W_{iz}[k_i-A_z,\ k_i+B_z]$, where $i$ and $z$ are integers and $1 \le z \le M$;
b) Judge whether at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ does not satisfy the predetermined condition $C_z$, jump from the current potential cut point $k_i$ by $N$ minimum cut point search units $U$ along the data stream cut point search direction, where $N \cdot U \le \|B_z\| + \max_x(\|A_x\|)$, to obtain a new potential cut point, and perform step a);
c) When at least part of the data in each window $W_{ix}[k_i-A_x,\ k_i+B_x]$ of the $M$ windows of the current potential cut point $k_i$ satisfies the predetermined condition $C_x$, the current potential cut point $k_i$ is a data stream cut point.
In a first possible implementation of the fourth aspect, the rule further includes: at least two windows $W_{ie}[k_i-A_e,\ k_i+B_e]$ and $W_{if}[k_i-A_f,\ k_i+B_f]$ satisfy $|A_e+B_e| = |A_f+B_f|$ and $C_e = C_f$.
With reference to the first possible implementation of the fourth aspect, in a second possible implementation, the rule further includes: $A_e$ and $A_f$ are positive integers.
With reference to the first or the second possible implementation of the fourth aspect, in a third possible implementation, the rule further includes: $A_e - 1 = A_f$ and $B_e + 1 = B_f$.
With reference to the fourth aspect or any one of the first to third possible implementations of the fourth aspect, in a fourth possible implementation, the CPU is specifically configured to:
use a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$.
With reference to the fourth possible implementation of the fourth aspect, in a fifth possible implementation, the CPU is specifically configured to use a hash function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$.
With reference to the fourth aspect or any one of the first to fifth possible implementations of the fourth aspect, in a sixth possible implementation, when at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ does not satisfy the predetermined condition $C_z$, a jump of $N$ minimum cut point search units $U$ is made from the current potential cut point $k_i$ along the data stream cut point search direction to obtain the new potential cut point. According to the rule, the left boundary of the window $W_{ic}[k_i-A_c,\ k_i+B_c]$ determined for the new potential cut point either coincides with the right boundary of the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ or lies within the window $W_{iz}[k_i-A_z,\ k_i+B_z]$. Here, $W_{ic}[k_i-A_c,\ k_i+B_c]$ is the window that ranks first when the $M$ windows determined, according to the rule, for the new potential cut point are sorted along the data stream search direction.
With reference to the fourth possible implementation of the fourth aspect, in a seventh possible implementation, the CPU using a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$ specifically includes:
Select $F$ bytes in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ and reuse these $F$ bytes $H$ times, giving $F \cdot H$ bytes in total. Each byte consists of 8 bits, denoted $a_{m,1}, \dots, a_{m,8}$: the 1st to 8th bits of the $m$-th of the $F \cdot H$ bytes. The bits of the $F \cdot H$ bytes can be written as
$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F \cdot H,1} & a_{F \cdot H,2} & \cdots & a_{F \cdot H,8} \end{pmatrix}.$$
When $a_{m,n} = 1$, $V_{a_{m,n}} = 1$; when $a_{m,n} = 0$, $V_{a_{m,n}} = -1$, where $a_{m,n}$ denotes any one of $a_{m,1}, \dots, a_{m,8}$. Converting the bits of the $F \cdot H$ bytes according to this relation between $a_{m,n}$ and $V_{a_{m,n}}$ gives the matrix
$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}} \\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{a_{F \cdot H,1}} & V_{a_{F \cdot H,2}} & \cdots & V_{a_{F \cdot H,8}} \end{pmatrix}.$$
Select $F \cdot H \cdot 8$ random numbers that follow a normal distribution to form the matrix
$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F \cdot H,1} & h_{F \cdot H,2} & \cdots & h_{F \cdot H,8} \end{pmatrix}.$$
Multiply the $m$-th row of $V_a$ element-wise by the $m$-th row of $R$ and sum the products to obtain one value, $S_{a_m} = V_{a_{m,1}} h_{m,1} + V_{a_{m,2}} h_{m,2} + \cdots + V_{a_{m,8}} h_{m,8}$. Obtain $S_{a_1}, S_{a_2}, \dots, S_{a_{F \cdot H}}$ in the same way and count the number $K$ of these values that are greater than 0. When $K$ is even, at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$.
According to a fifth aspect, an embodiment of the present invention provides a server for searching for a data stream cut point. A rule is preset on the server. The rule is: for a potential cut point $k$, determine $M$ points $p_x$, the window $W_x[p_x-A_x,\ p_x+B_x]$ corresponding to each point $p_x$, and the predetermined condition $C_x$ corresponding to the window $W_x[p_x-A_x,\ p_x+B_x]$, where $x$ takes the consecutive natural numbers from 1 to $M$, $M \ge 2$, and $A_x$ and $B_x$ are integers.
The server includes: a determining unit, configured to perform step a):
a) According to the rule, determine for the current potential cut point $k_i$ a point $p_{iz}$ and the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ corresponding to $p_{iz}$, where $i$ and $z$ are integers and $1 \le z \le M$;
a judging processing unit, configured to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ does not satisfy the predetermined condition $C_z$, jump from the point $p_{iz}$ by $N$ minimum cut point search units $U$ along the data stream cut point search direction, where $N \cdot U \le \|B_z\| + \max_x(\|A_x\| + \|k_i - p_{ix}\|)$, to obtain a new potential cut point, whereupon the determining unit performs step a) for the new potential cut point;
when at least part of the data in each window $W_{ix}[p_{ix}-A_x,\ p_{ix}+B_x]$ of the $M$ windows of the current potential cut point $k_i$ satisfies the predetermined condition $C_x$, the current potential cut point $k_i$ is a data stream cut point.
In a first possible implementation of the fifth aspect, the rule further includes: at least two points $p_e$ and $p_f$ satisfy $A_e = A_f$, $B_e = B_f$ and $C_e = C_f$.
With reference to the first possible implementation of the fifth aspect, in a second possible implementation, the rule further includes: the at least two points $p_e$ and $p_f$ are located, relative to the potential cut point $k$, in the direction opposite to the data stream cut point search direction.
With reference to the first or the second possible implementation of the fifth aspect, in a third possible implementation, the rule further includes: the distance between the at least two points $p_e$ and $p_f$ is 1 $U$.
With reference to the fifth aspect or any one of the first to third possible implementations of the fifth aspect, in a fourth possible implementation, the judging processing unit specifically uses a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
With reference to the fourth possible implementation of the fifth aspect, in a fifth possible implementation, the judging processing unit is specifically configured to use a hash function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
In conjunction with the 5th aspect, or first to the 5th arbitrary possible implementation, the 6th kind may Implementation in, described judgement processing unit is for as described window Wiz[piz-Az,piz+Bz] In at least partly data be unsatisfactory for described predetermined condition Cz, from described some pizAlong described data stream Cut-point search direction N number of data flow point cutpoint minimum of jumping searches unit U, it is thus achieved that described new Potential cut-point, described determines that unit is that described new potential cut-point performs step a), root According to described rule, the some p determined for described new potential cut-pointicCorresponding window Wic[pic- Ac,pic+Bc] left margin and described window Wiz[piz-Az,piz+Bz] right margin overlap or The described window W determined for described new potential cut-pointic[pic-Ac,pic+Bc] left margin It is positioned at described window Wiz[piz-Az,piz+BzWithin the scope of];Wherein, for described new potential point The described window W that cutpoint determinesic[pic-Ac,pic+Bc] it is according to described rule, for described new Sequence the in the sequence that potential cut-point determine M point obtains according to data stream search direction The point of one.
In conjunction with the 4th kind of possible implementation of the 5th aspect, the 7th kind of possible implementation In, described judgement processing unit judges described window W specifically for using random functioniz[piz- Az,piz+BzIn], whether at least part of data meet described predetermined condition Cz, specifically include:
Select $F$ bytes in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ and reuse the $F$ bytes $H$ times, giving $F \cdot H$ bytes in total. Each byte consists of 8 bits, denoted $a_{m,1}, \dots, a_{m,8}$, which are the 1st to 8th bits of the $m$-th of the $F \cdot H$ bytes. The bits of the $F \cdot H$ bytes can be written as:
$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F \cdot H,1} & a_{F \cdot H,2} & \cdots & a_{F \cdot H,8} \end{pmatrix}$$
When $a_{m,n} = 1$, $V_{a_{m,n}} = 1$; when $a_{m,n} = 0$, $V_{a_{m,n}} = -1$, where $a_{m,n}$ denotes any one of $a_{m,1}, \dots, a_{m,8}$. Converting the bits of the $F \cdot H$ bytes from $a_{m,n}$ to $V_{a_{m,n}}$ in this way gives the matrix $V_a$:
$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}} \\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{a_{F \cdot H,1}} & V_{a_{F \cdot H,2}} & \cdots & V_{a_{F \cdot H,8}} \end{pmatrix}$$
Select $F \cdot H \cdot 8$ random numbers from random numbers obeying a normal distribution to form the matrix $R$:
$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F \cdot H,1} & h_{F \cdot H,2} & \cdots & h_{F \cdot H,8} \end{pmatrix}$$
Multiply the $m$-th row of $V_a$ by the random numbers in the $m$-th row of $R$ element by element and sum the products to obtain one value:
$$S_{a_m} = V_{a_{m,1}} h_{m,1} + V_{a_{m,2}} h_{m,2} + \cdots + V_{a_{m,8}} h_{m,8}.$$
Obtain $S_{a_1}, S_{a_2}, \dots, S_{a_{F \cdot H}}$ in the same way and count the number $K$ of them whose value is greater than 0. When $K$ is even, at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
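The seventh implementation above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: which $F$ bytes are selected from the window, the order in which the $F$ bytes are repeated $H$ times, the bit order within a byte, and the helper names `make_projection_matrix` and `window_satisfies` are all assumptions. The matrix $R$ is drawn once from a fixed seed and reused, so that the result depends only on the window content.

```python
import random
from typing import List


def make_projection_matrix(F: int, H: int, seed: int = 0) -> List[List[float]]:
    """Hypothetical helper: build the (F*H) x 8 matrix R of N(0, 1) samples.

    R is drawn once from a fixed seed and reused for every window, so the
    same window content always gives the same answer.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(F * H)]


def window_satisfies(selected: bytes, H: int, R: List[List[float]]) -> bool:
    """Return True when K, the number of positive S_{a_m}, is even.

    `selected` holds the F bytes picked from the window (selection rule assumed).
    """
    F = len(selected)
    if len(R) != F * H or any(len(row) != 8 for row in R):
        raise ValueError("R must have F*H rows of 8 entries")
    repeated = selected * H  # assumed order: the F bytes concatenated H times
    K = 0
    for m, byte in enumerate(repeated):
        s = 0.0  # S_{a_m} = sum over n of V_{a_{m,n}} * h_{m,n}
        for n in range(8):
            bit = (byte >> (7 - n)) & 1  # a_{m,n}, most significant bit first (assumed)
            v = 1.0 if bit else -1.0     # V_{a_{m,n}}: bit 1 -> +1, bit 0 -> -1
            s += v * R[m][n]
        if s > 0:
            K += 1
    return K % 2 == 0
```

For example, `window_satisfies(data[p - 169:p][:F], H, R)` would test the 169-byte window ending at point `p` using its first `F` bytes, which is again only an assumed selection rule. How often the condition holds on real data depends on $F$, $H$ and $R$, which this section does not fix.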
In a sixth aspect, an embodiment of the present invention provides a server for searching for data stream cut points. A rule is preset on the server: for a potential cut point $k$, determine $M$ windows $W_x[k-A_x,\ k+B_x]$ and the predetermined condition $C_x$ corresponding to each window $W_x[k-A_x,\ k+B_x]$, where $x$ takes the consecutive natural numbers 1 to $M$, $M \ge 2$, and $A_x$ and $B_x$ are integers.
The server includes: a determining unit, configured to perform step a):
a) determine, according to the rule, the corresponding window $W_{iz}[k_i-A_z,\ k_i+B_z]$ for the current potential cut point $k_i$, where $i$ and $z$ are integers and $1 \le z \le M$;
a judgment processing unit, configured to judge whether at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ does not satisfy the predetermined condition $C_z$, jump $N$ minimum cut-point lookup units $U$ from the current potential cut point $k_i$ along the data stream cut-point search direction, where $N \cdot U \le \|B_z\| + \max_x(\|A_x\|)$, to obtain a new potential cut point, and perform step a);
when at least part of the data in each window $W_{ix}[k_i-A_x,\ k_i+B_x]$ of the $M$ windows of the current potential cut point $k_i$ satisfies the predetermined condition $C_x$, the current potential cut point $k_i$ is a data stream cut point.
With reference to the sixth aspect, in a first possible implementation, the rule further includes: at least two windows $W_{ie}[k_i-A_e,\ k_i+B_e]$ and $W_{if}[k_i-A_f,\ k_i+B_f]$ satisfy $|A_e + B_e| = |A_f + B_f|$ and $C_e = C_f$.
With reference to the first possible implementation of the sixth aspect, in a second possible implementation, the rule further includes: $A_e$ and $A_f$ are positive integers.
With reference to the first or second possible implementation of the sixth aspect, in a third possible implementation, the rule further includes: $A_e - 1 = A_f$ and $B_e + 1 = B_f$.
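For the sixth aspect, the window constraints of the first to third implementations and the jump bound of step a) can be illustrated with a small sketch. The starting values $A_1$, $B_1$ and $M$ below are arbitrary example values and the function names are hypothetical; the sketch also assumes that all $A_x$ and $B_x$ are non-negative, which goes slightly beyond the second implementation (it only states that $A_e$ and $A_f$ are positive).

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (A_x, B_x): window W_x[k - A_x, k + B_x]


def shifted_windows(A1: int, B1: int, M: int) -> List[Window]:
    """Hypothetical rule: each next window has A decreased by 1 and B increased by 1,
    so every window has the same length |A + B| (first and third implementations)."""
    return [(A1 - x, B1 + x) for x in range(M)]


def max_jump_from_k(windows: List[Window], z: int) -> int:
    """Upper bound on N*U in bytes (U = 1) after window z fails:
    ||B_z|| + max_x ||A_x||, measured from the current potential cut point k."""
    _, B_z = windows[z]
    return abs(B_z) + max(abs(A) for A, _ in windows)


def window_bounds(k: int, w: Window) -> Tuple[int, int]:
    """Left and right boundary of the window W[k - A, k + B]."""
    A, B = w
    return k - A, k + B
```

With `shifted_windows(10, 0, 4)`, a failure of the last window at $k = 100$ allows a jump to $k = 113$; the leftmost window of the new potential cut point then starts at byte 103, exactly where the failed window ended, which matches the overlap condition of the sixth implementation below.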
With reference to the sixth aspect or any one of the first to third possible implementations, in a fourth possible implementation, the judgment processing unit is specifically configured to use a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$.
With reference to the fourth possible implementation of the sixth aspect, in a fifth possible implementation, the judgment processing unit is specifically configured to use a hash function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$.
With reference to the sixth aspect or any one of the first to fifth possible implementations, in a sixth possible implementation, when at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ does not satisfy the predetermined condition $C_z$, the judgment processing unit jumps $N$ minimum cut-point lookup units $U$ from the current potential cut point $k_i$ along the data stream cut-point search direction to obtain the new potential cut point, and the determining unit performs step a) for the new potential cut point. According to the rule, the left boundary of the window $W_{ic}[k_i-A_c,\ k_i+B_c]$ determined for the new potential cut point coincides with the right boundary of the window $W_{iz}[k_i-A_z,\ k_i+B_z]$, or lies within the range of the window $W_{iz}[k_i-A_z,\ k_i+B_z]$. Here $W_{ic}[k_i-A_c,\ k_i+B_c]$ is the window ranked first when the $M$ windows determined for the new potential cut point according to the rule are sorted along the data stream search direction.
With reference to the fourth possible implementation of the sixth aspect, in a seventh possible implementation, the judgment processing unit uses a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$, specifically as follows:
Select $F$ bytes in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ and reuse the $F$ bytes $H$ times, giving $F \cdot H$ bytes in total. Each byte consists of 8 bits, denoted $a_{m,1}, \dots, a_{m,8}$, which are the 1st to 8th bits of the $m$-th of the $F \cdot H$ bytes. The bits of the $F \cdot H$ bytes can be written as:
$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F \cdot H,1} & a_{F \cdot H,2} & \cdots & a_{F \cdot H,8} \end{pmatrix}$$
When $a_{m,n} = 1$, $V_{a_{m,n}} = 1$; when $a_{m,n} = 0$, $V_{a_{m,n}} = -1$, where $a_{m,n}$ denotes any one of $a_{m,1}, \dots, a_{m,8}$. Converting the bits of the $F \cdot H$ bytes from $a_{m,n}$ to $V_{a_{m,n}}$ in this way gives the matrix $V_a$:
$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}} \\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{a_{F \cdot H,1}} & V_{a_{F \cdot H,2}} & \cdots & V_{a_{F \cdot H,8}} \end{pmatrix}$$
Select $F \cdot H \cdot 8$ random numbers from random numbers obeying a normal distribution to form the matrix $R$:
$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F \cdot H,1} & h_{F \cdot H,2} & \cdots & h_{F \cdot H,8} \end{pmatrix}$$
Multiply the $m$-th row of $V_a$ by the random numbers in the $m$-th row of $R$ element by element and sum the products to obtain one value:
$$S_{a_m} = V_{a_{m,1}} h_{m,1} + V_{a_{m,2}} h_{m,2} + \cdots + V_{a_{m,8}} h_{m,8}.$$
Obtain $S_{a_1}, S_{a_2}, \dots, S_{a_{F \cdot H}}$ in the same way and count the number $K$ of them whose value is greater than 0. When $K$ is even, at least part of the data in the window $W_{iz}[k_i-A_z,\ k_i+B_z]$ satisfies the predetermined condition $C_z$.
The embodiments of the present invention search for data stream cut points by judging whether at least part of the data in a window among the $M$ windows satisfies a predetermined condition. When at least part of the data in a window does not satisfy the predetermined condition, a length of $N \cdot U$ is skipped to obtain the next potential cut point, which improves the efficiency of the cut-point search.
Brief Description of Drawings
FIG. 1 is a schematic diagram of an application scenario according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of data stream cut points;
FIG. 3 is a schematic diagram of searching for data stream cut points;
FIG. 4 is a schematic diagram of a method according to an embodiment of the present invention;
FIG. 5 and FIG. 6 are schematic diagrams of an embodiment of searching for data stream cut points;
FIG. 7 and FIG. 8 are schematic diagrams of an embodiment of searching for data stream cut points;
FIG. 9 and FIG. 10 are schematic diagrams of an embodiment of searching for data stream cut points;
FIG. 11, FIG. 12 and FIG. 13 are schematic diagrams of an embodiment of searching for data stream cut points;
FIG. 14 and FIG. 15 are schematic diagrams of an embodiment of searching for data stream cut points;
FIG. 16 and FIG. 17 are schematic diagrams of judging whether at least part of the data in a window satisfies a predetermined condition;
FIG. 18 is an architecture diagram of a deduplication server;
FIG. 19 is an architecture diagram of a deduplication server;
FIG. 20 is a schematic diagram of a method according to an embodiment of the present invention;
FIG. 21 and FIG. 22 are schematic diagrams of an embodiment of searching for data stream cut points;
FIG. 23 and FIG. 24 are schematic diagrams of an embodiment of searching for data stream cut points;
FIG. 25 and FIG. 26 are schematic diagrams of an embodiment of searching for data stream cut points;
FIG. 27, FIG. 28 and FIG. 29 are schematic diagrams of an embodiment of searching for data stream cut points;
FIG. 30 and FIG. 31 are schematic diagrams of an embodiment of searching for data stream cut points;
FIG. 32 and FIG. 33 are schematic diagrams of judging whether at least part of the data in a window satisfies a predetermined condition.
Detailed Description of Embodiments
As storage technology keeps advancing, the volume of data generated also keeps growing, and this mass of data places ever higher demands on storage capacity. Growing storage capacity also raises the cost of purchasing IT equipment. To ease the conflict between data volume and storage capacity and to save on IT equipment purchases, data deduplication technology has been introduced into the field of data storage.
One application scenario of the embodiments of the present invention is data backup. Data backup is the process in which a backup server copies data to another storage medium to prevent data loss from various causes. The architecture of the data backup system is shown in FIG. 1. The system includes clients (101a, 101b, ..., 101n), a backup server 102, a data deduplication server (deduplication server for short) 103 and storage devices (104a, 104b, ..., 104n). The clients (101a, 101b, ..., 101n) may be application servers, workstations and the like; the backup server 102 backs up the data generated by the clients; the deduplication server 103 performs deduplication on the backup data; and the storage devices (104a, 104b, ..., 104n) are the storage media for the deduplicated data, such as disk arrays or tape libraries. The clients (101a, 101b, ..., 101n), backup server 102, deduplication server 103 and storage devices (104a, 104b, ..., 104n) may be connected through switches, local area networks, the Internet, optical fiber and the like, and may be located at the same site or at different sites. The backup server 102, deduplication server 103 and storage devices (104a, 104b, ..., 104n) may be independent physical devices or may be physically integrated in a specific implementation; for example, the backup server 102 may be integrated with the deduplication server 103, or the deduplication server 103 may be integrated with the storage devices (104a, 104b, ..., 104n).
The deduplication server 103 performs deduplication on the data stream of the backup data, which generally includes the following steps:
1) Search for data stream cut points: find the cut points in the data stream according to a specific algorithm;
2) Divide the data stream into data blocks according to the cut points found;
3) Calculate the feature value of each data block: the feature value serves as the identifier of the block and is added to the block feature list of the file corresponding to the data stream; SHA-1 or MD5 is typically used to calculate it;
4) Detect identical blocks: compare the calculated feature value of the block with the feature values already present in the block feature list to determine whether an identical block exists;
5) Delete duplicate blocks: if identical-block detection finds that the feature list already contains a feature value identical to that of the block, the block need not be stored again, or whether to store it is decided according to the number of stored copies of a duplicate block specified by the backup policy.
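Steps 2) to 5) can be sketched as follows, assuming the cut points from step 1) are already known. SHA-1 from the standard `hashlib` module stands in for the feature value, an in-memory dict stands in for both the storage devices and the set of already-known feature values, and the function name `deduplicate` is hypothetical; the backup-policy option of step 5) is not modelled.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple


def deduplicate(data: bytes, cut_points: Sequence[int],
                store: Dict[str, bytes]) -> Tuple[List[str], int]:
    """Split `data` at the cut points, fingerprint each chunk with SHA-1 and
    store only chunks whose fingerprint has not been seen before.

    Returns the feature list of the file (one fingerprint per chunk, in order)
    and the number of chunks that were newly stored.
    """
    feature_list: List[str] = []
    stored = 0
    start = 0
    for end in list(cut_points) + [len(data)]:  # step 2: divide into chunks
        if end <= start:
            continue  # skip empty chunks (e.g. a cut point at the very end)
        chunk = data[start:end]
        fp = hashlib.sha1(chunk).hexdigest()  # step 3: feature value
        feature_list.append(fp)
        if fp not in store:  # step 4: identical-block detection (assumed global index)
            store[fp] = chunk  # new chunk: store it
            stored += 1
        # step 5: a chunk whose fingerprint is already known is not stored again
        start = end
    return feature_list, stored
```

Calling it twice with the same `store` shows the effect across files: chunks already seen in an earlier backup are referenced in the feature list but not stored again.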
As the deduplication steps performed by the deduplication server 103 on the backup data stream show, searching for data stream cut points is a key step of the deduplication operation and directly determines deduplication performance.
In the embodiments of the present invention, the deduplication server 103 receives a backup file sent by the backup server 102 and performs deduplication on it. The backup file to be processed usually appears in the deduplication server 103 as a data stream. When searching for cut points in the data stream, the deduplication server 103 needs to determine the minimum cut-point lookup unit. As shown in FIG. 2, for example, the potential cut point $k_1$ lies between the two consecutive minimum lookup units with sequence numbers 1 and 2; a potential cut point is a point that must be checked to decide whether it can serve as a data stream cut point. If point $k_1$ is a cut point and the search direction is as indicated by the arrow in FIG. 2, the next potential cut point examined is $k_7$, which lies between the two consecutive minimum lookup units with sequence numbers 7 and 8. If $k_7$ is also a cut point, the data between the two adjacent cut points $k_1$ and $k_7$ forms one data block. The minimum lookup unit can be chosen according to actual needs; 1 byte is used here as an example, so the minimum lookup units with sequence numbers 1, 2, 7 and 8 are each 1 byte in size. The search direction shown in FIG. 2 usually means searching from the file header toward the end of the file, or from the end of the file toward the file header; this embodiment takes searching from the file header toward the end of the file as an example.
In deduplication scenarios, the smaller the data blocks, the higher the deduplication rate, because duplicate blocks are easier to find; however, more metadata is generated, and once blocks are small enough the deduplication rate stops increasing while the amount of metadata rises sharply. The block size must therefore be controlled. In practice a minimum block size is usually set, for example 4 KB (4096 bytes); taking the deduplication rate into account as well, a maximum block size is also set, which no block may exceed, for example 12 KB (12288 bytes). One specific implementation is shown in FIG. 3: the deduplication server 103 searches for cut points in the direction of the arrow, and $k_a$ is the cut point found most recently. The next potential cut point is searched for from $k_a$ along the search direction. To meet the minimum block size requirement, the search usually skips the minimum block size from the cut point along the search direction and starts at the end of the minimum block, that is, the end of the minimum block is taken as the next potential cut point $k_i$. In this embodiment of the present invention, the search first jumps the minimum block size of 4 KB, i.e. 4 × 1024 = 4096 bytes, from $k_a$ along the search direction, and the point $k_i$ at the end of the 4096th byte is taken as the potential cut point; that is, $k_i$ lies between the two consecutive minimum lookup units with sequence numbers 4096 and 4097. Still taking FIG. 3 as an example, $k_a$ is the cut point found most recently and the next cut point is searched for in the direction shown in FIG. 3. If no next cut point has been found when the maximum block size is exceeded, the point $k_z$ located the maximum block size away from $k_a$ along the search direction is taken as the next cut point, that is, the data stream is cut forcibly.
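The minimum-skip and forced-cut rules of FIG. 3 can be sketched as a plain byte-by-byte search, before any of the jumping introduced below. In this sketch, positions are byte offsets, a cut point $k$ lies between byte $k-1$ and byte $k$, and `is_cut` is a hypothetical placeholder for whatever test decides that a position is a cut point.

```python
from typing import Callable


def next_cut_point(data: bytes, k_a: int, is_cut: Callable[[bytes, int], bool],
                   min_chunk: int = 4096, max_chunk: int = 12288) -> int:
    """Return the next cut point after k_a.

    The first potential cut point is k_a + min_chunk; if nothing is found before
    k_a + max_chunk, the stream is cut there by force (or at the end of the data).
    """
    limit = min(k_a + max_chunk, len(data))
    k_i = k_a + min_chunk
    while k_i < limit:
        if is_cut(data, k_i):
            return k_i
        k_i += 1  # advance by one minimum lookup unit (U = 1 byte)
    return limit  # forced cut at the maximum block size, or end of stream
```

This baseline examines every position; the method described next aims to skip positions that cannot be cut points.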
An embodiment of the present invention provides a method for searching for data stream cut points based on a deduplication server. As shown in FIG. 4, the method includes:
A rule is preset on the deduplication server 103: for a potential cut point $k$, determine $M$ points $p_x$, the window $W_x[p_x-A_x,\ p_x+B_x]$ corresponding to each point $p_x$, and the predetermined condition $C_x$ corresponding to each window $W_x[p_x-A_x,\ p_x+B_x]$, where $x$ takes the consecutive natural numbers 1 to $M$, $M \ge 2$, and $A_x$ and $B_x$ are integers. The spacing between $p_x$ and the potential cut point $k$ is $d_x$ minimum cut-point lookup units; the minimum lookup unit is denoted $U$, and in this embodiment $U$ = 1 byte. In the implementation shown in FIG. 3, one way to choose $M$ is to make $M \cdot U$ no greater than the preset maximum distance between two adjacent cut points, that is, the preset maximum block length. The method judges whether at least part of the data in the window $W_z[p_z-A_z,\ p_z+B_z]$ corresponding to a point $p_z$ satisfies the predetermined condition $C_z$, where $z$ is an integer, $1 \le z \le M$, and $p_z-A_z$ and $p_z+B_z$ are the two boundaries of the window $W_z$. When at least part of the data in the window $W_z[p_z-A_z,\ p_z+B_z]$ of any point $p_z$ does not satisfy $C_z$, the search jumps $N$ bytes along the search direction from the point $p_z$ whose window failed, where $N \le \|B_z\| + \max_x(\|A_x\| + \|k-p_x\|)$. Here $\|k-p_x\|$ is the distance between any one of the $M$ points $p_x$ and the potential cut point $k$; $\max_x(\|A_x\| + \|k-p_x\|)$ is the maximum, over the $M$ points, of the sum of that distance and the absolute value of the corresponding $A_x$; and $\|B_z\|$ is the absolute value of $B_z$ in $W_z[p_z-A_z,\ p_z+B_z]$. The principle behind the value of $N$ is described in detail in the embodiments below. When at least part of the data in each of the $M$ windows $W_x[p_x-A_x,\ p_x+B_x]$ satisfies $C_x$, the potential cut point $k$ is a data stream cut point.
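The upper bound on $N$ depends only on the rule, so it can be computed in advance. The minimal sketch below assumes $U$ = 1 byte and stores each point as its distance $d_x = \|k - p_x\|$ together with $A_x$ and $B_x$; it uses the parameters of the FIG. 5 embodiment described later in this section ($d_x = x - 1$, $A_x = 169$, $B_x = 0$ for $x$ = 1 to 11). The function name `max_jump` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int, int]  # (d_x, A_x, B_x), with d_x = ||k - p_x|| in bytes


def max_jump(rule: List[Point], z: int) -> int:
    """Largest N allowed after the window of point z fails (U = 1 byte):
    N <= ||B_z|| + max_x(||A_x|| + ||k - p_x||), measured from p_z."""
    _, _, B_z = rule[z]
    return abs(B_z) + max(abs(A) + abs(d) for d, A, _ in rule)


# FIG. 5 rule: 11 points 0..10 bytes before k, each with a 169-byte window ending at the point
fig5_rule = [(x - 1, 169, 0) for x in range(1, 12)]
print(max_jump(fig5_rule, 4))  # window W_5 failed -> 179
```

Because every $B_x$ is 0 in this rule, the bound is the same 179 bytes whichever window fails.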
Specifically, for the current potential cut point $k_i$, the following steps are performed according to the rule:
Step 401: determine, according to the rule, a point $p_{iz}$ for the current potential cut point $k_i$ and the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ corresponding to $p_{iz}$, where $i$ and $z$ are integers and $1 \le z \le M$;
Step 402: judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ does not satisfy $C_z$, jump $N$ minimum cut-point lookup units $U$ from the point $p_{iz}$ along the data stream cut-point search direction, where $N \cdot U \le \|B_z\| + \max_x(\|A_x\| + \|k_i-p_{ix}\|)$, to obtain a new potential cut point, and perform step 401;
when at least part of the data in each window $W_{ix}[p_{ix}-A_x,\ p_{ix}+B_x]$ of the $M$ windows of the current potential cut point $k_i$ satisfies $C_x$, the current potential cut point $k_i$ is a data stream cut point.
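Steps 401 and 402 can be combined with the minimum and maximum block sizes of FIG. 3 into a single search loop. The sketch below is illustrative only and rests on several assumptions not taken from the text: points are given as signed offsets from $k$ (negative values lie opposite the search direction), each window is the byte slice `data[p - A : p + B]`, windows are checked in rule order, a window that runs off the data counts as not satisfied, and every jump uses the largest $N$ the bound allows. The function name `find_cut_point` is hypothetical.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int, int]  # (offset of p_x from k, A_x, B_x); U = 1 byte
Condition = Callable[[int, bytes], bool]  # C_x applied to the window bytes of point x


def find_cut_point(data: bytes, start: int, rule: List[Point], cond: Condition,
                   min_chunk: int = 4096, max_chunk: int = 12288) -> int:
    limit = min(start + max_chunk, len(data))
    # max_x(||A_x|| + ||k - p_x||) does not depend on k, so compute it once.
    reach = max(abs(A) + abs(off) for off, A, _ in rule)
    k = start + min_chunk  # first potential cut point (FIG. 3)
    while k < limit:
        failed = None
        for x, (off, A, B) in enumerate(rule):  # step 401: point p_x and window W_x
            p = k + off
            lo, hi = p - A, p + B
            ok = 0 <= lo <= hi <= len(data) and cond(x, data[lo:hi])
            if not ok:  # step 402: C_x not satisfied
                failed = (p, B)
                break
        if failed is None:
            return k  # all M windows satisfied: k is a cut point
        p_z, B_z = failed
        n = max(1, abs(B_z) + reach)  # largest N allowed by the bound
        k = max(k + 1, p_z + n)  # new potential cut point; always move forward
    return limit  # forced cut at the maximum block size (FIG. 3)
```

For example, with the two-point rule `[(0, 1, 0), (-1, 1, 0)]` and a condition that accepts only the byte `0xAA`, the loop advances two bytes at a time through zero-filled data yet still stops at the first position preceded by two `0xAA` bytes.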
Further, the rule also includes: at least two points $p_e$ and $p_f$ satisfy $A_e = A_f$, $B_e = B_f$ and $C_e = C_f$.
The rule also includes: the at least two points $p_e$ and $p_f$ lie, relative to the potential cut point $k$, in the direction opposite to the data stream cut-point search direction.
The rule also includes: the distance between the at least two points $p_e$ and $p_f$ is 1 $U$.
Judging whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$ specifically includes:
using a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
Using a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies $C_z$ is specifically using a hash function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ satisfies $C_z$.
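A hash-based condition of this kind might look like the sketch below. SHA-1 from `hashlib` is used only because it is readily available (a production system might prefer a cheaper rolling hash), and the acceptance rule is an assumption rather than the condition used in the text: the window passes when its hash modulo `den` is below `num`, so that varied windows pass with probability of roughly `num/den`, for example 1/2 or 3/4, the two probabilities mentioned later in this section. The function name `hash_condition` is hypothetical.

```python
import hashlib


def hash_condition(window: bytes, num: int = 1, den: int = 2) -> bool:
    """Hypothetical C_x: the window passes when its hash, reduced modulo den,
    falls below num, i.e. with probability about num/den for varied content."""
    if not 0 <= num <= den or den <= 0:
        raise ValueError("need 0 <= num <= den and den > 0")
    h = int.from_bytes(hashlib.sha1(window).digest()[:8], "big")
    return h % den < num
```

Because the result depends only on the window bytes, two points whose windows cover identical data always receive the same answer, which is what allows a failed window to rule out later potential cut points.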
When at least part of the data in the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ does not satisfy the predetermined condition $C_z$, the search jumps $N$ minimum cut-point lookup units $U$ from the point $p_{iz}$ along the data stream cut-point search direction to obtain the new potential cut point. According to the rule, the left boundary of the window $W_{ic}[p_{ic}-A_c,\ p_{ic}+B_c]$ corresponding to the point $p_{ic}$ determined for the new potential cut point coincides with the right boundary of the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$, or lies within the range of the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$. Here $p_{ic}$ is the point ranked first when the $M$ points determined for the new potential cut point according to the rule are sorted along the data stream search direction.
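The overlap condition can be checked mechanically. The sketch below assumes $U$ = 1 byte and stores each point as its distance $d_x$ before $k$ together with $A_x$ and $B_x$, as in the FIG. 5 embodiment later in this section; the function names `window` and `jump_keeps_overlap` are hypothetical. It computes where the window of the first point of the new potential cut point starts and tests whether that left boundary coincides with, or falls inside, the failed window.

```python
from typing import List, Tuple

Point = Tuple[int, int, int]  # (d_x, A_x, B_x): p_x = k - d_x, window [p_x - A_x, p_x + B_x]


def window(k: int, pt: Point) -> Tuple[int, int]:
    """Boundaries of the window of point pt for potential cut point k."""
    d, A, B = pt
    p = k - d
    return p - A, p + B


def jump_keeps_overlap(rule: List[Point], k_i: int, z: int, n: int) -> bool:
    """After window z of k_i fails, jump n bytes from p_iz and check that the
    left boundary of the first window of the new cut point lies in [p_iz - A_z, p_iz + B_z]."""
    lo_z, hi_z = window(k_i, rule[z])
    p_iz = k_i - rule[z][0]
    k_new = p_iz + n
    # The point ranked first along the search direction is the one farthest before k_new.
    first = max(rule, key=lambda pt: pt[0])
    left_new = window(k_new, first)[0]
    return lo_z <= left_new <= hi_z
```

With the FIG. 5 parameters and window $W_{i5}$ failing, $N$ = 11 and $N$ = 179 both pass while $N$ = 180 does not, which agrees with the 179-byte limit stated for FIG. 6.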
The embodiments of the present invention search for data stream cut points by judging whether at least part of the data in a window among the $M$ windows satisfies a predetermined condition. When at least part of the data in a window does not satisfy the predetermined condition, a length of $N \cdot U$ is skipped, where $N \cdot U \le \|B_z\| + \max_x(\|A_x\| + \|k_i-p_{ix}\|)$, to obtain the next potential cut point, which improves the efficiency of the cut-point search.
During deduplication, the average block size (also called the average chunk size) can be taken into account to keep block sizes uniform: while the minimum and maximum block size limits are met, an average block size is also targeted so that the resulting blocks are uniform in size. Two factors, the number $M$ of points $p_x$ and the probability that at least part of the data in the window $W_x[p_x-A_x,\ p_x+B_x]$ corresponding to a point $p_x$ satisfies $C_x$, determine the cut-point probability $P(n)$. The former affects the jump length and the latter affects the jump probability; together they determine the average chunk size. In general, when the average chunk size is fixed and the number $M$ of points increases, the probability that at least part of the data in the window $W_x[p_x-A_x,\ p_x+B_x]$ of a single point $p_x$ satisfies $C_x$ also increases. For example, one rule preset on the deduplication server 103 is: determine 11 points $p_x$ for the potential cut point $k$, with $x$ taking the consecutive natural numbers 1 to 11, where the probability that at least part of the data in the window of any one of the 11 points satisfies $C_x$ is 1/2. Another rule preset on the deduplication server 103 is: select 24 points $p_x$ for the potential cut point $k$, with $x$ taking the consecutive natural numbers 1 to 24, where that probability is 3/4. For how the probability that at least part of the data in the window $W_x[p_x-A_x,\ p_x+B_x]$ satisfies $C_x$ is set, see the description of judging whether at least part of the data in a window satisfies the predetermined condition. The two factors, $M$ and this probability, determine $P(n)$, which is the probability that no cut point has been found after searching $n$ minimum lookup units from the start of the data stream or from the previous cut point. Computing $P(n)$ in fact amounts to a multi-step Fibonacci sequence determined by these two factors, described in detail later. Once $P(n)$ is obtained, $1-P(n)$ is the distribution function of the cut point, and $(1-P(n)) - (1-P(n-1)) = P(n-1) - P(n)$ is the probability that the cut point is found at the $n$-th point, that is, the density function of the cut point. Integrating $n$ weighted by this density, with 4 × 1024 bytes (the minimum block length) and 12 × 1024 bytes (the maximum block length) as the limits, gives the expected length between cut points, that is, the average chunk size.
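The relation between $P(n)$, the density $P(n-1) - P(n)$ and the average chunk size can be illustrated numerically. The text derives $P(n)$ from a multi-step Fibonacci sequence that is not reproduced in this section, so the sketch instead takes an arbitrary survival function as input; the geometric toy model `toy_survival` and the optional term that counts the forced cut at the maximum block length are assumptions added for illustration, and both function names are hypothetical. Because $n$ counts whole lookup units, the integral becomes a sum.

```python
from typing import Callable


def expected_chunk_length(P: Callable[[int], float], n_min: int = 4 * 1024,
                          n_max: int = 12 * 1024, include_forced_cut: bool = True) -> float:
    """Sum n * (P(n-1) - P(n)) for n from n_min to n_max.

    P(n) is the probability that no cut point has been found after n units.
    If include_forced_cut is True (an assumption), the remaining probability
    P(n_max) is counted as a forced cut at n_max.
    """
    total = sum(n * (P(n - 1) - P(n)) for n in range(n_min, n_max + 1))
    if include_forced_cut:
        total += n_max * P(n_max)
    return total


def toy_survival(q: float, n_min: int = 4 * 1024) -> Callable[[int], float]:
    """Hypothetical model: no cut before n_min, then each position cuts with probability q."""
    return lambda n: 1.0 if n < n_min else (1.0 - q) ** (n - n_min + 1)
```

At the extremes the toy model behaves as expected: with `q = 1.0` every chunk ends at the 4096-byte minimum, and with `q = 0.0` every chunk is cut by force at 12288 bytes.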
Building on the cut-point search shown in FIG. 3, in the embodiment shown in FIG. 5 the following rule is preset on the deduplication server 103: for a potential cut point $k$, determine 11 points $p_x$, the window $W_x[p_x-A_x,\ p_x+B_x]$ (window $W_x$ for short) corresponding to each point $p_x$, and the predetermined condition $C_x$ corresponding to each window $W_x[p_x-A_x,\ p_x+B_x]$, where $A_1 = A_2 = \cdots = A_{11} = 169$, $B_1 = B_2 = \cdots = B_{11} = 0$, and $C_1 = C_2 = \cdots = C_{11}$. The spacing between $p_x$ and the potential cut point $k$ is $d_x$ bytes; specifically, $d_x = x - 1$, so $p_1$ is 0 bytes from $k$, $p_2$ is 1 byte from $k$, $p_3$ is 2 bytes from $k$, and so on up to $p_{11}$, which is 10 bytes from $k$; and $p_2, p_3, \dots, p_{11}$ all lie, relative to $k$, in the direction opposite to the data stream cut-point search direction. $k_a$ is a data stream cut point, and the search direction in FIG. 5 is from left to right. After the minimum block of 4 KB is skipped from the cut point $k_a$, the end of the minimum block is taken as the next potential cut point $k_i$, and points $p_{ix}$ are determined for $k_i$; in this embodiment, according to the rule preset on the deduplication server 103, $x$ takes the consecutive natural numbers 1 to 11. In the embodiment shown in FIG. 5, 11 points are determined for the potential cut point $k_i$, namely $p_{i1}, p_{i2}, \dots, p_{i11}$, and their windows are $W_{i1}[p_{i1}-169,\ p_{i1}]$, $W_{i2}[p_{i2}-169,\ p_{i2}]$, ..., $W_{i11}[p_{i11}-169,\ p_{i11}]$, abbreviated $W_{i1}, W_{i2}, \dots, W_{i11}$. The point $p_{ix}$ is $d_x$ bytes from $k_i$: $p_{i1}$ is 0 bytes from $k_i$, $p_{i2}$ is 1 byte, and so on up to $p_{i11}$, which is 10 bytes from $k_i$; and $p_{i2}, \dots, p_{i11}$ all lie, relative to $k_i$, in the direction opposite to the search direction. For each $x$ from 1 to 11, the method judges whether at least part of the data in $W_{ix}[p_{ix}-169,\ p_{ix}]$ satisfies the predetermined condition $C_x$. When at least part of the data in every window $W_{i1}, \dots, W_{i11}$ satisfies the corresponding condition $C_1, \dots, C_{11}$, the current potential cut point $k_i$ is a data stream cut point. When at least part of the data in any one of the 11 windows does not satisfy its predetermined condition, for example, as shown in FIG. 6, when at least part of the data in $W_{i5}[p_{i5}-169,\ p_{i5}]$ does not satisfy $C_5$, the search jumps $N$ bytes from the point $p_{i5}$ along the search direction, where $N \le \|B_5\| + \max_x(\|A_x\| + \|k_i-p_{ix}\|)$. In the embodiment shown in FIG. 6 the jump is therefore at most 179 bytes; in this embodiment $N = 11$, which yields the next potential cut point. To distinguish it from the potential cut point $k_i$, the new potential cut point is denoted $k_j$. According to the rule preset on the deduplication server 103 in the embodiment shown in FIG. 5, 11 points are determined for $k_j$, namely $p_{j1}, p_{j2}, \dots, p_{j11}$, with corresponding windows $W_{j1}[p_{j1}-169,\ p_{j1}]$, $W_{j2}[p_{j2}-169,\ p_{j2}]$, ..., $W_{j11}[p_{j11}-169,\ p_{j11}]$. The point $p_{jx}$ is $d_x$ bytes from $k_j$: $p_{j1}$ is 0 bytes from $k_j$, $p_{j2}$ is 1 byte, and so on up to $p_{j11}$, which is 10 bytes from $k_j$; and $p_{j1}, \dots, p_{j11}$ all lie, relative to $k_j$, in the direction opposite to the search direction. In the implementation shown in FIG. 6, for the 11th window $W_{j11}[p_{j11}-169,\ p_{j11}]$ determined for $k_j$, so that the whole range between the potential cut points $k_i$ and $k_j$ stays within the range that is judged, this implementation must ensure that the left boundary of $W_{j11}[p_{j11}-169,\ p_{j11}]$ coincides with the right boundary $p_{i5}$ of $W_{i5}[p_{i5}-169,\ p_{i5}]$ or lies within $W_{i5}[p_{i5}-169,\ p_{i5}]$. Here $p_{j11}$ is the point ranked first when the $M$ points determined for the potential cut point $k_j$ according to the rule are sorted along the data stream search direction. Therefore, under this constraint, when at least part of the data in $W_{i5}[p_{i5}-169,\ p_{i5}]$ does not satisfy $C_5$, the distance jumped from $p_{i5}$ along the data stream cut-point search direction is
no more than ‖ B5‖+maxx (‖Ax‖+‖(ki-pix) ‖), wherein, M=11,11*U are not more than maxx(‖Ax‖+‖(ki -pix) ‖), therefore, from pi5The distance jumped along data flow point cutpoint search direction is little In 179.Judge Wj1[pj1-169,pj1In], whether at least part of data meet predetermined condition C1、 Judge Wj2[pj2-169,pj2In], whether at least part of data meet predetermined condition C2, judge Wj3 [pj3-169,pj3In], whether at least part of data meet predetermined condition C3, judge Wj4[pj4-169, pj4In], whether at least part of data meet predetermined condition C4, judge Wj5[pj5-169,pj5In] extremely Whether small part data meet predetermined condition C5, judge Wj6[pj6-169,pj6In] at least partly Whether data meet predetermined condition C6, judge Wj7[pj7-169,pj7In], at least part of data are No meet predetermined condition C7, judge Wj8[pj8-169,pj8In], whether at least part of data meet Predetermined condition C8, judge Wj9[pj9-169,pj9In], whether at least part of data meet predetermined bar Part C9, judge Wj10[pj10-169,pj10In], whether at least part of data meet predetermined condition C10 With judge Wj11[pj11-169,pj11In], whether at least part of data meet predetermined condition C11.When The most in embodiments of the present invention, it is judged that potential cut-point kaWhether it is also to abide by during data flow point cutpoint Follow this rule, implement and no longer describe, be referred to judge potential cut-point kiDescription. 
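The jump bound and the window-overlap requirement above can be checked numerically. The sketch below assumes the Fig. 5 parameters: A_x = 169, B_x = 0, and p_x lying x - 1 bytes behind k. The function names are hypothetical. List indices are 0-based, so index 4 stands for p_5.

```python
def jump_bound(A: list[int], B: list[int], dist: list[int], fail: int) -> int:
    """Upper bound on N from the text: ||B_fail|| + max_x(||A_x|| + ||k - p_x||).

    dist[x] is the signed distance of p_x behind k (positive = opposite to the search direction).
    """
    return abs(B[fail]) + max(abs(a) + abs(d) for a, d in zip(A, dist))


def rear_window_left_edge(p_fail: int, n: int, A: list[int], dist: list[int]) -> int:
    """Left boundary of the rearmost window of the new candidate k_j = p_fail + n."""
    k_j = p_fail + n
    x_rear = max(range(len(dist)), key=lambda x: dist[x])  # point farthest behind k_j
    return k_j - dist[x_rear] - A[x_rear]


A = [169] * 11
B = [0] * 11
DIST = list(range(11))  # Fig. 5 rule: p_x is x - 1 bytes behind k

k_i = 10_000
p_i5 = k_i - DIST[4]
bound = jump_bound(A, B, DIST, fail=4)
print(bound)                                                    # 179
print(rear_window_left_edge(p_i5, bound, A, DIST) <= p_i5)      # True: left edge of W_j11 coincides with p_i5
print(rear_window_left_edge(p_i5, bound + 1, A, DIST) <= p_i5)  # False: a gap would open
```

With the maximal jump of 179 bytes, the left edge of the rearmost new window lands exactly on p_i5. Any longer jump would leave bytes that no window of k_j examines.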
Suppose at least part of the data in each of the windows W_j1, ..., W_j11 satisfies the corresponding condition C_1, ..., C_11. Then the current potential cut point k_j is a data stream cut point, and the data between k_a and k_j forms one data block. The minimum block size of 4 KB is then skipped, in the same way as from k_a, to obtain the next potential cut point. The rule preset on the deduplication server 103 is applied to determine whether that point is a data stream cut point. If k_j is determined not to be a data stream cut point, the next potential cut point is obtained by jumping 11 bytes, in the same way as for k_i. That point is then tested with the rule preset on the deduplication server 103 and the method above. If no data stream cut point has been found once the configured maximum data block size is exceeded, the end position of the maximum data block is used as a forced cut point.
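A compact sketch of the whole search loop for the Fig. 5 rule might look like the following. This section does not specify the predetermined condition C_x. The sketch therefore assumes a hypothetical one: the low bit of a CRC-32 over the 170-byte window is 0, which holds with probability about 1/2 on random data. The 12 KB maximum is likewise an assumption, taken from the probability example later in this section. Positions are 0-based byte indices, and a cut point k is treated as the last byte of its block.

```python
import zlib

WINDOW_BEFORE = 169        # A_x: window W_x spans [p_x - 169, p_x]
OFFSETS = list(range(11))  # d_x: p_x lies d_x bytes behind k, checked for x = 1..11
JUMP = 11                  # N: bytes jumped from the failing point
MIN_CHUNK = 4 * 1024
MAX_CHUNK = 12 * 1024      # assumed maximum block size


def window_ok(data: bytes, p: int) -> bool:
    """Hypothetical condition C: low bit of CRC-32 over W[p - 169, p] is 0."""
    lo = max(0, p - WINDOW_BEFORE)
    return zlib.crc32(data[lo:p + 1]) & 1 == 0


def chunk_boundaries(data: bytes) -> list[int]:
    """Return the exclusive end offsets of the blocks of `data`."""
    ends = []
    start = 0
    n = len(data)
    while start < n:
        limit = min(start + MAX_CHUNK, n) - 1  # last byte allowed in this block
        k = start + MIN_CHUNK - 1              # first potential cut point: end of the min block
        cut = None
        while k <= limit:
            for d in OFFSETS:
                p = k - d
                if not window_ok(data, p):
                    k = p + JUMP               # jump N bytes from the failing point
                    break
            else:
                cut = k                        # all 11 windows satisfied their condition
                break
        if cut is None:
            cut = limit                        # forced cut at the max block (or end of data)
        ends.append(cut + 1)
        start = cut + 1
    return ends
```

Every non-final block is between 4 KB and 12 KB long. A jump of 11 bytes always moves the candidate forward, because no point lies more than 10 bytes behind k.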
In the embodiment shown in Fig. 5, the check under the rule preset on the deduplication server 103 starts with window W_i1[p_i1-169, p_i1] and condition C_1. Suppose at least part of the data in W_i1, W_i2, W_i3 and W_i4 satisfies C_1, C_2, C_3 and C_4 respectively, but at least part of the data in W_i5[p_i5-169, p_i5] fails C_5.

Consider a jump of only 10 bytes from p_i5 along the cut-point search direction. The end position of the 10th byte would be a new potential cut point, denoted k_g to distinguish it from the other potential cut points. Under the same rule, eleven points p_gx (x = 1, 2, ..., 11) are determined for k_g, namely p_g1, p_g2, ..., p_g11. Their windows are W_g1[p_g1-169, p_g1], ..., W_g11[p_g11-169, p_g11]. The point p_gx lies d_x bytes from k_g: p_g1 at 0 bytes, p_g2 at 1 byte, and so on up to p_g11 at 10 bytes. The points p_g2 through p_g11 lie on the side of k_g opposite to the search direction. For each x, it is determined whether at least part of the data in W_gx satisfies C_x.

The point p_g11 of k_g then coincides with the point p_i5 of k_i. Its window W_g11[p_g11-169, p_g11] coincides with W_i5[p_i5-169, p_i5], and C_5 = C_11. Because W_i5 is already known to fail C_5, the potential cut point k_g reached by a 10-byte jump from p_i5 still cannot satisfy the conditions for a data stream cut point. A 10-byte jump from p_i5 therefore causes repeated computation. An 11-byte jump avoids it and is more efficient, which speeds up the search for data stream cut points. Suppose the probability that at least part of the data in the window W_x[p_x-A_x, p_x+B_x] of a point p_x satisfies C_x is 1/2. In other words, a jump is triggered with probability 1/2, and each jump can then cover at most 179 bytes.
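The redundancy argument can be illustrated by listing the points of the next candidate. The sketch assumes the Fig. 5 offsets (d_x = x - 1) and uses hypothetical helper names. It reports whether a given jump lands on a candidate whose points include the already-failed point again. That point's window would be re-examined if the checks got that far.

```python
def checked_points(k: int, dist: list[int]) -> list[int]:
    """Points p_x = k - d_x examined for candidate k."""
    return [k - d for d in dist]


def revisits_failed_point(k_i: int, fail: int, jump: int, dist: list[int]) -> bool:
    """True if jumping `jump` bytes from p_fail gives a candidate whose points include p_fail."""
    p_fail = k_i - dist[fail]
    k_next = p_fail + jump
    return p_fail in checked_points(k_next, dist)


DIST = list(range(11))  # Fig. 5 rule
print(revisits_failed_point(5000, fail=4, jump=10, dist=DIST))  # True: p_g11 == p_i5
print(revisits_failed_point(5000, fail=4, jump=11, dist=DIST))  # False

# Smallest jump that never revisits the failing point, whichever window failed.
smallest = min(j for j in range(1, 200)
               if all(not revisits_failed_point(5000, f, j, DIST) for f in range(11)))
print(smallest)  # 11
```

For these offsets, 11 is one more than the largest backward distance (10 bytes). It is therefore the shortest jump whose new points all lie strictly ahead of the failing point.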
In this embodiment the predetermined rule is as follows. For a potential cut point k, 11 points p_x are determined, each with a window W_x[p_x-A_x, p_x+B_x] and a corresponding predetermined condition C_x, where x takes the consecutive natural numbers 1 to 11. The probability that at least part of the data in W_x satisfies the condition is 1/2, and these two factors are enough to compute P(n). Further, A_1 = A_2 = ... = A_11 = 169, B_1 = B_2 = ... = B_11 = 0, and C_1 = C_2 = ... = C_11. The point p_x lies d_x bytes from k: p_1 at 0 bytes, p_2 at 1 byte, p_3 at 2 bytes, and so on up to p_11 at 10 bytes. The points p_2 through p_11 lie on the side of k opposite to the cut-point search direction. Whether k is a data stream cut point is thus decided by whether at least part of the data in each of the windows of 11 consecutive points satisfies C_x.

The search first skips the minimum block length of 4096 bytes from the start of the data stream or from the previous data stream cut point. It then rolls back 10 bytes opposite to the search direction, to the 4086th point. No data stream cut point can exist yet at this point, so P(4086) = 1, and likewise P(4087) = 1, ..., P(4095) = 1. At the 4096th point, i.e. at the minimum block size, at least part of the data in each of the 11 windows satisfies C_x with probability (1/2)^11. A data stream cut point therefore exists there with probability (1/2)^11 and does not exist with probability 1 - (1/2)^11, so P(4096) = 1 - (1/2)^11.
At the n-th point, P(n) can be derived recursively by splitting into 12 cases.
Case 1: at least part of the data in the window of the n-th point fails the predetermined condition, which happens with probability 1/2. The n-1 points before the n-th point then contain no run of 11 consecutive points whose windows each have at least part of the data satisfying the condition, with probability P(n-1). P(n) therefore contains the term 1/2*P(n-1). The situation in which the window of the n-th point fails but the preceding n-1 points do contain such a run of 11 consecutive points does not contribute to P(n).
Case 2: at least part of the data in the window of the n-th point satisfies the condition (probability 1/2), and at least part of the data in the window of the (n-1)-th point fails it (probability 1/2). The n-2 points before the (n-1)-th point then contain no run of 11 consecutive satisfying windows, with probability P(n-2). P(n) therefore contains the term 1/2*1/2*P(n-2). Consider the situation in which the n-th window satisfies the condition, the (n-1)-th window fails it, and the n-2 earlier points do contain such a run. That situation does not contribute to P(n).
Continuing in the same way, case 11: at least part of the data in the windows of the n-th through (n-9)-th points satisfies the condition, with probability (1/2)^10. At least part of the data in the window of the (n-10)-th point fails it, with probability 1/2. The n-11 points before the (n-10)-th point then contain no run of 11 consecutive satisfying windows, with probability P(n-11). P(n) therefore contains the term (1/2)^10*1/2*P(n-11). Consider the situation in which the windows of the n-th through (n-9)-th points satisfy the condition, the (n-10)-th window fails, and the n-11 earlier points do contain such a run. That situation does not contribute to P(n).
Case 12: at least part of the data in each of the windows of the n-th through (n-10)-th points satisfies the condition, with probability (1/2)^11. A data stream cut point then exists by the n-th point, so this case does not contribute to P(n).
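The twelve-case split can be sanity-checked on small parameters by comparing the recurrence with brute-force enumeration. Like the case analysis, the sketch treats each window test as an independent event that succeeds with probability q. It counts tests from the first point that can belong to a run. It also uses hypothetical run lengths K much smaller than 11, so that every outcome can be enumerated.

```python
from fractions import Fraction
from itertools import product


def no_run_recurrence(m: int, K: int, q: Fraction) -> Fraction:
    """Probability that m window tests contain no K consecutive successes, via the case split:
    case j (j = 1..K) means the last j-1 tests succeed and the j-th from the end fails."""
    P = []
    for i in range(m + 1):
        if i < K:
            P.append(Fraction(1))  # too few tests for a run of K
        else:
            P.append(sum((1 - q) * q ** (j - 1) * P[i - j] for j in range(1, K + 1)))
    return P[m]


def no_run_bruteforce(m: int, K: int, q: Fraction) -> Fraction:
    """Same probability by enumerating every success/failure pattern of length m."""
    total = Fraction(0)
    for outcome in product("01", repeat=m):  # "1" = window satisfies its condition
        pattern = "".join(outcome)
        if "1" * K in pattern:
            continue  # a cut point exists (the final case, K + 1)
        s = pattern.count("1")
        total += q ** s * (1 - q) ** (m - s)
    return total


for K, q in [(3, Fraction(1, 2)), (4, Fraction(3, 4))]:
    for m in range(12):
        assert no_run_recurrence(m, K, q) == no_run_bruteforce(m, K, q)
print("recurrence matches enumeration")
```

Exact fractions are used so that the comparison is an equality check rather than a floating-point tolerance.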
Therefore,

$$P(n) = \frac{1}{2}P(n-1) + \left(\frac{1}{2}\right)^{2} P(n-2) + \cdots + \left(\frac{1}{2}\right)^{11} P(n-11).$$

Another preset rule is as follows. For a potential cut point k, 24 points p_x are determined, each with a window W_x[p_x-A_x, p_x+B_x] and a corresponding predetermined condition C_x, where x takes the consecutive natural numbers 1 to 24. The probability that at least part of the data in W_x satisfies C_x is 3/4, and these two factors are enough to compute P(n). Further, A_1 = A_2 = ... = A_24 = 169, B_1 = B_2 = ... = B_24 = 0, and C_1 = C_2 = ... = C_24. The point p_x lies d_x bytes from k: p_1 at 0 bytes, p_2 at 1 byte, p_3 at 2 bytes, ..., p_22 at 21 bytes, p_23 at 22 bytes and p_24 at 23 bytes. The points p_2 through p_24 lie on the side of k opposite to the cut-point search direction. Whether k is a data stream cut point is thus decided by whether at least part of the data in each of the windows of 24 consecutive points satisfies C_x. P(n) can be calculated with the following formulas:
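$$P(4073) = 1,\quad P(4074) = 1,\ \ldots,\ P(4095) = 1,\quad P(4096) = 1 - \left(\frac{3}{4}\right)^{24},$$

$$P(n) = \frac{1}{4}P(n-1) + \frac{1}{4}\cdot\frac{3}{4}\,P(n-2) + \cdots + \frac{1}{4}\left(\frac{3}{4}\right)^{23} P(n-24).$$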
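Both recurrences have the form P(n) = Σ_{m=1..K} (1 - q) q^(m-1) P(n - m), and they can be evaluated directly. The sketch below assumes a 4096-byte minimum block. Like the derivation, it treats each window test as independent with success probability q. By our estimate, the 11-point rule (K = 11, q = 1/2) gives values close to the 0.78, 0.17 and 0.13 quoted in the next paragraph.

```python
def no_cut_probabilities(n_max: int, K: int, q: float, min_size: int = 4096) -> list[float]:
    """P[n], n = 0..n_max: probability that no data stream cut point has been found by point n.

    P(n) = sum_{m=1..K} (1 - q) * q**(m - 1) * P(n - m) for n >= min_size, and P(n) = 1 below it.
    """
    P = [1.0] * (n_max + 1)
    for n in range(min_size, n_max + 1):
        P[n] = sum((1 - q) * q ** (m - 1) * P[n - m] for m in range(1, K + 1))
    return P


P11 = no_cut_probabilities(12 * 1024, K=11, q=0.5)   # 11 points, probability 1/2 each
P24 = no_cut_probabilities(12 * 1024, K=24, q=0.75)  # 24 points, probability 3/4 each
for n in (4096, 5 * 1024, 11 * 1024, 12 * 1024):
    print(n, round(P11[n], 3), round(P24[n], 3))
```

For K = 11 and q = 1/2, the coefficient (1 - q) q^(m-1) reduces to (1/2)^m, which is the 11-point recurrence above.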
Calculation gives P(5*1024) = 0.78, P(11*1024) = 0.17 and P(12*1024) = 0.13. In other words, after 12 KB has been searched from the start of the data stream or from the previous data stream cut point, there is still a 13% probability that no data stream cut point has been found, and a forced split is made. From these probabilities the density function of the data stream cut points is obtained. Integrating it shows that, on average, a data stream cut point is found after searching about 7.6 KB from the start of the data stream or the previous cut point, so the average block length is about 7.6 KB. In the present scheme, at least part of the data in the windows of 11 consecutive points satisfies the predetermined condition with probability 1/2. A traditional CDC algorithm, by contrast, uses a single window and needs that window to satisfy its condition with probability (1/2)^12 to reach the same average block length of 7.6 KB.
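The average block length can also be obtained without forming the density explicitly. For a nonnegative integer length L, E[L] = Σ_n Pr(L > n), which is equivalent to integrating the density. The sketch makes three assumptions: P(n) is the probability that the block is still longer than n bytes, the minimum block is 4096 bytes, and a forced split happens at 12 KB (12288 bytes). Its result is an estimate under the independence assumption of the derivation, not a measured value.

```python
def no_cut_probabilities(n_max: int, K: int, q: float, min_size: int = 4096) -> list[float]:
    """Recurrence from the text: P(n) = sum (1-q) q^(m-1) P(n-m), with P(n) = 1 below min_size."""
    P = [1.0] * (n_max + 1)
    for n in range(min_size, n_max + 1):
        P[n] = sum((1 - q) * q ** (m - 1) * P[n - m] for m in range(1, K + 1))
    return P


def mean_block_length(K: int, q: float, min_size: int = 4096, max_size: int = 12 * 1024) -> float:
    """E[L] = sum_{n < max_size} Pr(L > n), where Pr(L > n) = P(n) until the forced split."""
    P = no_cut_probabilities(max_size - 1, K, q, min_size)
    return sum(P)


for K, q in [(11, 0.5), (24, 0.75)]:
    L = mean_block_length(K, q)
    print(K, q, round(L), "bytes", round(L / 1024, 2), "KiB")
```

The forced split caps L at max_size, which is why the sum stops there.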
Building on the data stream cut-point search shown in Fig. 3, in the embodiment shown in Fig. 7 a rule is preset on the deduplication server 103. For a potential cut point k, 11 points p_x are determined, each with a window W_x[p_x-A_x, p_x+B_x] and a corresponding predetermined condition C_x, where x takes the consecutive natural numbers 1 to 11. The probability that at least part of the data in W_x satisfies C_x is 1/2. Further, A_1 = A_2 = ... = A_11 = 169, B_1 = B_2 = ... = B_11 = 0, and C_1 = C_2 = ... = C_11. The point p_x lies d_x bytes from k:

- p_1: 2 bytes
- p_2: 3 bytes
- p_3: 4 bytes
- p_4: 5 bytes
- p_5: 6 bytes
- p_6: 7 bytes
- p_7: 8 bytes
- p_8: 9 bytes
- p_9: 10 bytes
- p_10: 1 byte
- p_11: 0 bytes

The points p_1 through p_10 lie on the side of k opposite to the cut-point search direction. k_a is a data stream cut point, and the search direction in Fig. 7 is from left to right. After the minimum data block of 4 KB is skipped from k_a, the end position of that 4 KB block becomes the next potential cut point k_i, and points p_ix are determined for k_i. In this embodiment, under the rule preset on the deduplication server 103, x takes the consecutive natural numbers 1 to 11. The rule thus yields 11 points for k_i, p_i1, p_i2, ..., p_i11, with windows W_i1[p_i1-169, p_i1], ..., W_i11[p_i11-169, p_i11]. The point p_ix lies d_x bytes from k_i: p_i1 at 2 bytes, p_i2 at 3, p_i3 at 4, p_i4 at 5, p_i5 at 6, p_i6 at 7, p_i7 at 8, p_i8 at 9, p_i9 at 10, p_i10 at 1 and p_i11 at 0 bytes. The points p_i1 through p_i10 lie on the side of k_i opposite to the search direction. For each x from 1 to 11, it is determined whether at least part of the data in W_ix satisfies C_x. If at least part of the data in every window W_i1, ..., W_i11 satisfies its condition C_1, ..., C_11, the current potential cut point k_i is a data stream cut point.

Suppose instead that at least part of the data in any one of the 11 windows fails its predetermined condition. For example, as shown in Fig. 8, at least part of the data in W_i3[p_i3-169, p_i3] fails C_3. A jump of 11 bytes from p_i3 along the cut-point search direction is used as the example. As shown in Fig. 8, when W_i3 is found not to satisfy its condition, the search jumps N bytes from p_i3 along the search direction, where N is not greater than ‖B_3‖ + max_x(‖A_x‖ + ‖k_i - p_ix‖). In the embodiment shown in Fig. 6 the jump is at most 179 bytes, and in this embodiment N = 11. The end position of the 11th byte is the next potential cut point, denoted k_j to distinguish it from k_i. Under the rule preset on the deduplication server 103, 11 points are determined for k_j, p_j1, p_j2, ..., p_j11, with windows W_j1[p_j1-169, p_j1], ..., W_j11[p_j11-169, p_j11]. The point p_jx lies d_x bytes from k_j: p_j1 at 2 bytes, p_j2 at 3, p_j3 at 4, p_j4 at 5, p_j5 at 6, p_j6 at 7, p_j7 at 8, p_j8 at 9, p_j9 at 10, p_j10 at 1 and p_j11 at 0 bytes. The points p_j1 through p_j10 lie on the side of k_j opposite to the search direction.
For each x from 1 to 11, it is determined whether at least part of the data in W_jx[p_jx-169, p_jx] satisfies C_x. In the embodiments of the present invention, the same principle is followed when determining whether the potential cut point k_a is a data stream cut point. The details are not repeated here; reference may be made to the description of k_i. Suppose at least part of the data in every window W_j1, ..., W_j11 satisfies its condition C_1, ..., C_11. Then the current potential cut point k_j is a data stream cut point, and the data between k_a and k_j forms one data block. The minimum block size of 4 KB is then skipped, in the same way as from k_a, to obtain the next potential cut point. The rule preset on the deduplication server 103 is applied to determine whether that point is a data stream cut point. If k_j is not a data stream cut point, the next potential cut point is obtained by jumping 11 bytes, in the same way as for k_i. That point is then tested with the rule preset on the deduplication server 103 and the method above. If no data stream cut point has been found once the configured maximum data block size is exceeded, the end position of the maximum data block is used as a forced cut point. The implementation of this method is also constrained by the maximum data block length and by the size of the file that makes up the data stream; this is not described further here.
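The Fig. 7 rule differs from the Fig. 5 rule only in where the points sit and in which order they are checked. The sketch below uses a hypothetical predicate in place of the unspecified conditions C_x. It returns the first failing point and the number of windows examined, so the two orderings can be compared. The 11-byte jump from the failing point follows the text.

```python
from typing import Callable, Optional

FIG5_DIST = list(range(11))                      # p_1..p_11 at 0, 1, ..., 10 bytes behind k
FIG7_DIST = [2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 0]   # p_1..p_11 as listed for Fig. 7


def first_failure(k: int, dist: list[int], ok: Callable[[int], bool]) -> tuple[Optional[int], int]:
    """Check windows in the order p_1, p_2, ...; return (failing point or None, windows examined)."""
    for count, d in enumerate(dist, start=1):
        p = k - d
        if not ok(p):
            return p, count
    return None, len(dist)


def window_passes(p: int) -> bool:
    """Hypothetical stand-in for 'at least part of the data in W[p - 169, p] satisfies C'."""
    return p != 997


for name, dist in [("Fig. 5", FIG5_DIST), ("Fig. 7", FIG7_DIST)]:
    fail, examined = first_failure(1000, dist, window_passes)
    next_k = fail + 11 if fail is not None else None  # N = 11 in both embodiments
    print(name, fail, examined, next_k)
```

Both orderings find the same failing point and the same next candidate. The Fig. 7 order reaches it after two window checks instead of four in this particular example.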
Building on the data stream cut-point search shown in Fig. 3, in the embodiment shown in Fig. 9 a rule is preset on the deduplication server 103. For a potential cut point k, 11 points p_x are determined, each with a window W_x[p_x-A_x, p_x+B_x] and a corresponding predetermined condition C_x, where A_1 = A_2 = ... = A_11 = 169, B_1 = B_2 = ... = B_11 = 0, and C_1 = C_2 = ... = C_11. The point p_x lies d_x bytes from k:

- p_1: 3 bytes
- p_2: 2 bytes
- p_3: 1 byte
- p_4: 0 bytes
- p_5: 1 byte
- p_6: 2 bytes
- p_7: 3 bytes
- p_8: 4 bytes
- p_9: 5 bytes
- p_10: 6 bytes
- p_11: 7 bytes

The points p_5 through p_11 lie on the side of k opposite to the cut-point search direction, while p_1, p_2 and p_3 lie on the search-direction side of k. k_a is a data stream cut point, and the search direction in Fig. 9 is from left to right. After the minimum data block of 4 KB is skipped from k_a, the end position of that block becomes the next potential cut point k_i, and points p_ix are determined for k_i. In this embodiment, under the rule preset on the deduplication server 103, x takes the consecutive natural numbers 1 to 11. In the embodiment shown in Fig. 9, the points determined for k_i are p_i1, p_i2, ..., p_i11, with windows W_i1[p_i1-169, p_i1], ..., W_i11[p_i11-169, p_i11]. The point p_ix lies d_x bytes from k_i: p_i1 at 3 bytes, p_i2 at 2, p_i3 at 1, p_i4 at 0, p_i5 at 1, p_i6 at 2, p_i7 at 3, p_i8 at 4, p_i9 at 5, p_i10 at 6 and p_i11 at 7 bytes. The points p_i5 through p_i11 lie on the side of k_i opposite to the search direction, and p_i1, p_i2 and p_i3 lie on the search-direction side of k_i. For each x from 1 to 11, it is determined whether at least part of the data in W_ix satisfies C_x. If at least part of the data in every window W_i1, ..., W_i11 satisfies its condition C_1, ..., C_11, the current potential cut point k_i is a data stream cut point.

Suppose instead that at least part of the data in any one of the 11 windows fails its predetermined condition. For example, as shown in Fig. 10, at least part of the data in W_i7[p_i7-169, p_i7] fails its condition. The search then jumps N bytes from p_i7 along the cut-point search direction, where N is not greater than ‖B_7‖ + max_x(‖A_x‖ + ‖k_i - p_ix‖). In the embodiment shown in Fig. 10 the jump is at most 179 bytes, and in this embodiment N = 8. The position reached is a new potential cut point, denoted k_j to distinguish it from k_i. Under the rule preset on the deduplication server 103 in the embodiment shown in Fig. 9, 11 points are determined for k_j, p_j1, p_j2, ..., p_j11, with windows W_j1[p_j1-169, p_j1], ..., W_j11[p_j11-169, p_j11]. The point p_jx lies d_x bytes from k_j: p_j1 at 3 bytes, p_j2 at 2, p_j3 at 1, p_j4 at 0, p_j5 at 1, p_j6 at 2, p_j7 at 3, p_j8 at 4, p_j9 at 5, p_j10 at 6 and p_j11 at 7 bytes. The points p_j5 through p_j11 lie on the side of k_j opposite to the search direction, and p_j1, p_j2 and p_j3 lie on the search-direction side of k_j. For each x from 1 to 11, it is determined whether at least part of the data in W_jx satisfies C_x. In the embodiments of the present invention, the same principle is followed when determining whether the potential cut point k_a is a data stream cut point. The details are not repeated here; reference may be made to the description of k_i.

Suppose at least part of the data in every window W_j1, ..., W_j11 satisfies its condition C_1, ..., C_11. Then the current potential cut point k_j is a data stream cut point, and the data between k_a and k_j forms one data block. The minimum block size of 4 KB is then skipped, in the same way as from k_a, to obtain the next potential cut point. The rule preset on the deduplication server 103 is applied to determine whether that point is a data stream cut point. If k_j is not a data stream cut point, the next potential cut point is obtained by jumping 8 bytes, in the same way as for k_i. That point is then tested with the rule preset on the deduplication server 103 and the method above. If no data stream cut point has been found once the configured maximum data block size is exceeded, then the end position of the maximum data block
as force-splitting point.
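The Fig. 9 and Fig. 10 procedure can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the patent's implementation: `window_ok` is a hypothetical stand-in for the predetermined conditions (all $C_x$ are treated as one test), $W_x[p_x-169,\ p_x]$ is read as the 169-byte slice `data[p_x - 169:p_x]`, windows are checked in the order x = 1 to 11, N = 8 is applied to whichever window fails first, and the 16 KB maximum chunk size is a hypothetical value because the text does not give one.

```python
from typing import Callable, List, Optional, Sequence

WindowTest = Callable[[bytes], bool]

# Fig. 9 rule: signed offsets of p_1..p_11 from candidate k
# (positive = in the search direction, negative = opposite to it).
FIG9_OFFSETS = (3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7)
A = 169                 # A_x = 169 for every window, B_x = 0
MIN_CHUNK = 4096        # minimum chunk size from the text
MAX_CHUNK = 16 * 1024   # hypothetical maximum chunk size (not given in the text)
JUMP = 8                # N used in the Fig. 10 example


def first_failing_point(data: bytes, k: int, window_ok: WindowTest,
                        offsets: Sequence[int] = FIG9_OFFSETS) -> Optional[int]:
    """Return None if every window passes, else p_x of the first failing window."""
    for d in offsets:
        p = k + d
        # Assumed reading of W_x[p_x - 169, p_x]: the 169 bytes data[p_x - 169 : p_x].
        if not window_ok(data[p - A:p]):
            return p
    return None


def chunk_boundaries(data: bytes, window_ok: WindowTest,
                     min_size: int = MIN_CHUNK,
                     max_size: int = MAX_CHUNK) -> List[int]:
    """Return the end offset of every chunk (the last one is len(data))."""
    cuts: List[int] = []
    start = 0
    lookahead = max(FIG9_OFFSETS)  # p_1 lies 3 bytes ahead of k
    while start < len(data):
        limit = min(start + max_size, len(data))
        cut = limit                 # forced cut-point (or end of data) if nothing is found
        k = start + min_size        # skip the minimum chunk size
        while k + lookahead <= limit:
            failed = first_failing_point(data, k, window_ok)
            if failed is None:
                cut = k             # all 11 windows passed
                break
            k = failed + JUMP       # jump N bytes from the failing point
        cuts.append(cut)
        start = cut
    return cuts
```

For example, `chunk_boundaries(payload, lambda w: zlib.crc32(w) % 2 == 0)` would use a CRC parity bit as a hypothetical window test. Because the smallest offset is -7 and N = 8, every failed candidate moves the search forward by at least 1 byte; with a failure at $p_{i7} = k_i - 3$, the next candidate is $k_i + 5$.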
Building on the data stream cut-point search shown in Fig. 3, in the embodiment shown in Fig. 11 a rule is preset on deduplication server 103: for a potential cut-point $k$, determine 11 points $p_x$, the window $W_x[p_x-A_x,\ p_x+B_x]$ corresponding to each point $p_x$, and the predetermined condition $C_x$ corresponding to each window, where $A_1 = A_2 = \dots = A_{10} = 169$, $A_{11} = 182$, $B_1 = B_2 = \dots = B_{11} = 0$, and $C_1 = C_2 = \dots = C_{10} \ne C_{11}$. Point $p_x$ is $d_x$ bytes from $k$. Specifically, $p_1$ is 0 bytes from $k$; $p_2$, $p_3$, $p_4$, $p_5$, $p_6$, $p_7$, $p_8$ and $p_9$ are 1, 2, 3, 4, 5, 6, 7 and 8 bytes from $k$, respectively, and lie opposite to the search direction relative to $k$; and $p_{10}$ and $p_{11}$ are 1 and 3 bytes from $k$, respectively, and lie in the search direction relative to $k$. $k_a$ is a data stream cut-point, and the search direction in Fig. 11 is from left to right. After the minimum chunk size of 4 KB is skipped from $k_a$, the end position of that 4 KB is taken as the next potential cut-point $k_i$, and points $p_{ix}$ are determined for $k_i$, where, according to the rule preset on deduplication server 103, x takes the 11 consecutive natural numbers 1 to 11. In the embodiment shown in Fig. 11, the 11 points determined for $k_i$ are $p_{i1}, \dots, p_{i11}$, with corresponding windows $W_{ix}[p_{ix}-169,\ p_{ix}]$ for $x = 1, \dots, 10$ and $W_{i11}[p_{i11}-182,\ p_{i11}]$. Point $p_{ix}$ is $d_x$ bytes from $k_i$: $p_{i1}$ is 0 bytes from $k_i$, $p_{i2}$ through $p_{i9}$ are 1 through 8 bytes from $k_i$ opposite to the search direction, and $p_{i10}$ and $p_{i11}$ are 1 and 3 bytes from $k_i$ in the search direction. For each $x$ from 1 to 11, it is determined whether at least part of the data in window $W_{ix}$ satisfies $C_x$. When at least part of the data in every window $W_{i1}, \dots, W_{i11}$ satisfies $C_1, \dots, C_{11}$, the current potential cut-point $k_i$ is a data stream cut-point. When at least part of the data in window $W_{i11}$ does not satisfy $C_{11}$, the search jumps 1 byte from $k_i$ along the search direction to obtain a new potential cut-point, denoted $k_j$ to distinguish it from $k_i$. When at least part of the data in any one of the 10 windows $W_{i1}, \dots, W_{i10}$ does not satisfy its predetermined condition, for example when, as shown in Fig. 12, the data in $W_{i4}[p_{i4}-169,\ p_{i4}]$ does not satisfy $C_4$, the search jumps N bytes from point $p_{i4}$ along the search direction, where $N \le \|B_4\| + \max_x\left(\|A_x\| + \|k_i - p_{ix}\|\right)$. In the embodiment shown in Fig. 12, N is at most 179, and this embodiment takes N = 9, giving a new potential cut-point, again denoted $k_j$. According to the rule preset on deduplication server 103 in the embodiment shown in Fig. 11, eleven points $p_{j1}, \dots, p_{j11}$ are determined for $k_j$, with corresponding windows $W_{jx}[p_{jx}-169,\ p_{jx}]$ for $x = 1, \dots, 10$ and $W_{j11}[p_{j11}-182,\ p_{j11}]$. The distances are the same as for $k_i$: $p_{j1}$ is 0 bytes from $k_j$, $p_{j2}$ through $p_{j9}$ are 1 through 8 bytes from $k_j$ opposite to the search direction, and $p_{j10}$ and $p_{j11}$ are 1 and 3 bytes from $k_j$ in the search direction. It is then determined whether at least part of the data in each window $W_{jx}$ satisfies $C_x$. Of course, in the embodiments of the present invention, whether potential cut-point $k_a$ is a data stream cut-point is also determined by this principle; the details are not repeated here, and reference may be made to the description of $k_i$. When at least part of the data in every window $W_{j1}, \dots, W_{j11}$ satisfies $C_1, \dots, C_{11}$, the current potential cut-point $k_j$ is a data stream cut-point, and the data between $k_a$ and $k_j$ forms one data chunk. The minimum chunk size of 4 KB is then skipped in the same way as from $k_a$ to obtain the next potential cut-point, and whether that point is a data stream cut-point is determined according to the rule preset on deduplication server 103. When $k_j$ is determined not to be a data stream cut-point, the next potential cut-point is obtained in the same way as for $k_i$, and whether it is a data stream cut-point is determined according to the rule preset on deduplication server 103 and the method above. When no data stream cut-point has been found once the configured maximum chunk size is exceeded, the end position of the maximum chunk is used as a forced cut-point.
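The Fig. 11 rule differs from the Fig. 9 rule in two ways: window 11 is longer ($A_{11} = 182$) and has its own condition $C_{11}$, and a failure of $W_{11}$ moves the candidate by only 1 byte. A minimal Python sketch of one evaluation step follows. `cond_1_to_10` and `cond_11` are hypothetical window tests, the half-open reading of each window is an assumption, and because the text does not say which failure takes precedence when both $W_{11}$ and another window fail, this sketch checks $W_1$ to $W_{10}$ first.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

WindowTest = Callable[[bytes], bool]


@dataclass(frozen=True)
class Rule:
    offsets: Tuple[int, ...]  # signed d_x for p_1..p_11: + search direction, - opposite
    a: Tuple[int, ...]        # A_x: how far W_x reaches back from p_x (B_x = 0 here)


# Fig. 11 rule as listed in the text.
FIG11 = Rule(offsets=(0, -1, -2, -3, -4, -5, -6, -7, -8, 1, 3),
             a=(169,) * 10 + (182,))


def window_bounds(rule: Rule, k: int) -> List[Tuple[int, int]]:
    """Half-open ranges [p_x - A_x, p_x) for candidate k (assumed reading of W_x)."""
    return [(k + d - a, k + d) for d, a in zip(rule.offsets, rule.a)]


def evaluate(data: bytes, k: int, cond_1_to_10: WindowTest, cond_11: WindowTest,
             rule: Rule = FIG11, n_jump: int = 9) -> Tuple[bool, int]:
    """Return (is_cut_point, next_candidate) for candidate k."""
    bounds = window_bounds(rule, k)
    for lo, hi in bounds[:10]:           # C_1 = ... = C_10
        if not cond_1_to_10(data[lo:hi]):
            return False, hi + n_jump    # hi == p_x: jump N bytes from the failing point
    lo, hi = bounds[10]
    if not cond_11(data[lo:hi]):          # C_11 differs from C_1..C_10
        return False, k + 1              # Fig. 11: move 1 byte from k
    return True, k
```

With candidate 4096 and a failure at $p_{i4} = 4093$, the sketch returns 4102, that is $k_j = k_i + 6$, which corresponds to the N = 9 jump from $p_{i4}$ described for Fig. 12.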
On the basis of the data flow point cutpoint shown in Fig. 3 is searched, the embodiment party shown in Figure 13 In formula, being preset with rule on duplicate removal server 103 is: determine 11 points for potential cut-point k px, some pxCorresponding window Wx[px-Ax,px+Bx] and window Wx[px-Ax,px+Bx] corresponding Predetermined condition Cx, x is respectively 1 to 11 continuous print natural numbers, wherein, puts pxCorresponding window Wx [px-Ax,px+BxThe probability that in], at least part of data meet predetermined condition is 1/2, and A1=A2 =A3=A4=A5=A6=A7=A8=A9=A10=A11=169, B1=B2=B3=B4=B5=B6=B7=B8=B9 =B10=B11=0, and C1=C2=C3=C4=C5=C6=C7=C8=C9=C10=C11, wherein, px Spacing d with potential cut-point kxIndividual byte, concrete, p1And between potential cut-point k 0 byte of distance, p2With 2 bytes of spacing of k, p3With 4 bytes of spacing of k, p4 With 6 bytes of spacing of k, p5With 8 bytes of spacing of k, p6Spacing 10 with k Individual byte, p7With 12 bytes of spacing of k, p8With 14 bytes of spacing of k, p9With k 16 bytes of spacing, p10With 18 bytes of spacing of k, p11Spacing 20 with k Byte, and p2、p3、p4、p5、p6、p7、p8、p9、p10And p11Relative to potential Cut-point k is respectively positioned on data flow point cutpoint and searches opposite direction.kaFor data flow point cutpoint, Figure 13 Shown in data flow point cutpoint search direction be from left to right, from data flow point cutpoint kaSkip After small data block 4KB, at minimum data block 4KB end position as next potential cut-point ki, for potential cut-point kiDetermine a pix, in the present embodiment, according at duplicate removal server 103 The upper rule preset, x is respectively 1 to 11 continuous print natural numbers.At the embodiment shown in Figure 13 In, according to pre-defined rule, for potential cut-point kiThe point determined is 11, respectively pi1、pi2、 pi3、pi4、pi5、pi6、pi7、pi8、pi9、pi10And pi11, put pi1、pi2、pi3、pi4、 pi5、pi6、pi7、pi8、pi9、pi10And pi11Corresponding window is 
respectively Wi1[pi1-169,pi1]、 Wi2[pi2-169,pi2]、Wi3[pi3-169,pi3]、Wi4[pi4-169,pi4]、Wi5[pi5-169,pi5]、 Wi6[pi6-169,pi6]、Wi7[pi7-169,pi7]、Wi8[pi8-169,pi8]、Wi9[pi9-169,pi9]、 Wi10[pi10-169,pi10] and Wi11[pi11-169,pi11].Wherein, pixWith potential cut-point kiSpacing From dxIndividual byte, concrete, pi1With kiSpacing 0 byte, pi2With kiSpacing 2 bytes, pi3 With kiSpacing 4 bytes, pi4With kiSpacing 6 bytes, pi5With kiSpacing 8 bytes, pi6With kiSpacing 10 bytes, pi7With kiSpacing 12 bytes, pi8With kiSpacing 14 bytes, pi9With kiSpacing 16 bytes, pi10With ki18 bytes of spacing, pi11With ki20 bytes of spacing, and And pi2、pi3、pi4、pi5、pi6、pi7、pi8、pi9、pi10And pi11Relative to potential segmentation Point kiIt is respectively positioned on data flow point cutpoint and searches opposite direction.Judge Wi1[pi1-169,pi1In] at least partly Whether data meet predetermined condition C1, judge Wi2[pi2-169,pi2In], whether at least part of data Meet predetermined condition C2, judge Wi3[pi3-169,pi3In], whether at least part of data meet predetermined Condition C3, judge Wi4[pi4-169,pi4In], whether at least part of data meet predetermined condition C4、 Judge Wi5[pi5-169,pi5In], whether at least part of data meet predetermined condition C5, judge Wi6 [pi6-169,pi6In], whether at least part of data meet predetermined condition C6, judge Wi7[pi7-169, pi7In], whether at least part of data meet predetermined condition C7, judge Wi8[pi8-169,pi8In] at least Whether part data meet predetermined condition C8, judge Wi9[pi9-169,pi9At least part of data in] Whether meet predetermined condition C9, judge Wi10[pi10-169,pi10In], whether at least part of data meet Predetermined condition C10With judge Wi11[pi11-169,pi11In], whether at least part of data meet predetermined bar Part C11.When judging window Wi1In at least partly data meet predetermined condition C1, window Wi2In extremely Small part data meet predetermined condition C2, window Wi3In at least partly data meet predetermined condition C3, window 
Wi4In at least partly data meet predetermined condition C4, window Wi5In at least partly count According to meeting predetermined condition C5, window Wi6In at least partly data meet predetermined condition C6, window Wi7In at least partly data meet predetermined condition C7, window Wi8In at least partly data meet pre- Fixed condition C8, window Wi9In at least partly data meet predetermined condition C9, window Wi10In at least Part data meet predetermined condition C10With window Wi11In at least partly data meet predetermined condition C11Time, the most current potential cut-point kiFor data flow point cutpoint.When any one window in 11 windows When in Kou, at least part of data are unsatisfactory for the predetermined condition of correspondence, as shown in figure 14, Wi4[pi4 -169,pi4In], at least part of data are unsatisfactory for predetermined condition C4, then next potential segmentation is selected Point, for potential cut-point kiDifference, here shown as kj, kjIt is positioned at kiThe right, and kj With ki1 byte of spacing.As shown in figure 14, according to the rule preset on duplicate removal server 103, For potential cut-point kjDetermine 11 points, respectively pj1、pj2、pj3、pj4、pj5、pj6、pj7、 pj8、pj9、pj10And pj11, determine a pj1、pj2、pj3、pj4、pj5、pj6、pj7、pj8、pj9、 pj10And pj11Corresponding window is respectively Wj1[pj1-169,pj1]、Wj2[pj2-169,pj2]、Wj3 [pj3-169,pj3]、Wj4[pj4-169,pj4]、Wj5[pj5-169,pj5]、Wj6[pj6-169,pj6]、 Wj7[pj7-169,pj7]、Wj8[pj8-169,pj8]、Wj9[pj9-169,pj9]、Wj10[pj10-169, pj10] and Wj11[pj11-169,pj11], wherein, A1=A2=A3=A4=A5=A6=A7=A8=A9=A10= A11=169, B1=B2=B3=B4=B5=B6=B7=B8=B9=B10=B11=0, and C1=C2=C3=C4 =C5=C6=C7=C8=C9=C10=C11.Wherein, pjxWith potential cut-point kjSpacing dxIndividual Byte, concrete, pj1With kjSpacing 0 byte, pj2With kjSpacing 2 bytes, pj3With kj Spacing 4 bytes, pj4With kjSpacing 6 bytes, pj5With kjSpacing 8 bytes, pj6With kjBetween Away from 10 bytes, pj7With kjSpacing 12 bytes, pj8With kjSpacing 14 bytes, pj9With kj Spacing 
16 bytes, pj10With kj18 bytes of spacing, pj11With kj20 bytes of spacing, and pj2、pj3、pj4、pj5、pj6、pj7、pj8、pj9、pj10And pj11Relative to potential cut-point kjAll It is positioned at data flow point cutpoint and searches opposite direction.Judge Wj1[pj1-169,pj1At least part of data in] Whether meet predetermined condition C1, judge Wj2[pj2-169,pj2In], whether at least part of data meet Predetermined condition C2, judge Wj3[pj3-169,pj3In], whether at least part of data meet predetermined condition C3, judge Wj4[pj4-169,pj4In], whether at least part of data meet predetermined condition C4, judge Wj5[pj5-169,pj5In], whether at least part of data meet predetermined condition C5, judge Wj6[pj6 -169,pj6In], whether at least part of data meet predetermined condition C6, judge Wj7[pj7-169, pj7In], whether at least part of data meet predetermined condition C7, judge Wj8[pj8-169,pj8In] extremely Whether small part data meet predetermined condition C8, judge Wj9[pj9-169,pj9In] at least partly Whether data meet predetermined condition C9, judge Wj10[pj10-169,pj10In], at least part of data are No meet predetermined condition C10With judge Wj11[pj11-169,pj11In], at least part of data are the fullest Foot predetermined condition C11.When judging window Wj1In at least partly data meet predetermined condition C1, window Mouth Wj2In at least partly data meet predetermined condition C2, window Wj3In at least partly data meet Predetermined condition C3, window Wj4In at least partly data meet predetermined condition C4, window Wj5In extremely Small part data meet predetermined condition C5, window Wj6In at least partly data meet predetermined condition C6, window Wj7In at least partly data meet predetermined condition C7, window Wj8In at least partly count According to meeting predetermined condition C8, window Wi9In at least partly data meet predetermined condition C9, window Wj10In at least partly data meet predetermined condition C10With window Wj11In at least partly data meet 
Predetermined condition C11Time, the most current potential cut-point kjFor data flow point cutpoint.When judging window Wj1、Wj2、Wj3、Wj4、Wj5、Wj6、Wj7、Wj8、Wj9、Wj10And Wj11In any one When in window, at least part of data are unsatisfactory for predetermined condition, as shown in figure 15, Wj3[pj3-169, pj3In], at least part of data are unsatisfactory for predetermined condition C3Time, put pi4Relative to data flow point cutpoint Search direction is positioned at a pj3The left side, from a pi4Along data flow point cutpoint search direction jump 21 Individual byte, it is thus achieved that next potential cut-point, for potential cut-point ki、kjDistinguish, table It is shown as kl.According in Figure 13 institute embodiment on duplicate removal server 103 preset rule, for Potential cut-point klThe point determined is 11, respectively pl1、pl2、pl3、pl4、pl5、pl6、 pl7、pl8、pl9、pl10And pl11, put pl1、pl2、pl3、pl4、pl5、pl6、pl7、pl8、 pl9、pl10And pl11Corresponding window is respectively Wl1[pl1-169,pl1]、Wl2[pl2-169,pl2]、 Wl3[pl3-169,pl3]、Wl4[pl4-169,pl4]、Wl5[pl5-169,pl5]、Wl6[pl6-169, pl6]、Wl7[pl7-169,pl7]、Wl8[pl8-169,pl8]、Wl9[pl9-169,pl9]、Wl10[pl10 -169,pl10] and Wl11[pl11-169,pl11], wherein, plxWith potential cut-point klSpacing dx Individual byte, concrete, pl1With potential cut-point kl0 byte of spacing, pl2With klBetween 2 bytes of distance, pl3With kl4 bytes of spacing, pl4With kl6 bytes of spacing, pl5With kl8 bytes of spacing, pl6With kl10 bytes of spacing, pl7With klSpacing From 12 bytes, pl8With kl14 bytes of spacing, pl9With kl16 bytes of spacing, pl10With kl18 bytes of spacing, pl11With kl20 bytes of spacing, and pl2、pl3、 pl4、pl5、pl6、pl7、pl8、pl9、pl10And pl11Relative to potential cut-point klIt is respectively positioned on Data flow point cutpoint searches opposite direction.Judge Wl1[pl1-169,pl1In], whether at least part of data Meet predetermined condition C1, judge Wl2[pl2-169,pl2In], whether at least part of data meet pre- Fixed condition C2, judge Wl3[pl3-169,pl3In], 
whether at least part of data meet predetermined condition C3, judge Wl4[pl4-169,pl4In], whether at least part of data meet predetermined condition C4, sentence Disconnected Wl5[pl5-169,pl5In], whether at least part of data meet predetermined condition C5, judge Wl6[pl6 -169,pl6In], whether at least part of data meet predetermined condition C6, judge Wl7[pl7-169,pl7] In at least partly data whether meet predetermined condition C7, judge Wl8[pl8-169,pl8In] at least Whether part data meet predetermined condition C8, judge Wl9[pl9-169,pl9At least partly count in] According to whether meeting predetermined condition C9, judge Wl10[pl10-169,pl10In], whether at least part of data Meet predetermined condition C10With judge Wl11[pl11-169,pl11In], whether at least part of data meet Predetermined condition C11.When judging window Wl1In at least partly data meet predetermined condition C1, window Wl2In at least partly data meet predetermined condition C2, window Wl3In at least partly data meet pre- Fixed condition C3, window Wl4In at least partly data meet predetermined condition C4, window Wl5In at least Part data meet predetermined condition C5, window Wl6In at least partly data meet predetermined condition C6、 Window Wl7In at least partly data meet predetermined condition C7, window Wl8In at least partly data full Foot predetermined condition C8, window Wl9In at least partly data meet predetermined condition C9, window Wl10In At least partly data meet predetermined condition C10With window Wl11In at least partly data meet predetermined Condition C11Time, the most current potential cut-point klFor data flow point cutpoint.As window Wl1、Wl2、 Wl3、Wl4、Wl5、Wl6、Wl7、Wl8、Wl9、Wl10And Wl11At least portion in middle either window When divided data is unsatisfactory for predetermined condition, select next potential cut-point, for potential cut-point ki、kjAnd klDifference, is expressed as km, kmIt is positioned at klThe right, and kmWith kl1 byte of spacing. 
According to the rule preset on deduplication server 103 in the embodiment shown in Fig. 13, the 11 points determined for potential cut-point $k_m$ are $p_{m1}, \dots, p_{m11}$, with corresponding windows $W_{mx}[p_{mx}-169,\ p_{mx}]$ for $x = 1, \dots, 11$. Point $p_{mx}$ is $d_x$ bytes from $k_m$: $p_{m1}$ is 0 bytes from $k_m$, and $p_{m2}$ through $p_{m11}$ are 2, 4, 6, 8, 10, 12, 14, 16, 18 and 20 bytes from $k_m$, all opposite to the search direction. For each $x$, it is determined whether at least part of the data in window $W_{mx}[p_{mx}-169,\ p_{mx}]$ satisfies $C_x$. When at least part of the data in every window $W_{m1}, \dots, W_{m11}$ satisfies $C_1, \dots, C_{11}$, the current potential cut-point $k_m$ is a data stream cut-point. When at least part of the data in any one window does not satisfy its predetermined condition, a jump is performed according to the scheme described above to obtain the next potential cut-point, and it is then determined whether that point is a data stream cut-point.
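For the Fig. 13 and Fig. 14 rule, a minimal Python sketch of the basic 1-byte stepping search is shown below. It assumes a hypothetical window test `window_ok` standing in for the common condition $C_1 = \dots = C_{11}$, reads each window as the 169 bytes ending just before $p_x$, and does not implement the 21-byte jump described for Fig. 15. The cache is an optimization added here, not taken from the text: because all conditions and window sizes are equal, the test result depends only on the window end position, so each position needs to be tested at most once.

```python
from typing import Callable, Dict

WindowTest = Callable[[bytes], bool]

FIG13_OFFSETS = tuple(range(0, -21, -2))  # p_1..p_11: 0, 2, ..., 20 bytes behind k
A = 169                                    # A_x = 169, B_x = 0 for every window


def find_cut(data: bytes, start: int, limit: int, window_ok: WindowTest,
             min_size: int = 4096) -> int:
    """Return the first candidate whose 11 windows all pass, else the forced cut `limit`.

    `limit` (end of the maximum chunk) must not exceed len(data).
    """
    cache: Dict[int, bool] = {}

    def ok_at(p: int) -> bool:
        # All conditions and window sizes are equal, so the result depends only on p.
        if p not in cache:
            cache[p] = window_ok(data[p - A:p])
        return cache[p]

    k = start + min_size                   # skip the minimum chunk size
    while k <= limit:
        if all(ok_at(k + d) for d in FIG13_OFFSETS):
            return k
        k += 1                             # Fig. 14: next candidate is 1 byte to the right
    return limit                           # forced cut-point at the end of the maximum chunk
```

If, as the text states, each window condition holds with probability 1/2, and the 11 tests are additionally treated as independent, a single candidate passes with probability $(1/2)^{11} = 1/2048$.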
This embodiment provides a method for judging whether at least part of the data in a window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ meets a predetermined condition $C_z$; in this embodiment, a random function is used for the judgement. Taking the embodiment shown in Fig. 5 as an example, the deduplication server 103 determines, according to a preset rule, a point $p_{i1}$ for the potential cut point $k_i$ and the window $W_{i1}[p_{i1}-169,\ p_{i1}]$ corresponding to $p_{i1}$, and judges whether at least part of the data in $W_{i1}[p_{i1}-169,\ p_{i1}]$ meets the predetermined condition $C_1$. As shown in Fig. 16, $W_{i1}$ denotes the window $W_{i1}[p_{i1}-169,\ p_{i1}]$. To judge whether at least part of its data meets $C_1$, 5 bytes are selected: each "■" in Fig. 16 denotes one selected byte, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total, to increase randomness. Each byte consists of 8 bits; $a_{m,1}, \ldots, a_{m,8}$ denote the 1st to 8th bits of the $m$-th of the 255 bytes, so the bits of the 255 bytes can be written as

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{255,1} & a_{255,2} & \cdots & a_{255,8} \end{pmatrix}.$$

Let $a_{m,n}$ denote any one of $a_{m,1}, \ldots, a_{m,8}$. When $a_{m,n} = 1$, $V_{a_{m,n}} = 1$; when $a_{m,n} = 0$, $V_{a_{m,n}} = -1$. Converting the bits of the 255 bytes according to this relation gives the matrix

$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}} \\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{a_{255,1}} & V_{a_{255,2}} & \cdots & V_{a_{255,8}} \end{pmatrix}.$$

A large number of random numbers is then chosen to form a matrix; this matrix of random data is generated once and kept unchanged thereafter. Here, 255 × 8 random numbers are selected from random numbers obeying a specific distribution (a normal distribution is used as the example) to form the matrix

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,8} \end{pmatrix}.$$

The $m$-th row of $V_a$ is multiplied element-wise by the random numbers in the $m$-th row of $R$, and the products are summed to obtain one value:

$$S_{a_m} = V_{a_{m,1}} h_{m,1} + V_{a_{m,2}} h_{m,2} + \cdots + V_{a_{m,8}} h_{m,8}.$$

In this way $S_{a_1}, S_{a_2}, \ldots, S_{a_{255}}$ are obtained, and the number $K$ of these values that meet a specific condition (here, being greater than 0) is counted. Because $R$ obeys a normal distribution, each $S_{a_m}$, being a signed sum of entries of $R$, also obeys a normal distribution. By probability theory, a zero-mean normally distributed random number is greater than 0 with probability 1/2, so each of $S_{a_1}, \ldots, S_{a_{255}}$ exceeds 0 with probability 1/2, and $K$ obeys the binomial distribution

$$P(K = n) = \binom{255}{n}\left(\frac{1}{2}\right)^{n}\left(\frac{1}{2}\right)^{255-n} = \binom{255}{n}\left(\frac{1}{2}\right)^{255}.$$

From this count, it is judged whether $K$, the number of values among $S_{a_1}, \ldots, S_{a_{255}}$ that are greater than 0, is even. A binomially distributed random number is even with probability 1/2, so $K$ meets the condition with probability 1/2. When $K$ is even, at least part of the data in $W_{i1}[p_{i1}-169,\ p_{i1}]$ meets the predetermined condition $C_1$; when $K$ is odd, it does not. Here $C_1$ means that the number $K$ of values greater than 0 among $S_{a_1}, \ldots, S_{a_{255}}$, obtained as described above, is even.

In the embodiment shown in Fig. 5, the windows $W_{i1}[p_{i1}-169,\ p_{i1}]$, $W_{i2}[p_{i2}-169,\ p_{i2}]$, ..., $W_{i11}[p_{i11}-169,\ p_{i11}]$ all have the same size of 169 bytes, and whether at least part of the data in each window meets its predetermined condition is judged in the same way; see the description above of judging whether at least part of the data in $W_{i1}[p_{i1}-169,\ p_{i1}]$ meets $C_1$. Accordingly, as shown in Fig. 16, a second type of marker denotes the 1 byte selected each time when judging whether at least part of the data in window $W_{i2}[p_{i2}-169,\ p_{i2}]$ meets the predetermined condition $C_2$, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total, to increase randomness. With $b_{m,1}, \ldots, b_{m,8}$ denoting the 1st to 8th bits of the $m$-th of these 255 bytes, the bits form the $255 \times 8$ matrix $(b_{m,n})$. When $b_{m,n} = 1$, $V_{b_{m,n}} = 1$; when $b_{m,n} = 0$, $V_{b_{m,n}} = -1$, which gives the matrix $V_b = (V_{b_{m,n}})$ with $1 \le m \le 255$ and $1 \le n \le 8$. Because whether at least part of the data in $W_{i1}[p_{i1}-169,\ p_{i1}]$ meets its condition is judged in the same way as for window $W_{i2}[p_{i2}-169,\ p_{i2}]$, the same matrix $R = (h_{m,n})$ is used. The $m$-th row of $V_b$ is multiplied element-wise by the $m$-th row of $R$ and the products are summed:

$$S_{b_m} = V_{b_{m,1}} h_{m,1} + V_{b_{m,2}} h_{m,2} + \cdots + V_{b_{m,8}} h_{m,8}.$$

This gives $S_{b_1}, S_{b_2}, \ldots, S_{b_{255}}$, and the number $K$ of these values meeting the specific condition (here, greater than 0) is counted. For the same reason as above, each $S_{b_m}$ is normally distributed and exceeds 0 with probability 1/2, so $K$ obeys the same binomial distribution $P(K = n) = \binom{255}{n}\left(\frac{1}{2}\right)^{255}$ and is even with probability 1/2. When $K$ is even, at least part of the data in $W_{i2}[p_{i2}-169,\ p_{i2}]$ meets the predetermined condition $C_2$; when $K$ is odd, it does not. Here $C_2$ means that the number $K$ of values greater than 0 among $S_{b_1}, \ldots, S_{b_{255}}$, obtained as described above, is even. In the embodiment shown in Fig. 5, at least part of the data in $W_{i2}[p_{i2}-169,\ p_{i2}]$ meets the predetermined condition $C_2$.
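The sign-projection test for a single window can be sketched in Python as below. This is an illustrative sketch, not the patented implementation, and it rests on assumptions the text leaves open: the matrix $R$ is filled from a seeded standard normal generator (the seed `2014` is hypothetical), bit 1 of each byte is taken to be the most significant bit, and the window $W[p-169,\ p]$ is read as the 169 bytes `data[p-169:p]`, so the 5 selected bytes sit at window offsets 0, 42, 84, 126 and 168. All function names are hypothetical.

```python
import random
from typing import List, Sequence

WINDOW = 169      # window length in bytes
STRIDE = 42       # distance between adjacent selected bytes
N_SELECTED = 5    # bytes sampled per window
REPEAT = 51       # 5 bytes reused 51 times -> 255 bytes
N_ROWS = N_SELECTED * REPEAT


def make_projection_matrix(seed: int = 2014) -> List[List[float]]:
    """Fixed 255 x 8 matrix R of zero-mean normal numbers, generated once."""
    rng = random.Random(seed)  # hypothetical seed; R must never change afterwards
    return [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(N_ROWS)]


def select_bytes(data: bytes, p: int) -> List[int]:
    """Return the bytes at offsets 0, 42, 84, 126, 168 of the window data[p-169:p]."""
    if p < WINDOW or p > len(data):
        raise ValueError("window does not fit inside data")
    start = p - WINDOW
    return [data[start + k * STRIDE] for k in range(N_SELECTED)]


def window_satisfies(data: bytes, p: int, R: Sequence[Sequence[float]]) -> bool:
    """Condition C_z: the count K of positive projections S_m is even."""
    selected = select_bytes(data, p) * REPEAT  # 255 bytes
    k_positive = 0
    for m, byte in enumerate(selected):
        # bit n (most significant first) -> +1 if set, -1 if clear
        v = [1 if (byte >> (7 - n)) & 1 else -1 for n in range(8)]
        s_m = sum(v_n * h for v_n, h in zip(v, R[m]))  # S_m = sum_n V_{m,n} * h_{m,n}
        if s_m > 0:
            k_positive += 1
    return k_positive % 2 == 0
```

In this sketch the decision depends only on the window's bytes and the fixed matrix $R$, so two windows with identical content always receive the same decision. Inverting every bit of the selected bytes negates every $S_{a_m}$, which (barring an exact zero) turns $K$ into $255 - K$ and therefore flips the decision, because 255 is odd.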
Accordingly, as shown in Fig. 16, a third type of marker denotes the 1 byte selected each time when judging whether at least part of the data in window $W_{i3}[p_{i3}-169,\ p_{i3}]$ meets the predetermined condition $C_3$, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total, to increase randomness. The method used for windows $W_{i1}[p_{i1}-169,\ p_{i1}]$ and $W_{i2}[p_{i2}-169,\ p_{i2}]$ is then used to judge whether at least part of the data in $W_{i3}[p_{i3}-169,\ p_{i3}]$ meets $C_3$. In the embodiment shown in Fig. 5, at least part of the data in $W_{i3}[p_{i3}-169,\ p_{i3}]$ meets the predetermined condition. As shown in Fig. 16, a fourth type of marker denotes the 1 byte selected each time when judging whether at least part of the data in window $W_{i4}[p_{i4}-169,\ p_{i4}]$ meets the predetermined condition $C_4$, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total, to increase randomness. The method used for windows $W_{i1}[p_{i1}-169,\ p_{i1}]$, $W_{i2}[p_{i2}-169,\ p_{i2}]$ and $W_{i3}[p_{i3}-169,\ p_{i3}]$ is then used to judge whether at least part of the data in $W_{i4}[p_{i4}-169,\ p_{i4}]$ meets $C_4$. In the embodiment shown in Fig. 5, at least part of the data in $W_{i4}[p_{i4}-169,\ p_{i4}]$ meets the predetermined condition $C_4$. As shown in Fig. 16, a fifth type of marker denotes the 1 byte selected each time when judging whether at least part of the data in window $W_{i5}[p_{i5}-169,\ p_{i5}]$ meets the predetermined condition $C_5$, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total, to increase randomness. The method used for windows $W_{i1}[p_{i1}-169,\ p_{i1}]$, $W_{i2}[p_{i2}-169,\ p_{i2}]$, $W_{i3}[p_{i3}-169,\ p_{i3}]$ and $W_{i4}[p_{i4}-169,\ p_{i4}]$ is then used to judge whether at least part of the data in $W_{i5}[p_{i5}-169,\ p_{i5}]$ meets $C_5$. In the embodiment shown in Fig. 5, at least part of the data in $W_{i5}[p_{i5}-169,\ p_{i5}]$ does not meet the predetermined condition $C_5$.
When at least part of the data in $W_{i5}[p_{i5}-169,\ p_{i5}]$ does not meet the predetermined condition $C_5$, the search jumps 11 bytes from point $p_{i5}$ along the data stream cut-point search direction, and the next potential cut point $k_j$ is obtained at the end position of the 11th byte, as shown in Fig. 6. According to the rule preset on the deduplication server 103, a point $p_{j1}$ is determined for the potential cut point $k_j$, with the corresponding window $W_{j1}[p_{j1}-169,\ p_{j1}]$. Whether at least part of the data in window $W_{j1}[p_{j1}-169,\ p_{j1}]$ meets the predetermined condition $C_1$ is judged in the same way as for window $W_{i1}[p_{i1}-169,\ p_{i1}]$. As shown in Fig. 17, $W_{j1}$ denotes the window $W_{j1}[p_{j1}-169,\ p_{j1}]$. To judge whether at least part of its data meets $C_1$, 5 bytes are selected: each "■" in Fig. 17 denotes one selected byte, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total, to increase randomness. With $a'_{m,1}, \ldots, a'_{m,8}$ denoting the 1st to 8th bits of the $m$-th of the 255 bytes, the bits can be written as

$$\begin{pmatrix} a'_{1,1} & a'_{1,2} & \cdots & a'_{1,8} \\ a'_{2,1} & a'_{2,2} & \cdots & a'_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a'_{255,1} & a'_{255,2} & \cdots & a'_{255,8} \end{pmatrix}.$$

Let $a'_{m,n}$ denote any one of $a'_{m,1}, \ldots, a'_{m,8}$. When $a'_{m,n} = 1$, $V_{a'_{m,n}} = 1$; when $a'_{m,n} = 0$, $V_{a'_{m,n}} = -1$. Converting the bits of the 255 bytes according to this relation gives the matrix

$$V'_a = \begin{pmatrix} V_{a'_{1,1}} & V_{a'_{1,2}} & \cdots & V_{a'_{1,8}} \\ V_{a'_{2,1}} & V_{a'_{2,2}} & \cdots & V_{a'_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{a'_{255,1}} & V_{a'_{255,2}} & \cdots & V_{a'_{255,8}} \end{pmatrix}.$$
Whether at least part of the data in window $W_{j1}[p_{j1}-169,\ p_{j1}]$ meets the predetermined condition is judged in the same way as for window $W_{i1}[p_{i1}-169,\ p_{i1}]$, so the same matrix $R = (h_{m,n})$, $1 \le m \le 255$, $1 \le n \le 8$, is used. The $m$-th row of $V'_a$ is multiplied element-wise by the random numbers in the $m$-th row of $R$, and the products are summed:

$$S'_{a_m} = V_{a'_{m,1}} h_{m,1} + V_{a'_{m,2}} h_{m,2} + \cdots + V_{a'_{m,8}} h_{m,8}.$$

This gives $S'_{a_1}, S'_{a_2}, \ldots, S'_{a_{255}}$, and the number $K$ of these values meeting the specific condition (here, greater than 0) is counted. Because $R$ obeys a normal distribution, each $S'_{a_m}$ also obeys a normal distribution; a zero-mean normally distributed random number is greater than 0 with probability 1/2, so each of $S'_{a_1}, \ldots, S'_{a_{255}}$ exceeds 0 with probability 1/2, and $K$ obeys the binomial distribution

$$P(K = n) = \binom{255}{n}\left(\frac{1}{2}\right)^{n}\left(\frac{1}{2}\right)^{255-n} = \binom{255}{n}\left(\frac{1}{2}\right)^{255}.$$

From this count, it is judged whether $K$, the number of values among $S'_{a_1}, \ldots, S'_{a_{255}}$ greater than 0, is even. A binomially distributed random number is even with probability 1/2, so $K$ meets the condition with probability 1/2. When $K$ is even, at least part of the data in $W_{j1}[p_{j1}-169,\ p_{j1}]$ meets the predetermined condition $C_1$; when $K$ is odd, it does not.
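The claim that $K$ is even with probability 1/2 follows from a standard binomial identity. A short derivation, assuming (as the text does) that the 255 events $S_m > 0$ are independent and each has probability 1/2: adding the binomial expansions of $(1+1)^{255}$ and $(1-1)^{255}$ cancels the odd terms.

$$
\begin{aligned}
\sum_{n=0}^{255} \binom{255}{n} &= 2^{255}, \qquad \sum_{n=0}^{255} \binom{255}{n}(-1)^n = 0,\\
\Rightarrow\; \sum_{n\ \text{even}} \binom{255}{n} &= \tfrac{1}{2}\left(2^{255} + 0\right) = 2^{254},\\
P(K \text{ even}) &= \sum_{n\ \text{even}} \binom{255}{n}\left(\tfrac{1}{2}\right)^{255} = \frac{2^{254}}{2^{255}} = \frac{1}{2}.
\end{aligned}
$$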
Whether at least part of the data in $W_{i2}[p_{i2}-169,\ p_{i2}]$ meets the predetermined condition $C_2$ is judged in the same way as for $W_{j2}[p_{j2}-169,\ p_{j2}]$. Therefore, as shown in Fig. 17, a second type of marker denotes the 1 byte selected each time when judging whether at least part of the data in window $W_{j2}[p_{j2}-169,\ p_{j2}]$ meets $C_2$, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total, to increase randomness. With $b'_{m,1}, \ldots, b'_{m,8}$ denoting the 1st to 8th bits of the $m$-th of the 255 bytes, the bits can be written as

$$\begin{pmatrix} b'_{1,1} & b'_{1,2} & \cdots & b'_{1,8} \\ b'_{2,1} & b'_{2,2} & \cdots & b'_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ b'_{255,1} & b'_{255,2} & \cdots & b'_{255,8} \end{pmatrix}.$$

Let $b'_{m,n}$ denote any one of $b'_{m,1}, \ldots, b'_{m,8}$. When $b'_{m,n} = 1$, $V_{b'_{m,n}} = 1$; when $b'_{m,n} = 0$, $V_{b'_{m,n}} = -1$. Converting the bits of the 255 bytes according to this relation gives the matrix

$$V'_b = \begin{pmatrix} V_{b'_{1,1}} & V_{b'_{1,2}} & \cdots & V_{b'_{1,8}} \\ V_{b'_{2,1}} & V_{b'_{2,2}} & \cdots & V_{b'_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{b'_{255,1}} & V_{b'_{255,2}} & \cdots & V_{b'_{255,8}} \end{pmatrix}.$$

Whether at least part of the data in windows $W_{i2}[p_{i2}-169,\ p_{i2}]$ and $W_{j2}[p_{j2}-169,\ p_{j2}]$ meets the predetermined condition is judged in the same way, so the same matrix $R = (h_{m,n})$ is again used. The $m$-th row of $V'_b$ is multiplied element-wise by the random numbers in the $m$-th row of $R$, and the products are summed:

$$S'_{b_m} = V_{b'_{m,1}} h_{m,1} + V_{b'_{m,2}} h_{m,2} + \cdots + V_{b'_{m,8}} h_{m,8}.$$

This gives $S'_{b_1}, S'_{b_2}, \ldots, S'_{b_{255}}$, and the number $K$ of these values meeting the specific condition (here, greater than 0) is counted. Because $R$ obeys a normal distribution, each $S'_{b_m}$ also obeys a normal distribution; a zero-mean normally distributed random number is greater than 0 with probability 1/2, so each of $S'_{b_1}, \ldots, S'_{b_{255}}$ exceeds 0 with probability 1/2, and $K$ obeys the binomial distribution

$$P(K = n) = \binom{255}{n}\left(\frac{1}{2}\right)^{n}\left(\frac{1}{2}\right)^{255-n} = \binom{255}{n}\left(\frac{1}{2}\right)^{255}.$$

From this count, it is judged whether $K$, the number of values among $S'_{b_1}, \ldots, S'_{b_{255}}$ greater than 0, is even. A binomially distributed random number is even with probability 1/2, so $K$ meets the condition with probability 1/2. When $K$ is even, at least part of the data in $W_{j2}[p_{j2}-169,\ p_{j2}]$ meets the predetermined condition $C_2$; when $K$ is odd, it does not. Likewise, whether at least part of the data in $W_{i3}[p_{i3}-169,\ p_{i3}]$ meets $C_3$ is judged in the same way as for $W_{j3}[p_{j3}-169,\ p_{j3}]$, and whether at least part of the data in $W_{jz}[p_{jz}-169,\ p_{jz}]$ meets $C_z$ is judged in the same way for $z = 4, 5, \ldots, 11$; these judgements are not repeated here.
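The early-exit behaviour described above (stop at the first window that fails, jump 11 bytes from that window's end point, and test the next potential cut point) can be sketched as a scan loop. The preset rule that maps a potential cut point $k$ to its window end points $p_1, \ldots, p_{11}$ is not given in this section, so `window_points` below is a hypothetical caller-supplied function; `satisfies(data, z, p)` stands for any of the window tests for $C_z$ described here, and the `k + 1` lower bound on the next candidate is a safeguard added by this sketch so that the scan always moves forward.

```python
from typing import Callable, Optional, Sequence

WINDOW = 169  # window length used in this embodiment


def find_cut_point(
    data: bytes,
    start: int,
    window_points: Callable[[int], Sequence[int]],  # hypothetical preset rule k -> p_1..p_Z
    satisfies: Callable[[bytes, int, int], bool],    # test for C_z on the window ending at p
    jump: int = 11,
) -> Optional[int]:
    """Return the first potential cut point whose windows all pass, or None."""
    k = start
    while True:
        points = window_points(k)
        if min(points) < WINDOW or max(points) > len(data):
            return None  # not enough data for a full set of windows
        for z, p in enumerate(points, start=1):
            if not satisfies(data, z, p):
                # skip ahead `jump` bytes from the failing window's end point
                k = max(p + jump, k + 1)  # safeguard: always advance
                break
        else:
            return k  # every window met its condition: k is a cut point
```

For example, with the hypothetical rule $p_z = k - (z - 1)$ and a toy test that only checks the last byte of each window, a failure at $p_6 = k - 5$ moves the next candidate to $k + 6$.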
Still taking the embodiment shown in Fig. 5 as an example, another method is provided for judging whether at least part of the data in window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ meets the predetermined condition $C_z$; this embodiment also uses a random function for the judgement. According to the rule preset on the deduplication server 103, a point $p_{i1}$ and its corresponding window $W_{i1}[p_{i1}-169,\ p_{i1}]$ are determined for the potential cut point $k_i$, and it is judged whether at least part of the data in $W_{i1}[p_{i1}-169,\ p_{i1}]$ meets the predetermined condition $C_1$. As shown in Fig. 16, $W_{i1}$ denotes the window $W_{i1}[p_{i1}-169,\ p_{i1}]$. To judge whether at least part of its data meets $C_1$, 5 bytes are selected: each "■" in Fig. 16 denotes one selected byte, and adjacent selected bytes "■" are 42 bytes apart. In one implementation, a HASH function is computed over the 5 selected bytes; the values computed by the HASH function follow a fixed uniform distribution. If the value computed by the HASH function is even, at least part of the data in $W_{i1}[p_{i1}-169,\ p_{i1}]$ meets the predetermined condition $C_1$; that is, $C_1$ means that the value computed with the HASH function in the manner described above is even. Therefore, the probability that at least part of the data in $W_{i1}[p_{i1}-169,\ p_{i1}]$ meets the predetermined condition is 1/2. In the embodiment shown in Fig. 5, the hash function is likewise used to judge whether at least part of the data in $W_{i2}[p_{i2}-169,\ p_{i2}]$ meets $C_2$, in $W_{i3}[p_{i3}-169,\ p_{i3}]$ meets $C_3$, in $W_{i4}[p_{i4}-169,\ p_{i4}]$ meets $C_4$, and in $W_{i5}[p_{i5}-169,\ p_{i5}]$ meets $C_5$. For details, see the description above of using the hash function to judge whether at least part of the data in $W_{i1}[p_{i1}-169,\ p_{i1}]$ meets $C_1$ in the embodiment shown in Fig. 5; it is not repeated here.
When at least part of the data in $W_{i5}[p_{i5}-169,\ p_{i5}]$ does not meet the predetermined condition $C_5$, the search jumps 11 bytes from point $p_{i5}$ along the data stream cut-point search direction, and the current potential cut point $k_j$ is obtained at the end position of the 11th byte, as shown in Fig. 6. According to the rule preset on the deduplication server 103, a point $p_{j1}$ is determined for the potential cut point $k_j$, with the corresponding window $W_{j1}[p_{j1}-169,\ p_{j1}]$. Whether at least part of the data in window $W_{j1}[p_{j1}-169,\ p_{j1}]$ meets the predetermined condition $C_1$ is judged in the same way as for window $W_{i1}[p_{i1}-169,\ p_{i1}]$. As shown in Fig. 17, $W_{j1}$ denotes the window $W_{j1}[p_{j1}-169,\ p_{j1}]$. To judge whether at least part of its data meets $C_1$, 5 bytes are selected: each "■" in Fig. 17 denotes one selected byte, and adjacent selected bytes "■" are 42 bytes apart. The hash function is computed over the 5 bytes selected from window $W_{j1}[p_{j1}-169,\ p_{j1}]$; if the resulting value is even, at least part of the data in $W_{j1}[p_{j1}-169,\ p_{j1}]$ meets the predetermined condition $C_1$.

In Fig. 17, whether at least part of the data in $W_{i2}[p_{i2}-169,\ p_{i2}]$ meets $C_2$ is judged in the same way as for $W_{j2}[p_{j2}-169,\ p_{j2}]$. Therefore, as shown in Fig. 17, a second type of marker denotes the 1 byte selected each time when judging whether at least part of the data in window $W_{j2}[p_{j2}-169,\ p_{j2}]$ meets $C_2$, and adjacent selected bytes are 42 bytes apart. The hash function is computed over the 5 selected bytes; if the resulting value is even, at least part of the data in $W_{j2}[p_{j2}-169,\ p_{j2}]$ meets $C_2$. In Fig. 17, whether at least part of the data in $W_{i3}[p_{i3}-169,\ p_{i3}]$ meets $C_3$ is judged in the same way as for $W_{j3}[p_{j3}-169,\ p_{j3}]$. Therefore, as shown in Fig. 17, a third type of marker denotes the 1 byte selected each time when judging whether at least part of the data in window $W_{j3}[p_{j3}-169,\ p_{j3}]$ meets $C_3$, and adjacent selected bytes are 42 bytes apart. The hash function is computed over the 5 selected bytes and the resulting value is even, so at least part of the data in $W_{j3}[p_{j3}-169,\ p_{j3}]$ meets $C_3$. In Fig. 17, whether at least part of the data in $W_{j4}[p_{j4}-169,\ p_{j4}]$ meets $C_4$ is judged in the same way as for window $W_{i4}[p_{i4}-169,\ p_{i4}]$. Therefore, as shown in Fig. 17, a fourth type of marker denotes the 1 byte selected each time when judging whether at least part of the data in window $W_{j4}[p_{j4}-169,\ p_{j4}]$ meets $C_4$, and adjacent selected bytes are 42 bytes apart. The hash function is computed over the 5 selected bytes and the resulting value is even, so at least part of the data in $W_{j4}[p_{j4}-169,\ p_{j4}]$ meets $C_4$. Following the same method, it is judged whether at least part of the data in $W_{jz}[p_{jz}-169,\ p_{jz}]$ meets $C_z$ for $z = 5, 6, \ldots, 11$; this is not repeated here.
Taking the embodiment shown in Fig. 5 as an example, a further method is provided for judging whether at least part of the data in window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ meets the predetermined condition $C_z$; this embodiment also uses a random function for the judgement. According to the rule preset on the deduplication server 103, a point $p_{i1}$ and its corresponding window $W_{i1}[p_{i1}-169,\ p_{i1}]$ are determined for the potential cut point $k_i$, and it is judged whether at least part of the data in $W_{i1}[p_{i1}-169,\ p_{i1}]$ meets the predetermined condition $C_1$. As shown in Fig. 16, $W_{i1}$ denotes the window $W_{i1}[p_{i1}-169,\ p_{i1}]$. To judge whether at least part of its data meets $C_1$, 5 bytes are selected: in Fig. 16, the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 each denote one selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43 and 1 are each converted into a decimal value, denoted $a_1$, $a_2$, $a_3$, $a_4$ and $a_5$ respectively. Because 1 byte consists of 8 bits, each byte "■" taken as a number satisfies $0 \le a_r \le 255$ for any $a_r$ among $a_1, \ldots, a_5$. Together, $a_1, \ldots, a_5$ form a $1 \times 5$ matrix. 256 × 5 random numbers are selected from random numbers obeying a binomial distribution to form the matrix

$$R = \begin{pmatrix} h_{0,1} & h_{0,2} & \cdots & h_{0,5} \\ h_{1,1} & h_{1,2} & \cdots & h_{1,5} \\ \vdots & \vdots & \ddots & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,5} \end{pmatrix},$$

whose rows are indexed by byte value 0 to 255 and whose columns are indexed 1 to 5.
According to the value of $a_1$ and its column, the corresponding value is looked up in matrix $R$: for example, $a_1 = 36$ is in column 1, so the value of $h_{36,1}$ is looked up. Likewise, $a_2 = 48$ in column 2 gives $h_{48,2}$; $a_3 = 26$ in column 3 gives $h_{26,3}$; $a_4 = 26$ in column 4 gives $h_{26,4}$; and $a_5 = 88$ in column 5 gives $h_{88,5}$. Then

$$S_1 = h_{36,1} + h_{48,2} + h_{26,3} + h_{26,4} + h_{88,5}.$$

Because $R$ obeys a binomial distribution, $S_1$ also obeys a binomial distribution. When $S_1$ is even, at least part of the data in $W_{i1}[p_{i1}-169,\ p_{i1}]$ meets the predetermined condition $C_1$; when $S_1$ is odd, it does not. $S_1$ is even with probability 1/2, and $C_1$ means that $S_1$, computed as described above, is even. In the embodiment shown in Fig. 5, at least part of the data in $W_{i1}[p_{i1}-169,\ p_{i1}]$ meets $C_1$.

As shown in Fig. 16, a second type of marker denotes the 1 byte selected each time when judging whether at least part of the data in window $W_{i2}[p_{i2}-169,\ p_{i2}]$ meets the predetermined condition $C_2$; in Fig. 16 these bytes have sequence numbers 170, 128, 86, 44 and 2, and adjacent selected bytes are 42 bytes apart. The bytes with sequence numbers 170, 128, 86, 44 and 2 are each converted into a decimal value, denoted $b_1$, $b_2$, $b_3$, $b_4$ and $b_5$ respectively. Because 1 byte consists of 8 bits, each selected byte taken as a number satisfies $0 \le b_r \le 255$ for any $b_r$ among $b_1, \ldots, b_5$, and $b_1, \ldots, b_5$ form a $1 \times 5$ matrix. In this implementation, whether at least part of the data in $W_{i1}$ and in $W_{i2}$ meets the predetermined condition is judged in the same way, so matrix $R$ is still used: $b_1 = 66$ in column 1 gives $h_{66,1}$; $b_2 = 48$ in column 2 gives $h_{48,2}$; $b_3 = 99$ in column 3 gives $h_{99,3}$; $b_4 = 26$ in column 4 gives $h_{26,4}$; and $b_5 = 90$ in column 5 gives $h_{90,5}$. Then

$$S_2 = h_{66,1} + h_{48,2} + h_{99,3} + h_{26,4} + h_{90,5}.$$

Because $R$ obeys a binomial distribution, $S_2$ also obeys a binomial distribution. When $S_2$ is even, at least part of the data in $W_{i2}[p_{i2}-169,\ p_{i2}]$ meets the predetermined condition $C_2$; when $S_2$ is odd, it does not. $S_2$ is even with probability 1/2. In the embodiment shown in Fig. 5, at least part of the data in $W_{i2}[p_{i2}-169,\ p_{i2}]$ meets $C_2$. Using the same rule, it is judged whether at least part of the data in $W_{iz}[p_{iz}-169,\ p_{iz}]$ meets $C_z$ for $z = 3, 4, \ldots, 11$.

In the embodiment shown in Fig. 5, at least part of the data in $W_{i5}[p_{i5}-169,\ p_{i5}]$ does not meet the predetermined condition $C_5$, so the search jumps 11 bytes from point $p_{i5}$ along the data stream cut-point search direction, and the current potential cut point $k_j$ is obtained at the end position of the 11th byte, as shown in Fig. 6. According to the rule preset on the deduplication server 103, a point $p_{j1}$ is determined for the potential cut point $k_j$, with the corresponding window $W_{j1}[p_{j1}-169,\ p_{j1}]$. Whether at least part of the data in window $W_{j1}[p_{j1}-169,\ p_{j1}]$ meets $C_1$ is judged in the same way as for window $W_{i1}[p_{i1}-169,\ p_{i1}]$. As shown in Fig. 17, $W_{j1}$ denotes the window $W_{j1}[p_{j1}-169,\ p_{j1}]$. To judge whether at least part of its data meets $C_1$, the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 in Fig. 17 each denote one selected byte, and adjacent selected bytes are 42 bytes apart. These bytes are each converted into a decimal value, denoted $a'_1$, $a'_2$, $a'_3$, $a'_4$ and $a'_5$ respectively. Because 1 byte consists of 8 bits, each byte "■" taken as a number satisfies $0 \le a'_r \le 255$ for any $a'_r$ among $a'_1, \ldots, a'_5$, and $a'_1, \ldots, a'_5$ form a $1 \times 5$ matrix. Because whether at least part of the data in $W_{j1}[p_{j1}-169,\ p_{j1}]$ meets $C_1$ is judged in the same way as for $W_{i1}[p_{i1}-169,\ p_{i1}]$, matrix $R$ is still used:

$$R = \begin{pmatrix} h_{0,1} & h_{0,2} & \cdots & h_{0,5} \\ h_{1,1} & h_{1,2} & \cdots & h_{1,5} \\ \vdots & \vdots & \ddots & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,5} \end{pmatrix}.$$
According to the value of $a'_1$ and its column, the corresponding value is looked up in matrix $R$: for example, $a'_1 = 16$ is in column 1, so the value of $h_{16,1}$ is looked up. Likewise, $a'_2 = 98$ in column 2 gives $h_{98,2}$; $a'_3 = 56$ in column 3 gives $h_{56,3}$; $a'_4 = 36$ in column 4 gives $h_{36,4}$; and $a'_5 = 99$ in column 5 gives $h_{99,5}$. Then

$$S'_1 = h_{16,1} + h_{98,2} + h_{56,3} + h_{36,4} + h_{99,5}.$$

Because $R$ obeys a binomial distribution, $S'_1$ also obeys a binomial distribution. When $S'_1$ is even, at least part of the data in $W_{j1}[p_{j1}-169,\ p_{j1}]$ meets the predetermined condition $C_1$; when $S'_1$ is odd, it does not. $S'_1$ is even with probability 1/2.
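The statement that $S'_1$ (like $S_1$) is even with probability 1/2 depends on the binomial parameters, which the text does not specify. For $X \sim \mathrm{B}(N, q)$, the standard parity identity below shows when the claim holds exactly; it assumes the five looked-up entries, one from each column of $R$, are independent draws.

$$
\begin{aligned}
P(X \text{ even}) - P(X \text{ odd}) &= \mathbb{E}\left[(-1)^X\right] = \sum_{n=0}^{N}\binom{N}{n}(-q)^n(1-q)^{N-n} = (1-2q)^N,\\
P(X \text{ even}) &= \frac{1 + (1-2q)^N}{2}.
\end{aligned}
$$

If every entry of $R$ is $\mathrm{B}(N, q)$, the sum of five independent entries is $\mathrm{B}(5N, q)$, so $P(S'_1 \text{ even}) = \tfrac{1}{2}\left(1 + (1-2q)^{5N}\right)$. This equals exactly $1/2$ when $q = 1/2$, and is close to $1/2$ whenever $(1-2q)^{5N}$ is small.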
Whether at least part of the data in $W_{i2}[p_{i2}-169,\ p_{i2}]$ meets the predetermined condition $C_2$ is judged in the same way as for $W_{j2}[p_{j2}-169,\ p_{j2}]$. Therefore, as shown in Fig. 17, a second type of marker denotes the 1 byte selected each time when judging whether at least part of the data in window $W_{j2}[p_{j2}-169,\ p_{j2}]$ meets $C_2$; these bytes have sequence numbers 170, 128, 86, 44 and 2, and adjacent selected bytes are 42 bytes apart. The bytes with sequence numbers 170, 128, 86, 44 and 2 are each converted into a decimal value, denoted $b'_1$, $b'_2$, $b'_3$, $b'_4$ and $b'_5$ respectively. Because 1 byte consists of 8 bits, each selected byte taken as a number satisfies $0 \le b'_r \le 255$ for any $b'_r$ among $b'_1, \ldots, b'_5$, and $b'_1, \ldots, b'_5$ form a $1 \times 5$ matrix. The same matrix $R$ used for judging whether at least part of the data in window $W_{i2}[p_{i2}-169,\ p_{i2}]$ meets $C_2$ is used here: $b'_1 = 210$ in column 1 gives $h_{210,1}$; $b'_2 = 156$ in column 2 gives $h_{156,2}$; $b'_3 = 144$ in column 3 gives $h_{144,3}$; $b'_4 = 60$ in column 4 gives $h_{60,4}$; and $b'_5 = 90$ in column 5 gives $h_{90,5}$. Then

$$S'_2 = h_{210,1} + h_{156,2} + h_{144,3} + h_{60,4} + h_{90,5}.$$

The judgement rule is the same as for $S_2$: when $S'_2$ is even, at least part of the data in $W_{j2}[p_{j2}-169,\ p_{j2}]$ meets the predetermined condition $C_2$; when $S'_2$ is odd, it does not. $S'_2$ is even with probability 1/2.
Likewise, whether at least part of the data in $W_{i3}[p_{i3}-169,\ p_{i3}]$ meets the predetermined condition $C_3$ is judged in the same way as for $W_{j3}[p_{j3}-169,\ p_{j3}]$, and whether at least part of the data in $W_{jz}[p_{jz}-169,\ p_{jz}]$ meets $C_z$ is judged in the same way for $z = 4, 5, \ldots, 11$; these judgements are not repeated here.
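The table-lookup variant can be sketched as follows. This is a hedged illustration under assumptions the text does not fix: the entries of $R$ are drawn once as $\mathrm{B}(8, 1/2)$ integers from a seeded generator (the trial count `8` and seed `2014` are hypothetical), and $a_1$ is taken to be the byte with sequence number 169, read here as the last byte of the window `data[p-169:p]` (the numbering direction in Fig. 16 is an assumption). Columns are 0-based in Python, so $h_{a_1,1}$ becomes `R[a1][0]`; the function names are hypothetical.

```python
import random
from typing import List, Sequence

WINDOW, STRIDE, N_SELECTED = 169, 42, 5


def make_lookup_table(seed: int = 2014, trials: int = 8, q: float = 0.5) -> List[List[int]]:
    """Fixed 256 x 5 table R of Binomial(trials, q) integers, generated once."""
    rng = random.Random(seed)

    def binomial() -> int:
        # number of successes in `trials` Bernoulli(q) draws
        return sum(rng.random() < q for _ in range(trials))

    return [[binomial() for _ in range(N_SELECTED)] for _ in range(256)]


def window_satisfies_lookup(data: bytes, p: int, R: Sequence[Sequence[int]]) -> bool:
    """Condition C_z: S = R[a1][0] + R[a2][1] + ... + R[a5][4] is even."""
    if p < WINDOW or p > len(data):
        raise ValueError("window does not fit inside data")
    # a1..a5 = bytes at p-1, p-43, p-85, p-127, p-169 (assumed sequence numbers 169..1)
    selected = [data[p - 1 - k * STRIDE] for k in range(N_SELECTED)]
    s = sum(R[value][col] for col, value in enumerate(selected))  # one lookup per column
    return s % 2 == 0
```

For the worked example with $a = (36, 48, 26, 26, 88)$, the sketch computes `R[36][0] + R[48][1] + R[26][2] + R[26][3] + R[88][4]`, which is $S_1$ above. Each window then costs five table lookups and four additions, against 255 × 8 multiplications in the normal-matrix variant.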
Taking the embodiment shown in Fig. 5 as an example, a further method is provided for judging whether at least part of the data in window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ meets the predetermined condition $C_z$; this embodiment also uses a random function for the judgement. According to the rule preset on the deduplication server 103, a point $p_{i1}$ and its corresponding window $W_{i1}[p_{i1}-169,\ p_{i1}]$ are determined for the potential cut point $k_i$, and it is judged whether at least part of the data in $W_{i1}[p_{i1}-169,\ p_{i1}]$ meets the predetermined condition $C_1$. As shown in Fig. 16, $W_{i1}$ denotes the window $W_{i1}[p_{i1}-169,\ p_{i1}]$. To judge whether at least part of its data meets $C_1$, 5 bytes are selected: in Fig. 16, the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 each denote one selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43 and 1 are each converted into a decimal value, denoted $a_1$, $a_2$, $a_3$, $a_4$ and $a_5$ respectively. Because 1 byte consists of 8 bits, each byte "■" taken as a number satisfies $0 \le a_s \le 255$ for any $a_s$ among $a_1, \ldots, a_5$. Together, $a_1, \ldots, a_5$ form a $1 \times 5$ matrix. 256 × 5 random numbers are selected from random numbers obeying a binomial distribution to form the matrix

$$R = \begin{pmatrix} h_{0,1} & h_{0,2} & \cdots & h_{0,5} \\ h_{1,1} & h_{1,2} & \cdots & h_{1,5} \\ \vdots & \vdots & \ddots & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,5} \end{pmatrix},$$

and another 256 × 5 random numbers are selected from random numbers obeying a binomial distribution to form the matrix

$$G = \begin{pmatrix} g_{0,1} & g_{0,2} & \cdots & g_{0,5} \\ g_{1,1} & g_{1,2} & \cdots & g_{1,5} \\ \vdots & \vdots & \ddots & \vdots \\ g_{255,1} & g_{255,2} & \cdots & g_{255,5} \end{pmatrix}.$$
Each value is looked up according to its value and the column it occupies. For example, $a_1 = 36$ lies in column 1, so $h_{36,1}$ is looked up in $R$ and $g_{36,1}$ in $G$; $a_2 = 48$ lies in column 2, giving $h_{48,2}$ and $g_{48,2}$; $a_3 = 26$ lies in column 3, giving $h_{26,3}$ and $g_{26,3}$; $a_4 = 26$ lies in column 4, giving $h_{26,4}$ and $g_{26,4}$; and $a_5 = 88$ lies in column 5, giving $h_{88,5}$ and $g_{88,5}$. Then

$$S_{1h} = h_{36,1}+h_{48,2}+h_{26,3}+h_{26,4}+h_{88,5},\qquad S_{1g} = g_{36,1}+g_{48,2}+g_{26,3}+g_{26,4}+g_{88,5}.$$

Because matrices $R$ and $G$ follow a binomial distribution, $S_{1h}$ and $S_{1g}$ also follow binomial distributions. Condition $C_1$ states that at least one of $S_{1h}$ and $S_{1g}$, obtained as above, is even: if at least one of them is even, at least part of the data in $W_{i1}[p_{i1}-169,\ p_{i1}]$ meets $C_1$; if both are odd, it does not. Because $S_{1h}$ and $S_{1g}$ are each even with probability 1/2, the probability that at least one of them is even is $1 - 1/4 = 3/4$, so the probability that at least part of the data in $W_{i1}$ meets $C_1$ is 3/4. In the embodiment shown in Fig. 5, at least part of the data in $W_{i1}$ meets $C_1$.

In the embodiment shown in Fig. 5, the windows $W_{i1}[p_{i1}-169,\ p_{i1}]$, $W_{i2}[p_{i2}-169,\ p_{i2}]$, ..., $W_{i11}[p_{i11}-169,\ p_{i11}]$ all have the same size of 169 bytes, and whether at least part of the data in each window meets its predetermined condition is judged in the same way, as described above for $W_{i1}$ and $C_1$. Accordingly, as shown in Fig. 16, the bytes selected for judging whether at least part of the data in $W_{i2}[p_{i2}-169,\ p_{i2}]$ meets $C_2$ are those numbered 170, 128, 86, 44 and 2, adjacent selected bytes again being 42 bytes apart. These bytes are each converted into a decimal value, denoted $b_1$, $b_2$, $b_3$, $b_4$ and $b_5$, with $0 \le b_s \le 255$, forming a $1\times 5$ matrix. Because every window in this embodiment is judged in the same way, the same matrices $R$ and $G$ are used. For example, $b_1 = 66$ (column 1) gives $h_{66,1}$ and $g_{66,1}$; $b_2 = 48$ (column 2) gives $h_{48,2}$ and $g_{48,2}$; $b_3 = 99$ (column 3) gives $h_{99,3}$ and $g_{99,3}$; $b_4 = 26$ (column 4) gives $h_{26,4}$ and $g_{26,4}$; and $b_5 = 90$ (column 5) gives $h_{90,5}$ and $g_{90,5}$. Then

$$S_{2h} = h_{66,1}+h_{48,2}+h_{99,3}+h_{26,4}+h_{90,5},\qquad S_{2g} = g_{66,1}+g_{48,2}+g_{99,3}+g_{26,4}+g_{90,5},$$

both of which follow binomial distributions. If at least one of $S_{2h}$ and $S_{2g}$ is even, at least part of the data in $W_{i2}$ meets $C_2$; if both are odd, it does not. The probability that at least one of them is even is 3/4. In the embodiment shown in Fig. 5, at least part of the data in $W_{i2}$ meets $C_2$.

The same rule is used to judge whether at least part of the data in each of $W_{i3}[p_{i3}-169,\ p_{i3}]$ through $W_{i11}[p_{i11}-169,\ p_{i11}]$ meets $C_3$ through $C_{11}$, respectively. In the embodiment shown in Fig. 5, at least part of the data in $W_{i5}[p_{i5}-169,\ p_{i5}]$ does not meet $C_5$, so the search jumps 11 bytes from point $p_{i5}$ along the data stream cut point search direction, and the end position of the 11th byte becomes the current potential cut point $k_j$. As shown in Fig. 6, according to the rule preset on deduplication server 103, a point $p_{j1}$ and its window $W_{j1}[p_{j1}-169,\ p_{j1}]$ are determined for $k_j$. Whether at least part of the data in $W_{j1}$ meets $C_1$ is judged in the same way as for $W_{i1}$. As shown in Fig. 17, where $W_{j1}$ denotes window $W_{j1}[p_{j1}-169,\ p_{j1}]$, the bytes "■" numbered 169, 127, 85, 43 and 1 each denote one selected byte, adjacent selected bytes being 42 bytes apart. These bytes are each converted into a decimal value, denoted $a_1'$, $a_2'$, $a_3'$, $a_4'$ and $a_5'$, with $0 \le a_s' \le 255$, forming a $1\times 5$ matrix. The same matrices $R$ and $G$ that were used for $W_{i1}$ are used here.
Each value is again looked up by its value and column: $a_1' = 16$ (column 1) gives $h_{16,1}$ and $g_{16,1}$; $a_2' = 98$ (column 2) gives $h_{98,2}$ and $g_{98,2}$; $a_3' = 56$ (column 3) gives $h_{56,3}$ and $g_{56,3}$; $a_4' = 36$ (column 4) gives $h_{36,4}$ and $g_{36,4}$; and $a_5' = 99$ (column 5) gives $h_{99,5}$ and $g_{99,5}$. Then

$$S_{1h}' = h_{16,1}+h_{98,2}+h_{56,3}+h_{36,4}+h_{99,5},\qquad S_{1g}' = g_{16,1}+g_{98,2}+g_{56,3}+g_{36,4}+g_{99,5},$$

both of which follow binomial distributions because $R$ and $G$ do. If at least one of $S_{1h}'$ and $S_{1g}'$ is even, at least part of the data in $W_{j1}[p_{j1}-169,\ p_{j1}]$ meets $C_1$; if both are odd, it does not. The probability that at least one of them is even is 3/4.
Whether at least part of the data in $W_{j2}[p_{j2}-169,\ p_{j2}]$ meets $C_2$ is judged in the same way as for $W_{i2}[p_{i2}-169,\ p_{i2}]$. Therefore, as shown in Fig. 17, the bytes selected for this judgment are those numbered 170, 128, 86, 44 and 2, adjacent selected bytes being 42 bytes apart. These bytes are each converted into a decimal value, denoted $b_1'$, $b_2'$, $b_3'$, $b_4'$ and $b_5'$, with $0 \le b_s' \le 255$, forming a $1\times 5$ matrix. The same matrices $R$ and $G$ as for $W_{i2}$ are used: $b_1' = 210$ (column 1) gives $h_{210,1}$ and $g_{210,1}$; $b_2' = 156$ (column 2) gives $h_{156,2}$ and $g_{156,2}$; $b_3' = 144$ (column 3) gives $h_{144,3}$ and $g_{144,3}$; $b_4' = 60$ (column 4) gives $h_{60,4}$ and $g_{60,4}$; and $b_5' = 90$ (column 5) gives $h_{90,5}$ and $g_{90,5}$. Then

$$S_{2h}' = h_{210,1}+h_{156,2}+h_{144,3}+h_{60,4}+h_{90,5},\qquad S_{2g}' = g_{210,1}+g_{156,2}+g_{144,3}+g_{60,4}+g_{90,5}.$$

If at least one of $S_{2h}'$ and $S_{2g}'$ is even, at least part of the data in $W_{j2}$ meets $C_2$; if both are odd, it does not. The probability that at least one of them is even is 3/4.
Similarly, whether at least part of the data in $W_{j3}[p_{j3}-169,\ p_{j3}]$ meets predetermined condition $C_3$ is judged in the same way as for $W_{i3}[p_{i3}-169,\ p_{i3}]$. The judgments of whether at least part of the data in $W_{j4}[p_{j4}-169,\ p_{j4}]$ through $W_{j11}[p_{j11}-169,\ p_{j11}]$ meets predetermined conditions $C_4$ through $C_{11}$, respectively, are likewise made in the same way and are not repeated here.
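A minimal Python sketch of the binomial-table test described above, under stated assumptions: table entries are drawn as Binomial(16, 1/2) (the text gives neither the number of trials nor the success probability), and the names `make_table` and `meets_binomial_condition` are hypothetical. The fair-coin choice matters: a Binomial($n$, $p$) variable is even with probability $(1+(1-2p)^n)/2$, which equals 1/2 only when $p = 1/2$. Once $R$ and $G$ are fixed, the verdict depends only on the selected bytes, so identical content yields the same decision wherever it occurs in the stream.

```python
import random


def make_table(rng: random.Random, trials: int = 16, rows: int = 256, cols: int = 5) -> list[list[int]]:
    """rows x cols table of Binomial(trials, 1/2) samples.

    Each sample counts heads in `trials` fair coin flips; `trials` is a
    hypothetical choice, since the text does not give the parameters.
    """
    return [[sum(rng.getrandbits(1) for _ in range(trials)) for _ in range(cols)]
            for _ in range(rows)]


def meets_binomial_condition(a: list[int], R: list[list[int]], G: list[list[int]]) -> bool:
    """Condition C: at least one of S_h and S_g is even.

    The value a_s (0..255) in column s selects row a_s of that column, i.e. h_{a_s,s}.
    """
    s_h = sum(R[value][col] for col, value in enumerate(a))
    s_g = sum(G[value][col] for col, value in enumerate(a))
    return s_h % 2 == 0 or s_g % 2 == 0


rng = random.Random(2014)        # fixed seed: every window must use the same R and G
R, G = make_table(rng), make_table(rng)

a = [36, 48, 26, 26, 88]         # the example values a_1..a_5 from the text
print(meets_binomial_condition(a, R, G))

# Fraction of random windows that meet C; expected to be close to 3/4.
samples = [[rng.randrange(256) for _ in range(5)] for _ in range(20000)]
rate = sum(meets_binomial_condition(s, R, G) for s in samples) / len(samples)
print(f"{rate:.3f}")
```

With a fixed seed the tables are reproducible, which is what the text requires when the same $R$ and $G$ are reused for $W_{i1}$, $W_{i2}$, $W_{j1}$ and $W_{j2}$.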
Still taking the embodiment shown in Fig. 5 as an example, another method is provided for judging whether at least part of the data in window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ meets predetermined condition $C_z$; this embodiment also uses a random function. According to the rule preset on deduplication server 103, point $p_{i1}$ and its window $W_{i1}[p_{i1}-169,\ p_{i1}]$ are determined for potential cut point $k_i$, and it is judged whether at least part of the data in $W_{i1}$ meets $C_1$. As shown in Fig. 16, where $W_{i1}$ denotes window $W_{i1}[p_{i1}-169,\ p_{i1}]$, 5 bytes are selected: the bytes "■" numbered 169, 127, 85, 43 and 1 each denote one selected byte, adjacent selected bytes being 42 bytes apart. These 5 bytes are taken in order as 40 bits, denoted $a_1, a_2, a_3, a_4, \dots, a_{40}$. For any bit $a_t$, $V_{a_t} = -1$ when $a_t = 0$ and $V_{a_t} = 1$ when $a_t = 1$; this mapping yields $V_{a_1}, V_{a_2}, \dots, V_{a_{40}}$. Forty random numbers $h_1, h_2, \dots, h_{40}$ are selected from random numbers that follow a normal distribution, and

$$S_a = V_{a_1}h_1 + V_{a_2}h_2 + V_{a_3}h_3 + \cdots + V_{a_{40}}h_{40}.$$

Because $h_1, \dots, h_{40}$ follow a normal distribution, $S_a$ also follows a normal distribution. If $S_a$ is positive, at least part of the data in $W_{i1}$ meets $C_1$; if $S_a$ is negative or 0, it does not. The probability that $S_a$ is positive is 1/2. In the embodiment shown in Fig. 5, at least part of the data in $W_{i1}$ meets $C_1$.

As shown in Fig. 16, the bytes selected for judging whether at least part of the data in $W_{i2}[p_{i2}-169,\ p_{i2}]$ meets $C_2$ are those numbered 170, 128, 86, 44 and 2, adjacent selected bytes being 42 bytes apart. These bytes are taken in order as 40 bits $b_1, b_2, \dots, b_{40}$; for any bit $b_t$, $V_{b_t} = -1$ when $b_t = 0$ and $V_{b_t} = 1$ when $b_t = 1$, yielding $V_{b_1}, \dots, V_{b_{40}}$. Because $W_{i2}$ is judged against $C_2$ in the same way as $W_{i1}$ against $C_1$, the same random numbers $h_1, \dots, h_{40}$ are used:

$$S_b = V_{b_1}h_1 + V_{b_2}h_2 + V_{b_3}h_3 + \cdots + V_{b_{40}}h_{40}.$$

$S_b$ also follows a normal distribution. If $S_b$ is positive, at least part of the data in $W_{i2}$ meets $C_2$; if $S_b$ is negative or 0, it does not. The probability that $S_b$ is positive is 1/2. In the embodiment shown in Fig. 5, at least part of the data in $W_{i2}$ meets $C_2$.

The same rule is used to judge whether at least part of the data in each of $W_{i3}[p_{i3}-169,\ p_{i3}]$ through $W_{i11}[p_{i11}-169,\ p_{i11}]$ meets $C_3$ through $C_{11}$, respectively. In the embodiment shown in Fig. 5, at least part of the data in $W_{i5}[p_{i5}-169,\ p_{i5}]$ does not meet $C_5$, so the search jumps 11 bytes from point $p_{i5}$ along the data stream cut point search direction, and the end position of the 11th byte becomes the current potential cut point $k_j$. As shown in Fig. 6, according to the rule preset on deduplication server 103, point $p_{j1}$ and its window $W_{j1}[p_{j1}-169,\ p_{j1}]$ are determined for $k_j$; whether at least part of the data in $W_{j1}$ meets $C_1$ is judged in the same way as for $W_{i1}$. As shown in Fig. 17, where $W_{j1}$ denotes window $W_{j1}[p_{j1}-169,\ p_{j1}]$, 5 bytes are selected: the bytes "■" numbered 169, 127, 85, 43 and 1 each denote one selected byte, adjacent selected bytes being 42 bytes apart. These bytes are taken in order as 40 bits $a_1', a_2', \dots, a_{40}'$; for any bit $a_t'$, $V_{a_t'} = -1$ when $a_t' = 0$ and $V_{a_t'} = 1$ when $a_t' = 1$, yielding $V_{a_1'}, \dots, V_{a_{40}'}$. Because $W_{j1}$ is judged in the same way as $W_{i1}$, the same random numbers $h_1, \dots, h_{40}$ are used:

$$S_a' = V_{a_1'}h_1 + V_{a_2'}h_2 + V_{a_3'}h_3 + \cdots + V_{a_{40}'}h_{40}.$$

$S_a'$ also follows a normal distribution. If $S_a'$ is positive, at least part of the data in $W_{j1}$ meets $C_1$; if $S_a'$ is negative or 0, it does not. The probability that $S_a'$ is positive is 1/2.
Whether at least part of the data in $W_{j2}[p_{j2}-169,\ p_{j2}]$ meets $C_2$ is judged in the same way as for $W_{i2}[p_{i2}-169,\ p_{i2}]$. Therefore, as shown in Fig. 17, the bytes selected for this judgment are those numbered 170, 128, 86, 44 and 2, adjacent selected bytes being 42 bytes apart. These bytes are taken in order as 40 bits $b_1', b_2', \dots, b_{40}'$; for any bit $b_t'$, $V_{b_t'} = -1$ when $b_t' = 0$ and $V_{b_t'} = 1$ when $b_t' = 1$, yielding $V_{b_1'}, \dots, V_{b_{40}'}$. Because the judgment is made in the same way as for $W_{i2}$, the same random numbers $h_1, \dots, h_{40}$ are used:

$$S_b' = V_{b_1'}h_1 + V_{b_2'}h_2 + V_{b_3'}h_3 + \cdots + V_{b_{40}'}h_{40}.$$

$S_b'$ also follows a normal distribution. If $S_b'$ is positive, at least part of the data in $W_{j2}$ meets $C_2$; if $S_b'$ is negative or 0, it does not. The probability that $S_b'$ is positive is 1/2.
Similarly, whether at least part of the data in $W_{j3}[p_{j3}-169,\ p_{j3}]$ meets predetermined condition $C_3$ is judged in the same way as for $W_{i3}[p_{i3}-169,\ p_{i3}]$. The judgments of whether at least part of the data in $W_{j4}[p_{j4}-169,\ p_{j4}]$ through $W_{j11}[p_{j11}-169,\ p_{j11}]$ meets predetermined conditions $C_4$ through $C_{11}$, respectively, are likewise made in the same way and are not repeated here.
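A minimal Python sketch of the sign test just described, assuming the bits of each byte are read most-significant first (the text does not fix the bit order) and using hypothetical function names. It also shows why the probability 1/2 holds even for one fixed set of $h_t$: complementing every selected bit negates $S$, so over uniformly random window contents exactly half of the inputs give a positive sum (ignoring the probability-zero case $S = 0$).

```python
import random


def to_signs(selected: bytes) -> list[int]:
    """Map the 40 bits of the 5 selected bytes to V = +1 (bit 1) or -1 (bit 0).

    Bits are read most-significant first within each byte (an assumption).
    """
    return [1 if (byte >> (7 - k)) & 1 else -1 for byte in selected for k in range(8)]


def meets_sign_condition(selected: bytes, h: list[float]) -> bool:
    """Condition C: S = sum_t V_t * h_t is strictly positive."""
    s = sum(v * w for v, w in zip(to_signs(selected), h))
    return s > 0


rng = random.Random(7)
h = [rng.gauss(0.0, 1.0) for _ in range(40)]   # shared by every window judged this way

window = bytes([36, 48, 26, 26, 88])
complement = bytes(b ^ 0xFF for b in window)
# Flipping every bit negates S, so exactly one of the two meets the condition.
print(meets_sign_condition(window, h), meets_sign_condition(complement, h))
```

This is the same construction as a random-hyperplane (sign-of-projection) hash on the $\pm 1$ bit vector.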
Still taking the embodiment shown in Fig. 5 as an example, a further method is provided for judging whether at least part of the data in window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ meets predetermined condition $C_z$; this embodiment also uses a random function. According to the rule preset on deduplication server 103, point $p_{i1}$ and its window $W_{i1}[p_{i1}-169,\ p_{i1}]$ are determined for potential cut point $k_i$, and it is judged whether at least part of the data in $W_{i1}$ meets $C_1$. As shown in Fig. 16, where $W_{i1}$ denotes window $W_{i1}[p_{i1}-169,\ p_{i1}]$, 5 bytes are selected: the bytes "■" numbered 169, 127, 85, 43 and 1 each denote one selected byte, adjacent selected bytes being 42 bytes apart. These 5 bytes are converted into a single decimal number in the range 0 to $2^{40}-1$. A uniform random number generator generates one designated value for each decimal number from 0 to $2^{40}-1$, and the correspondence $R$ between each decimal number and its designated value is recorded; once assigned, the designated value of a decimal number never changes. The designated values follow a uniform distribution. If the designated value is even, at least part of the data in $W_{i1}$ meets $C_1$; if it is odd, it does not. That is, $C_1$ requires the designated value obtained as above to be even. Because a uniformly distributed random number is even with probability 1/2, the probability that at least part of the data in $W_{i1}[p_{i1}-169,\ p_{i1}]$ meets $C_1$ is 1/2. In the embodiment shown in Fig. 5, the same rule is used to judge whether at least part of the data in $W_{i2}[p_{i2}-169,\ p_{i2}]$ through $W_{i5}[p_{i5}-169,\ p_{i5}]$ meets $C_2$ through $C_5$, respectively; the details are not repeated here.
When at least part of the data in $W_{i5}[p_{i5}-169,\ p_{i5}]$ does not meet $C_5$, the search jumps 11 bytes from point $p_{i5}$ along the data stream cut point search direction, and the end position of the 11th byte becomes the current potential cut point $k_j$. As shown in Fig. 6, according to the rule preset on deduplication server 103, point $p_{j1}$ and its window $W_{j1}[p_{j1}-169,\ p_{j1}]$ are determined for $k_j$. Whether at least part of the data in $W_{j1}$ meets $C_1$ is judged in the same way as for $W_{i1}$, so the same correspondence $R$ between the decimal numbers 0 to $2^{40}-1$ and their designated values is used. As shown in Fig. 17, where $W_{j1}$ denotes window $W_{j1}[p_{j1}-169,\ p_{j1}]$, 5 bytes are selected: each "■" denotes one selected byte, adjacent selected bytes being 42 bytes apart. The bytes "■" numbered 169, 127, 85, 43 and 1 are converted into one decimal number, and the designated value of that number is looked up in $R$. If the designated value is even, at least part of the data in $W_{j1}$ meets $C_1$; if it is odd, it does not. Because a uniformly distributed random number is even with probability 1/2, the probability that at least part of the data in $W_{j1}[p_{j1}-169,\ p_{j1}]$ meets $C_1$ is 1/2. Likewise, $W_{j2}[p_{j2}-169,\ p_{j2}]$ is judged against $C_2$ in the same way as $W_{i2}[p_{i2}-169,\ p_{i2}]$, and $W_{j3}[p_{j3}-169,\ p_{j3}]$ against $C_3$ in the same way as $W_{i3}[p_{i3}-169,\ p_{i3}]$. The judgments of whether at least part of the data in $W_{j4}[p_{j4}-169,\ p_{j4}]$ through $W_{j11}[p_{j11}-169,\ p_{j11}]$ meets $C_4$ through $C_{11}$, respectively, are made in the same way and are not repeated here.
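Materializing a table of $2^{40}$ designated values is impractical, so this Python sketch substitutes a keyed BLAKE2b hash for the recorded correspondence $R$: like the table, it gives every number a fixed value that is, in practice, uniformly distributed. This is a swapped-in technique (the text itself lists hash functions as an option for the judgment), and the key, the big-endian byte order and the function names are assumptions.

```python
import hashlib

KEY = b"hypothetical-dedup-key"   # hypothetical; any fixed key shared by all windows


def window_number(selected: bytes) -> int:
    """Read the 5 selected bytes as one integer in 0 .. 2**40 - 1 (big-endian assumed)."""
    if len(selected) != 5:
        raise ValueError("expected 5 selected bytes")
    return int.from_bytes(selected, "big")


def designated_value(x: int) -> int:
    """Stand-in for table R: a fixed, uniformly distributed 64-bit value for x.

    A keyed hash replaces the 2**40-entry table, so nothing has to be stored,
    and the value for a given x never changes.
    """
    digest = hashlib.blake2b(x.to_bytes(5, "big"), digest_size=8, key=KEY).digest()
    return int.from_bytes(digest, "big")


def meets_uniform_condition(selected: bytes) -> bool:
    """Condition C: the designated value of the window's number is even."""
    return designated_value(window_number(selected)) % 2 == 0


print(window_number(bytes([0, 0, 0, 1, 0])))   # 256
print(meets_uniform_condition(bytes([36, 48, 26, 26, 88])))
```

Because every deduplication node would derive the same value from the same key, the sketch keeps the property the text relies on: windows with identical selected bytes are always judged identically.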
Deduplication server 103 in the embodiment of the present invention shown in Fig. 1 refers to a device that implements the technical solutions described in the embodiments of the present invention. As shown in Fig. 18, it generally includes a central processing unit (CPU), a main memory and an input/output interface, which communicate with one another. The main memory stores executable instructions, and the CPU executes them to perform specific functions, such as searching for data stream cut points as described in the embodiments shown in Figs. 4 to 17. Accordingly, as shown in Fig. 19 and in accordance with the embodiments shown in Figs. 4 to 17, a rule is preset on deduplication server 103: for a potential cut point $k$, determine $M$ points $p_x$, the window $W_x[p_x-A_x,\ p_x+B_x]$ corresponding to each point $p_x$, and the predetermined condition $C_x$ corresponding to each window, where $x$ takes the consecutive natural numbers 1 to $M$, $M \ge 2$, and $A_x$ and $B_x$ are integers. Deduplication server 103 includes a determining unit 1901 and a judging processing unit 1902. Determining unit 1901 is configured to perform step a): a) according to the rule, determine a point $p_{iz}$ for the current potential cut point $k_i$ and the window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ corresponding to $p_{iz}$, where $i$ and $z$ are integers and $1 \le z \le M$. Judging processing unit 1902 is configured to judge whether at least part of the data in window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ meets predetermined condition $C_z$.
When at least part of the data in window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ does not meet predetermined condition $C_z$, the search jumps from point $p_{iz}$ by $N$ minimum data stream cut point search units $U$ along the data stream cut point search direction, where $N\cdot U \le \|B_z\| + \max_x\left(\|A_x\| + \|k_i - p_{ix}\|\right)$, to obtain a new potential cut point, and determining unit 1901 performs step a) for the new potential cut point. When, for every one of the $M$ windows of the current potential cut point $k_i$, at least part of the data in $W_{ix}[p_{ix}-A_x,\ p_{ix}+B_x]$ meets predetermined condition $C_x$, the current potential cut point $k_i$ is a data stream cut point.
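The search loop formed by step a) and the jump can be sketched in Python as below. Everything named here (`WindowSpec`, `back`, `find_cut_point`, returning `None` when the data runs out) is a hypothetical illustration; windows are treated as inclusive byte ranges, the search direction is taken to be increasing byte offsets, and the jump is passed in as a fixed `jump` standing for $N\cdot U$. For windows that share one condition and sit 0, 1, ..., $M-1$ bytes behind $k$, a failure at $p$ rules out every potential cut point from $p$ to $p+M-1$, so a jump of $M$ from $p$ skips no valid cut point; this is consistent with the 11-byte jump after a failure among 11 identical windows in the Fig. 5 example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class WindowSpec:
    """One entry of the preset rule; field names are hypothetical.

    back: distance k - p_x from the potential cut point back to point p_x
    A, B: window W_x[p_x - A, p_x + B], treated here as an inclusive byte range
    cond: predetermined condition C_x, applied to the window's bytes
    """
    back: int
    A: int
    B: int
    cond: Callable[[bytes], bool]


def find_cut_point(data: bytes, start: int, rule: Sequence[WindowSpec],
                   jump: int) -> Optional[int]:
    """Return the first potential cut point (searching forward from `start`)
    whose M windows all meet their conditions, or None if the data runs out.

    `jump` stands for N*U. This sketch requires jump > every k - p_x so that
    each failure moves the potential cut point strictly forward.
    """
    if jump <= max(spec.back for spec in rule):
        raise ValueError("jump must exceed every k - p_x")
    k = start
    while True:
        for spec in rule:                      # step a) for z = 1..M
            p = k - spec.back
            lo, hi = p - spec.A, p + spec.B
            if lo < 0 or hi >= len(data):
                return None                    # not enough data (a choice of this sketch)
            if not spec.cond(data[lo:hi + 1]):
                k = p + jump                   # jump N*U from p_z: new potential cut point
                break
        else:
            return k                           # every window met its condition


def last_byte_even(window: bytes) -> bool:
    """Toy condition, for illustration only."""
    return window[-1] % 2 == 0


# M = 3 points, 0, 1 and 2 bytes behind k, all with the same window and condition.
rule = [WindowSpec(back=x, A=4, B=0, cond=last_byte_even) for x in range(3)]
data = bytes([1] * 20 + [2, 2, 2] + [1] * 20)
print(find_cut_point(data, start=6, rule=rule, jump=3))  # 22
```

Here 22 is the first position $k$ at which bytes $k-2$, $k-1$ and $k$ are all even, so the jumps of 3 did not skip a valid cut point.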
Further, the rule also includes: at least two points $p_e$ and $p_f$ satisfy $A_e = A_f$, $B_e = B_f$ and $C_e = C_f$. Further, the rule also includes: the at least two points $p_e$ and $p_f$ lie, relative to the potential cut point $k$, in the direction opposite to the data stream cut point search direction.
Further, the rule also includes: the distance between the at least two points $p_e$ and $p_f$ is 1 $U$.
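As a worked example of the jump bound $N\cdot U \le \|B_z\| + \max_x\left(\|A_x\| + \|k_i - p_{ix}\|\right)$, the sketch below builds a rule resembling Fig. 5: 11 points spaced 1 $U$ = 1 byte apart behind $k$, each with $A_x = 169$ and $B_x = 0$, so that every pair of points satisfies the equal-$A$, equal-$B$ requirement. The assumption that the nearest point coincides with $k$ (so that $k - p_{ix}$ runs from 0 to 10) is hypothetical; the text does not state it, and the function names are illustrative.

```python
def fig5_like_rule(M: int = 11, A: int = 169, B: int = 0, U: int = 1) -> list[tuple[int, int, int]]:
    """(k - p_x, A_x, B_x) for M points spaced U apart behind k.

    Hypothetical layout: the nearest point is taken to coincide with k.
    Every point shares A and B (and, implicitly, the same condition C).
    """
    return [(x * U, A, B) for x in range(M)]


def jump_bound(rule: list[tuple[int, int, int]], z: int) -> int:
    """Upper bound on N*U after window z fails: |B_z| + max_x(|A_x| + |k_i - p_ix|)."""
    _, _, B_z = rule[z - 1]                    # z is 1-based, as in the text
    return abs(B_z) + max(abs(A_x) + abs(back) for back, A_x, _ in rule)


rule = fig5_like_rule()
print(jump_bound(rule, z=5))  # 179 = 0 + 169 + 10
```

Under these assumptions the bound allows jumps of up to 179 bytes, so the 11-byte jump in the Fig. 5 example lies well within it.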
Further, judging processing unit 1902 is specifically configured to use a random function to judge whether at least part of the data in window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ meets predetermined condition $C_z$. Specifically, judging processing unit 1902 may use a hash function to make this judgment. Specifically, using a random function to judge whether at least part of the data in window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ meets predetermined condition $C_z$ includes the following:
Select $F$ bytes in window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ and reuse these $F$ bytes $H$ times, obtaining $F\cdot H$ bytes in total. Each byte consists of 8 bits, denoted $a_{m,1}, \dots, a_{m,8}$, the 1st to 8th bits of the $m$-th of the $F\cdot H$ bytes. The bits of the $F\cdot H$ bytes can be written as

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8}\\ a_{2,1} & a_{2,2} & \cdots & a_{2,8}\\ \vdots & \vdots & \ddots & \vdots\\ a_{F\cdot H,1} & a_{F\cdot H,2} & \cdots & a_{F\cdot H,8} \end{pmatrix}.$$

For any bit $a_{m,n}$ (one of $a_{m,1}, \dots, a_{m,8}$), let $V_{a_{m,n}} = 1$ when $a_{m,n} = 1$ and $V_{a_{m,n}} = -1$ when $a_{m,n} = 0$. Applying this mapping to the bits of the $F\cdot H$ bytes yields matrix $V_a$:

$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}}\\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}}\\ \vdots & \vdots & \ddots & \vdots\\ V_{a_{F\cdot H,1}} & V_{a_{F\cdot H,2}} & \cdots & V_{a_{F\cdot H,8}} \end{pmatrix}.$$

Select $F\cdot H\cdot 8$ random numbers from random numbers that follow a normal distribution to form matrix $R$:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8}\\ h_{2,1} & h_{2,2} & \cdots & h_{2,8}\\ \vdots & \vdots & \ddots & \vdots\\ h_{F\cdot H,1} & h_{F\cdot H,2} & \cdots & h_{F\cdot H,8} \end{pmatrix}.$$

Multiply the $m$-th row of $V_a$ element-wise by the $m$-th row of $R$ and sum the products to obtain one value:

$$S_{a_m} = V_{a_{m,1}}h_{m,1} + V_{a_{m,2}}h_{m,2} + \cdots + V_{a_{m,8}}h_{m,8}.$$

In the same way, $S_{a_1}, S_{a_2}, \dots, S_{a_{F\cdot H}}$ are obtained. Count the number $K$ of these values that are greater than 0. When $K$ is even, at least part of the data in window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ meets predetermined condition $C_z$.
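A Python sketch of the $K$-parity test above, assuming the $F$ bytes are reused by simple concatenation and that bits are read most-significant first (the text fixes neither); $F = 5$ and $H = 3$ and the function name are hypothetical. When $F\cdot H$ is odd, complementing every selected bit negates every $S_{a_m}$, turning $K$ into $F\cdot H - K$ and therefore flipping its parity; the accompanying checks use this property.

```python
import random


def meets_k_parity_condition(selected: bytes, H: int, R: list[list[float]]) -> bool:
    """Condition C_z: the number K of positive row sums S_{a_m} is even.

    selected: the F bytes chosen from the window; they are reused H times
              (here by simple concatenation, an assumption) to give F*H bytes.
    R:        F*H rows of 8 normally distributed numbers h_{m,1..8}.
    """
    rows = selected * H                                  # F*H bytes
    if len(R) != len(rows) or any(len(r) != 8 for r in R):
        raise ValueError("R must be (F*H) x 8")
    K = 0
    for byte, h_row in zip(rows, R):
        # Bits a_{m,1}..a_{m,8}, most significant first (assumed), mapped 1 -> +1, 0 -> -1.
        v_row = [1 if (byte >> (7 - n)) & 1 else -1 for n in range(8)]
        s_am = sum(v * h for v, h in zip(v_row, h_row))
        if s_am > 0:
            K += 1
    return K % 2 == 0


F, H = 5, 3                                              # hypothetical sizes
rng = random.Random(42)
R = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(F * H)]
window = bytes([36, 48, 26, 26, 88])
print(meets_k_parity_condition(window, H, R))
```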
Further, judging processing unit 1902 is configured to: when at least part of the data in window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ does not meet predetermined condition $C_z$, jump from point $p_{iz}$ by $N$ minimum data stream cut point search units $U$ along the data stream cut point search direction to obtain a new potential cut point, for which determining unit 1901 performs step a). According to the rule, the left boundary of the window $W_{ic}[p_{ic}-A_c,\ p_{ic}+B_c]$ corresponding to the point $p_{ic}$ determined for the new potential cut point either coincides with the right boundary of window $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$ or lies within the range of $W_{iz}[p_{iz}-A_z,\ p_{iz}+B_z]$. Here, $p_{ic}$ is the point that ranks first when the $M$ points determined for the new potential cut point according to the rule are ordered along the data stream search direction.
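The boundary requirement can be checked with a small Python function. The example assumes a search direction of increasing offsets, interprets "first point in search order" as the point farthest behind the new potential cut point, and uses a Fig. 5-like layout (11 points 0 to 10 bytes behind $k$, $A = 169$, $B = 0$, failure at a point $p_{iz} = 996$); all numbers and names are hypothetical.

```python
def windows_overlap_ok(failed: tuple[int, int], next_first: tuple[int, int]) -> bool:
    """Check the boundary requirement on inclusive [left, right] windows.

    failed:     (p_iz - A_z, p_iz + B_z), the window that did not meet C_z
    next_first: (p_ic - A_c, p_ic + B_c), the window of the first point (in search
                order) determined for the new potential cut point
    The new left boundary must coincide with the failed window's right boundary
    or lie inside the failed window; the first case is contained in the second.
    """
    left_z, right_z = failed
    left_c, _ = next_first
    return left_z <= left_c <= right_z


def first_window_after_jump(p_z: int, jump: int, max_back: int = 10,
                            A: int = 169, B: int = 0) -> tuple[int, int]:
    """Window of the point farthest behind the new potential cut point p_z + jump."""
    k_new = p_z + jump
    p_c = k_new - max_back
    return (p_c - A, p_c + B)


p_z = 996                                  # hypothetical failing point p_iz
failed = (p_z - 169, p_z + 0)              # W_iz = [827, 996]
for jump in (11, 179, 180):
    nxt = first_window_after_jump(p_z, jump)
    print(jump, nxt, windows_overlap_ok(failed, nxt))
# 11 (828, 997) True
# 179 (996, 1165) True
# 180 (997, 1166) False
```

Under these assumptions, a jump of 179 bytes, which equals $\|B_z\| + \max_x\left(\|A_x\| + \|k_i - p_{ix}\|\right) = 0 + 169 + 10$ for this layout, makes the two boundaries coincide, while a 180-byte jump violates the requirement.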
The embodiments shown in FIG. 4 to FIG. 17 provide a deduplication-based method for searching for data stream cut points. In this method, points p_ix and windows W_ix[p_ix - A_x, p_ix + B_x] are determined for the potential cut point k_i, where x takes the consecutive natural numbers 1 to M and M ≥ 2. The M windows can be judged in three ways:
- in parallel, judging for each of the M windows whether at least part of its data satisfies the predetermined condition C_x;
- one after another;
- as a chain: W_i2[p_i2 - A_2, p_i2 + B_2] is judged against C_2 only after at least part of the data in W_i1[p_i1 - A_1, p_i1 + B_1] has been found to satisfy C_1, and so on, until at least part of the data in W_iM[p_iM - A_M, p_iM + B_M] is found to satisfy C_M.

Windows in the other embodiments are judged in the same way, which is not repeated here.
In addition, in practical applications of the embodiments shown in FIG. 4 to FIG. 17, a rule is preset on the deduplication server 103. The rule is: determine M points p_x for a potential cut point k, the window W_x[p_x - A_x, p_x + B_x] corresponding to each point p_x, and the predetermined condition C_x corresponding to each window. Here x takes the consecutive natural numbers 1 to M and M ≥ 2. In this preset rule, A_1, A_2, ..., A_M need not all be equal, B_1, B_2, ..., B_M need not all be equal, and C_1, C_2, ..., C_M need not all be identical.

In the embodiment shown in FIG. 5, the windows W_i1[p_i1 - 169, p_i1], W_i2[p_i2 - 169, p_i2], W_i3[p_i3 - 169, p_i3], W_i4[p_i4 - 169, p_i4], W_i5[p_i5 - 169, p_i5], W_i6[p_i6 - 169, p_i6], W_i7[p_i7 - 169, p_i7], W_i8[p_i8 - 169, p_i8], W_i9[p_i9 - 169, p_i9], W_i10[p_i10 - 169, p_i10] and W_i11[p_i11 - 169, p_i11] all have the same size of 169 bytes. Each window is also judged against its predetermined condition in the same way (see the description above of judging whether at least part of the data in W_i1[p_i1 - 169, p_i1] satisfies C_1). In the embodiment shown in FIG. 11, however, the windows W_i1[p_i1 - 169, p_i1] through W_i10[p_i10 - 169, p_i10] and W_i11[p_i11 - 182, p_i11] may differ in size, and the windows may also be judged in different ways.

In all embodiments, the rule preset on the deduplication server 103 requires matching windows of different potential cut points to be judged in the same way. Window W_i1 is necessarily judged against C_1 in the same way as window W_j1, W_i2 against C_2 in the same way as W_j2, and so on, up to W_iM and W_jM against C_M. Details are not repeated here. Although the embodiments shown in FIG. 4 to FIG. 17 all use M = 11, M is not limited to 11 in practice. A person skilled in the art may choose the value of M as needed, according to the description in the embodiments.
In the embodiments shown in FIG. 4 to FIG. 17, a rule is preset on the deduplication server 103. k_a, k_i, k_j, k_l and k_m are potential cut points obtained while searching for cut points along the data stream cut point search direction, and all of them follow this rule. In the embodiments, the window W_x[p_x - A_x, p_x + B_x] denotes a particular range. Data is selected from this range to judge whether it satisfies the predetermined condition C_x, and either part of the data or all of the data in the range may be selected. The window concept used elsewhere in the embodiments follows the same definition as W_x[p_x - A_x, p_x + B_x] and is not described again here.
In the embodiments shown in FIG. 4 to FIG. 17, (p_x - A_x) and (p_x + B_x) are the two boundaries of the window W_x[p_x - A_x, p_x + B_x]:
- (p_x - A_x) is the boundary that lies, relative to the point p_x, opposite the data stream cut point search direction;
- (p_x + B_x) is the boundary that lies, relative to p_x, in the search direction.

In the embodiments, the search direction shown in FIG. 3 to FIG. 15 is from left to right. In that case (p_x - A_x) is the left boundary and (p_x + B_x) is the right boundary. If the search direction in FIG. 3 to FIG. 15 were from right to left, (p_x - A_x) would be the right boundary and (p_x + B_x) the left boundary.
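The boundary convention above can be made concrete with a small helper that converts (p, A, B) into absolute offsets for either search direction. This is a sketch under stated assumptions: offsets are points (gaps between bytes), direction +1 means left-to-right, and the function name is hypothetical.

```python
def window_bounds(p: int, a: int, b: int, direction: int = 1) -> tuple[int, int]:
    """Absolute (low, high) point offsets of the window W[p - A, p + B].

    direction = +1 for a left-to-right search, -1 for right-to-left.
    'p - A' is measured against the search direction and 'p + B' along it.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    back = p - direction * a      # boundary opposite the search direction
    ahead = p + direction * b     # boundary in the search direction
    return (min(back, ahead), max(back, ahead))
```

With a left-to-right search, window_bounds(1000, 173, -4) gives (827, 996). Because B is negative here, the window ends before the point p.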
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments can be combined with other technologies. The key features of the embodiments can therefore be presented in more complex forms while still containing the key features of the present invention.

In a real environment, a backup cut point may be used. In one embodiment, the rule preset on the deduplication server 103 determines, for the potential cut point k_i:
- 11 points p_x, where x takes the consecutive natural numbers 1 to 11;
- the window W_x[p_x - A_x, p_x + B_x] corresponding to each p_x;
- the predetermined condition C_x corresponding to each window W_x[p_x - A_x, p_x + B_x].

When at least part of the data in each of the 11 windows satisfies C_x, k_i is a data stream cut point. If no cut point has been found when the set maximum data block length is exceeded, a backup preset rule may be used. The backup preset rule is similar to the rule preset on the deduplication server 103. It determines 10 points p_x for the potential cut point k_i, where x takes the consecutive natural numbers 1 to 10, together with the windows W_x[p_x - A_x, p_x + B_x] corresponding to p_x and the corresponding predetermined conditions C_x. When at least part of the data in each of the 10 windows satisfies C_x, k_i is a data stream cut point. If still no data stream cut point has been found when the set maximum data block length is exceeded, the end position of the maximum data block is used as a forced cut point.
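The fallback order described above can be sketched as follows: the primary rule, then the backup rule, then a forced cut at the maximum block length. This is an illustrative sketch only. The rules are passed in as hypothetical predicates over (data, candidate offset), and each candidate is checked byte by byte without jumps for clarity. Rescanning the same range with the backup rule is an assumption about how the backup rule is applied.

```python
from typing import Callable

# Hypothetical rule type: True when every window of candidate offset k
# satisfies its predetermined condition.
Rule = Callable[[bytes, int], bool]


def find_cut(data: bytes, start: int, min_len: int, max_len: int,
             primary: Rule, backup: Rule) -> int:
    """Return the end offset of the chunk that begins at `start`."""
    lo = start + min_len
    hi = min(start + max_len, len(data))
    if lo >= hi:
        return hi                        # remaining data is no longer than the minimum
    for rule in (primary, backup):       # e.g. the 11-window rule, then the 10-window rule
        for k in range(lo, hi + 1):
            if rule(data, k):
                return k
    return hi                            # forced cut at the end of the maximum block
```

A chunker would call this repeatedly, starting each new chunk at the offset the previous call returned.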
The rule preset on the deduplication server 103 determines M points for a potential cut point k. The potential cut point k does not have to exist first; it can also be determined from the M determined points.
An embodiment of the present invention provides a deduplication-based method for searching for data stream cut points. As shown in FIG. 20, the method includes:
A rule is preset on the deduplication server 103. The rule is: determine M windows W_x[k - A_x, k + B_x] for a potential cut point k, and the predetermined condition C_x corresponding to each window W_x[k - A_x, k + B_x]. Here x takes the consecutive natural numbers 1 to M, M ≥ 2, and A_x and B_x are integers. In the embodiment shown in FIG. 3, one way of choosing M is to make M*U no greater than the preset maximum distance between two adjacent data stream cut points, that is, the preset maximum data block length.

Judge whether at least part of the data in the window W_z[k - A_z, k + B_z] satisfies the predetermined condition C_z. Here z is an integer with 1 ≤ z ≤ M, and (k - A_z) and (k + B_z) are the two boundaries of window W_z. When at least part of the data in any window W_z[k - A_z, k + B_z] does not satisfy C_z, jump N bytes from the potential cut point k along the data stream cut point search direction, where N ≤ |B_z| + max_x(|A_x|). Here |B_z| is the absolute value of B_z in W_z[k - A_z, k + B_z], and max_x(|A_x|) is the largest absolute value among the A_x of the M windows. The principle for choosing N is described in a later embodiment. When at least part of the data in each window W_x[k - A_x, k + B_x] of the M windows satisfies C_x, the potential cut point k is a data stream cut point.
Specifically, for the current potential cut point k_i, perform the following steps according to the rule:
Step 2001: According to the rule, determine the window W_iz[k_i - A_z, k_i + B_z] corresponding to the current potential cut point k_i, where i and z are integers and 1 ≤ z ≤ M;
Step 2002: Judge whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] satisfies the predetermined condition C_z;
When at least part of the data in W_iz[k_i - A_z, k_i + B_z] does not satisfy C_z, jump from the current potential cut point k_i along the data stream cut point search direction by N minimum search units U, where N*U is no greater than |B_z| + max_x(|A_x|). This gives a new potential cut point, for which Step 2001 is performed;
When at least part of the data in each window W_ix[k_i - A_x, k_i + B_x] of the M windows of the current potential cut point k_i satisfies the predetermined condition C_x, the current potential cut point k_i is a data stream cut point.
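Steps 2001 and 2002 can be sketched as the loop below. It uses U = 1 byte and the window layout of the FIG. 21 embodiment described later in this section. The following are assumptions, not taken from the text:
- C_x is a stand-in CRC-32 parity test shared by all windows;
- windows are checked in the order x = 1..M;
- after a failure, the jump is the smallest N that moves past every candidate guaranteed to re-contain the failed window, capped at |B_z| + max_x(|A_x|).

All names are hypothetical.

```python
import zlib
from typing import Callable, Optional, Sequence

Window = tuple[int, int]        # (A_x, B_x) in bytes; U = 1 byte

# Window layout of the FIG. 21 rule: A_x = 168 + x, B_x = 1 - x for x = 1..11.
FIG21: list[Window] = [(168 + x, 1 - x) for x in range(1, 12)]


def crc_parity(chunk: bytes) -> bool:
    """Hypothetical stand-in for C_x (identical for all windows): holds for
    roughly half of all inputs."""
    return (zlib.crc32(chunk) & 1) == 0


def jump_after_failure(windows: Sequence[Window], z: int) -> int:
    """Jump N (bytes) after window z (0-based) fails, assuming all C_x are equal.

    A candidate k + n whose layout contains (A_z + n, B_z - n) would
    re-examine exactly the failed bytes, so it is skipped. N is capped at
    |B_z| + max_x |A_x| and is at least 1."""
    a_z, b_z = windows[z]
    layout = set(windows)
    n = 1
    while (a_z + n, b_z - n) in layout:
        n += 1
    bound = abs(b_z) + max(abs(a) for a, _ in windows)
    return max(1, min(n, bound))


def find_cut_point(data: bytes, k: int, limit: int,
                   windows: Sequence[Window] = FIG21,
                   cond: Callable[[bytes], bool] = crc_parity) -> Optional[int]:
    """Steps 2001/2002: first accepted potential cut point in [k, limit], else None."""
    if k < max(a for a, _ in windows):
        raise ValueError("k must leave room for the widest window")
    while k <= limit:
        for z, (a, b) in enumerate(windows):      # Step 2001: window W_iz
            if not cond(data[k - a:k + b]):       # Step 2002: C_z not satisfied
                k += jump_after_failure(windows, z)
                break
        else:                                     # all M windows satisfied
            return k
    return None
```

Every position this policy skips contains an exact copy of a window that has already failed. It therefore returns the same cut point as checking every byte. Larger jumps, up to the stated bound, check fewer positions but can change which cut point is found.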
Further, the rule also includes: at least two windows W_ie[k_i - A_e, k_i + B_e] and W_if[k_i - A_f, k_i + B_f] satisfy |A_e + B_e| = |A_f + B_f| and C_e = C_f. Further, the rule also includes: A_e and A_f are positive integers. Further, the rule also includes: A_e - 1 = A_f and B_e + 1 = B_f. Here |A_e + B_e| denotes the size of window W_ie, and |A_f + B_f| denotes the size of window W_if.
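As a quick check of these constraints, the sketch below tests a pair of (A, B) window offsets against them. C_e = C_f is assumed rather than checked, and the helper names are hypothetical.

```python
Window = tuple[int, int]                    # (A_x, B_x)


def window_size(w: Window) -> int:
    a, b = w
    return abs(a + b)                       # |A + B|, the window size in the text


def pair_rule_holds(we: Window, wf: Window) -> bool:
    """|A_e + B_e| = |A_f + B_f|, A_e > 0, A_f > 0, A_e - 1 = A_f, B_e + 1 = B_f.
    The condition C_e = C_f is assumed and not checked here."""
    (ae, be), (af, bf) = we, wf
    return (window_size(we) == window_size(wf)
            and ae > 0 and af > 0
            and ae - 1 == af and be + 1 == bf)
```

In the FIG. 21 layout (A_x = 168 + x, B_x = 1 - x), every adjacent pair satisfies the rule with e = x + 1 and f = x. Each such window is 169 bytes.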
Further, judging whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] satisfies the predetermined condition C_z specifically includes using a random function to make this judgment. Further, the random function is specifically a hash function.
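The text only says that a hash function may be used, so the construction below is an assumption. This sketch of a hash-based C_x uses BLAKE2b from Python's `hashlib` and accepts a window when its hash value falls in a fraction p of the hash range. It can express the acceptance probabilities of 1/2 and 3/4 that the rules in this section use. The `salt` parameter, which would let different windows use different conditions, is hypothetical.

```python
import hashlib
from fractions import Fraction
from typing import Callable


def make_condition(p: Fraction, salt: bytes = b"") -> Callable[[bytes], bool]:
    """Hypothetical C_x: true for a fraction p of hash-uniform inputs."""
    def cond(window: bytes) -> bool:
        digest = hashlib.blake2b(salt + window, digest_size=8).digest()
        value = int.from_bytes(digest, "big")      # uniform 64-bit value
        # exact for power-of-two denominators such as 2 and 4
        return value % p.denominator < p.numerator
    return cond
```

The condition is a pure function of the window bytes, so any two windows holding the same data receive the same answer.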
When at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] does not satisfy the predetermined condition C_z, jump from the current potential cut point k_i along the data stream cut point search direction by N minimum search units U to obtain the new potential cut point. According to the rule, consider the window W_ic determined for the new potential cut point. Its left boundary either coincides with the right boundary of W_iz[k_i - A_z, k_i + B_z] or lies within W_iz[k_i - A_z, k_i + B_z]. Here W_ic is the window that ranks first when the M windows determined for the new potential cut point according to the rule are sorted along the data stream search direction.
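The boundary requirement above translates into a range of admissible jumps. The sketch assumes U = 1 byte, a left-to-right search, and that "first along the search direction" means the window with the largest A_x, that is, the one with the leftmost left boundary. Requiring that left boundary, k + N - max_x A_x, to lie between k - A_z and k + B_z gives the range computed below. The names are hypothetical.

```python
from typing import Sequence

Window = tuple[int, int]   # (A_x, B_x); U = 1 byte, left-to-right search

# Window layout of the FIG. 21 rule: A_x = 168 + x, B_x = 1 - x for x = 1..11.
FIG21: list[Window] = [(168 + x, 1 - x) for x in range(1, 12)]


def jump_range(windows: Sequence[Window], z: int) -> tuple[int, int]:
    """Jumps N for which the first (leftmost) window of k + N starts inside
    the failed window W_z = [k - A_z, k + B_z]."""
    a_z, b_z = windows[z]
    max_a = max(a for a, _ in windows)
    lower = max(1, max_a - a_z)   # k + N - max_a >= k - A_z
    upper = b_z + max_a           # k + N - max_a <= k + B_z
    return lower, upper


def text_bound(windows: Sequence[Window], z: int) -> int:
    """The bound |B_z| + max_x |A_x| stated in the text."""
    return abs(windows[z][1]) + max(abs(a) for a, _ in windows)
```

Take the failure of W_i5 in the FIG. 21 layout (z = 4, zero-based). The sketch gives N in [6, 175], and text_bound gives 183. The signed upper bound B_z + max_x A_x can never exceed |B_z| + max_x(|A_x|), so every N in this range also respects the bound stated in the text.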
In the embodiments of the present invention, data stream cut points are found by judging, window by window among the M windows, whether at least part of the data satisfies the predetermined condition. When at least part of the data in some window does not satisfy its predetermined condition, a length of N*U is skipped to obtain the next potential cut point, where N*U is no greater than |B_z| + max_x(|A_x|). This improves the efficiency of searching for data stream cut points.
In data deduplication, the average data block size (also called the average chunk size) can be taken into account to keep data block sizes uniform. While the minimum and maximum data block size limits are met, the average data block size can also be controlled, so that the resulting data blocks are uniform in size.

Two factors determine the probability of finding a data stream cut point, which is expressed through P(n): the number M of windows W_x[k - A_x, k + B_x], and the probability that at least part of the data in a window W_x[k - A_x, k + B_x] satisfies its predetermined condition. The former affects the jump length and the latter affects the jump probability, and together they determine the average chunk size. In general, for a fixed average chunk size, the more windows W_x[k - A_x, k + B_x] there are, the higher the probability must be that at least part of the data in a single window satisfies its predetermined condition.

For example, one rule preset on the deduplication server 103 determines 11 windows W_x[k - A_x, k + B_x] for the potential cut point k, where x takes the consecutive natural numbers 1 to 11. For any one of the 11 windows, the probability that at least part of its data satisfies the predetermined condition is 1/2. Another rule preset on the deduplication server 103 determines 24 windows W_x[k - A_x, k + B_x] for the potential cut point k, where x takes the consecutive natural numbers 1 to 24, and the probability for any one of the 24 windows is 3/4. For how this probability is set, see the description of judging whether at least part of the data in W_x[k - A_x, k + B_x] satisfies the predetermined condition.

The number M of windows and this probability together determine P(n). P(n) is the probability that no data stream cut point has been found after searching n minimum search units from the start of the data stream or from the previous data stream cut point. The calculation of P(n) from these two factors is in fact a multi-step (generalized) Fibonacci recurrence, described in detail later. Once P(n) is known:
- 1 - P(n) is the distribution function of the data stream cut point.
- (1 - P(n)) - (1 - P(n-1)) = P(n-1) - P(n) is the probability that the cut point is found at the n-th minimum search unit, that is, the density function of the data stream cut point.
- Integrating n times this density from 4*1024 bytes (the minimum data block length) to 12*1024 bytes (the maximum data block length) gives the expected distance to the data stream cut point, that is, the average chunk size.
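In discrete form, the expected chunk length is a sum of n times the density P(n-1) - P(n). The sketch below computes this sum. Two assumptions are made here and are not stated in the text:
- n counts bytes from the chunk start (U = 1), with P(n) = 1 before the minimum length;
- the probability P(max_len) left over at the maximum length is treated as a forced cut at max_len.

The function name is hypothetical.

```python
from typing import Sequence


def expected_chunk_length(p_miss: Sequence[float], min_len: int, max_len: int) -> float:
    """Expected cut position from P(n), the probability that no cut has been
    found by position n. Requires len(p_miss) > max_len and min_len >= 1."""
    expected = 0.0
    for n in range(min_len, max_len + 1):
        density = p_miss[n - 1] - p_miss[n]       # P(n-1) - P(n): cut found exactly at n
        expected += n * density
    return expected + max_len * p_miss[max_len]   # forced cut at the maximum length
```

If no cut is ever found (P = 1 throughout), the result is the maximum length. If a cut is always found at the first allowed position, the result is the minimum length.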
Building on the data stream cut point search shown in FIG. 3, the implementation shown in FIG. 21 presets a rule on the deduplication server 103. The rule is: determine 11 windows W_x[k - A_x, k + B_x] for the potential cut point k, and the predetermined condition C_x corresponding to each window W_x[k - A_x, k + B_x]. Here x takes the consecutive natural numbers 1 to 11, A_x and B_x are integers, and C_1 = C_2 = C_3 = C_4 = C_5 = C_6 = C_7 = C_8 = C_9 = C_10 = C_11. The offsets are A_1 = 169, B_1 = 0; A_2 = 170, B_2 = -1; A_3 = 171, B_3 = -2; A_4 = 172, B_4 = -3; A_5 = 173, B_5 = -4; A_6 = 174, B_6 = -5; A_7 = 175, B_7 = -6; A_8 = 176, B_8 = -7; A_9 = 177, B_9 = -8; A_10 = 178, B_10 = -9; A_11 = 179, B_11 = -10. The 11 windows are therefore W_1[k - 169, k], W_2[k - 170, k - 1], W_3[k - 171, k - 2], W_4[k - 172, k - 3], W_5[k - 173, k - 4], W_6[k - 174, k - 5], W_7[k - 175, k - 6], W_8[k - 176, k - 7], W_9[k - 177, k - 8], W_10[k - 178, k - 9] and W_11[k - 179, k - 10].

k_a is a data stream cut point, and the data stream cut point search direction shown in FIG. 21 is from left to right. After the minimum data block of 4 KB is skipped from the data stream cut point k_a, the end position of that 4 KB minimum data block becomes the next potential cut point k_i. According to the rule preset on the deduplication server 103, windows W_ix[k_i - A_x, k_i + B_x] are determined for k_i, where in this embodiment x takes the consecutive natural numbers 1 to 11. In the embodiment shown in FIG. 21, the 11 windows determined for k_i are W_i1[k_i - 169, k_i], W_i2[k_i - 170, k_i - 1], W_i3[k_i - 171, k_i - 2], W_i4[k_i - 172, k_i - 3], W_i5[k_i - 173, k_i - 4], W_i6[k_i - 174, k_i - 5], W_i7[k_i - 175, k_i - 6], W_i8[k_i - 176, k_i - 7], W_i9[k_i - 177, k_i - 8], W_i10[k_i - 178, k_i - 9] and W_i11[k_i - 179, k_i - 10].

For each x from 1 to 11, judge whether at least part of the data in W_ix satisfies the predetermined condition C_x. When at least part of the data in every window W_i1 to W_i11 satisfies its condition C_1 to C_11, the current potential cut point k_i is a data stream cut point. Now suppose at least part of the data in one of the 11 windows does not satisfy its predetermined condition, for example W_i5[k_i - 173, k_i - 4] as shown in FIG. 22. Then jump N bytes from the potential cut point k_i along the data stream cut point search direction, where N is no greater than |B_5| + max_x(|A_x|). In the embodiment shown in FIG. 22, the jump is therefore at most 183 bytes, and this embodiment uses N = 7. This yields a new potential cut point, denoted k_j to distinguish it from k_i.

In the embodiment shown in FIG. 21, the rule preset on the deduplication server 103 determines windows W_jx[k_j - A_x, k_j + B_x] for the potential cut point k_j, where x takes the consecutive natural numbers 1 to 11. The 11 windows are W_j1[k_j - 169, k_j], W_j2[k_j - 170, k_j - 1], W_j3[k_j - 171, k_j - 2], W_j4[k_j - 172, k_j - 3], W_j5[k_j - 173, k_j - 4], W_j6[k_j - 174, k_j - 5], W_j7[k_j - 175, k_j - 6], W_j8[k_j - 176, k_j - 7], W_j9[k_j - 177, k_j - 8], W_j10[k_j - 178, k_j - 9] and W_j11[k_j - 179, k_j - 10].

As shown in FIG. 22, the whole range between the potential cut points k_i and k_j must fall within judged windows. To ensure this, the left boundary of the 11th window W_j11[k_j - 179, k_j - 10] must either coincide with the right boundary (k_i - 4) of window W_i5[k_i - 173, k_i - 4] or lie within W_i5[k_i - 173, k_i - 4]. W_j11[k_j - 179, k_j - 10] is the window that ranks first when the M windows determined for k_j according to the rule are sorted along the data stream search direction. Under this constraint, when at least part of the data in W_i5[k_i - 173, k_i - 4] does not satisfy C_5, the distance jumped from k_i along the search direction is no greater than |B_5| + max_x(|A_x|).

Next, for each x from 1 to 11, judge whether at least part of the data in W_jx satisfies C_x. When at least part of the data in every window W_j1 to W_j11 satisfies its condition C_1 to C_11, the current potential cut point k_j is a data stream cut point, and the data between k_a and k_j forms one data block. The minimum chunk size of 4 KB is then skipped, in the same way as from k_a, to obtain the next potential cut point. That point is then judged against the rule preset on the deduplication server 103 to decide whether it is a data stream cut point. If k_j is judged not to be a data stream cut point, the next potential cut point is obtained in the same way as from k_i. It is then judged against the rule preset on the deduplication server 103, using the method above. If no data stream cut point has been found when the set maximum data block length is exceeded, the end position of the maximum data block is used as a forced cut point.
In the embodiment shown in FIG. 21, judgment follows the rule preset on the deduplication server 103 and starts with whether at least part of the data in W_i1[k_i - 169, k_i] satisfies C_1. Suppose that at least part of the data in W_i1[k_i - 169, k_i], W_i2[k_i - 170, k_i - 1], W_i3[k_i - 171, k_i - 2] and W_i4[k_i - 172, k_i - 3] satisfies C_1, C_2, C_3 and C_4 respectively. Suppose further that at least part of the data in W_i5[k_i - 173, k_i - 4] does not satisfy C_5.

Consider jumping 6 bytes from k_i along the data stream cut point search direction. The end position of the 6th byte gives a new potential cut point, denoted k_g here to distinguish it from the other potential cut points. According to the rule preset on the deduplication server 103, 11 windows are determined for k_g: W_g1[k_g - 169, k_g], W_g2[k_g - 170, k_g - 1], W_g3[k_g - 171, k_g - 2], W_g4[k_g - 172, k_g - 3], W_g5[k_g - 173, k_g - 4], W_g6[k_g - 174, k_g - 5], W_g7[k_g - 175, k_g - 6], W_g8[k_g - 176, k_g - 7], W_g9[k_g - 177, k_g - 8], W_g10[k_g - 178, k_g - 9] and W_g11[k_g - 179, k_g - 10]. For each x from 1 to 11, it is judged whether at least part of the data in W_gx satisfies C_x.

Window W_g11[k_g - 179, k_g - 10] coincides with window W_i5[k_i - 173, k_i - 4], and C_5 = C_11. So when at least part of the data in W_i5[k_i - 173, k_i - 4] fails C_5, the point k_g reached by a 6-byte jump from k_i is certain not to qualify as a data stream cut point. A jump of 6 bytes therefore causes repeated calculation. Jumping 7 bytes from k_i along the search direction reduces this repeated calculation, which is more efficient and increases the speed of searching for data stream cut points. When the probability that at least part of the data in a window W_x[k - A_x, k + B_x] of the preset rule satisfies C_x is 1/2, a jump is performed with probability 1/2. A single jump is then at most |B_11| + |A_11| = 189 bytes.
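The overlap argument above can be checked mechanically. The sketch uses the FIG. 21 offsets and relative positions, with k_i placed at 0. It tests whether any window of the candidate k_i + n is exactly the failed window W_i5, assuming all C_x are equal. The helper names are hypothetical.

```python
FIG21 = [(168 + x, 1 - x) for x in range(1, 12)]   # (A_x, B_x), x = 1..11


def abs_window(k: int, a: int, b: int) -> tuple[int, int]:
    """Boundaries (k - A, k + B) of a window relative to point k."""
    return (k - a, k + b)


def repeats_failed_window(windows, z: int, n: int) -> bool:
    """True if some window of k + n equals failed window z of k (k = 0)."""
    failed = abs_window(0, *windows[z])
    return any(abs_window(n, a, b) == failed for a, b in windows)
```

For the failure of W_i5 (z = 4), every jump n from 1 to 6 lands on a point whose window set repeats W_i5, with W_g11 repeating it at n = 6. The first jump that does not repeat it is n = 7, which matches the N = 7 used in the embodiment.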
In this embodiment, the predetermined rule is: determine 11 windows W_x[k - A_x, k + B_x] for the potential cut point k, and the predetermined condition C_x that at least part of the data in each window must satisfy. Here x takes the consecutive natural numbers 1 to 11, and A_x and B_x are integers with A_x = 168 + x and B_x = 1 - x (A_1 = 169, B_1 = 0; A_2 = 170, B_2 = -1; ...; A_11 = 179, B_11 = -10). The conditions are all equal, C_1 = C_2 = ... = C_11, and the probability that at least part of the data in W_x[k - A_x, k + B_x] satisfies C_x is 1/2.

The 11 windows selected for the potential cut point k are consecutive, and P(n) can be calculated from these two factors. Both the selection of the 11 windows and the judgment of each window against C_x follow the rule preset on the deduplication server 103. Whether at least part of the data in each of 11 consecutive windows satisfies C_x therefore determines whether the potential cut point k is a data stream cut point. We call the gap between two adjacent bytes a point. P(n) is the probability that no run of 11 consecutive windows satisfying the condition exists among n consecutive windows, that is, the probability that no data stream cut point exists.

After jumping the minimum chunk size of 4 KB from the file header or from the previous cut point, roll back 10 bytes opposite the search direction to reach the 4086th point. No data stream cut point can exist before the minimum chunk size, so P(4086) = 1, and likewise P(4087) = 1, ..., P(4095) = 1. At the 4096th point, that is, at the minimum chunk size, at least part of the data in every one of the 11 windows satisfies C_x with probability (1/2)^11. A data stream cut point therefore exists with probability (1/2)^11 and does not exist with probability 1 - (1/2)^11, so P(4096) = 1 - (1/2)^11.
At the n-th window, P(n) can be derived recursively by splitting into 12 cases.
Case 1: at least part of the data in the n-th window does not satisfy the predetermined condition, which happens with probability 1/2. The n-1 windows before the n-th window then contain no run of 11 consecutive windows that all satisfy the predetermined condition with probability P(n-1), so P(n) includes the term 1/2 * P(n-1). If the n-th window fails but the preceding n-1 windows do contain 11 consecutive satisfying windows, the case does not contribute to P(n).
Case 2: at least part of the data in the n-th window satisfies the predetermined condition (probability 1/2), and at least part of the data in the (n-1)-th window does not (probability 1/2). The n-2 windows before the (n-1)-th window then contain no run of 11 consecutive satisfying windows with probability P(n-2), so P(n) includes the term 1/2 * 1/2 * P(n-2). If the n-th window satisfies and the (n-1)-th fails, but the n-2 windows before the (n-1)-th window do contain 11 consecutive satisfying windows, the case does not contribute to P(n).
Continuing in the same way, Case 11: at least part of the data in each of the n-th to (n-9)-th windows satisfies the predetermined condition (probability (1/2)^10), and at least part of the data in the (n-10)-th window does not (probability 1/2). The n-11 windows before the (n-10)-th window then contain no run of 11 consecutive satisfying windows with probability P(n-11), so P(n) includes the term (1/2)^10 * 1/2 * P(n-11). If the n-th to (n-9)-th windows satisfy and the (n-10)-th fails, but the n-11 windows before the (n-10)-th window do contain 11 consecutive satisfying windows, the case does not contribute to P(n).
Case 12: at least part of the data in each of the n-th to (n-10)-th windows satisfies the predetermined condition, with probability (1/2)^11. A cut point then exists, so this case does not contribute to P(n).
Therefore,

$$P(n) = \tfrac{1}{2}P(n-1) + \left(\tfrac{1}{2}\right)^{2} P(n-2) + \cdots + \left(\tfrac{1}{2}\right)^{11} P(n-11).$$

Another preset rule is: determine 24 windows W_x[k - A_x, k + B_x] for the potential cut point k, and the predetermined condition C_x corresponding to each window W_x[k - A_x, k + B_x]. Here x takes the consecutive natural numbers 1 to 24, with A_x = 168 + x and B_x = 1 - x (A_1 = 169, B_1 = 0; A_2 = 170, B_2 = -1; ...; A_11 = 179, B_11 = -10; ...; A_24 = 192, B_24 = -23), and C_1 = C_2 = ... = C_24. The probability that at least part of the data in W_x[k - A_x, k + B_x] satisfies C_x is 3/4, and P(n) can again be calculated from these two factors.
That is, whether potential cut point k is a data stream cut point is determined by whether at least part of the data in each of the 24 consecutive windows satisfies predetermined condition C_x. P(n) can then be calculated with the following formulas:
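$$P(1) = P(2) = \cdots = P(23) = 1,\qquad P(24) = 1 - \left(\tfrac{3}{4}\right)^{24},$$
$$P(n) = \tfrac{1}{4}P(n-1) + \tfrac{1}{4}\cdot\tfrac{3}{4}\,P(n-2) + \cdots + \tfrac{1}{4}\left(\tfrac{3}{4}\right)^{23}P(n-24).$$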
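The base values and recurrence above can be evaluated numerically with a simple dynamic program. The Python sketch below is an illustration, not the patent's own calculation. It assumes that the index n in P(n) counts potential cut points examined after the 4 KB minimum chunk has been skipped, so that P(5*1024) in the text corresponds to 1024 examined positions; under that assumption the printed values come out close to the figures quoted in the following paragraph.

```python
def no_cut_probabilities(n_max, r, p):
    """Return [P(0), ..., P(n_max)] for r consecutive windows, each meeting
    its condition independently with probability p."""
    q = 1.0 - p
    # weights[k-1] = q * p**(k-1): k-1 windows met, then one failure.
    weights = [q * p ** (k - 1) for k in range(1, r + 1)]
    P = [1.0] * (n_max + 1)  # P(n) = 1 for n < r
    for n in range(r, n_max + 1):
        P[n] = sum(w * P[n - k] for k, w in enumerate(weights, start=1))
    return P


MIN_CHUNK = 4 * 1024  # minimum chunk skipped before searching starts

P = no_cut_probabilities(12 * 1024, r=24, p=0.75)
print(f"P(24) = {P[24]:.6f}, 1 - (3/4)^24 = {1 - 0.75 ** 24:.6f}")
for kb in (5, 11, 12):
    # Assumption: offsets in the text are measured from the chunk start,
    # so only kb*1024 - MIN_CHUNK positions have actually been examined.
    print(f"P({kb}*1024) ~ {P[kb * 1024 - MIN_CHUNK]:.2f}")
```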
Calculation gives P(5*1024)=0.78, P(11*1024)=0.17 and P(12*1024)=0.13; that is, after 12 KB have been searched from the start of the data stream (or from the previous data stream cut point), there is still a 13% probability that no data stream cut point has been found, in which case a forced split is performed. From this probability the density function of the data stream cut point can be obtained, and integration shows that on average a data stream cut point is found after about 7.6 KB have been searched from the start of the data stream (or from the previous data stream cut point); that is, the average chunk length is about 7.6 KB. A traditional CDC algorithm, which uses a single window that satisfies its condition with probability 1/2^12 instead of 11 consecutive windows that each satisfy the predetermined condition with probability 1/2, can also reach an average chunk length of 7.6 KB.
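The averaging step can be approximated directly. With a forced cut at the maximum chunk size, the expected chunk length is the minimum chunk size plus the sum of P(n) over the searchable positions (the tail-sum formula E[min(T, M)] = Σ_{n=0}^{M-1} P(n)). The Python sketch below assumes independent windows, a 4 KB minimum and a 12 KB maximum, and compares the 24-window rule, the 11-window rule and a single window with probability 1/2^12. It is an illustration rather than the text's own integration, so its figures are only expected to be comparable to the quoted 7.6 KB; the 11-window rule and the single 1/2^12 window should come out almost identical.

```python
def no_cut_probabilities(n_max, r, p):
    """[P(0), ..., P(n_max)]: probability that n positions contain no run of r
    consecutive windows meeting their condition (probability p each)."""
    q = 1.0 - p
    weights = [q * p ** (k - 1) for k in range(1, r + 1)]
    P = [1.0] * (n_max + 1)
    for n in range(r, n_max + 1):
        P[n] = sum(w * P[n - k] for k, w in enumerate(weights, start=1))
    return P


def mean_chunk_length(r, p, min_chunk=4 * 1024, max_chunk=12 * 1024):
    """Expected chunk length in bytes with a forced cut at max_chunk.

    If T is the number of positions searched until the first cut and
    M = max_chunk - min_chunk, then E[min(T, M)] = sum_{n=0}^{M-1} P(n).
    """
    m = max_chunk - min_chunk
    return min_chunk + sum(no_cut_probabilities(m - 1, r, p))


for label, r, p in [("24 windows, p = 3/4", 24, 0.75),
                    ("11 windows, p = 1/2", 11, 0.5),
                    ("1 window, p = 1/2^12", 1, 2.0 ** -12)]:
    print(f"{label}: {mean_chunk_length(r, p) / 1024:.2f} KiB")
```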
On the basis of the data flow point cutpoint shown in Fig. 3 is searched, the embodiment party shown in Figure 23 In formula, being preset with rule on duplicate removal server 103, described rule is: for potential cut-point k Determine 11 window Wx[k-Ax,k+Bx] and window Wx[k-Ax,k+Bx] corresponding predetermined condition Cx, wherein, x is 1 to 11 continuous print natural numbers, AxAnd BxFor integer.Wherein, window Wx [k-Ax,k+BxIn], at least part of data meet predetermined condition CxProbability be 1/2, A1=171, B1 =-2;A2=172, B2=-3;A3=173, B3=-4;A4=174, B4=-5;A5=175, B5=-6;A6=176, B6=-7;A7=177, B7=-8;A8=178, B8=-9;A9=179, B9=-10;A10=170, B10=-1;A11 =169, B11=0, and C1=C2=C3=C4=C5=C6=C7=C8=C9=C10=C11。kaFor number According to flow point cutpoint, the cutpoint search direction of data flow point shown in Figure 23 is from left to right, from data Flow point cutpoint kaAfter skipping minimum data block 4KB, in minimum data block 4KB end position conduct Next potential cut-point ki, according to the rule preset on duplicate removal server 103, for potential Cut-point kiDetermine Wx[k-Ax,k+Bx] and window Wx[k-Ax,k+Bx] corresponding pre-conditioned Cx, wherein x is 1 to 11 continuous print natural numbers.11 windows determined are respectively Wi1[ki-171, ki-2]、Wi2[ki-172,ki-3]、Wi3[ki-173,ki-4]、Wi4[ki-174,ki-5]、Wi5[ki-175, ki-6]、Wi6[ki-176,ki-7]、Wi7[ki-177,ki-8]、Wi8[ki-178,ki-9]、Wi9[ki-179, ki-10]、Wi10[ki-170,ki-1] and Wi11[ki-169,ki].Judge Wi1[ki-171,ki-2] in extremely Whether small part data meet predetermined condition C1, judge Wi2[ki-172,ki-3] at least partly count in According to whether meeting predetermined condition C2, judge Wi3[ki-173,ki-4] in, at least part of data are the fullest Foot predetermined condition C3, judge Wi4[ki-174,ki-5] in, whether at least part of data meet predetermined bar Part C4, judge Wi5[ki-175,ki-6] in, whether at least part of data meet predetermined condition C5, sentence Disconnected Wi6[ki-176,ki-7] in, whether at least part of 
data meet predetermined condition C6, judge Wi7[ki -177,ki-8] in, whether at least part of data meet predetermined condition C7, judge Wi8[ki-178,ki-9] In at least partly data whether meet predetermined condition C8, judge Wi9[ki-179,ki-10] at least Whether part data meet predetermined condition C9, judge Wi10[ki-170,ki-1] at least part of data in Whether meet predetermined condition C10With judge Wi11[ki-169,kiIn], whether at least part of data meet Predetermined condition C11.When judging window Wi1In at least partly data meet predetermined condition C1, window Wi2In at least partly data meet predetermined condition C2, window Wi3In at least partly data meet pre- Fixed condition C3, window Wi4In at least partly data meet predetermined condition C4, window Wi5In at least Part data meet predetermined condition C5, window Wi6In at least partly data meet predetermined condition C6、 Window Wi7In at least partly data meet predetermined condition C7, window Wi8In at least partly data full Foot predetermined condition C8, window Wi9In at least partly data meet predetermined condition C9, window Wi10In At least partly data meet predetermined condition C10With window Wi11In at least partly data meet predetermined Condition C11Time, the most current potential cut-point kiFor data flow point cutpoint.When arbitrary in 11 windows When in individual window, at least part of data are unsatisfactory for the predetermined condition of correspondence, as shown in figure 24, Wi3 [pi3-169,pi3In], at least part of data are unsatisfactory for predetermined condition C3, put pi3Along data flow point It is described as a example by cutpoint search direction 11 bytes of jump.As shown in figure 24, when judging W3No Meet predetermined condition C3Time, with kiFor starting point, jump along data flow point cutpoint search direction N number of byte, the most N number of byte is not more than ‖ B3‖+maxx(‖Ax‖), in the present embodiment, N=7, at the end position of the 7th byte, it is thus 
achieved that next potential cut-point, for potential Cut-point kiDifference, is expressed as k by new potential cut-point herej, according at duplicate removal server The rule preset on 103, for potential cut-point kjDetermine 11 window Wjx[kj-Ax,kj+Bx], It is respectively Wj1[kj-171,kj-2]、Wj2[kj-172,kj-3]、Wj3[kj-173,kj-4]、Wj4[kj -174,kj-5]、Wj5[kj-175,kj-6]、Wj6[kj-176,kj-7]、Wj7[kj-177,kj-8]、Wj8 [kj-178,kj-9]、Wj9[kj-179,kj-10]、Wj10[kj-170,kj-1] and Wj11[kj-169,kj]。 Judge Wj1[kj-171,kj-2] in, whether at least part of data meet predetermined condition C1, judge Wj2 [kj-172,kj-3] in, whether at least part of data meet predetermined condition C2, judge Wj3[kj-173, kj-4] in, whether at least part of data meet predetermined condition C3, judge Wj4[kj-174,kj-5] in At least partly whether data meet predetermined condition C4, judge Wj5[kj-175,kj-6] at least partly Whether data meet predetermined condition C5, judge Wj6[kj-176,kj-7] in, whether at least part of data Meet predetermined condition C6, judge Wj7[kj-177,kj-8] in, whether at least part of data meet predetermined Condition C7, judge Wj8[kj-178,kj-9] in, whether at least part of data meet predetermined condition C8、 Judge Wj9[kj-179,kj-10] in, whether at least part of data meet predetermined condition C9, judge Wj10[kj-170,kj-1] in, whether at least part of data meet predetermined condition C10With judge Wj11[kj -169,kjIn], whether at least part of data meet predetermined condition C11.Certainly in the embodiment of the present invention In, it is judged that potential cut-point kaAlso in compliance with this principle when whether being data flow point cutpoint, specifically real The most no longer describe, be referred to judge potential cut-point kiDescription.When judging window Wj1In At least partly data meet predetermined condition C1, window Wj2In at least partly data meet predetermined bar Part C2, window Wj3In at least partly data meet predetermined condition C3, window Wj4In at least partly Data meet 
predetermined condition C4, window Wj5In at least partly data meet predetermined condition C5, window Mouth Wj6In at least partly data meet predetermined condition C6, window Wj7In at least partly data meet Predetermined condition C7, window Wj8In at least partly data meet predetermined condition C8, window Wj9In extremely Small part data meet predetermined condition C9, window Wj10In at least partly data meet predetermined condition C10With window Wj11In at least partly data meet predetermined condition C11Time, the most current potential segmentation Point kjFor data flow point cutpoint, kjWith kaBetween data constitute 1 data block, simultaneously according to With kaIdentical mode skips minimum piecemeal size 4KB, it is thus achieved that next potential cut-point, and According to the rule preset on duplicate removal server 103, it is judged that whether next potential cut-point is Data flow point cutpoint.When judging potential cut-point kjWhen not being data flow point cutpoint, according to ki Identical mode obtains next potential cut-point, and presets according on duplicate removal server 103 Rule and said method judge whether next potential cut-point is data flow point cutpoints.When super Cross the maximum data block set when the most not finding data flow point cutpoint, then from maximum data block End position as force-splitting point.Certainly the enforcement of the method by maximum data block length and Constitute the size constraint of the file of this data stream, do not repeat them here.
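The Fig. 23 procedure can be sketched in Python as follows. Several details are assumptions not fixed by the text: window bounds [k-A_x, k+B_x] are read as inclusive byte indices, a potential cut point k means the chunk ends with byte k, and the predetermined condition is replaced by a hypothetical test (the lowest bit of `zlib.crc32` over the window is 0, which holds with probability about 1/2 on random data). The jump length is derived by skipping every later candidate that would re-test exactly the same byte range under the same condition; this is one rule consistent with the example, and it reproduces N=7 for a failing W_i3.

```python
import os
import zlib

# Fig. 23 rule: (A_x, B_x) for x = 1..11; all windows share one condition.
WINDOWS = [(171, -2), (172, -3), (173, -4), (174, -5), (175, -6), (176, -7),
           (177, -8), (178, -9), (179, -10), (170, -1), (169, 0)]
MIN_CHUNK, MAX_CHUNK = 4 * 1024, 12 * 1024


def satisfies(window: bytes) -> bool:
    """Hypothetical predetermined condition, met with probability ~1/2."""
    return zlib.crc32(window) & 1 == 0


def jump_length(x: int) -> int:
    """Bytes to jump after window x fails: candidate k+d is skipped when one
    of its windows, [k+d-A_y, k+d+B_y], equals the failed [k-A_x, k+B_x]."""
    a_x, b_x = WINDOWS[x]
    redundant = {a_y - a_x for a_y, b_y in WINDOWS
                 if a_y > a_x and a_y + b_y == a_x + b_x}
    d = 1
    while d in redundant:
        d += 1
    return d


def find_cut(data: bytes, start: int) -> int:
    """Index of the last byte of the chunk that starts at `start`."""
    limit = min(start + MAX_CHUNK, len(data)) - 1  # forced cut position
    max_b = max(b for _, b in WINDOWS)
    k = start + MIN_CHUNK - 1  # end of the minimum chunk
    while k <= limit and k + max_b < len(data):
        for x, (a, b) in enumerate(WINDOWS):
            if not satisfies(data[k - a:k + b + 1]):
                k += jump_length(x)
                break
        else:
            return k  # every window met its condition
    return limit


def chunk(data: bytes) -> list:
    chunks, start = [], 0
    while start < len(data):
        end = find_cut(data, start)
        chunks.append(data[start:end + 1])
        start = end + 1
    return chunks


data = os.urandom(64 * 1024)
parts = chunk(data)
assert b"".join(parts) == data
print(len(parts), "chunks:", [len(c) for c in parts])
```

Windows are tested in the order W_1 to W_11 and the first failure decides the jump, so a different test order could produce different (but equally valid) jumps.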
On the basis of the data flow point cutpoint shown in Fig. 3 is searched, the embodiment party shown in Figure 25 In formula, being preset with rule on duplicate removal server 103, described rule is: for potential cut-point k Determine 11 window Wx[k-Ax,k+Bx] and window Wx[k-Ax,k+Bx] corresponding predetermined condition Cx, wherein x is 1 to 11 consecution natural numbers, A1=166, B1=3;A2=167, B2=2;A3=168, B3 =1;A4=169, B4=0;A5=170, B5=-1;A6=171, B6=-2;A7=172, B7=-3;A8=173, B8=-4;A9=174, B9=-5;A10=175, B10=-6;A11=176, B11=-7;And C1=C2=C3=C4 =C5=C6=C7=C8=C9=C10=C11, then 11 windows are respectively W1[k-166,k+3]、W2 [k-167,k+2]、W3[k-168,k+1]、W4[k-169,k]、W5[k-170,k-1]、W6[k-171, k-2]、W7[k-172,k-3]、W8[k-173,k-4]、W9[k-174,k-5]、W10[k-175,k-6] And W11[k-176,k-7]。kaFor data flow point cutpoint, the cutpoint of data flow point shown in Figure 25 is looked into Looking for direction is from left to right, from data flow point cutpoint kaAfter skipping minimum data block 4KB, minimum Data block 4KB end position is as next potential cut-point ki, in the present embodiment, according to The rule preset on duplicate removal server 103, for potential cut-point kiDetermine 11 window Wix[k- Ax,k+Bx] and window Wix[k-Ax,k+Bx] corresponding predetermined condition Cx, x is respectively 1 to 11 even Continuous natural number.In the embodiment shown in Figure 25, for potential cut-point kiDetermine 11 windows Mouthful, respectively Wi1[ki-166,ki+3]、Wi2[ki-167,ki+2]、Wi3[ki-168,ki+1]、 Wi4[ki-169,ki]、Wi5[ki-170,ki-1]、Wi6[ki-171,ki-2]、Wi7[ki-172,ki-3]、 Wi8[ki-173,ki-4]、Wi9[ki-174,ki-5]、Wi10[ki-175,ki-6] and Wi11[ki-176,ki-7]。 Judge Wi1[ki-166,ki+ 3] in, whether at least part of data meet predetermined condition C1, judge Wi2 [ki-167,ki+ 2] in, whether at least part of data meet predetermined condition C2, judge Wi3[ki-168, ki+ 1] in, whether at least part of data meet predetermined condition C3, judge Wi4[ki-169,kiIn] extremely Whether small part data meet predetermined condition C4, 
judge Wi5[ki-170,ki-1] at least partly count in According to whether meeting predetermined condition C5, judge Wi6[ki-171,ki-2] in, at least part of data are the fullest Foot predetermined condition C6, judge Wi7Wi7[ki-172,ki-3] in, whether at least part of data meet pre- Fixed condition C7, judge Wi8[ki-173,ki-4] in, whether at least part of data meet predetermined condition C8、 Judge Wi9[ki-174,ki-5] in, whether at least part of data meet predetermined condition C9, judge Wi10 [ki-175,ki-6] in, whether at least part of data meet predetermined condition C10With judge Wi11[ki-176, ki-7] in, whether at least part of data meet predetermined condition C11.When judging window Wi1In at least portion Divided data meets predetermined condition C1, window Wi2In at least partly data meet predetermined condition C2、 Window Wi3In at least partly data meet predetermined condition C3, window Wi4In at least partly data full Foot predetermined condition C4, window Wi5In at least partly data meet predetermined condition C5, window Wi6In At least partly data meet predetermined condition C6, window Wi7In at least partly data meet predetermined bar Part C7, window Wi8In at least partly data meet predetermined condition C8, window Wi9In at least partly Data meet predetermined condition C9, window Wi10In at least partly data meet predetermined condition C10And window Mouth Wi11In at least partly data meet predetermined condition C11Time, the most current potential cut-point kiFor number According to flow point cutpoint.When data at least part of in any one window in 11 windows are unsatisfactory for correspondence During predetermined condition, as shown in figure 26, Wi7[ki-172,ki-3], then from potential cut-point kiAlong The data flow point cutpoint search direction N number of byte of jump, the most N number of byte is not more than ‖ B7‖+ maxx(‖Ax‖), in the embodiment shown in Figure 26, N number of byte of jumping is not more than 185 Individual byte, in the present embodiment, 
N=5, obtain new potential cut-point, for potential segmentation Point kiDifference, is expressed as k by new potential cut-point herej, according to the embodiment party shown in Figure 25 The rule preset on duplicate removal server 103 in formula, for potential cut-point kjThe window determined is 11, respectively Wj1[kj-166,kj+3]、Wj2[kj-167,kj+2]、Wj3[kj-168,kj+1]、 Wj4[kj-169,kj]、Wj5[kj-170,kj-1]、Wj6[kj-171,kj-2]、Wj7[kj-172,kj-3]、 Wj8[kj-173,kj-4]、Wj9[kj-174,kj-5]、Wj10[kj-175,kj-6] and Wj11[kj-176,kj -7].Judge Wj1[kj-166,kj+ 3] in, whether at least part of data meet predetermined condition C1, sentence Disconnected Wj2[kj-167,kj+ 2] in, whether at least part of data meet predetermined condition C2, judge Wj3[kj -168,kj+ 1] in, whether at least part of data meet predetermined condition C3, judge Wj4[kj-169,kj] In at least partly data whether meet predetermined condition C4, judge Wj5[kj-170,kj-1] at least portion in Whether divided data meets predetermined condition C5, judge Wj6[kj-171,kj-2] in, at least part of data are No meet predetermined condition C6, judge Wj7[kj-172,kj-3] in, whether at least part of data meet pre- Fixed condition C7, judge Wj8[kj-173,kj-4] in, whether at least part of data meet predetermined condition C8、 Judge Wj9[kj-174,kj-5] in, whether at least part of data meet predetermined condition C9, judge Wj10 [kj-175,kj-6] in, whether at least part of data meet predetermined condition C10With judge Wj11[kj-176, kj-7] in, whether at least part of data meet predetermined condition C11.The most in embodiments of the present invention, Judge potential cut-point kaAlso in compliance with this principle when whether being data flow point cutpoint, implement not Describe again, be referred to judge potential cut-point kiDescription.When judging window Wj1In at least Part data meet predetermined condition C1, window Wj2In at least partly data meet predetermined condition C2、 Window Wj3In at least partly data meet predetermined condition C3, window 
Wj4In at least partly data full Foot predetermined condition C4, window Wj5In at least partly data meet predetermined condition C5, window Wj6In At least partly data meet predetermined condition C6, window Wj7In at least partly data meet predetermined bar Part C7, window Wj8In at least partly data meet predetermined condition C8, window Wj9In at least partly Data meet predetermined condition C9, window Wj10In at least partly data meet predetermined condition C10And window Mouth Wj11In at least partly data meet predetermined condition C11Time, the most current potential cut-point kjFor number According to flow point cutpoint, kjWith kaBetween data constitute 1 data block, simultaneously according to kaIdentical Mode skip minimum piecemeal size 4KB, it is thus achieved that next potential cut-point, and according to going The rule preset on weight server 103, it is judged that whether next potential cut-point is data flow point Cutpoint.When judging potential cut-point kjWhen not being data flow point cutpoint, according to kiIdentical side Formula obtain next potential cut-point, and according on duplicate removal server 103 preset rule and Said method judges whether next potential cut-point is data flow point cutpoints.When exceeding setting When maximum data block does not the most find data flow point cutpoint, then from the stop bits of maximum data block Put as force-splitting point.
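Applying the same jump rule to the Fig. 25 windows shows how the jump depends on which window fails. In the Python sketch below, the rule (skip every candidate that would re-test exactly the failed byte range) is an assumption consistent with the examples rather than a formula given in the text; a failing W_i7 gives N=5, as in the text, and every jump stays within the stated bound |B_x| + max_x(|A_x|).

```python
# Fig. 25 rule: (A_x, B_x) for x = 1..11, all sharing one condition.
WINDOWS = [(166, 3), (167, 2), (168, 1), (169, 0), (170, -1), (171, -2),
           (172, -3), (173, -4), (174, -5), (175, -6), (176, -7)]


def jump_length(windows, x):
    """Smallest d >= 1 such that no window of candidate k+d covers exactly the
    byte range of the failed window x (all windows share one condition)."""
    a_x, b_x = windows[x]
    redundant = {a_y - a_x for a_y, b_y in windows
                 if a_y > a_x and a_y + b_y == a_x + b_x}
    d = 1
    while d in redundant:
        d += 1
    return d


def stated_bound(windows, x):
    """The upper bound N <= |B_x| + max_x(|A_x|) given in the text."""
    return abs(windows[x][1]) + max(abs(a) for a, _ in windows)


for x in range(len(WINDOWS)):
    n = jump_length(WINDOWS, x)
    assert n <= stated_bound(WINDOWS, x)
    print(f"W{x + 1} fails -> jump {n} byte(s)")
```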
On the basis of the data flow point cutpoint shown in Fig. 3 is searched, in the reality shown in Figure 27 Executing in mode, be preset with rule on duplicate removal server 103, described rule is: be latent 11 window W are determined at cut-point kx[k-Ax,k+Bx] and window Wx[k-Ax,k+Bx] right Predetermined condition C answeredx, wherein x is 1 to 11 continuous print natural numbers, A1=169, B1=0; A2=170, B2=-1;A3=171, B3=-2;A4=172, B4=-3;A5=173, B5=-4; A6=174, B6=-5;A7=175, B7=-6;A8=176, B8=-7;A9=177, B9=-8; A10=168, B10=1;A11=179, B11=3;And C1=C2=C3=C4=C5=C6=C7=C8 =C9=C10≠C11, then 11 windows are respectively W1[k-169,k]、W2[k-170,k-1]、W3 [k-171,k-2]、W4[k-172,k-3]、W5[k-173,k-4]、W6[k-174,k-5]、W7 [k-175,k-6]、W8[k-176,k-7]、W9[k-177,k-8]、W10[k-168, k+1] and W11 [k-179,k+3]。kaFor data flow point cutpoint, the cutpoint of data flow point shown in Figure 27 is searched Direction is from left to right, from data flow point cutpoint kaAfter skipping minimum data block 4KB, Small data block 4KB end position is as next potential cut-point ki, in the present embodiment, According to the rule preset on duplicate removal server 103, for potential cut-point kiDetermine window Wix [ki-Ax, ki+Bx], x is respectively 1 to 11 continuous print natural numbers, shown in Figure 27 In embodiment, for potential cut-point kiDetermine that 11 windows are respectively Wi1[ki-169,ki]、 Wi2[ki-170,ki-1]、Wi3[ki-171,ki-2]、Wi4[ki-172,ki-3]、Wi5[ki-173, ki-4]、Wi6[ki-174,ki-5]、Wi7[ki-175,ki-6]、Wi8[ki-176,ki-7]、Wi9 [ki-177,ki-8]、Wi10[ki-168,ki+ 1] and Wi11[ki-179,ki+3].Judge Wi1[ki -169,kiIn], whether at least part of data meet predetermined condition C1, judge Wi2[ki-170,ki -1] in, whether at least part of data meet predetermined condition C2, judge Wi3[ki-171,ki-2] in At least partly whether data meet predetermined condition C3, judge Wi4[ki-172,ki-3] at least Whether part data meet predetermined condition C4, judge Wi5[ki-173,ki-4] at least partly Whether data meet predetermined condition C5, judge 
Wi6[ki-174,ki-5] at least part of data in Whether meet predetermined condition C6, judge Wi7[ki-175,ki-6] in, whether at least part of data Meet predetermined condition C7, judge Wi8[ki-176,ki-7] in, whether at least part of data meet Predetermined condition C8, judge Wi9[ki-177,ki-8] in, whether at least part of data meet predetermined Condition C9, judge Wi10[ki-168,ki+ 1] in, whether at least part of data meet predetermined condition C10With judge Wi11[ki-179,ki+ 3] in, whether at least part of data meet predetermined condition C11。 When judging window Wi1In at least partly data meet predetermined condition C1, window Wi2In at least portion Divided data meets predetermined condition C2, window Wi3In at least partly data meet predetermined condition C3、 Window Wi4In at least partly data meet predetermined condition C4, window Wi5In at least part of data Meet predetermined condition C5, window Wi6In at least partly data meet predetermined condition C6, window Wi7In at least partly data meet predetermined condition C7, window Wi8In at least partly data meet Predetermined condition C8, window Wi9In at least partly data meet predetermined condition C9, window Wi10In At least partly data meet predetermined condition C10With window Wi11In at least partly data meet pre- Fixed condition C11Time, the most current potential cut-point kiFor data flow point cutpoint.When judging window Wi11In at least partly data be unsatisfactory for predetermined condition C11Time, then from potential cut-point kiAlong Data flow point cutpoint search direction 1 byte of jump, obtains new potential cut-point, for With potential cut-point kiDifference, is expressed as k by new potential cut-point herej.Work as Wi1、Wi2、 Wi3、Wi4、Wi5、Wi6、Wi7、Wi8、Wi9And Wi10Any one window in 10 windows In time at least partly data are unsatisfactory for the predetermined condition of correspondence, as shown in figure 28, Wi4[ki -172,ki-3], then from a kiAlong the data flow point cutpoint search direction N 
number of byte of jump, The most N number of byte is not more than ‖ B4‖+maxx(‖Ax‖), in the enforcement shown in Figure 28 In mode, N number of byte of jumping is not more than 182 bytes, in the present embodiment, N=6, Obtain new potential cut-point, for potential cut-point kiDifference, here by new potential point Cutpoint is expressed as kj, according in the embodiment shown in Figure 27 on duplicate removal server 103 The rule preset, for potential cut-point kjThe window determined is respectively Wj1[kj-169,kj]、 Wj2[kj-170,kj-1]、Wj3[kj-171,kj-2]、Wj4[kj-172,kj-3]、Wj5[kj-173, kj-4]、Wj6[kj-174,kj-5]、Wj7[kj-175,kj-6]、Wj8[kj-176,kj-7]、Wj9 [kj-177,kj-8]、Wj10[kj-168,kj+ 1] and Wj11[kj-179,kj+3].Judge Wj1 [kj-169,kjIn], whether at least part of data meet predetermined condition C1, judge Wj2[kj-170, kj-1] in, whether at least part of data meet predetermined condition C2, judge Wj3[kj-171,kj-2] In at least partly data whether meet predetermined condition C3, judge Wj4[kj-172,kj-3] in extremely Whether small part data meet predetermined condition C4, judge Wj5[kj-173,kj-4] at least portion in Whether divided data meets predetermined condition C5, judge Wj6[kj-174,kj-5] at least partly count in According to whether meeting predetermined condition C6, judge Wj7[kj-175,kj-6] in, at least part of data are No meet predetermined condition C7, judge Wj8[kj-176,kj-7] in, at least part of data are the fullest Foot predetermined condition C8, judge Wj9[kj-177,kj-8] in, whether at least part of data meet pre- Fixed condition C9, judge Wj10[kj-168,kj+ 1] in, whether at least part of data meet predetermined bar Part C10With judge Wj11[kj-179,kj+ 3] in, whether at least part of data meet predetermined condition C11.The most in embodiments of the present invention, it is judged that potential cut-point kaWhether it is data flow point Also in compliance with this principle during cutpoint, implement and no longer describe, be referred to judge potential point Cutpoint 
kiDescription.When judging window Wj1In at least partly data meet predetermined condition C1、 Window Wj2In at least partly data meet predetermined condition C2, window Wj3In at least part of data Meet predetermined condition C3, window Wj4In at least partly data meet predetermined condition C4, window Wj5In at least partly data meet predetermined condition C5, window Wj6In at least partly data meet Predetermined condition C6, window Wj7In at least partly data meet predetermined condition C7, window Wj8In At least partly data meet predetermined condition C8, window Wj9In at least partly data meet predetermined Condition C9, window Wj10In at least partly data meet predetermined condition C10With window Wj11In extremely Small part data meet predetermined condition C11Time, the most current potential cut-point kjFor data flow point Cutpoint, kjWith kaBetween data constitute 1 data block, simultaneously according to kaIdentical Mode skips minimum piecemeal size 4KB, it is thus achieved that next potential cut-point, and according to The rule preset on duplicate removal server 103, it is judged that whether next potential cut-point is several According to flow point cutpoint.When judging potential cut-point kjWhen not being data flow point cutpoint, according to ki Identical mode obtains next potential cut-point, and according on duplicate removal server 103 The rule preset and said method judge whether next potential cut-point is the segmentation of data stream Point.When the maximum data block exceeding setting does not the most find data flow point cutpoint, then From the end position of maximum data block as force-splitting point.
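In Fig. 27 the eleventh window uses a different condition, so a failure there says nothing about the windows that later candidates test under C_1 to C_10. A Python sketch of the same jump rule, extended to compare condition labels (the labels "C" and "C11" are illustrative, and the rule itself is an assumption consistent with the examples), reproduces both cases in the text: a failing W_i11 gives a 1-byte jump and a failing W_i4 gives N=6.

```python
# Fig. 27 rule: (A_x, B_x, condition label) for x = 1..11.
# C_1 = ... = C_10 share one (illustrative) label; C_11 differs.
WINDOWS = [(169, 0, "C"), (170, -1, "C"), (171, -2, "C"), (172, -3, "C"),
           (173, -4, "C"), (174, -5, "C"), (175, -6, "C"), (176, -7, "C"),
           (177, -8, "C"), (168, 1, "C"), (179, 3, "C11")]


def jump_length(windows, x):
    """Smallest d >= 1 such that candidate k+d does not re-test the failed
    window x, i.e. has no window with the same byte range and condition."""
    a_x, b_x, c_x = windows[x]
    redundant = {a_y - a_x for a_y, b_y, c_y in windows
                 if c_y == c_x and a_y > a_x and a_y + b_y == a_x + b_x}
    d = 1
    while d in redundant:
        d += 1
    return d


for x in range(len(WINDOWS)):
    print(f"W{x + 1} fails -> jump {jump_length(WINDOWS, x)} byte(s)")
```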
On the basis of the data flow point cutpoint shown in Fig. 3 is searched, the embodiment party shown in Figure 29 In formula, being preset with rule on duplicate removal server 103, described rule is: for potential cut-point k Determine 11 window Wx[px-Ax,px+Bx] and window Wx[px-Ax,px+Bx] corresponding making a reservation for Condition Cx, x is respectively 1 to 11 continuous print natural numbers, wherein, window Wx[px-Ax,px+Bx] In at least partly data meet the probability of predetermined condition is 1/2, A1=169, B1=0;A2=171, B2 =-2;A3=173, B3=-4;A4=175, B4=-6;A5=177, B5=-8;A6=179, B6=-10;A7=181, B7=-12;A8=183, B8=-14;A9=185, B9=-16;A10=187, B10=-18;A11=189, B11=-20; And C1=C2=C3=C4=C5=C6=C7=C8=C9=C10=C11, then 11 windows are respectively W1 [k-169,k]、W2[k-171,k-2]、W3[k-173,k-4]、W4[k-175,k-6]、W5[k-177, k-8]、W6[k-179,k-10]、W7[k-181,k-12]、W8[k-183,k-14]、W9[k-185,k-16]、 W10[k-187, k-18] and W11[k-189,k-20]。kaFor data flow point cutpoint, shown in Figure 29 Data flow point cutpoint search direction is from left to right, from data flow point cutpoint kaSkip minimum data After block 4KB, at minimum data block 4KB end position as next potential cut-point ki, for Potential cut-point kiDetermine a pix, in the present embodiment, according to pre-on duplicate removal server 103 If rule, x is respectively 1 to 11 continuous print natural numbers.In the embodiment shown in Figure 29, According to pre-defined rule, for potential cut-point ki11 windows determined are respectively Wi1[ki-169, ki]、Wi2[ki-171,ki-2]、Wi3[ki-173,ki-4]、Wi4[ki-175,ki-6]、Wi5[ki-177, ki-8]、Wi6[ki-179,ki-10]、Wi7[ki-181,ki-12]、Wi8[ki-183,ki-14]、Wi9 [ki-185,ki-16]、Wi10[ki-187,ki-18] and Wi11[ki-189,ki-20].Judge Wi1[ki -169,kiIn], whether at least part of data meet predetermined condition C1, judge Wi2[ki-171,ki-2] In at least partly data whether meet predetermined condition C2, judge Wi3[ki-173,ki-4] at least Whether part data meet predetermined condition C3, judge Wi4[ki-175,ki-6] at least partly count 
in According to whether meeting predetermined condition C4, judge Wi5[ki-177,ki-8] in, whether at least part of data Meet predetermined condition C5, judge Wi6[ki-179,ki-10] in, whether at least part of data meet pre- Fixed condition C6, judge Wi7[ki-181,ki-12] in, whether at least part of data meet predetermined condition C7, judge Wi8[ki-183,ki-14] in, whether at least part of data meet predetermined condition C8, sentence Disconnected Wi9[ki-185,ki-16] in, whether at least part of data meet predetermined condition C9, judge Wi10 [ki-187,ki-18] in, whether at least part of data meet predetermined condition C10With judge Wi11[ki -189,ki-20] in, whether at least part of data meet predetermined condition C11.When judging window Wi1In At least partly data meet predetermined condition C1, window Wi2In at least partly data meet predetermined bar Part C2, window Wi3In at least partly data meet predetermined condition C3, window Wi4In at least partly Data meet predetermined condition C4, window Wi5In at least partly data meet predetermined condition C5, window Mouth Wi6In at least partly data meet predetermined condition C6, window Wi7In at least partly data meet Predetermined condition C7, window Wi8In at least partly data meet predetermined condition C8, window Wi9In extremely Small part data meet predetermined condition C9, window Wi10In at least partly data meet predetermined condition C10With window Wi11In at least partly data meet predetermined condition C11Time, the most current potential cut-point kiFor data flow point cutpoint.When data at least part of in any one window in 11 windows are unsatisfactory for During corresponding predetermined condition, as shown in figure 30, Wi4[ki-175,ki-6] at least part of data in It is unsatisfactory for predetermined condition C4, then select next potential cut-point, for potential cut-point kiDistrict Not, here shown as kj, kjIt is positioned at kiThe right, and kjWith ki1 byte of spacing.Such as figure Shown in 30, 
according to the rule preset for duplicate removal server 103, for potential cut-point kjDetermine 11 Window is respectively Wj1[kj-169,kj]、Wj2[kj-171,kj-2]、Wj3[kj-173,kj-4]、 Wj4[kj-175,kj-6]、Wj5[kj-177,kj-8]、Wj6[kj-179,kj-10]、Wj7[kj-181, kj-12]、Wj8[kj-183,kj-14]、Wj9[kj-185,kj-16]、Wj10[kj-187,kj-18] And Wj11[kj-189,kj, and C-20]1=C2=C3=C4=C5=C6=C7=C8=C9=C10=C11。 Judge Wj1[kj-169,kjIn], whether at least part of data meet predetermined condition C1, judge Wj2 [kj-171,kj-2] in, whether at least part of data meet predetermined condition C2, judge Wj3[kj-173, kj-4] in, whether at least part of data meet predetermined condition C3, judge Wj4[kj-175,kj-6] in At least partly whether data meet predetermined condition C4, judge Wj5[kj-177,kj-8] at least portion in Whether divided data meets predetermined condition C5, judge Wj6[kj-179,kj-10] at least part of data in Whether meet predetermined condition C6, judge Wj7[kj-181,kj-12] in, at least part of data are the fullest Foot predetermined condition C7, judge Wj8[kj-183,kj-14] in, whether at least part of data meet predetermined Condition C8, judge Wj9[kj-185,kj-16] in, whether at least part of data meet predetermined condition C9、 Judge Wj10[kj-187,kj-18] in, whether at least part of data meet predetermined condition C10And judgement Wj11[kj-189,kj-20] in, whether at least part of data meet predetermined condition C11.When judging window Mouth Wj1In at least partly data meet predetermined condition C1, window Wj2In at least partly data meet Predetermined condition C2, window Wj3In at least partly data meet predetermined condition C3, window Wj4In extremely Small part data meet predetermined condition C4, window Wj5In at least partly data meet predetermined condition C5, window Wj6In at least partly data meet predetermined condition C6, window Wj7In at least partly count According to meeting predetermined condition C7, window Wj8In at least partly data meet predetermined condition C8, window Wi9In at least partly data meet 
predetermined condition $C_9$, at least part of the data in window $W_{j10}$ meets predetermined condition $C_{10}$, and at least part of the data in window $W_{j11}$ meets predetermined condition $C_{11}$, the current potential cut-point $k_j$ is a data stream cut-point. Suppose instead that at least part of the data in any one of windows $W_{j1}$, $W_{j2}$, $W_{j3}$, $W_{j4}$, $W_{j5}$, $W_{j6}$, $W_{j7}$, $W_{j8}$, $W_{j9}$, $W_{j10}$ and $W_{j11}$ does not meet its predetermined condition; Figure 31 shows an example in which at least part of the data in $W_{j3}[k_j-173, k_j-4]$ does not meet predetermined condition $C_3$. In that case the search jumps N bytes to the right of $k_j$ along the data stream cut-point search direction. N is not greater than $\|B_4\| + \max_x(\|A_x\|)$. In the embodiment shown in Figure 28, N is not greater than 195 bytes, and in this embodiment N = 15. The jump yields the next potential cut-point, denoted $k_l$ to distinguish it from potential cut-points $k_i$ and $k_j$. According to the rule preset on deduplication server 103 in the embodiment shown in Figure 29, 11 windows are determined for potential cut-point $k_l$: $W_{l1}[k_l-169, k_l]$, $W_{l2}[k_l-171, k_l-2]$, $W_{l3}[k_l-173, k_l-4]$, $W_{l4}[k_l-175, k_l-6]$, $W_{l5}[k_l-177, k_l-8]$, $W_{l6}[k_l-179, k_l-10]$, $W_{l7}[k_l-181, k_l-12]$, $W_{l8}[k_l-183, k_l-14]$, $W_{l9}[k_l-185, k_l-16]$, $W_{l10}[k_l-187, k_l-18]$ and $W_{l11}[k_l-189, k_l-20]$. For each z = 1, 2, ..., 11, it is judged whether at least part of the data in $W_{lz}$ meets predetermined condition $C_z$. When at least part of the data in every window $W_{l1}$ through $W_{l11}$ meets the corresponding condition $C_1$ through $C_{11}$, the current potential cut-point $k_l$ is a data stream cut-point. When at least part of the data in any one of windows $W_{l1}$ through $W_{l11}$ does not meet its predetermined condition, the next potential cut-point is selected. This point is denoted $k_m$ to distinguish it from $k_i$, $k_j$ and $k_l$; $k_m$ lies to the right of $k_l$, 1 byte away from $k_l$. According to the rule preset on deduplication server 103 in the embodiment shown in Figure 29, the 11 windows determined for potential cut-point $k_m$ are $W_{m1}[k_m-169, k_m]$, $W_{m2}[k_m-171, k_m-2]$, $W_{m3}[k_m-173, k_m-4]$, $W_{m4}[k_m-175, k_m-6]$, $W_{m5}[k_m-177, k_m-8]$, $W_{m6}[k_m-179, k_m-10]$, $W_{m7}[k_m-181, k_m-12]$, $W_{m8}[k_m-183, k_m-14]$, $W_{m9}[k_m-185, k_m-16]$, $W_{m10}[k_m-187, k_m-18]$ and $W_{m11}[k_m-189, k_m-20]$. For each z = 1, 2, ..., 11, it is judged whether at least part of the data in $W_{mz}$ meets predetermined condition $C_z$. When at least part of the data in every window $W_{m1}$ through $W_{m11}$ meets the corresponding condition $C_1$ through $C_{11}$, the current potential cut-point $k_m$ is a data stream cut-point. When at least part of the data in any one window does not meet its predetermined condition, a jump is performed according to the scheme described above to obtain the next potential cut-point, which is then checked in the same way to determine whether it is a data stream cut-point.
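The search loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. Window $W_z$ of potential cut-point k is taken to be the bytes `data[k - A_z : k - B_z]` (169 bytes in the Figure 29 layout). The per-window test `meets(z, window)` stands in for predetermined condition $C_z$. The jump distance after a failure comes from a hypothetical `jump_after_failure(z)` callback, because this passage shows a 15-byte jump in one case and a 1-byte step in another without stating the general rule. The names `FIG29_WINDOWS` and `find_cut_point` are illustrative.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical encoding of the Figure 29 layout: window z (1-based) covers
# [k - (169 + 2(z-1)), k - 2(z-1)], stored as (A_z, B_z) back-offsets from k.
FIG29_WINDOWS: Tuple[Tuple[int, int], ...] = tuple(
    (169 + 2 * z, 2 * z) for z in range(11)
)


def find_cut_point(
    data: bytes,
    start: int,
    windows: Sequence[Tuple[int, int]],
    meets: Callable[[int, bytes], bool],
    jump_after_failure: Callable[[int], int],
) -> Optional[int]:
    """Return the first potential cut-point k >= start whose windows all pass.

    jump_after_failure(z) must return a positive distance, or the loop never advances.
    """
    # Every window must lie inside the stream, so begin no earlier than max A_z.
    k = max(start, max(a for a, _ in windows))
    while k <= len(data):
        failed_z = None
        for z, (a, b) in enumerate(windows, start=1):
            if not meets(z, data[k - a : k - b]):
                failed_z = z  # stop at the first window that fails C_z
                break
        if failed_z is None:
            return k  # all conditions hold: k is a data stream cut-point
        k += jump_after_failure(failed_z)  # move right to the next potential cut-point
    return None


# Toy usage: C_1 holds only when the last byte of W_1 equals 7; C_2..C_11 always hold.
stream = bytearray(1000)
stream[300] = 7
print(find_cut_point(bytes(stream), 0, FIG29_WINDOWS,
                     lambda z, w: z != 1 or w[-1] == 7,
                     lambda z: 1))  # 301
```

Checking the windows in order and stopping at the first failure mirrors the text, where evaluation of $k_j$ ends as soon as one window misses its condition.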
An embodiment of the present invention provides a method of judging whether at least part of the data in window $W_{iz}[k_i-A_z, k_i+B_z]$ meets predetermined condition $C_z$. In this embodiment, a random function is used for the judgment. Take the embodiment shown in Figure 21 as an example. According to the rule preset on deduplication server 103, window $W_{i1}[k_i-169, k_i]$ is determined for potential cut-point $k_i$, and it is judged whether at least part of the data in $W_{i1}[k_i-169, k_i]$ meets predetermined condition $C_1$. In Figure 32, $W_{i1}$ denotes window $W_{i1}[k_i-169, k_i]$. To make the judgment, 5 bytes are selected. In Figure 32, "■" denotes one selected byte, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times to increase randomness, giving 255 bytes in total. Each byte consists of 8 bits, denoted $a_{m,1}, \ldots, a_{m,8}$, meaning bits 1 to 8 of the m-th of the 255 bytes. The bits of the 255 bytes can therefore be written as (here $F \cdot H = 255$):

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F\cdot H,1} & a_{F\cdot H,2} & \cdots & a_{F\cdot H,8} \end{pmatrix}$$

Let $a_{m,n}$ denote any one of $a_{m,1}, \ldots, a_{m,8}$. When $a_{m,n}=1$, $Va_{m,n}=1$; when $a_{m,n}=0$, $Va_{m,n}=-1$. Applying this mapping to the bits of the 255 bytes gives matrix $V_a$:

$$V_a = \begin{pmatrix} Va_{1,1} & Va_{1,2} & \cdots & Va_{1,8} \\ Va_{2,1} & Va_{2,2} & \cdots & Va_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ Va_{F\cdot H,1} & Va_{F\cdot H,2} & \cdots & Va_{F\cdot H,8} \end{pmatrix}$$

A large number of random numbers are chosen to form a matrix, and once formed the matrix is kept constant. For example, 255×8 random numbers are drawn from a specific distribution (here, a zero-mean normal distribution) to form matrix R:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,8} \end{pmatrix}$$

The m-th row of $V_a$ is multiplied element-wise by the m-th row of R, and the products are summed to give one value: $Sa_m = Va_{m,1} h_{m,1} + Va_{m,2} h_{m,2} + \cdots + Va_{m,8} h_{m,8}$. In this way $Sa_1, Sa_2, \ldots, Sa_{255}$ are obtained, and the number K of these values that meet a specific condition (here, being greater than 0) is counted. Because R follows a normal distribution, $Sa_m$ also follows a normal distribution. By probability theory, a zero-mean normally distributed random number is greater than 0 with probability 1/2, so each of $Sa_1, \ldots, Sa_{255}$ is greater than 0 with probability 1/2, and K follows a binomial distribution:

$$P(K=n) = \binom{255}{n}\left(\frac{1}{2}\right)^{n}\left(\frac{1}{2}\right)^{255-n} = \binom{255}{n}\left(\frac{1}{2}\right)^{255}.$$

Based on this count, it is judged whether K, the number of values among $Sa_1, \ldots, Sa_{255}$ greater than 0, is even. A binomially distributed random number is even with probability 1/2, so K meets the condition with probability 1/2. When K is even, at least part of the data in $W_{i1}[k_i-169, k_i]$ meets predetermined condition $C_1$; when K is odd, at least part of the data in $W_{i1}[k_i-169, k_i]$ does not meet $C_1$. Here $C_1$ is exactly the condition that K, obtained from $Sa_1, \ldots, Sa_{255}$ as described above, is even. In the embodiment shown in Figure 21, windows $W_{i1}[k_i-169, k_i]$, $W_{i2}[k_i-170, k_i-1]$, $W_{i3}[k_i-171, k_i-2]$, $W_{i4}[k_i-172, k_i-3]$, $W_{i5}[k_i-173, k_i-4]$, $W_{i6}[k_i-174, k_i-5]$, $W_{i7}[k_i-175, k_i-6]$, $W_{i8}[k_i-176, k_i-7]$, $W_{i9}[k_i-177, k_i-8]$, $W_{i10}[k_i-178, k_i-9]$ and $W_{i11}[k_i-179, k_i-10]$ all have the same size, 169 bytes. Each window is also tested against its predetermined condition in the same way; see the description above of judging whether at least part of the data in $W_{i1}[k_i-169, k_i]$ meets $C_1$. Accordingly, the bytes marked for $W_{i2}$ in Figure 32 are the bytes selected to judge whether at least part of the data in window $W_{i2}[k_i-170, k_i-1]$ meets predetermined condition $C_2$, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times to increase randomness, giving 255 bytes in total. Each byte consists of 8 bits, denoted $b_{m,1}, \ldots, b_{m,8}$, meaning bits 1 to 8 of the m-th of the 255 bytes:

$$\begin{pmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,8} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ b_{255,1} & b_{255,2} & \cdots & b_{255,8} \end{pmatrix}$$

Let $b_{m,n}$ denote any one of $b_{m,1}, \ldots, b_{m,8}$. When $b_{m,n}=1$, $Vb_{m,n}=1$; when $b_{m,n}=0$, $Vb_{m,n}=-1$. This mapping gives matrix $V_b$:

$$V_b = \begin{pmatrix} Vb_{1,1} & Vb_{1,2} & \cdots & Vb_{1,8} \\ Vb_{2,1} & Vb_{2,2} & \cdots & Vb_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ Vb_{255,1} & Vb_{255,2} & \cdots & Vb_{255,8} \end{pmatrix}$$

$W_{i1}[k_i-169, k_i]$ and $W_{i2}[k_i-170, k_i-1]$ are tested against their predetermined conditions in the same way, so the same matrix R is used. The m-th row of $V_b$ is multiplied element-wise by the m-th row of R, and the products are summed: $Sb_m = Vb_{m,1} h_{m,1} + Vb_{m,2} h_{m,2} + \cdots + Vb_{m,8} h_{m,8}$. This gives $Sb_1, Sb_2, \ldots, Sb_{255}$, and the number K of these values that meet the specific condition (here, being greater than 0) is counted. As before, each $Sb_m$ is normally distributed and exceeds 0 with probability 1/2. K therefore follows the binomial distribution $P(K=n) = \binom{255}{n}\left(\frac{1}{2}\right)^{n}\left(\frac{1}{2}\right)^{255-n} = \binom{255}{n}\left(\frac{1}{2}\right)^{255}$ and is even with probability 1/2. When K is even, at least part of the data in $W_{i2}[k_i-170, k_i-1]$ meets predetermined condition $C_2$; when K is odd, it does not. Here $C_2$ is the condition that K, obtained from $Sb_1, \ldots, Sb_{255}$ as described above, is even. In the embodiment shown in Figure 21, at least part of the data in $W_{i2}[k_i-170, k_i-1]$ meets predetermined condition $C_2$.
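The random-projection test just described can be sketched in Python as follows. The sketch rests on several assumptions that the text does not settle. R is drawn once from a standard normal distribution with an arbitrary seed. Bits 1 to 8 of a byte are read most-significant first. "Reused 51 times" is read as concatenating the 5 selected bytes 51 times. The names `make_matrix_r`, `sign_vector` and `meets_condition` are illustrative.

```python
import random
from typing import List

ROWS = 255          # 5 selected bytes reused 51 times
BITS_PER_BYTE = 8


def make_matrix_r(seed: int = 2014) -> List[List[float]]:
    """Fixed 255 x 8 matrix R of zero-mean normal random numbers (seed is arbitrary)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(BITS_PER_BYTE)] for _ in range(ROWS)]


def sign_vector(byte: int) -> List[int]:
    """Map bits 1..8 of a byte (most-significant first, an assumption) to +1 / -1."""
    return [1 if (byte >> (7 - n)) & 1 else -1 for n in range(BITS_PER_BYTE)]


def meets_condition(selected: bytes, r: List[List[float]]) -> bool:
    """True when K, the number of positive S_m, is even (the condition C_z)."""
    if len(selected) != 5:
        raise ValueError("expected the 5 selected bytes")
    repeated = selected * 51  # assumption: 'reused 51 times' means concatenation
    k = 0
    for m, byte in enumerate(repeated):
        # S_m = V_{m,1} h_{m,1} + ... + V_{m,8} h_{m,8}
        s_m = sum(v * h for v, h in zip(sign_vector(byte), r[m]))
        if s_m > 0:
            k += 1
    return k % 2 == 0


R = make_matrix_r()  # formed once and kept constant, as the text requires
print(meets_condition(bytes([36, 48, 26, 26, 88]), R))
```

Complementing every selected byte negates every $S_m$, so K becomes 255 − K. Because 255 is odd, the verdict flips. This is a quick sanity check that the test really depends on the sampled data.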
Similarly, the bytes marked for $W_{i3}$ in Figure 32 are the bytes selected to judge whether at least part of the data in window $W_{i3}[k_i-171, k_i-2]$ meets predetermined condition $C_3$, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times to increase randomness, giving 255 bytes in total. The method used for $W_{i1}[k_i-169, k_i]$ and $W_{i2}[k_i-170, k_i-1]$ is then applied to judge whether at least part of the data in $W_{i3}[k_i-171, k_i-2]$ meets $C_3$. In the embodiment shown in Figure 21, at least part of the data in $W_{i3}[k_i-171, k_i-2]$ meets predetermined condition $C_3$. The bytes marked for $W_{i4}$ in Figure 32 are the bytes selected to judge whether at least part of the data in window $W_{i4}[k_i-172, k_i-3]$ meets predetermined condition $C_4$, again 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total. The method used for $W_{i1}[k_i-169, k_i]$, $W_{i2}[k_i-170, k_i-1]$ and $W_{i3}[k_i-171, k_i-2]$ is then applied to judge whether at least part of the data in $W_{i4}[k_i-172, k_i-3]$ meets $C_4$. In the embodiment shown in Figure 21, at least part of the data in $W_{i4}[k_i-172, k_i-3]$ meets predetermined condition $C_4$. The bytes marked for $W_{i5}$ in Figure 32 are the bytes selected to judge whether at least part of the data in window $W_{i5}[k_i-173, k_i-4]$ meets predetermined condition $C_5$, again 42 bytes apart. The 5 selected bytes are reused 51 times, giving 255 bytes in total. The method used for $W_{i1}$ through $W_{i4}$ is then applied to judge whether at least part of the data in $W_{i5}[k_i-173, k_i-4]$ meets $C_5$. In the embodiment shown in Figure 21, at least part of the data in $W_{i5}[k_i-173, k_i-4]$ does not meet predetermined condition $C_5$.
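The sampling layout of Figure 32 can be written as offsets from the potential cut-point. The sketch below assumes that a sequence number s in the figure (169, 127, 85, 43 and 1 for $W_{i1}$; 170, 128, 86, 44 and 2 for $W_{i2}$) denotes the byte at position k − s. That reading is an interpretation, and `selected_positions` is an illustrative name.

```python
from typing import List


def selected_positions(k: int, z: int, first: int = 169, gap: int = 42,
                       count: int = 5) -> List[int]:
    """Positions of the bytes sampled for window W_z of potential cut-point k.

    Assumption: sequence number s denotes position k - s, so W_1 uses
    s = 169, 127, 85, 43, 1, and W_z shifts every s by z - 1.
    """
    top = first + (z - 1)
    return [k - (top - gap * t) for t in range(count)]


print(selected_positions(1000, 1))  # [831, 873, 915, 957, 999]
print(selected_positions(1000, 2))  # [830, 872, 914, 956, 998]
```

Under this layout, window $W_z$ at k samples the same bytes as window $W_1$ at k − (z − 1). When all windows use the same test and the same R, as in the Figure 21 embodiment, the checks at one potential cut-point therefore coincide with the $W_1$ check at nearby positions.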
When at least part of the data in $W_{i5}[k_i-173, k_i-4]$ does not meet predetermined condition $C_5$, the search jumps 7 bytes from point $p_{i5}$ along the data stream cut-point search direction, and the next potential cut-point $k_j$ is obtained at the end position of the 7th byte. As shown in Figure 22, according to the rule preset on deduplication server 103, window $W_{j1}[k_j-169, k_j]$ is determined for potential cut-point $k_j$. Whether at least part of the data in $W_{j1}[k_j-169, k_j]$ meets $C_1$ is judged in the same way as for $W_{i1}[k_i-169, k_i]$. In Figure 33, $W_{j1}$ denotes this window. To judge whether at least part of its data meets $C_1$, 5 bytes are selected. In Figure 33, "■" denotes one selected byte, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times to increase randomness, giving 255 bytes in total. Each byte consists of 8 bits, denoted $a'_{m,1}, \ldots, a'_{m,8}$, meaning bits 1 to 8 of the m-th of the 255 bytes:

$$\begin{pmatrix} a'_{1,1} & a'_{1,2} & \cdots & a'_{1,8} \\ a'_{2,1} & a'_{2,2} & \cdots & a'_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a'_{255,1} & a'_{255,2} & \cdots & a'_{255,8} \end{pmatrix}$$

Let $a'_{m,n}$ denote any one of $a'_{m,1}, \ldots, a'_{m,8}$. When $a'_{m,n}=1$, $Va'_{m,n}=1$; when $a'_{m,n}=0$, $Va'_{m,n}=-1$. This mapping gives matrix $V_a'$:

$$V_a' = \begin{pmatrix} Va'_{1,1} & Va'_{1,2} & \cdots & Va'_{1,8} \\ Va'_{2,1} & Va'_{2,2} & \cdots & Va'_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ Va'_{255,1} & Va'_{255,2} & \cdots & Va'_{255,8} \end{pmatrix}$$

This window is tested against its predetermined condition in the same way as $W_{i1}[k_i-169, k_i]$, so the same matrix R is used:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,8} \end{pmatrix}$$

The m-th row of $V_a'$ is multiplied element-wise by the m-th row of R, and the products are summed: $Sa'_m = Va'_{m,1} h_{m,1} + Va'_{m,2} h_{m,2} + \cdots + Va'_{m,8} h_{m,8}$. This gives $Sa'_1, Sa'_2, \ldots, Sa'_{255}$, and the number K of these values that meet the specific condition (here, being greater than 0) is counted. Because R follows a normal distribution, $Sa'_m$ also follows a normal distribution and is greater than 0 with probability 1/2. K therefore follows the binomial distribution

$$P(K=n) = \binom{255}{n}\left(\frac{1}{2}\right)^{n}\left(\frac{1}{2}\right)^{255-n} = \binom{255}{n}\left(\frac{1}{2}\right)^{255},$$

and is even with probability 1/2. It is judged whether K is even. When K is even, at least part of the data in $W_{j1}[k_j-169, k_j]$ meets predetermined condition $C_1$; when K is odd, at least part of the data in $W_{j1}[k_j-169, k_j]$ does not meet $C_1$.
Whether at least part of the data in $W_{i2}[k_i-170, k_i-1]$ meets predetermined condition $C_2$ is judged in the same way as for $W_{j2}[k_j-170, k_j-1]$. The bytes marked for $W_{j2}$ in Figure 33 are the bytes selected to judge whether at least part of the data in window $W_{j2}[k_j-170, k_j-1]$ meets predetermined condition $C_2$, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times to increase randomness, giving 255 bytes in total. Each byte consists of 8 bits, denoted $b'_{m,1}, \ldots, b'_{m,8}$, meaning bits 1 to 8 of the m-th of the 255 bytes:

$$\begin{pmatrix} b'_{1,1} & b'_{1,2} & \cdots & b'_{1,8} \\ b'_{2,1} & b'_{2,2} & \cdots & b'_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ b'_{255,1} & b'_{255,2} & \cdots & b'_{255,8} \end{pmatrix}$$

Let $b'_{m,n}$ denote any one of $b'_{m,1}, \ldots, b'_{m,8}$. When $b'_{m,n}=1$, $Vb'_{m,n}=1$; when $b'_{m,n}=0$, $Vb'_{m,n}=-1$. This mapping gives matrix $V_b'$:

$$V_b' = \begin{pmatrix} Vb'_{1,1} & Vb'_{1,2} & \cdots & Vb'_{1,8} \\ Vb'_{2,1} & Vb'_{2,2} & \cdots & Vb'_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ Vb'_{255,1} & Vb'_{255,2} & \cdots & Vb'_{255,8} \end{pmatrix}$$

Window $W_{i2}[k_i-170, k_i-1]$ and window $W_{j2}[k_j-170, k_j-1]$ are tested against $C_2$ in the same way, so the same matrix R ($h_{1,1}$ through $h_{255,8}$, as above) is used. The m-th row of $V_b'$ is multiplied element-wise by the m-th row of R, and the products are summed: $Sb'_m = Vb'_{m,1} h_{m,1} + Vb'_{m,2} h_{m,2} + \cdots + Vb'_{m,8} h_{m,8}$. This gives $Sb'_1, Sb'_2, \ldots, Sb'_{255}$, and the number K of these values that meet the specific condition (here, being greater than 0) is counted. As before, each $Sb'_m$ is normally distributed and exceeds 0 with probability 1/2. K therefore follows the binomial distribution $P(K=n) = \binom{255}{n}\left(\frac{1}{2}\right)^{255}$ and is even with probability 1/2. When K is even, at least part of the data in $W_{j2}[k_j-170, k_j-1]$ meets predetermined condition $C_2$; when K is odd, at least part of the data in $W_{j2}[k_j-170, k_j-1]$ does not meet $C_2$. Likewise, $W_{i3}[k_i-171, k_i-2]$ and $W_{j3}[k_j-171, k_j-2]$ are tested against $C_3$ in the same way. In the same manner it is judged whether at least part of the data in $W_{j4}[k_j-172, k_j-3]$ meets $C_4$, $W_{j5}[k_j-173, k_j-4]$ meets $C_5$, $W_{j6}[k_j-174, k_j-5]$ meets $C_6$, $W_{j7}[k_j-175, k_j-6]$ meets $C_7$, $W_{j8}[k_j-176, k_j-7]$ meets $C_8$, $W_{j9}[k_j-177, k_j-8]$ meets $C_9$, $W_{j10}[k_j-178, k_j-9]$ meets $C_{10}$, and $W_{j11}[k_j-179, k_j-10]$ meets $C_{11}$. The details are not repeated here.
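The step "a binomially distributed number is even with probability 1/2" can be checked exactly for the p = 1/2 case used here. For $K \sim \mathrm{Binomial}(n, 1/2)$ with $n \ge 1$, adding the expansions of $(1+1)^n$ and $(1-1)^n$ gives $\sum_{j\ \text{even}} \binom{n}{j} = 2^{n-1}$, so $P(K\ \text{even}) = 2^{n-1}/2^{n} = 1/2$. The short check below evaluates this sum for n = 255 with exact rational arithmetic; `prob_even` is an illustrative helper.

```python
from fractions import Fraction
from math import comb


def prob_even(n: int) -> Fraction:
    """Exact P(K is even) for K ~ Binomial(n, 1/2)."""
    even_terms = sum(comb(n, j) for j in range(0, n + 1, 2))
    return Fraction(even_terms, 2 ** n)


print(prob_even(255))  # 1/2
```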
In this embodiment, a random function is used to judge whether at least part of the data in window $W_{iz}[k_i-A_z, k_i+B_z]$ meets predetermined condition $C_z$. Again take the embodiment shown in Figure 21 as an example. According to the rule preset on deduplication server 103, window $W_{i1}[k_i-169, k_i]$ is determined for potential cut-point $k_i$, and it is judged whether at least part of the data in $W_{i1}[k_i-169, k_i]$ meets predetermined condition $C_1$. In Figure 32, $W_{i1}$ denotes window $W_{i1}[k_i-169, k_i]$. To make the judgment, 5 bytes are selected. In Figure 32, "■" denotes one selected byte, and adjacent selected bytes "■" are 42 bytes apart. One implementation applies a HASH function to the 5 selected bytes. The values computed by the HASH function follow a fixed uniform distribution. If the computed value is even, at least part of the data in $W_{i1}[k_i-169, k_i]$ meets predetermined condition $C_1$; that is, $C_1$ is the condition that the HASH value computed in this way is even. The probability that at least part of the data in $W_{i1}[k_i-169, k_i]$ meets the predetermined condition is therefore 1/2. In the embodiment shown in Figure 21, the HASH function is likewise used to judge whether at least part of the data in $W_{i2}[k_i-170, k_i-1]$ meets $C_2$, $W_{i3}[k_i-171, k_i-2]$ meets $C_3$, $W_{i4}[k_i-172, k_i-3]$ meets $C_4$, and $W_{i5}[k_i-173, k_i-4]$ meets $C_5$. For details, refer to the description above of using the HASH function to judge whether at least part of the data in $W_{i1}[k_i-169, k_i]$ meets $C_1$; they are not repeated here.
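A hash-based version of the test can be sketched as follows. SHA-256 (Python's `hashlib.sha256`) is swapped in for the unspecified HASH function. The positions of the 5 sampled bytes follow the Figure 32 layout under the assumption that sequence number s means position k − s. `meets_condition_hash` is an illustrative name.

```python
import hashlib


def meets_condition_hash(data: bytes, k: int, z: int) -> bool:
    """Hash-based test of C_z for window W_z of potential cut-point k.

    Assumptions: SHA-256 stands in for the unspecified HASH function, and the
    5 sampled bytes sit at positions k - s for s = 169 + (z - 1) - 42 * t, t = 0..4.
    """
    top = 169 + (z - 1)
    selected = bytes(data[k - (top - 42 * t)] for t in range(5))
    digest = hashlib.sha256(selected).digest()
    # C_z holds when the hash value, read as a big-endian integer, is even.
    return int.from_bytes(digest, "big") % 2 == 0


stream = bytes(range(256)) * 4
print(meets_condition_hash(stream, 500, 1))
```

A cryptographic hash is a convenient stand-in because its low bit is close to uniform over distinct inputs, which matches the 1/2 probability stated in the text. A much cheaper non-cryptographic hash would presumably serve as well in practice.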
When at least part of the data in $W_{i5}[k_i-173, k_i-4]$ does not meet predetermined condition $C_5$, the search jumps 7 bytes from potential cut-point $k_i$ along the data stream cut-point search direction, and the current potential cut-point $k_j$ is obtained at the end position of the 7th byte. As shown in Figure 22, according to the rule preset on deduplication server 103, window $W_{j1}[k_j-169, k_j]$ is determined for potential cut-point $k_j$. Whether at least part of the data in $W_{j1}[k_j-169, k_j]$ meets $C_1$ is judged in the same way as for $W_{i1}[k_i-169, k_i]$. In Figure 33, $W_{j1}$ denotes window $W_{j1}[k_j-169, k_j]$. To judge whether at least part of its data meets $C_1$, 5 bytes are selected. In Figure 33, "■" denotes one selected byte, and adjacent selected bytes "■" are 42 bytes apart. The HASH function is applied to the 5 bytes chosen from $W_{j1}[k_j-169, k_j]$. If the resulting value is even, at least part of the data in $W_{j1}[k_j-169, k_j]$ meets predetermined condition $C_1$. Whether at least part of the data in $W_{i2}[k_i-170, k_i-1]$ meets $C_2$ is judged in the same way as for $W_{j2}[k_j-170, k_j-1]$. The bytes marked for $W_{j2}$ in Figure 33 are the bytes selected to judge whether at least part of the data in window $W_{j2}[k_j-170, k_j-1]$ meets $C_2$, and adjacent selected bytes are 42 bytes apart. The HASH function is applied to the 5 selected bytes. If the resulting value is even, at least part of the data in $W_{j2}[k_j-170, k_j-1]$ meets $C_2$. Whether at least part of the data in $W_{i3}[k_i-171, k_i-2]$ meets $C_3$ is judged in the same way as for $W_{j3}[k_j-171, k_j-2]$. The bytes marked for $W_{j3}$ in Figure 33 are the bytes selected for window $W_{j3}[k_j-171, k_j-2]$ and condition $C_3$, and adjacent selected bytes are 42 bytes apart. The HASH function is applied to the 5 selected bytes. If the resulting value is even, at least part of the data in $W_{j3}[k_j-171, k_j-2]$ meets $C_3$. Whether at least part of the data in $W_{j4}[k_j-172, k_j-3]$ meets $C_4$ is judged in the same way as for window $W_{i4}[k_i-172, k_i-3]$. The bytes marked for $W_{j4}$ in Figure 33 are the bytes selected for window $W_{j4}[k_j-172, k_j-3]$ and condition $C_4$, and adjacent selected bytes are 42 bytes apart. The HASH function is applied to the 5 selected bytes. If the resulting value is even, at least part of the data in $W_{j4}[k_j-172, k_j-3]$ meets $C_4$. By the same method it is judged whether at least part of the data in $W_{j5}[k_j-173, k_j-4]$ meets $C_5$, $W_{j6}[k_j-174, k_j-5]$ meets $C_6$, $W_{j7}[k_j-175, k_j-6]$ meets $C_7$, $W_{j8}[k_j-176, k_j-7]$ meets $C_8$, $W_{j9}[k_j-177, k_j-8]$ meets $C_9$, $W_{j10}[k_j-178, k_j-9]$ meets $C_{10}$, and $W_{j11}[k_j-179, k_j-10]$ meets $C_{11}$. The details are not repeated here.
In this embodiment, a random function is used to judge whether at least part of the data in window $W_{iz}[k_i-A_z, k_i+B_z]$ meets predetermined condition $C_z$. Take the embodiment shown in Figure 21 as an example. According to the rule preset on deduplication server 103, window $W_{i1}[k_i-169, k_i]$ is determined for potential cut-point $k_i$, and it is judged whether at least part of the data in $W_{i1}[k_i-169, k_i]$ meets predetermined condition $C_1$. In Figure 32, $W_{i1}$ denotes window $W_{i1}[k_i-169, k_i]$. To make the judgment, 5 bytes are selected: the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 in Figure 32, each denoting one selected byte, with adjacent selected bytes 42 bytes apart. Each of the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 is converted to a decimal value, denoted $a_1$, $a_2$, $a_3$, $a_4$ and $a_5$ respectively. A byte consists of 8 bits, so when each byte "■" is treated as a number, every $a_r$ among $a_1, \ldots, a_5$ satisfies $0 \le a_r \le 255$. $a_1$, $a_2$, $a_3$, $a_4$ and $a_5$ form a 1×5 matrix. 256×5 random numbers are drawn from a binomial distribution to form matrix R:

$$R = \begin{pmatrix} h_{0,1} & h_{0,2} & \cdots & h_{0,5} \\ h_{1,1} & h_{1,2} & \cdots & h_{1,5} \\ \vdots & \vdots & \ddots & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,5} \end{pmatrix}$$
The value corresponding to $a_1$ is looked up in matrix R according to the value of $a_1$ and the column it occupies. For example, if $a_1 = 36$, then because $a_1$ is in column 1, the value $h_{36,1}$ is looked up. Likewise, $a_2 = 48$ in column 2 gives $h_{48,2}$; $a_3 = 26$ in column 3 gives $h_{26,3}$; $a_4 = 26$ in column 4 gives $h_{26,4}$; and $a_5 = 88$ in column 5 gives $h_{88,5}$. Then $S_1 = h_{36,1} + h_{48,2} + h_{26,3} + h_{26,4} + h_{88,5}$. Because matrix R follows a binomial distribution, $S_1$ also follows a binomial distribution. When $S_1$ is even, at least part of the data in $W_{i1}[k_i-169, k_i]$ meets predetermined condition $C_1$; when $S_1$ is odd, at least part of the data in $W_{i1}[k_i-169, k_i]$ does not meet $C_1$. The probability that $S_1$ is even is 1/2, and $C_1$ denotes that $S_1$, computed in the manner above, is even. In the embodiment shown in Figure 21, at least part of the data in $W_{i1}[k_i-169, k_i]$ meets predetermined condition $C_1$. The bytes marked for $W_{i2}$ in Figure 32 are the bytes selected to judge whether at least part of the data in window $W_{i2}[k_i-170, k_i-1]$ meets predetermined condition $C_2$. They have sequence numbers 170, 128, 86, 44 and 2, and adjacent selected bytes are 42 bytes apart. Each of these bytes is converted to a decimal value, denoted $b_1$, $b_2$, $b_3$, $b_4$ and $b_5$ respectively. Because a byte consists of 8 bits, every $b_r$ among $b_1, \ldots, b_5$ satisfies $0 \le b_r \le 255$. $b_1$ through $b_5$ form a 1×5 matrix. In this implementation, $W_{i1}$ and $W_{i2}$ are tested against their predetermined conditions in the same way, so matrix R is used again. For example, $b_1 = 66$ in column 1 gives $h_{66,1}$; $b_2 = 48$ in column 2 gives $h_{48,2}$; $b_3 = 99$ in column 3 gives $h_{99,3}$; $b_4 = 26$ in column 4 gives $h_{26,4}$; and $b_5 = 90$ in column 5 gives $h_{90,5}$. Then $S_2 = h_{66,1} + h_{48,2} + h_{99,3} + h_{26,4} + h_{90,5}$. Because matrix R follows a binomial distribution, $S_2$ also follows a binomial distribution. When $S_2$ is even, at least part of the data in $W_{i2}[k_i-170, k_i-1]$ meets predetermined condition $C_2$; when $S_2$ is odd, it does not. The probability that $S_2$ is even is 1/2. In the embodiment shown in Figure 21, at least part of the data in $W_{i2}[k_i-170, k_i-1]$ meets predetermined condition $C_2$. Using the same rule, it is judged whether at least part of the data in $W_{i3}[k_i-171, k_i-2]$ meets $C_3$, $W_{i4}[k_i-172, k_i-3]$ meets $C_4$, $W_{i5}[k_i-173, k_i-4]$ meets $C_5$, $W_{i6}[k_i-174, k_i-5]$ meets $C_6$, $W_{i7}[k_i-175, k_i-6]$ meets $C_7$, $W_{i8}[k_i-176, k_i-7]$ meets $C_8$, $W_{i9}[k_i-177, k_i-8]$ meets $C_9$, $W_{i10}[k_i-178, k_i-9]$ meets $C_{10}$, and $W_{i11}[k_i-179, k_i-10]$ meets $C_{11}$. In the embodiment shown in Figure 21, at least part of the data in $W_{i5}[k_i-173, k_i-4]$ does not meet predetermined condition $C_5$. The search therefore jumps 7 bytes from potential cut-point $k_i$ along the data stream cut-point search direction, and the current potential cut-point $k_j$ is obtained at the end position of the 7th byte. As shown in Figure 22, according to the rule preset on deduplication server 103, window $W_{j1}[k_j-169, k_j]$ is determined for potential cut-point $k_j$. Whether at least part of the data in $W_{j1}[k_j-169, k_j]$ meets $C_1$ is judged in the same way as for $W_{i1}[k_i-169, k_i]$. In Figure 33, $W_{j1}$ denotes window $W_{j1}[k_j-169, k_j]$. To make the judgment, the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 in Figure 33 are selected, with adjacent selected bytes 42 bytes apart. Each of these bytes is converted to a decimal value, denoted $a'_1$, $a'_2$, $a'_3$, $a'_4$ and $a'_5$ respectively. Because a byte consists of 8 bits, every $a'_r$ among $a'_1, \ldots, a'_5$ satisfies $0 \le a'_r \le 255$. $a'_1$ through $a'_5$ form a 1×5 matrix. $W_{j1}[k_j-169, k_j]$ is tested against $C_1$ in the same way as $W_{i1}[k_i-169, k_i]$, so matrix R is used again:

$$R = \begin{pmatrix} h_{0,1} & h_{0,2} & \cdots & h_{0,5} \\ h_{1,1} & h_{1,2} & \cdots & h_{1,5} \\ \vdots & \vdots & \ddots & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,5} \end{pmatrix}$$
The value corresponding to $a'_1$ is looked up in matrix R according to the value of $a'_1$ and the column it occupies. For example, $a'_1 = 16$ in column 1 gives $h_{16,1}$; $a'_2 = 98$ in column 2 gives $h_{98,2}$; $a'_3 = 56$ in column 3 gives $h_{56,3}$; $a'_4 = 36$ in column 4 gives $h_{36,4}$; and $a'_5 = 99$ in column 5 gives $h_{99,5}$. Then $S'_1 = h_{16,1} + h_{98,2} + h_{56,3} + h_{36,4} + h_{99,5}$. Because matrix R follows a binomial distribution, $S'_1$ also follows a binomial distribution. When $S'_1$ is even, at least part of the data in $W_{j1}[k_j-169, k_j]$ meets predetermined condition $C_1$; when $S'_1$ is odd, at least part of the data in $W_{j1}[k_j-169, k_j]$ does not meet $C_1$. The probability that $S'_1$ is even is 1/2.
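The table-lookup variant can be sketched as follows. Each selected byte's value picks a row of R and its position picks a column. Python indexes columns from 0, so `r[v][c]` plays the role of $h_{v,c+1}$. The binomial parameters are not given in the text, so this sketch assumes Binomial(16, 1/2) entries drawn once with an arbitrary seed. `make_table_r`, `lookup_sum` and `meets_condition_lookup` are illustrative names.

```python
import random
from typing import List


def make_table_r(seed: int = 2014, trials: int = 16) -> List[List[int]]:
    """Fixed 256 x 5 table R; r[v][c] plays the role of h_{v,c+1}.

    Assumption: each entry is a Binomial(trials, 1/2) sample; the text does
    not state the binomial parameters.
    """
    rng = random.Random(seed)
    return [[sum(rng.getrandbits(1) for _ in range(trials)) for _ in range(5)]
            for _ in range(256)]


def lookup_sum(selected: bytes, r: List[List[int]]) -> int:
    """S = h_{a_1,1} + ... + h_{a_5,5}: byte value selects the row, position the column."""
    return sum(r[value][col] for col, value in enumerate(selected))


def meets_condition_lookup(selected: bytes, r: List[List[int]]) -> bool:
    """C_z holds when the looked-up sum S is even."""
    return lookup_sum(selected, r) % 2 == 0


R = make_table_r()
a = bytes([36, 48, 26, 26, 88])  # the a_1..a_5 example values from the text
print(lookup_sum(a, R), meets_condition_lookup(a, R))
```

Compared with the 255×8 random projection, this variant needs only five table lookups and four additions per window, at the cost of storing a 256×5 table.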
Whether at least part of the data in $W_{i2}[k_i-170, k_i-1]$ meets predetermined condition $C_2$ is judged in the same way as for $W_{j2}[k_j-170, k_j-1]$. Accordingly, in Figure 33, the bytes with serial numbers 170, 128, 86, 44 and 2 are the ones selected when judging whether at least part of the data in window $W_{j2}[k_j-170, k_j-1]$ meets $C_2$. Adjacent selected bytes are 42 bytes apart. Each of these bytes is converted into a decimal value, denoted $b_1'$, $b_2'$, $b_3'$, $b_4'$ and $b_5'$. Because a byte consists of 8 bits, every $b_r'$ satisfies $0 \le b_r' \le 255$, and $b_1', \dots, b_5'$ together form a 1×5 matrix. The same matrix R is used as when judging whether at least part of the data in window $W_{i2}[k_i-170, k_i-1]$ meets $C_2$. Entries are looked up by value and column as before. For example, $b_1' = 210$ in column 1 gives $h_{210,1}$, $b_2' = 156$ in column 2 gives $h_{156,2}$, $b_3' = 144$ in column 3 gives $h_{144,3}$, $b_4' = 60$ in column 4 gives $h_{60,4}$, and $b_5' = 90$ in column 5 gives $h_{90,5}$. Then $S_2' = h_{210,1} + h_{156,2} + h_{144,3} + h_{60,4} + h_{90,5}$, and the same judgment rule as for $S_2$ applies. If $S_2'$ is even, at least part of the data in $W_{j2}[k_j-170, k_j-1]$ meets predetermined condition $C_2$. If $S_2'$ is odd, it does not. The probability that $S_2'$ is even is 1/2.
Likewise, whether at least part of the data in $W_{i3}[k_i-171, k_i-2]$ meets predetermined condition $C_3$ is judged in the same way as for $W_{j3}[k_j-171, k_j-2]$. In the same manner, it is judged whether at least part of the data:
- in $W_{j4}[k_j-172, k_j-3]$ meets $C_4$;
- in $W_{j5}[k_j-173, k_j-4]$ meets $C_5$;
- in $W_{j6}[k_j-174, k_j-5]$ meets $C_6$;
- in $W_{j7}[k_j-175, k_j-6]$ meets $C_7$;
- in $W_{j8}[k_j-176, k_j-7]$ meets $C_8$;
- in $W_{j9}[k_j-177, k_j-8]$ meets $C_9$;
- in $W_{j10}[k_j-178, k_j-9]$ meets $C_{10}$;
- in $W_{j11}[k_j-179, k_j-10]$ meets $C_{11}$.

These steps are not repeated here.
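The eleven windows listed above follow a regular pattern. Window z is $W_z[k-(168+z), k-(z-1)]$, and its five sampled serial numbers are those of $W_1$ shifted by z - 1. The Python sketch below tabulates the windows. It assumes that serial number n denotes the byte at index k - n; the function names are hypothetical.

```python
# Window z (1-based) of potential cut point k in the Figure 21/22 example:
# W_z[k - (168 + z), k - (z - 1)], i.e. A_z = 168 + z and B_z = -(z - 1).
SPACING = 42   # distance in bytes between adjacent sampled bytes
SAMPLES = 5    # bytes sampled per window
M = 11         # windows per potential cut point


def window_bounds(k: int, z: int) -> tuple[int, int]:
    """Return (left, right) of window W_z[k - A_z, k + B_z]."""
    return k - (168 + z), k - (z - 1)


def sampled_indices(k: int, z: int) -> list[int]:
    """Absolute indices of the bytes sampled from window z.

    Serial numbers are 169, 127, 85, 43, 1 for z = 1 and are shifted by one per
    window (170, 128, 86, 44, 2 for z = 2). Assumption: serial number n is the
    byte at index k - n.
    """
    serials = [169 + (z - 1) - SPACING * s for s in range(SAMPLES)]
    return [k - n for n in serials]


if __name__ == "__main__":
    k = 1000
    for z in range(1, M + 1):
        left, right = window_bounds(k, z)
        print(f"W{z}[{left}, {right}] samples {sampled_indices(k, z)}")
```

Every window spans 169 byte positions, which matches the statement that the windows in Figure 21 share the same size.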
This embodiment uses a random function to judge whether at least part of the data in window $W_{iz}[k_i-A_z, k_i+B_z]$ meets predetermined condition $C_z$. Take the embodiment shown in Figure 21 as an example. According to the rule preset on deduplication server 103, window $W_{i1}[k_i-169, k_i]$ is determined for potential cut point $k_i$. It is then judged whether at least part of the data in $W_{i1}[k_i-169, k_i]$ meets predetermined condition $C_1$. As shown in Figure 32, $W_{i1}$ denotes window $W_{i1}[k_i-169, k_i]$. To make this judgment, five bytes are selected. In Figure 32, the bytes "■" with serial numbers 169, 127, 85, 43 and 1 each denote one selected byte, and adjacent selected bytes are 42 bytes apart. Each of these bytes is converted into a decimal value, denoted $a_1$, $a_2$, $a_3$, $a_4$ and $a_5$. Because a byte consists of 8 bits, every $a_s$ among $a_1, \dots, a_5$ satisfies $0 \le a_s \le 255$, and $a_1, \dots, a_5$ together form a 1×5 matrix. From random numbers following a binomial distribution, 256×5 random numbers are selected to form matrix R:
$$R = \begin{pmatrix} h_{0,1} & h_{0,2} & \cdots & h_{0,5} \\ h_{1,1} & h_{1,2} & \cdots & h_{1,5} \\ \vdots & \vdots & \ddots & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,5} \end{pmatrix}$$
Another 256×5 random numbers are selected from random numbers following a binomial distribution to form matrix G:
$$G = \begin{pmatrix} g_{0,1} & g_{0,2} & \cdots & g_{0,5} \\ g_{1,1} & g_{1,2} & \cdots & g_{1,5} \\ \vdots & \vdots & \ddots & \vdots \\ g_{255,1} & g_{255,2} & \cdots & g_{255,5} \end{pmatrix}$$
Entries are looked up from the value of each $a_s$ and the column it occupies. For example, $a_1 = 36$ is in column 1, so $h_{36,1}$ is looked up in matrix R and $g_{36,1}$ in matrix G. Likewise, $a_2 = 48$ in column 2 gives $h_{48,2}$ and $g_{48,2}$, $a_3 = 26$ in column 3 gives $h_{26,3}$ and $g_{26,3}$, $a_4 = 26$ in column 4 gives $h_{26,4}$ and $g_{26,4}$, and $a_5 = 88$ in column 5 gives $h_{88,5}$ and $g_{88,5}$. Then $S_{1h} = h_{36,1} + h_{48,2} + h_{26,3} + h_{26,4} + h_{88,5}$, which follows a binomial distribution because matrix R does. Similarly, $S_{1g} = g_{36,1} + g_{48,2} + g_{26,3} + g_{26,4} + g_{88,5}$ follows a binomial distribution because matrix G does. If at least one of $S_{1h}$ and $S_{1g}$ is even, at least part of the data in $W_{i1}[k_i-169, k_i]$ meets predetermined condition $C_1$. If both are odd, it does not. In other words, $C_1$ states that at least one of $S_{1h}$ and $S_{1g}$ obtained in this way is even. Because $S_{1h}$ and $S_{1g}$ both follow binomial distributions, each is even with probability 1/2. The probability that at least one of them is even is therefore 1 - 1/4 = 3/4, so at least part of the data in $W_{i1}[k_i-169, k_i]$ meets $C_1$ with probability 3/4. In the embodiment shown in Figure 21, at least part of the data in $W_{i1}[k_i-169, k_i]$ meets $C_1$.
In the embodiment shown in Figure 21, the windows $W_{i1}[k_i-169, k_i]$ through $W_{i11}[k_i-179, k_i-10]$ all have the same size, namely 169 bytes:
- $W_{i1}[k_i-169, k_i]$
- $W_{i2}[k_i-170, k_i-1]$
- $W_{i3}[k_i-171, k_i-2]$
- $W_{i4}[k_i-172, k_i-3]$
- $W_{i5}[k_i-173, k_i-4]$
- $W_{i6}[k_i-174, k_i-5]$
- $W_{i7}[k_i-175, k_i-6]$
- $W_{i8}[k_i-176, k_i-7]$
- $W_{i9}[k_i-177, k_i-8]$
- $W_{i10}[k_i-178, k_i-9]$
- $W_{i11}[k_i-179, k_i-10]$

Every window is judged against its predetermined condition in the same way, as described above for $W_{i1}[k_i-169, k_i]$ and $C_1$. Thus, as shown in Figure 32, the bytes with serial numbers 170, 128, 86, 44 and 2 are selected when judging whether at least part of the data in window $W_{i2}[k_i-170, k_i-1]$ meets predetermined condition $C_2$. Adjacent selected bytes are 42 bytes apart. Each of these bytes is converted into a decimal value, denoted $b_1$, $b_2$, $b_3$, $b_4$ and $b_5$. Because a byte consists of 8 bits, every $b_s$ satisfies $0 \le b_s \le 255$, and $b_1, \dots, b_5$ together form a 1×5 matrix. Because every window is judged in the same way in this embodiment, the same matrices R and G are used. For example, $b_1 = 66$ in column 1 gives $h_{66,1}$ from R and $g_{66,1}$ from G. Likewise, $b_2 = 48$ in column 2 gives $h_{48,2}$ and $g_{48,2}$, $b_3 = 99$ in column 3 gives $h_{99,3}$ and $g_{99,3}$, $b_4 = 26$ in column 4 gives $h_{26,4}$ and $g_{26,4}$, and $b_5 = 90$ in column 5 gives $h_{90,5}$ and $g_{90,5}$. Then $S_{2h} = h_{66,1} + h_{48,2} + h_{99,3} + h_{26,4} + h_{90,5}$ follows a binomial distribution because matrix R does. Likewise, $S_{2g} = g_{66,1} + g_{48,2} + g_{99,3} + g_{26,4} + g_{90,5}$ follows a binomial distribution because matrix G does. If at least one of $S_{2h}$ and $S_{2g}$ is even, at least part of the data in $W_{i2}[k_i-170, k_i-1]$ meets predetermined condition $C_2$. If both are odd, it does not. The probability that at least one of them is even is 3/4. In the embodiment shown in Figure 21, at least part of the data in $W_{i2}[k_i-170, k_i-1]$ meets $C_2$.
Using the same rule, it is judged in turn whether at least part of the data:
- in $W_{i3}[k_i-171, k_i-2]$ meets $C_3$;
- in $W_{i4}[k_i-172, k_i-3]$ meets $C_4$;
- in $W_{i5}[k_i-173, k_i-4]$ meets $C_5$;
- in $W_{i6}[k_i-174, k_i-5]$ meets $C_6$;
- in $W_{i7}[k_i-175, k_i-6]$ meets $C_7$;
- in $W_{i8}[k_i-176, k_i-7]$ meets $C_8$;
- in $W_{i9}[k_i-177, k_i-8]$ meets $C_9$;
- in $W_{i10}[k_i-178, k_i-9]$ meets $C_{10}$;
- in $W_{i11}[k_i-179, k_i-10]$ meets $C_{11}$.

In the embodiment shown in Figure 21, at least part of the data in $W_{i5}[k_i-173, k_i-4]$ does not meet $C_5$. The search therefore jumps 7 bytes from potential cut point $k_i$ along the data stream cut point search direction, and the end position of the 7th byte becomes the current potential cut point $k_j$. As shown in Figure 22, according to the rule preset for deduplication server 103, window $W_{j1}[k_j-169, k_j]$ is determined for potential cut point $k_j$. Whether at least part of the data in window $W_{j1}[k_j-169, k_j]$ meets $C_1$ is judged in the same way as for window $W_{i1}[k_i-169, k_i]$. As shown in Figure 33, $W_{j1}$ denotes window $W_{j1}[k_j-169, k_j]$. To judge whether at least part of its data meets $C_1$, the bytes "■" with serial numbers 169, 127, 85, 43 and 1 are selected, each denoting one byte, with adjacent selected bytes 42 bytes apart. Each of these bytes is converted into a decimal value, denoted $a_1'$, $a_2'$, $a_3'$, $a_4'$ and $a_5'$. Because a byte consists of 8 bits, every $a_s'$ satisfies $0 \le a_s' \le 255$, and $a_1', \dots, a_5'$ together form a 1×5 matrix. The same matrices R and G are used as for window $W_{i1}[k_i-169, k_i]$:
$$R = \begin{pmatrix} h_{0,1} & h_{0,2} & \cdots & h_{0,5} \\ h_{1,1} & h_{1,2} & \cdots & h_{1,5} \\ \vdots & \vdots & \ddots & \vdots \\ h_{255,1} & h_{255,2} & \cdots & h_{255,5} \end{pmatrix}, \qquad G = \begin{pmatrix} g_{0,1} & g_{0,2} & \cdots & g_{0,5} \\ g_{1,1} & g_{1,2} & \cdots & g_{1,5} \\ \vdots & \vdots & \ddots & \vdots \\ g_{255,1} & g_{255,2} & \cdots & g_{255,5} \end{pmatrix}$$
Entries are looked up by value and column. For example, $a_1' = 16$ in column 1 gives $h_{16,1}$ from R and $g_{16,1}$ from G. Likewise, $a_2' = 98$ in column 2 gives $h_{98,2}$ and $g_{98,2}$, $a_3' = 56$ in column 3 gives $h_{56,3}$ and $g_{56,3}$, $a_4' = 36$ in column 4 gives $h_{36,4}$ and $g_{36,4}$, and $a_5' = 99$ in column 5 gives $h_{99,5}$ and $g_{99,5}$. Then $S_{1h}' = h_{16,1} + h_{98,2} + h_{56,3} + h_{36,4} + h_{99,5}$ follows a binomial distribution because matrix R does. Likewise, $S_{1g}' = g_{16,1} + g_{98,2} + g_{56,3} + g_{36,4} + g_{99,5}$ follows a binomial distribution because matrix G does. If at least one of $S_{1h}'$ and $S_{1g}'$ is even, at least part of the data in $W_{j1}[k_j-169, k_j]$ meets predetermined condition $C_1$. If both are odd, it does not. The probability that at least one of them is even is 3/4.
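The two-table variant can be sketched the same way. In the Python below, the condition holds when at least one of the two sums is even. The table parameters and the names `binomial_table` and `two_table_condition` are illustrative assumptions. If R and G are drawn independently, the empirical rate printed at the end should approach the 3/4 derived in the text.

```python
import random


def binomial_table(rng: random.Random, trials: int = 16, p: float = 0.5) -> list[list[int]]:
    """256 x 5 table of Binomial(trials, p) draws (parameters are assumptions)."""
    return [[sum(rng.random() < p for _ in range(trials)) for _ in range(5)] for _ in range(256)]


def two_table_condition(values: list[int], R: list[list[int]], G: list[list[int]]) -> bool:
    """Met if at least one of S_h = sum R[v_r][r] and S_g = sum G[v_r][r] is even."""
    s_h = sum(R[v][col] for col, v in enumerate(values))
    s_g = sum(G[v][col] for col, v in enumerate(values))
    return s_h % 2 == 0 or s_g % 2 == 0


if __name__ == "__main__":
    rng = random.Random(2014)
    R, G = binomial_table(rng), binomial_table(rng)
    # The example values a_1'..a_5' = 16, 98, 56, 36, 99 from the text.
    print(two_table_condition([16, 98, 56, 36, 99], R, G))
    # Empirical rate over random five-byte samples; expected to approach 3/4.
    n = 20000
    hits = sum(two_table_condition([rng.randrange(256) for _ in range(5)], R, G) for _ in range(n))
    print(f"{hits / n:.3f}")
```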
Whether at least part of the data in $W_{i2}[k_i-170, k_i-1]$ meets predetermined condition $C_2$ is judged in the same way as for $W_{j2}[k_j-170, k_j-1]$. Accordingly, in Figure 33, the bytes with serial numbers 170, 128, 86, 44 and 2 are the ones selected when judging whether at least part of the data in window $W_{j2}[k_j-170, k_j-1]$ meets $C_2$. Adjacent selected bytes are 42 bytes apart. Each of these bytes is converted into a decimal value, denoted $b_1'$, $b_2'$, $b_3'$, $b_4'$ and $b_5'$. Because a byte consists of 8 bits, every $b_s'$ satisfies $0 \le b_s' \le 255$, and $b_1', \dots, b_5'$ together form a 1×5 matrix. The same matrices R and G are used as when judging whether at least part of the data in window $W_{i2}[k_i-170, k_i-1]$ meets $C_2$. For example, $b_1' = 210$ in column 1 gives $h_{210,1}$ from R and $g_{210,1}$ from G. Likewise, $b_2' = 156$ in column 2 gives $h_{156,2}$ and $g_{156,2}$, $b_3' = 144$ in column 3 gives $h_{144,3}$ and $g_{144,3}$, $b_4' = 60$ in column 4 gives $h_{60,4}$ and $g_{60,4}$, and $b_5' = 90$ in column 5 gives $h_{90,5}$ and $g_{90,5}$. Then $S_{2h}' = h_{210,1} + h_{156,2} + h_{144,3} + h_{60,4} + h_{90,5}$ and $S_{2g}' = g_{210,1} + g_{156,2} + g_{144,3} + g_{60,4} + g_{90,5}$. If at least one of $S_{2h}'$ and $S_{2g}'$ is even, at least part of the data in $W_{j2}[k_j-170, k_j-1]$ meets predetermined condition $C_2$. If both are odd, it does not. The probability that at least one of them is even is 3/4.
Likewise, whether at least part of the data in $W_{i3}[k_i-171, k_i-2]$ meets predetermined condition $C_3$ is judged in the same way as for $W_{j3}[k_j-171, k_j-2]$. In the same manner, it is judged whether at least part of the data:
- in $W_{j4}[k_j-172, k_j-3]$ meets $C_4$;
- in $W_{j5}[k_j-173, k_j-4]$ meets $C_5$;
- in $W_{j6}[k_j-174, k_j-5]$ meets $C_6$;
- in $W_{j7}[k_j-175, k_j-6]$ meets $C_7$;
- in $W_{j8}[k_j-176, k_j-7]$ meets $C_8$;
- in $W_{j9}[k_j-177, k_j-8]$ meets $C_9$;
- in $W_{j10}[k_j-178, k_j-9]$ meets $C_{10}$;
- in $W_{j11}[k_j-179, k_j-10]$ meets $C_{11}$.

These steps are not repeated here.
This embodiment uses a random function to judge whether at least part of the data in window $W_{iz}[k_i-A_z, k_i+B_z]$ meets predetermined condition $C_z$. Take the embodiment shown in Figure 21 as an example. According to the rule preset on deduplication server 103, window $W_{i1}[k_i-169, k_i]$ is determined for potential cut point $k_i$. It is then judged whether at least part of the data in $W_{i1}[k_i-169, k_i]$ meets predetermined condition $C_1$. As shown in Figure 32, $W_{i1}$ denotes window $W_{i1}[k_i-169, k_i]$. To make this judgment, five bytes are selected. In Figure 32, the bytes "■" with serial numbers 169, 127, 85, 43 and 1 each denote one selected byte, and adjacent selected bytes are 42 bytes apart. These five bytes are taken in sequence as 40 bits, denoted $a_1, a_2, a_3, a_4, \dots, a_{40}$. For any bit $a_t$ among them, $V_{a_t} = -1$ when $a_t = 0$ and $V_{a_t} = 1$ when $a_t = 1$. Applying this correspondence generates $V_{a_1}, V_{a_2}, \dots, V_{a_{40}}$. Forty random numbers, denoted $h_1, h_2, \dots, h_{40}$, are selected from random numbers following a normal distribution, and $S_a = V_{a_1} h_1 + V_{a_2} h_2 + V_{a_3} h_3 + \dots + V_{a_{40}} h_{40}$. Because $h_1, \dots, h_{40}$ follow a normal distribution, $S_a$ also follows a normal distribution. If $S_a$ is positive, at least part of the data in $W_{i1}[k_i-169, k_i]$ meets predetermined condition $C_1$. If $S_a$ is negative or 0, it does not. The probability that $S_a$ is positive is 1/2. In the embodiment shown in Figure 21, at least part of the data in $W_{i1}[k_i-169, k_i]$ meets $C_1$.
As shown in Figure 32, the bytes with serial numbers 170, 128, 86, 44 and 2 are selected when judging whether at least part of the data in window $W_{i2}[k_i-170, k_i-1]$ meets predetermined condition $C_2$. Adjacent selected bytes are 42 bytes apart. These bytes are taken in sequence as 40 bits, denoted $b_1, b_2, \dots, b_{40}$. For any bit $b_t$ among them, $V_{b_t} = -1$ when $b_t = 0$ and $V_{b_t} = 1$ when $b_t = 1$. Applying this correspondence generates $V_{b_1}, V_{b_2}, \dots, V_{b_{40}}$. Window $W_{i1}[k_i-169, k_i]$ is judged against $C_1$ in the same way as window $W_{i2}[k_i-170, k_i-1]$ against $C_2$, so the same random numbers $h_1, \dots, h_{40}$ are used, and $S_b = V_{b_1} h_1 + V_{b_2} h_2 + V_{b_3} h_3 + \dots + V_{b_{40}} h_{40}$. Because $h_1, \dots, h_{40}$ follow a normal distribution, $S_b$ also follows a normal distribution. If $S_b$ is positive, at least part of the data in $W_{i2}[k_i-170, k_i-1]$ meets $C_2$. If $S_b$ is negative or 0, it does not. The probability that $S_b$ is positive is 1/2. In the embodiment shown in Figure 21, at least part of the data in $W_{i2}[k_i-170, k_i-1]$ meets $C_2$.
Using the same rule, it is judged in turn whether at least part of the data:
- in $W_{i3}[k_i-171, k_i-2]$ meets $C_3$;
- in $W_{i4}[k_i-172, k_i-3]$ meets $C_4$;
- in $W_{i5}[k_i-173, k_i-4]$ meets $C_5$;
- in $W_{i6}[k_i-174, k_i-5]$ meets $C_6$;
- in $W_{i7}[k_i-175, k_i-6]$ meets $C_7$;
- in $W_{i8}[k_i-176, k_i-7]$ meets $C_8$;
- in $W_{i9}[k_i-177, k_i-8]$ meets $C_9$;
- in $W_{i10}[k_i-178, k_i-9]$ meets $C_{10}$;
- in $W_{i11}[k_i-179, k_i-10]$ meets $C_{11}$.

In the embodiment shown in Figure 21, at least part of the data in $W_{i5}[k_i-173, k_i-4]$ does not meet $C_5$. The search therefore jumps 7 bytes from potential cut point $k_i$ along the data stream cut point search direction, and the end position of the 7th byte becomes the current potential cut point $k_j$. As shown in Figure 22, according to the rule preset for deduplication server 103, window $W_{j1}[k_j-169, k_j]$ is determined for potential cut point $k_j$. Whether at least part of the data in window $W_{j1}[k_j-169, k_j]$ meets $C_1$ is judged in the same way as for window $W_{i1}[k_i-169, k_i]$. As shown in Figure 33, $W_{j1}$ denotes window $W_{j1}[k_j-169, k_j]$. To judge whether at least part of its data meets $C_1$, five bytes are selected. In Figure 33, the bytes "■" with serial numbers 169, 127, 85, 43 and 1 each denote one selected byte, and adjacent selected bytes are 42 bytes apart. These bytes are taken in sequence as 40 bits, denoted $a_1', a_2', \dots, a_{40}'$. For any bit $a_t'$ among them, $V_{a_t'} = -1$ when $a_t' = 0$ and $V_{a_t'} = 1$ when $a_t' = 1$. Applying this correspondence generates $V_{a_1'}, V_{a_2'}, \dots, V_{a_{40}'}$. Window $W_{j1}[k_j-169, k_j]$ is judged against $C_1$ in the same way as window $W_{i1}[k_i-169, k_i]$, so the same random numbers $h_1, \dots, h_{40}$ are used, and $S_a' = V_{a_1'} h_1 + V_{a_2'} h_2 + V_{a_3'} h_3 + \dots + V_{a_{40}'} h_{40}$. Because $h_1, \dots, h_{40}$ follow a normal distribution, $S_a'$ also follows a normal distribution. If $S_a'$ is positive, at least part of the data in $W_{j1}[k_j-169, k_j]$ meets $C_1$. If $S_a'$ is negative or 0, it does not. The probability that $S_a'$ is positive is 1/2.
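The Python sketch below is a minimal version of the sign test. It assumes the 40 bits are read most significant bit first and that $h_1, \dots, h_{40}$ are standard normal draws; the text fixes neither detail. The names `bits_of` and `sign_condition` are hypothetical.

```python
import random


def bits_of(sample: bytes) -> list[int]:
    """Expand the sampled bytes into bits, most significant bit first (bit order assumed)."""
    return [(byte >> shift) & 1 for byte in sample for shift in range(7, -1, -1)]


def sign_condition(sample: bytes, weights: list[float]) -> bool:
    """S = sum(V_t * h_t) with V_t = -1 for bit 0 and +1 for bit 1; met if S > 0."""
    bits = bits_of(sample)
    if len(bits) != len(weights):
        raise ValueError("need exactly one weight per bit")
    s = sum((1 if bit else -1) * h for bit, h in zip(bits, weights))
    return s > 0   # S negative or 0 means the condition is not met


if __name__ == "__main__":
    rng = random.Random(2014)
    h = [rng.gauss(0.0, 1.0) for _ in range(40)]   # h_1..h_40; mean 0, sigma 1 assumed
    sample = bytes([12, 200, 7, 99, 255])          # five sampled bytes (illustrative)
    print(sign_condition(sample, h))
```

For a fixed bit pattern, S is a sum of symmetric normal terms and is therefore positive with probability 1/2, as the text states.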
Whether at least part of the data in $W_{i2}[k_i-170, k_i-1]$ meets predetermined condition $C_2$ is judged in the same way as for $W_{j2}[k_j-170, k_j-1]$. Accordingly, in Figure 33, the bytes with serial numbers 170, 128, 86, 44 and 2 are the ones selected when judging whether at least part of the data in window $W_{j2}[k_j-170, k_j-1]$ meets $C_2$. Adjacent selected bytes are 42 bytes apart. These bytes are taken in sequence as 40 bits, denoted $b_1', b_2', \dots, b_{40}'$. For any bit $b_t'$ among them, $V_{b_t'} = -1$ when $b_t' = 0$ and $V_{b_t'} = 1$ when $b_t' = 1$. Applying this correspondence generates $V_{b_1'}, V_{b_2'}, \dots, V_{b_{40}'}$. Because $W_{i2}[k_i-170, k_i-1]$ and $W_{j2}[k_j-170, k_j-1]$ are judged against $C_2$ in the same way, the same random numbers $h_1, \dots, h_{40}$ are used, and $S_b' = V_{b_1'} h_1 + V_{b_2'} h_2 + V_{b_3'} h_3 + \dots + V_{b_{40}'} h_{40}$. Because $h_1, \dots, h_{40}$ follow a normal distribution, $S_b'$ also follows a normal distribution. If $S_b'$ is positive, at least part of the data in $W_{j2}[k_j-170, k_j-1]$ meets $C_2$. If $S_b'$ is negative or 0, it does not. The probability that $S_b'$ is positive is 1/2.
Likewise, whether at least part of the data in $W_{i3}[k_i-171, k_i-2]$ meets predetermined condition $C_3$ is judged in the same way as for $W_{j3}[k_j-171, k_j-2]$. In the same manner, it is judged whether at least part of the data:
- in $W_{j4}[k_j-172, k_j-3]$ meets $C_4$;
- in $W_{j5}[k_j-173, k_j-4]$ meets $C_5$;
- in $W_{j6}[k_j-174, k_j-5]$ meets $C_6$;
- in $W_{j7}[k_j-175, k_j-6]$ meets $C_7$;
- in $W_{j8}[k_j-176, k_j-7]$ meets $C_8$;
- in $W_{j9}[k_j-177, k_j-8]$ meets $C_9$;
- in $W_{j10}[k_j-178, k_j-9]$ meets $C_{10}$;
- in $W_{j11}[k_j-179, k_j-10]$ meets $C_{11}$.

These steps are not repeated here.
This embodiment uses a random function to judge whether at least part of the data in window $W_{iz}[k_i-A_z, k_i+B_z]$ meets predetermined condition $C_z$. Again, take the embodiment shown in Figure 21 as an example. According to the rule preset on deduplication server 103, window $W_{i1}[k_i-169, k_i]$ is determined for potential cut point $k_i$. It is then judged whether at least part of the data in $W_{i1}[k_i-169, k_i]$ meets predetermined condition $C_1$. As shown in Figure 32, $W_{i1}$ denotes window $W_{i1}[k_i-169, k_i]$. To make this judgment, five bytes are selected. In Figure 32, the bytes "■" with serial numbers 169, 127, 85, 43 and 1 each denote one selected byte, and adjacent selected bytes are 42 bytes apart. These five bytes are converted into one decimal number in the range 0 to $2^{40}-1$. A uniform random number generator generates one designated value for each decimal number in 0 to $2^{40}-1$, and the correspondence R between each decimal number in that range and its designated value is recorded. Once assigned, the designated value corresponding to a decimal number never changes, and the designated values follow a uniform distribution. If the designated value is even, at least part of the data in $W_{i1}[k_i-169, k_i]$ meets predetermined condition $C_1$. If it is odd, it does not. That is, $C_1$ states that the designated value obtained in this way is even. Because a uniformly distributed random number is even with probability 1/2, at least part of the data in $W_{i1}[k_i-169, k_i]$ meets $C_1$ with probability 1/2. In the embodiment shown in Figure 21, the same rule is used to judge whether at least part of the data:
- in $W_{i2}[k_i-170, k_i-1]$ meets $C_2$;
- in $W_{i3}[k_i-171, k_i-2]$ meets $C_3$;
- in $W_{i4}[k_i-172, k_i-3]$ meets $C_4$;
- in $W_{i5}[k_i-173, k_i-4]$ meets $C_5$.

These steps are not repeated here.
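Materialising a table with $2^{40}$ entries is impractical, so the Python sketch below builds the correspondence R lazily. A uniformly distributed value is drawn the first time a 40-bit number is seen and then kept fixed. The lazy table, the 32-bit value width, the big-endian byte order and all names are assumptions made for illustration.

```python
import random


class DesignatedValues:
    """Lazily built correspondence R: 40-bit number -> uniformly distributed designated value.

    Assumption: instead of a full 2^40-entry table, a value is drawn the first
    time a number is looked up and stored, so it never changes once assigned.
    """

    def __init__(self, seed: int, value_bits: int = 32) -> None:
        self._rng = random.Random(seed)
        self._value_bits = value_bits
        self._table: dict[int, int] = {}

    def lookup(self, number: int) -> int:
        if not 0 <= number < 2 ** 40:
            raise ValueError("number must lie in 0 .. 2^40 - 1")
        if number not in self._table:
            # Uniform over 0 .. 2^value_bits - 1, so even with probability 1/2.
            self._table[number] = self._rng.getrandbits(self._value_bits)
        return self._table[number]


def to_number(sample: bytes) -> int:
    """Read the five sampled bytes as one 40-bit big-endian integer (byte order assumed)."""
    return int.from_bytes(sample, "big")


def uniform_condition(sample: bytes, R: DesignatedValues) -> bool:
    """Condition met if the designated value of the 40-bit number is even."""
    return R.lookup(to_number(sample)) % 2 == 0


if __name__ == "__main__":
    R = DesignatedValues(seed=2014)
    sample = bytes([12, 200, 7, 99, 255])     # five sampled bytes (illustrative)
    print(to_number(sample), uniform_condition(sample, R))
    print(uniform_condition(sample, R))       # same verdict on every lookup
```

One design consequence is that values drawn on demand depend on query order. Two servers would therefore agree on cut points only if they share the stored table. A keyed hash of the 40-bit number would be a stateless substitute.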
When at least part of the data in $W_{i5}[k_i-173, k_i-4]$ does not meet predetermined condition $C_5$, the search jumps 7 bytes from potential cut point $k_i$ along the data stream cut point search direction. The end position of the 7th byte becomes the current potential cut point $k_j$. As shown in Figure 22, according to the rule preset for deduplication server 103, window $W_{j1}[k_j-169, k_j]$ is determined for potential cut point $k_j$. Whether at least part of the data in window $W_{j1}[k_j-169, k_j]$ meets $C_1$ is judged in the same way as for window $W_{i1}[k_i-169, k_i]$. The same correspondence R between each decimal number in 0 to $2^{40}-1$ and its designated value is therefore used. As shown in Figure 33, $W_{j1}$ denotes the window. To judge whether at least part of the data in $W_{j1}[k_j-169, k_j]$ meets $C_1$, five bytes are selected. In Figure 33, "■" denotes a selected byte, and adjacent selected bytes "■" are 42 bytes apart. The bytes "■" with serial numbers 169, 127, 85, 43 and 1 are converted into one decimal number, and the designated value corresponding to that number is looked up in R. If this designated value is even, at least part of the data in $W_{j1}[k_j-169, k_j]$ meets $C_1$. If it is odd, it does not. Because a uniformly distributed random number is even with probability 1/2, at least part of the data in $W_{j1}[k_j-169, k_j]$ meets $C_1$ with probability 1/2. Likewise, $W_{i2}[k_i-170, k_i-1]$ and $W_{j2}[k_j-170, k_j-1]$ are judged against $C_2$ in the same way, and $W_{i3}[k_i-171, k_i-2]$ and $W_{j3}[k_j-171, k_j-2]$ are judged against $C_3$ in the same way. In the same manner, it is judged whether at least part of the data:
- in $W_{j4}[k_j-172, k_j-3]$ meets $C_4$;
- in $W_{j5}[k_j-173, k_j-4]$ meets $C_5$;
- in $W_{j6}[k_j-174, k_j-5]$ meets $C_6$;
- in $W_{j7}[k_j-175, k_j-6]$ meets $C_7$;
- in $W_{j8}[k_j-176, k_j-7]$ meets $C_8$;
- in $W_{j9}[k_j-177, k_j-8]$ meets $C_9$;
- in $W_{j10}[k_j-178, k_j-9]$ meets $C_{10}$;
- in $W_{j11}[k_j-179, k_j-10]$ meets $C_{11}$.

These steps are not repeated here.
Deduplication server 103 in the embodiment of the present invention shown in Figure 1 is a device that implements the technical solutions described in the embodiments of the present invention. As shown in Figure 18, it generally includes a central processing unit, a main memory and an input/output interface, which communicate with one another. The main memory stores executable instructions, and the central processing unit executes them to perform specific functions. These functions enable deduplication server 103 to search for data stream cut points as described in Figures 20 to 33. Accordingly, as shown in Figure 19 and according to the embodiments shown in Figures 20 to 33, a rule is preset on deduplication server 103. Under this rule, M windows $W_x[k-A_x, k+B_x]$ are determined for a potential cut point k, together with a predetermined condition $C_x$ corresponding to each window $W_x[k-A_x, k+B_x]$. Here x takes the consecutive natural numbers from 1 to M, $M \ge 2$, and $A_x$ and $B_x$ are integers.
The deduplication server 103 includes a determining unit 1901 and a judgment processing unit 1902. The determining unit 1901 is configured to perform step a):
a) determine, according to the rule, the corresponding window $W_{iz}[k_i-A_z, k_i+B_z]$ for the current potential cut point $k_i$, where $i$ and $z$ are integers and $1 \le z \le M$;
The judgment processing unit 1902 is configured to judge whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ does not satisfy the predetermined condition $C_z$, jump from the current potential cut point $k_i$ by $N$ minimum data stream cut point lookup units $U$ along the data stream cut point search direction, where $N \cdot U \le \lVert B_z \rVert + \max_x \lVert A_x \rVert$, to obtain a new potential cut point, and the determining unit 1901 performs step a) for the new potential cut point;
when at least part of the data in each window $W_{ix}[k_i-A_x, k_i+B_x]$ of the $M$ windows of the current potential cut point $k_i$ satisfies the predetermined condition $C_x$, the current potential cut point $k_i$ is a data stream cut point.
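Steps a) to c) can be sketched for the Figure 21 configuration: $M = 11$ windows of 169 bytes, window $W_x$ ending $x - 1$ bytes before $k$, a lookup unit $U$ of 1 byte (assumed), and the same condition for every window. The skip length $M - z + 1$ is our reading of why the example jumps 7 bytes after $W_{i5}$ fails: for $d = 1, \ldots, M - z$, window $W_{z+d}$ of candidate $k_i + d$ is exactly the failed window $W_{iz}$, so none of those candidates can be a cut point. The condition uses a BLAKE2b-based stand-in for the table $R$ (assumption), and all names are illustrative.

```python
import hashlib

WINDOW = 169                            # bytes per window (Figure 21 example)
M = 11                                  # windows per potential cut point
SAMPLE_OFFSETS = (0, 42, 84, 126, 168)  # sampled bytes, 42 apart (assumed 0-indexed)


def condition(data: bytes, end: int) -> bool:
    """Stand-in for condition C on the WINDOW bytes ending at `end` (a hash replaces table R)."""
    key = bytes(data[end - WINDOW + o] for o in SAMPLE_OFFSETS)
    return hashlib.blake2b(key, digest_size=8).digest()[-1] % 2 == 0


def window_end(k: int, x: int) -> int:
    """Exclusive end offset of W_x for cut point k: W_1 ends at k, each later window one byte earlier."""
    return k - (x - 1)


def find_cut_point(data: bytes, start: int = 0):
    """Return the first cut point k at or after start + WINDOW + M - 1, or None."""
    k = start + WINDOW + M - 1          # smallest k whose M windows all lie after `start`
    while k <= len(data):
        for z in range(1, M + 1):       # test W_1, W_2, ... in order
            if not condition(data, window_end(k, z)):
                k += M - z + 1          # k+1 .. k+M-z would re-test this failed window
                break
        else:
            return k                    # all M windows satisfied: k is a cut point
    return None
```

With $z = 5$ and $M = 11$ the jump is $11 - 5 + 1 = 7$ bytes, matching the example; in this layout no jump exceeds $M$ bytes, well inside the bound $\lVert B_z \rVert + \max_x \lVert A_x \rVert$.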
Further, the rule also includes: at least two windows $W_{ie}[k_i-A_e, k_i+B_e]$ and $W_{if}[k_i-A_f, k_i+B_f]$ satisfy $|A_e+B_e| = |A_f+B_f|$ and $C_e = C_f$. Further, the rule also includes: $A_e$ and $A_f$ are positive integers. Further, the rule also includes: $A_e - 1 = A_f$ and $B_e + 1 = B_f$.
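To make these constraints concrete, the Figure 21 windows can be written as $(A_x, B_x)$ pairs with $A_x = 168 + x$ and $B_x = 1 - x$, so that $W_1 = [k-169, k]$ and $W_{11} = [k-179, k-10]$, and checked against the conditions above together with the jump bound $\lVert B_z \rVert + \max_x \lVert A_x \rVert$. This is a minimal sketch; the helper names are illustrative.

```python
def figure21_windows(m: int = 11) -> dict:
    """(A_x, B_x) for W_x = [k - A_x, k + B_x] in the Figure 21 example."""
    return {x: (168 + x, 1 - x) for x in range(1, m + 1)}


def same_size(w: dict, e: int, f: int) -> bool:
    """|A_e + B_e| == |A_f + B_f|: both windows span the same extent."""
    (ae, be), (af, bf) = w[e], w[f]
    return abs(ae + be) == abs(af + bf)


def shifted_by_one(w: dict, e: int, f: int) -> bool:
    """A_e - 1 == A_f and B_e + 1 == B_f: W_f is W_e moved one unit along the search direction."""
    (ae, be), (af, bf) = w[e], w[f]
    return ae - 1 == af and be + 1 == bf


def max_jump(w: dict, z: int) -> int:
    """Upper bound on N*U after window z fails: ||B_z|| + max_x ||A_x||."""
    return abs(w[z][1]) + max(abs(a) for a, _ in w.values())
```

For $z = 5$ the bound is $4 + 179 = 183$ bytes, so the 7-byte jump of the example is comfortably allowed.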
Further, the judgment processing unit 1902 is specifically configured to use a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$. Further, the judgment processing unit 1902 specifically uses a hash function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies $C_z$.
Further, when at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ does not satisfy the predetermined condition $C_z$, the judgment processing unit 1902 jumps from the current potential cut point $k_i$ by $N$ minimum data stream cut point lookup units $U$ along the data stream cut point search direction to obtain the new potential cut point, and the determining unit 1901 performs step a) for the new potential cut point. According to the rule, the left boundary of the window $W_{ic}[k_i-A_c, k_i+B_c]$ determined for the new potential cut point coincides with the right boundary of the window $W_{iz}[k_i-A_z, k_i+B_z]$, or lies within the range of $W_{iz}[k_i-A_z, k_i+B_z]$. Here $W_{ic}[k_i-A_c, k_i+B_c]$ is, according to the rule, the first of the $M$ windows determined for the new potential cut point when those windows are ordered along the data stream search direction.
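The boundary condition can be checked numerically for the Figure 21 layout. This sketch assumes a left-to-right search with a lookup unit of 1 byte, and reads "the first window along the search direction" as the window with the leftmost left boundary; both readings are assumptions, and the function names are illustrative.

```python
def windows(k, m=11):
    """Inclusive [left, right] bounds of W_1..W_m for cut point k in the Figure 21 layout."""
    return [(k - (168 + x), k - (x - 1)) for x in range(1, m + 1)]


def first_window(k, m=11):
    """Window that comes first along a left-to-right search: the leftmost left boundary (assumed reading)."""
    return min(windows(k, m))


def boundary_ok(k_old, z, k_new, m=11):
    """True if the new first window's left boundary equals the failed window's right boundary or lies inside it."""
    fail_left, fail_right = windows(k_old, m)[z - 1]
    new_left, _ = first_window(k_new, m)
    return fail_left <= new_left <= fail_right


def allowed_jumps(z, m=11, limit=400):
    """Jumps d (in bytes, U = 1 byte assumed) after W_z fails for which boundary_ok holds."""
    return [d for d in range(1, limit) if boundary_ok(0, z, d, m)]
```

Under this reading, after $W_{i5}$ fails the property holds for jumps of 6 to 175 bytes, and the 7-byte jump of the example falls in that range.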
Further, the judgment processing unit 1902 uses a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$, which specifically includes:
Select $F$ bytes in the window $W_{iz}[k_i-A_z, k_i+B_z]$ and reuse the $F$ bytes $H$ times to obtain $F \cdot H$ bytes in total. Each byte consists of 8 bits, denoted $a_{m,1}, \ldots, a_{m,8}$, which are the 1st to 8th bits of the $m$-th of the $F \cdot H$ bytes. The bits of the $F \cdot H$ bytes can be written as:

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F \cdot H,1} & a_{F \cdot H,2} & \cdots & a_{F \cdot H,8} \end{pmatrix}$$

Let $a_{m,n}$ denote any one of $a_{m,1}, \ldots, a_{m,8}$: when $a_{m,n} = 1$, $V_{a_{m,n}} = 1$, and when $a_{m,n} = 0$, $V_{a_{m,n}} = -1$. Converting the bits of the $F \cdot H$ bytes according to this relation between $a_{m,n}$ and $V_{a_{m,n}}$ gives the matrix $V_a$:

$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}} \\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{a_{F \cdot H,1}} & V_{a_{F \cdot H,2}} & \cdots & V_{a_{F \cdot H,8}} \end{pmatrix}$$

Select $F \cdot H \cdot 8$ random numbers from random numbers obeying a normal distribution to form the matrix $R$:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F \cdot H,1} & h_{F \cdot H,2} & \cdots & h_{F \cdot H,8} \end{pmatrix}$$

Multiply each entry in row $m$ of $V_a$ by the corresponding random number in row $m$ of $R$ and sum the products to obtain one value:

$$S_{a_m} = V_{a_{m,1}} h_{m,1} + V_{a_{m,2}} h_{m,2} + \cdots + V_{a_{m,8}} h_{m,8}.$$

Obtain $S_{a_1}, S_{a_2}, \ldots, S_{a_{F \cdot H}}$ in the same way, and count the number $K$ of these values that are greater than 0. When $K$ is even, at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$.
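The random-projection test above can be sketched as follows. The text does not say which $F$ bytes are selected, how the normal random numbers are generated, or the bit order within a byte, so the sketch takes the offsets as a parameter, draws $R$ from `random.Random(seed).gauss` with an arbitrary seed (fixed once so that every potential cut point is judged the same way), and assumes bit 1 is the most significant bit. All function names are hypothetical.

```python
import random


def make_matrix_r(rows: int, seed: int = 0) -> list:
    """F*H x 8 matrix of standard-normal random numbers; fixed once per rule (the seed is arbitrary)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(rows)]


def signed_bits(byte: int) -> list:
    """One row of V_a: bits a_{m,1}..a_{m,8} (assumed most significant first), mapped 1 -> +1, 0 -> -1."""
    return [1 if (byte >> (7 - n)) & 1 else -1 for n in range(8)]


def projection_condition(window: bytes, offsets, h: int, r) -> bool:
    """Reuse the F selected bytes H times, compute every S_am, and test whether K is even."""
    selected = [window[o] for o in offsets] * h  # F*H bytes
    if len(r) != len(selected):
        raise ValueError("R must have F*H rows")
    count_k = 0
    for byte, r_row in zip(selected, r):
        s_am = sum(v * g for v, g in zip(signed_bits(byte), r_row))  # row m of V_a times row m of R
        if s_am > 0:
            count_k += 1
    return count_k % 2 == 0  # K even: condition satisfied
```

For example, with every entry of $R$ equal to 1, a byte `0xFF` gives $S_{a_m} = 8 > 0$ for each of its rows, so reusing it twice yields $K = 2$ and the condition holds.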
In the method for finding a data stream cut point provided by the embodiments shown in Figure 20 to Figure 33, windows $W_{ix}[k_i-A_x, k_i+B_x]$ are determined for the potential cut point $k_i$, where $x$ takes the consecutive natural numbers 1 to $M$ and $M \ge 2$. Whether at least part of the data in each of the $M$ windows satisfies $C_x$ may be judged in parallel, or the windows may be judged one after another: first judge whether at least part of the data in $W_{i1}[k_i-A_1, k_i+B_1]$ satisfies $C_1$; if it does, judge whether at least part of the data in $W_{i2}[k_i-A_2, k_i+B_2]$ satisfies $C_2$; and so on, until judging whether at least part of the data in $W_{iM}[k_i-A_M, k_i+B_M]$ satisfies $C_M$. The other windows in the embodiments are judged in the same way and are not described again.
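The two evaluation orders can be compared in a small sketch. It assumes the Figure 21 layout, where a failure of $W_z$ lets the search skip $M - z + 1$ bytes (our reading of the 7-byte jump), so the most useful failure is the one with the smallest $z$. A sequential scan of $W_1, W_2, \ldots$ finds that window first; a parallel evaluation has to pick it explicitly. `check_window` is a hypothetical callable returning whether window $z$ satisfies its condition, and the thread pool only illustrates independent evaluation.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential(check_window, m):
    """Test W_1..W_m in order and stop at the first failure; return its index, or None if all pass."""
    for z in range(1, m + 1):
        if not check_window(z):
            return z
    return None


def parallel(check_window, m, workers=4):
    """Test all m windows independently, then report the failing window that allows the longest skip."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check_window, range(1, m + 1)))
    failed = [z for z, ok in zip(range(1, m + 1), results) if not ok]
    return min(failed) if failed else None


def skip_after(z, m):
    """Bytes to jump after W_z fails in the Figure 21 layout (M - z + 1, assumed)."""
    return m - z + 1
```

Both strategies report the same window, so they lead to the same next potential cut point; the parallel form simply spends work on windows a sequential scan would never test.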
In addition, in the embodiments shown in Figure 20 to Figure 33, a rule is preset on the deduplication server 103: determine $M$ windows $W_x[k-A_x, k+B_x]$ for a potential cut point $k$ and the predetermined condition $C_x$ corresponding to each window $W_x[k-A_x, k+B_x]$, where $x$ takes the consecutive natural numbers 1 to $M$ and $M \ge 2$. In this preset rule, $A_1, A_2, A_3, \ldots, A_M$ need not all be equal, $B_1, B_2, B_3, \ldots, B_M$ need not all be equal, and $C_1, C_2, C_3, \ldots, C_M$ need not all be identical. In the embodiment shown in Figure 21, the windows $W_{i1}[k_i-169, k_i]$, $W_{i2}[k_i-170, k_i-1]$, $W_{i3}[k_i-171, k_i-2]$, $W_{i4}[k_i-172, k_i-3]$, $W_{i5}[k_i-173, k_i-4]$, $W_{i6}[k_i-174, k_i-5]$, $W_{i7}[k_i-175, k_i-6]$, $W_{i8}[k_i-176, k_i-7]$, $W_{i9}[k_i-177, k_i-8]$, $W_{i10}[k_i-178, k_i-9]$ and $W_{i11}[k_i-179, k_i-10]$ all have the same size, 169 bytes, and the same manner is used to judge whether at least part of the data in each window satisfies its predetermined condition (see the description above of judging whether at least part of the data in $W_{i1}[k_i-169, k_i]$ satisfies $C_1$). In the embodiment shown in Figure 11, however, the windows $W_{i1}[k_i-169, k_i]$, $W_{i2}[k_i-170, k_i-1]$, $W_{i3}[k_i-171, k_i-2]$, $W_{i4}[k_i-172, k_i-3]$, $W_{i5}[k_i-173, k_i-4]$, $W_{i6}[k_i-174, k_i-5]$, $W_{i7}[k_i-175, k_i-6]$, $W_{i8}[k_i-176, k_i-7]$, $W_{i9}[k_i-177, k_i-8]$, $W_{i10}[k_i-168, k_i+1]$ and $W_{i11}[k_i-179, k_i+3]$ may differ in size, and the manner of judging whether at least part of the data in a window satisfies its predetermined condition may also differ. In all embodiments, according to the rule preset on the deduplication server 103, the manner of judging whether at least part of the data in window $W_{i1}$ satisfies $C_1$ is necessarily the same as the manner of judging whether at least part of the data in window $W_{j1}$ satisfies $C_1$; the manner of judging whether at least part of the data in $W_{i2}$ satisfies $C_2$ is necessarily the same as that for $W_{j2}$; and so on, up to the manner of judging whether at least part of the data in window $W_{iM}$ satisfies $C_M$, which is necessarily the same as that for window $W_{jM}$. Details are not repeated here.
In the embodiments shown in Figure 20 to Figure 33, a rule is preset on the deduplication server 103, and $k_a$, $k_i$, $k_j$, $k_l$ and $k_m$ are potential cut points obtained while searching for a cut point along the data stream cut point search direction; all of them follow this rule. In the embodiments of the present invention, a window $W_x[k-A_x, k+B_x]$ denotes a specific range within which data is selected to judge whether that data satisfies the predetermined condition $C_x$. Specifically, either part of the data or all of the data within the range may be selected for this judgment. The window concept used in the embodiments of the present invention refers to the window $W_x[k-A_x, k+B_x]$ and is not described again here.
In the window $W_x[k-A_x, k+B_x]$, $(k-A_x)$ and $(k+B_x)$ are its two boundaries: $(k-A_x)$ is the boundary that lies, relative to the potential cut point $k$, opposite to the data stream cut point search direction, and $(k+B_x)$ is the boundary that lies in the search direction. Specifically, in the embodiments shown in Figure 20 to Figure 33 the data stream cut point search direction is from left to right, so $(k-A_x)$ is the boundary opposite to the search direction (the left boundary) and $(k+B_x)$ is the boundary in the search direction (the right boundary). If the search direction in Figure 20 to Figure 33 were from right to left, $(k-A_x)$ would be the boundary opposite to the search direction (the right boundary) and $(k+B_x)$ the boundary in the search direction (the left boundary).
A person of ordinary skill in the art may appreciate that the key features of the units and algorithm steps of the examples described with reference to Figure 20 to Figure 33 can be combined with other techniques and presented in more complex forms while still containing the key features of the present invention. A standby cut point rule may be used in a real environment. In one embodiment, according to the rule preset on the deduplication server 103, 11 windows $W_x[k-A_x, k+B_x]$ and the predetermined condition $C_x$ corresponding to each window are determined for the potential cut point $k_i$, where $x$ takes the consecutive natural numbers 1 to 11; when at least part of the data in each of the 11 windows satisfies $C_x$, the potential cut point $k_i$ is a data stream cut point. If no cut point has been found when the set maximum data block size is exceeded, a standby preset rule may be used. The standby preset rule is similar to the rule preset on the deduplication server 103: 10 windows $W_x[k-A_x, k+B_x]$ and the predetermined condition $C_x$ corresponding to each window are determined for the potential cut point $k_i$, where $x$ takes the consecutive natural numbers 1 to 10, and when at least part of the data in each of the 10 windows satisfies $C_x$, the potential cut point $k_i$ is a data stream cut point. If still no data stream cut point has been found when the maximum data block size is exceeded, the end position of the maximum data block is used as a forced cut point.
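The primary, standby and forced cut rules can be combined into a simple chunker. This is a sketch under several assumptions: the Figure 21 window layout with a BLAKE2b stand-in for the condition, windows kept inside the current chunk, both window rules searching only up to the maximum data block size, and a hypothetical `MAX_CHUNK` of 16 KiB. The text does not fix any of these details.

```python
import hashlib

WINDOW = 169
SAMPLE_OFFSETS = (0, 42, 84, 126, 168)
MAX_CHUNK = 16 * 1024  # hypothetical maximum data block size


def condition(data, end):
    """Stand-in condition on the WINDOW bytes ending at `end` (a hash replaces table R)."""
    key = bytes(data[end - WINDOW + o] for o in SAMPLE_OFFSETS)
    return hashlib.blake2b(key, digest_size=8).digest()[-1] % 2 == 0


def find_cut(data, start, limit, m):
    """First k <= limit whose m windows (W_x ending at k - x + 1) all satisfy the condition, else None."""
    k = start + WINDOW + m - 1
    while k <= limit:
        for z in range(1, m + 1):
            if not condition(data, k - z + 1):
                k += m - z + 1  # skip candidates that would reuse the failed window
                break
        else:
            return k
    return None


def chunk(data, max_chunk=MAX_CHUNK):
    """Cut points: 11-window rule, then the 10-window standby rule, then a forced cut."""
    cuts, start = [], 0
    while start < len(data):
        limit = min(start + max_chunk, len(data))
        k = find_cut(data, start, limit, 11)
        if k is None:
            k = find_cut(data, start, limit, 10)  # standby preset rule
        if k is None:
            k = limit  # forced cut at the maximum block end (or end of data)
        cuts.append(k)
        start = k
    return cuts
```

Every chunk ends at a cut point no more than `max_chunk` bytes after its start, and the last cut point is always the end of the data.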
In the embodiments shown in Figure 20 to Figure 33, a rule is preset on the deduplication server 103, and the rule determines $M$ windows for a potential cut point $k$. It is not necessary to first have a potential cut point $k$; the potential cut point $k$ can be judged from the $M$ windows so determined.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or by software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable non-volatile storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a non-volatile storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing non-volatile storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (48)

1. A method for finding a data stream cut point based on a server, characterized in that:
a rule is preset on the server, and the rule is: determine $M$ points $p_x$ for a potential cut point $k$, the window $W_x[p_x-A_x, p_x+B_x]$ corresponding to each point $p_x$, and the predetermined condition $C_x$ corresponding to each window $W_x[p_x-A_x, p_x+B_x]$, where $x$ takes the consecutive natural numbers 1 to $M$, $M \ge 2$, and $A_x$ and $B_x$ are integers;
the method includes:
A) it is current potential cut-point k according to described ruleiDetermine a pizAnd described some pizCorresponding Window Wiz[piz-Az,piz+Bz], i and z is integer, and 1≤z≤M;
b) judging whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ does not satisfy the predetermined condition $C_z$, jumping from the point $p_{iz}$ by $N$ minimum data stream cut point lookup units $U$ along the data stream cut point search direction, where $N \cdot U \le \lVert B_z \rVert + \max_x(\lVert A_x \rVert + \lVert k_i - p_{ix} \rVert)$, to obtain a new potential cut point, and performing step a);
c) when at least part of the data in each window $W_{ix}[p_{ix}-A_x, p_{ix}+B_x]$ of the $M$ windows of the current potential cut point $k_i$ satisfies the predetermined condition $C_x$, determining that the current potential cut point $k_i$ is a data stream cut point.
2. The method according to claim 1, characterized in that the rule further includes: at least two points $p_e$ and $p_f$ satisfy $A_e = A_f$, $B_e = B_f$ and $C_e = C_f$.
3. The method according to claim 2, characterized in that the rule further includes: the at least two points $p_e$ and $p_f$ lie, relative to the potential cut point $k$, in the direction opposite to the data stream cut point search direction.
4. The method according to claim 2 or 3, characterized in that the rule further includes: the distance between the at least two points $p_e$ and $p_f$ is 1 $U$.
5. The method according to any one of claims 1 to 3, characterized in that judging whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$ specifically includes:
using a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
6. The method according to claim 5, characterized in that using a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$ is specifically using a hash function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies $C_z$.
7. The method according to any one of claims 1 to 3, characterized in that, when at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ does not satisfy the predetermined condition $C_z$ and the new potential cut point is obtained by jumping from the point $p_{iz}$ by $N$ minimum data stream cut point lookup units $U$ along the data stream cut point search direction, then, according to the rule, the left boundary of the window $W_{ic}[p_{ic}-A_c, p_{ic}+B_c]$ corresponding to the point $p_{ic}$ determined for the new potential cut point coincides with the right boundary of the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$, or the left boundary of the window $W_{ic}[p_{ic}-A_c, p_{ic}+B_c]$ corresponding to the point $p_{ic}$ determined for the new potential cut point lies within the range of the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$; where the point $p_{ic}$ determined for the new potential cut point is, according to the rule, the first of the $M$ points determined for the new potential cut point when those points are ordered along the data stream search direction.
8. The method according to claim 5, characterized in that using a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$ specifically includes:
selecting $F$ bytes in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ and reusing the $F$ bytes $H$ times to obtain $F \cdot H$ bytes in total, where each byte consists of 8 bits, denoted $a_{m,1}, \ldots, a_{m,8}$, which are the 1st to 8th bits of the $m$-th of the $F \cdot H$ bytes, so that the bits of the $F \cdot H$ bytes can be written as:

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F \cdot H,1} & a_{F \cdot H,2} & \cdots & a_{F \cdot H,8} \end{pmatrix}$$

where, with $a_{m,n}$ denoting any one of $a_{m,1}, \ldots, a_{m,8}$, $V_{a_{m,n}} = 1$ when $a_{m,n} = 1$ and $V_{a_{m,n}} = -1$ when $a_{m,n} = 0$, and converting the bits of the $F \cdot H$ bytes according to this relation between $a_{m,n}$ and $V_{a_{m,n}}$ gives the matrix $V_a$:

$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}} \\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{a_{F \cdot H,1}} & V_{a_{F \cdot H,2}} & \cdots & V_{a_{F \cdot H,8}} \end{pmatrix}$$

selecting $F \cdot H \cdot 8$ random numbers from random numbers obeying a normal distribution to form the matrix $R$:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F \cdot H,1} & h_{F \cdot H,2} & \cdots & h_{F \cdot H,8} \end{pmatrix}$$

multiplying each entry in row $m$ of $V_a$ by the corresponding random number in row $m$ of $R$ and summing the products to obtain one value, specifically $S_{a_m} = V_{a_{m,1}} h_{m,1} + V_{a_{m,2}} h_{m,2} + \cdots + V_{a_{m,8}} h_{m,8}$; obtaining $S_{a_1}, S_{a_2}, \ldots, S_{a_{F \cdot H}}$ in the same way; and counting the number $K$ of these values that are greater than 0, where, when $K$ is even, at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
9. A method for finding a data stream cut point based on a server, characterized in that:
a rule is preset on the server, and the rule is: determine $M$ windows $W_x[k-A_x, k+B_x]$ for a potential cut point $k$ and the predetermined condition $C_x$ corresponding to each window $W_x[k-A_x, k+B_x]$, where $x$ takes the consecutive natural numbers 1 to $M$, $M \ge 2$, and $A_x$ and $B_x$ are integers;
the method includes:
A) it is current potential cut-point k according to described ruleiDetermine the window W of correspondenceiz[ki-Az, ki+Bz], i and z is integer, and 1≤z≤M;
b) judging whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ does not satisfy the predetermined condition $C_z$, jumping from the current potential cut point $k_i$ by $N$ minimum data stream cut point lookup units $U$ along the data stream cut point search direction, where $N \cdot U \le \lVert B_z \rVert + \max_x \lVert A_x \rVert$, to obtain a new potential cut point, and performing step a);
c) when at least part of the data in each window $W_{ix}[k_i-A_x, k_i+B_x]$ of the $M$ windows of the current potential cut point $k_i$ satisfies the predetermined condition $C_x$, determining that the current potential cut point $k_i$ is a data stream cut point.
10. The method according to claim 9, characterized in that the rule further includes: at least two windows $W_{ie}[k_i-A_e, k_i+B_e]$ and $W_{if}[k_i-A_f, k_i+B_f]$ satisfy $|A_e+B_e| = |A_f+B_f|$ and $C_e = C_f$.
11. The method according to claim 10, characterized in that the rule further includes: $A_e$ and $A_f$ are positive integers.
12. The method according to claim 10 or 11, characterized in that the rule further includes: $A_e - 1 = A_f$ and $B_e + 1 = B_f$.
13. The method according to any one of claims 9 to 11, characterized in that judging whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$ specifically includes:
using a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$.
14. The method according to claim 13, characterized in that using a random function to judge whether at least part of the data in $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$ is specifically using a hash function to judge whether at least part of the data in $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies $C_z$.
15. The method according to any one of claims 9 to 11, characterized in that, when at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ does not satisfy the predetermined condition $C_z$ and the new potential cut point is obtained by jumping from the current potential cut point $k_i$ by $N$ minimum data stream cut point lookup units $U$ along the data stream cut point search direction, then, according to the rule, the left boundary of the window $W_{ic}[k_i-A_c, k_i+B_c]$ determined for the new potential cut point coincides with the right boundary of the window $W_{iz}[k_i-A_z, k_i+B_z]$, or the left boundary of the window $W_{ic}[k_i-A_c, k_i+B_c]$ determined for the new potential cut point lies within the range of the window $W_{iz}[k_i-A_z, k_i+B_z]$; where the window $W_{ic}[k_i-A_c, k_i+B_c]$ determined for the new potential cut point is, according to the rule, the first of the $M$ windows determined for the new potential cut point when those windows are ordered along the data stream search direction.
16. The method according to claim 13, characterized in that using a random function to judge whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$ specifically includes:
selecting $F$ bytes in the window $W_{iz}[k_i-A_z, k_i+B_z]$ and reusing the $F$ bytes $H$ times to obtain $F \cdot H$ bytes in total, where each byte consists of 8 bits, denoted $a_{m,1}, \ldots, a_{m,8}$, which are the 1st to 8th bits of the $m$-th of the $F \cdot H$ bytes, so that the bits of the $F \cdot H$ bytes can be written as:

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F \cdot H,1} & a_{F \cdot H,2} & \cdots & a_{F \cdot H,8} \end{pmatrix}$$

where, with $a_{m,n}$ denoting any one of $a_{m,1}, \ldots, a_{m,8}$, $V_{a_{m,n}} = 1$ when $a_{m,n} = 1$ and $V_{a_{m,n}} = -1$ when $a_{m,n} = 0$, and converting the bits of the $F \cdot H$ bytes according to this relation between $a_{m,n}$ and $V_{a_{m,n}}$ gives the matrix $V_a$:

$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}} \\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{a_{F \cdot H,1}} & V_{a_{F \cdot H,2}} & \cdots & V_{a_{F \cdot H,8}} \end{pmatrix}$$

selecting $F \cdot H \cdot 8$ random numbers from random numbers obeying a normal distribution to form the matrix $R$:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F \cdot H,1} & h_{F \cdot H,2} & \cdots & h_{F \cdot H,8} \end{pmatrix}$$

multiplying each entry in row $m$ of $V_a$ by the corresponding random number in row $m$ of $R$ and summing the products to obtain one value, specifically $S_{a_m} = V_{a_{m,1}} h_{m,1} + V_{a_{m,2}} h_{m,2} + \cdots + V_{a_{m,8}} h_{m,8}$; obtaining $S_{a_1}, S_{a_2}, \ldots, S_{a_{F \cdot H}}$ in the same way; and counting the number $K$ of these values that are greater than 0, where, when $K$ is even, at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$.
17. A server for finding a data stream cut point, characterized in that the server includes a central processing unit and a main memory, the central processing unit communicates with the main memory, and a rule is preset on the server, the rule being: determine $M$ points $p_x$ for a potential cut point $k$, the window $W_x[p_x-A_x, p_x+B_x]$ corresponding to each point $p_x$, and the predetermined condition $C_x$ corresponding to each window $W_x[p_x-A_x, p_x+B_x]$, where $x$ takes the consecutive natural numbers 1 to $M$, $M \ge 2$, and $A_x$ and $B_x$ are integers;
the main memory is configured to store executable instructions, and the central processing unit executes the executable instructions to perform the following steps:
A) it is current potential cut-point k according to described ruleiDetermine a pizAnd described some pizCorresponding Window Wiz[piz-Az,piz+Bz], i and z is integer, and 1≤z≤M;
b) judging whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ does not satisfy the predetermined condition $C_z$, jumping from the point $p_{iz}$ by $N$ minimum data stream cut point lookup units $U$ along the data stream cut point search direction, where $N \cdot U \le \lVert B_z \rVert + \max_x(\lVert A_x \rVert + \lVert k_i - p_{ix} \rVert)$, to obtain a new potential cut point, and performing step a);
c) when at least part of the data in each window $W_{ix}[p_{ix}-A_x, p_{ix}+B_x]$ of the $M$ windows of the current potential cut point $k_i$ satisfies the predetermined condition $C_x$, determining that the current potential cut point $k_i$ is a data stream cut point.
18. The server according to claim 17, characterized in that the rule further includes: at least two points $p_e$ and $p_f$ satisfy $A_e = A_f$, $B_e = B_f$ and $C_e = C_f$.
19. The server according to claim 18, characterized in that the rule further includes: the at least two points $p_e$ and $p_f$ lie, relative to the potential cut point $k$, in the direction opposite to the data stream cut point search direction.
20. The server according to claim 18 or 19, characterized in that the rule further includes: the distance between the at least two points $p_e$ and $p_f$ is 1 $U$.
21. The server according to any one of claims 17 to 19, characterized in that the central processing unit is specifically configured to use a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
22. The server according to claim 21, characterized in that the central processing unit is specifically configured to use a hash function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
23. The server according to any one of claims 17 to 19, characterized in that, when at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ does not satisfy the predetermined condition $C_z$ and the new potential cut point is obtained by jumping from the point $p_{iz}$ by $N$ minimum data stream cut point lookup units $U$ along the data stream cut point search direction, then, according to the rule, the left boundary of the window $W_{ic}[p_{ic}-A_c, p_{ic}+B_c]$ corresponding to the point $p_{ic}$ determined for the new potential cut point coincides with the right boundary of the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$, or the left boundary of the window $W_{ic}[p_{ic}-A_c, p_{ic}+B_c]$ corresponding to the point $p_{ic}$ determined for the new potential cut point lies within the range of the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$; where the point $p_{ic}$ determined for the new potential cut point is, according to the rule, the first of the $M$ points determined for the new potential cut point when those points are ordered along the data stream search direction.
24. The server according to claim 21, characterized in that the central processing unit using a random function to judge whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$ specifically includes:
selecting $F$ bytes in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ and reusing the $F$ bytes $H$ times to obtain $F \cdot H$ bytes in total, where each byte consists of 8 bits, denoted $a_{m,1}, \ldots, a_{m,8}$, which are the 1st to 8th bits of the $m$-th of the $F \cdot H$ bytes, so that the bits of the $F \cdot H$ bytes can be written as:

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F \cdot H,1} & a_{F \cdot H,2} & \cdots & a_{F \cdot H,8} \end{pmatrix}$$

where, with $a_{m,n}$ denoting any one of $a_{m,1}, \ldots, a_{m,8}$, $V_{a_{m,n}} = 1$ when $a_{m,n} = 1$ and $V_{a_{m,n}} = -1$ when $a_{m,n} = 0$, and converting the bits of the $F \cdot H$ bytes according to this relation between $a_{m,n}$ and $V_{a_{m,n}}$ gives the matrix $V_a$:

$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}} \\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{a_{F \cdot H,1}} & V_{a_{F \cdot H,2}} & \cdots & V_{a_{F \cdot H,8}} \end{pmatrix}$$

selecting $F \cdot H \cdot 8$ random numbers from random numbers obeying a normal distribution to form the matrix $R$:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F \cdot H,1} & h_{F \cdot H,2} & \cdots & h_{F \cdot H,8} \end{pmatrix}$$

multiplying each entry in row $m$ of $V_a$ by the corresponding random number in row $m$ of $R$ and summing the products to obtain one value, specifically $S_{a_m} = V_{a_{m,1}} h_{m,1} + V_{a_{m,2}} h_{m,2} + \cdots + V_{a_{m,8}} h_{m,8}$; obtaining $S_{a_1}, S_{a_2}, \ldots, S_{a_{F \cdot H}}$ in the same way; and counting the number $K$ of these values that are greater than 0, where, when $K$ is even, at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
25. A server for searching for a data stream cut-point, wherein the server comprises a central processing unit (CPU) and a main memory, the CPU communicates with the main memory, and a rule is preset on the server, the rule being: for a potential cut-point $k$, determine $M$ windows $W_x[k-A_x, k+B_x]$ and the predetermined condition $C_x$ corresponding to each window $W_x[k-A_x, k+B_x]$, where $x$ takes the consecutive natural numbers from 1 to $M$, $M \ge 2$, and $A_x$ and $B_x$ are integers;
the main memory is configured to store executable instructions, and the CPU executes the executable instructions to perform the following steps:
a) determining, according to the rule, the corresponding window $W_{iz}[k_i-A_z, k_i+B_z]$ for the current potential cut-point $k_i$, where $i$ and $z$ are integers and $1 \le z \le M$;
b) determining whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ does not satisfy the predetermined condition $C_z$, jumping $N$ minimum search units $U$ from the current potential cut-point $k_i$ along the data stream cut-point search direction, where $N \cdot U \le \lVert B_z \rVert + \max_x(\lVert A_x \rVert)$, to obtain a new potential cut-point, and performing step a);
c) when at least part of the data in each window $W_{ix}[k_i-A_x, k_i+B_x]$ of the $M$ windows of the current potential cut-point $k_i$ satisfies the predetermined condition $C_x$, determining that the current potential cut-point $k_i$ is a data stream cut-point.
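The loop of steps a) to c) in claim 25 can be sketched in Python as follows. This is an illustrative reading only: it assumes the search direction is increasing byte offsets, that a window $W_x[k-A_x, k+B_x]$ maps to the half-open slice `data[k - A_x : k + B_x]`, that windows are checked in the order they are listed in the rule, and that each failure jumps by the largest $N \cdot U$ that the bound $\lVert B_z \rVert + \max_x(\lVert A_x \rVert)$ allows. `find_cut_point`, `Rule` and the condition callables are hypothetical names.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical rule format: one (A_x, B_x, C_x) triple per window W_x[k - A_x, k + B_x),
# taken here as the half-open byte range data[k - A_x : k + B_x].
Rule = Sequence[Tuple[int, int, Callable[[bytes], bool]]]


def find_cut_point(data: bytes, start: int, rule: Rule, unit: int = 1) -> Optional[int]:
    """Return the first potential cut-point k >= start whose M windows all pass, else None."""
    max_a = max(abs(a) for a, _, _ in rule)
    for a, b, _ in rule:
        if a + b <= 0:
            raise ValueError("each window must contain at least one byte")
        if abs(b) + max_a < unit:
            raise ValueError("bound |B_z| + max|A_x| is smaller than one unit U")
    k = max(start, max_a)  # so that no window starts before byte 0
    while True:
        for a, b, cond in rule:  # step a): window W_iz for the current k_i
            lo, hi = k - a, k + b
            if hi > len(data):
                return None  # ran out of data before finding a cut-point
            if not cond(data[lo:hi]):  # step b): C_z not met
                n = (abs(b) + max_a) // unit  # largest N with N*U <= |B_z| + max|A_x|
                k += n * unit
                break
        else:
            return k  # step c): every W_ix met its C_x
```

With `has_one = lambda w: 1 in w`, `rule = [(2, 2, has_one), (0, 1, has_one)]` and `data = bytes(10) + b"\x01" + bytes(10)`, the candidates visited are 2, 6 and 10, and 10 is returned. Taking the largest allowed jump is a choice of this sketch; the claim only caps $N \cdot U$, and a smaller jump makes it less likely that a valid cut-point is skipped.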
26. The server according to claim 25, wherein the rule further comprises: at least two windows $W_{ie}[k_i-A_e, k_i+B_e]$ and $W_{if}[k_i-A_f, k_i+B_f]$ satisfy the conditions $|A_e+B_e| = |A_f+B_f|$ and $C_e = C_f$.
27. The server according to claim 26, wherein the rule preset for the server further comprises: $A_e$ and $A_f$ are positive integers.
28. The server according to claim 26 or 27, wherein the rule further comprises: $A_e - 1 = A_f$ and $B_e + 1 = B_f$.
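Claims 26 and 28 together describe two windows of equal width, one shifted by a single position relative to the other. The short Python check below (example values $A_e = B_e = 4$, $A_f = 3$, $B_f = 5$ chosen only for illustration; function names hypothetical) confirms the arithmetic: window $W_f$ of potential cut-point $k$ covers exactly the same range as window $W_e$ of potential cut-point $k+1$. With $C_e = C_f$ (claim 26), one plausible reading is that a result already obtained for one window can stand in for the other after a one-position move, although the claims do not state this explicitly.

```python
from typing import Tuple


def window_bounds(k: int, a: int, b: int) -> Tuple[int, int]:
    """Left and right boundaries of the window W[k - A, k + B] of potential cut-point k."""
    return (k - a, k + b)


def is_shifted_pair(a_e: int, b_e: int, a_f: int, b_f: int) -> bool:
    """Check claim 28 (A_e - 1 = A_f, B_e + 1 = B_f); claim 26's equal width then follows."""
    return a_e - 1 == a_f and b_e + 1 == b_f


A_E, B_E = 4, 4  # example values, not taken from the patent
A_F, B_F = 3, 5
assert is_shifted_pair(A_E, B_E, A_F, B_F)
assert abs(A_E + B_E) == abs(A_F + B_F)  # claim 26 width condition
for k in range(100, 110):
    # W_f of cut-point k covers exactly the range of W_e of cut-point k + 1
    assert window_bounds(k, A_F, B_F) == window_bounds(k + 1, A_E, B_E)
```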
29. The server according to any one of claims 25 to 27, wherein the CPU is specifically configured to use a random function to determine whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$.
30. The server according to claim 29, wherein the central processing unit is specifically configured to use a hash function to determine whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$.
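Claim 30 leaves the hash function unspecified. As one hypothetical instance, the predetermined condition $C_z$ could require the low bits of a hash of the window to equal a fixed target value. The sketch below uses CRC-32 from Python's standard `zlib` module purely as a stand-in for "a hash function", and `make_hash_condition` is an illustrative name. With a well-mixed hash, a window passes with probability of roughly 1 in $2^b$ for $b$ mask bits, and different windows could be given different masks or targets.

```python
import zlib
from typing import Callable


def make_hash_condition(mask_bits: int, target: int = 0) -> Callable[[bytes], bool]:
    """Hypothetical C_z: the low mask_bits bits of CRC-32(window) equal target."""
    mask = (1 << mask_bits) - 1
    if not 0 <= target <= mask:
        raise ValueError("target must fit in mask_bits bits")

    def condition(window: bytes) -> bool:
        # zlib.crc32 returns an unsigned 32-bit integer in Python 3
        return (zlib.crc32(window) & mask) == target

    return condition
```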
31. The server according to any one of claims 25 to 27, wherein, when at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ does not satisfy the predetermined condition $C_z$, the new potential cut-point is obtained by jumping $N$ minimum search units $U$ from the current potential cut-point $k_i$ along the data stream cut-point search direction; and, according to the rule, the left boundary of the window $W_{ic}[k_i-A_c, k_i+B_c]$ determined for the new potential cut-point coincides with the right boundary of the window $W_{iz}[k_i-A_z, k_i+B_z]$ or lies within the range of the window $W_{iz}[k_i-A_z, k_i+B_z]$; wherein, according to the rule, the window $W_{ic}[k_i-A_c, k_i+B_c]$ determined for the new potential cut-point is the first of the $M$ windows determined for the new potential cut-point when those windows are sorted along the data stream search direction.
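Claim 31 writes the first window of the new potential cut-point as $W_{ic}[k_i-A_c, k_i+B_c]$. Reading that window as taken around the new point $k' = k_i + N \cdot U$ (an interpretation, not the literal claim text), the boundary requirement becomes $k_i - A_z \le k' - A_c \le k_i + B_z$, that is $A_c - A_z \le N \cdot U \le B_z + A_c$. Since $B_z \le \lVert B_z \rVert$ and $A_c \le \max_x(\lVert A_x \rVert)$, the upper limit never exceeds the bound $\lVert B_z \rVert + \max_x(\lVert A_x \rVert)$ of claim 25. A small Python check of this arithmetic, with hypothetical function names:

```python
def left_boundary_ok(k_old: int, a_z: int, b_z: int, k_new: int, a_c: int) -> bool:
    """True when the new first window's left edge k_new - A_c lies in [k_old - A_z, k_old + B_z]."""
    return k_old - a_z <= k_new - a_c <= k_old + b_z


def max_jump_units(b_z: int, a_c: int, unit: int = 1) -> int:
    """Largest N with N*U <= B_z + A_c, so the new window still meets the failed one."""
    return (b_z + a_c) // unit


# Failed window [0, 4] around k_old = 2 (A_z = B_z = 2); next first window uses A_c = 2.
n = max_jump_units(2, 2)
assert n == 4
assert left_boundary_ok(2, 2, 2, 2 + n, 2)      # left edge 4 coincides with right edge 4
assert not left_boundary_ok(2, 2, 2, 7, 2)      # left edge 5 would leave a gap
```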
32. The server according to claim 29, wherein the central processing unit using a random function to determine whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$ specifically comprises:
selecting $F$ bytes in the window $W_{iz}[k_i-A_z, k_i+B_z]$ and reusing the $F$ bytes $H$ times to obtain $F \cdot H$ bytes in total, each byte consisting of 8 bits; the bits of the $m$-th of the $F \cdot H$ bytes are denoted $a_{m,1}, \dots, a_{m,8}$ (its 1st to 8th bits), so that the bits of the $F \cdot H$ bytes can be expressed as
$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F\cdot H,1} & a_{F\cdot H,2} & \cdots & a_{F\cdot H,8} \end{pmatrix}$$
where $a_{m,n}$ denotes any one of $a_{m,1}, \dots, a_{m,8}$; setting $V_{a_{m,n}} = 1$ when $a_{m,n} = 1$ and $V_{a_{m,n}} = -1$ when $a_{m,n} = 0$, the bits of the $F \cdot H$ bytes are converted into the matrix
$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}} \\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{a_{F\cdot H,1}} & V_{a_{F\cdot H,2}} & \cdots & V_{a_{F\cdot H,8}} \end{pmatrix};$$
selecting $F \cdot H \cdot 8$ random numbers from random numbers that follow a normal distribution to form the matrix
$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F\cdot H,1} & h_{F\cdot H,2} & \cdots & h_{F\cdot H,8} \end{pmatrix};$$
multiplying the $m$-th row of $V_a$ element-wise by the $m$-th row of $R$ and summing the products to obtain one value,
$$S_{a_m} = V_{a_{m,1}} h_{m,1} + V_{a_{m,2}} h_{m,2} + \cdots + V_{a_{m,8}} h_{m,8},$$
and likewise obtaining $S_{a_1}, S_{a_2}, \dots, S_{a_{F\cdot H}}$; counting the number $K$ of values greater than 0 among $S_{a_1}, \dots, S_{a_{F\cdot H}}$; and, when $K$ is even, determining that at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$.
33. A server for searching for a data stream cut-point, wherein a rule is preset on the server, the rule being: for a potential cut-point $k$, determine $M$ points $p_x$, the window $W_x[p_x-A_x, p_x+B_x]$ corresponding to each point $p_x$, and the predetermined condition $C_x$ corresponding to each window $W_x[p_x-A_x, p_x+B_x]$, where $x$ takes the consecutive natural numbers from 1 to $M$, $M \ge 2$, and $A_x$ and $B_x$ are integers;
the server comprises: a determining unit, configured to perform step a): a) determining, according to the rule, a point $p_{iz}$ for the current potential cut-point $k_i$ and the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ corresponding to the point $p_{iz}$, where $i$ and $z$ are integers and $1 \le z \le M$;
and a judging and processing unit, configured to determine whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ does not satisfy the predetermined condition $C_z$, the judging and processing unit jumps $N$ minimum search units $U$ from the point $p_{iz}$ along the data stream cut-point search direction, where $N \cdot U \le \lVert B_z \rVert + \max_x(\lVert A_x \rVert + \lVert k_i - p_{ix} \rVert)$, to obtain a new potential cut-point, whereupon the determining unit performs step a) for the new potential cut-point;
when at least part of the data in each window $W_{ix}[p_{ix}-A_x, p_{ix}+B_x]$ of the $M$ windows of the current potential cut-point $k_i$ satisfies the predetermined condition $C_x$, the current potential cut-point $k_i$ is a data stream cut-point.
34. The server according to claim 33, wherein the rule further comprises: at least two points $p_e$ and $p_f$ satisfy the conditions $A_e = A_f$, $B_e = B_f$ and $C_e = C_f$.
35. The server according to claim 34, wherein the rule further comprises: the at least two points $p_e$ and $p_f$ are located, relative to the potential cut-point $k$, in the direction opposite to the data stream cut-point search direction.
36. The server according to claim 34 or 35, wherein the rule further comprises: the distance between the at least two points $p_e$ and $p_f$ is 1 $U$.
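Claims 33 to 36 place the windows around separate points $p_x$ rather than around $k$ itself. The Python sketch below is a hedged illustration, not the claimed implementation: it assumes each point is written as $p_x = k + d_x$ with a hypothetical fixed offset $d_x$ (so $\lVert k_i - p_{ix} \rVert = |d_x|$), treats windows as half-open slices, and applies the jump of $N \cdot U$ to the potential cut-point $k_i$ itself, which is only one possible reading of "jump from the point $p_{iz}$". The usage example follows claims 34 to 36: two points one unit apart behind $k$ ($d = -2$ and $d = -1$) with equal $A$, $B$ and $C$.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical rule entry (d_x, A_x, B_x, C_x): point p_x = k + d_x,
# window W_x taken as the half-open slice data[p_x - A_x : p_x + B_x].
PointRule = Sequence[Tuple[int, int, int, Callable[[bytes], bool]]]


def skip_bound(rule: PointRule, b_z: int) -> int:
    """|B_z| + max_x(|A_x| + |k_i - p_ix|), where |k_i - p_ix| = |d_x|."""
    return abs(b_z) + max(abs(a) + abs(d) for d, a, _, _ in rule)


def find_cut_point_by_points(data: bytes, start: int, rule: PointRule,
                             unit: int = 1) -> Optional[int]:
    """Return the first k >= start whose M point windows all pass, else None."""
    reach = max(abs(a) + abs(d) for d, a, _, _ in rule)
    for _, a, b, _ in rule:
        if a + b <= 0 or skip_bound(rule, b) < unit:
            raise ValueError("empty window or bound smaller than one unit U")
    k = max(start, reach)  # keeps p_x - A_x >= 0 for every point
    while True:
        for d, a, b, cond in rule:
            p = k + d  # step a): point p_iz and its window W_iz for k_i
            lo, hi = p - a, p + b
            if hi > len(data):
                return None  # ran out of data
            if not cond(data[lo:hi]):
                # Assumption: the jump of N*U (largest allowed) is applied to k_i.
                k += (skip_bound(rule, b) // unit) * unit
                break
        else:
            return k  # every W_ix met its C_x
```

With `has_one = lambda w: 1 in w`, `rule = [(-2, 1, 1, has_one), (-1, 1, 1, has_one)]` and `data = bytes(9) + b"\x01" + bytes(11)`, the bound is $1 + \max(1+2, 1+1) = 4$, the candidates visited are 3, 7 and 11, and 11 is returned.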
37. The server according to any one of claims 33 to 35, wherein the judging and processing unit is specifically configured to use a random function to determine whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
38. The server according to claim 37, wherein the judging and processing unit is specifically configured to use a hash function to determine whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
39. The server according to any one of claims 33 to 35, wherein the judging and processing unit is configured to, when at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ does not satisfy the predetermined condition $C_z$, jump $N$ minimum search units $U$ from the point $p_{iz}$ along the data stream cut-point search direction to obtain the new potential cut-point, and the determining unit performs step a) for the new potential cut-point; according to the rule, the left boundary of the window $W_{ic}[p_{ic}-A_c, p_{ic}+B_c]$ corresponding to the point $p_{ic}$ determined for the new potential cut-point coincides with the right boundary of the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ or lies within the range of the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$; wherein, according to the rule, the window $W_{ic}[p_{ic}-A_c, p_{ic}+B_c]$ determined for the new potential cut-point is the window of the first of the $M$ points determined for the new potential cut-point when those points are sorted along the data stream search direction.
40. The server according to claim 37, wherein the judging and processing unit using a random function to determine whether at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$ specifically comprises:
selecting $F$ bytes in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ and reusing the $F$ bytes $H$ times to obtain $F \cdot H$ bytes in total, each byte consisting of 8 bits; the bits of the $m$-th of the $F \cdot H$ bytes are denoted $a_{m,1}, \dots, a_{m,8}$ (its 1st to 8th bits), so that the bits of the $F \cdot H$ bytes can be expressed as
$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F\cdot H,1} & a_{F\cdot H,2} & \cdots & a_{F\cdot H,8} \end{pmatrix}$$
where $a_{m,n}$ denotes any one of $a_{m,1}, \dots, a_{m,8}$; setting $V_{a_{m,n}} = 1$ when $a_{m,n} = 1$ and $V_{a_{m,n}} = -1$ when $a_{m,n} = 0$, the bits of the $F \cdot H$ bytes are converted into the matrix
$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}} \\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{a_{F\cdot H,1}} & V_{a_{F\cdot H,2}} & \cdots & V_{a_{F\cdot H,8}} \end{pmatrix};$$
selecting $F \cdot H \cdot 8$ random numbers from random numbers that follow a normal distribution to form the matrix
$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F\cdot H,1} & h_{F\cdot H,2} & \cdots & h_{F\cdot H,8} \end{pmatrix};$$
multiplying the $m$-th row of $V_a$ element-wise by the $m$-th row of $R$ and summing the products to obtain one value,
$$S_{a_m} = V_{a_{m,1}} h_{m,1} + V_{a_{m,2}} h_{m,2} + \cdots + V_{a_{m,8}} h_{m,8},$$
and likewise obtaining $S_{a_1}, S_{a_2}, \dots, S_{a_{F\cdot H}}$; counting the number $K$ of values greater than 0 among $S_{a_1}, \dots, S_{a_{F\cdot H}}$; and, when $K$ is even, determining that at least part of the data in the window $W_{iz}[p_{iz}-A_z, p_{iz}+B_z]$ satisfies the predetermined condition $C_z$.
41. A server for searching for a data stream cut-point, wherein a rule is preset on the server, the rule being: for a potential cut-point $k$, determine $M$ windows $W_x[k-A_x, k+B_x]$ and the predetermined condition $C_x$ corresponding to each window $W_x[k-A_x, k+B_x]$, where $x$ takes the consecutive natural numbers from 1 to $M$, $M \ge 2$, and $A_x$ and $B_x$ are integers;
the server comprises: a determining unit, configured to perform step a):
a) determining, according to the rule, the corresponding window $W_{iz}[k_i-A_z, k_i+B_z]$ for the current potential cut-point $k_i$, where $i$ and $z$ are integers and $1 \le z \le M$;
and a judging and processing unit, configured to determine whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$;
when at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ does not satisfy the predetermined condition $C_z$, the judging and processing unit jumps $N$ minimum search units $U$ from the current potential cut-point $k_i$ along the data stream cut-point search direction, where $N \cdot U \le \lVert B_z \rVert + \max_x(\lVert A_x \rVert)$, to obtain a new potential cut-point, whereupon the determining unit performs step a) for the new potential cut-point;
when at least part of the data in each window $W_{ix}[k_i-A_x, k_i+B_x]$ of the $M$ windows of the current potential cut-point $k_i$ satisfies the predetermined condition $C_x$, the current potential cut-point $k_i$ is a data stream cut-point.
42. The server according to claim 41, wherein the rule further comprises: at least two windows $W_{ie}[k_i-A_e, k_i+B_e]$ and $W_{if}[k_i-A_f, k_i+B_f]$ satisfy the conditions $|A_e+B_e| = |A_f+B_f|$ and $C_e = C_f$.
43. The server according to claim 42, wherein the rule further comprises: $A_e$ and $A_f$ are positive integers.
44. The server according to claim 42 or 43, wherein the rule further comprises: $A_e - 1 = A_f$ and $B_e + 1 = B_f$.
45. The server according to any one of claims 41 to 43, wherein the judging and processing unit is specifically configured to use a random function to determine whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$.
46. The server according to claim 45, wherein the judging and processing unit is specifically configured to use a hash function to determine whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$.
47. The server according to any one of claims 41 to 43, wherein the judging and processing unit is configured to, when at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ does not satisfy the predetermined condition $C_z$, jump $N$ minimum search units $U$ from the current potential cut-point $k_i$ along the data stream cut-point search direction to obtain the new potential cut-point, and the determining unit performs step a) for the new potential cut-point; according to the rule, the left boundary of the window $W_{ic}[k_i-A_c, k_i+B_c]$ determined for the new potential cut-point coincides with the right boundary of the window $W_{iz}[k_i-A_z, k_i+B_z]$ or lies within the range of the window $W_{iz}[k_i-A_z, k_i+B_z]$; wherein, according to the rule, the window $W_{ic}[k_i-A_c, k_i+B_c]$ determined for the new potential cut-point is the first of the $M$ windows determined for the new potential cut-point when those windows are sorted along the data stream search direction.
48. The server according to claim 46, wherein the judging and processing unit using a random function to determine whether at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$ specifically comprises:
selecting $F$ bytes in the window $W_{iz}[k_i-A_z, k_i+B_z]$ and reusing the $F$ bytes $H$ times to obtain $F \cdot H$ bytes in total, each byte consisting of 8 bits; the bits of the $m$-th of the $F \cdot H$ bytes are denoted $a_{m,1}, \dots, a_{m,8}$ (its 1st to 8th bits), so that the bits of the $F \cdot H$ bytes can be expressed as
$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ a_{F\cdot H,1} & a_{F\cdot H,2} & \cdots & a_{F\cdot H,8} \end{pmatrix}$$
where $a_{m,n}$ denotes any one of $a_{m,1}, \dots, a_{m,8}$; setting $V_{a_{m,n}} = 1$ when $a_{m,n} = 1$ and $V_{a_{m,n}} = -1$ when $a_{m,n} = 0$, the bits of the $F \cdot H$ bytes are converted into the matrix
$$V_a = \begin{pmatrix} V_{a_{1,1}} & V_{a_{1,2}} & \cdots & V_{a_{1,8}} \\ V_{a_{2,1}} & V_{a_{2,2}} & \cdots & V_{a_{2,8}} \\ \vdots & \vdots & \ddots & \vdots \\ V_{a_{F\cdot H,1}} & V_{a_{F\cdot H,2}} & \cdots & V_{a_{F\cdot H,8}} \end{pmatrix};$$
selecting $F \cdot H \cdot 8$ random numbers from random numbers that follow a normal distribution to form the matrix
$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ h_{F\cdot H,1} & h_{F\cdot H,2} & \cdots & h_{F\cdot H,8} \end{pmatrix};$$
multiplying the $m$-th row of $V_a$ element-wise by the $m$-th row of $R$ and summing the products to obtain one value,
$$S_{a_m} = V_{a_{m,1}} h_{m,1} + V_{a_{m,2}} h_{m,2} + \cdots + V_{a_{m,8}} h_{m,8},$$
and likewise obtaining $S_{a_1}, S_{a_2}, \dots, S_{a_{F\cdot H}}$; counting the number $K$ of values greater than 0 among $S_{a_1}, \dots, S_{a_{F\cdot H}}$; and, when $K$ is even, determining that at least part of the data in the window $W_{iz}[k_i-A_z, k_i+B_z]$ satisfies the predetermined condition $C_z$.
CN201480000347.4A 2014-02-14 2014-02-27 Server-based method for searching for a data stream cut-point, and server Active CN104169917B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201480000347.4A CN104169917B (en) 2014-02-14 2014-02-27 Server-based method for searching for a data stream cut-point, and server
CN201610439783.2A CN106095971B (en) 2014-02-14 2014-02-27 Server-based method and server for searching for a data stream cut-point

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN2014072115 2014-02-14
CNPCT/CN2014/072115 2014-02-14
CN201480000347.4A CN104169917B (en) 2014-02-14 2014-02-27 Server-based method for searching for a data stream cut-point, and server
PCT/CN2014/072648 WO2015120645A1 (en) 2014-02-14 2014-02-27 Server-based method for searching for data flow break point, and server

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201610439783.2A Division CN106095971B (en) 2014-02-14 2014-02-27 Server-based method and server for searching for a data stream cut-point

Publications (2)

Publication Number Publication Date
CN104169917A CN104169917A (en) 2014-11-26
CN104169917B true CN104169917B (en) 2016-08-24

Family

ID=51912349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480000347.4A Active CN104169917B (en) 2014-02-14 2014-02-27 Server-based method for searching for a data stream cut-point, and server

Country Status (1)

Country Link
CN (1) CN104169917B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102214210A (en) * 2011-05-16 2011-10-12 Chengdu Huawei Symantec Technologies Co., Ltd. Method, device and system for processing repeating data
WO2012044366A1 (en) * 2010-09-30 2012-04-05 Commvault Systems, Inc. Content aligned block-based deduplication

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
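Deepak R. Bobbarjung et al., "Improving Duplicate Elimination in Storage Systems", ACM Transactions on Storage, 2006-11-30, Vol. 2, No. 4, pp. 424-448 *
Zhou Jingli et al., "Optimization of Data Deduplication Algorithms Based on Storage Environment Awareness" (基于存储环境感知的重复数据删除算法优化), Computer Science (计算机科学), 2011-02-28, Vol. 38, No. 2, pp. 63-67 *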

Also Published As

Publication number Publication date
CN104169917A (en) 2014-11-26

Similar Documents

Publication Publication Date Title
US20200285634A1 (en) System for data sharing platform based on distributed data sharing environment based on block chain, method of searching for data in the system, and method of providing search index in the system
Pittel Asymptotical growth of a class of random trees
CN104462609B (en) RDF data storage and querying method with reference to star-like graph code
US20150358219A1 (en) System and method for gathering information
CN106897409A (en) Data point library storage method and device
Ou et al. Order acceptance and scheduling with consideration of service level
Li et al. ASLM: Adaptive single layer model for learned index
CN107038059A (en) virtual machine deployment method and device
CN108415912A (en) Data processing method based on MapReduce model and equipment
CN104182518A (en) Collaborative filtering recommendation method and device
CN101551814B (en) Method for data management and data search
EP3026585A1 (en) Server-based method for searching for data flow break point, and server
Flores Analysis of internal computer sorting
CN104169917B (en) Server-based method for searching for a data stream cut-point, and server
Street Defining sets for block designs: an update
CN106095971B (en) Server-based method and server for searching for a data stream cut-point
CN105843859A (en) Data processing method, device and equipment
JP7099316B2 (en) Similarity arithmetic units, methods, and programs
Epstein et al. Robust algorithms for total completion time
Tarjan et al. Balancing applied to maximum network flow problems
CN106202503A (en) Data processing method and device
WO2011016281A2 (en) Information processing device and program for learning bayesian network structure
CN105373561B (en) The method and apparatus for identifying the logging mode in non-relational database
CN114169488A (en) Hybrid meta-heuristic algorithm-based vehicle path acquisition method with capacity constraint
Shibasaki et al. Lagrangian bounds for large‐scale multicommodity network design: a comparison between Volume and Bundle methods

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220118

Address after: 450046 Floor 9, building 1, Zhengshang Boya Plaza, Longzihu wisdom Island, Zhengdong New Area, Zhengzhou City, Henan Province

Patentee after: xFusion Digital Technologies Co., Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.

TR01 Transfer of patent right