CN108268628A - Method and device for delta compression based on dynamic anchor points - Google Patents
- Publication number
- CN108268628A (application CN201810035223.XA)
- Authority
- CN
- China
- Prior art keywords
- anchor point
- target
- data stream
- offset
- paragraph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a method and device for delta compression based on dynamic anchor points. The method comprises the following steps: scanning a target data stream and a reference data stream with a rolling hash algorithm, and marking a target candidate anchor point and a reference candidate anchor point that have the same rolling hash value as an anchor point pair; using the anchor point pairs to divide the reference data stream and the target data stream into multiple paragraphs; for a paragraph without changes, recording the paragraph's interval and encoding it; for a paragraph with changes, when the target data stream and the reference data stream flow into a matching module for string matching, automatically aligning the target window or the reference window according to the detected anchor point pair; encoding the result of the string matching; and outputting the encoded data. The technical solution simplifies computation and improves operating efficiency; it also saves a large amount of on-chip memory, which makes a hardware implementation feasible.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and device for delta compression based on dynamic anchor points.
Background technology
At present, most compression techniques operate on a single data stream. Delta compression differs: it compresses data by computing the delta between a target data stream and a reference data stream. The delta can be regarded as an encoding of the difference between the target and reference data streams, so the target data stream can be restored from the delta and the reference data stream.
Delta compression was first applied in version control systems. Storing the deltas between versions instead of the actual data can significantly reduce a system's storage requirements. For example, MacDonald's Xdelta File System (XDFS) is implemented with delta compression. Another application is software distribution, particularly for software distributed over the Internet: distributing deltas or patches can greatly reduce network traffic. Delta compression can also be used to improve HTTP performance, where the similarity between different pages of a given website, or between different versions of a given web page, is exploited to reduce web access latency. VCDIFF, which is defined in an RFC, supports this usage. However, after certain deletion or insertion operations, the reference data often no longer lines up with the target data. If the reference data and the target data are misaligned by too much, the incoming target data cannot find a matching string in the reference data window, and the compression ratio drops significantly. None of the commonly used delta compressors, including xdelta, vdelta (and its successor VCDIFF) and zdelta, avoids this problem.
In view of this, current delta compression technology needs further improvement.
Summary of the invention
To solve at least one of the above technical problems, the main object of the present invention is to provide a method for delta compression based on dynamic anchor points.
To achieve the above object, one technical solution adopted by the present invention is to provide a method for delta compression based on dynamic anchor points, comprising:
Scanning a target data stream and a reference data stream with a rolling hash algorithm, and marking a target anchor point and a reference anchor point that have the same rolling hash value as an anchor point pair, wherein the anchor point pair is expressed as (offset relative to the previous target anchor point, offset relative to the previous reference anchor point);
Using the anchor point pairs to divide the target data stream and the reference data stream into multiple paragraphs;
For a paragraph without changes, recording the paragraph's interval and encoding it;
For a paragraph with changes, when the target data stream and the reference data stream flow into a matching module for string matching, automatically aligning the target window or the reference window according to the detected anchor point pair, wherein the target window can hold part of the data of the target data stream and the reference window can hold part of the data of the reference data stream;
Encoding the result of the string matching; and outputting the encoded data.
Wherein, the step of scanning the target data stream and the reference data stream with the rolling hash algorithm, and marking the target anchor point and the reference anchor point that have the same rolling hash value as an anchor point pair, specifically comprises:
Scanning the target data stream with the rolling hash algorithm to obtain a series of rolling hash values, selecting several bits of each rolling hash value with a bitmask and comparing them with a preset hash feature value, and, if they are equal, recording a target candidate anchor point;
Scanning the reference data stream with the rolling hash algorithm to obtain a series of rolling hash values, selecting several bits of each rolling hash value with a bitmask and comparing them with the preset hash feature value, and, if they are equal, recording a reference candidate anchor point;
Comparing whether the rolling hash values of a reference candidate anchor point and a target candidate anchor point are the same, and, when they are the same, determining the reference candidate anchor point to be a reference anchor point and the target candidate anchor point to be a target anchor point;
Marking the reference anchor point and the target anchor point as an anchor point pair.
Wherein, the step of automatically aligning the target window or the reference window according to the detected anchor point pair further comprises:
Determining the order in which the reference anchor point of the reference data stream and the target anchor point of the target data stream are detected;
If the reference anchor point is detected before the target anchor point, that is, the offset of the reference anchor point is smaller than the offset of the target anchor point, pausing the flow of the reference data stream into the reference window and continuing to search the target data stream flowing into the target window until the matching target anchor point is found, and then continuing string matching on the target data stream and the reference data stream; if the reference anchor point is detected after the target anchor point, that is, the offset of the reference anchor point is greater than the offset of the target anchor point, pausing the flow of the target data stream into the target window and continuing to search the reference data stream flowing into the reference window until the matching reference anchor point is found, and then continuing string matching on the target data stream and the reference data stream.
Wherein, the step in which the target data stream and the reference data stream flow into the matching module for string matching specifically comprises:
Performing maximal matching, according to the LZ77 algorithm, on the target data stream and the reference data that flow into the reference window, to obtain an output result consisting of matching units [offset, match length] and character units.
Wherein, the output encoded data comprises the output encoded data of paragraphs without changes and the output encoded data of paragraphs with changes;
The output encoded data of a paragraph without changes consists of a reference anchor point and a paragraph length;
The output encoded data of a paragraph with changes consists of a reference anchor point, matching units and character units; wherein the reference anchor point is recorded as an offset relative to the previous reference anchor point, and the offset of a matching unit is an offset relative to the current reference anchor point.
To achieve the above object, another technical solution adopted by the present invention is to provide a device for delta compression based on dynamic anchor points, comprising:
A determining module, configured to scan a target data stream and a reference data stream with a rolling hash algorithm and to mark a target anchor point and a reference anchor point that have the same rolling hash value as an anchor point pair, wherein the anchor point pair is expressed as (offset relative to the previous target anchor point, offset relative to the previous reference anchor point);
A paragraph division module, configured to use the anchor point pairs to divide the target data stream and the reference data stream into multiple paragraphs;
A first processing module, configured to process paragraphs without changes, including recording each paragraph's interval and encoding it;
A second processing module, configured to process paragraphs with changes, including automatically aligning the target window or the reference window according to the detected anchor point pair when the target data stream and the reference data stream flow into a matching module for string matching, wherein the target window can hold part of the data of the target data stream and the reference window can hold part of the data of the reference data stream;
An encoding module, configured to encode the result of the string matching; and
An output module, configured to output the encoded data.
Wherein, the determining module is specifically configured to:
Scan the target data stream with the rolling hash algorithm to obtain a series of rolling hash values, select several bits of each rolling hash value with a bitmask and compare them with a preset hash feature value, and, if they are equal, record a target candidate anchor point;
Scan the reference data stream with the rolling hash algorithm to obtain a series of rolling hash values, select several bits of each rolling hash value with a bitmask and compare them with the preset hash feature value, and, if they are equal, record a reference candidate anchor point;
Compare whether the rolling hash values of a reference candidate anchor point and a target candidate anchor point are the same, and, when they are the same, determine the reference candidate anchor point to be a reference anchor point and the target candidate anchor point to be a target anchor point;
Mark the reference anchor point and the target anchor point as an anchor point pair.
Wherein, the second processing module is specifically configured to:
Determine the order in which the reference anchor point of the reference data stream and the target anchor point of the target data stream are detected;
If the reference anchor point is detected before the target anchor point, that is, the offset of the reference anchor point is smaller than the offset of the target anchor point, pause the flow of the reference data stream into the reference window and continue to search the target data stream flowing into the target window until the matching target anchor point is found, and then continue string matching on the target data stream and the reference data stream; if the reference anchor point is detected after the target anchor point, that is, the offset of the reference anchor point is greater than the offset of the target anchor point, pause the flow of the target data stream into the target window and continue to search the reference data stream flowing into the reference window until the matching reference anchor point is found, and then continue string matching on the target data stream and the reference data stream.
Wherein, the second processing module is further configured to:
Perform maximal matching, according to the LZ77 algorithm, on the target data stream and the reference data that flow into the reference window, to obtain an output result consisting of matching units [offset, match length] and character units.
Wherein, the output encoded data comprises the output encoded data of paragraphs without changes and the output encoded data of paragraphs with changes;
The output encoded data of a paragraph without changes consists of a reference anchor point and a paragraph length;
The output encoded data of a paragraph with changes consists of a reference anchor point, matching units and character units; wherein the reference anchor point is recorded as an offset relative to the previous reference anchor point, and the offset of a matching unit is an offset relative to the current reference anchor point.
In the technical solution of the present invention, the target data stream and the reference data stream are first scanned with a rolling hash algorithm, and each matched target anchor point and reference anchor point are marked as an anchor point pair. The anchor point pairs are used to divide the target data stream and the reference data stream into multiple paragraphs. For a paragraph without changes, the paragraph's interval is recorded and encoded. For a paragraph with changes, while the target data stream and the reference data stream undergo string matching in streaming fashion, the target window or the reference window is automatically aligned according to either anchor point of the detected anchor point pair, and the result of the string matching is then encoded. Finally, the encoded data is output. The solution uses a smaller reference window than other tools and tracks the target data stream and the reference data stream through the dynamic anchor points it sets, which reduces computational complexity and improves system performance. In addition, the smaller reference window saves a large amount of on-chip memory, which makes a hardware implementation feasible.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the structures shown in these drawings without creative effort.
Fig. 1 is a schematic diagram of text inserted into the target data;
Fig. 2 is a schematic diagram of existing data compression;
Fig. 3 is a schematic diagram of data compression with the dynamic anchor points introduced by the present invention;
Fig. 4 is a flowchart of a method for delta compression based on dynamic anchor points according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of finding candidate anchor points with a hash algorithm according to the present invention;
Fig. 6 is a flowchart of a method for delta compression based on dynamic anchor points according to a specific embodiment of the present invention;
Fig. 7 is a block diagram of a device for delta compression based on dynamic anchor points according to an embodiment of the present invention.
The realization of the object, the functional features and the advantages of the present invention are further described below with reference to the embodiments and the drawings.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be understood that descriptions such as "first" and "second" in the present invention are for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature defined as "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments can be combined with one another, provided that the combination can be implemented by those of ordinary skill in the art; where a combination of technical solutions is contradictory or cannot be implemented, the combination shall be deemed not to exist and falls outside the scope of protection claimed by the present invention.
Refer to Fig. 1, which is a schematic diagram of text inserted into the target data. As Fig. 1 shows, the target data stream contains an additional data segment y compared with the reference data stream. Delta compression identifies and encodes this segment, and stores the remaining data in compressed form as references to the reference data. At present, all delta compressors compare the incoming target data stream with the reference data stream.
Refer to Fig. 2, which is a schematic diagram of existing data compression. Some compressors also compare the incoming target data with earlier target data (the target history). The source data (also called the target data) is compared against the reference window and the target window to find matching strings. When the reference window is large enough to hold the entire reference data stream, the optimal compression ratio can be achieved. To save resources, however, the entire reference data is not compared with the incoming target data. Instead, as in most compression systems, only part of the reference data is kept in the reference window for comparison. Consequently, if a string in the reference data stream is not in the reference window when it is needed, the string that should have matched is not found and the compression ratio drops significantly. Taking Fig. 1 as an example, if the data segment y is larger than the reference window, the target data after y cannot find matching strings in the reference window. The present invention therefore introduces dynamic anchor points to solve this technical problem. Dynamic anchor points mark the portions of content that are identical between the reference and target data streams.
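The effect of an insertion that exceeds the reference window can be illustrated with a small Python sketch. It rests on a simplified model that is not taken from the patent: the reference window is assumed to hold the last `window` bytes of the reference stream up to the current target position, and the hypothetical `hit_ratio` function counts how many fixed-size target chunks can still be found in it.

```python
def hit_ratio(target: bytes, reference: bytes, window: int, chunk: int = 16) -> float:
    """Fraction of target chunks found in a reference window that advances
    in lock step with the target position (simplified, assumed model)."""
    hits = total = 0
    for pos in range(0, len(target) - chunk + 1, chunk):
        lo = max(0, pos - window)
        ref_window = reference[lo:pos + chunk]    # only this slice is searchable
        total += 1
        if target[pos:pos + chunk] in ref_window:
            hits += 1
    return hits / total if total else 0.0


def insert_segment(reference: bytes, at: int, length: int) -> bytes:
    """Build a target stream by inserting a data segment y into the reference."""
    y = b"\xff" * length                          # inserted segment y
    return reference[:at] + y + reference[at:]


# Non-repeating reference data: 1024 four-byte counters (4096 bytes).
reference = b"".join(i.to_bytes(4, "big") for i in range(1024))

small = hit_ratio(insert_segment(reference, 1024, 64), reference, window=256)
large = hit_ratio(insert_segment(reference, 1024, 512), reference, window=256)
print(f"y = 64 bytes:  {small:.2f} of chunks matched")
print(f"y = 512 bytes: {large:.2f} of chunks matched")
```

In this model the data after y stays findable only while y is no longer than the window, which is the misalignment that dynamic anchor points are meant to correct.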
Refer to Fig. 3, which is a schematic diagram of data compression with the dynamic anchor points introduced by the present invention. Compared with the prior art, in this solution the dynamic anchor points are found by scanning the reference data stream and the target data stream before string matching. During string matching, the compressor adjusts the reference window pointer according to the dynamic anchor points. If the reference offset is greater than the target offset, some text has been deleted, and the compressor pulls in reference data at a faster rate. If the reference offset is smaller than the target offset, some text has been inserted into the target data, and the compressor stalls the reference window. The specific implementation is described in the embodiments below.
Refer to Fig. 4, which is a flowchart of a method for delta compression based on dynamic anchor points according to an embodiment of the present invention. In this embodiment, the method for delta compression based on dynamic anchor points comprises the following steps:
Step S10: scanning a target data stream and a reference data stream with a rolling hash algorithm, and marking a target anchor point and a reference anchor point that have the same rolling hash value as an anchor point pair, wherein the anchor point pair is expressed as (offset relative to the previous target anchor point, offset relative to the previous reference anchor point);
Step S20: using the anchor point pairs to divide the reference data stream and the target data stream into multiple paragraphs;
Step S30: for a paragraph without changes, recording the paragraph's interval and encoding it;
Step S40: for a paragraph with changes, when the target data stream and the reference data stream flow into a matching module for string matching, automatically aligning the target window or the reference window according to the detected anchor point pair, wherein the target window can hold part of the data of the target data stream and the reference window can hold part of the data of the reference data stream;
Step S41: encoding the result of the string matching; and
Step S50: outputting the encoded data.
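Steps S20 and S30 can be sketched in Python as follows. The sketch assumes that the anchor point pairs are already available as absolute byte offsets and that a paragraph counts as unchanged when its bytes are identical in both streams; the patent does not spell out that test, and `split_paragraphs` is a hypothetical helper name.

```python
from typing import List, NamedTuple, Tuple


class Paragraph(NamedTuple):
    target_span: Tuple[int, int]      # [start, end) in the target stream
    reference_span: Tuple[int, int]   # [start, end) in the reference stream
    unchanged: bool


def split_paragraphs(target: bytes, reference: bytes,
                     pairs: List[Tuple[int, int]]) -> List[Paragraph]:
    """Cut both streams at the anchor pairs (target_offset, reference_offset).

    Offsets are absolute and assumed to increase in both streams.
    """
    bounds = [(0, 0)] + list(pairs) + [(len(target), len(reference))]
    paragraphs = []
    for (t0, r0), (t1, r1) in zip(bounds, bounds[1:]):
        same = target[t0:t1] == reference[r0:r1]   # assumed "no change" test
        paragraphs.append(Paragraph((t0, t1), (r0, r1), same))
    return paragraphs


reference = b"AAAA|BBBB|CCCC"
target = b"AAAA|BxxBBB|CCCC"          # two bytes inserted in the middle paragraph
pairs = [(4, 4), (11, 9)]             # anchors sit on the '|' characters
for p in split_paragraphs(target, reference, pairs):
    kind = "record interval" if p.unchanged else "string matching"
    print(p.target_span, p.reference_span, kind)
```

Only the middle paragraph would be handed to the matching module; the other two need nothing more than their interval.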
In the above embodiment, the target anchor points in the target data stream and the reference anchor points in the reference data stream can be determined with a rolling hash algorithm. A rolling hash is a hash function whose value is computed over a window that moves across the input. Such a hash function allows the rolling hash to be computed quickly: the new hash value is obtained from the old one by removing the old value that leaves the window and adding the new value that enters it. This works much like a moving-average function, and it can run much faster than other low-pass filters. The target anchor points and the reference anchor points are marked as anchor point pairs; the target data stream then flows into the target window and the reference data stream into the reference window, and the target window or the reference window is automatically aligned according to the anchor points detected in the target data stream or the reference data stream. The result of the string matching is then encoded, and finally the data is output. By intelligently aligning the reference and target data, this solution keeps the most similar data in the reference and target windows and thereby achieves a better compression ratio.
In the technical solution of the present invention, the target data stream and the reference data stream are first scanned with a rolling hash algorithm, and each matched target anchor point and reference anchor point are marked as an anchor point pair. The anchor point pairs are used to divide the target data stream and the reference data stream into multiple paragraphs. For a paragraph without changes, the paragraph's interval is recorded and encoded. For a paragraph with changes, while the target data stream and the reference data stream undergo string matching in streaming fashion, the target window or the reference window is automatically aligned according to either anchor point of the detected anchor point pair, and the result of the string matching is then encoded. Finally, the encoded data is output. The solution uses a smaller reference window than other tools and tracks the target data stream and the reference data stream through the dynamic anchor points it sets, which reduces computational complexity and improves system performance. In addition, the smaller reference window saves a large amount of on-chip memory, which makes a hardware implementation feasible.
In a specific embodiment, the step of scanning the target data stream and the reference data stream with the rolling hash algorithm, and marking the target anchor point and the reference anchor point that have the same rolling hash value as an anchor point pair, specifically comprises:
Scanning the target data stream with the rolling hash algorithm to obtain a series of rolling hash values, selecting several bits of each rolling hash value with a bitmask and comparing them with a preset hash feature value, and, if they are equal, recording a target candidate anchor point;
Scanning the reference data stream with the rolling hash algorithm to obtain a series of rolling hash values, selecting several bits of each rolling hash value with a bitmask and comparing them with the preset hash feature value, and, if they are equal, recording a reference candidate anchor point;
Comparing whether the rolling hash values of a reference candidate anchor point and a target candidate anchor point are the same, and, when they are the same, determining the reference candidate anchor point to be a reference anchor point and the target candidate anchor point to be a target anchor point;
Marking the reference anchor point and the target anchor point as an anchor point pair.
In the above embodiment, the reference candidate anchor points of the reference data stream and the target candidate anchor points of the target data stream can be determined with a rolling hash algorithm. A rolling hash is a hash function whose value is computed over a window that moves across the input. For the specific flow, refer to Fig. 5, which is a schematic diagram of finding candidate anchor points with a hash algorithm. Taking the Rabin-Karp algorithm as an example, it typically uses a very simple rolling hash function, as follows:
$$H_k = (c_1 a^{k-1} + c_2 a^{k-2} + c_3 a^{k-3} + \dots + c_k a^{0}) \bmod M,$$
where $a$ and $M$ are constants and $c_1, \dots, c_k$ are the input characters.
To avoid handling huge values of H, all arithmetic is done modulo M. Removing and adding a character only requires subtracting the first term or adding the last term, and shifting all characters left by one position requires multiplying the entire sum $H_k$ by $a$. The computation of $H_{k+1}$ therefore reduces to:
$$H_{k+1} = ((H_k - c_1 a^{k-1}) \cdot a + c_{k+1}) \bmod M.$$
Thus, as the hash window slides across the entire reference data stream, each slide produces a rolling hash value. If this rolling hash value matches a predefined feature string (for example, a chosen number of least significant bits all equal to 0), the offset between the most recently shifted-in character and the previous reference anchor point is recorded as a reference candidate anchor point. This matching rolling hash value is also called the fingerprint of the candidate anchor point, and a candidate anchor point is expressed as (anchor offset, anchor fingerprint), where the anchor fingerprint is the rolling hash value.
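The rolling update and the bitmask test can be sketched in Python as follows. The constants `A`, `M`, `K` and `MASK_BITS` are illustrative values chosen for this example, not values from the patent, and the sketch measures each offset from the previous candidate rather than from the previous confirmed anchor point, since confirmation only happens after pairing.

```python
from typing import Iterator, List, Tuple

A = 257             # base a (assumed value)
M = (1 << 31) - 1   # modulus M (assumed value)
K = 16              # hash window length k in bytes (assumed value)
MASK_BITS = 4       # low bits that must be 0 (assumed; kept small for the demo)


def direct_hash(window: bytes) -> int:
    """H = (c1*a^(k-1) + ... + ck*a^0) mod M, computed from scratch."""
    h = 0
    for c in window:
        h = (h * A + c) % M
    return h


def rolling_hashes(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (position of the newest character, hash of the window ending there)."""
    if len(data) < K:
        return
    top = pow(A, K - 1, M)                 # a^(k-1) mod M, weight of the oldest char
    h = direct_hash(data[:K])
    yield K - 1, h
    for pos in range(K, len(data)):
        # H_{k+1} = ((H_k - c1*a^(k-1)) * a + c_{k+1}) mod M
        h = ((h - data[pos - K] * top) * A + data[pos]) % M
        yield pos, h


def candidate_anchors(data: bytes) -> List[Tuple[int, int]]:
    """Return candidates as (offset from previous candidate, fingerprint)."""
    mask = (1 << MASK_BITS) - 1
    anchors, last = [], 0
    for pos, h in rolling_hashes(data):
        if h & mask == 0:                  # selected bits equal the feature value 0
            anchors.append((pos - last, h))
            last = pos
    return anchors


data = bytes((i * i * 31 + 7) % 251 for i in range(2000))
print(len(candidate_anchors(data)), "candidate anchors found")
```

Because the same function is applied to both streams, identical content produces identical fingerprints regardless of where it sits in either stream.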
The target candidate anchor points can be determined in the same way. If the fingerprint of a target candidate anchor point is the same as that of a reference candidate anchor point, the target candidate anchor point is confirmed as a target anchor point, and the target anchor point and the reference anchor point are marked as an anchor point pair. The anchor point density is adjustable. For example, configuring a feature string in which the 11 least significant bits are 0 yields on average one anchor point pair per 2 KB; for a density of one per 1 KB, the 10 least significant bits are required to be 0. A higher density can improve the compression ratio, but anchor point processing then consumes more resources.
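The pairing step and the density arithmetic can be sketched as follows. The sketch assumes candidates are given as (absolute offset, fingerprint) and that pairs must appear in the same order in both streams; that ordering rule and the helper names `pair_anchors`, `to_relative` and `expected_spacing` are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[int, int]   # (absolute offset, fingerprint)


def expected_spacing(zero_bits: int) -> int:
    """Average bytes between anchors when `zero_bits` low bits must be 0,
    assuming hash values are uniformly distributed."""
    return 1 << zero_bits


def pair_anchors(ref: List[Candidate], tgt: List[Candidate]) -> List[Tuple[int, int]]:
    """Pair candidates with equal fingerprints as (target_offset, reference_offset)."""
    by_fp: Dict[int, List[int]] = {}
    for off, fp in ref:
        by_fp.setdefault(fp, []).append(off)
    pairs, last_ref = [], -1
    for t_off, fp in tgt:
        for r_off in by_fp.get(fp, []):
            if r_off > last_ref:          # keep pairs in stream order (assumption)
                pairs.append((t_off, r_off))
                last_ref = r_off
                break
    return pairs


def to_relative(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Express each pair as offsets from the previous anchor of its own stream."""
    out, prev_t, prev_r = [], 0, 0
    for t, r in pairs:
        out.append((t - prev_t, r - prev_r))
        prev_t, prev_r = t, r
    return out


ref_cands = [(100, 0xA0), (300, 0xB0), (500, 0xC0)]
tgt_cands = [(100, 0xA0), (250, 0xD0), (420, 0xB0), (620, 0xC0)]
pairs = pair_anchors(ref_cands, tgt_cands)
print(pairs)
print(to_relative(pairs))
print(expected_spacing(11), expected_spacing(10))
```

The target candidate with fingerprint `0xD0` has no counterpart in the reference stream, so it is dropped and never becomes an anchor point.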
In a specific embodiment, the step of automatically aligning the target window or the reference window according to the detected anchor point pair further comprises:
Determining the order in which the reference anchor point of the reference data stream and the target anchor point of the target data stream are detected;
If the reference anchor point is detected before the target anchor point, that is, the offset of the reference anchor point is smaller than the offset of the target anchor point, pausing the flow of the reference data stream into the reference window and continuing to search the target data stream flowing into the target window until the matching target anchor point is found, and then continuing string matching on the target data stream and the reference data stream; if the reference anchor point is detected after the target anchor point, that is, the offset of the reference anchor point is greater than the offset of the target anchor point, pausing the flow of the target data stream into the target window and continuing to search the reference data stream flowing into the reference window until the matching reference anchor point is found, and then continuing string matching on the target data stream and the reference data stream.
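The alignment decision can be sketched as a small Python function. The names `Action`, `alignment_action` and `alignment_plan` are hypothetical, and the equal-offset case is an assumption added for completeness, since the text above only covers the two unequal cases.

```python
from enum import Enum
from typing import List, Tuple


class Action(Enum):
    PAUSE_REFERENCE = "pause reference stream, keep reading target"
    PAUSE_TARGET = "pause target stream, keep reading reference"
    ADVANCE_BOTH = "streams already aligned"


def alignment_action(ref_offset: int, tgt_offset: int) -> Action:
    """Decide which stream to hold for one anchor pair.

    Offsets are measured from the previous anchor of the same stream.
    """
    if ref_offset < tgt_offset:
        # Reference anchor seen first: data was inserted into the target.
        return Action.PAUSE_REFERENCE
    if ref_offset > tgt_offset:
        # Target anchor seen first: data was deleted from the target.
        return Action.PAUSE_TARGET
    return Action.ADVANCE_BOTH            # equal offsets: assumed no-op case


def alignment_plan(pairs: List[Tuple[int, int]]) -> List[Tuple[Action, int]]:
    """For each (target_offset, reference_offset) pair, return the action and
    the number of bytes the other stream still has to read to catch up."""
    return [(alignment_action(r, t), abs(t - r)) for t, r in pairs]


for action, gap in alignment_plan([(100, 100), (320, 200), (150, 200)]):
    print(action.name, gap)
```

A real implementation would act on the streams as data arrives; the sketch only shows which window is stalled and by how much.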
The above embodiment simplifies the search for strings in the target data stream: while the target data is not yet in the reference window, the search continues, and once the target data is in the reference window the corresponding anchor point pair can be matched. This improves the compression ratio and the compression efficiency.
In a specific embodiment, the target data stream and reference data stream flow into matching module and carry out character string
With the step of, specifically include:
According to the LZ77 algorithm, the target data stream is matched to the limit (longest possible match) against the reference data in the reference window to obtain an output result, which consists of matching units [offset, match length] and character units. The output result is then encoded to obtain the output encoded data, which comprises the output encoded data of unchanged paragraphs and the output encoded data of changed paragraphs. The output encoded data of an unchanged paragraph consists of a reference anchor and a paragraph length; the output encoded data of a changed paragraph consists of a reference anchor, matching units and character units. Here, the reference anchor is recorded as an offset relative to the previous reference anchor, and the offset of a matching unit is an offset relative to the current reference anchor.
Referring to Fig. 6, Fig. 6 shows a flowchart of the delta compression method based on dynamic anchor points according to a specific embodiment of the invention. When delta compression starts, the flow proceeds as follows:
Step S101: apply the rolling hash algorithm to the reference file (reference data stream) and the target file (target data stream), and determine the reference candidate anchors and the target candidate anchors according to the anchor fingerprints.
Step S102: pair the reference candidate anchors with the target candidate anchors. Specifically, if the fingerprint of a target candidate anchor is identical to that of a reference candidate anchor, the reference candidate anchor is determined to be a reference anchor, the target candidate anchor is determined to be a target anchor, and this target anchor and this reference anchor are marked as an anchor pair.
Step S103: feed the reference data stream and the target data stream into the reference window and the target window, respectively.
Step S104: perform string matching (LZ77) on the reference window and the target window.
Step S105: during matching, judge whether the reference data stream and the target data stream have ended. If so, delta compression ends; if not, detection continues with step S106.
Step S106: judge whether an anchor is detected in the reference window or the target window. If not, jump to step S104; if so, perform step S107.
Step S107: judge whether the detected anchor is a reference anchor or a target anchor. If it is a reference anchor, perform step S110; if it is a target anchor, perform step S108.
Step S108: pause the target window and continue detection on the reference data stream flowing into the reference window.
Step S109: judge whether the anchor corresponding to the target anchor has been found. If so, return to step S103; if not, return to step S108.
Step S110: pause the reference window and perform string matching on the target data stream flowing into the target window.
Step S111: judge whether the anchor corresponding to the reference anchor has been found. If so, return to step S103; if not, return to step S110.
Referring to Fig. 7, Fig. 7 shows a block diagram of the delta compression device based on dynamic anchor points according to an embodiment of the invention. In this embodiment, the delta compression device based on dynamic anchor points includes:
A determining module 10, configured to scan the target data stream and the reference data stream with a rolling hash algorithm and to mark a target anchor and a reference anchor that have the same rolling hash value as an anchor pair, where the anchor pair is expressed as (offset relative to the previous target anchor, offset relative to the previous reference anchor);
A paragraph division module 20, configured to divide the target data stream and the reference data stream into a plurality of paragraphs using the anchor pairs;
A first processing module 30, configured to process unchanged paragraphs, including recording the range of each such paragraph and encoding it;
A second processing module 40, configured to process changed paragraphs, including automatically aligning the target window or the reference window according to the detected anchor pair when the target data stream and the reference data stream flow into the matching module for string matching, where the target window can hold part of the data of the target data stream and the reference window can hold part of the data of the reference data stream;
A coding module 50, configured to encode the result of string matching; and
An output module 60, configured to output the encoded data.
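The division into paragraphs and the separation of unchanged from changed paragraphs can be sketched as follows. This is a hedged illustration only: the function name `split_paragraphs`, the use of absolute byte offsets for the anchor pairs, and the direct byte comparison used to decide whether a paragraph is unchanged are all assumptions, since this passage does not state how "no change" is detected.

```python
def split_paragraphs(reference: bytes, target: bytes,
                     pairs: list[tuple[int, int]]) -> list[tuple[bytes, bytes, bool]]:
    """Cut both streams at the anchor pairs.

    pairs holds (reference_offset, target_offset) tuples in ascending order.
    Each result item is (reference_paragraph, target_paragraph, unchanged).
    """
    result = []
    ref_prev = tgt_prev = 0
    # A final pseudo-anchor at the end of both streams closes the last paragraph.
    for ref_off, tgt_off in list(pairs) + [(len(reference), len(target))]:
        ref_seg = reference[ref_prev:ref_off]
        tgt_seg = target[tgt_prev:tgt_off]
        result.append((ref_seg, tgt_seg, ref_seg == tgt_seg))
        ref_prev, tgt_prev = ref_off, tgt_off
    return result
```

In this sketch, paragraphs flagged as unchanged would go to the first processing module (record the range only), and the remaining ones to the second processing module for string matching.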
In the above embodiment, the determining module 10 uses a rolling hash algorithm to determine the target anchors in the target data stream and the reference anchors in the reference data stream. A rolling hash is a hash function whose value is computed over the input as it moves through a window. Some hash functions allow the rolling hash to be computed very quickly: the new hash value is calculated from the old one by removing the old value that leaves the window and adding the new value that enters the window. This is similar to the way a moving average is computed, which can be evaluated much faster than other low-pass filters. The target anchor and the reference anchor are marked as an anchor pair; the target data stream then flows into the target window and the reference data stream into the reference window, and the second processing module 40 automatically aligns the target window or the reference window according to the anchors detected in the target data stream or the reference data stream. The coding module 50 then encodes the result of string matching, and the output module 60 finally outputs the data. By intelligently aligning the reference data and the target data, this scheme keeps the most similar data in the reference window and the target window, and thereby achieves a better compression ratio.
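The rolling update can be illustrated with a polynomial (Rabin-Karp style) hash in Python. The patent does not name a specific hash function, so the base, the modulus, the window size and the function names below are assumptions chosen only for illustration.

```python
BASE = 257            # assumed polynomial base
MOD = (1 << 61) - 1   # assumed modulus (a Mersenne prime)


def direct_hash(block: bytes) -> int:
    """Hash a whole window from scratch (used here as a cross-check)."""
    h = 0
    for b in block:
        h = (h * BASE + b) % MOD
    return h


def rolling_hashes(data: bytes, window: int) -> list[int]:
    """Return the hash of every window-sized slice of data.

    Each new value is derived from the previous one by removing the byte
    that leaves the window and adding the byte that enters it.
    """
    if window <= 0 or len(data) < window:
        return []
    top = pow(BASE, window - 1, MOD)   # weight of the oldest byte in the window
    h = direct_hash(data[:window])
    hashes = [h]
    for i in range(window, len(data)):
        h = (h - data[i - window] * top) % MOD   # drop the outgoing byte
        h = (h * BASE + data[i]) % MOD           # append the incoming byte
        hashes.append(h)
    return hashes
```

Each step costs a constant number of arithmetic operations regardless of the window size, which is the property that makes this kind of hash attractive for scanning long streams.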
In a specific embodiment, the determining module 10 is specifically configured to:
Scan the target data stream with the rolling hash algorithm to obtain a series of rolling hash values, select several data bits of each rolling hash value with a bitmask and compare them with a preset hash feature value, and, if they are equal, record the position as a target candidate anchor;
Scan the reference data stream with the rolling hash algorithm to obtain a series of rolling hash values, select several data bits of each rolling hash value with a bitmask and compare them with the preset hash feature value, and, if they are equal, record the position as a reference candidate anchor;
Compare whether the rolling hash values of a reference candidate anchor and a target candidate anchor are identical, and, when they are, determine the reference candidate anchor to be a reference anchor and the target candidate anchor to be a target anchor;
And mark this reference anchor and this target anchor as an anchor pair.
In the above embodiment, both the reference candidate anchors of the reference data stream and the target candidate anchors of the target data stream can be determined by the rolling hash algorithm. For the specific rolling hash algorithm, refer to the embodiments above; the details are not repeated here.
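Candidate selection by bitmask and the pairing of candidates with equal hash values can be sketched in Python as follows. The window size, mask, feature value and hash parameters are assumed values, and the in-order greedy pairing strategy is one possible choice made for this illustration; the patent text only requires that paired anchors have identical rolling hash values.

```python
WINDOW = 4                       # assumed rolling-window size in bytes
MASK = 0x3                       # assumed bitmask: keep the low 2 bits of the hash
FEATURE = 0x0                    # assumed preset hash feature value
BASE, MOD = 257, (1 << 31) - 1   # assumed polynomial-hash parameters


def candidate_anchors(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, hash) for windows whose masked hash equals FEATURE."""
    if len(data) < WINDOW:
        return []
    top = pow(BASE, WINDOW - 1, MOD)
    h = 0
    for b in data[:WINDOW]:
        h = (h * BASE + b) % MOD
    out = []
    for start in range(len(data) - WINDOW + 1):
        if (h & MASK) == FEATURE:
            out.append((start, h))
        nxt = start + WINDOW
        if nxt < len(data):
            # Roll the window forward by one byte.
            h = ((h - data[start] * top) * BASE + data[nxt]) % MOD
    return out


def pair_anchors(reference: bytes, target: bytes) -> list[tuple[int, int]]:
    """Pair candidates with equal hashes, keeping both streams in order."""
    ref_cands = candidate_anchors(reference)
    pairs = []
    next_ref = 0  # never pair with a reference candidate behind the previous one
    for tgt_off, tgt_hash in candidate_anchors(target):
        for i in range(next_ref, len(ref_cands)):
            ref_off, ref_hash = ref_cands[i]
            if ref_hash == tgt_hash:
                pairs.append((ref_off, tgt_off))
                next_ref = i + 1
                break
    return pairs
```

A wider mask makes candidates rarer and paragraphs longer; a narrower mask does the opposite. Because the anchors depend on content rather than position, an insertion in the target shifts later anchors without destroying them.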
In a specific embodiment, the second processing module 40 is further configured to:
Judge the order of the reference anchor of the reference data stream and the target anchor of the target data stream;
If the reference anchor is detected before the target anchor, that is, the offset of the reference anchor is smaller than the offset of the target anchor, stop the reference data stream from flowing into the reference window and continue searching the target data stream flowing into the target window; once the target anchor of the same anchor pair is found, resume string matching of the target data stream against the reference data stream. If the reference anchor is detected after the target anchor, that is, the offset of the reference anchor is larger than the offset of the target anchor, stop the target data stream from flowing into the target window and continue searching the reference data stream flowing into the reference window; once the reference anchor of the same anchor pair is found, resume string matching of the target data stream against the reference data stream.
With the above embodiment, the screening of data strings in the target data stream is simplified. While the target data is not yet within the reference window, matching simply continues; once the target data falls within the reference window, the corresponding anchor pair can be matched, which can improve both the compression ratio and the compression efficiency.
In a specific embodiment, the second processing module 40 is further configured to match, according to the LZ77 algorithm, the target data stream to the limit (longest possible match) against the reference data in the reference window to obtain an output result, which consists of matching units [offset, match length] and character units. The output result is then encoded to obtain the output encoded data. The output encoded data comprises the output encoded data of unchanged paragraphs and the output encoded data of changed paragraphs. The output encoded data of an unchanged paragraph consists of a reference anchor and a paragraph length; the output encoded data of a changed paragraph consists of a reference anchor, matching units and character units. Here, the reference anchor is recorded as an offset relative to the previous reference anchor, and the offset of a matching unit is an offset relative to the current reference anchor.
The above are merely preferred embodiments of the invention and do not limit the patent scope of the invention. Any equivalent structural transformation made under the inventive concept of the invention using the contents of the description and drawings, and any direct or indirect application in other related technical fields, falls within the scope of patent protection of the invention.
Claims (10)
- 1. A delta compression method based on dynamic anchor points, characterized in that the method comprises: scanning a target data stream and a reference data stream with a rolling hash algorithm, and marking a target anchor and a reference anchor that have the same rolling hash value as an anchor pair, wherein the anchor pair is expressed as (offset relative to the previous target anchor, offset relative to the previous reference anchor); dividing the target data stream and the reference data stream into a plurality of paragraphs using the anchor pairs; for an unchanged paragraph, recording the range of the paragraph and encoding it; for a changed paragraph, when the target data stream and the reference data stream flow into a matching module for string matching, automatically aligning a target window or a reference window according to the detected anchor pair, wherein the target window can hold part of the data of the target data stream and the reference window can hold part of the data of the reference data stream; encoding the result of the string matching; and outputting the encoded data.
- 2. The delta compression method based on dynamic anchor points according to claim 1, characterized in that the step of scanning the target data stream and the reference data stream with a rolling hash algorithm and marking a target anchor and a reference anchor that have the same rolling hash value as an anchor pair specifically comprises: scanning the target data stream with the rolling hash algorithm to obtain a series of rolling hash values, selecting several data bits of each rolling hash value with a bitmask and comparing them with a preset hash feature value, and, if they are equal, recording a target candidate anchor; scanning the reference data stream with the rolling hash algorithm to obtain a series of rolling hash values, selecting several data bits of each rolling hash value with a bitmask and comparing them with the preset hash feature value, and, if they are equal, recording a reference candidate anchor; comparing whether the rolling hash values of the reference candidate anchor and the target candidate anchor are identical, and, when they are, determining the reference candidate anchor to be a reference anchor and the target candidate anchor to be a target anchor; and marking the reference anchor and the target anchor as an anchor pair.
- 3. The delta compression method based on dynamic anchor points according to claim 1, characterized in that the step of automatically aligning the target window or the reference window according to the detected anchor pair further comprises: judging the order of the reference anchor of the reference data stream and the target anchor of the target data stream; if the reference anchor is detected before the target anchor, that is, the offset of the reference anchor is smaller than the offset of the target anchor, stopping the reference data stream from flowing into the reference window and continuing to search the target data stream flowing into the target window until the target anchor of the same anchor pair is found, and then resuming string matching of the target data stream against the reference data stream; if the reference anchor is detected after the target anchor, that is, the offset of the reference anchor is larger than the offset of the target anchor, stopping the target data stream from flowing into the target window and continuing to search the reference data stream flowing into the reference window until the reference anchor of the same anchor pair is found, and then resuming string matching of the target data stream against the reference data stream.
- 4. The delta compression method based on dynamic anchor points according to claim 1, characterized in that the step in which the target data stream and the reference data stream flow into the matching module for string matching specifically comprises: according to the LZ77 algorithm, matching the target data stream to the limit against the reference data in the reference window to obtain an output result, the output result consisting of matching units [offset, match length] and character units.
- 5. The delta compression method based on dynamic anchor points according to claim 4, characterized in that the output encoded data comprises the output encoded data of unchanged paragraphs and the output encoded data of changed paragraphs; the output encoded data of an unchanged paragraph consists of a reference anchor and a paragraph length; the output encoded data of a changed paragraph consists of a reference anchor, matching units and character units; wherein the reference anchor is recorded as an offset relative to the previous reference anchor, and the offset of a matching unit is an offset relative to the current reference anchor.
- 6. A delta compression device based on dynamic anchor points, characterized in that the device comprises: a determining module, configured to scan a target data stream and a reference data stream with a rolling hash algorithm and to mark a target anchor and a reference anchor that have the same rolling hash value as an anchor pair, wherein the anchor pair is expressed as (offset relative to the previous target anchor, offset relative to the previous reference anchor); a paragraph division module, configured to divide the target data stream and the reference data stream into a plurality of paragraphs using the anchor pairs; a first processing module, configured to process unchanged paragraphs, including recording the range of each such paragraph and encoding it; a second processing module, configured to process changed paragraphs, including automatically aligning a target window or a reference window according to the detected anchor pair when the target data stream and the reference data stream flow into a matching module for string matching, wherein the target window can hold part of the data of the target data stream and the reference window can hold part of the data of the reference data stream; a coding module, configured to encode the result of the string matching; and an output module, configured to output the encoded data.
- 7. The delta compression device based on dynamic anchor points according to claim 6, characterized in that the determining module is specifically configured to: scan the target data stream with the rolling hash algorithm to obtain a series of rolling hash values, select several data bits of each rolling hash value with a bitmask and compare them with a preset hash feature value, and, if they are equal, record a target candidate anchor; scan the reference data stream with the rolling hash algorithm to obtain a series of rolling hash values, select several data bits of each rolling hash value with a bitmask and compare them with the preset hash feature value, and, if they are equal, record a reference candidate anchor; compare whether the rolling hash values of the reference candidate anchor and the target candidate anchor are identical, and, when they are, determine the reference candidate anchor to be a reference anchor and the target candidate anchor to be a target anchor; and mark the reference anchor and the target anchor as an anchor pair.
- 8. The delta compression device based on dynamic anchor points according to claim 6, characterized in that the second processing module is specifically configured to: judge the order of the reference anchor of the reference data stream and the target anchor of the target data stream; if the reference anchor is detected before the target anchor, that is, the offset of the reference anchor is smaller than the offset of the target anchor, stop the reference data stream from flowing into the reference window and continue to search the target data stream flowing into the target window until the target anchor of the same anchor pair is found, and then resume string matching of the target data stream against the reference data stream; if the reference anchor is detected after the target anchor, that is, the offset of the reference anchor is larger than the offset of the target anchor, stop the target data stream from flowing into the target window and continue to search the reference data stream flowing into the reference window until the reference anchor of the same anchor pair is found, and then resume string matching of the target data stream against the reference data stream.
- 9. The delta compression device based on dynamic anchor points according to claim 6, characterized in that the second processing module is further configured to: according to the LZ77 algorithm, match the target data stream to the limit against the reference data in the reference window to obtain an output result, the output result consisting of matching units [offset, match length] and character units.
- 10. The delta compression device based on dynamic anchor points according to claim 9, characterized in that the output encoded data comprises the output encoded data of unchanged paragraphs and the output encoded data of changed paragraphs; the output encoded data of an unchanged paragraph consists of a reference anchor and a paragraph length; the output encoded data of a changed paragraph consists of a reference anchor, matching units and character units; wherein the reference anchor is recorded as an offset relative to the previous reference anchor, and the offset of a matching unit is an offset relative to the current reference anchor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810035223.XA CN108268628A (en) | 2018-01-15 | 2018-01-15 | The method and device of delta compression based on dynamic anchor point |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108268628A true CN108268628A (en) | 2018-07-10 |
Family
ID=62775707
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810035223.XA Pending CN108268628A (en) | 2018-01-15 | 2018-01-15 | The method and device of delta compression based on dynamic anchor point |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108268628A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287149A (en) * | 2019-05-10 | 2019-09-27 | 同济大学 | A kind of matching coding method using Hash Search |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050044294A1 (en) * | 2003-07-17 | 2005-02-24 | Vo Binh Dao | Method and apparatus for window matching in delta compressors |
CN101847998A (en) * | 2010-04-15 | 2010-09-29 | 同济大学 | High-performance GML flow compression method |
US20120185612A1 (en) * | 2011-01-19 | 2012-07-19 | Exar Corporation | Apparatus and method of delta compression |
CN105515586A (en) * | 2015-12-14 | 2016-04-20 | 华中科技大学 | Rapid delta compression method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180710 |