CN102082575A - Method for removing repeated data based on pre-blocking and sliding window - Google Patents


Info

Publication number
CN102082575A
Authority
CN
China
Prior art keywords
chunking
sliding window
ssw
small
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010105858665A
Other languages
Chinese (zh)
Inventor
秦志光
王亦德
匡平
高嵘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JIANGSU GOWOO INFORMATION TECHNOLOGY Co Ltd
Original Assignee
JIANGSU GOWOO INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JIANGSU GOWOO INFORMATION TECHNOLOGY Co Ltd filed Critical JIANGSU GOWOO INFORMATION TECHNOLOGY Co Ltd
Priority to CN2010105858665A priority Critical patent/CN102082575A/en
Publication of CN102082575A publication Critical patent/CN102082575A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a data deduplication method based on pre-chunking and a sliding window. The method first pre-chunks a data object DO into non-overlapping small chunks MC; then, taking the small chunk MC as the unit, it uses a sliding window to detect runs of consecutive new small chunks MC and merges them into large chunks SC, while keeping small chunks MC at the junctions between new and old data. Because different chunking strategies are applied to different regions, the method can obtain a higher compression ratio and reduce metadata overhead even when the expected chunk size is large.

Description

Data deduplication method based on pre-chunking and a sliding window
Technical field
The present invention relates to data deduplication, and in particular to a data deduplication method based on pre-chunking and a sliding window.
Background art
To reduce the storage space that data occupies and the bandwidth consumed by remote data transmission, data usually needs to be compressed. Traditional data compression methods exploit the information redundancy within a data object (Data Object, hereinafter DO) by re-encoding it internally. Such methods do not consider the relationship between the DO being compressed and the other DOs in the system (for example, earlier versions of the same DO that are already stored), so the compression ratio they achieve (Compression Ratio, CR = original DO size / compressed DO size) is usually limited, about 2:1 on average. CR also depends heavily on the encoding format of the data object; for binary files and audio/video files, for example, traditional compression is of very limited benefit.
In many application environments, different DOs in a system share a great deal of identical data, for example the different backup versions of the same file in a backup system, the different releases of the same software, or the copies of a mass mailing in a mail system. Data deduplication exploits this redundancy between DOs to compress data; it can obtain a CR far higher than traditional compression and is affected very little by the data encoding format. Data deduplication uses methods such as fixed-size chunking (Fixed Size Chunking, FSC), content-defined variable-size chunking (Content Defined Chunking, CDC) and sliding-window chunking (Sliding Window Chunking, SWC) to divide a DO into contiguous, non-overlapping chunks (Chunk), computes the hash of the data in each Chunk as the unique identifier of that Chunk (ChunkID), and stores it in a hash table (HT_ID). A DO is stored Chunk by Chunk. When a new DO is written, the ChunkID of each Chunk that makes up the DO is matched against the records in HT_ID (this is called a Chunk Existence Query, CEQ). Chunks already stored in the system (those whose ChunkID has a match in HT_ID) are not stored again; only new Chunks and their ChunkIDs are stored. Since DOs may share a great deal of data, as noted above, this approach can greatly reduce the physical volume of stored data. Because duplicate data does not need to be written again, it can also greatly reduce the amount of data sent over the network when a DO is transferred remotely.
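By way of illustration only, the write path described above can be sketched in Python as follows. The sketch assumes fixed-size chunking and SHA-1 as the hash; the names `ChunkStore`, `ht_id`, `write` and `read` are hypothetical, and the in-memory dictionary merely stands in for HT_ID.

```python
import hashlib


def fixed_size_chunks(data: bytes, size: int) -> list[bytes]:
    """Split data into contiguous, non-overlapping chunks of `size` bytes (FSC)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkStore:
    """Toy in-memory store; `ht_id` plays the role of the hash table HT_ID."""

    def __init__(self) -> None:
        self.ht_id: dict[str, bytes] = {}  # ChunkID -> chunk data

    def ceq(self, chunk_id: str) -> bool:
        """Chunk Existence Query: is this ChunkID already stored?"""
        return chunk_id in self.ht_id

    def write(self, data: bytes, size: int = 4) -> tuple[list[str], int]:
        """Store a DO; return its list of ChunkIDs and the bytes newly stored."""
        recipe, new_bytes = [], 0
        for chunk in fixed_size_chunks(data, size):
            chunk_id = hashlib.sha1(chunk).hexdigest()
            if not self.ceq(chunk_id):  # only new Chunks are stored
                self.ht_id[chunk_id] = chunk
                new_bytes += len(chunk)
            recipe.append(chunk_id)
        return recipe, new_bytes

    def read(self, recipe: list[str]) -> bytes:
        """Rebuild a DO from its list of ChunkIDs."""
        return b"".join(self.ht_id[cid] for cid in recipe)


store = ChunkStore()
_, first = store.write(b"AAAABBBBCCCC")        # all three chunks are new
recipe, second = store.write(b"AAAAXXXXCCCC")  # only the middle chunk is new
print(first, second)
```

In this toy run the second DO shares two of its three chunks with the first, so only the changed chunk is written again.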
FSC determines Chunk boundaries by absolute offset from the head of the DO. Its advantages are that it is very fast and that Chunk sizes are uniform, which is convenient for storage devices. Its fatal weakness is that it is very sensitive to insertions and deletions: every Chunk after the point of modification is affected. Original CDC (OriginalCDC) moves a window (usually 12 Bytes to 48 Bytes in size) through the DO one Byte at a time, looking for a recurring pattern to use as a Chunk boundary (for example, the Rabin fingerprint of the window modulo a preset value D equals -1; the D value determines the expected Chunk size, and this pattern is called a Marker). Because Chunk boundaries are determined by the relative positions of Markers, CDC is insensitive to insertions and deletions; usually only the Chunk at the point of modification is affected, so CDC obtains a CR far higher than FSC. The main drawback of OriginalCDC is that Chunk sizes fluctuate widely. BaseCDC introduces a lower limit C_min and an upper limit C_max on Chunk size, which reduces the fluctuation, but introducing C_max produces hard chunks (Chunk_H). The boundary of a Chunk_H is set by absolute offset, so it has the same weakness as FSC, and Chunk_H should be avoided as far as possible. TTTD (Two Threshold Two Divisor) sets, in addition to C_min, C_max and the main divisor D, a backup divisor D' with D' < D, so there is a greater chance of finding a Marker that satisfies the D' boundary condition (denoted D'-Marker; a Marker that satisfies the D boundary condition is correspondingly denoted D-Marker). If the current Chunk reaches C_max without encountering a D-Marker but a D'-Marker exists within that range, the D'-Marker is used as the Chunk boundary. TTTD thus reduces the number of Chunk_H while also reducing the fluctuation of Chunk size.
After data is inserted or deleted at some position in a DO, the offsets of the Chunks after that position change relative to the head of the DO. SWC slides a window of K Bytes (Sliding Window, SW) one Byte at a time to find these offset changes and determines Chunk boundaries on that basis. SWC can obtain a higher CR than CDC, and the great majority of the Chunks it produces are exactly K Bytes in size, with very little fluctuation.
All of the above methods share a common problem, a tension between CR and metadata overhead: the smaller the expected Chunk size, the higher the CR that can be obtained, but the greater the total number of Chunks, which significantly increases the metadata overhead of Chunk indexing and management.
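As a rough illustration of content-defined chunking with a lower and an upper size limit (in the manner of BaseCDC), consider the Python sketch below. It is not the Rabin-fingerprint scheme itself: a simple polynomial hash stands in for the fingerprint, the boundary residue `divisor - 1` is an assumption of the sketch, and the function names and default parameters are hypothetical. The backup divisor of TTTD is not implemented.

```python
import random


def window_hash(win: bytes) -> int:
    """Simple polynomial hash of the window (a stand-in for a Rabin fingerprint)."""
    h = 0
    for b in win:
        h = (h * 257 + b) % 1_000_000_007
    return h


def cdc_chunks(data: bytes, window: int = 16, divisor: int = 64,
               c_min: int = 32, c_max: int = 256) -> list[bytes]:
    """Content-defined chunking with a size floor and ceiling (BaseCDC-style)."""
    chunks, start = [], 0
    for end in range(1, len(data) + 1):
        size = end - start
        if size < c_min:              # never cut before the lower limit
            continue
        # Marker: the hash of the last `window` bytes hits the chosen residue.
        win = data[max(0, end - window):end]
        is_marker = window_hash(win) % divisor == divisor - 1
        if is_marker or size >= c_max:    # reaching c_max yields a hard chunk
            chunks.append(data[start:end])
            start = end
    if start < len(data):             # trailing bytes form the last chunk
        chunks.append(data[start:])
    return chunks


rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(4000))
original = cdc_chunks(data)
shifted = cdc_chunks(b"!" + data)     # one byte inserted at the head
shared = len(set(original) & set(shifted))
print(len(original), len(shifted), shared)
```

The last line prints how many chunks the original and the shifted input have in common, which gives a feel for how boundaries set by content can survive an insertion that would displace every fixed-offset boundary.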
In general, chunk-based data deduplication methods have two major parts. One is dividing the DO into non-overlapping Chunks (this process is called Chunking); the other is detecting, by means of CEQ, whether each Chunk in the DO is a duplicate. For the stateless chunking methods FSC and CDC, Chunking and CEQ are unrelated and independent: the Chunking result depends only on the DO itself and not on the current state of the system (that is, on which Chunks are already stored in HT_ID). No CEQ is needed during Chunking, and Chunking the same DO at any time always yields the same result. For the stateful chunking method SWC, Chunking and CEQ are tightly coupled: a large number of CEQs must be performed during Chunking, and the Chunking result depends on both the DO itself and the current state of the system, so the same DO may be chunked differently under different system states. HT_ID is usually very large and cannot be held entirely in memory, so a CEQ may involve disk I/O and is therefore expensive. In networked environments HT_ID is usually kept on a remote metadata server, which aggravates the problem further. During Chunking, SWC performs a CEQ at every position of the SW, and when the CEQ returns False the SW moves forward by one Byte in order to determine the offset of the Chunk boundary. This improves CR effectively but greatly increases the number of CEQs, so the time overhead of SWC is very large. The main problem of stateful chunking methods is therefore the tension between CR and the number of CEQs.
Summary of the invention
Object of the invention: to resolve the tension between CR and metadata overhead and the tension between CR and the number of CEQs that exist in current methods, the present invention proposes a data deduplication method based on pre-chunking and a sliding window (hereinafter CDSWC).
Technical solution: to achieve the above object, the data deduplication method based on pre-chunking and a sliding window according to the present invention comprises the following steps:
(1) pre-chunk the data object DO, dividing it into non-overlapping small chunks MC;
(2) then, taking the small chunk MC as the unit, use the sliding-window method to detect consecutive new small chunks MC and merge them into a large chunk SC, while keeping small chunks MC at the junctions between new and old data.
In step (1), the data object DO is chunked using the content-defined variable-size chunking method CDC.
The sliding-window method is as follows:
(a) set a sliding window SSW consisting of X small chunks MC, and begin sliding the SSW from the head of the data object DO;
(b) compare the number L of remaining unprocessed small chunks RMC in the data object DO with X;
if L = X, compute the SHA-1 hash of the data in the sliding window SSW and perform a chunk existence query CEQ on the SSW; if the CEQ result is true, keep the boundaries of the unprocessed small chunks RMC, output X small chunks MC, and save the new small chunks MC;
if L < X and the chunk existence query CEQ on the sliding window SSW returns true, output the unprocessed small chunks RMC as the current R MCs, perform a CEQ on each small chunk MC and then save it, merge the X small chunks MC that make up the SSW into one duplicate large chunk SC and output it, then advance the SSW by X small chunks MC and repeat the comparison and query;
if the chunk existence query CEQ on the sliding window SSW returns false, advance the SSW by one small chunk MC and then compare L and X again;
(c) in all cases other than those described in (b), keep the original boundaries of the data in the sliding window SSW and output it as individual small chunks MC.
When merging small chunks MC, the method applies two criteria: ⅰ. consecutive data that often appears together is grouped into a large chunk SC; ⅱ. small chunks MC are used at the junction between new data and old data. In many application environments, successive versions of a data object DO have an important property: relative to the size of the whole DO, the great majority of modifications are concentrated in a comparatively small region. In many file systems, for example, most files rarely change, and the files that change often make up only a small fraction of the whole file set, so across successive backup images of a file system the changed data is usually concentrated in a small region of the image. Consequently, much of the consecutive data that lies outside the region of change is long and will usually recur in later versions of the DO. Because these long runs of data are not in the region of change, saving them as large chunks SC does not incur excessive boundary overhead. Boundary overhead here means the difference between the size of the new chunks and the size of the actual new data, caused by the mismatch between chunk boundaries and the boundaries of the actual new data. In general, the smaller the chunks that contain new data, the smaller the boundary overhead. Using small chunks MC at the junction between new and old data therefore also reduces boundary overhead and increases the compression ratio.
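The notion of boundary overhead can be made concrete with a small calculation. The Python sketch below is only an illustration under simplifying assumptions: chunks have a fixed size and are aligned at multiples of that size, the new data occupies the non-empty byte range [new_start, new_end), and every chunk touched by new data must be stored whole. The function name and the sample figures are hypothetical.

```python
def boundary_overhead(new_start: int, new_end: int, chunk_size: int) -> int:
    """Bytes stored beyond the actual new data [new_start, new_end)."""
    first = new_start // chunk_size            # first chunk touched by new data
    last = (new_end - 1) // chunk_size         # last chunk touched by new data
    stored = (last - first + 1) * chunk_size   # every touched chunk is stored whole
    return stored - (new_end - new_start)


# 100 bytes of new data at offsets 1000..1099, under three chunk sizes.
for size in (4096, 256, 64):
    print(size, boundary_overhead(1000, 1100, size))
```

For these figures the overhead falls from 3996 bytes with 4096-byte chunks to 412 bytes with 256-byte chunks and 92 bytes with 64-byte chunks, which is the effect the second criterion relies on.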
Beneficial effects: compared with the prior art, the present invention has the following advantages:
Because the invention applies different chunking strategies to regions where data changes and regions where it does not, it can still obtain a good compression ratio when the expected chunk size is large. Because the sliding window SSW slides in units of small chunks MC, the number of CEQs is reduced significantly, which reduces the time overhead.
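To give a feel for the scale of this reduction, the Python sketch below counts window queries in the worst case where no window is ever found in the store, so the window advances by its smallest step every time. It counts window queries only and ignores the per-MC queries made when MCs are output; the function names and the sample sizes (a 1 MiB DO, K = 4096 Bytes, MCs of 1024 Bytes on average, X = 4) are assumptions chosen for illustration.

```python
def swc_window_queries(n_bytes: int, k: int) -> int:
    """CEQs for byte-wise SWC when every query misses: one per window position."""
    return max(n_bytes - k + 1, 0)


def cdswc_window_queries(n_mcs: int, x: int) -> int:
    """Window CEQs for an SSW that advances one MC at a time and always misses."""
    return max(n_mcs - x + 1, 0)


n_bytes = 1 << 20                                   # a 1 MiB data object
print(swc_window_queries(n_bytes, k=4096))          # window slides by Byte
print(cdswc_window_queries(n_bytes // 1024, x=4))   # window slides by MC
```

Under these assumptions the byte-wise window is queried 1044481 times and the MC-wise window 1021 times.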
Description of drawings
Fig. 1 is a schematic diagram of the use of small chunks at the junction of new and old data in the present invention.
Fig. 2 is a schematic diagram of an example of the present invention.
Embodiment
The present invention is further illustrated below with reference to a specific embodiment. It should be understood that the embodiment serves only to illustrate the invention and not to limit its scope; after reading the present disclosure, modifications by those skilled in the art to various equivalent forms of the invention all fall within the scope defined by the claims of this application.
The main procedure of the CDSWC method proposed by the present invention is as follows.
ⅰ. Pre-chunk (Pre-Chunking) the data object DO using a content-defined variable-size chunking method CDC (such as TTTD), dividing the DO into non-overlapping small chunks MC, and record the boundary of each small chunk MC.
ⅱ. Initialize the flags and set a sliding window SSW consisting of X small chunks MC; begin sliding the SSW from the head of the DO.
ⅲ. Determine whether the number L of remaining unprocessed small chunks MC in the data object DO is less than X; if so, go to step ⅷ.
ⅳ. Determine whether the number R of unprocessed small chunks that the sliding window SSW has slid past (Residue Mini Chunk, RMC) has reached X. If so, compute the SHA-1 hash of the data in the SSW and perform a chunk existence query CEQ(SSW) on the sliding window: if CEQ(SSW) = True, go to step ⅴ; if CEQ(SSW) = False, go to step ⅵ. If R is less than X, go to step ⅶ.
ⅴ. Keep the original boundaries of the unprocessed small chunks RMC and output them as X MCs, performing a chunk existence query CEQ on each small chunk MC and then saving it (only new small chunks MC are saved). Merge the X small chunks MC that make up the sliding window SSW into one duplicate large chunk SC and output it. Set the IsPreDupSC flag (which indicates whether the Chunk immediately preceding the RMC is a duplicate SC) to True, advance the SSW by X small chunks MC, and go to step ⅲ.
ⅵ. Check the IsPreDupSC flag: if IsPreDupSC = True, output the unprocessed small chunks RMC as X small chunks MC, performing a chunk existence query CEQ on each small chunk MC and then saving it; if IsPreDupSC = False, merge the small chunks of the RMC into one new large chunk SC, output it and save it. Then set the IsPreDupSC flag to False, advance the sliding window SSW by one small chunk MC, and go to step ⅲ.
ⅶ. Perform a chunk existence query CEQ(SSW) on the sliding window SSW. If CEQ(SSW) = True, set the IsPreDupSC flag to True, output the RMC as R small chunks MC (when R > 0), performing a chunk existence query CEQ on each small chunk MC and then saving it, merge the X small chunks MC that make up the SSW into one duplicate large chunk SC and output it, then advance the SSW by X small chunks MC and go to step ⅲ. If CEQ(SSW) = False, advance the SSW by one small chunk MC and go to step ⅲ.
ⅷ. If L > 0, perform a chunk existence query CEQ(LMC) on the L small chunks MC at the end of the data object DO (Last Mini Chunk, LMC) and check the IsPreDupSC flag: only when CEQ(LMC) = False and IsPreDupSC = True, output the LMC as L small chunks MC, performing a chunk existence query CEQ(MC) on each small chunk MC and then saving it; otherwise merge the LMC into one large chunk SC and output it, saving it if it is new. The procedure then ends.
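A minimal Python sketch of steps ⅱ to ⅷ follows, offered as an illustration rather than as the implementation of the invention. It assumes the pre-chunking of step ⅰ has already produced the list of MCs. The names `cdswc`, `chunk_id`, `store`, `done` and `pos` are hypothetical, the initial value of IsPreDupSC is assumed to be False, and, once fewer than X MCs remain for the window, all MCs not yet output are treated as the LMC, which is one possible reading of steps ⅲ and ⅷ.

```python
import hashlib


def chunk_id(data: bytes) -> str:
    """SHA-1 of the chunk data, used as its ChunkID."""
    return hashlib.sha1(data).hexdigest()


def cdswc(mcs: list[bytes], store: set[str], x: int = 3) -> list[tuple[str, bytes, bool]]:
    """Merge pre-chunked MCs into SCs; returns (kind, data, is_new) per output chunk.

    `store` is the set of ChunkIDs already known (HT_ID); new IDs are added to it.
    """
    out: list[tuple[str, bytes, bool]] = []

    def emit(kind: str, data: bytes) -> None:
        cid = chunk_id(data)
        is_new = cid not in store      # CEQ on the chunk being output
        store.add(cid)                 # adding a known ID again is a no-op
        out.append((kind, data, is_new))

    def emit_mcs(chunks: list[bytes]) -> None:
        for mc in chunks:              # keep the original MC boundaries
            emit("MC", mc)

    done = 0                # index of the first MC not yet output
    pos = 0                 # index of the first MC inside the SSW
    is_pre_dup_sc = False   # assumed initial value of the IsPreDupSC flag
    while len(mcs) - pos >= x:                       # step iii
        window = mcs[pos:pos + x]
        rmc = mcs[done:pos]                          # MCs the SSW has slid past
        if chunk_id(b"".join(window)) in store:      # CEQ(SSW) = True: steps v, vii
            emit_mcs(rmc)
            emit("SC", b"".join(window))             # duplicate SC
            is_pre_dup_sc = True
            pos += x
            done = pos
        elif len(rmc) == x:                          # step vi
            if is_pre_dup_sc:
                emit_mcs(rmc)                        # junction of old and new data
            else:
                emit("SC", b"".join(rmc))            # run of new data
            is_pre_dup_sc = False
            done = pos
            pos += 1
        else:                                        # step vii, CEQ(SSW) = False
            pos += 1
    lmc = mcs[done:]                                 # step viii
    if lmc:
        lmc_is_new = chunk_id(b"".join(lmc)) not in store
        if lmc_is_new and is_pre_dup_sc:
            emit_mcs(lmc)
        else:
            emit("SC", b"".join(lmc))
    return out


mcs = [bytes([b]) for b in b"abcdefghijklmnopqrs"]   # 19 one-byte MCs
store = {chunk_id(b"abc"), chunk_id(b"opq"), chunk_id(b"rs")}
print([kind for kind, _, _ in cdswc(mcs, store, x=3)])
```

With 19 one-byte MCs and the chunks abc, opq and rs already in the store, the sketch outputs ten chunks in the order SC, MC, MC, MC, SC, SC, MC, MC, SC, SC.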
Fig. 2 gives an example of the CDSWC method; the MC boundaries obtained after Pre-Chunking are shown as dashed lines in the figure. Set X = 3 and begin sliding the SSW from the head of the DO. When the SSW is at position A, CEQ(SSW_A) = True, so MCa, MCb and MCc are merged into SC1 and output (SC1 is a duplicate SC), IsPreDupSC is set to True, and the SSW advances by 3 MCs to position B. CEQ(SSW_B) = False, so the SSW advances by 1 MC; at positions C and D, CEQ(SSW) is also False. When the SSW reaches position E, R = 3, CEQ(SSW_E) = False and IsPreDupSC = True. That is, the data in the current RMC (made up of MCd, MCe and MCf), taken as a whole, is a new SC (the earlier result CEQ(SSW_B) = False guarantees this), and the Chunk preceding it is a duplicate SC, which shows that this RMC lies at the junction of new and old data. It is therefore output as MC2, MC3 and MC4; a CEQ(MC) is performed on each of these 3 MCs and they are then saved (the figure assumes all 3 MCs are new). IsPreDupSC is then set to False, and the SSW advances by one MC and continues. When the SSW reaches position H, R = 3, CEQ(SSW_H) = False and IsPreDupSC = False, so the current RMC (made up of MCg, MCh and MCi) is merged into SC5 and output (SC5 is a new SC), IsPreDupSC is set to False, and the SSW advances by one MC and continues. Likewise, MCj, MCk and MCl are merged into SC6 (SC6 is a new SC). When the SSW reaches position M, R = 2 and CEQ(SSW_M) = True, so the current RMC (made up of MCm and MCn) is output as MC7 and MC8, MCo, MCp and MCq are merged into SC9 and output (SC9 is a duplicate SC), IsPreDupSC is set to True, and the SSW advances by 3 MCs to position N. At position N, L = 2 < 3 and CEQ(LMC) = True, so MCr and MCs are merged into SC10 and output (SC10 is a duplicate SC), and the procedure ends.

Claims (3)

1. A data deduplication method based on pre-chunking and a sliding window, characterized in that the method comprises the following steps:
(1) pre-chunk the data object DO, dividing it into non-overlapping small chunks MC;
(2) then, taking the small chunk MC as the unit, use the sliding-window method to detect consecutive new small chunks MC and merge them into a large chunk SC, while keeping small chunks MC at the junctions between new and old data.
2. The data deduplication method based on pre-chunking and a sliding window according to claim 1, characterized in that in step (1) the data object DO is chunked using the content-defined variable-size chunking method CDC.
3. The data deduplication method based on pre-chunking and a sliding window according to claim 1, characterized in that the sliding-window method is as follows:
(a) set a sliding window SSW consisting of X small chunks MC, and begin sliding the SSW from the head of the data object DO;
(b) compare the number L of remaining unprocessed small chunks RMC in the data object DO with X;
if L = X, compute the SHA-1 hash of the data in the sliding window SSW and perform a chunk existence query CEQ on the SSW; if the CEQ result is true, keep the boundaries of the unprocessed small chunks RMC, output X small chunks MC, and save the new small chunks MC;
if L < X and the chunk existence query CEQ on the sliding window SSW returns true, output the unprocessed small chunks RMC as the current R MCs, perform a CEQ on each small chunk MC and then save it, merge the X small chunks MC that make up the SSW into one duplicate large chunk SC and output it, then advance the SSW by X small chunks MC and repeat the comparison and query;
if the chunk existence query CEQ on the sliding window SSW returns false, advance the SSW by one small chunk MC and then compare L and X again;
(c) in all cases other than those described in (b), keep the original boundaries of the data in the sliding window SSW and output it as individual small chunks MC.
CN2010105858665A 2010-12-14 2010-12-14 Method for removing repeated data based on pre-blocking and sliding window Pending CN102082575A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010105858665A CN102082575A (en) 2010-12-14 2010-12-14 Method for removing repeated data based on pre-blocking and sliding window

Publications (1)

Publication Number Publication Date
CN102082575A true CN102082575A (en) 2011-06-01

Family

ID=44088340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105858665A Pending CN102082575A (en) 2010-12-14 2010-12-14 Method for removing repeated data based on pre-blocking and sliding window

Country Status (1)

Country Link
CN (1) CN102082575A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110601