CN105389387A - Compression based deduplication performance and deduplication rate improving method and system - Google Patents

Compression based deduplication performance and deduplication rate improving method and system

Info

Publication number
CN105389387A
Authority
CN
China
Prior art keywords
data
duplication
compression
length
performance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510918539.XA
Other languages
Chinese (zh)
Other versions
CN105389387B (en)
Inventor
吴植民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eisoo Information Technology Co Ltd
Original Assignee
Shanghai Eisoo Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eisoo Information Technology Co Ltd
Priority to CN201510918539.XA
Publication of CN105389387A
Application granted
Publication of CN105389387B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G06F16/1748 De-duplication implemented within the file system, e.g. based on file segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a compression-based method for improving deduplication performance and deduplication ratio. The method comprises the steps of: S1, acquiring the deduplicated data that is about to be transmitted and its length; S2, compressing the deduplicated data; S3, replacing the deduplicated data and its length with the compressed data and its length; S4, comparing the length of the deduplicated data with the length of the compressed data; and S5, adding the difference obtained from the comparison to the value used for calculating the deduplication ratio. The method and system disclosed by the present invention can improve both the deduplication ratio and the performance of deduplication, thereby further reducing occupied storage space, occupied network bandwidth, and data protection window time.

Description

A compression-based method and system for improving deduplication performance and deduplication ratio
Technical field
The present invention relates to the field of data deduplication, and in particular to a compression-based method and system for improving deduplication performance and deduplication ratio.
Background
With the development of computers, the amount of valid data held on computers keeps growing. The accumulation of massive amounts of data poses a great challenge to data protection, and to address this problem many vendors have proposed data deduplication solutions.
Data deduplication removes duplicate data by comparing the fingerprints of current data against those of existing data, thereby reducing storage space usage, reducing network bandwidth usage, and shortening the data protection window. Although the techniques that vendors use to implement deduplication are all very similar, the results achieved differ greatly. The deduplication ratio and performance are the two key measures of deduplication quality. If the deduplication ratio is low, the reductions in storage space usage, network bandwidth usage, and data protection window time are insignificant, and deduplication fails to deliver its expected effect. If deduplication performance is poor, the data protection window lengthens, so data cannot be protected in time, which also poses a new challenge for protecting large volumes of data.
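The fingerprint comparison described above can be sketched as follows. This is a minimal illustration, not the scheme of any particular vendor: the fixed list of chunks, the choice of SHA-256 as the fingerprint function, and the names `deduplicate` and `restore` are all assumptions made for the example.

```python
import hashlib


def deduplicate(chunks):
    """Return (unique_chunks, recipe) for an iterable of bytes chunks.

    unique_chunks maps fingerprint -> chunk data, so each distinct chunk
    is kept once. recipe lists the fingerprint of every input chunk in
    order, so the original stream can be rebuilt.
    """
    unique_chunks = {}
    recipe = []
    for chunk in chunks:
        # SHA-256 is an assumed fingerprint function; the source names none.
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in unique_chunks:
            unique_chunks[fingerprint] = chunk  # first occurrence: keep it
        recipe.append(fingerprint)              # duplicates cost only a reference
    return unique_chunks, recipe


def restore(unique_chunks, recipe):
    """Rebuild the original byte stream from the stored chunks."""
    return b"".join(unique_chunks[fp] for fp in recipe)
```

Only the chunks in `unique_chunks` need to be sent and stored, which is where the savings in storage space and bandwidth come from.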
Summary of the invention
In view of the above shortcomings of the prior art, the object of the present invention is to provide a compression-based method and system for improving deduplication performance and deduplication ratio, so as to solve the following problems in the prior art: a low deduplication ratio yields only insignificant reductions in storage space usage, network bandwidth usage, and data protection window time, so that deduplication fails to achieve its expected effect; and poor deduplication performance lengthens the data protection window, so that data cannot be protected in time.
To achieve the above object and other related objects, the present invention provides a compression-based method for improving deduplication performance and deduplication ratio, comprising the steps of: S1, obtaining the deduplicated data that is about to be sent and its length; S2, compressing the deduplicated data; S3, replacing the deduplicated data and its length with the compressed data and its length; S4, comparing the length of the deduplicated data with the length of the compressed data; S5, adding the difference obtained from the comparison to the value used for calculating the deduplication ratio.
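Steps S1 to S5 can be sketched for a single block as follows. This is an illustrative reading, not the patented implementation: zlib stands in for the unspecified compression algorithm, `DedupStats` and `compress_before_send` are hypothetical names, and defining the deduplication ratio as original bytes divided by bytes actually stored is an assumption, since the source does not give the formula.

```python
import zlib
from dataclasses import dataclass


@dataclass
class DedupStats:
    original_bytes: int = 0  # bytes presented for protection
    saved_bytes: int = 0     # running value that feeds the ratio

    def ratio(self):
        # Assumed definition: original size / size actually stored.
        stored = self.original_bytes - self.saved_bytes
        return self.original_bytes / stored if stored else float("inf")


def compress_before_send(dedup_data: bytes, stats: DedupStats):
    dedup_len = len(dedup_data)                          # S1: data and its length
    compressed = zlib.compress(dedup_data)               # S2: compress
    payload, payload_len = compressed, len(compressed)   # S3: replace data and length
    difference = dedup_len - payload_len                 # S4: compare the two lengths
    stats.saved_bytes += difference                      # S5: add difference to the ratio value
    return payload, payload_len
```

Note that for incompressible data the difference can be negative; a real system would probably send such blocks uncompressed, which the source does not discuss.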
In one embodiment of the present invention, step S2 further comprises the steps of:
S21, obtaining the maximum compressed length according to the length of the deduplicated data;
S22, allocating memory space for storing the compressed data according to the maximum compressed length;
S23, compressing the deduplicated data according to its length, and obtaining the compressed data and the compressed data length;
S24, copying the compressed data to the memory space.
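Sub-steps S21 to S24 can be sketched as below. The source does not name a compressor, so zlib is an assumption, and `max_compressed_length` and `compress_into_buffer` are hypothetical names. The bound mirrors the worst-case formula used by zlib's `compressBound()` for default settings, reproduced here from memory because Python's `zlib` module does not expose that function.

```python
import zlib


def max_compressed_length(source_len: int) -> int:
    # Worst-case output size, modeled on zlib's compressBound() formula
    # (an assumption; other compressors have different bounds).
    return (source_len + (source_len >> 12) + (source_len >> 14)
            + (source_len >> 25) + 13)


def compress_into_buffer(dedup_data: bytes):
    dedup_len = len(dedup_data)
    bound = max_compressed_length(dedup_len)   # S21: maximum compressed length
    buffer = bytearray(bound)                  # S22: allocate memory of that size
    compressed = zlib.compress(dedup_data)     # S23: compress, get data and length
    compressed_len = len(compressed)
    if compressed_len > bound:
        raise ValueError("compressed data exceeds the allocated bound")
    buffer[:compressed_len] = compressed       # S24: copy into the allocated space
    return buffer, compressed_len
```

Allocating for the worst case up front means the copy in S24 cannot overflow, at the cost of a buffer slightly larger than the input.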
In one embodiment of the present invention, before step S1 the method further comprises the step of: deleting the duplicate data from the data to be stored, and sending and storing the data after deletion.
In one embodiment of the present invention, after step S5 the method further comprises the step of: repeating steps S1 to S5 until all duplicate data in the data to be stored has been deleted and storage is complete.
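Putting the preceding deduplication step and the repetition of S1 to S5 together, the whole loop might look like the sketch below. Everything here is an assumption made for illustration: block-level SHA-256 fingerprints, zlib compression, the name `protect`, and the ratio defined as original bytes over bytes actually sent.

```python
import hashlib
import zlib


def protect(blocks):
    """Deduplicate, then compress, every block; return (sent, ratio)."""
    seen = set()
    original_bytes = 0
    saved_bytes = 0
    sent = []
    for block in blocks:
        original_bytes += len(block)
        fingerprint = hashlib.sha256(block).digest()
        if fingerprint in seen:
            saved_bytes += len(block)  # duplicate: nothing is sent
            continue
        seen.add(fingerprint)
        compressed = zlib.compress(block)             # S1 to S3
        saved_bytes += len(block) - len(compressed)   # S4 and S5
        sent.append(compressed)
    stored = original_bytes - saved_bytes
    ratio = original_bytes / stored if stored else float("inf")
    return sent, ratio
```

The loop ends when the input is exhausted, which corresponds to all data having been deduplicated and stored.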
The present invention also provides a compression-based system for improving deduplication performance and deduplication ratio, comprising: a data acquisition module for obtaining the deduplicated data that is about to be sent and its length; a data compression module for compressing the deduplicated data; a data replacement module for replacing the deduplicated data and its length with the compressed data and its length; and a data comparison module for comparing the length of the deduplicated data with the length of the compressed data, and for adding the difference obtained from the comparison to the value used for calculating the deduplication ratio.
In one embodiment of the present invention, the data compression module comprises: a length acquisition unit for obtaining the maximum compressed length according to the length of the deduplicated data; a memory allocation unit for allocating memory space for storing the compressed data according to the maximum compressed length; a data compression unit for compressing the deduplicated data according to its length and obtaining the compressed data and the compressed data length; and a data copy unit for copying the compressed data to the memory space.
In one embodiment of the present invention, the system further comprises: a data processing module for deleting the duplicate data from the data to be stored, and for sending and storing the data after deletion.
As described above, the compression-based method and system of the present invention for improving deduplication performance and deduplication ratio have the following beneficial effect: they can improve both the deduplication ratio and the performance of deduplication, thereby further reducing storage space usage, network bandwidth usage, and data protection window time.
Brief description of the drawings
Fig. 1 is a block diagram of the method flow in one embodiment of the compression-based method of the present invention for improving deduplication performance and deduplication ratio.
Fig. 2 is a block diagram of the system in one embodiment of the compression-based system of the present invention for improving deduplication performance and deduplication ratio.
Detailed description of the embodiments
The embodiments of the present invention are described below through specific examples, and those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that, where no conflict arises, the following embodiments and the features in them can be combined with one another.
It should be noted that the drawings provided with the following embodiments illustrate the basic concept of the present invention only schematically. The drawings show only the components related to the present invention rather than the number, shape, and size of the components in an actual implementation; in an actual implementation the form, quantity, and proportion of each component may vary freely, and the component layout may be more complex.
Refer to Figs. 1 and 2. Fig. 1 is a block diagram of the method flow in one embodiment of the compression-based method of the present invention for improving deduplication performance and deduplication ratio. The present invention provides a compression-based method for improving deduplication performance and deduplication ratio, comprising the steps of:
S1, obtaining the deduplicated data that is about to be sent and its length;
S2, compressing the deduplicated data. Further, step S2 comprises the steps of: S21, obtaining the maximum compressed length according to the length of the deduplicated data; S22, allocating memory space for storing the compressed data according to the maximum compressed length; S23, compressing the deduplicated data according to its length, and obtaining the compressed data and the compressed data length; S24, copying the compressed data to the memory space;
S3, replacing the deduplicated data and its length with the compressed data and its length;
S4, comparing the length of the deduplicated data with the length of the compressed data;
S5, adding the difference obtained from the comparison to the value used for calculating the deduplication ratio.
Further, before step S1 the method also comprises the step of: deleting the duplicate data from the data to be stored, and sending and storing the data after deletion.
Further, after step S5 the method also comprises the step of: repeating steps S1 to S5 until all duplicate data in the data to be stored has been deleted and storage is complete.
A specific embodiment is described below. In this embodiment, the deduplication system used is the AnyBackup 6.0 deduplication system, and the operating system is Red Hat Enterprise Linux 5. The data to be protected is a file named test, with a data volume of 100 GB. The compression-based method for improving deduplication performance and deduplication ratio comprises the following steps:
1. Protect the data test using the AnyBackup 6.0 deduplication system.
2. Obtain the deduplicated data that step 1 is about to send, together with its length.
3. Compress the deduplicated data obtained in step 2. The compression steps are as follows:
3.1. Obtain the maximum compressed length from the length of the deduplicated data obtained in step 2.
3.2. Allocate memory space for storing the compressed data according to the maximum compressed length obtained in step 3.1.
3.3. Compress the deduplicated data obtained in step 2 according to its length, obtaining the compressed data and the compressed data length.
3.4. Copy the compressed data obtained in step 3.3 into the memory space allocated in step 3.2.
4. Replace the deduplicated data about to be sent in step 2, and its length, with the compressed data and its length produced by step 3.4.
5. Compare the length of the deduplicated data about to be sent in step 2 with the length of the compressed data about to be sent after step 4.
6. Add the difference obtained in step 5 to the value used for calculating the deduplication ratio.
7. Repeat steps 1 to 6 until the data test has been completely protected.
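The effect of step 6 on the deduplication ratio can be written out symbolically. The source does not give the ratio formula, so the definition below (original size divided by size actually stored) is an assumption, and the symbols $N$, $D$, $C$, and $\Delta$ are introduced here only for illustration: $N$ is the original data length, $D$ the length after deduplication, and $C$ the length after compression.

```latex
\begin{align*}
  \text{saved by deduplication} &= N - D \\
  \Delta &= D - C
    && \text{(difference from step 5)} \\
  \text{saved in total} &= (N - D) + \Delta = N - C
    && \text{(step 6)} \\
  \text{ratio before compression} &= \frac{N}{N - (N - D)} = \frac{N}{D} \\
  \text{ratio after compression} &= \frac{N}{N - (N - C)} = \frac{N}{C}
\end{align*}
```

Under this reading, whenever compression shrinks the data ($C < D$) the reported ratio rises from $N/D$ to $N/C$.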
It can thus be seen that the compression-based method of the present invention for improving deduplication performance and deduplication ratio can improve both the deduplication ratio and the performance of deduplication, thereby further reducing storage space usage, network bandwidth usage, and data protection window time.
Fig. 2 is a block diagram of the system in one embodiment of the compression-based system of the present invention for improving deduplication performance and deduplication ratio.
The present invention also provides a compression-based system for improving deduplication performance and deduplication ratio, comprising: a data acquisition module for obtaining the deduplicated data that is about to be sent and its length; a data compression module for compressing the deduplicated data; a data replacement module for replacing the deduplicated data and its length with the compressed data and its length; and a data comparison module for comparing the length of the deduplicated data with the length of the compressed data, and for adding the difference obtained from the comparison to the value used for calculating the deduplication ratio. In a preferred embodiment of the present invention, the data compression module comprises: a length acquisition unit for obtaining the maximum compressed length according to the length of the deduplicated data; a memory allocation unit for allocating memory space for storing the compressed data according to the maximum compressed length; a data compression unit for compressing the deduplicated data according to its length and obtaining the compressed data and the compressed data length; and a data copy unit for copying the compressed data to the memory space.
In addition, the system further comprises: a data processing module for deleting the duplicate data from the data to be stored, and for sending and storing the data after deletion.
In summary, the compression-based method and system of the present invention for improving deduplication performance and deduplication ratio can improve both the deduplication ratio and the performance of deduplication, thereby further reducing storage space usage, network bandwidth usage, and data protection window time. The present invention therefore effectively overcomes various shortcomings of the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims (7)

1. A compression-based method for improving deduplication performance and deduplication ratio, characterized by comprising the steps of:
S1, obtaining the deduplicated data that is about to be sent and its length;
S2, compressing the deduplicated data;
S3, replacing the deduplicated data and its length with the compressed data and its length;
S4, comparing the length of the deduplicated data with the length of the compressed data;
S5, adding the difference obtained from the comparison to the value used for calculating the deduplication ratio.
2. The compression-based method for improving deduplication performance and deduplication ratio according to claim 1, characterized in that step S2 further comprises the steps of:
S21, obtaining the maximum compressed length according to the length of the deduplicated data;
S22, allocating memory space for storing the compressed data according to the maximum compressed length;
S23, compressing the deduplicated data according to its length, and obtaining the compressed data and the compressed data length;
S24, copying the compressed data to the memory space.
3. The compression-based method for improving deduplication performance and deduplication ratio according to claim 1, characterized in that before step S1 the method further comprises the step of: deleting the duplicate data from the data to be stored, and sending and storing the data after deletion.
4. The compression-based method for improving deduplication performance and deduplication ratio according to claim 1, characterized in that after step S5 the method further comprises the step of: repeating steps S1 to S5 until all duplicate data in the data to be stored has been deleted and storage is complete.
5. A compression-based system for improving deduplication performance and deduplication ratio, characterized by comprising:
a data acquisition module for obtaining the deduplicated data that is about to be sent and its length;
a data compression module for compressing the deduplicated data;
a data replacement module for replacing the deduplicated data and its length with the compressed data and its length;
a data comparison module for comparing the length of the deduplicated data with the length of the compressed data, and for adding the difference obtained from the comparison to the value used for calculating the deduplication ratio.
6. The compression-based system for improving deduplication performance and deduplication ratio according to claim 5, characterized in that the data compression module comprises:
a length acquisition unit for obtaining the maximum compressed length according to the length of the deduplicated data;
a memory allocation unit for allocating memory space for storing the compressed data according to the maximum compressed length;
a data compression unit for compressing the deduplicated data according to its length and obtaining the compressed data and the compressed data length;
a data copy unit for copying the compressed data to the memory space.
7. The compression-based system for improving deduplication performance and deduplication ratio according to claim 5, characterized in that the system further comprises:
a data processing module for deleting the duplicate data from the data to be stored, and for sending and storing the data after deletion.
CN201510918539.XA 2015-12-11 2015-12-11 Compression-based method and system for improving deduplication performance and deduplication ratio Active CN105389387B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510918539.XA CN105389387B (en) 2015-12-11 2015-12-11 Compression-based method and system for improving deduplication performance and deduplication ratio

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510918539.XA CN105389387B (en) 2015-12-11 2015-12-11 Compression-based method and system for improving deduplication performance and deduplication ratio

Publications (2)

Publication Number Publication Date
CN105389387A true CN105389387A (en) 2016-03-09
CN105389387B CN105389387B (en) 2018-12-14

Family

ID=55421677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510918539.XA Active CN105389387B (en) 2015-12-11 2015-12-11 Compression-based method and system for improving deduplication performance and deduplication ratio

Country Status (1)

Country Link
CN (1) CN105389387B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106648469A (en) * 2016-12-29 2017-05-10 华为技术有限公司 Method and device for processing cache data and storage controller
CN109116146A (en) * 2018-07-27 2019-01-01 南京瑞贻电子科技有限公司 A kind of analysis instrument for deleting priceless Value Data with automation
CN109408036A (en) * 2018-09-07 2019-03-01 安徽恒科信息技术有限公司 A kind of agile development platform
WO2023279833A1 (en) * 2021-07-08 2023-01-12 华为技术有限公司 Data processing method and apparatus

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7937371B2 (en) * 2008-03-14 2011-05-03 International Business Machines Corporation Ordering compression and deduplication of data
CN102156703A (en) * 2011-01-24 2011-08-17 南开大学 Low-power consumption high-performance repeating data deleting system
CN102831222A (en) * 2012-08-24 2012-12-19 华中科技大学 Differential compression method based on data de-duplication
CN103020317A (en) * 2013-01-10 2013-04-03 曙光信息产业(北京)有限公司 Device and method for data compression based on data deduplication
CN103152430A (en) * 2013-03-21 2013-06-12 河海大学 Cloud storage method for reducing data-occupied space
CN103177111A (en) * 2013-03-29 2013-06-26 西安理工大学 System and method for deleting repeating data
CN105027122A (en) * 2013-01-02 2015-11-04 甲骨文国际公司 Compression and deduplication layered driver
CN105022788A (en) * 2015-06-19 2015-11-04 江苏新通达电子科技股份有限公司 Lossless compression algorithm for bin file in PNG picture format and full liquid crystal instrument display system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106648469A (en) * 2016-12-29 2017-05-10 华为技术有限公司 Method and device for processing cache data and storage controller
WO2018121455A1 (en) * 2016-12-29 2018-07-05 华为技术有限公司 Cached-data processing method and device, and storage controller
CN106648469B (en) * 2016-12-29 2020-01-17 华为技术有限公司 Cache data processing method and device and storage controller
CN109116146A (en) * 2018-07-27 2019-01-01 南京瑞贻电子科技有限公司 A kind of analysis instrument for deleting priceless Value Data with automation
CN109408036A (en) * 2018-09-07 2019-03-01 安徽恒科信息技术有限公司 A kind of agile development platform
WO2023279833A1 (en) * 2021-07-08 2023-01-12 华为技术有限公司 Data processing method and apparatus

Also Published As

Publication number Publication date
CN105389387B (en) 2018-12-14

Similar Documents

Publication Publication Date Title
CN105389387A (en) Compression based deduplication performance and deduplication rate improving method and system
US10996858B2 (en) Method and device for migrating data
RU2626334C2 (en) Method and device for processing data object
CN107832406B (en) Method, device, equipment and storage medium for removing duplicate entries of mass log data
CN104966265B (en) Graphics processing method and apparatus
CN104239518A (en) Repeated data deleting method and device
CN112748863B (en) Method, electronic device and computer program product for processing data
CN110990516A (en) Map data processing method and device and server
CN105096367B (en) Optimize the method and device of Canvas rendering performances
US11231852B2 (en) Efficient sharing of non-volatile memory
US11210821B2 (en) Graphics processing systems
CN104753539A (en) Data compression method and device
CN114880742B (en) Webgl engine-oriented Revit model light-weight method
CN105373452A (en) Data backup method
US9639566B2 (en) Method, apparatus and computer program product for improved storage of key-value pairs
CN104376584A (en) Data compression method, computer system and device
CN116894272B (en) Cloud computing system data processing method based on high-speed encryption technology
US8983916B2 (en) Configurable data generator
CN106682047B (en) A kind of data lead-in method and relevant apparatus
CN110704404A (en) Data quality checking method, device and system
CN104809140A (en) Method and system for counting trading data
CN114253479B (en) CAN bus intrusion detection method and system
CN105335530A (en) Method for improving large data block duplicated data deletion performance
CN105335095A (en) Flash file system processing method and apparatus
CN110990640B (en) Data determination method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant