CN102999605A - Method and device for optimizing data placement to reduce data fragments - Google Patents
- Publication number
- CN102999605A
- Authority
- CN
- China
- Prior art keywords
- data
- backed
- repeating
- segment
- locality
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a method and a device for reducing data fragments by optimizing data placement. The method comprises the following steps: dividing each file to be backed up into data blocks and computing a data block fingerprint for each data block to be backed up; organizing a plurality of consecutive data blocks to be backed up into a data segment to be backed up; for each data block to be backed up in the data segment to be backed up, searching the system for a backed-up data segment that has already stored an identical data block; if none exists, the data block is a non-duplicate data block and the method proceeds to the data read/write step; if one exists, the data block is a duplicate data block and the method proceeds to the next step; computing and quantifying the data redundancy locality between the data segment to be backed up and the backed-up data segment; if the value of the data redundancy locality is smaller than a preset threshold, proceeding to the data read/write step, otherwise proceeding to the next step; and deleting from the data segment to be backed up the duplicate data blocks that it shares with the backed-up data segment. The method reduces out-of-order data placement and data fragments, slows the worsening of data fragmentation at the cost of a small loss in data compression ratio, and improves the read and write performance of the system.
Description
Technical field
The invention belongs to the field of computer information storage technology, and specifically relates to a method and a device for reducing data fragments by optimizing data placement.
Background art
Data deduplication is an advanced lossless data compression technique, used mainly to save the storage space required by information storage and backup systems. Its basic principle is to cut each file into a sequence of consecutive data blocks and to delete the duplicate data blocks that occur within a single file or across multiple files, thereby reducing the space the data occupies. Most existing information storage and backup systems adopt this technique to optimize storage space and to cut data storage and management costs.
In an information storage and backup system that uses data deduplication (a data deduplication system for short), there are two main classes of data blocks: new data blocks that must be written to disk, and duplicate data blocks that must be eliminated. New data blocks are written to disk sequentially, one after another; duplicate data blocks are not stored again. Consequently, for any file to be backed up, the new data blocks and the duplicate data blocks it contains cannot be stored together, because the location of each duplicate data block was fixed by the earlier backup file that first wrote it. Eliminating duplicate data blocks across files thus breaks the rule, followed by earlier backup systems, that all data blocks of one backup file are stored together sequentially. The data blocks of one backup file end up in many different locations, producing many data fragments.
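The fragmentation effect described above can be illustrated with a small sketch. It assumes a simplified model that is not part of the patent text: every block occupies one numbered disk slot, and a "fragment" is a maximal run of physically consecutive slots when the file is read in logical order. The function name `count_fragments` and the slot numbers are hypothetical.

```python
def count_fragments(block_slots):
    """Count maximal runs of physically consecutive slots, in logical order."""
    if not block_slots:
        return 0
    fragments = 1
    for prev, cur in zip(block_slots, block_slots[1:]):
        if cur != prev + 1:  # a jump on disk starts a new fragment
            fragments += 1
    return fragments


# First backup: every block is new, so all six are written sequentially.
first_backup = [0, 1, 2, 3, 4, 5]

# Second backup: its 3rd and 5th blocks duplicate old slots 1 and 3 and are
# not stored again; the four new blocks are appended at slots 6..9.
second_backup = [6, 7, 1, 8, 3, 9]

print(count_fragments(first_backup))   # 1
print(count_fragments(second_backup))  # 5
```

Under this model, two eliminated duplicates are enough to split the second backup into five separately located pieces.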
Existing deduplication methods for information storage and backup systems focus mainly on improving the data compression ratio and the deduplication throughput. They do not consider that deleting duplicate data blocks causes data blocks to be placed out of order and introduces many data fragments, and that these fragments can seriously degrade read and write performance, lowering the performance of the information storage and backup system.
Summary of the invention
The technical problem to be solved by the invention is to reduce out-of-order data placement and data fragments, slowing the worsening of data fragmentation at the cost of a very small loss in data compression ratio, and to improve the read and write performance of the system.
To solve this technical problem, the invention provides a method for reducing data fragments by optimizing data placement, comprising the following steps:
Step 1: divide each file to be backed up into data blocks, and compute a data block fingerprint for each data block to be backed up;
Step 2: organize a plurality of consecutive data blocks to be backed up into a data segment to be backed up;
Step 3: for each data block to be backed up in the data segment to be backed up, search the system for a backed-up data segment that has already stored an identical data block; if none exists, the block is a non-duplicate data block and the method goes to step 6; if one exists, the block is a duplicate data block and the method goes to step 4;
Step 4: compute and quantify the data redundancy locality between the data segment to be backed up and the backed-up data segment; if the value of the data redundancy locality is less than a predetermined threshold, go to step 6; otherwise go to step 5;
Step 5: delete from the data segment to be backed up the duplicate data blocks that it shares with the backed-up data segment;
Step 6: write the data blocks to disk sequentially, one after another.
The invention also provides a device for reducing data fragments by optimizing data placement, comprising:
A data chunking and fingerprint computing unit, used to divide each file to be backed up that is sent to the storage server into data blocks, obtaining data blocks to be backed up whose average size is a fixed value, and to compute a data block fingerprint for each data block to be backed up;
A data segment organizing unit, used to organize a plurality of consecutive data blocks to be backed up into a data segment to be backed up;
A duplicate data block query unit, used to search the backed-up data segments for data blocks identical to those in the data segment to be backed up; if none exists, the block is a non-duplicate data block and is passed to the data read/write unit; if one exists, the block is a duplicate data block and is passed to the duplicate data block screening unit;
A duplicate data block screening unit, used to compute and quantify the data redundancy locality between the backed-up data segment that holds these duplicate data blocks and the data segment to be backed up; if the value of the data redundancy locality is less than a predetermined threshold, the blocks are passed to the data read/write unit; otherwise they are passed to the data deletion unit;
A data deletion unit, used to delete the duplicate data blocks confirmed by the duplicate data block screening unit;
A data read/write unit, used to write the duplicate data blocks that need to be retained to disk together with the non-duplicate data blocks.
During the lookup and deletion of duplicate data blocks, the invention retains those duplicate data blocks whose redundancy locality is below the predetermined threshold and stores them sequentially together with the non-duplicate data blocks, so it reduces the number of data fragments generated.
Compared with existing data deduplication methods, the invention has the following advantages:
1. By retaining some duplicate data blocks and storing them sequentially together with the non-duplicate data blocks, it reduces the number of data fragments produced;
2. By gathering together more data blocks that belong to the same file and reducing the number of data fragments, it greatly strengthens data redundancy locality;
3. Stronger data redundancy locality improves not only deduplication throughput and data write performance, but also data read performance;
4. By expressing data redundancy locality quantitatively and setting a threshold on it to control the amount of duplicate data retained, it eliminates a large number of data fragments at the cost of a small loss in data compression ratio and achieves better read and write performance. The larger the data redundancy locality threshold, the more duplicate data is retained and the more compression ratio is sacrificed; conversely, the smaller the threshold, the less duplicate data is retained and the less compression ratio is sacrificed.
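The trade-off in item 4 can be made concrete with a short sketch. The numbers are invented for illustration: a segment of 256 blocks shares 128, 20 and 4 blocks with three different backed-up segments. The helper name `retained_duplicates` is hypothetical; it applies the rule stated in the text, under which duplicates whose locality falls below the threshold are kept and rewritten.

```python
def retained_duplicates(shared_counts, segment_blocks, threshold):
    """Number of duplicate blocks kept (rewritten) for a given threshold.

    shared_counts: for each backed-up segment, how many blocks it shares
    with the segment to be backed up.
    """
    return sum(n for n in shared_counts if n / segment_blocks < threshold)


shared = [128, 20, 4]  # localities 0.5, about 0.078, about 0.016
for t in (0.01, 0.05, 0.1, 0.6):
    print(t, retained_duplicates(shared, 256, t))
```

Raising the threshold from 0.01 to 0.6 moves the segment from pure deduplication (no duplicate rewritten) to rewriting all 152 shared blocks, which matches the direction of the trade-off stated above.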
In summary, by retaining a small portion of the duplicate data blocks and sacrificing a small amount of data compression ratio, the invention stores more data blocks of the same file together sequentially, greatly reduces the number of data fragments generated, strengthens data redundancy locality, and improves data read and write performance.
Description of drawings
The drawings of the invention are as follows:
Fig. 1 is a flowchart of the method of the invention for reducing data fragments by optimizing data placement;
Fig. 2 is a structural diagram of the device of the invention for reducing data fragments by optimizing data placement.
Embodiments
The invention is further described below with reference to the drawings and embodiments:
The parties involved in the invention are a backup server and a storage server. The backup server provides the data to be backed up, and the storage server stores that data. The lookup and deletion of duplicate data are carried out on the storage server.
Fig. 1 is a flowchart of the method of the invention for reducing data fragments by optimizing data placement. The flow starts at S101.
In step S102, each file to be backed up is divided into data blocks, for example using a variable-length chunking algorithm, yielding data blocks to be backed up whose average size is a fixed value, such as 8 KB. A data block fingerprint is then computed for each data block to be backed up; for example, the SHA-1 hash algorithm can be used to compute a hash value for each data block, and that hash value is called the data block fingerprint. The fingerprint is used to uniquely represent a data block, and any two data blocks with the same fingerprint are considered identical.
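A minimal sketch of step S102 follows. The text only says "variable-length chunking" with an average block size such as 8 KB and SHA-1 fingerprints; the particular rolling hash used here (a Gear-style hash with a seeded random table), the minimum and maximum block sizes, and the names `split_into_blocks` and `fingerprint` are assumptions chosen for illustration, not the patent's algorithm.

```python
import hashlib
import random

# Assumed detail: a fixed table of random 32-bit values drives the rolling hash.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def split_into_blocks(data, avg_size=8192, min_size=2048, max_size=65536):
    """Cut data at content-defined boundaries (avg_size must be a power of two)."""
    mask = avg_size - 1
    blocks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        # Cut when the hash hits the boundary pattern, or when the block is too long.
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            blocks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        blocks.append(data[start:])  # trailing block, may be shorter than min_size
    return blocks


def fingerprint(block):
    """SHA-1 digest of a block, used as its fingerprint."""
    return hashlib.sha1(block).hexdigest()
```

Because boundaries depend on content rather than on offsets, inserting bytes near the start of a file tends to shift only nearby boundaries, so later blocks keep their fingerprints. With the assumed minimum size, the typical block comes out somewhat larger than `avg_size`.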
In step S103, a plurality of consecutive data blocks to be backed up are organized into a data segment to be backed up; for example, each data segment contains 256 data blocks.
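Step S103 amounts to grouping consecutive blocks. The sketch below uses the example figure of 256 blocks per segment from the text; the function name `make_segments` is hypothetical, and letting the last segment be shorter is an assumption, since the text does not say how a partial segment is handled.

```python
def make_segments(blocks, blocks_per_segment=256):
    """Group consecutive blocks into segments; the last one may be shorter."""
    return [
        blocks[i:i + blocks_per_segment]
        for i in range(0, len(blocks), blocks_per_segment)
    ]
```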
In step S104, the backed-up data segments are searched for data blocks identical to those in the data segment to be backed up; such identical data blocks are duplicate data blocks. If duplicate data blocks exist, go to step S105; if not, go to step S107.
In step S105, for each duplicate data block, the amount of duplicate data shared between the backed-up data segment that holds it and the data segment to be backed up is counted, and this amount is divided by the size of the data segment to be backed up. The resulting value is the quantified data redundancy locality. If this value is less than the predetermined threshold, go to step S107; otherwise, go to step S106. The threshold is a predetermined data redundancy locality threshold that controls the amount of duplicate data retained. The larger the threshold, the more duplicate data is retained, the more compression ratio is sacrificed, and the stronger the data redundancy locality that is maintained; conversely, the smaller the threshold, the less duplicate data is retained, the less compression ratio is sacrificed, and the weaker the data redundancy locality that is maintained. The threshold thus strikes a balance between the compression ratio sacrificed and the data redundancy locality maintained.
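The quantity computed in step S105 can be sketched as below. The data layout is an assumption made for illustration: the segment to be backed up is a list of `(fingerprint, size)` pairs, and `index` is a hypothetical fingerprint index mapping each stored fingerprint to the id of the backed-up segment that holds it.

```python
from collections import defaultdict


def redundancy_locality(new_segment, index):
    """Return {stored_segment_id: shared_bytes / bytes_in_new_segment}."""
    segment_bytes = sum(size for _, size in new_segment)
    shared = defaultdict(int)
    for fp, size in new_segment:
        stored_in = index.get(fp)
        if stored_in is not None:  # duplicate block: credit its stored segment
            shared[stored_in] += size
    return {seg_id: n / segment_bytes for seg_id, n in shared.items()}
```

For a segment of four 8-byte blocks where two blocks live in stored segment 1 and one in stored segment 2, the localities are 0.5 and 0.25; with a threshold of 0.3, the block shared with segment 2 would be retained and the two shared with segment 1 would be deleted.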
In step S106, these duplicate data blocks are deleted from the data segment to be backed up, and the flow ends.
In step S107, these data blocks are saved in order, and the flow ends.
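Putting steps S104 to S107 together for one segment gives roughly the following sketch. Several details are assumptions not stated in the text: the in-memory `index` dictionary, pointing the index at the new segment for every block that is written, and treating a fingerprint repeated inside the same new segment as new each time. The name `place_segment` is hypothetical.

```python
from collections import defaultdict


def place_segment(segment, index, threshold, segment_id):
    """Decide which blocks of one segment are written to disk.

    segment: list of (fingerprint, data) pairs in file order.
    index:   dict fingerprint -> id of the stored segment holding that block.
    Returns the (fingerprint, data) pairs to write sequentially.
    """
    total = sum(len(data) for _, data in segment)
    shared = defaultdict(int)  # stored segment id -> bytes shared with this segment
    for fp, data in segment:
        if fp in index:
            shared[index[fp]] += len(data)

    to_write = []
    for fp, data in segment:
        stored_in = index.get(fp)
        if stored_in is None:
            to_write.append((fp, data))  # S104: non-duplicate, go to S107
        elif shared[stored_in] / total < threshold:
            to_write.append((fp, data))  # S105: weak locality, keep and rewrite
        # otherwise S106: duplicate is deleted and the old copy is referenced

    for fp, _ in to_write:
        index[fp] = segment_id  # assumed: the newest copy becomes the reference
    return to_write
```

In a segment of eight 4-byte blocks where three blocks are shared with stored segment 0 (locality 0.375) and one with stored segment 1 (locality 0.125), a threshold of 0.2 deletes the first three and writes the lone duplicate alongside the four new blocks, so five blocks land on disk contiguously.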
Fig. 2 is a structural diagram of the device of the invention for reducing data fragments by optimizing data placement. In the figure, 1 denotes the data chunking and fingerprint computing unit, 2 the data segment organizing unit, 3 the duplicate data block query unit, 4 the duplicate data block screening unit, 5 the data deletion unit, and 6 the data read/write unit.
The data chunking and fingerprint computing unit 1 is used to divide each file to be backed up that is sent to the storage server into data blocks, obtaining data blocks to be backed up whose average size is a fixed value, and to compute a data block fingerprint for each data block to be backed up;
The data segment organizing unit 2 is used to organize a plurality of consecutive data blocks to be backed up into a data segment to be backed up;
The duplicate data block query unit 3 is used to search the backed-up data segments for data blocks identical to those in the data segment to be backed up; if none exists, the block is a non-duplicate data block and is passed to the data read/write unit 6; if one exists, the block is a duplicate data block and is passed to the duplicate data block screening unit 4;
The duplicate data block screening unit 4 is used to compute and quantify the data redundancy locality between the backed-up data segment that holds these duplicate data blocks and the data segment to be backed up; if the value of the data redundancy locality is less than the predetermined threshold, the blocks are passed to the data read/write unit 6; otherwise they are passed to the data deletion unit 5;
The data deletion unit 5 is used to delete the duplicate data blocks confirmed by the duplicate data block screening unit;
The data read/write unit 6 is used to write the duplicate data blocks that need to be retained to disk together with the non-duplicate data blocks.
The advantage of the invention is that it reduces out-of-order data placement and data fragments, slows the worsening of data fragmentation at the cost of a very small loss in data compression ratio, and improves the read and write performance of the system.
Claims (4)
1. A method for reducing data fragments by optimizing data placement, characterized in that it comprises the following steps:
Step 1: divide each file to be backed up into data blocks, and compute a data block fingerprint for each data block to be backed up;
Step 2: organize a plurality of consecutive data blocks to be backed up into a data segment to be backed up;
Step 3: for each data block to be backed up in the data segment to be backed up, search the system for a backed-up data segment that has already stored an identical data block; if none exists, the block is a non-duplicate data block and the method goes to step 6; if one exists, the block is a duplicate data block and the method goes to step 4;
Step 4: compute and quantify the data redundancy locality between the data segment to be backed up and the backed-up data segment; if the value of the data redundancy locality is less than a predetermined threshold, go to step 6; otherwise go to step 5;
Step 5: delete from the data segment to be backed up the duplicate data blocks that it shares with the backed-up data segment;
Step 6: write the data blocks to disk sequentially, one after another.
2. The method for reducing data fragments by optimizing data placement according to claim 1, characterized in that the data redundancy locality in step 4 is quantified by counting the amount of duplicate data shared between the backed-up data segment that holds the duplicate data block and the data segment to be backed up, and dividing this amount by the size of the data segment to be backed up.
3. The method for reducing data fragments by optimizing data placement according to claim 1, characterized in that the threshold in step 4 is a predetermined data redundancy locality threshold, and this threshold controls the amount of duplicate data written to disk.
4. A device for reducing data fragments by optimizing data placement, characterized in that it comprises:
A data chunking and fingerprint computing unit (1), used to divide each file to be backed up that is sent to the storage server into data blocks, obtaining data blocks to be backed up whose average size is a fixed value, and to compute a data block fingerprint for each data block to be backed up;
A data segment organizing unit (2), used to organize a plurality of consecutive data blocks to be backed up into a data segment to be backed up;
A duplicate data block query unit (3), used to search the backed-up data segments for data blocks identical to those in the data segment to be backed up; if none exists, the block is a non-duplicate data block and is passed to the data read/write unit (6); if one exists, the block is a duplicate data block and is passed to the duplicate data block screening unit (4);
A duplicate data block screening unit (4), used to compute and quantify the data redundancy locality between the backed-up data segment that holds these duplicate data blocks and the data segment to be backed up; if the value of the data redundancy locality is less than a predetermined threshold, the blocks are passed to the data read/write unit (6); otherwise they are passed to the data deletion unit (5);
A data deletion unit (5), used to delete the duplicate data blocks confirmed by the duplicate data block screening unit;
A data read/write unit (6), used to write the duplicate data blocks that need to be retained to disk together with the non-duplicate data blocks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012104746888A CN102999605A (en) | 2012-11-21 | 2012-11-21 | Method and device for optimizing data placement to reduce data fragments |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012104746888A CN102999605A (en) | 2012-11-21 | 2012-11-21 | Method and device for optimizing data placement to reduce data fragments |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102999605A | 2013-03-27 |
Family
ID=47928173
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2012104746888A Pending CN102999605A (en) | 2012-11-21 | 2012-11-21 | Method and device for optimizing data placement to reduce data fragments |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102999605A (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103473150A (en) * | 2013-08-28 | 2013-12-25 | 华中科技大学 | Fragment rewriting method for data repetition removing system |
CN103609091A (en) * | 2013-06-24 | 2014-02-26 | 华为技术有限公司 | Method and device for data transmission |
CN103885859A (en) * | 2014-03-12 | 2014-06-25 | 华中科技大学 | Fragment removing method and system based on global statistics |
CN104216890A (en) * | 2013-05-30 | 2014-12-17 | 北京赛科世纪数码科技有限公司 | Method and system for compressing ELF file |
CN105824720A (en) * | 2016-03-10 | 2016-08-03 | 中国人民解放军国防科学技术大学 | Continuous data reading oriented data placement method of deduplication and erasure correcting combined system |
CN105897921A (en) * | 2016-05-27 | 2016-08-24 | 重庆大学 | Data block routing method combining fingerprint sampling and reducing data fragments |
CN105930534A (en) * | 2016-06-20 | 2016-09-07 | 重庆大学 | Method for reducing data fragments on basis of cloud storage service prices |
CN106066818A (en) * | 2016-05-25 | 2016-11-02 | 重庆大学 | Data layout method for improving the restore performance of a data deduplication backup system |
CN106294002A (en) * | 2016-07-26 | 2017-01-04 | 广州杰赛科技股份有限公司 | Cloud backup method and device |
CN107623788A (en) * | 2017-09-22 | 2018-01-23 | 努比亚技术有限公司 | Method and device for improving application startup speed, and computer-readable storage medium |
CN110442555A (en) * | 2019-07-26 | 2019-11-12 | 华中科技大学 | Method and system for reducing fragments of selective reserved space |
CN111124259A (en) * | 2018-10-31 | 2020-05-08 | 深信服科技股份有限公司 | Data compression method and system based on full flash memory array |
CN112463058A (en) * | 2020-11-27 | 2021-03-09 | 杭州海康威视系统技术有限公司 | Fragmented data sorting method and device and storage node |
CN113632059A (en) * | 2020-03-06 | 2021-11-09 | 华为技术有限公司 | Apparatus and method for eliminating defragmentation in deduplication |
WO2023279833A1 (en) * | 2021-07-08 | 2023-01-12 | 华为技术有限公司 | Data processing method and apparatus |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101582076A (en) * | 2009-06-24 | 2009-11-18 | 浪潮电子信息产业股份有限公司 | Data de-duplication method based on data base |
CN101706825A (en) * | 2009-12-10 | 2010-05-12 | 华中科技大学 | Replicated data deleting method based on file content types |
CN102033924A (en) * | 2010-12-08 | 2011-04-27 | 浪潮(北京)电子信息产业有限公司 | Data storage method and system |
CN102222085A (en) * | 2011-05-17 | 2011-10-19 | 华中科技大学 | Data de-duplication method based on combination of similarity and locality |
CN102385554A (en) * | 2011-10-28 | 2012-03-21 | 华中科技大学 | Method for optimizing duplicated data deletion system |
Non-Patent Citations (1)
Title |
---|
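Tan Yujuan (谭玉娟): "Research on Data Deduplication Techniques in Data Backup Systems", China Doctoral Dissertations Full-text Database (electronic journal) * |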
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104216890A (en) * | 2013-05-30 | 2014-12-17 | 北京赛科世纪数码科技有限公司 | Method and system for compressing ELF file |
CN103609091A (en) * | 2013-06-24 | 2014-02-26 | 华为技术有限公司 | Method and device for data transmission |
CN103609091B (en) * | 2013-06-24 | 2017-01-11 | 华为技术有限公司 | Method and device for data transmission |
CN103473150A (en) * | 2013-08-28 | 2013-12-25 | 华中科技大学 | Fragment rewriting method for data repetition removing system |
CN103885859B (en) * | 2014-03-12 | 2017-09-26 | 华中科技大学 | Fragment removing method and system based on global statistics |
CN103885859A (en) * | 2014-03-12 | 2014-06-25 | 华中科技大学 | Fragment removing method and system based on global statistics |
CN105824720A (en) * | 2016-03-10 | 2016-08-03 | 中国人民解放军国防科学技术大学 | Continuous data reading oriented data placement method of deduplication and erasure correcting combined system |
CN105824720B (en) * | 2016-03-10 | 2018-11-20 | 中国人民解放军国防科学技术大学 | Continuous data reading oriented data placement method of deduplication and erasure correcting combined system |
CN106066818B (en) * | 2016-05-25 | 2019-05-17 | 重庆大学 | Data layout method for improving the restore performance of a data deduplication backup system |
CN106066818A (en) * | 2016-05-25 | 2016-11-02 | 重庆大学 | Data layout method for improving the restore performance of a data deduplication backup system |
CN105897921A (en) * | 2016-05-27 | 2016-08-24 | 重庆大学 | Data block routing method combining fingerprint sampling and reducing data fragments |
CN105897921B (en) * | 2016-05-27 | 2019-02-26 | 重庆大学 | Data block routing method combining fingerprint sampling and reducing data fragments |
CN105930534A (en) * | 2016-06-20 | 2016-09-07 | 重庆大学 | Method for reducing data fragments on basis of cloud storage service prices |
CN106294002A (en) * | 2016-07-26 | 2017-01-04 | 广州杰赛科技股份有限公司 | Cloud backup method and device |
CN107623788A (en) * | 2017-09-22 | 2018-01-23 | 努比亚技术有限公司 | Method and device for improving application startup speed, and computer-readable storage medium |
CN107623788B (en) * | 2017-09-22 | 2020-10-27 | 海南飞特同创科技有限公司 | Method and device for improving application starting speed and computer readable storage medium |
CN111124259A (en) * | 2018-10-31 | 2020-05-08 | 深信服科技股份有限公司 | Data compression method and system based on full flash memory array |
CN110442555A (en) * | 2019-07-26 | 2019-11-12 | 华中科技大学 | Method and system for reducing fragments of selective reserved space |
CN110442555B (en) * | 2019-07-26 | 2021-08-31 | 华中科技大学 | Method and system for reducing fragments of selective reserved space |
CN113632059A (en) * | 2020-03-06 | 2021-11-09 | 华为技术有限公司 | Apparatus and method for eliminating defragmentation in deduplication |
CN112463058A (en) * | 2020-11-27 | 2021-03-09 | 杭州海康威视系统技术有限公司 | Fragmented data sorting method and device and storage node |
WO2023279833A1 (en) * | 2021-07-08 | 2023-01-12 | 华为技术有限公司 | Data processing method and apparatus |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102999605A (en) | Method and device for optimizing data placement to reduce data fragments | |
US10318181B2 (en) | System, method, and computer program product for increasing spare space in memory to extend a lifetime of the memory | |
US10809928B2 (en) | Efficient data deduplication leveraging sequential chunks or auxiliary databases | |
CN106662981B (en) | Storage device, program, and information processing method | |
US9880746B1 (en) | Method to increase random I/O performance with low memory overheads | |
US8639669B1 (en) | Method and apparatus for determining optimal chunk sizes of a deduplicated storage system | |
US10466932B2 (en) | Cache data placement for compression in data storage systems | |
US8712963B1 (en) | Method and apparatus for content-aware resizing of data chunks for replication | |
US10061693B2 (en) | Method of generating secondary index and apparatus for storing secondary index | |
CN103019887B (en) | Data back up method and device | |
KR20170054299A (en) | Reference block aggregating into a reference set for deduplication in memory management | |
Zou et al. | The dilemma between deduplication and locality: Can both be achieved? | |
CN101916171A (en) | Concurrent hierarchy type replicated data eliminating method and system | |
US20120136842A1 (en) | Partitioning method of data blocks | |
US9471245B1 (en) | Method and apparatus for transferring modified data efficiently | |
CN112559452B (en) | Data deduplication processing method, device, equipment and storage medium | |
WO2018171296A1 (en) | File merging method and controller | |
US9189408B1 (en) | System and method of offline annotation of future accesses for improving performance of backup storage system | |
US10503608B2 (en) | Efficient management of reference blocks used in data deduplication | |
CN111124258B (en) | Data storage method, device and equipment of full flash memory array and readable storage medium | |
CN104050057B (en) | Historical sensed data duplicate removal fragment eliminating method and system | |
US10013346B2 (en) | Method of decreasing write amplification of NAND flash using a journal approach | |
US10282127B2 (en) | Managing data in a storage system | |
Zhang et al. | Improving the performance of deduplication-based backup systems via container utilization based hot fingerprint entry distilling | |
KR101473837B1 (en) | An Invalid Data Recycling Method for Improving I/O Performance in SSD-based Storage System |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20130327 |