CN103955530A - Data reconstruction and optimization method of on-line repeating data deletion system - Google Patents


Info

Publication number
CN103955530A
CN103955530A (application number CN201410198679.XA)
Authority
CN
China
Prior art keywords
data
file
duplicate removal
data block
removal bag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410198679.XA
Other languages
Chinese (zh)
Other versions
CN103955530B (en)
Inventor
邓玉辉
岑大慰
黄战
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Hong Kong And Macao Qingchuang Technology Guangzhou Co ltd
Guangzhou Jinan University Science Park Management Co ltd
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University filed Critical Jinan University
Priority to CN201410198679.XA priority Critical patent/CN103955530B/en
Publication of CN103955530A publication Critical patent/CN103955530A/en
Application granted granted Critical
Publication of CN103955530B publication Critical patent/CN103955530B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/174Redundancy elimination performed by the file system
    • G06F16/1748De-duplication implemented within the file system, e.g. based on file segments
    • G06F16/1752De-duplication implemented within the file system, e.g. based on file segments based on file chunks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data reconstruction optimization method for an online data deduplication system. On one hand, redundancy detection and elimination of duplicate data let limited space resources store and manage more data. On the other hand, by scheduling and rearranging the layout of the deduplicated data blocks, the system model prefetches the blocks of frequently used files to the front of the data segment in the deduplication package, and stores the otherwise randomly and discretely distributed blocks together with their fingerprints in aggregated form. This shortens disk seek time when restoring file data, improving the data reconstruction performance of the online deduplication system, shortening its response time, and raising data recovery efficiency.

Description

Data reconstruction optimization method for an online data deduplication system
Technical field
The present invention relates to a data reconstruction optimization method for an online data deduplication system, and in particular to techniques for rearranging the data blocks within a deduplication package according to file access frequency, and for addressing, restoring, and reconstructing data blocks within the package.
Background art
With the development of networks and platform systems, modern society has become an ocean of data. People's lives generate identity records, browsing histories, e-commerce orders, study and research materials, and office documents every day; every computer user is both a producer and a consumer of data. Information systems must ingest and process enormous data volumes daily. Faced with such mass data, how to store and manage it effectively, and how to mine useful information from it, has become a focus of modern intelligent technology. Effective storage ultimately means using the same space resources to hold a larger volume of data. Many operations can be involved, but the methods that act on the data itself are data compression and redundant-data elimination; deduplication and compression of the data itself are the most direct approaches, and currently the most widely studied.
Data deduplication has years of application and research behind it in both industry and academia. Throughout its development, the constant model framework has been to compare data, eliminate repeated data fragments, and build and maintain metadata, with deduplication ratio and time efficiency as the main concerns. Depending on whether the emphasis lies on generating the deduplicated data from the original files or on restoring that data back into the original files, deduplication technology has been extended in various directions beyond storage itself.
Surveying data compression and data deduplication: whatever the processing means, neither data processing nor information mining can do without recovering the stored, processed file data. Moreover, a storage system does not exist merely to preserve large data: whenever a client requests access, or the server must verify or compare data, the system must restore file data from the storage medium. File restoration is therefore another key technique of data processing; an efficient file restoration technique answers system requests quickly and improves the system's ability to compute on and process large data.
Summary of the invention
The object of the present invention is a data reconstruction optimization method for an online data deduplication system. The object of processing is the package produced after deduplication: the layout of the deduplicated data within this package directly affects the time the system takes to respond to clients, and by optimizing the storage organization the system can answer user access requests closer to real time.
The object of the present invention is achieved by the following technical scheme:
A data reconstruction optimization method for an online data deduplication system comprises the following steps:
(1) After the online deduplication system has deduplicated the original files and generated a deduplication package, the system serves user data access requests at file granularity, realizing storage access through file restoration. Over a period of preset length, the online deduplication system counts the accesses to each file in the package, classes files whose access frequency exceeds a threshold as the hot-file set and files below that threshold as the cold-file set, and then executes step (2).
(2) Suspend the deduplication system's data access requests and carry out file-level block rearrangement; the hot-file filter uses the hot-file set obtained in step (1) to split the file entities in the package into two streams. The process is: following the order of the original files in the package, read the file entities one by one, comparing the file name and file type recorded in each entity's metadata section; if the file name is present in the hot-file set generated in step (1), execute step (3).
(3) Read the unique-block number section of the file entity; using the block-mapping rule, find the storage position in the package of the unique block bearing each recorded number, and write the corresponding unique block into the file being restored; the last unique block of the file entity is also written into the restored file. When step (2) has completed for all entities, execute step (4); otherwise return to step (2).
(4) Re-chunk the files of the hot-file set and recompute their fingerprints, generate new logical-block units and file-description metadata, and write the newly generated data into a new deduplication package; then execute step (5).
(5) Perform file-level restoration of the unique blocks corresponding to the cold-file set in the old package, append the cold-set files to the new package at the rear of its data segment, and delete the old package when finished.
(6) The data layout of the newly generated package now embodies the prefetching and aggregation of the blocks and file metadata of the hot-file set, and the deduplication system resumes responding to user data access requests.
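The hot/cold split of step (1) can be sketched in a few lines. The following Python fragment is an illustrative sketch only, not the patent's implementation; the threshold value, the file names, and the access counts are all assumed for the example.

```python
# Sketch of step (1): split the files of a deduplication package into a
# hot-file set and a cold-file set by access count over a preset window.
# The counts and the threshold are illustrative assumptions.

def classify_files(access_counts, threshold):
    """access_counts: {filename: accesses observed in the preset window}."""
    hot = {name for name, count in access_counts.items() if count > threshold}
    cold = set(access_counts) - hot
    return hot, cold

counts = {"report.doc": 57, "archive.tar": 2, "notes.txt": 31, "old.log": 0}
hot_set, cold_set = classify_files(counts, threshold=10)
# hot_set == {"report.doc", "notes.txt"}; cold_set == {"archive.tar", "old.log"}
```

In the method, this split then drives steps (2) to (6): hot-set files are restored, re-chunked, and written to the front of the new package, cold-set files appended to its rear.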
Preferably, in step (2), the prerequisite for file-level block rearrangement is to locate all blocks that a single file comprises and schedule those blocks as a unit. Before the blocks of a file can be located, the files in the package must be restored; restoration is a process of reading blocks and writing files, recovering the original file data from the file metadata and block information recorded by each file entity in the package. File-level block rearrangement prefetches to the front of the data segment in the package not only the aggregated unique blocks but also the associated descriptors such as block fingerprints and logical blocks.
Preferably, in step (2), the hot-file filter manages the distribution of file blocks and realizes block rearrangement based on the hot-file set by changing the order in which files enter the deduplication system. The filter first scans the files in the package in system-file order; when a scanned file is in the hot-file set, it directly retrieves the file's blocks, fingerprints, logical data, and file entity (retrieval comprising block addressing and restoration) and writes them into the data area of the new package. After all files have been scanned, the remaining files not in the hot-file set are arranged, in their original order, after the hot-set data segment in the package.
Preferably, in step (3), the storage format of blocks in the package is one copy with multiple indexes; the addressing unit of a block is the byte; the physical information of each unique block in the package is recorded in its corresponding logical block; all logical blocks are the same size; and unique blocks are numbered from 0, incrementing.
Preferably, block addressing comprises two mapping steps. First, the corresponding logical block is found from the block number in the file entity; since all logical blocks are the same size, the address computation is to multiply the block number by the logical-block size, which yields the physical address of the corresponding logical block. Second, the physical displacement and block size recorded in that logical block are used to locate the unique block itself. Block addressing and physical mapping is in effect an "index to unique block" conversion.
Preferably, after the filter has screened and restored the original file data in the package based on the hot-file set, the blocks of each file and the corresponding metadata must be stored in the package again. The concrete steps are file chunking, fingerprint generation, and maintenance of the operational data. After a file is chunked, each block is processed by first computing its hash value, then performing hash comparison, and finally storing the deduplicated data; the system's storage-management module processes new unique blocks as a concurrently executable schedule.
Preferably, data restoration is the unified restoration of all unique blocks, logical blocks, block fingerprints, and file metadata that a single file comprises.
Preferably, the processing of the blocks of a file that has passed through deduplication is divided into four parallel threads: unique-block storage, logical-block storage, block-fingerprint storage, and file-metadata storage; the threading mechanism used is OpenMP.
Preferably, the hot-file filter scans the files in the package in the chronological order in which the original files entered the deduplication system, compares one by one whether the file name of each file entity in the package is present in the hot-file set, and splits files of different access frequencies into separate streams.
Preferably, the method changes the discretely distributed layout that reflects the chronological order in which the original files entered the system: the contents of the package (unique blocks, logical blocks, block fingerprints, and file metadata) are gathered with the single file as the basic unit and scheduled, by file access frequency, to the front of the corresponding data segment in the package.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
(1) The present invention rearranges data on the basis of hot files, taking the file as the processing unit: all blocks that a single file comprises, together with the data information corresponding to those blocks, are scheduled and laid out as a unit, which matches the content and manner of user-level access requests.
(2) The present invention splits the data of hot files and cold files into separate streams and prefetches the hot-file set to the front of the data segment in the deduplication package, saving the time the system spends locating file entities.
(3) File-restoration termination mechanism: the present invention adds a termination test to the process of restoring files from the rearranged package; once every file in the set has been restored, the system stops scanning the remaining file entities in the package, saving unnecessary file-retrieval time.
Brief description of the drawings
Fig. 1 is a schematic diagram of the system model structure of the present invention;
Fig. 2 is a schematic workflow diagram of file-based block rearrangement in the present invention;
Fig. 3 is a schematic diagram of block mapping and addressing within the deduplication package of the present invention;
Fig. 4 is a schematic diagram of the data-stream storage organization of the present invention.
Embodiment
The present invention is described in further detail below with reference to the embodiment and the accompanying drawings, but embodiments of the present invention are not limited thereto.
Embodiment
As shown in Fig. 1, the data reconstruction optimization method of the present invention applies to an online data deduplication system comprising two parts, a server side and a client side:
The client chunks each file, computes the hash value of each data block, and stores that hash value as the block's fingerprint. By comparing block fingerprints it judges whether a block is a repeat; the system stores only the unique blocks and records each block's ID. A file entity is created for each file to preserve the original file's metadata, comprising the file name, the number of blocks, the block-ID size, the size of the last block, and the sequence of unique-block numbers, together with the last data block of the file itself (because this block is usually smaller than a normal block and its recurrence probability is very small, it is stored separately). The unique blocks, block fingerprints, and all file entities are kept in a deduplication package, whose contents are sent to the server side in file form.
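The client-side processing (chunking, fingerprinting, and recording block IDs in a file entity) can be sketched as follows. This is a minimal Python sketch under assumed parameters: the 4-byte chunk size and the SHA-1 fingerprint are illustrative choices the patent does not fix, and the in-memory package layout is hypothetical.

```python
# Sketch of the client side: fixed-size chunking, SHA-1 fingerprints,
# each unique block stored once, and a file entity recording block IDs.
import hashlib

CHUNK = 4  # bytes; tiny for illustration, real systems use KB-sized chunks

def dedup(files, package=None):
    """files: {name: bytes}.  Returns a package holding unique blocks,
    a fingerprint index, and one file entity per file."""
    pkg = package or {"blocks": [], "fingerprints": {}, "entities": {}}
    for name, data in files.items():
        ids = []
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            fp = hashlib.sha1(chunk).hexdigest()
            if fp not in pkg["fingerprints"]:       # new unique block
                pkg["fingerprints"][fp] = len(pkg["blocks"])
                pkg["blocks"].append(chunk)
            ids.append(pkg["fingerprints"][fp])     # record the block ID
        pkg["entities"][name] = {"name": name, "block_ids": ids,
                                 "size": len(data)}
        # (the patent additionally stores the short last block separately)
    return pkg

pkg = dedup({"a.txt": b"abcdabcdxyz!", "b.txt": b"abcdxyz!"})
# "abcd" occurs three times across the two files but is stored only once
```

Note how the shared chunk is stored once while both file entities index it, which is the "one copy, multiple indexes" format described below.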
The server parses the data in the package and preserves the unique blocks, the block fingerprint table, the logical data, and the file entities; file-based block rearrangement operates precisely on the reads and writes of these four kinds of data on the server. File-based rearrangement reorganizes the ordering of the data in the package so that the system obtains better file-retrieval and restoration-time efficiency.
To illustrate the embodiment of the invention more clearly, a detailed analysis follows with reference to the workflow diagram of file-based block rearrangement (Fig. 2), the diagram of block mapping and addressing within the deduplication package (Fig. 3), and the diagram of the data-stream storage organization (Fig. 4).
As shown in Fig. 2, the system's file rearrangement is divided into two stages. The first stage is file restoration, whose object of processing is the deduplication package. File-based restoration proceeds as follows: first, read a file entity in the package, which contains the numbers of the unique blocks belonging to the corresponding file; then find the logical block for each block number, read the displacement and size it records, and locate the unique block in the package; finally, write the unique blocks into the corresponding file in the order given by the file entity. The second stage is file rearrangement, executed by three modules in order: (1) the file filter, (2) block chunking, and (3) block processing. Each part operates with the file as the processing unit, the block being the basic unit of data processing.
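Stage one, file restoration, can be sketched as the inverse of deduplication: read the file entity, resolve each recorded block number to the unique block it names, and concatenate in order. The package layout in this Python sketch (a list of blocks plus per-file entities) is an illustrative assumption, not the patent's on-disk format.

```python
# Sketch of stage one, file restoration: read a file entity from the
# package, resolve each block number to the unique block it names, and
# write the blocks out in the order the entity records.

def restore_file(pkg, name):
    entity = pkg["entities"][name]
    data = b"".join(pkg["blocks"][i] for i in entity["block_ids"])
    return data[:entity["size"]]  # trim any padding in the final block

package = {
    "blocks": [b"hello ", b"world", b"!"],
    "entities": {"greet.txt": {"block_ids": [0, 1, 2], "size": 12}},
}
assert restore_file(package, "greet.txt") == b"hello world!"
```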
As shown in Fig. 3, the file filter retrieves the hot files of the data set with the file as the basic unit, and retrieving a file in the package performs the corresponding block addressing and operations according to its file entity. The storage format of blocks in the package is one copy with multiple indexes, so the deduplication system must establish logical description information for each block so that different files can share the index of the same unique block. The addressing unit of a block is the byte, and the physical information of each unique block in the package is recorded in its corresponding logical block. All logical blocks are the same size, and unique blocks are numbered from 0, incrementing. Block addressing comprises two mappings. First, the corresponding logical block is found from the block number in the file entity; since all logical blocks are the same size, the address computation is to multiply the block number by the logical-block size, which yields the physical address of the corresponding logical block. Second, the physical displacement and block size recorded in that logical block are used to locate the unique block itself. Block addressing and physical mapping is in effect an "index to unique block" conversion.
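The two-step addressing can be illustrated with a flat byte layout. In this Python sketch the logical-block area is an array of fixed-size records, each holding the (offset, size) of its unique block in the data area; the 8-byte record format (two 32-bit fields) is an assumption made for the example.

```python
# Sketch of the two-step addressing of Fig. 3: because logical-block
# records are a fixed size and laid out contiguously, record i starts
# at byte i * RECORD_SIZE; the record then gives the (offset, size) of
# the unique block in the data area.
import struct

RECORD_SIZE = 8  # assumed: two little-endian uint32 fields, offset and size

def read_block(logical_area, data_area, block_number):
    # First mapping: block number -> physical address of its logical block.
    rec_addr = block_number * RECORD_SIZE
    offset, size = struct.unpack_from("<II", logical_area, rec_addr)
    # Second mapping: (offset, size) -> the unique block's bytes.
    return data_area[offset:offset + size]

data = b"foobarbaz"
logical = (struct.pack("<II", 0, 3) + struct.pack("<II", 3, 3)
           + struct.pack("<II", 6, 3))
assert read_block(logical, data, 1) == b"bar"
```

The multiplication in the first mapping is exactly why equal-sized logical blocks matter: it turns block-number lookup into constant-time arithmetic instead of a search.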
As shown in Fig. 4, after the file filter has screened and restored the original file data in the package based on the hot-file set, the blocks of each file and the corresponding metadata must be stored in the package again. The concrete steps are file chunking, fingerprint generation, and maintenance of the operational data. After a file is chunked, each block is processed by first computing its hash value, then performing hash comparison, and finally storing the deduplicated data. The system's storage-management module processes new unique blocks as a concurrently executable schedule. To improve block-processing efficiency, the proposed model uses OpenMP multithreading to divide the storage process into four concurrent threads: hash-table insertion, unique-block processing, logical-block processing, and metadata processing. Because each thread writes to a different location in the package, concurrent storage management not only improves the system's output efficiency but also maintains the independence of the data to a certain extent.
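The embodiment specifies OpenMP (a C/C++/Fortran threading mechanism) for the four concurrent storage tasks. As an analogous illustration only, the same four-way split can be sketched in Python with a thread pool; the task bodies and the in-memory destinations here are stand-ins for the four write regions of the package and are not the patent's implementation.

```python
# Analogy to the four-thread OpenMP storage split: four independent
# storage tasks (hash-table entries, unique blocks, logical blocks,
# metadata), each writing to its own destination, run concurrently.
from concurrent.futures import ThreadPoolExecutor

def store_all(fingerprints, blocks, logical, metadata):
    tasks = {                          # four independent storage tasks
        "hash_table": lambda: dict.fromkeys(fingerprints, True),
        "unique_blocks": lambda: b"".join(blocks),
        "logical_blocks": lambda: list(logical),
        "metadata": lambda: dict(metadata),
    }
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {k: pool.submit(fn) for k, fn in tasks.items()}
        return {k: fut.result() for k, fut in futures.items()}

result = store_all(["fp1", "fp2"], [b"aa", b"bb"], [(0, 2), (2, 2)],
                   {"f.txt": {"blocks": [0, 1]}})
```

Because each task writes only to its own region, no locking between the four workers is needed, which mirrors the data-independence argument made for the concurrent storage management above.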
The above embodiment is a preferred embodiment of the present invention, but embodiments of the present invention are not restricted to it; any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be deemed an equivalent substitution and is included within the protection scope of the present invention.

Claims (10)

1. A data reconstruction optimization method for an online data deduplication system, characterized in that it comprises the following steps:
(1) after the online deduplication system has deduplicated the original files and generated a deduplication package, the system serves user data access requests at file granularity, realizing storage access through file restoration; over a period of preset length the online deduplication system counts the accesses to each file in the package, classes files whose access frequency exceeds a threshold as the hot-file set and files below that threshold as the cold-file set, and then executes step (2);
(2) suspend the deduplication system's data access requests and carry out file-level block rearrangement, the hot-file filter using the hot-file set obtained in step (1) to split the file entities in the package into two streams; the process is: following the order of the original files in the package, read the file entities one by one, comparing the file name and file type recorded in each entity's metadata section, and if the file name is present in the hot-file set generated in step (1), execute step (3);
(3) read the unique-block number section of the file entity, use the block-mapping rule to find the storage position in the package of the unique block bearing each recorded number, and write the corresponding unique block into the file being restored, the last unique block of the file entity also being written into the restored file; when step (2) has completed for all entities, execute step (4), otherwise return to step (2);
(4) re-chunk the files of the hot-file set and recompute their fingerprints, generate new logical-block units and file-description metadata, write the newly generated data into a new deduplication package, and then execute step (5);
(5) perform file-level restoration of the unique blocks corresponding to the cold-file set in the old package, append the cold-set files to the new package at the rear of its data segment, and delete the old package when finished;
(6) the data layout of the newly generated package now embodies the prefetching and aggregation of the blocks and file metadata of the hot-file set, and the deduplication system resumes responding to user data access requests.
2. The data reconstruction optimization method of an online data deduplication system according to claim 1, characterized in that, in step (2), the prerequisite for file-level block rearrangement is to locate all blocks that a single file comprises and schedule those blocks as a unit; before the blocks of a file can be located, the files in the package must be restored, restoration being a process of reading blocks and writing files that recovers the original file data from the file metadata and block information recorded by each file entity in the package; and file-level block rearrangement prefetches to the front of the data segment in the package not only the aggregated unique blocks but also the associated block-fingerprint and logical-block descriptors.
3. The data reconstruction optimization method of an online data deduplication system according to claim 1, characterized in that, in step (2), the hot-file filter manages the distribution of file blocks and realizes block rearrangement based on the hot-file set by changing the order in which files enter the deduplication system; the filter first scans the files in the package in system-file order, and when a scanned file is in the hot-file set it directly retrieves the file's blocks, fingerprints, logical data, and file entity (retrieval comprising block addressing and restoration) and writes them into the data area of the new package; after all files have been scanned, the remaining files not in the hot-file set are arranged, in their original order, after the hot-set data segment in the package.
4. The data reconstruction optimization method of an online data deduplication system according to claim 1, characterized in that, in step (3), the storage format of blocks in the package is one copy with multiple indexes; the addressing unit of a block is the byte; the physical information of each unique block in the package is recorded in its corresponding logical block; all logical blocks are the same size; and unique blocks are numbered from 0, incrementing.
5. The data reconstruction optimization method of an online data deduplication system according to claim 4, characterized in that block addressing comprises two mapping steps: first, the corresponding logical block is found from the block number in the file entity, and since all logical blocks are the same size the address computation is to multiply the block number by the logical-block size, which yields the physical address of the corresponding logical block; second, the physical displacement and block size recorded in that logical block are used to locate the unique block itself; block addressing and physical mapping is in effect an "index to unique block" conversion.
6. The data reconstruction optimization method of an online data deduplication system according to claim 1, characterized in that, after the file filter has screened and restored the original file data in the package based on the hot-file set, the blocks of each file and the corresponding metadata must be stored in the package again; the concrete steps are file chunking, fingerprint generation, and maintenance of the operational data; after a file is chunked, each block is processed by first computing its hash value, then performing hash comparison, and finally storing the deduplicated data; and the system's storage-management module processes new unique blocks as a concurrently executable schedule.
7. The data reconstruction optimization method of an online data deduplication system according to claim 1, characterized in that file restoration is the unified restoration of all unique blocks, logical blocks, block fingerprints, and file metadata that a single file comprises.
8. The data reconstruction optimization method of an online data deduplication system according to claim 1, characterized in that the processing of the blocks of a file that has passed through deduplication is divided into four parallel threads: unique-block storage, logical-block storage, block-fingerprint storage, and file-metadata storage; the threading mechanism used is OpenMP.
9. The data reconstruction optimization method of an online data deduplication system according to claim 3, characterized in that the hot-file filter scans the files in the package in the chronological order in which the original files entered the deduplication system, compares one by one whether the file name of each file entity in the package is present in the hot-file set, and splits files of different access frequencies into separate streams.
10. The data reconstruction optimization method of an online data deduplication system according to claim 1, characterized in that the method changes the discretely distributed layout that reflects the chronological order in which the original files entered the system: the contents of the package (unique blocks, logical blocks, block fingerprints, and file metadata) are gathered with the single file as the basic unit and scheduled, by file access frequency, to the front of the corresponding data segment in the package.
CN201410198679.XA 2014-05-12 2014-05-12 Data reconstruction and optimization method of on-line repeating data deletion system Active CN103955530B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410198679.XA CN103955530B (en) 2014-05-12 2014-05-12 Data reconstruction and optimization method of on-line repeating data deletion system

Publications (2)

Publication Number Publication Date
CN103955530A true CN103955530A (en) 2014-07-30
CN103955530B CN103955530B (en) 2017-02-22

Family

ID=51332805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410198679.XA Active CN103955530B (en) 2014-05-12 2014-05-12 Data reconstruction and optimization method of on-line repeating data deletion system

Country Status (1)

Country Link
CN (1) CN103955530B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216791A (en) * 2008-01-04 2008-07-09 华中科技大学 File backup method based on fingerprint
US20100205389A1 (en) * 2009-02-06 2010-08-12 International Business Machines Corporation Backup of deduplicated data
CN101968795A (en) * 2010-09-03 2011-02-09 清华大学 Cache method for file system with changeable data block length
US20130086009A1 (en) * 2011-09-29 2013-04-04 International Business Machines Corporation Method and system for data deduplication
CN103473278A (en) * 2013-08-28 2013-12-25 苏州天永备网络科技有限公司 Repeating data processing technology
CN103617260A (en) * 2013-11-29 2014-03-05 华为技术有限公司 Index generation method and device for repeated data deletion

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105630689A (en) * 2014-10-30 2016-06-01 曙光信息产业股份有限公司 Reconstruction method of expedited data in distributed storage system
CN105630689B (en) * 2014-10-30 2018-11-27 曙光信息产业股份有限公司 Accelerate the method for data reconstruction in a kind of distributed memory system
CN109196457A (en) * 2016-04-11 2019-01-11 慧与发展有限责任合伙企业 It sends de-redundancy data and repairs agency
CN105930101A (en) * 2016-05-04 2016-09-07 中国人民解放军国防科学技术大学 Weak fingerprint repeated data deletion mechanism based on flash memory solid-state disk
CN106569745A (en) * 2016-10-25 2017-04-19 暨南大学 Memory optimization system for data deduplication under memory overload
CN106569745B (en) * 2016-10-25 2019-07-19 暨南大学 Memory optimizing system towards data de-duplication under a kind of memory overload
CN106844480A (en) * 2016-12-23 2017-06-13 航天星图科技(北京)有限公司 One kind cleaning compares storage method
CN106844480B (en) * 2016-12-23 2019-03-15 中科星图股份有限公司 A kind of cleaning comparison storage method
CN109558066A (en) * 2017-09-26 2019-04-02 华为技术有限公司 Restore the method and apparatus of metadata in storage system
CN108762679B (en) * 2018-05-30 2021-06-29 郑州云海信息技术有限公司 Method for combining online DDP (distributed data processing) and offline DDP (distributed data processing) and related device thereof
CN108762679A (en) * 2018-05-30 2018-11-06 郑州云海信息技术有限公司 A kind of online DDP is the same as the offline DDP methods being combined and its relevant apparatus
CN108874315A (en) * 2018-06-01 2018-11-23 暨南大学 A kind of online data deduplicated file system data access performance optimization method
CN110083309A (en) * 2019-04-11 2019-08-02 重庆大学 Shared data block processing method, system and readable storage medium storing program for executing
CN110457163A (en) * 2019-07-05 2019-11-15 苏州元核云技术有限公司 A kind of data reconstruction method, device and the storage medium of distributed block storage
CN110457163B (en) * 2019-07-05 2022-05-03 苏州元核云技术有限公司 Data recovery method and device for distributed block storage and storage medium
CN111338581A (en) * 2020-03-27 2020-06-26 尹兵 Data storage method and device based on cloud computing, cloud server and system
WO2022193447A1 (en) * 2021-03-17 2022-09-22 网宿科技股份有限公司 Data packet deduplication and transmission method, electronic device, and storage medium
CN113434751A (en) * 2021-07-14 2021-09-24 国际关系学院 Network hotspot artificial intelligence early warning system and method
CN113434751B (en) * 2021-07-14 2023-06-02 国际关系学院 Network hotspot artificial intelligent early warning system and method
WO2023000915A1 (en) * 2021-07-21 2023-01-26 Huawei Technologies Co., Ltd. Method and apparatus for replicating a target file between devices

Also Published As

Publication number Publication date
CN103955530B (en) 2017-02-22

Similar Documents

Publication Publication Date Title
CN103955530B (en) Data reconstruction and optimization method of on-line repeating data deletion system
CN106662981B (en) Storage device, program, and information processing method
US8631052B1 (en) Efficient content meta-data collection and trace generation from deduplicated storage
US11093466B2 (en) Incremental out-of-place updates for index structures
Shilane et al. Wan-optimized replication of backup datasets using stream-informed delta compression
CN101676855B (en) Scalable secondary storage systems and methods
CN103116661B (en) A kind of data processing method of database
Ng et al. Revdedup: A reverse deduplication storage system optimized for reads to latest backups
CN103635900B (en) Time-based data partitioning
US8667032B1 (en) Efficient content meta-data collection and trace generation from deduplicated storage
Tarasov et al. Generating realistic datasets for deduplication analysis
CN103562914B (en) The type that economizes on resources extends file system
Xia et al. Similarity and locality based indexing for high performance data deduplication
CN101777017B (en) Rapid recovery method of continuous data protection system
Liu et al. ADMAD: Application-driven metadata aware de-duplication archival storage system
Xia et al. DARE: A deduplication-aware resemblance detection and elimination scheme for data reduction with low overheads
CN106649676B (en) HDFS (Hadoop distributed File System) -based duplicate removal method and device for stored files
CN105069048A (en) Small file storage method, query method and device
US11422721B2 (en) Data storage scheme switching in a distributed data storage system
CN106874399B (en) Networking backup system and backup method
Zhang et al. Improving restore performance for in-line backup system combining deduplication and delta compression
CN105493080B (en) The method and apparatus of data de-duplication based on context-aware
CN104050057B (en) Historical sensed data duplicate removal fragment eliminating method and system
Tan et al. Improving restore performance in deduplication-based backup systems via a fine-grained defragmentation approach
Kumar et al. Bucket based data deduplication technique for big data storage system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201211

Address after: 510632 No. 601, Whampoa Avenue, Tianhe District, Guangdong, Guangzhou

Patentee after: Guangzhou Jinan University Science Park Management Co.,Ltd.

Address before: 510632 No. 601, Whampoa Avenue, Guangzhou, Guangdong

Patentee before: Jinan University

TR01 Transfer of patent right

Effective date of registration: 20210125

Address after: 241, 2nd floor, No.35, Huajing Road, Huajing new town, 105 Zhongshan Avenue, Tianhe District, Guangzhou, Guangdong 510000

Patentee after: Guangdong, Hong Kong and Macao QingChuang Technology (Guangzhou) Co.,Ltd.

Patentee after: Guangzhou Jinan University Science Park Management Co.,Ltd.

Address before: 510632 No. 601, Whampoa Avenue, Tianhe District, Guangdong, Guangzhou

Patentee before: Guangzhou Jinan University Science Park Management Co.,Ltd.