CN103955530B - Data reconstruction and optimization method of on-line repeating data deletion system - Google Patents


Info

Publication number
CN103955530B
CN103955530B (application CN201410198679.XA)
Authority
CN
China
Prior art keywords
data
file
data block
deduplication
deduplication package
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410198679.XA
Other languages
Chinese (zh)
Other versions
CN103955530A (en)
Inventor
邓玉辉
岑大慰
黄战
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Hong Kong And Macao Qingchuang Technology Guangzhou Co ltd
Guangzhou Jinan University Science Park Management Co ltd
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University
Priority to CN201410198679.XA
Publication of CN103955530A
Application granted
Publication of CN103955530B
Legal status: Active
Anticipated expiration


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING; COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 — File systems; File servers
    • G06F16/17 — Details of further file system functions
    • G06F16/174 — Redundancy elimination performed by the file system
    • G06F16/1748 — De-duplication implemented within the file system, e.g. based on file segments
    • G06F16/1752 — De-duplication implemented within the file system, e.g. based on file segments, based on file chunks

Abstract

The invention provides a data reconstruction optimization method for an online data deduplication system. On the one hand, redundant data is detected and deleted before storage, so that limited space resources can store and manage more data. On the other hand, the system model schedules and rearranges the distribution of the deduplicated data blocks: the blocks of frequently accessed files are prefetched to the front of the data segments in the deduplication package, and blocks and their fingerprints that would otherwise be scattered randomly through the package are gathered together for storage. This shortens disk seek time during file data recovery, improving the data reconstruction performance of the online deduplication system, shortening its response time, and raising data recovery efficiency.

Description

A data reconstruction optimization method for an online data deduplication system
Technical field
The present invention relates to a data reconstruction optimization method for an online data deduplication system, and more specifically to techniques for rearranging the data blocks in a deduplication package according to file access frequency, and for addressing, recovering and reconstructing those blocks.
Background technology
With the continuous development of networks and platform systems of every kind, modern society has become an ocean of data. The identity information of daily life, the browsing records produced by website interaction, the order data of e-commerce, and the documents of study, research and office work: every computer user is both a producer and a consumer of data. Information processing systems face enormous data sources and processing loads every day. In the face of mass data, how to store and manage it effectively and how to mine the useful information within it have become focal points of modern intelligent technology. Effective storage of data ultimately means storing a greater volume of data with the same space resources. Many operations can contribute, but the methods that act on the data itself are data compression and redundant-data deletion; deduplication and compression of the data itself are the most direct approaches and remain the most widely researched fields.
Data deduplication technology has years of application and research history in both industry and academia. Throughout its development, the model framework has remained constant: compare the data, eliminate the repeated fragments, and build and maintain metadata, with deduplication ratio and time efficiency as the main concerns of the technology. From the original file to the deduplicated data, and from the deduplicated data back to the original file, the emphasis differs, and deduplication technology has accordingly been extended to varying degrees beyond storage itself.
Surveying data compression and data deduplication, whatever the processing means, processing data and mining information cannot avoid recovering the stored, processed file data. A storage system is not only meant to preserve big data: when a client requests access, or when the system server needs to verify and compare data, the system must recover file data from the storage medium. File recovery therefore becomes another key technical point of data processing. An effective file recovery technique lets the system respond quickly to requests and improves its capacity to compute over and process big data.
Content of the invention
The purpose of the present invention is to realize a data reconstruction optimization method for an online data deduplication system. The object being processed is the data package produced by deduplication: the distribution of the deduplicated data inside the deduplication package directly affects the response time the system offers its clients, and by optimizing the storage layout the system can answer user access requests closer to real time.
The purpose of the present invention is realized by following technical scheme:
A data reconstruction optimization method for an online data deduplication system comprises the following steps:
(1) After the online data deduplication system deduplicates the original files, a deduplication package is generated. The deduplication system serves user access requests to data at file granularity, and user storage access is realized through file recovery. Within a preset measurement period, the online deduplication system counts the number of accesses to each file in the deduplication package; files whose access frequency exceeds a set threshold are classified into the frequently-used file set, and files below that threshold into the infrequently-used file set; then step (2) is executed;
(2) The data access requests of the deduplication system are suspended and a file-level data-block rearrangement is performed. A frequent-file filter splits the file entities in the deduplication package according to the frequently-used file set obtained in step (1). The process is: following the order in which the original files were placed in the package, read the file entities one by one and compare the file name and file type recorded in each entity's metadata section; if the file name is present in the frequently-used file set generated in step (1), execute step (3);
(3) Read the unique-data-block number area of the file entity and, according to the block mapping rule, find the storage position in the deduplication package of the unique data block corresponding to each number; write the corresponding unique data blocks into the file being recovered, and also write the last unique data block of the file entity into the file to be recovered. After step (2) has been completed for all files, execute step (4); otherwise return to step (2);
(4) Re-chunk the files of the frequently-used set and recompute their fingerprints, generate new logical data-block units and file description metadata, write the newly generated data into a new deduplication package, then execute step (5);
(5) Perform file-level data recovery on the unique data blocks in the old package that correspond to the infrequently-used file set, append those files to the new package at the rear of its data segment, and delete the old package once finished;
(6) The data layout of the newly generated package prefetches and gathers the data blocks and file metadata of the frequently accessed files; the deduplication system then resumes responding to user data access requests.
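The six steps above can be sketched end to end. This is a minimal illustration under stated assumptions, not the patent's implementation: the fixed 4096-byte chunk size, the SHA-1 fingerprint, the access threshold, and all the names (`rearrange`, `chunk`, the dictionary layout of the package) are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096      # assumed fixed chunk size
THRESHOLD = 10         # assumed access-frequency threshold

def chunk(data: bytes):
    """Fixed-size chunking, assumed for the sketch."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def rearrange(old_package: dict, access_counts: dict) -> dict:
    """Steps (1)-(6): split files by access frequency and rebuild the
    package with the frequently accessed files' blocks at the front."""
    # Step (1): classify by access count over the measurement period.
    hot = [f for f in old_package if access_counts.get(f, 0) >= THRESHOLD]
    cold = [f for f in old_package if f not in hot]

    new_package = {"blocks": [], "fingerprints": {}, "entities": {}}
    for name in hot + cold:                 # frequent files land at the front
        data = old_package[name]            # steps (2)-(3): recovered file data
        entity = []
        for block in chunk(data):           # step (4): re-chunk, re-fingerprint
            fp = hashlib.sha1(block).hexdigest()
            if fp not in new_package["fingerprints"]:
                new_package["fingerprints"][fp] = len(new_package["blocks"])
                new_package["blocks"].append(block)
            entity.append(new_package["fingerprints"][fp])
        new_package["entities"][name] = entity
    return new_package                      # steps (5)-(6): old package replaced
```

Because infrequently-used files are appended after the frequent ones, blocks shared between the two sets are already stored near the front, which is exactly the seek-time saving the method targets.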
Preferably, in step (2), the prerequisite for file-based data-block rearrangement is finding all the data blocks a single file comprises, so that the corresponding blocks can be scheduled as a unit. Before the blocks of a file can be located, the files in the deduplication package must be recovered; file recovery is a process of reading data blocks and writing files, restoring the original file data from the block information carried in the metadata of each file entity in the package. The file-level block rearrangement not only gathers the unique data blocks and prefetches them to the front of the data segment in the package, but also prefetches the block fingerprints, logical data blocks and related description information to the front of the corresponding data segment.
Preferably, in step (2), said frequent-file filter manages the distribution of file data blocks: by changing the order in which files enter the deduplication system, it realizes the block rearrangement driven by the frequently-used file set. The filter first scans the files in the package in the system's file order; whenever a scanned file belongs to the frequently-used set, it directly retrieves the file's data blocks, fingerprints, logical data and file entity, the retrieval covering block addressing, recovery, and writing into the data area of the new package. After all files have been scanned, the remaining files not in the frequently-used set are placed, in their original order, after the data segment of the frequently-used set in the package.
Preferably, in step (3), a data block is stored in the deduplication package as one copy with multiple indexes; the addressing unit of a data block is the byte; the physical information of each unique data block is recorded in its corresponding logical data block; all logical data blocks are of equal size; and the unique data blocks are numbered from 0 upward.
Preferably, data-block addressing comprises two mapping processes. First, the corresponding logical data block is found from the block number in the file entity; since every logical block is of equal size, the address is computed as the block number multiplied by the logical block size, which yields the physical address of the corresponding logical data block. Second, the physical offset and block size recorded in that logical data block are used to find the unique data block itself. Block addressing and physical mapping are in effect an "index → unique data block" translation.
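The two mappings can be written down directly. A minimal sketch, assuming an on-disk layout in which the logical-block table is an array of fixed-size records (here two little-endian uint32 fields, offset and size) followed by the raw unique-block area; the record format and the helper name `locate_block` are illustrative, not the patent's actual format.

```python
import struct

LOGICAL_RECORD_SIZE = 8  # assumed: one record = uint32 offset + uint32 size

def locate_block(logical_table: bytes, block_area: bytes, number: int) -> bytes:
    """Two-step addressing: block number -> logical record -> unique block."""
    # First mapping: number x record size gives the logical record's address,
    # valid because every logical record has the same size.
    record_addr = number * LOGICAL_RECORD_SIZE
    offset, size = struct.unpack_from("<II", logical_table, record_addr)
    # Second mapping: the (offset, size) recorded in the logical block
    # locates the unique data block inside the block area.
    return block_area[offset:offset + size]
```

Because the table records are uniform, the first mapping is pure arithmetic with no search, which is why the method keeps all logical data blocks the same size.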
Preferably, after the file filter has screened and recovered the original file data in the package against the frequently-used set, the data blocks and corresponding metadata of each file must be stored into the package again. The concrete steps are file chunking, fingerprint generation, and building and maintaining the data: after the system chunks a file, each block is processed by first computing its hash value, then performing hash comparison, and finally storing the deduplicated data. The storage management module of the system handles new unique data blocks as a schedulable task that can execute concurrently.
Preferably, data recovery is the unified recovery, for a single file, of all its unique data blocks, logical data blocks, block fingerprints and file metadata.
Preferably, the processing of the data blocks comprised in a deduplicated file is divided into four parallel threads: unique-data-block storage, logical-data-block storage, block-fingerprint storage and file-metadata storage; the threading mechanism used is OpenMP.
Preferably, the frequent-file filter scans the files in the deduplication package in the order in which the original files entered the deduplication system, comparing the file name of each file entity against the frequently-used file set one by one, so that files of different access frequency are processed on separate paths.
Preferably, the method changes the layout in which the original files in the package are scattered in order of their entry time into the system: the data content of the package, including the unique data blocks, logical data blocks, block fingerprints and file metadata, is rescheduled by file access frequency and gathered, with the single file as the basic unit, at the front of the corresponding data segment of the deduplication package.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
(1) The rearrangement is based on frequently accessed files: with the file as the processing unit, all the data blocks of a single file and their associated data are scheduled and laid out together, consistent in content and manner with user-level access requests.
(2) The data of frequently and infrequently accessed files is split onto separate paths, and the frequently-used set is prefetched to the front of the data segment in the package, saving the system the time spent searching for file entities.
(3) A file-recovery termination mechanism: the recovery pass over the rearranged package stops as soon as every file of the set has been recovered from the package, so the system no longer scans the remaining file entities, saving unnecessary file retrieval time.
Brief description
Fig. 1 is a schematic diagram of the system model of the present invention;
Fig. 2 is the workflow of file-based data-block rearrangement in the present invention;
Fig. 3 shows block mapping and addressing inside the deduplication package of the present invention;
Fig. 4 is a schematic diagram of the data-stream storage structure of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the embodiment and drawings, but the embodiments of the present invention are not limited thereto.
Embodiment
As shown in Fig. 1, the application scenario of the data reconstruction optimization method of the present invention is an online data deduplication system comprising two parts, a server side and a client side:
The client's main task is to chunk a file, compute the hash value of each data block, and store that hash as the block's fingerprint. By comparing block fingerprints, the system judges whether a block is a repeat; only unique data blocks are stored, and the ID of every block is recorded. Each file gets a file entity that preserves its original metadata, including the file name, the number of data blocks, the data block size, the size of the last data block, and the list of unique-block numbers. The last block of a file is stored separately, because it is usually smaller than a normal block and its probability of recurring is very small. The unique data blocks, block fingerprints and all file entities are saved in a deduplication package, and the package's data is sent to the server in the form of a file.
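The client-side flow (chunk, hash, fingerprint, file entity) can be sketched as follows. The 4096-byte chunk size, the use of SHA-1 as the hash, and the dictionary shape of the file entity are assumptions for illustration only.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed chunk size

def build_entity(name: str, data: bytes, unique_blocks: dict):
    """Client-side deduplication: chunk the file, fingerprint each block,
    store only unique blocks, and record the block IDs in a file entity.
    The last (usually short) block is kept inside the entity, since the
    description notes it rarely recurs."""
    chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    last = chunks.pop() if chunks else b""
    ids = []
    for block in chunks:
        fp = hashlib.sha1(block).hexdigest()   # fingerprint = hash of block
        if fp not in unique_blocks:            # store unique blocks only
            unique_blocks[fp] = block
        ids.append(fp)
    return {"name": name, "block_count": len(chunks),
            "block_size": BLOCK_SIZE, "last_block": last, "block_ids": ids}
```

Two files that share content then share fingerprint IDs rather than duplicate blocks, which is the redundancy detection the method relies on.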
The server parses the data in the package and preserves the unique data blocks, the block-fingerprint table, the logical data and the file entities; the file-based block rearrangement operates precisely on reading and writing these four kinds of data on the server. File-based rearrangement reorganizes the order of the data inside the package to obtain better file retrieval and recovery efficiency for the system.
To illustrate the implementation model of the present invention more clearly, a detailed analysis follows, combining the workflow of file-based block rearrangement (Fig. 2), block mapping and addressing inside the package (Fig. 3), and the data-stream storage structure (Fig. 4).
As shown in Fig. 2, the system's rearrangement of files is divided into two stages. The first stage is file recovery, and the object processed is the deduplication package. For file-based recovery: first, the file entities in the package are read; each entity contains the numbers of the unique data blocks of its file. Then the corresponding logical data block is found from each block number, and its offset and size information are read to locate the unique data block in the package. Finally, the unique blocks are written into the corresponding file in the order given by the file entity. The second stage is file rearrangement, executed by three modules in sequence: (1) the file filter, (2) block chunking, (3) block processing. Each module operates around the file as its processing unit, with the data block as the basic unit of data processing.
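Stage one, file recovery, simply follows the entity's block IDs to the unique blocks and writes them out in order. A sketch under illustrative assumptions (a file entity holding `block_ids` and a separately stored `last_block`, and a fingerprint-keyed block store); none of these names come from the patent itself.

```python
def recover_file(entity: dict, unique_blocks: dict) -> bytes:
    """Recover one file: fetch each unique block in the order recorded
    by the file entity, then append the separately stored last block."""
    body = b"".join(unique_blocks[block_id] for block_id in entity["block_ids"])
    return body + entity["last_block"]
```

This read-blocks-then-write-file loop is the operation whose seek cost the rearrangement shortens: once a frequent file's blocks sit at the front of the package, consecutive fetches stay close together on disk.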
As shown in Fig. 3, the file filter retrieves the data of the frequently-used file set with the file as the basic unit; retrieval within the package means addressing and operating on the data blocks according to the file entity. A data block is stored in the package as one copy with multiple indexes, so the deduplication system must maintain logical description information for each block, building indexes that let different files conveniently share a unique data block. The addressing unit of a data block is the byte; the physical information of a unique block is recorded in its corresponding logical data block; all logical data blocks are of equal size, and unique blocks are numbered from 0 upward. Block addressing comprises two mapping processes. First, the corresponding logical data block is found from the block number in the file entity; since every logical block is of equal size, the address is computed as the block number multiplied by the logical block size, which yields the physical address of the corresponding logical data block. Second, the physical offset and block size recorded in that logical data block locate the unique data block itself. Block addressing and physical mapping are in effect an "index → unique data block" translation.
As shown in Fig. 4, after the file filter has screened and recovered the original file data in the package against the frequently-used set, the data blocks and corresponding metadata of each file must be stored into the package again. The concrete steps are file chunking, fingerprint generation, and building and maintaining the data. After the system chunks a file, each block is processed by first computing its hash value, then performing hash comparison, and finally storing the deduplicated data. The storage management module of the system handles new unique data blocks as a concurrently executable task. To improve block-processing efficiency, the model proposed by the present invention divides the storing process into four threads executed concurrently with OpenMP multithreading: hash-value insertion into the hash table, unique-data-block processing, logical-data-block processing, and metadata processing. Since each thread writes to a different position in the package, concurrent storage management not only improves the delivery efficiency of the system but also preserves, to a certain extent, the independence of the data.
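The embodiment uses OpenMP in its implementation; as a language-neutral illustration of the same idea, the sketch below runs the four storage streams as Python threads, each appending to its own region so the writers never contend, mirroring the "each thread writes a different position" property. The function name and region layout are assumptions, not the patent's code.

```python
import threading

def store_package(unique_blocks, logical_blocks, fingerprints, metadata):
    """Four storage streams run concurrently; each writes its own region
    (here, its own list), so no locking is needed between them."""
    regions = {"blocks": [], "logical": [], "fingerprints": [], "metadata": []}

    writers = [
        threading.Thread(target=regions["blocks"].extend, args=(unique_blocks,)),
        threading.Thread(target=regions["logical"].extend, args=(logical_blocks,)),
        threading.Thread(target=regions["fingerprints"].extend, args=(fingerprints,)),
        threading.Thread(target=regions["metadata"].extend, args=(metadata,)),
    ]
    for t in writers:
        t.start()
    for t in writers:
        t.join()          # all four streams finish before the package is sealed
    return regions
```

Partitioning by data kind rather than by file is what makes the concurrency safe: the four writers share nothing, so the result is identical to sequential storage.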
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by it; any change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included within the protection scope of the present invention.

Claims (10)

1. A data reconstruction optimization method of an online data deduplication system, characterized in that it comprises the following steps:
(1) after the online data deduplication system deduplicates the original files, a deduplication package is generated; the deduplication system serves user access requests to data at file granularity, and user storage access is realized through file recovery; within a preset measurement period, the online data deduplication system counts the number of accesses to each file in the deduplication package, classifies the files whose access frequency exceeds a set threshold into a frequently-used file set and the files whose access frequency is below that threshold into an infrequently-used file set, and then executes the operation of step (2);
(2) the data access requests of the data deduplication system are suspended and a file-level data-block rearrangement is carried out; a frequent-file filter splits the file entities in the deduplication package according to the frequently-used file set obtained in step (1); the process is: following the order in which the original files were placed in the deduplication package, the file entities in the package are read one by one, and the file name and file type recorded in the metadata section of each file entity are compared; if the file name is present in the frequently-used file set generated in step (1), the operation of step (3) is executed;
(3) the unique-data-block number area of the file entity is read, and, according to the block mapping rule, the storage position in the deduplication package of the unique data block corresponding to each number is found; the corresponding unique data blocks are written into the file being recovered, and the last unique data block of the file entity is also written into the file to be recovered; after step (2) has been completed for all files, step (4) is executed; otherwise execution returns to step (2);
(4) the files of the frequently-used set are re-chunked and their fingerprints recomputed, new logical data-block units and file description metadata are generated, and the newly generated data is written into a new deduplication package, after which the operation of step (5) is executed;
(5) file-level data recovery is carried out on the unique data blocks in the old deduplication package corresponding to the infrequently-used file set; the files of the infrequently-used set are appended to the new deduplication package, placed at the rear of the data segment in the new package, and the old package is deleted upon completion;
(6) the data layout of the newly generated deduplication package is based on prefetching and gathering the data blocks and file metadata of the frequently accessed files; the data deduplication system then resumes responding to user data access requests.
2. The data reconstruction optimization method of the online data deduplication system according to claim 1, characterized in that in step (2) the prerequisite for file-based data-block rearrangement is finding all the data blocks a single file comprises, so that the corresponding blocks can be scheduled as a unit; before the blocks of a file can be located, the files in the deduplication package must be recovered; file recovery is a process of reading data blocks and writing files, restoring the original file data from the block information carried in the metadata of each file entity in the package; the file-level block rearrangement not only gathers the unique data blocks and prefetches them to the front of the data segment in the package, but also prefetches the block fingerprints, logical data blocks and related description information to the front of the corresponding data segment.
3. The data reconstruction optimization method of the online data deduplication system according to claim 1, characterized in that in step (2) said frequent-file filter manages the distribution of file data blocks and, by changing the order in which files enter the data deduplication system, realizes the block rearrangement driven by the frequently-used file set; the file filter first scans the files in the package in the system's file order; whenever a scanned file belongs to the frequently-used set, it directly retrieves the file's data blocks, fingerprints, logical data and file entity, the retrieval covering block addressing, recovery, and writing into the data area of the new package; after all files have been scanned, the remaining files not in the frequently-used set are placed, in their original order, after the data segment of the frequently-used set in the new package.
4. The data reconstruction optimization method of the online data deduplication system according to claim 1, characterized in that in step (3) a data block is stored in the deduplication package as one copy with multiple indexes; the addressing unit of a data block is the byte; the physical information of each unique data block is recorded in its corresponding logical data block; all logical data blocks are of equal size; and the unique data blocks are numbered from 0 upward.
5. The data reconstruction optimization method of the online data deduplication system according to claim 4, characterized in that data-block addressing comprises two mapping processes: first, the corresponding logical data block is found from the block number in the file entity; since every logical block is of equal size, the address is computed as the block number multiplied by the logical block size, which yields the physical address of the corresponding logical data block; second, the physical offset and block size recorded in that logical data block are used to find the corresponding data block; data-block addressing and physical mapping are in effect an "index → unique data block" translation.
6. The data reconstruction optimization method of the online data deduplication system according to claim 1, characterized in that after the file filter has screened and recovered the original file data in the deduplication package against the frequently-used file set, the data blocks and corresponding metadata of each file must be stored into the package again; the concrete steps are file chunking, fingerprint generation, and building and maintaining the data; after the system chunks a file, each data block is processed by first computing its hash value, then performing hash comparison, and finally storing the deduplicated data; the storage management module of the system handles new unique data blocks as a schedulable task that can execute concurrently.
7. The data reconstruction optimization method of the online data deduplication system according to claim 1, characterized in that file recovery is the unified recovery of all the unique data blocks, logical data blocks, data-block fingerprints and file metadata comprised in a single file.
8. The data reconstruction optimization method of the online data deduplication system according to claim 1, characterized in that the processing of the data blocks comprised in a file that has been deduplicated is divided into four parallel threads: unique-data-block storage, logical-data-block storage, data-block-fingerprint storage and file-metadata storage, the threading mechanism used being OpenMP.
9. The data reconstruction optimization method of the online data deduplication system according to claim 3, characterized in that the frequent-file filter scans the files in the deduplication package in the order in which the original files entered the data deduplication system, comparing the file name of each file entity in the package against the frequently-used file set one by one, so that files of different access frequency are processed on separate paths.
10. The data reconstruction optimization method of the online data deduplication system according to claim 1, characterized in that the layout in which the original files in the deduplication package are scattered in order of their entry time into the system is changed: the data content of the package, including the unique data blocks, logical data blocks, data-block fingerprints and file metadata, is rescheduled by file access frequency and gathered, with the single file as the basic unit, at the front of the corresponding data segment of the deduplication package.
CN201410198679.XA 2014-05-12 2014-05-12 Data reconstruction and optimization method of on-line repeating data deletion system Active CN103955530B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410198679.XA CN103955530B (en) 2014-05-12 2014-05-12 Data reconstruction and optimization method of on-line repeating data deletion system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410198679.XA CN103955530B (en) 2014-05-12 2014-05-12 Data reconstruction and optimization method of on-line repeating data deletion system

Publications (2)

Publication Number Publication Date
CN103955530A CN103955530A (en) 2014-07-30
CN103955530B true CN103955530B (en) 2017-02-22

Family

ID=51332805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410198679.XA Active CN103955530B (en) 2014-05-12 2014-05-12 Data reconstruction and optimization method of on-line repeating data deletion system

Country Status (1)

Country Link
CN (1) CN103955530B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105630689B (en) * 2014-10-30 2018-11-27 曙光信息产业股份有限公司 Accelerate the method for data reconstruction in a kind of distributed memory system
US20200159431A1 (en) * 2016-04-11 2020-05-21 Hewlett Packard Enterprise Development Lp Sending deduplicated data and rehydrating agent
CN105930101A (en) * 2016-05-04 2016-09-07 中国人民解放军国防科学技术大学 Weak fingerprint repeated data deletion mechanism based on flash memory solid-state disk
CN106569745B (en) * 2016-10-25 2019-07-19 暨南大学 Memory optimizing system towards data de-duplication under a kind of memory overload
CN106844480B (en) * 2016-12-23 2019-03-15 中科星图股份有限公司 A kind of cleaning comparison storage method
CN109558066B (en) * 2017-09-26 2020-10-27 华为技术有限公司 Method and device for recovering metadata in storage system
CN108762679B (en) * 2018-05-30 2021-06-29 郑州云海信息技术有限公司 Method for combining online DDP (distributed data processing) and offline DDP (distributed data processing) and related device thereof
CN108874315A (en) * 2018-06-01 2018-11-23 暨南大学 A kind of online data deduplicated file system data access performance optimization method
CN110083309B (en) * 2019-04-11 2020-05-26 重庆大学 Shared data block processing method, system and readable storage medium
CN110457163B (en) * 2019-07-05 2022-05-03 苏州元核云技术有限公司 Data recovery method and device for distributed block storage and storage medium
CN112148216A (en) * 2020-03-27 2020-12-29 尹兵 Data processing method and system based on cloud server and data interaction

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216791A (en) * 2008-01-04 2008-07-09 华中科技大学 File backup method based on fingerprint
CN101968795A (en) * 2010-09-03 2011-02-09 清华大学 Cache method for file system with changeable data block length
CN103473278A (en) * 2013-08-28 2013-12-25 苏州天永备网络科技有限公司 Repeating data processing technology
CN103617260A (en) * 2013-11-29 2014-03-05 华为技术有限公司 Index generation method and device for repeated data deletion

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8108638B2 (en) * 2009-02-06 2012-01-31 International Business Machines Corporation Backup of deduplicated data
CN103034659B (en) * 2011-09-29 2015-08-19 国际商业机器公司 A kind of method and system of data de-duplication


Also Published As

Publication number Publication date
CN103955530A (en) 2014-07-30

Similar Documents

Publication Publication Date Title
CN103955530B (en) Data reconstruction and optimization method of on-line repeating data deletion system
CN101676855B (en) Scalable secondary storage systems and methods
CN103116661B (en) A kind of data processing method of database
CN103635900B (en) Time-based data partitioning
US11093466B2 (en) Incremental out-of-place updates for index structures
Jin et al. Scarab: scaling reachability computation on large graphs
JP5735654B2 (en) Deduplication method for stored data, deduplication apparatus for stored data, and deduplication program
CN103562914B (en) The type that economizes on resources extends file system
CN103714123B (en) Enterprise's cloud memory partitioning object data de-duplication and restructuring version control method
CN106233259A (en) The many storage data from generation to generation of retrieval in decentralized storage networks
US8667032B1 (en) Efficient content meta-data collection and trace generation from deduplicated storage
CN101777017B (en) Rapid recovery method of continuous data protection system
US8799291B2 (en) Forensic index method and apparatus by distributed processing
CN102292720A (en) Method and apparatus for managing data objects of a data storage system
CN101963982A (en) Method for managing metadata of redundancy deletion and storage system based on location sensitive Hash
CN103377278A (en) Table boundary detection in data blocks for compression
JP2003330787A (en) Distributed file system using scatter-gather
US10438092B2 (en) Systems and methods for converting massive point cloud datasets to a hierarchical storage format
Strzelczak et al. Concurrent Deletion in a Distributed {Content-Addressable} Storage System with Global Deduplication
CN106909623B (en) A kind of data set and date storage method for supporting efficient mass data to analyze and retrieve
CN104050057B (en) Historical sensed data duplicate removal fragment eliminating method and system
CN105493080B (en) The method and apparatus of data de-duplication based on context-aware
Zhang et al. Improving restore performance for in-line backup system combining deduplication and delta compression
Kumar et al. Bucket based data deduplication technique for big data storage system
EP2856359B1 (en) Systems and methods for storing data and eliminating redundancy

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201211

Address after: 510632 No. 601, Whampoa Avenue, Tianhe District, Guangdong, Guangzhou

Patentee after: Guangzhou Jinan University Science Park Management Co.,Ltd.

Address before: 510632 No. 601, Whampoa Avenue, Guangzhou, Guangdong

Patentee before: Jinan University

TR01 Transfer of patent right

Effective date of registration: 20210125

Address after: 241, 2nd floor, No.35, Huajing Road, Huajing new town, 105 Zhongshan Avenue, Tianhe District, Guangzhou, Guangdong 510000

Patentee after: Guangdong, Hong Kong and Macao QingChuang Technology (Guangzhou) Co.,Ltd.

Patentee after: Guangzhou Jinan University Science Park Management Co.,Ltd.

Address before: 510632 No. 601, Whampoa Avenue, Tianhe District, Guangdong, Guangzhou

Patentee before: Guangzhou Jinan University Science Park Management Co.,Ltd.