CN103279502B - Architecture and method for a deduplication file system combined with a parallel file system - Google Patents

Info

Publication number
CN103279502B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310168444.1A
Other languages
Chinese (zh)
Other versions
CN103279502A (en)
Inventor
周晓阳
周游
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING SCISTOR TECHNOLOGY Co Ltd
Original Assignee
BEIJING SCISTOR TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING SCISTOR TECHNOLOGY Co Ltd filed Critical BEIJING SCISTOR TECHNOLOGY Co Ltd
Priority to CN201310168444.1A priority Critical patent/CN103279502B/en
Publication of CN103279502A publication Critical patent/CN103279502A/en
Application granted granted Critical
Publication of CN103279502B publication Critical patent/CN103279502B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The present invention provides an architecture and a method for a deduplication file system combined with a parallel file system. In the architecture, a deduplication file system access interface is deployed on the client device, and a deduplication processing engine and a data migration system are deployed on the deduplication gateway. The parallel file system provides an access interface to the client device and the deduplication gateway, and the deduplication processing engine and the data migration system perform deduplication, restoration and migration of file data. The method migrates data that meets the migration condition in the parallel file system into the deduplication file system for storage, deduplicates the migrated data, and lets the client device read the migrated data. The invention adds a deduplication file system transparently to an existing parallel file system, which reduces the impact on business systems, saves storage space, and lowers data center management and energy costs.

Description

Architecture and method for a deduplication file system combined with a parallel file system
Technical field
The invention belongs to the technical field of data storage and relates to a scheme for transparently integrating a deduplication file system, specifically an architecture and a method for a deduplication file system combined with a parallel file system.
Background
Most existing parallel file systems, such as the Lustre cluster parallel file system and the Blue Whale cluster file system BWFS, have no built-in deduplication function. These centralized storage systems hold a large amount of redundant data; in some cases the redundancy can reach a factor of tens or even hundreds, and it keeps growing over time. For example: in data backup and archiving systems, a large share of file data rarely changes and may even exist in multiple copies, so repeated archiving produces a large amount of redundant data; in office automation systems, re-saving files and revising versions are common, a file may be copied to many people and may exist in several versions, all of which contain a lot of duplicate data; in addition, mass mailing and forwarding of e-mail also cause considerable redundancy. The rapid growth of data volume greatly increases the management and energy costs of data centers. How to reduce the demand for storage space and lower storage costs has therefore become an urgent problem.
Deduplication technology (also known as redundancy elimination) can effectively identify and eliminate duplicate data and improve the utilization of storage resources, and has therefore gradually become a research hotspot.
However, modifying an existing system or application to support deduplication is difficult and risky, so how to integrate deduplication technology transparently into an existing parallel file system has also become a problem that urgently needs solving.
Summary of the invention
To address the problem of how to integrate deduplication technology transparently into an existing parallel file system, the invention provides an architecture and a method for a deduplication file system combined with a parallel file system.
The architecture provided by the invention comprises a client device, a parallel file system cluster, a deduplication gateway cluster and a storage device. A business system runs on the client device and generates data streams. The parallel file system is deployed on the parallel file system cluster and exposes a parallel file system access interface. The parallel file system cluster comprises one or more parallel file system devices, which are divided into metadata servers and data servers. The deduplication gateway cluster comprises one or more deduplication gateways. The deduplication file system is deployed on the deduplication gateway and provides the deduplication function externally; specifically, a deduplication processing engine and a data migration system are deployed on the deduplication gateway. The deduplication processing engine performs deduplication and restoration on the data stored in the parallel file system. The data migration system migrates data that meets the migration condition in the parallel file system into the deduplication file system for storage. The storage device stores data and is interconnected with the parallel file system devices and the deduplication gateways. The client device and the deduplication gateway read, write and delete data in the parallel file system through the parallel file system access interface.
The deduplication processing engine processes data as follows. First, it reads the data of a file, splits the data into blocks, and computes a fingerprint for each data block. It then looks up each fingerprint in the data block index table. If the fingerprint is found, the data block already exists and is not stored again; otherwise the data block is new, so it is stored in the data block repository and a corresponding tuple is generated in the data block index table. The data block index table is used for duplicate detection, and its tuple format is < data block fingerprint, file containing the data block, offset of the data block in the file, data block length, data block reference count >. The data block repository stores the non-duplicate data blocks and is located on the storage device.
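The chunking and fingerprinting step, together with the index table tuple, can be sketched as follows. This is a minimal illustration and not the patented implementation: the fixed 4096-byte chunk size, the SHA-256 fingerprint and the names `IndexEntry`, `split_chunks` and `fingerprint` are all assumptions, since the text does not specify a chunking scheme or a hash function.

```python
import hashlib
from dataclasses import dataclass

CHUNK_SIZE = 4096  # assumed fixed chunk size; the source does not specify one


@dataclass
class IndexEntry:
    """One tuple of the data block index table (field names are illustrative)."""
    fingerprint: str     # data block fingerprint
    container_file: str  # file that contains the data block
    offset: int          # offset of the data block in that file
    length: int          # data block length in bytes
    ref_count: int       # data block reference count


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk: bytes) -> str:
    """Fingerprint a chunk. SHA-256 is an assumption, not taken from the source."""
    return hashlib.sha256(chunk).hexdigest()
```

Identical chunks yield identical fingerprints, which is what allows the index table lookup to detect duplicates without comparing block contents.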
Through the parallel file system access interface, the data migration system periodically scans the files in the parallel file system, migrates files that meet the migration condition into the deduplication file system, and establishes in the parallel file system an association between each original file and its migrated file. A migrated file is stored in the deduplication file system after being processed by the deduplication processing engine, and a tuple for the file is generated in the data block mapping table. Each tuple has the format < file unique identifier, ChunkFP_1, ChunkFP_2, ..., ChunkFP_i, ... >, where ChunkFP_i denotes the fingerprint of the i-th data block.
Through the deduplication file system access interface, the client device accesses files that have been migrated from the parallel file system devices into the deduplication file system. Specifically: in the parallel file system, the request is redirected, according to the association between the original file and the migrated file, to the migrated file in the deduplication file system; the fingerprints of the data blocks that make up the file are found in the data block mapping table; for each fingerprint, the storage address of the corresponding data block is found in the data block index table; the data is read from the data block repository; and the data read is returned to the client device through the deduplication file system access interface.
Based on the above architecture, the deduplication method combined with a parallel file system provided by the invention mainly comprises the following three aspects:
First aspect: the data migration system periodically scans the parallel file system and obtains a list of files that have not yet been migrated. For each file in the list, it determines whether the file meets the migration condition; if so, it migrates the file into the deduplication file system and establishes in the parallel file system an association between the original file and the migrated file.
Second aspect: the deduplication processing engine processes each file to be migrated from the parallel file system. It reads the data of the file, splits the data into blocks, computes a fingerprint for each data block, and looks up each fingerprint in the data block index table. If the fingerprint is found, the data block is not stored again; otherwise the data block is new, so it is stored and a corresponding tuple is generated in the data block index table. The data block index table is used for duplicate detection, and its tuple format is < data block fingerprint, file containing the data block, offset of the data block in the file, data block length, data block reference count >.
Third aspect: the client device reads a file that has been migrated from the parallel file system devices into the deduplication file system. Specifically: in the parallel file system, the request is redirected, according to the association between the original file f and the migrated file f', to the migrated file f' in the deduplication file system; the fingerprints of the data blocks that make up file f' are found in the data block mapping table; for each fingerprint, the physical storage address of the corresponding data block is found in the data block index table; the data block is fetched from the data block repository and copied into the request buffer; and the data of file f' in the request buffer is returned to the client device through the deduplication file system access interface.
The architecture and method provided by the invention add deduplication support transparently to an existing parallel file system and its business systems, which reduces the impact on those business systems. Using the deduplication file system as secondary storage for the parallel file system saves storage space, lowers data center management and energy costs, and improves storage efficiency. The technical solution is therefore practical, widely applicable, and has broad application prospects.
Brief description of the drawings
Fig. 1 is a physical module diagram of the architecture of the deduplication file system combined with a parallel file system according to the invention;
Fig. 2 is a logical module diagram of the architecture of the deduplication file system combined with a parallel file system according to the invention;
Fig. 3 is a flowchart of migrating a file from the parallel file system to the deduplication file system;
Fig. 4 is a flowchart of the deduplication operation in the deduplication file system;
Fig. 5 is a flowchart of reading, from the client device, a file that has been migrated to the deduplication file system.
Detailed description of the embodiments
The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 and Fig. 2 show the physical connection diagram and the logical composition diagram of the architecture of the deduplication file system combined with a parallel file system according to the invention. The deduplication file system combined with a parallel file system in this embodiment is implemented on the Linux operating system and solves the problem of how to combine a parallel file system and a deduplication file system transparently.
As shown in Fig. 1, the physical modules of the architecture comprise: client device 1, parallel file system cluster 2, deduplication gateway cluster 3 and storage device 4. Client device 1 runs the business system and generates data streams. Parallel file system cluster 2 comprises a number of parallel file system devices, generally several metadata servers 21 and several data servers 22. Deduplication gateway cluster 3 comprises a number of deduplication gateways 31; a deduplication gateway 31 hosts the deduplication file system and provides the deduplication function externally. Storage device 4 stores data and is interconnected with parallel file system cluster 2 and deduplication gateway cluster 3.
As shown in Fig. 2, the architecture also comprises several logical modules: the deduplication processing engine 6 and the data migration system 7, which are deployed on deduplication gateway 31, and the deduplication file system access interface 5, which is deployed on client device 1. The parallel file system is deployed on parallel file system cluster 2; it exposes a parallel file system access interface and internally has a metadata store and a data store. Client device 1 and deduplication gateway 31 read, write and delete data in the parallel file system through the parallel file system access interface. Client device 1 accesses data that has been migrated from the parallel file system into the deduplication file system through the deduplication file system access interface 5.
The deduplication processing engine 6 and the data migration system 7 are deployed on deduplication gateway 31. The deduplication processing engine 6 performs deduplication and restoration on the data stored on the parallel file system devices in parallel file system cluster 2. The data migration system 7 migrates data that meets the migration condition in the parallel file system into the deduplication file system for storage.
Deduplication gateway 31 processes the data in parallel file system cluster 2 through the deduplication processing engine 6 as follows: it reads the data in the parallel file system, splits the data into blocks, computes a fingerprint for each data block, and looks up each fingerprint in the data block index table. If the fingerprint is found, the data block already exists and is not stored again, which eliminates the duplicate data. If the fingerprint is not found, the data block is new; the new data is stored in the data block repository and a corresponding tuple is generated in the data block index table. The data block index table is created when the deduplication file system is first used and serves duplicate detection. Its tuple is generally < data block fingerprint, file containing the data block, offset of the data block in the file, data block length, data block reference count >; whenever a new data block fingerprint appears, a tuple for it is stored. The data block repository is created when the deduplication file system is first used and stores the data processed by the deduplication processing engine 6, that is, the non-duplicate data blocks. Each file generally consists of multiple data blocks, and in practice file sizes typically range from tens of MB to 2 GB. The offset of the data block in the file
Through the parallel file system access interface, the data migration system 7 periodically scans the files on the parallel file system devices and migrates files in the parallel file system that meet the migration condition into the deduplication file system. The migration condition is specified by the user according to business needs, generally based on a file's last access time, file size, file extension and similar attributes. When a file is migrated, an association between the original file and the migrated file is established in the parallel file system. The migrated file is stored in the deduplication file system after being processed by the deduplication processing engine, and a tuple for the file is recorded in the data block mapping table. Each file corresponds to multiple data blocks, and each tuple has the format < file unique identifier (inode), ChunkFP_1, ChunkFP_2, ChunkFP_3, ..., ChunkFP_i, ... >, where ChunkFP_1, ChunkFP_2 and ChunkFP_3 denote the fingerprints of the 1st, 2nd and 3rd data blocks, and ChunkFP_i denotes the fingerprint of the i-th data block.
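A user-specified migration condition of this kind can be expressed as a small predicate over file attributes. The sketch below is illustrative only: `MigrationPolicy` and `meets_condition` are hypothetical names, and the default values (extension `.dat`, idle for 10 days, larger than 1 GB) simply mirror the example condition given in step 104 of this description.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class MigrationPolicy:
    """Hypothetical user-configured migration condition."""
    extensions: tuple = (".dat",)              # file extensions to select
    min_idle_seconds: float = 10 * 24 * 3600   # last access at least 10 days ago
    min_size_bytes: int = 1 << 30              # file larger than 1 GB


def meets_condition(path: str, policy: MigrationPolicy, now: float = None) -> bool:
    """Return True if the file at path satisfies every part of the policy."""
    now = time.time() if now is None else now
    st = os.stat(path)
    return (
        path.endswith(policy.extensions)
        and now - st.st_atime >= policy.min_idle_seconds
        and st.st_size > policy.min_size_bytes
    )
```

Note that relying on `st_atime` assumes the file system records access times; mounts that disable access-time updates would need a different signal.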
In this embodiment, when the business system reads a file from client device 1, the file is first looked up in parallel file system cluster 2. If the file has been migrated, the request is automatically redirected to the deduplication file system according to the association between the original file and the migrated file. The fingerprints of the data blocks that make up the file are found in the data block mapping table; for each fingerprint, the storage address of the corresponding data block is found in the data block index table; the data is then read from the data block repository at that address and returned to client device 1 through the deduplication file system access interface 5. In this embodiment, the storage address of a data block is obtained from the containing file, the offset of the data block in that file, and the data block length recorded in the index table tuple.
With the architecture of the invention, the redirection, the fingerprint lookup and the other operations in the whole read path are transparent to the business system on the client device. During data archiving, migration takes place on deduplication gateway 31: the data migration system 7 migrates data that meets the migration condition into the deduplication file system and automatically establishes the association from the parallel file system to the deduplication file system. This process is also completely transparent to the business system on the client device. The architecture therefore achieves a transparent combination of the parallel file system and the deduplication file system.
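The redirection check at the start of this read path can be sketched as follows, assuming the association is a symbolic link as this description states in step 106. The function name `resolve_migrated` and the idea of passing the deduplication file system mount point as `dedup_root` are assumptions made for illustration.

```python
import os


def resolve_migrated(path: str, dedup_root: str):
    """Return the migrated file's path if `path` is a symbolic link that
    points into the deduplication file system, otherwise None."""
    if not os.path.islink(path):
        return None  # ordinary file: serve it from the parallel file system
    target = os.path.realpath(path)
    root = os.path.realpath(dedup_root)
    # The link counts as a migration association only if it resolves
    # to a location under the deduplication file system mount point.
    if os.path.commonpath([target, root]) == root:
        return target
    return None
```

A caller would serve the file directly from the parallel file system when the result is `None` and otherwise continue with the mapping table lookup for the returned path.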
With reference to the system shown in Fig. 1 and Fig. 2, the deduplication method combined with a parallel file system provided by the invention mainly comprises the following three aspects:
First aspect: data migration, in which data on the parallel file system devices that meets the migration condition is migrated into the deduplication file system. The data migration system 7 periodically scans the parallel file system and obtains a list of files that have not yet been migrated. For each file in the list, it determines whether the file meets the migration condition; if so, it migrates the file into the deduplication file system and establishes in the parallel file system an association between the original file and the migrated file.
Second aspect: deduplication of the migrated data. The deduplication processing engine 6 processes each file to be migrated from the parallel file system. It reads the data of the file, splits the data into blocks, computes a fingerprint for each data block, and looks up each fingerprint in the data block index table. If the fingerprint is found, the data block is not stored again; otherwise the data block is new, so it is stored and a corresponding tuple is generated in the data block index table.
Third aspect: reading migrated data in the deduplication file system from the client device. In the parallel file system, the request is redirected, according to the association between the original file f and the migrated file f', to the migrated file f' in the deduplication file system; the fingerprints of the data blocks that make up file f' are found in the data block mapping table; for each fingerprint, the physical storage address of the corresponding data block is found in the data block index table; the data block is fetched from the data block repository and copied into the request buffer; and the data of file f' in the request buffer is returned to the client device through the deduplication file system access interface.
Fig. 3 shows the data migration flow, which covers both the migration itself and the creation of the file association between the parallel file system and the deduplication file system. The steps are as follows:
Step 101: the data migration system 7 scans the parallel file system and obtains a list of files that have not yet been migrated; each file in the list is then checked and processed.
Step 102: determine whether any unprocessed file remains. If so, perform step 103; otherwise this migration pass is complete and the flow ends.
Step 103: take a file f from the list.
Step 104: determine whether file f meets the migration condition. The condition is generally set by the user and typically covers the file extension, a file path filter, the last access time, the file size and so on. An example condition is: the extension is dat, the last access time is at least 10 days ago, and the file size is greater than 1 GB. If the condition is met, perform step 105; otherwise return to step 102.
Step 105: migrate file f into the deduplication file system to form file f'. File f is processed by the deduplication processing engine 6 to form the migrated file f'. After migration, the data of the original file f is deleted from the parallel file system.
Step 106: establish the association from file f to file f' in the parallel file system so that the business system can read f transparently. The association between the original file f and the migrated file f' is implemented by creating a symbolic link for the original file f in the parallel file system and setting the link target to the path of the migrated file f'.
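Steps 101 to 106 can be sketched as a single scan pass. This is a simplified illustration under stated assumptions: `migrate_pass` and its parameters are hypothetical names, both file systems are treated as ordinary local directories, `dedup_root` must lie outside `pfs_root`, and a plain file copy stands in for the processing by the deduplication engine.

```python
import os
import shutil


def migrate_pass(pfs_root: str, dedup_root: str, meets_condition) -> list:
    """Run one migration pass and return the paths that were migrated.

    meets_condition is a caller-supplied predicate taking a file path.
    """
    migrated = []
    for dirpath, _dirs, files in os.walk(pfs_root):      # step 101: scan
        for name in files:                                # steps 102-103
            src = os.path.join(dirpath, name)
            if os.path.islink(src):
                continue                                  # already migrated
            if not meets_condition(src):                  # step 104
                continue
            dst = os.path.join(dedup_root, os.path.relpath(src, pfs_root))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)   # step 105: stand-in for deduplicated storage
            os.remove(src)           # step 105: delete the original data
            os.symlink(dst, src)     # step 106: link f to the migrated f'
            migrated.append(src)
    return migrated
```

Because the original path becomes a symbolic link, an application that opens the original path still reads the same bytes, which is the transparency property the description relies on. A production version would also need to handle failures between the copy, delete and link operations.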
Fig. 4 shows the flow by which the deduplication processing engine 6 deduplicates the data to be migrated from the parallel file system. The steps are as follows:
Step 201: suppose file f in the parallel file system is to be migrated into the deduplication file system, producing the corresponding migrated file f'. The deduplication processing engine 6 reads file f from the parallel file system according to its path, splits the data of file f into blocks, and then reads the data blocks one at a time.
Step 202: compute the fingerprint of the data block.
Step 203: determine whether the fingerprint exists in the data block index table. If it does, the data block is a duplicate, so perform step 206; otherwise the data block is new, so perform step 204.
Step 204: store the data block in the data block repository and generate the tuple for this data block in the data block information table. The data block repository resides on storage device 4.
Step 205: generate the tuple for this data block in the data block index table, then go to step 207. The tuple format of the data block index table is < data block fingerprint, file containing the data block, offset of the data block in the file, data block length, data block reference count >.
Step 206: update the reference count in the tuple for this data block in the data block index table.
Step 207: record the fingerprint of this data block in the tuple for file f' in the data block mapping table.
Step 208: determine whether all data blocks of file f have been read. If so, delete the data of file f and end this deduplication operation; otherwise read the next data block and go to step 202.
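The loop in steps 201 to 208 can be sketched with in-memory stand-ins for the three structures. The sketch makes several assumptions that are not in the source: fixed-size chunks, SHA-256 fingerprints, a Python `dict` for the index and mapping tables, and a single `bytearray` as the data block repository (so the "containing file" field of the index tuple is omitted). The name `deduplicate` is hypothetical.

```python
import hashlib


def deduplicate(file_id, data: bytes, index: dict, mapping: dict,
                store: bytearray, chunk_size: int = 4096) -> list:
    """Deduplicate one file and return its ordered list of fingerprints."""
    recipe = []
    for i in range(0, len(data), chunk_size):          # step 201: next block
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()          # step 202: fingerprint
        if fp in index:                                 # step 203: duplicate?
            index[fp]["ref_count"] += 1                 # step 206: bump count
        else:
            offset = len(store)
            store.extend(chunk)                         # step 204: store block
            index[fp] = {"offset": offset,              # step 205: index tuple
                         "length": len(chunk),
                         "ref_count": 1}
        recipe.append(fp)                               # step 207: record fp
    mapping[file_id] = recipe                           # mapping table tuple
    return recipe
```

For a file made of the blocks A, B, A, only two blocks are written to the repository, the index entry for A ends with a reference count of 2, and the mapping table keeps all three fingerprints in order so the file can be rebuilt later.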
When the business system reads file f from client device 1, file f is first looked up in parallel file system cluster 2. If file f has been migrated into the deduplication file system, and the migrated file is f', the request is redirected to file f' in the deduplication file system according to the association between file f and file f'. Fig. 5 shows the flow for reading a migrated file in the deduplication file system from client device 1. The steps are as follows:
Step 301: redirect to file f' in the deduplication file system according to the association between file f and file f'.
Step 302: find the tuple for file f' in the data block mapping table; the tuple records the fingerprints of the data blocks that make up file f'.
Step 303: read the data block fingerprints one at a time.
Step 304: look up the data block fingerprint in the data block index table, and obtain the physical storage address of the data block from the containing file, the offset of the data block in that file, and the data block length recorded in the tuple found.
Step 305: read the corresponding data block from the data block repository at the physical storage address and copy it into the request buffer.
Step 306: determine whether all data blocks of file f' have been read. If not, perform step 303; otherwise perform step 307.
Step 307: return the data of file f' in the request buffer to client device 1 through the deduplication file system access interface 5.
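Steps 302 to 307 amount to walking the file's fingerprint list and concatenating the referenced blocks. The sketch below assumes in-memory stand-ins: `mapping` maps a file identifier to its ordered fingerprints, `index` maps a fingerprint to a (containing file, offset, length) triple, and `containers` maps a containing-file name to its bytes. The function name `read_migrated` and these structures are illustrative assumptions.

```python
def read_migrated(file_id, mapping: dict, index: dict, containers: dict) -> bytes:
    """Rebuild a migrated file from the data block repository."""
    buffer = bytearray()                                   # request buffer
    for fp in mapping[file_id]:                            # steps 302-303
        container, offset, length = index[fp]              # step 304: address
        block = containers[container][offset:offset + length]  # step 305
        buffer += block
    return bytes(buffer)                                   # step 307: return
```

The loop ending when the fingerprint list is exhausted plays the role of the check in step 306. A block that is referenced several times by the same file is simply read once per occurrence.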
It should be noted and understood that various modifications and improvements can be made to the invention described in detail above without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the claimed technical solution is therefore not limited by any specific exemplary teaching given here.

Claims (2)

1. An architecture for a deduplication file system combined with a parallel file system, comprising a client device, a parallel file system cluster and a storage device, wherein a business system runs on the client device and generates data streams, a parallel file system is deployed on the parallel file system cluster and exposes a parallel file system access interface, and the storage device stores data, characterized in that the deduplication file system combined with a parallel file system further comprises a deduplication gateway cluster; the deduplication gateway cluster comprises one or more deduplication gateways; a deduplication processing engine and a data migration system are deployed on the deduplication gateway; the deduplication processing engine performs deduplication and restoration on the data stored in the parallel file system; the data migration system migrates data that meets the migration condition in the parallel file system into the deduplication file system for storage; and the client device and the deduplication gateway read, write and delete the data stored in the parallel file system through the parallel file system access interface;
the deduplication processing engine processes data as follows: first, it reads the data, splits the data into blocks, and computes a fingerprint for each data block; it then looks up each fingerprint in the data block index table; if the fingerprint is found, the data block already exists and is not stored again; otherwise the data block is new, so it is stored in the data block repository and a corresponding tuple is generated in the data block index table; the data block index table is used for duplicate detection, and its tuple format is < data block fingerprint, file containing the data block, offset of the data block in the file, data block length, data block reference count >; the data block repository is located on the storage device;
through the parallel file system access interface, the data migration system periodically scans the files in the parallel file system, migrates files that meet the migration condition into the deduplication file system, and establishes in the parallel file system an association between each original file and its migrated file; the migrated file is stored after being processed by the deduplication processing engine, and a tuple for the migrated file is generated in the data block mapping table; each tuple has the format < file unique identifier, ChunkFP_1, ChunkFP_2, ..., ChunkFP_i, ... >, where ChunkFP_i denotes the fingerprint of the i-th data block; through the deduplication file system access interface, the client device accesses files that have been migrated from the parallel file system devices into the deduplication file system, specifically: in the parallel file system, the request is redirected, according to the association between the original file and the migrated file, to the migrated file in the deduplication file system; the fingerprints of the data blocks that make up the migrated file are found in the data block mapping table; for each fingerprint, the storage address of the corresponding data block is found in the data block index table; the data is read from the data block repository; and the data read is returned to the client device through the deduplication file system access interface.
2. A data de-duplication method combined with a parallel file system, based on the framework of claim 1, characterized in that it comprises the following three processes:
First aspect: the data mover system periodically scans the parallel file system to obtain a list of files that have not yet been migrated; for each file in the list, it determines whether the file meets the migration condition and, if so, migrates the file into the data de-duplication file system and establishes in the parallel file system an association between the original file and the migrated file;
Second aspect: the data de-duplication processing engine processes each file to be migrated from the parallel file system: it reads the file's data, divides the data into blocks, and computes the fingerprint of each data block; it then looks up the fingerprint of each data block in the data block index table; if the fingerprint is found, the data block is not stored again; otherwise the data block is new, so it is stored and a corresponding tuple is generated in the data block index table; the data block index table is used to check data blocks for duplicates, and its tuple format is <data block fingerprint, file containing the data block, offset of the data block within that file, data block length, data block reference count>;
Third aspect: a client device reads a file that has been migrated from the parallel file system into the data de-duplication file system, specifically: in the parallel file system, according to the association between the original file f and the migrated file f', the request is redirected to the migrated file f' in the data de-duplication file system; the fingerprints of the data blocks that make up file f' are looked up in the data block mapping table; the physical storage address of each corresponding data block is found in the data block index table according to its fingerprint; the data blocks are fetched from the data block warehouse and copied into the request buffer; and the data of file f' in the request buffer are returned to the client device through the data de-duplication file system access interface.
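The migration scan and the de-duplicating store of claim 2 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: fixed-size chunking, SHA-256 fingerprints, incrementing the reference count on every duplicate hit, the container name `warehouse-0`, and all class and function names are hypothetical choices that the claim text does not fix. The sketch also leaves the original file's data in place, since the claim only requires an association.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # assumption: fixed-size blocks, kept tiny for illustration


@dataclass
class IndexEntry:
    container: str  # file in the warehouse that holds the block
    offset: int     # offset of the block within that file
    length: int     # block length
    refcount: int   # number of references to the block


@dataclass
class DedupStore:
    index: dict = field(default_factory=dict)    # fingerprint -> IndexEntry
    mapping: dict = field(default_factory=dict)  # file id -> [fingerprints]
    warehouse: bytearray = field(default_factory=bytearray)

    def ingest(self, file_id: str, data: bytes) -> None:
        """Second aspect: chunk, fingerprint, and store only new blocks."""
        fps = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            entry = self.index.get(fp)
            if entry is None:
                # New block: append it and create its index tuple.
                self.index[fp] = IndexEntry(
                    "warehouse-0", len(self.warehouse), len(chunk), 1)
                self.warehouse += chunk
            else:
                # Duplicate: do not store again (refcount bump is assumed).
                entry.refcount += 1
            fps.append(fp)
        self.mapping[file_id] = fps

    def read(self, file_id: str) -> bytes:
        """Third aspect: rebuild a file from its fingerprints."""
        out = bytearray()
        for fp in self.mapping[file_id]:
            e = self.index[fp]
            out += self.warehouse[e.offset:e.offset + e.length]
        return bytes(out)


def migrate_pass(pfs: dict, links: dict, store: DedupStore, should_migrate) -> None:
    """First aspect: one scan over files (path -> bytes) not yet migrated."""
    for path in sorted(pfs):
        if path in links:
            continue  # already migrated in an earlier pass
        if should_migrate(path, pfs[path]):
            file_id = "dedup:" + path  # hypothetical identifier scheme
            store.ingest(file_id, pfs[path])
            links[path] = file_id      # association original -> migrated
```

A read of a migrated file then goes through the association first, for example `store.read(links[path])`. The migration condition is passed in as a predicate because the claim does not say what it tests; file age or size are plausible candidates.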
CN201310168444.1A 2013-05-06 2013-05-06 A kind of framework and method with the data de-duplication file system be combined with parallel file system Active CN103279502B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310168444.1A CN103279502B (en) 2013-05-06 2013-05-06 A kind of framework and method with the data de-duplication file system be combined with parallel file system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310168444.1A CN103279502B (en) 2013-05-06 2013-05-06 A kind of framework and method with the data de-duplication file system be combined with parallel file system

Publications (2)

Publication Number Publication Date
CN103279502A CN103279502A (en) 2013-09-04
CN103279502B true CN103279502B (en) 2016-01-20

Family

ID=49062022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310168444.1A Active CN103279502B (en) 2013-05-06 2013-05-06 A kind of framework and method with the data de-duplication file system be combined with parallel file system

Country Status (1)

Country Link
CN (1) CN103279502B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617177A (en) * 2013-11-05 2014-03-05 浪潮(北京)电子信息产业有限公司 Stackable repeating data deletion file system
CN104298614B (en) * 2014-09-30 2017-08-11 华为技术有限公司 Data block storage method and storage device in storage device
CN105653582A (en) * 2015-12-21 2016-06-08 联想(北京)有限公司 File management method of electronic equipment and electronic equipment
CN106383670B (en) * 2016-09-21 2020-02-14 华为技术有限公司 Data processing method and storage device
CN108234542A (en) * 2016-12-14 2018-06-29 中国航空工业集团公司西安航空计算技术研究所 A kind of airborne file network implementation method
CN108268216B (en) * 2018-01-05 2019-11-12 新华三技术有限公司 Data processing method, device and server
CN110109617B (en) * 2019-04-22 2020-05-12 电子科技大学 Efficient metadata management method in encrypted repeated data deleting system
CN110688360A (en) * 2019-09-17 2020-01-14 济南浪潮数据技术有限公司 Distributed file system storage management method, device, equipment and storage medium
CN113590535B (en) * 2021-09-30 2021-12-17 中国人民解放军国防科技大学 Efficient data migration method and device for deduplication storage system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101916171A (en) * 2010-07-16 2010-12-15 中国科学院计算技术研究所 Concurrent hierarchy type replicated data eliminating method and system
CN103051671A (en) * 2012-11-22 2013-04-17 浪潮电子信息产业股份有限公司 Repeating data deletion method for cluster file system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103034659B (en) * 2011-09-29 2015-08-19 国际商业机器公司 A kind of method and system of data de-duplication

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Extreme binning: scalable, parallel deduplication for chunk-based file backup; Deepavali Bhagwat et al.; 《IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems》; 20091231; pp. 1-9 *
A pipeline-based read performance optimization method for data de-duplication systems; 李超 et al.; 《计算机研究与发展》; 20130131; Vol. 50, No. 1; pp. 90-100 *

Also Published As

Publication number Publication date
CN103279502A (en) 2013-09-04

Similar Documents

Publication Publication Date Title
CN103279502B (en) A kind of framework and method with the data de-duplication file system be combined with parallel file system
CN103020315B (en) A kind of mass small file storage method based on master-slave distributed file system
CN101354726B (en) Method for managing memory metadata of cluster file system
CN102035881B (en) Data caching method of cloud storage system
CN102591946B (en) It is divided using index and coordinates to carry out data deduplication
CN103473239B (en) A kind of data of non relational database update method and device
CN103229173B (en) Metadata management method and system
CN103106249B (en) A kind of parallel data processing system based on Cassandra
CN100565512C (en) Eliminate the system and method for redundant file in the document storage system
CN101866358B (en) Multidimensional interval querying method and system thereof
CN102629247B (en) Method, device and system for data processing
CN105069111B (en) Block level data duplicate removal method based on similitude in cloud storage
CN103034684A (en) Optimizing method for storing virtual machine mirror images based on CAS (content addressable storage)
CN102169507A (en) Distributed real-time search engine
US20100281077A1 (en) Batching requests for accessing differential data stores
CN104932841A (en) Saving type duplicated data deleting method in cloud storage system
US11249899B2 (en) Filesystem management for cloud object storage
CN103139300A (en) Virtual machine image management optimization method based on data de-duplication
CN101986649B (en) Shared data center used in telecommunication industry billing system
CN104317966A (en) Dynamic indexing method applied to quick combined querying of big electric power data
CN104820717A (en) Massive small file storage and management method and system
CN104408111A (en) Method and device for deleting duplicate data
CN104239377A (en) Platform-crossing data retrieval method and device
CN101866359A (en) Small file storage and visit method in avicade file system
KR20130049111A (en) Forensic index method and apparatus by distributed processing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant