CN109213738A - Cloud storage file-level data deduplication retrieval system and method


Info

Publication number
CN109213738A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811384763.5A
Other languages
Chinese (zh)
Other versions
CN109213738B (en)
Inventor
董志勇
邱琳
赵航
刘梦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beacon Fire Technology Group Co Ltd
Wuhan Ligong Guangke Co Ltd
Original Assignee
Beacon Fire Technology Group Co Ltd
Wuhan Ligong Guangke Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beacon Fire Technology Group Co Ltd and Wuhan Ligong Guangke Co Ltd
Priority to CN201811384763.5A
Publication of CN109213738A
Application granted
Publication of CN109213738B
Legal status: Active
Anticipated expiration


Abstract

The invention discloses a cloud storage file-level data deduplication retrieval system and method. The method stores the characteristic information of files on a fingerprint server. When a client requests to store a file, coarse filtering is performed first by searching the fingerprint server: if no file record with the same characteristics is found, the file is treated as a new file. If matching records are found, fine filtering follows: the matched files are treated as comparison files, and for each comparison file in turn, random intervals and characteristic intervals are selected and compared exactly to confirm whether the requested file already exists. If it does, the name server sets the metadata of the requested file to point to the metadata of the comparison file; if it does not, the file is stored and its characteristic information is recorded on the fingerprint server. Through this two-step coarse and fine filtering, the invention substantially reduces the storage of duplicate files, offers high execution efficiency and a high deduplication rate, and is suited to big data and cloud storage environments.

Description

Cloud storage file-level data deduplication retrieval system and method
Technical field
The invention relates to the fields of duplicate data deletion in computer storage, cloud storage and retrieval, and in particular to a cloud storage file-level data deduplication retrieval system and method.
Background
The rapid development of the Internet has produced massive amounts of data, and scenarios involving the transmission and storage of such data keep multiplying. Against this background, data storage technology has developed rapidly, and data deduplication and compression are techniques that can save a large amount of storage. Data deduplication identifies duplicate content, removes it, and leaves a pointer at the corresponding storage location, minimizing the data volume. At present only a few mainstream arrays offer deduplication as an additional product feature. Duplicate data wastes valuable cloud resources and generates overhead; reportedly, fewer than 5% of disk arrays truly support online deduplication and compression, yet the space saved by deduplication is considerable. Deleting duplicate data requires comparing files, and because a storage system holds a large number of files, comparison efficiency inevitably suffers. The file-level deduplication method proposed by the invention eliminates data redundancy, reduces the storage capacity required, and effectively addresses the problem of file comparison efficiency.
Summary of the invention
The technical problem to be solved by the invention is that, in the prior art, duplicate data in cloud storage wastes valuable cloud resources and generates overhead, and that comparing duplicate files is inefficient. To address this, the invention provides a cloud storage file-level data deduplication retrieval system and method.
The technical solution adopted by the invention to solve this technical problem is as follows:
The invention provides a cloud storage file-level data deduplication retrieval system comprising a client, a cloud storage platform, a fingerprint server and a name server, the cloud storage platform consisting of multiple data nodes, wherein:
The data nodes are connected to the fingerprint server through the name server. The fingerprint server stores the characteristic information of the files held on the data nodes. The client sends requests to look up and filter files. During file filtering, files are first coarsely filtered by their characteristic information. After coarse filtering, if a file needs further confirmation, the name server generates a fine-filtering task and hands it to the data nodes, which perform the second filtering.
Further, the characteristic information of the invention comprises the local fingerprint, size, metadata pointer and characteristic intervals of a file.
Further, fingerprints for the data on the fingerprint server are extracted using MD5 to eliminate redundant data blocks, and further deduplication is then performed on the name server. Each extracted fingerprint is stored as a key-value pair: the key is the file's local fingerprint, and the value is the file's size, metadata pointer and characteristic intervals.
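The key-value layout described above can be sketched as a small in-memory index. This is only an illustration under assumptions: the names FileRecord, FingerprintIndex, register and coarse_filter are hypothetical and do not appear in the patent, and the lookup (match on fingerprint, then on size) follows the coarse-filtering step of the method rather than any concrete storage engine.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Value stored under one local-fingerprint key (hypothetical layout)."""
    size: int                 # file size in bytes
    metadata_pointer: str     # reference to the file's metadata on the name server
    char_intervals: list = field(default_factory=list)  # [(start, end), ...]


class FingerprintIndex:
    """In-memory stand-in for the fingerprint server's key-value store."""

    def __init__(self):
        # local fingerprint -> list of records sharing that fingerprint
        self._records = defaultdict(list)

    def register(self, fingerprint, record):
        self._records[fingerprint].append(record)

    def coarse_filter(self, fingerprint, size):
        """Return the stored records whose fingerprint and size both match."""
        return [r for r in self._records.get(fingerprint, []) if r.size == size]
```

An empty result from coarse_filter corresponds to the "new file" case; a non-empty result is the candidate set handed to fine filtering.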
Further, the local fingerprint of a file is the file signature obtained by hashing the head and tail of the file. If the file is too small for a head-and-tail hash, the signature is taken over the entire file.
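A minimal sketch of this local fingerprint, using MD5 from Python's standard hashlib module. The part_size parameter, its default of 4096 bytes, and the choice to merge the two signatures by concatenation are assumptions; the patent only says the head and tail portions have equal size and that their signatures are merged.

```python
import hashlib


def local_fingerprint(data: bytes, part_size: int = 4096) -> str:
    """MD5 over the head and tail of a file, or over the whole file if it is small.

    part_size is a hypothetical tunable; the patent leaves the size open.
    """
    if len(data) < 2 * part_size:
        # Too small for separate head and tail parts: hash the entire file.
        return hashlib.md5(data).hexdigest()
    head_sig = hashlib.md5(data[:part_size]).hexdigest()
    tail_sig = hashlib.md5(data[-part_size:]).hexdigest()
    # Assumed merge rule: concatenate the head and tail signatures.
    return head_sig + tail_sig
```

Two files with the same head and tail but different middles get the same local fingerprint, which is why the method follows the coarse match with an exact comparison of selected intervals.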
Further, the characteristic intervals of a file are the difference intervals produced when the file to be uploaded is compared exactly with similar files. A similar file is a file that has, in part or in whole, the same fingerprint and file size as the file to be uploaded.
Further, the name server determines the number of random intervals from the file size and the number of characteristic intervals, and determines the positions of the random intervals from how the file is stored.
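One way to realize this selection is sketched below. It is an illustration, not the patent's procedure: the function name, the fixed grid of candidate slots, and the parameters interval_size, per_mib and seed are all assumptions. The sketch keeps the stated constraints: the total number of checked intervals grows in proportion to file size, random intervals have a fixed size, and they do not overlap the characteristic intervals. It ignores the storage layout, which the patent also takes into account.

```python
import random


def pick_random_intervals(file_size, char_intervals,
                          interval_size=1024, per_mib=4, seed=None):
    """Choose fixed-size check intervals that avoid the characteristic intervals.

    Intervals are half-open (start, end) byte ranges. interval_size and
    per_mib are hypothetical tunables; the patent leaves both open.
    """
    rng = random.Random(seed)
    # Total checks (characteristic + random) scale with file size.
    total = max(1, file_size * per_mib // (1 << 20))
    wanted = max(0, total - len(char_intervals))

    # Candidate slots on a fixed grid that do not touch a characteristic interval.
    slots = []
    for start in range(0, file_size - interval_size + 1, interval_size):
        end = start + interval_size
        if all(end <= cs or start >= ce for cs, ce in char_intervals):
            slots.append((start, end))

    return sorted(rng.sample(slots, min(wanted, len(slots))))
```

Drawing from a grid also keeps the random intervals disjoint from one another, at the cost of enumerating every slot; a production version would sample offsets directly.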
Further, a data node receives the comparison request passed on by the name server, receives the comparison data, performs the comparison over the specified comparison intervals, and reports the comparison result.
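The data node's part can be sketched as a plain byte-range comparison. The function name and argument layout are hypothetical; the behavior (report success, or report failure together with the first interval that did not match) follows the description of the data node in the method steps.

```python
def compare_intervals(stored: bytes, intervals, chunks):
    """Compare client-supplied chunks against the stored file, interval by interval.

    intervals: half-open (start, end) byte ranges.
    chunks: the client's bytes for each range, in the same order.
    Returns (True, None) on a full match, otherwise (False, first_failed_interval).
    """
    for (start, end), chunk in zip(intervals, chunks):
        if stored[start:end] != chunk:
            return False, (start, end)
    return True, None
```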
The invention also provides a cloud storage file-level data deduplication retrieval method, comprising the following steps:
S1. The client selects the head and tail of the file and hashes them, obtaining the file signature by MD5 fingerprint extraction; this signature is the local fingerprint of the file;
Because hash-based MD5 fingerprint extraction is fast and has low CPU usage, fingerprints for the data on the fingerprint server are extracted using MD5 to eliminate redundant data blocks, and further deduplication is then performed on the name server. Each extracted fingerprint is stored as a key-value pair: the key is the file's local fingerprint, and the value is the file's size, metadata pointer and characteristic intervals.
S2. The size and signature of the file to be uploaded are sent to the fingerprint server, which performs coarse filtering for the file to be stored: it retrieves all files corresponding to the fingerprint, compiles their file information, and returns the resulting statistics to the client;
S3. The client receives the file information returned by the fingerprint server. If the number of files is 0, coarse filtering found no matching characteristic information in the fingerprint database and the file to be uploaded is a completely new file. The client sends a storage request carrying the file's local fingerprint to the name server, which determines the storage location of the file and registers the file's characteristic information with the fingerprint server;
S4. If the number of files is not 0, coarse filtering matched the file's characteristic information in the fingerprint database. The client enters the loop verification stage and sends file comparison requests one at a time, each carrying a file metadata pointer and characteristic intervals, so that the file to be stored is further fine-filtered;
S5. The name server receives the verification request sent by the client, locates the file metadata by the file metadata pointer or index, and sets the number and distribution of random check intervals according to how the file is stored and its characteristic intervals. The sum of the number of random check intervals and the number of characteristic intervals should be proportional to the file size, with the ratio set as circumstances require. Characteristic intervals do not overlap random intervals, and the size of a random interval is a fixed value, also set as circumstances require. The name server sends the computed random intervals to the client, and preparation for exact file comparison begins;
S6. The client sends the data of the characteristic intervals and random intervals to the name server. The name server forwards the data and the check intervals to the data node, which performs the exact comparison, and waits for the data node to return the check result;
S7. The data node receives the check interval information and the check data and compares the data within the check intervals exactly. If the comparison succeeds, it returns a success flag; if it fails, it returns a failure flag and returns the information of the first interval that failed to the name server;
S8. The name server collects the comparison results. If the comparison succeeds completely, it adds new file metadata that points to the fully matching file and informs the client that the file was found and has been stored;
S9. If the comparison fails, the name server caches the information of the failed interval and asks the client to start the next file comparison;
S10. The client sends a new comparison request and the above comparison steps are repeated. If all comparisons have finished and the name server has not returned a successful match, the client reports that comparison is complete and applies to store the file;
S11. After receiving the comparison-complete notice and the storage application, the name server allocates a storage location for the new file and informs the client that it is ready. The client sends the file and the name server begins storing it;
S12. After the file has been stored, the failed comparison intervals cached for this file during comparison become the file's relative characteristic intervals. If some of these intervals intersect, only part of them is retained so that the characteristic intervals are mutually disjoint. If there are too many characteristic intervals, a subset is selected so that their number does not exceed the set limit;
S13. The characteristic intervals, together with the file's local fingerprint, file size and file metadata pointer, are registered on the fingerprint server, and the client is notified that the file transfer is complete.
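The post-processing of cached mismatch intervals in step S12 can be sketched as follows. The patent does not say which of two intersecting intervals survives or how the subset is chosen when there are too many, so this sketch makes two labeled assumptions: the interval that starts earlier is kept, and the list is simply truncated to max_count (a hypothetical limit).

```python
def build_char_intervals(failed_intervals, max_count=8):
    """Turn cached mismatch intervals into a disjoint, bounded characteristic set.

    Assumption: when two intervals intersect, the earlier-starting one is kept
    and the other is dropped. max_count is a hypothetical limit.
    Intervals are half-open (start, end) byte ranges.
    """
    kept = []
    for start, end in sorted(set(failed_intervals)):
        # Input is sorted by start, so checking the last kept interval suffices.
        if not kept or start >= kept[-1][1]:
            kept.append((start, end))
    return kept[:max_count]
```

The resulting list is what would be registered on the fingerprint server in step S13 alongside the local fingerprint, size and metadata pointer.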
The beneficial effects of the invention are as follows: through two-step coarse and fine filtering, the cloud storage file-level data deduplication retrieval system and method substantially reduce the storage of duplicate files. The method executes efficiently, achieves a high deduplication rate, can quickly report whether a file is a duplicate, and is well suited to mass data storage and cloud storage environments.
Brief description of the drawings
The invention is further described below with reference to the drawings and embodiments. In the drawings:
Fig. 1 is a system block diagram of an embodiment of the invention;
Fig. 2 is a method flowchart of an embodiment of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
As shown in Fig. 1, the cloud storage file-level data deduplication retrieval system of the embodiment of the invention comprises a client, a cloud storage platform, a fingerprint server and a name server, the cloud storage platform consisting of multiple data nodes, wherein:
The data nodes are connected to the fingerprint server through the name server. The fingerprint server stores the characteristic information of the files held on the data nodes. The client sends requests to look up and filter files. During file filtering, files are first coarsely filtered by their characteristic information. After coarse filtering, if a file needs further confirmation, the name server generates a fine-filtering task and hands it to the data nodes, which perform the second filtering.
A fingerprint server is introduced to store the characteristic information of files, including each file's local fingerprint, file size, relative characteristic intervals and metadata pointer.
The invention requires that the client be able to communicate with the fingerprint server. When submitting a file upload request, the client first computes the file's local fingerprint and size and hands them to the fingerprint server for lookup. The fingerprint server performs a coarse-grained file comparison, which implements coarse filtering, and returns the result to the client. Depending on the result, the client sends the name server either a further comparison request or a file storage request. For fine filtering, the name server receives from the client the metadata pointer and characteristic intervals of a possibly duplicate file. It first queries how that file is stored, then weighs factors such as file size, storage partitioning and the number of characteristic intervals to select random comparison intervals, which it returns to the client. The client extracts the corresponding parts of the file according to the returned intervals and transmits them. The name server forwards the data to the data node, which performs the comparison. The data node reports to the name server whether the comparison succeeded and, on failure, the first interval that did not match. The name server then tells the client whether the file is a duplicate and what to do next, completing the fine filtering.
The specific execution procedure of the method is as follows:
Step 1. The client selects the head and tail of the file, hashes each to obtain the head and tail hash signatures, and merges them. The head and tail portions are the same size, and the specific size can be set as circumstances require. If the file is too small, the hash signature of the entire file is taken directly. The client caches the hash signature.
Step 2. The size and signature of the file to be uploaded are sent to the fingerprint server. The fingerprint server retrieves all files corresponding to the fingerprint and then compares file sizes. It counts the files whose fingerprint and size are both identical and returns that count to the client, together with information such as each file's metadata index or pointer and its characteristic intervals.
Step 3. The client receives the information returned by the fingerprint server and first checks whether the number of files is 0. If it is 0, the file is a completely new file: the client sends a storage request carrying the file's local fingerprint to the name server, which determines the storage location of the file and registers the file's characteristic information with the fingerprint server.
Step 4. If the number of possibly duplicate files received by the client is not 0, the client enters the loop verification stage and sends file comparison requests one at a time, each carrying a file metadata pointer and characteristic intervals.
Step 5. The name server receives the verification request sent by the client, locates the file metadata by the file metadata pointer or index, and sets the number and distribution of random check intervals according to how the file is stored, its characteristic intervals and similar factors. The sum of the number of random check intervals and the number of characteristic intervals should be proportional to the file size, and the ratio can be set as circumstances require. Characteristic intervals should overlap random intervals as little as possible. The size of a random interval is a fixed value that can also be set as circumstances require. The name server sends the computed random intervals to the client, and preparation for exact file comparison begins.
Step 6. The client sends the data of the characteristic intervals and random intervals to the name server. The name server forwards the data and the check intervals to the data node, which performs the exact comparison, and waits for the data node to return the check result.
Step 7. The data node receives the check interval information and the check data and compares the data within the check intervals exactly. If the comparison succeeds, it returns a success flag; if it fails, it returns a failure flag and returns the information of the first interval that failed to the name server.
Step 8. The name server collects the comparison results. If the comparison succeeds completely, it adds new file metadata that points to the fully matching file and informs the client that the file was found and has been stored.
Step 9. If the comparison fails, the name server caches the information of the failed interval and asks the client to start the next file comparison.
Step 10. The client sends a new comparison request and the above comparison steps are repeated. If all comparisons have finished and the name server has not returned a successful match, the client reports that comparison is complete and applies to store the file.
Step 11. After receiving the comparison-complete notice and the storage application, the name server allocates a storage location for the new file and informs the client that it is ready. The client sends the file and the name server begins storing it.
Step 12. After the file has been stored, the failed comparison intervals cached for this file during comparison become the file's relative characteristic intervals. Some of these intervals may intersect; in that case only part of them is retained so that the characteristic intervals are mutually disjoint. If there are too many characteristic intervals, a subset is selected so that their number stays within a certain limit.
Step 13. The characteristic intervals, together with the file's local fingerprint, file size, file metadata pointer and so on, are registered on the fingerprint server, and the client is notified that the file transfer is complete.
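The loop verification stage (steps 4 to 10) can be sketched as a single function. This is a simplification under stated assumptions: the client, name server and data node are collapsed into one process, candidate files are passed as in-memory bytes, and the name check_candidates and the pick_intervals callback are hypothetical. The callback stands in for the name server's choice of random intervals.

```python
def check_candidates(candidates, upload: bytes, pick_intervals):
    """Compare an upload against each coarse-filter candidate in turn.

    candidates: list of (metadata_pointer, stored_bytes, char_intervals).
    pick_intervals: callable(stored_bytes, char_intervals) returning the extra
        random intervals to check, as half-open (start, end) byte ranges.
    Returns (matched_pointer or None, failed_intervals), where failed_intervals
    holds the first mismatching interval of every rejected candidate.
    """
    failed = []
    for pointer, stored, char_intervals in candidates:
        mismatch = None
        to_check = list(char_intervals) + list(pick_intervals(stored, char_intervals))
        for start, end in to_check:
            if stored[start:end] != upload[start:end]:
                mismatch = (start, end)   # first failed interval for this candidate
                break
        if mismatch is None:
            return pointer, failed        # duplicate found: point new metadata here
        failed.append(mismatch)           # cached for later use as characteristic intervals
    return None, failed                   # no candidate matched: store as a new file
```

Because only selected intervals are compared, a match here means the upload agrees with the stored file on every checked range, not that the two files were compared byte for byte; checking the characteristic intervals first targets the regions where earlier look-alike files differed.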
It should be understood that those of ordinary skill in the art may make modifications or changes based on the above description, and all such modifications and changes fall within the scope of protection of the appended claims of the invention.

Claims (8)

1. A cloud storage file-level data deduplication retrieval system, characterized in that the system comprises a client, a cloud storage platform, a fingerprint server and a name server, the cloud storage platform consisting of multiple data nodes, wherein:
the data nodes are connected to the fingerprint server through the name server; the fingerprint server stores the characteristic information of the files held on the data nodes; the client sends requests to look up and filter files; during file filtering, files are first coarsely filtered by their characteristic information; after coarse filtering, if a file needs further confirmation, the name server generates a fine-filtering task and hands it to the data nodes, which perform the second filtering.
2. The cloud storage file-level data deduplication retrieval system according to claim 1, characterized in that the characteristic information comprises the local fingerprint, size, metadata pointer and characteristic intervals of a file.
3. The cloud storage file-level data deduplication retrieval system according to claim 1, characterized in that fingerprints for the data on the fingerprint server are extracted using MD5 to eliminate redundant data blocks, and further deduplication is then performed on the name server, wherein each extracted fingerprint is stored as a key-value pair: the key is the file's local fingerprint, and the value is the file's size, metadata pointer and characteristic intervals.
4. The cloud storage file-level data deduplication retrieval system according to claim 2, characterized in that the local fingerprint of a file is the file signature obtained by hashing the head and tail of the file; if the file is too small for a head-and-tail hash, the signature is taken over the entire file.
5. The cloud storage file-level data deduplication retrieval system according to claim 2, characterized in that the characteristic intervals of a file are the difference intervals produced when the file to be uploaded is compared exactly with similar files; a similar file is a file that has, in part or in whole, the same fingerprint information and file size as the file to be uploaded.
6. The cloud storage file-level data deduplication retrieval system according to claim 2, characterized in that the name server determines the number of random intervals from the file size and the number of characteristic intervals, and determines the positions of the random intervals from how the file is stored.
7. The cloud storage file-level data deduplication retrieval system according to claim 1, characterized in that a data node receives the comparison request passed on by the name server, receives the comparison data, performs the comparison over the specified comparison intervals, and reports the comparison result.
8. A data deduplication retrieval method using the cloud storage file-level data deduplication retrieval system according to claim 1, characterized in that the method comprises the following steps:
S1. the client selects the head and tail of the file and hashes them, obtaining the file signature by MD5 fingerprint extraction; this signature is the local fingerprint of the file;
S2. the size and signature of the file to be uploaded are sent to the fingerprint server, which performs coarse filtering for the file to be stored: it retrieves all files corresponding to the fingerprint, compiles their file information, and returns the resulting statistics to the client;
S3. the client receives the file information returned by the fingerprint server; if the number of files is 0, coarse filtering found no matching characteristic information in the fingerprint database and the file to be uploaded is a completely new file; the client sends a storage request carrying the file's local fingerprint to the name server, which determines the storage location of the file and registers the file's characteristic information with the fingerprint server;
S4. if the number of files is not 0, coarse filtering matched the file's characteristic information in the fingerprint database; the client enters the loop verification stage and sends file comparison requests one at a time, each carrying a file metadata pointer and characteristic intervals, so that the file to be stored is further fine-filtered;
S5. the name server receives the verification request sent by the client, locates the file metadata by the file metadata pointer or index, and sets the number and distribution of random check intervals according to how the file is stored and its characteristic intervals; the sum of the number of random check intervals and the number of characteristic intervals should be proportional to the file size, with the ratio set as circumstances require; characteristic intervals do not overlap random intervals, and the size of a random interval is a fixed value, also set as circumstances require; the name server sends the computed random intervals to the client, and preparation for exact file comparison begins;
S6. the client sends the data of the characteristic intervals and random intervals to the name server; the name server forwards the data and the check intervals to a data node of the cloud storage platform, which performs the exact comparison, and waits for the data node to return the check result;
S7. the data node receives the check interval information and the check data and compares the data within the check intervals exactly; if the comparison succeeds, it returns a success flag; if it fails, it returns a failure flag and returns the information of the first interval that failed to the name server;
S8. the name server collects the comparison results; if the comparison succeeds completely, it adds new file metadata that points to the fully matching file and informs the client that the file was found and has been stored;
S9. if the comparison fails, the name server caches the information of the failed interval and asks the client to start the next file comparison;
S10. the client sends a new comparison request and the above comparison steps are repeated; if all comparisons have finished and the name server has not returned a successful match, the client reports that comparison is complete and applies to store the file;
S11. after receiving the comparison-complete notice and the storage application, the name server allocates a storage location for the new file and informs the client that it is ready; the client sends the file and the name server begins storing it;
S12. after the file has been stored, the failed comparison intervals cached for this file during comparison become the file's relative characteristic intervals; if some of these intervals intersect, only part of them is retained so that the characteristic intervals are mutually disjoint; if there are too many characteristic intervals, a subset is selected so that their number does not exceed the set limit;
S13. the characteristic intervals, together with the file's local fingerprint, file size and file metadata pointer, are registered on the fingerprint server, and the client is notified that the file transfer is complete.
CN201811384763.5A 2018-11-20 2018-11-20 Cloud storage file-level repeated data deletion retrieval system and method Active CN109213738B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811384763.5A CN109213738B (en) 2018-11-20 2018-11-20 Cloud storage file-level repeated data deletion retrieval system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811384763.5A CN109213738B (en) 2018-11-20 2018-11-20 Cloud storage file-level repeated data deletion retrieval system and method

Publications (2)

Publication Number Publication Date
CN109213738A (en) 2019-01-15
CN109213738B (en) 2022-01-25

Family

ID=64993843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811384763.5A Active CN109213738B (en) 2018-11-20 2018-11-20 Cloud storage file-level repeated data deletion retrieval system and method

Country Status (1)

Country Link
CN (1) CN109213738B (en)


Patent Citations (10)

Publication number Priority date Publication date Assignee Title
CN101477523A (en) * 2008-11-24 2009-07-08 北京邮电大学 Index structure and retrieval method for ultra-large fingerprint base
US20120191669A1 (en) * 2011-01-25 2012-07-26 Sepaton, Inc. Detection and Deduplication of Backup Sets Exhibiting Poor Locality
CN102156727A (en) * 2011-04-01 2011-08-17 华中科技大学 Method for deleting repeated data by using double-fingerprint hash check
US20120323859A1 (en) * 2011-06-14 2012-12-20 Netapp, Inc. Hierarchical identification and mapping of duplicate data in a storage system
CN103034659A (en) * 2011-09-29 2013-04-10 国际商业机器公司 Repeated data deleting method and system
CN103177111A (en) * 2013-03-29 2013-06-26 西安理工大学 System and method for deleting repeating data
CN104077422A (en) * 2014-07-22 2014-10-01 百度在线网络技术(北京)有限公司 Repeated APK removing method and device in APK downloading
CN104932841A (en) * 2015-06-17 2015-09-23 南京邮电大学 Saving type duplicated data deleting method in cloud storage system
CN107924353A (en) * 2015-10-14 2018-04-17 株式会社日立制作所 The control method of storage system and storage system
CN105955675A (en) * 2016-06-22 2016-09-21 南京邮电大学 Repeated data deletion system and method for de-centralization cloud environment

Non-Patent Citations (2)

Title
SUBASHINI BALACHANDRAN: "Sequence of Hashes Compression in Data De-duplication", 《DATA COMPRESSION CONFERENCE (DCC 2008)》 *
Jia Zhikai et al.: "A Parallel Hierarchical Data Deduplication Technique", 《Journal of Computer Research and Development》 *

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN110096483A (en) * 2019-05-08 2019-08-06 北京奇艺世纪科技有限公司 A kind of duplicate file detection method, terminal and server
CN110096483B (en) * 2019-05-08 2021-04-30 北京奇艺世纪科技有限公司 Duplicate file detection method, terminal and server
CN110636141A (en) * 2019-10-17 2019-12-31 中国人民解放军陆军工程大学 Multi-cloud storage system based on cloud and mist cooperation and management method thereof
CN111177082A (en) * 2019-12-03 2020-05-19 世强先进(深圳)科技股份有限公司 PDF file duplicate removal storage method and system
WO2021164171A1 (en) * 2020-02-17 2021-08-26 平安科技(深圳)有限公司 Method and apparatus for processing data in knowledge base, and computer device and storage medium
CN111294613A (en) * 2020-02-20 2020-06-16 北京奇艺世纪科技有限公司 Video processing method, client and server
CN112347060A (en) * 2020-10-19 2021-02-09 北京天融信网络安全技术有限公司 Data storage method, device and equipment of desktop cloud system and readable storage medium
CN112347060B (en) * 2020-10-19 2023-09-26 北京天融信网络安全技术有限公司 Data storage method, device and equipment of desktop cloud system and readable storage medium
CN112631514A (en) * 2020-12-17 2021-04-09 龙存科技(北京)股份有限公司 File duplicate removal method and system applied to cloud disk system
CN113362046A (en) * 2021-08-10 2021-09-07 北京开科唯识技术股份有限公司 Control method and device for preventing salary generation errors

Also Published As

Publication number Publication date
CN109213738B (en) 2022-01-25

Similar Documents

Publication Publication Date Title
CN109213738A (en) A kind of cloud storage file-level data de-duplication searching system and method
EP3251031B1 (en) Techniques for compact data storage of network traffic and efficient search thereof
CN109656999B (en) Method, device, storage medium and apparatus for synchronizing large data volume data
US8949561B2 (en) Systems, methods, and computer program products providing change logging in a deduplication process
CN107194006A (en) A kind of video features structural management method
CN106598785A (en) File system backup and restoration method and device
CN110188103A (en) Data account checking method, device, equipment and storage medium
US20150066877A1 (en) Segment combining for deduplication
WO2021237467A1 (en) File uploading method, file downloading method and file management apparatus
CN110019873B (en) Face data processing method, device and equipment
CN109669795A (en) Crash info processing method and processing device
CN104965835B (en) A kind of file read/write method and device of distributed file system
CN109271545A (en) A kind of characteristic key method and device, storage medium and computer equipment
CN111522791B (en) Distributed file repeated data deleting system and method
CN105072608B (en) A kind of method and device of administrative authentication token
CN108241639B (en) A kind of data duplicate removal method
CN107181773A (en) Data storage and data managing method, the equipment of distributed memory system
US20140222771A1 (en) Management device and management method
Du et al. Deduplicated disk image evidence acquisition and forensically-sound reconstruction
CN109189813B (en) Data sharing method and device
CN107169065B (en) Method and device for removing specific content
WO2021163856A1 (en) Content pushing method and apparatus, and server and storage medium
CN112988684A (en) Method and system for extracting and de-duplicating electronic official document data based on Hash algorithm
CN106126375B (en) A kind of each version restoration methods of YAFFS2 file based on Hash
CN109688176A (en) A kind of file synchronisation method and terminal, the network equipment, storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant