CN101354708B - Remote file rapid synchronization method - Google Patents

Publication number
CN101354708B
Authority
CN
China
Prior art keywords
file
difference
content
filename
rename
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008100456790A
Other languages
Chinese (zh)
Other versions
CN101354708A (en)
Inventor
胡晓勤
刘晓洁
李涛
赵奎
卢正添
杨频
梁刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu century summit Technology Co., Ltd.
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN2008100456790A
Publication of CN101354708A
Application granted
Publication of CN101354708B
Current legal status: Expired - Fee Related

Abstract

The invention provides a method for rapidly synchronizing remote files, belonging to the technical field of computer networks. The method uses snapshot technology to keep the old and new versions of a set of files on the client, so that difference calculation requires no data requests to the server. When a file is copied or moved on the client, the server replays the operation instead of receiving the file content. The method offers fast difference calculation, low network traffic, and fast difference replay. It accelerates file synchronization between client and server and has broad potential application in file backup and recovery, Web caching, and similar areas.

Description

Remote file rapid synchronization method
One, Technical Field
The present invention proposes a rapid synchronization method for remote files and belongs to the technical field of computer networks.
Two, Background Art
Remote file synchronization is widely used in file backup and recovery, Web caching, and similar applications. The key problem is how to synchronize the file state of the server and the client with as little network traffic as possible. In particular, when the files stored on the server differ only slightly from those stored on the client, synchronization should be fast. Differences between the files on the client and on the server can arise from file creation, copying, deletion, renaming, and content modification. Existing remote file synchronization methods such as rsync generate network traffic during difference calculation even when no file has changed. Moreover, when handling differences introduced by copying or moving files, they transmit the copied or moved files to the server, which causes heavy network traffic.
The file synchronization method described in patent publication No. CN1549130 simply uses copy and overwrite operations to synchronize files and cannot avoid transmitting large amounts of data over the network.
The present invention proposes a rapid synchronization method for remote files with the following characteristics:
1) The client calculates the differences between the old and new versions of the files and generates a file state difference set. The calculation is fast and produces no network traffic.
2) For file copy or move operations on the client, the server replays the operation without transferring the file content, so difference replay is fast.
Three, Summary of the Invention
The present invention proposes a rapid synchronization method for remote files. The basic idea is as follows. Snapshot technology is used to keep the old and new versions of a set of files on the client. The client calculates the differences between the old and new versions and generates a file state difference set, which describes in a uniform way the differences introduced by operations such as content modification and renaming. Here renaming covers any change of the full path name of a file, such as that caused by copying or moving it. The file state difference set is transmitted to the server over the network, and the server replays the client's file operations according to the set, achieving rapid file synchronization from client to server.
Before the invention is described in detail, the following terms are defined:
(1) File $f_i$: a file that needs to be synchronized (a directory name is also treated as a file), comprising an absolute path and a file name.
(2) File set $F$: the set of files that need to be synchronized, $F = \{f_1, f_2, \ldots, f_i, \ldots, f_n\}$. It is denoted $F_1$ at time $t_1$ and $F_2$ at time $t_2$, where $t_1 < t_2$.
(3) File difference set $\Delta F = (F_1 - F_2) \cup (F_2 - F_1)$, where $F_1$ and $F_2$ are the file sets at times $t_1$ and $t_2$. When the file sets do not differ, that is, when no file or directory to be synchronized has been added or removed, $\Delta F = \Phi$.
(4) File state $s_i^t = \langle f_i^p, f_i^c \rangle$: the state of a file at time $t$, comprising its absolute path and file name $f_i^p$ and its file content $f_i^c$.
(5) File state set $S^t$: the states of a set of files at time $t$, expressed as $S^t = \{s_1^t, s_2^t, \ldots, s_i^t, \ldots, s_n^t\}$.
(6) File state difference $\Delta s_i = \langle \Delta f_i^p, \Delta f_i^c \rangle$: the change in state of a file in $F_1$ and $F_2$ between times $t_1$ and $t_2$, expressed as $\Delta s_i^{2,1} = s_i^{t_2} - s_i^{t_1}$. The state of the file at time $t_2$ can be calculated from its state at time $t_1$ and the file state difference: $s_i^{t_2} = \Delta s_i^{2,1} + s_i^{t_1}$.
(7) File state difference set $\Delta S$: the set formed by one or more file state differences $\Delta s_i$ of files in $F_1$ and $F_2$, $\Delta S = \{\Delta s_1, \Delta s_2, \ldots, \Delta s_i, \ldots, \Delta s_n\}$.
(8) Search for files with identical content, $F_R = O_c(f_i^c, f_k^c, F')$: compare the content of file $f_i$ with that of each file $f_k \in F'$. The output is the set $F_R$ of the file names $f_k^p$ of the files whose content is identical, or $F_R = \Phi$ if there is none.
(9) Test for an identical file name, $f^p = O_p(f_i^p, F'')$: determine whether the file name $f_i^p$ of file $f_i$ belongs to $F''$. If $f_i^p \in F''$, output $f^p = f_i^p$; otherwise output empty.
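Definitions (3) and (9) map directly onto set operations. The following is a minimal Python sketch under the assumption that a file set is represented as a set of absolute path strings; the function names `file_difference_set` and `o_p` are hypothetical and do not appear in the patent.

```python
from typing import Optional, Set


def file_difference_set(f1: Set[str], f2: Set[str]) -> Set[str]:
    """Definition (3): Delta F = (F1 - F2) union (F2 - F1)."""
    return (f1 - f2) | (f2 - f1)


def o_p(path: str, names: Set[str]) -> Optional[str]:
    """Definition (9): return the path if it belongs to the set, else None."""
    return path if path in names else None


# Hypothetical snapshots: file sets at t1 and t2.
F1 = {"/data/a.txt", "/data/b.txt"}
F2 = {"/data/b.txt", "/data/c.txt"}
delta_f = file_difference_set(F1, F2)
```

An empty result from `file_difference_set` corresponds to the case in which no file or directory has been added or removed.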
Four, Description of the Drawings
Fig. 1 is a diagram of the working principle of the present invention.
Fig. 2 shows the steps of searching for files with identical content.
Fig. 3 shows the steps of generating the file state difference set.
Fig. 4 shows the steps of storing and transmitting the file state difference set.
Fig. 5 shows the steps of replaying the file state difference set.
Five, Embodiments
The method of the present invention is described in detail below with reference to the drawings.
Fig. 1 is a diagram of the working principle of the present invention. In the figure, the arrow symbol reproduced in the original as inline image G200810045679001D00031 indicates that the files at the two ends of the arrow have the same file name but different content, which corresponds to a content modification on the client. The symbol "→" indicates that the files on the two sides of the arrow have the same content but different file names, which corresponds to a rename operation on the client. The symbol "←" indicates that no file on the left of the arrow has the same file name or content as the file on the right, which corresponds to a file creation on the client. The arrow symbol reproduced in the original as inline image G200810045679001D00032 indicates that no file on the right of the arrow has the same file name or content as the file on the left, which corresponds to a file deletion on the client. In the figure, the client snapshot taken at time $t_1$ describes a file state consistent with that of the server and corresponds to file set $F_1$. The snapshot taken at time $t_2$ represents the client's current file state and corresponds to file set $F_2$, where $t_1 < t_2$. The state difference set $\Delta S$ between the file sets $F_2$ and $F_1$ recorded by the two snapshots is calculated, and the server completes the file synchronization according to $\Delta S$.
Specifically, the remote file rapid synchronization method proposed by the present invention comprises the following steps:
1) searching for files with identical content;
2) generating the file state difference set;
3) storing and transmitting the file state difference set;
4) replaying the file state difference set.
Fig. 2 shows the steps of searching for files with identical content.
Fig. 2 gives the concrete procedure for the search $F_R = O_c(f_i^c, f_k^c, F')$. The following operations are performed for every file in the file set $F'$: determine whether the content of $f_i$ is identical to that of a file $f_k$ in $F'$, that is, whether $f_i^c = f_k^c$, and if it is, add the file name $f_k^p$ of $f_k$ to the set $F_R$. Here $f_i^c$ can be expressed as $\langle f_i^l, f_i^t, f_i^h \rangle$, where $f_i^l$ is the file length, $f_i^t$ is the last modification time of the file, and $f_i^h$ is the file hash value. The concrete steps are as follows:
1) Compare file lengths: check whether $f_i^l$ equals $f_k^l$. If they are not equal, $f_i^c$ and $f_k^c$ differ; go to 4).
2) Compare file hash values: check whether $f_i^h$ and $f_k^h$ are the same. If they differ, $f_i^c$ and $f_k^c$ differ; go to 4).
3) Store the file name in the result set: the two preceding steps have established that $f_i^c$ and $f_k^c$ are identical, so add $f_k^p$ to $F_R$.
4) Check whether the traversal is finished: check whether all files in $F'$ have been traversed. If so, exit; otherwise go to 1).
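The search procedure of Fig. 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the patent does not name a hash function, so SHA-256 is an assumption, and the names `Content`, `describe`, and `search_same_content` are hypothetical.

```python
import hashlib
from typing import Dict, List, NamedTuple


class Content(NamedTuple):
    """f^c = <f^l, f^t, f^h>: length, last modification time, hash value."""
    length: int
    mtime: float
    digest: str


def describe(data: bytes, mtime: float = 0.0) -> Content:
    # Assumption: SHA-256 stands in for the unspecified hash function.
    return Content(len(data), mtime, hashlib.sha256(data).hexdigest())


def search_same_content(target: Content, candidates: Dict[str, Content]) -> List[str]:
    """O_c: names of all candidate files whose content equals the target's."""
    result = []
    for name, other in candidates.items():
        if target.length != other.length:   # step 1: lengths differ
            continue
        if target.digest != other.digest:   # step 2: hash values differ
            continue
        result.append(name)                 # step 3: record the file name
    return result                           # step 4: traversal finished


old_snapshot = {
    "/a": describe(b"hello"),
    "/b": describe(b"world"),
    "/c": describe(b"hello", mtime=5.0),
}
matches = search_same_content(describe(b"hello"), old_snapshot)
```

Comparing lengths first lets most non-matching files be rejected without looking at their hash values. As in the steps above, the modification time is carried in the content description but is not used in the comparison.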
Fig. 3 shows the steps of generating the file state difference set.
Fig. 3 gives the concrete procedure for generating the file state difference set. Given the file sets $F_1$ and $F_2$ at times $t_1$ and $t_2$, the state difference $\Delta s_i$ is calculated for every file $f_i \in F_2$, and all the $\Delta s_i$ together form $\Delta S$. The concrete steps are as follows:
1) Search for files with identical content: perform $F_R = O_c(f_i^c, f_k^c, F_1)$ and denote the result $F_c$. If $F_c$ is not empty, go to 2); otherwise go to 3).
2) Detect a file rename operation: perform $f^p = O_p(f_i^p, F_c)$ to determine whether $f_i^p$ belongs to $F_c$. If $f_i^p \in F_c$, the file state is unchanged. If $f_i^p \notin F_c$, the file was produced by a copy or move: generate $\Delta s_i = \Delta f_i^p = \langle f_k^p, f_i^p \rangle$, where $f_k^p$ is the source file name and $f_i^p$ is the target file name, and add $\Delta s_i$ to $\Delta S$.
3) Detect a new file: perform $f^p = O_p(f_i^p, F_1)$ to determine whether $f_i^p$ belongs to $F_1$. If $f^p = f_i^p$, the file name is unchanged and the content of $f_i$ has changed; go to 4). Otherwise $f_i$ is a new file: set $\Delta s_i = \Delta f_i$ and add $\Delta s_i$ to $\Delta S$.
4) Calculate the file content difference: compute $\Delta f_i^c = \mathrm{rsync}(f_i^c, f_j^c)$ with the classical rsync algorithm, where $f_i^c$ is the new content of $f_i$ and $f_j^c$ is its old content, so that $\Delta s_i = \Delta f_i^c$. Add $\Delta s_i$ to $\Delta S$.
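The four steps of Fig. 3 can be sketched in Python as below. Several simplifications are assumptions made for illustration: a snapshot is modelled as a dictionary from path to raw bytes, content equality is tested by comparing the bytes directly rather than by length and hash, and the rsync delta of step 4 is replaced by a placeholder that stores the whole new content. The name `diff_snapshots` is hypothetical.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, bytes]  # path -> content (stand-in for <length, mtime, hash>)


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, List[Tuple[str, object]]]:
    """Build a state difference set for every file in the new snapshot (F2)."""
    delta: Dict[str, List[Tuple[str, object]]] = {"rename": [], "content": [], "new": []}
    for path, data in sorted(new.items()):
        # Step 1: search the old snapshot (F1) for files with identical content.
        same = [p for p, d in sorted(old.items()) if d == data]
        if same:
            # Step 2: identical content under another name means copy or move.
            if path not in same:
                delta["rename"].append((same[0], path))  # <source, target>
            continue  # same name and same content: state unchanged
        if path in old:
            # Steps 3 and 4: name unchanged, content changed.
            # Placeholder: a real implementation would store an rsync delta here.
            delta["content"].append((path, data))
        else:
            # Step 3: no matching name or content, so this is a new file.
            delta["new"].append((path, data))
    return delta


old = {"/a": b"one", "/b": b"two", "/c": b"three"}
new = {"/a": b"one", "/moved": b"two", "/c": b"changed", "/n": b"fresh"}
delta_s = diff_snapshots(old, new)
```

Because both snapshots live on the client, none of these comparisons touch the network, which is the property the text attributes to the method.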
Fig. 4 shows the steps of storing and transmitting the file state difference set.
Fig. 4 gives the concrete procedure for storing and transmitting the file state difference set. The files $FS_r$, $FS_c$, $FS_n$, and $FS_d$ each store one type of difference from $\Delta S$. The four files are opened in the order $FS_r \to FS_c \to FS_n \to FS_d$, and the content of each is transmitted to the remote server over the network. The concrete steps are as follows:
1) Store the file state difference set by category: the file state differences recorded in $\Delta S$ are stored in four different files. $FS_r$ stores differences in which the file content is identical but the path has changed, that is, $\Delta s_i = \Delta f_i^p$. $FS_c$ stores the file names and difference content of files whose content has changed, that is, $\Delta s_i = \Delta f_i^c$. $FS_n$ stores the file names and content of newly added files, that is, $\Delta s_i = f_i$. $FS_d$ records deleted files, that is, $\Delta F' = F_1 - F_2$; each record $r_d$ is the file name of a deleted file.
2) Open the file that stores the file state differences: open the difference storage files of each type in the order $FS_r \to FS_c \to FS_n \to FS_d$.
3) Transmit the file state differences: send the content of the file opened in 2) to the remote server over the network.
4) Check whether transmission is finished: check whether all difference storage files have been sent. If so, exit; otherwise switch to the next difference storage file and go to 2).
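The classification and ordered transmission described above can be sketched as follows. The sketch keeps the four stores in memory and serializes them as JSON, both of which are assumptions; the patent specifies neither a storage format nor a wire format. The names `classify` and `transmit` and the `send` callback are hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, List

# Fixed transmission order given in the text: FS_r -> FS_c -> FS_n -> FS_d.
ORDER = ("FS_r", "FS_c", "FS_n", "FS_d")


def classify(renames: Iterable, content_diffs: Iterable, new_files: Iterable,
             old_paths: Iterable[str], new_paths: Iterable[str]) -> Dict[str, List]:
    """Step 1: sort the differences into the four stores."""
    return {
        "FS_r": list(renames),        # <source name, target name> records
        "FS_c": list(content_diffs),  # file name plus content difference
        "FS_n": list(new_files),      # file name plus full content
        "FS_d": sorted(set(old_paths) - set(new_paths)),  # F1 - F2
    }


def transmit(stores: Dict[str, List], send: Callable[[str, bytes], None]) -> None:
    """Steps 2 to 4: open each store in order and hand its content to send()."""
    for name in ORDER:
        send(name, json.dumps(stores[name]).encode("utf-8"))


stores = classify(
    renames=[("/a", "/b")],
    content_diffs=[["/c", "delta"]],
    new_files=[["/n", "body"]],
    old_paths=["/a", "/c", "/old.txt"],
    new_paths=["/b", "/c", "/n"],
)
sent = []
transmit(stores, lambda name, payload: sent.append((name, payload)))
```

Note that with $FS_d$ defined as $F_1 - F_2$, the source of a moved file (here `/a`) also appears among the deletion records.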
Fig. 5 shows the steps of replaying the file state difference set.
Fig. 5 gives the concrete procedure for replaying the file state difference set. After receiving the file state difference set transmitted by the client over the network, the server replays it, that is, it performs $S_2 = \Delta S + S_1$, which completes the file synchronization. The replay order is $FS_r \to FS_c \to FS_n \to FS_d$. The concrete steps are as follows:
1) Replay the client's file rename operations: from the rename operation data recorded in $FS_r$, first generate a file rename replay script and then execute it. The concrete steps are as follows:
1. Preprocess all file rename records: let $R$ be the set of all records in the file $FS_r$, each record being $r = \langle r_s, r_t \rangle$, where $r_s = f_k^p$ is the source file name and $r_t = f_i^p$ is the target file name. From the records, build the set $F_s$ of source file names $r_s$ and the set $F_t$ of target file names $r_t$, calculate the path depth of each $r_t$ in $F_t$, and sort in descending order of path depth.
2. Generate the file rename replay script: declare three double-ended queues (deques) that hold the operations needed to replay the file differences in $FS_r$. Then parse each record in $FS_r$ and place the corresponding file move or copy operation in the appropriate queue. The operations finally held in the three queues constitute the replay script. The concrete steps are as follows:
(a) Declare the deques: declare the deques $Q_{CP}$, $Q_{MV}$, and $Q_{RD}$, where $Q_{CP}$ holds copy operations, $Q_{MV}$ holds rename operations and the first half of each file name exchange, and $Q_{RD}$ holds the second half of each file name exchange.
(b) Detect a file name exchange: for each record $r = \langle r_s, r_t \rangle$, taken in descending order of the path depth of $r_t$, search for $r_s$ in the set $F_t$. If $r_s \in F_t$, a file name exchange has occurred: generate a unique file name using the random number function $RD$ and the system clock $T$, produce the rename operation script $MV(r_s, RD(T))$ and insert it at the head of $Q_{MV}$, and produce the script $MV(RD(T), r_t)$ and insert it at the head of $Q_{RD}$. Otherwise go to (c).
(c) Detect a rename operation: search for $r_s$ in the set $\Delta F$. If $r_s \in \Delta F$, produce the rename operation script $MV(r_s, r_t)$ and insert it at the head of $Q_{MV}$. Otherwise go to (d).
(d) Handle a copy operation: generate the copy operation script $CP(r_s, r_t)$ and insert it at the head of $Q_{CP}$.
3. Execute the file rename replay script: traverse $Q_{CP}$ from tail to head and perform each copy operation; traverse $Q_{MV}$ from tail to head and perform each rename operation; traverse $Q_{RD}$ from tail to head and perform the second half of each rename operation.
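The construction of the rename replay script can be sketched with Python's `collections.deque`. This is a sketch under assumptions rather than the patented implementation: the temporary name comes from an injectable generator (a UUID by default) instead of the $RD(T)$ function of the text, and the script is applied to an in-memory dictionary rather than a real file system. The names `build_rename_script`, `apply_script`, and `depth` are hypothetical.

```python
import itertools
import uuid
from collections import deque
from typing import Callable, Dict, Iterable, List, Set, Tuple

Op = Tuple[str, str, str]  # (kind, source, target); kind is "CP" or "MV"


def depth(path: str) -> int:
    """Number of path components, used as the path depth."""
    return len([part for part in path.split("/") if part])


def build_rename_script(
    records: Iterable[Tuple[str, str]],
    delta_f: Set[str],
    temp_name: Callable[[], str] = lambda: "/.swap-" + uuid.uuid4().hex,
) -> List[Op]:
    # Step 1: sort records by the path depth of the target, deepest first.
    ordered = sorted(records, key=lambda r: depth(r[1]), reverse=True)
    targets = {dst for _, dst in ordered}  # F_t
    q_cp, q_mv, q_rd = deque(), deque(), deque()
    for src, dst in ordered:
        if src in targets:        # (b) the source name is also a target: exchange
            tmp = temp_name()     # assumption: stands in for RD(T)
            q_mv.appendleft(("MV", src, tmp))
            q_rd.appendleft(("MV", tmp, dst))
        elif src in delta_f:      # (c) the source no longer exists: plain move
            q_mv.appendleft(("MV", src, dst))
        else:                     # (d) the source still exists: copy
            q_cp.appendleft(("CP", src, dst))
    # Step 3: each queue is consumed from tail to head, copies first.
    return [*reversed(q_cp), *reversed(q_mv), *reversed(q_rd)]


def apply_script(files: Dict[str, str], script: Iterable[Op]) -> None:
    """Replay the script on an in-memory stand-in for the server's files."""
    for kind, src, dst in script:
        files[dst] = files[src] if kind == "CP" else files.pop(src)


# /a and /b swap names, /c moves to /d/c, /k is copied to /k2.
server = {"/a": "X", "/b": "Y", "/c": "Z", "/k": "K"}
records = [("/b", "/a"), ("/a", "/b"), ("/c", "/d/c"), ("/k", "/k2")]
counter = itertools.count()
script = build_rename_script(records, {"/c", "/d/c", "/k2"},
                             temp_name=lambda: f"/.swap-{next(counter)}")
apply_script(server, script)
```

Inserting at the head and consuming from the tail makes each deque behave as a first-in, first-out queue. Deferring the second half of every exchange to $Q_{RD}$ means both files have left their original names before either is moved to its final name, so neither overwrites the other.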
2) Replay the file content differences: for every difference record in $FS_c$, update the file using the replay algorithm of rsync.
3) Replay the client's file creation operations: for every new-file record in $FS_n$, create the new file from the file name and file content in the record.
4) Replay the client's file deletion operations: scan all records in the file $FS_d$, calculate the path depth of each record, sort the records in descending order of path depth, and delete the file indicated by each record in that order.
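The ordering used in the deletion step can be illustrated with a short Python sketch. It assumes POSIX-style paths separated by "/"; the names `depth` and `deletion_order` are hypothetical.

```python
from typing import Iterable, List


def depth(path: str) -> int:
    """Number of path components, used as the path depth."""
    return len([part for part in path.split("/") if part])


def deletion_order(paths: Iterable[str]) -> List[str]:
    """Sort deletion records so that the deepest paths come first."""
    return sorted(paths, key=depth, reverse=True)


order = deletion_order(["/d", "/d/x/y.txt", "/d/x"])
```

Since directories are treated as files in this method, deleting deeper paths first means that a directory's contents are removed before the directory itself.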

Claims (1)

1. A remote file rapid synchronization method, characterized by comprising the following steps: a step of searching for files with identical content; a step of generating a file state difference set; a step of storing and transmitting the file state difference set; and a step of replaying the file state difference set; wherein:
(1) the step of searching for files with identical content comprises the following steps:
(1.1) a step of comparing file lengths: compare whether the length of the file equals that of a file in the file set; if not, go to (1.4), otherwise go to (1.2);
(1.2) a step of comparing file hash values: compare the hash values of the two files in (1.1); if they are not equal, go to (1.4), otherwise go to (1.3);
(1.3) a step of storing the file name of a file with identical content in the result set;
(1.4) a step of checking whether the traversal is finished: check whether all files in the file set have been traversed; if so, exit, otherwise go to (1.1);
(2) the step of generating the file state difference set comprises the following steps:
(2.1) for every file in the file set at time $t_2$, perform the step of searching for files with identical content against the file set at time $t_1$, where $t_1 < t_2$, and store the search results in a result set; if the result set is not empty, go to (2.2), otherwise go to (2.3);
(2.2) a step of detecting a file rename operation: determine whether the file name is in the result set produced in (2.1); if so, the file state is unchanged; otherwise the file was produced by a copy or move, the file state difference is represented by the source file name and the target file name, and it is added to the file state difference set;
(2.3) a step of detecting a new file: determine whether file $f_i$ belongs to the file set at time $t_1$; if so, go to (2.4); otherwise $f_i$ is a new file, the file state difference is the file $f_i$ itself, and it is added to the file state difference set;
(2.4) a step of calculating the file content difference: calculate the file content difference with the classical rsync algorithm; the file state difference equals the file content difference and is added to the file state difference set;
(3) the step of storing and transmitting the file state difference set comprises the following steps:
(3.1) a step of storing the file state difference set by category;
(3.2) a step of opening a file that stores file state differences;
(3.3) a step of transmitting the file state differences: send the content of the file opened in (3.2) to the remote server over the network;
(3.4) a step of checking whether transmission is finished: check whether all difference storage files have been sent; if so, exit; otherwise switch to the next difference storage file and go to (3.2);
(4) the step of replaying the file state difference set comprises the following steps:
(4.1) a step of replaying the client's file rename operations, comprising:
1. a step of preprocessing all file rename records: calculate the path depth of the target file name of every record in the set of target file names, and sort in descending order of path depth;
2. a step of generating the file rename replay script, comprising:
(a) a step of declaring deques: declare the deques $Q_{CP}$, $Q_{MV}$, and $Q_{RD}$, where $Q_{CP}$ holds copy operations, $Q_{MV}$ holds rename operations and the first half of each file name exchange, and $Q_{RD}$ holds the second half of each file name exchange;
(b) a step of detecting a file name exchange;
(c) a step of detecting a rename operation: search for the source file name in the file difference set; if the source file name belongs to the file difference set, produce a rename operation script and insert it at the head of the queue $Q_{MV}$, which holds rename operations and the first half of each file name exchange; otherwise go to (d);
(d) a step of handling a copy operation: generate a copy operation script and insert it at the head of the copy operation queue $Q_{CP}$;
3. a step of executing the file rename replay script: traverse the copy operation queue $Q_{CP}$ from tail to head and perform each copy operation; traverse the queue $Q_{MV}$ of rename operations and first halves of file name exchanges from tail to head and perform each rename operation; traverse the queue $Q_{RD}$ of second halves of file name exchanges from tail to head and perform the second half of each rename operation;
(4.2) a step of replaying the file content differences: for every difference record in the file that stores the file names and difference content of files whose content has changed, update the file using the replay algorithm of rsync;
(4.3) a step of replaying the client's file creation operations: for every new-file record in the file that stores the file names and content of newly added files, create the new file from the file name and file content in the record;
(4.4) a step of replaying the client's file deletion operations: scan all records in the file of deleted files, calculate the path depth of each record, sort in descending order of path depth, and delete the file indicated by each record in that order.
CN2008100456790A 2008-07-29 2008-07-29 Remote file rapid synchronization method Expired - Fee Related CN101354708B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008100456790A CN101354708B (en) 2008-07-29 2008-07-29 Remote file rapid synchronization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008100456790A CN101354708B (en) 2008-07-29 2008-07-29 Remote file rapid synchronization method

Publications (2)

Publication Number Publication Date
CN101354708A CN101354708A (en) 2009-01-28
CN101354708B true CN101354708B (en) 2010-08-18

Family

ID=40307519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008100456790A Expired - Fee Related CN101354708B (en) 2008-07-29 2008-07-29 Remote file rapid synchronization method

Country Status (1)

Country Link
CN (1) CN101354708B (en)

Also Published As

Publication number Publication date
CN101354708A (en) 2009-01-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: SICHUAN GREAT TECHNOLOGY CO., LTD.

Free format text: FORMER OWNER: SICHUAN UNIVERSITY

Effective date: 20101129

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 610065 NO.24, SOUTH SECTION 1, 1ST RING ROAD, CHENGDU CITY, SICHUAN PROVINCE TO: 610041 2/F, COMPREHENSIVE BUILDING, PRIMARY SCHOOL STREET FRONTAGE, KEHUA MIDDLE ROAD, CHENGDU CITY, NO.2, CHANGSHOU ROAD, WUHOU DISTRICT, CHENGDU CITY

TR01 Transfer of patent right

Effective date of registration: 20101129

Address after: 610041, Wuhou District, Changshou Road, Chengdu No. 2 Chengdu KELONG Road Primary School Street building two floor

Patentee after: Sichuan Gerite Technology Co., Ltd.

Address before: 610065 Sichuan, Chengdu, South Ring Road, No. 1, No. 24

Patentee before: Sichuan University

ASS Succession or assignment of patent right

Owner name: CHENGDU GLOBAL CAPSHEAF TECHNOLOGY CO., LTD.

Free format text: FORMER OWNER: SICHUAN GREAT TECHNOLOGY CO., LTD.

Effective date: 20130402

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20130402

Address after: 610000 C-411 Asia Pacific Plaza, KELONG North Road, Sichuan, Chengdu

Patentee after: Chengdu century summit Technology Co., Ltd.

Address before: 610000, Sichuan, Wuhou District, Changshou Road, Chengdu No. 2 Chengdu KELONG Road Primary School Street building two floor

Patentee before: Sichuan Gerite Technology Co., Ltd.

DD01 Delivery of document by public notice

Addressee: Wang Zhengtao

Document name: Notification of Passing Examination on Formalities

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100818

Termination date: 20160729