CN102065098A - Method and system for synchronizing data among network nodes - Google Patents

Method and system for synchronizing data among network nodes

Info

Publication number
CN102065098A
Authority
CN
China
Prior art keywords
data
data block
server
module
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010106221550A
Other languages
Chinese (zh)
Inventor
洪珂
刘爱贵
刘成彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wangsu Science and Technology Co Ltd
Original Assignee
Wangsu Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wangsu Science and Technology Co Ltd
Priority to CN2010106221550A
Publication of CN102065098A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for synchronizing data among network nodes that improve data synchronization performance, security and adaptability. The technical scheme is as follows: a client sends a file synchronization request to a server; the client segments a first file into data blocks of different sizes using a variable-length data block segmentation method and calculates a weak checksum and a strong checksum for each data block; the server segments a second file using the same method as the client and calculates weak and strong checksums in the same manner as the client; the server sends the weak and strong checksum values of the data blocks of the second file to the client; the client uses a hash algorithm to find the data blocks of the second file that already exist in the first file, removes duplicates from the set of missing data blocks, and requests the non-duplicate missing data blocks from the server; the server sends the requested data blocks to the client; and after receiving the required data blocks, the client reconstructs a copy of the second file.

Description

Method and system for synchronizing data among network nodes
Technical field
The present invention relates to data synchronization technology, and in particular to a method and system for synchronizing data among network nodes.
Background art
Rsync is an efficient remote file synchronization tool for Unix-like environments. It reduces the volume of data transferred and improves file transfer efficiency by optimizing the data synchronization flow.
Suppose there are two computers, Alpha and Beta. Alpha can access file A, Beta can access file B, files A and B are highly similar, and Alpha and Beta are connected by a slow network. To synchronize file B to file A, the Rsync algorithm proceeds as shown in Fig. 1.
1. Beta splits file B into contiguous, non-overlapping data blocks of fixed length S; the last block may be shorter than S bytes. Beta calculates two checksums for each block: a 32-bit weak rolling checksum and a 128-bit MD4 checksum;
2. Beta requests file A from Alpha and sends the sequence of block checksums to Alpha;
3. Alpha searches file A for all blocks of size S (at any offset, not necessarily a multiple of S) that have the same weak and strong checksums as a block of file B. The weak check is done quickly with a rolling checksum;
4. Alpha sends Beta the instructions for reconstructing file A; each instruction is either a file B block index (matched) or file A block data (unmatched).
5. Beta reconstructs a copy of file A according to the instructions sent by Alpha.
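The weak rolling checksum in step 3 is what allows Alpha to test every byte offset cheaply. The following is a minimal Python sketch, assuming the two-part 16-bit sum commonly described for Rsync; the function names `weak_checksum`, `roll` and `combine` are illustrative and are not taken from this document or from the Rsync source code.

```python
from __future__ import annotations

MOD = 1 << 16  # each half of the checksum is kept to 16 bits


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the two running sums (a, b) over a whole block."""
    n = len(block)
    a = sum(block) % MOD
    # Byte i is weighted by its distance from the end of the window.
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, size: int) -> tuple[int, int]:
    """Slide the window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - size * out_byte + a) % MOD
    return a, b


def combine(a: int, b: int) -> int:
    """Pack both halves into a single 32-bit value."""
    return (b << 16) | a
```

Sliding the window costs two additions and one multiplication per byte, regardless of the block size S, which is why the weak check can be applied at every offset before the more expensive strong checksum is computed.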
The traditional Rsync algorithm has the following deficiencies:
(1) searching for identical blocks consumes a large amount of computing resources, including weak checksum calculation and hash lookups;
(2) the amount of computation is not balanced between the client and the server, which can overload the server;
(3) fixed-length blocking is sensitive to data insertion and deletion, is inefficient, and cannot adjust or optimize itself as content changes;
(4) the set of differing data blocks is not deduplicated, so performance is very poor when file similarity is low.
Summary of the invention
The object of the present invention is to solve the above problems by providing a method for synchronizing data among network nodes that improves network data synchronization performance, data security and adaptability.
Another object of the present invention is to provide a system for synchronizing data among network nodes.
The technical scheme of the present invention is as follows. The present invention discloses a method for synchronizing data among network nodes, comprising the following steps:
Step 1: the client sends a file synchronization request to the server;
Step 2: the client segments a first file into data blocks of different sizes using a variable-length data block segmentation method, and calculates a weak checksum and a strong checksum for each data block;
Step 3: after receiving the synchronization request, the server segments a second file into data blocks of different sizes using the same variable-length data block segmentation method as the client, and calculates a weak checksum and a strong checksum for each data block in the same manner as the client; step 3 and step 2 are performed concurrently;
Step 4: the server sends the weak checksum values and strong checksum values of the data blocks of the second file to the client;
Step 5: after receiving the weak and strong checksum values of the data blocks, the client uses a hash algorithm to find the data blocks of the second file that already exist in the first file, deduplicates the set of missing data blocks, and then requests the non-duplicate missing data blocks from the server;
Step 6: the server sends the requested data blocks to the client;
Step 7: after receiving the required data blocks, the client uses the received data blocks and the data blocks it already has to reconstruct a copy of the second file.
According to an embodiment of the method for synchronizing data among network nodes of the present invention, data compression is enabled in step 4 for transmitting the data block checksum values; correspondingly, in step 5 the data must be decompressed before further processing.
According to an embodiment of the method for synchronizing data among network nodes of the present invention, in step 6 the server transmits the requested data blocks with data compression enabled; correspondingly, in step 7 the compressed data must be decompressed first.
According to an embodiment of the method for synchronizing data among network nodes of the present invention, the weak checksum that the client and the server compute for a data block is a rolling weak checksum, and the strong checksum is an MD5 checksum.
According to an embodiment of the method for synchronizing data among network nodes of the present invention, the variable-length data block segmentation method in step 2 and step 3 comprises the following steps:
Calculate the size of the free space in the buffer, and read the corresponding amount of data from the file to fill the buffer;
Determine whether the last data block has been reached. If so, process the last data block and end the flow. If not, determine whether the remaining space of the buffer is sufficient: if it is insufficient, return to the step of reading data to fill the buffer; if it is sufficient, calculate the fingerprint of the window data block;
Determine whether the window data block fingerprint satisfies the condition. If it does, process a normal data block; if it does not, process an abnormal data block;
Process the buffer and window data, then return to the step of determining whether the remaining buffer space is sufficient.
The present invention also discloses a system for synchronizing data among network nodes, comprising a client device and a server device. The client device comprises a file synchronization request sending module, a client data block segmentation module, a client data block checksum module, a data block checksum value receiving module, a missing data block lookup module, a data block receiving module and a file reconstruction module. The server device comprises a file synchronization request receiving module, a server data block segmentation module, a server data block checksum module, a data block checksum value sending module and a data block sending module, wherein:
The file synchronization request sending module sends the file synchronization request from the client to the server;
The client data block segmentation module is used by the client to segment a first file into data blocks of different sizes using a variable-length data block segmentation method;
The client data block checksum module is coupled to the client data block segmentation module and calculates a weak checksum and a strong checksum for each segmented data block;
The file synchronization request receiving module establishes a communication connection with the file synchronization request sending module, and through it the server receives the file synchronization request from the client;
The server data block segmentation module is coupled to the file synchronization request receiving module and, after the server receives the synchronization request, segments a second file into data blocks of different sizes using the same segmentation method as the client data block segmentation module;
The server data block checksum module is coupled to the server data block segmentation module and calculates the weak and strong checksums of the data blocks segmented by the server using the same checksum method as the client data block checksum module;
The data block checksum value sending module is coupled to the server data block checksum module and sends the weak and strong checksum values of the data blocks segmented by the server to the client;
The data block checksum value receiving module receives the weak and strong checksum values sent by the server;
The missing data block lookup module is coupled to the data block checksum value receiving module, uses a hash algorithm to find the data blocks of the second file that already exist in the first file, deduplicates the set of missing data blocks, and requests the non-duplicate missing data blocks from the server;
The data block sending module is used by the server to send the requested data blocks to the client;
The file reconstruction module receives the data blocks from the data block sending module and reconstructs a copy of the second file.
According to an embodiment of the system for synchronizing data among network nodes of the present invention, a first compression unit is provided in the data block checksum value sending module to enable data compression for transmitting the data block checksum values, and a first decompression unit is provided in the data block checksum value receiving module to enable data decompression for decompressing the data block checksum values.
According to an embodiment of the system for synchronizing data among network nodes of the present invention, a second compression unit is provided in the data block sending module to enable data compression for transmitting the requested data blocks, and a second decompression unit is provided in the file reconstruction module to enable data decompression for decompressing the data blocks.
According to an embodiment of the system for synchronizing data among network nodes of the present invention, the weak checksum that the client and the server compute for a data block is a rolling weak checksum, and the strong checksum is an MD5 checksum.
Compared with the prior art, the present invention has the following technical effects. The main inventive points of the technical scheme are: variable-length data block segmentation, a rolling weak checksum combined with an MD5 strong checksum, data deduplication and data compression, and shifting load away from the server.
Fixed-length blocking in the conventional technique is sensitive to data insertion and deletion, is inefficient, and cannot adjust or optimize itself as content changes. This method uses a content-based blocking method that applies data fingerprints to split a file into blocks of varying length; unlike fixed-length blocking, the block size is variable. During execution, a fixed-size sliding window of 64 bytes is used to compute a data fingerprint over the file data. If the fingerprint FP satisfies the condition FP mod M = R, the window position is taken as a block boundary. A pathological case can arise in which the fingerprint condition is never satisfied, so no block boundary can be determined and the block grows too large; this is solved by setting lower and upper bounds [Min, Max] on the block size. The method is insensitive to changes in file content: inserting or deleting data affects only the adjacent data blocks and leaves the remaining blocks unaffected, which gives it high adaptability. Variable-length blocking also makes the amount of computation on the client and the server roughly equal, and avoids the large number of weak checksum calculations and hash lookups in the Rsync block search, which effectively improves performance.
The rolling weak checksum of the present invention is the same as that of Rsync, while the strong checksum uses MD5; both are hash functions. A property of hash functions is that if two hash values differ, the data blocks are certainly different; but if two hash values are equal, one cannot conclude that the two data blocks are identical. A hash function can produce collisions, although the probability is very small, so it is usually assumed that equal hash values imply identical data blocks. Identical-block retrieval commonly uses a stronger hash function to compute the checksum, such as SHA-256, SHA-512 or SHA-1024, but such methods increase the amount of computation considerably.
This method combines a weak and a strong checksum to reduce the probability of a data collision. For each data block, a weak checksum is calculated first; if it equals that of the target data block, the stronger MD5 checksum is then computed on the fly and compared. The probability that two data blocks collide on both the weak and the strong checksum at the same time is far lower than with a single checksum. Because the weak checksum is very cheap to compute, the method greatly reduces the collision probability at low cost and effectively improves data security.
The data exchanged between the client and the server, including both block checksum metadata and block contents, may contain duplicates, and the existing method does not deduplicate this data. This method applies data deduplication techniques from the storage field to further optimize the synchronization flow and reduce network traffic. The data about to be transmitted is segmented with the variable-length data block segmentation method, identical blocks are found using the weak and strong checksums, only one copy of each unique block is transmitted, and identical data is represented by its index number. If network bandwidth is limited, data compression can optionally be enabled to reduce network traffic further. Both techniques effectively improve data synchronization performance.
A data server may receive a large number of concurrent client synchronization requests. If the server logic is complex and computationally heavy, the server can become overloaded and cause related problems. Rsync has exactly this problem, because its server carries out the complex computation. This method moves the complex logic and computation to the client in order to reduce server load. Specifically, server Alpha sends the block checksum sequence of file A to client Beta; Beta is responsible for identical-block retrieval and data deduplication, and then requests the data it needs from Alpha. In this way, the complex, computation-intensive task originally borne by the server is distributed across many clients. The added load on each client is small, but the server load is greatly reduced.
Description of drawings
Fig. 1 is a schematic diagram of the traditional file synchronization method.
Fig. 2 is an exemplary schematic diagram of an embodiment of the method for synchronizing data among network nodes of the present invention.
Fig. 3 is an exemplary flow chart of the variable-length data block segmentation method of the present invention.
Fig. 4 is an exemplary structural diagram of an embodiment of the system for synchronizing data among network nodes of the present invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and embodiments.
Embodiment of the method for synchronizing data among network nodes
Fig. 2 shows an embodiment of the method for synchronizing data among network nodes of the present invention. Referring to Fig. 2, the method of this embodiment comprises the following steps.
1. Client Beta sends a file synchronization request to server Alpha.
2. Client Beta segments file B into data blocks of different sizes using the variable-length data block segmentation method, and calculates a weak checksum and a strong checksum for each data block.
Variable-length data block segmentation is one of the innovations of the present invention. In contrast to fixed-length segmentation, it is a general term for a class of algorithms and does not denote one specific algorithm. This embodiment, for example, uses data fingerprints to determine the cut points; individual algorithms differ in how the fingerprint is computed and in the size of the data window. In this embodiment, the data window is 64 bytes and the fingerprint algorithm is adler32_hash.
The variable-length data block segmentation method of this embodiment is shown in Fig. 3.
Step S101: calculate the size of the free space in the buffer (BUF), and read the corresponding amount of data from the file to fill BUF.
Step S102: determine whether the last data block has been reached; the condition is that the current BUF data length is less than the minimum data block length (BLOCK_MIN_SZ, preset). If the condition is satisfied, go to step S103; otherwise go to step S109.
Step S103: determine whether the remaining space in BUF is large enough; the condition is that the remaining length is not less than the window size (BLOCK_WIN_SZ, preset). If the condition is satisfied, go to step S104; otherwise go to step S101.
Step S104: calculate the data fingerprint FP (fingerprint) of the current window; the fingerprint here is computed by the weak checksum or rolling checksum.
Step S105: determine whether FP satisfies the condition FP % C == R (C and R are preset constants). If the condition is satisfied, go to step S106; otherwise go to step S107.
Step S106: copy the window data to the moving data block. If the length of the moving data block is not less than the minimum data block length, a normal data block is obtained and the handler function is called to process it. Adjust the offsets of BUF and the moving data block.
Step S107: copy a fixed step of data to the moving data block. If the length of the moving data block is not less than the maximum data block length (BLOCK_MAX_SZ, preset), an abnormal data block is obtained and the handler function is called to process it. Adjust the offsets of BUF and the moving data block.
Step S108: adjust the offsets of BUF and the moving data block, to avoid unnecessary window moves and checksum calculations caused by blocks smaller than BLOCK_MIN_SZ.
Step S109: obtain the last, abnormal data block and call the handler function to process it.
This content-based blocking method applies data fingerprints to split a file into blocks of varying length; unlike fixed-length blocking, the block size is variable. During execution, a fixed-size sliding window of 64 bytes is used to compute a data fingerprint over the file data. If the fingerprint FP satisfies the condition FP mod M = R (M and R here play the role of the preset constants C and R in step S105), the window position is taken as a block boundary. A pathological case can arise in which the fingerprint condition is never satisfied, so no block boundary can be determined and the block grows too large; this is solved by setting lower and upper bounds [Min, Max] on the block size. The method is insensitive to changes in file content: inserting or deleting data affects only the adjacent data blocks and leaves the remaining blocks unaffected, which gives it high adaptability. Variable-length blocking also makes the amount of computation on the client and the server roughly equal, and avoids the large number of weak checksum calculations and hash lookups in the Rsync block search, which effectively improves performance.
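The boundary rule described above can be sketched in a few lines of Python. This is a simplified illustration and not a transcription of steps S101 to S109: it works on data already held in memory instead of a refillable buffer, it recomputes the fingerprint at each position rather than rolling it, and it uses `zlib.adler32` from the standard library as a stand-in for the adler32_hash named in this embodiment. The constants `min_sz`, `max_sz`, `divisor` and `remainder` are arbitrary values chosen for the example.

```python
from __future__ import annotations

import zlib

WIN = 64  # sliding window size in bytes, as in this embodiment


def chunk(data: bytes, min_sz: int = 256, max_sz: int = 4096,
          divisor: int = 512, remainder: int = 7) -> list[bytes]:
    """Split data into variable-length blocks at fingerprint-defined boundaries."""
    chunks = []
    start = 0
    n = len(data)
    while start < n:
        end = min(start + max_sz, n)       # upper bound: force a cut here at the latest
        cut = end
        pos = start + max(min_sz, WIN)     # lower bound: no cut before min_sz bytes
        while pos < end:
            fp = zlib.adler32(data[pos - WIN:pos])  # fingerprint of the last WIN bytes
            if fp % divisor == remainder:           # condition FP mod M = R
                cut = pos
                break
            pos += 1
        chunks.append(data[start:cut])
        start = cut
    return chunks
```

Because a boundary depends only on the 64 bytes before it, an insertion or deletion changes the blocks near the edit and the boundaries further along tend to fall at the same content positions as before. Every block except possibly the last has a length between `min_sz` and `max_sz`.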
3. After receiving the synchronization request, server Alpha segments file A into data blocks of different sizes using the same variable-length data block segmentation method as client Beta, and calculates a weak checksum and a strong checksum for each data block in the same manner as client Beta; step 3 is performed concurrently with step 2.
The rolling weak checksum of this embodiment is the same as that of Rsync, while the strong checksum uses MD5; both are hash functions. A property of hash functions is that if two hash values differ, the data blocks are certainly different; but if two hash values are equal, one cannot conclude that the two data blocks are identical. A hash function can produce collisions, although the probability is very small, so it is usually assumed that equal hash values imply identical data blocks. Identical-block retrieval commonly uses a stronger hash function to compute the checksum, such as SHA-256, SHA-512 or SHA-1024, but such methods increase the amount of computation considerably.
This method combines a weak and a strong checksum to reduce the probability of a data collision. For each data block, a weak checksum is calculated first; if it equals that of the target data block, the stronger MD5 checksum is then computed on the fly and compared. The probability that two data blocks collide on both the weak and the strong checksum at the same time is far lower than with a single checksum. Because the weak checksum is very cheap to compute, the method greatly reduces the collision probability at low cost and effectively improves data security.
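The weak-then-strong comparison can be illustrated as follows. This is a minimal sketch that assumes `zlib.adler32` as the weak checksum and `hashlib.md5` as the strong checksum; the function name `same_block` is hypothetical.

```python
import hashlib
import zlib


def same_block(candidate: bytes, target_weak: int, target_md5: bytes) -> bool:
    """Return True only if both the weak and the strong checksum match."""
    # Cheap test first: most non-matching blocks are rejected here.
    if zlib.adler32(candidate) != target_weak:
        return False
    # The MD5 digest is computed only when the weak checksums agree.
    return hashlib.md5(candidate).digest() == target_md5
```

The MD5 computation is skipped for every candidate whose weak checksum differs, so the strong checksum is paid for only on likely matches.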
4. Server Alpha sends the weak checksum values and strong checksum values of the data blocks of file A to client Beta.
In this embodiment, they are transmitted as a sequence of data block checksum records, each expressed as a five-tuple (bid, offset, length, checksum, MD5), where bid is the block id, that is, the data block number; offset is the offset of the data block within the file; length is the data block length; checksum is the weak checksum value of the data block, computed with the adler32_hash algorithm; and MD5 is the strong checksum value of the data block, computed with the MD5 algorithm.
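One way to represent this five-tuple in Python is sketched below. The class name `BlockSig` and the helper `signatures` are hypothetical, and `zlib.adler32` is again assumed as a stand-in for adler32_hash.

```python
from __future__ import annotations

import hashlib
import zlib
from typing import NamedTuple


class BlockSig(NamedTuple):
    bid: int       # data block number
    offset: int    # byte offset of the block within the file
    length: int    # block length in bytes
    checksum: int  # weak checksum (Adler-32 in this sketch)
    md5: bytes     # strong checksum (16-byte MD5 digest)


def signatures(chunks: list[bytes]) -> list[BlockSig]:
    """Build the checksum sequence for blocks given in file order."""
    sigs = []
    offset = 0
    for bid, block in enumerate(chunks):
        sigs.append(BlockSig(bid, offset, len(block),
                             zlib.adler32(block), hashlib.md5(block).digest()))
        offset += len(block)
    return sigs
```

Carrying offset and length alongside the checksums lets the receiver refer to a variable-length block without re-segmenting the file.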
5. After receiving the weak and strong checksum values of the data blocks, client Beta uses a hash algorithm to find the data blocks of file A that already exist in file B, deduplicates the set of missing data blocks, and then requests the non-duplicate missing data blocks from server Alpha.
The data exchanged between the client and the server, including both block checksum metadata and block contents, may contain duplicates, and the existing method does not deduplicate this data. This method applies data deduplication techniques from the storage field to further optimize the synchronization flow and reduce network traffic. The data about to be transmitted is segmented with the variable-length data block segmentation method, identical blocks are found using the weak and strong checksums, only one copy of each unique block is transmitted, and identical data is represented by its index number. If network bandwidth is limited, data compression can optionally be enabled to reduce network traffic further. Both techniques effectively improve data synchronization performance.
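The client-side lookup and deduplication of step 5 might look like the sketch below. The data layout is an assumption made for illustration: the server's checksum sequence is reduced to (bid, weak, md5) triples, and the result is a list of block numbers to request plus a "recipe" that says where each block of the target file comes from. For brevity the sketch hashes every local block up front, whereas the text describes computing MD5 only after a weak match. All names are hypothetical.

```python
from __future__ import annotations

import hashlib
import zlib


def block_key(block: bytes) -> tuple[int, bytes]:
    """Weak and strong checksum of one block."""
    return zlib.adler32(block), hashlib.md5(block).digest()


def plan_requests(local_blocks, remote_sigs):
    """Return (bids to request, recipe for rebuilding the remote file)."""
    # Hash table of local blocks: weak checksum -> {md5 digest: local index}.
    have = {}
    for idx, block in enumerate(local_blocks):
        weak, strong = block_key(block)
        have.setdefault(weak, {}).setdefault(strong, idx)

    to_request = []
    first_seen = {}  # (weak, md5) -> bid of the first missing occurrence
    recipe = []
    for bid, weak, strong in remote_sigs:
        local_idx = have.get(weak, {}).get(strong)
        if local_idx is not None:
            recipe.append(("local", local_idx))   # block already on the client
            continue
        key = (weak, strong)
        if key not in first_seen:                 # request each missing block once
            first_seen[key] = bid
            to_request.append(bid)
        recipe.append(("remote", first_seen[key]))
    return to_request, recipe
```

A missing block that occurs several times in the server's file is requested once, and later occurrences refer back to the first block number.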
6. Server Alpha sends the requested data blocks to client Beta.
7. After receiving the required data blocks, client Beta uses the received data blocks and the data blocks it already has to reconstruct a copy of file A.
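Reconstruction in step 7 amounts to concatenating blocks in the order of the server's file. The sketch below assumes a hypothetical "recipe" format, a list of ("local", index) or ("remote", bid) entries, together with a dictionary of fetched blocks keyed by block number; neither structure is specified in this document.

```python
def rebuild(recipe, local_blocks, fetched):
    """Concatenate local and fetched blocks in the order given by the recipe."""
    parts = []
    for source, ref in recipe:
        if source == "local":
            parts.append(local_blocks[ref])   # block the client already had
        elif source == "remote":
            parts.append(fetched[ref])        # block received from the server
        else:
            raise ValueError(f"unknown block source: {source!r}")
    return b"".join(parts)
```

A fetched block may be referenced more than once, so a deduplicated transfer still yields the complete file.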
A data server may receive a large number of concurrent client synchronization requests. If the server logic is complex and computationally heavy, the server can become overloaded and cause related problems. Rsync has exactly this problem, because its server carries out the complex computation. This method moves the complex logic and computation to the client in order to reduce server load. Specifically, server Alpha sends the block checksum sequence of file A to client Beta; Beta is responsible for identical-block retrieval and data deduplication, and then requests the data it needs from Alpha. In this way, the complex, computation-intensive task originally borne by the server is distributed across many clients. The added load on each client is small, but the server load is greatly reduced.
Embodiment of the system for synchronizing data among network nodes
Fig. 4 shows the principle of an embodiment of the system for synchronizing data among network nodes of the present invention. Referring to Fig. 4, the system of this embodiment comprises a client device 10 and a server device 12, with a communication connection established between the two.
Client device 10 comprises a file synchronization request sending module 100, a client data block segmentation module 101, a client data block checksum module 102, a data block checksum value receiving module 103, a missing data block lookup module 104, a data block receiving module 105 and a file reconstruction module 106.
Server device 12 comprises a file synchronization request receiving module 120, a server data block segmentation module 121, a server data block checksum module 122, a data block checksum value sending module 123 and a data block sending module 124.
The whole system operates as follows.
File synchronization request sending module 100 sends the file synchronization request from client device 10 to server device 12.
Client data block segmentation module 101 is used by client device 10 to segment file B into data blocks of different sizes using the variable-length data block segmentation method. The implementation steps of the variable-length data block segmentation method are shown in Fig. 3.
Step S101: calculate the size of the free space in the buffer (BUF), and read the corresponding amount of data from the file to fill BUF.
Step S102: determine whether the last data block has been reached; the condition is that the current BUF data length is less than the minimum data block length (BLOCK_MIN_SZ, preset). If the condition is satisfied, go to step S103; otherwise go to step S109.
Step S103: determine whether the remaining space in BUF is large enough; the condition is that the remaining length is not less than the window size (BLOCK_WIN_SZ, preset). If the condition is satisfied, go to step S104; otherwise go to step S101.
Step S104: calculate the data fingerprint FP of the current window; the fingerprint here is computed by the weak checksum or rolling checksum.
Step S105: determine whether FP satisfies the condition FP % C == R (C and R are preset constants). If the condition is satisfied, go to step S106; otherwise go to step S107.
Step S106: copy the window data to the moving data block. If the length of the moving data block is not less than the minimum data block length, a normal data block is obtained and the handler function is called to process it. Adjust the offsets of BUF and the moving data block.
Step S107: copy a fixed step of data to the moving data block. If the length of the moving data block is not less than the maximum data block length (BLOCK_MAX_SZ, preset), an abnormal data block is obtained and the handler function is called to process it. Adjust the offsets of BUF and the moving data block.
Step S108: adjust the offsets of BUF and the moving data block, to avoid unnecessary window moves and checksum calculations caused by blocks smaller than BLOCK_MIN_SZ.
Step S109: obtain the last, abnormal data block and call the handler function to process it.
Client data block checksum module 102 calculates a weak checksum and a strong checksum for each segmented data block. The weak checksum is a rolling weak checksum, and the strong checksum is an MD5 checksum.
File synchronization request receiving module 120 is used by server device 12 to receive the file synchronization request from client device 10.
Server data block segmentation module 121 segments file A into data blocks of different sizes, after server device 12 receives the synchronization request, using the same segmentation method as client data block segmentation module 101.
Server data block checksum module 122 calculates the weak and strong checksums of the data blocks segmented by server device 12 using the same checksum method as client data block checksum module 102. The weak checksum is a rolling weak checksum, and the strong checksum is an MD5 checksum.
Data block checksum value sending module 123 sends the weak and strong checksum values of the data blocks segmented by server device 12 to client device 10.
Preferably, a first compression unit is provided in data block checksum value sending module 123 to enable data compression for transmitting the data block checksum values.
Data block checksum value receiving module 103 receives the weak and strong checksum values sent by server device 12.
Preferably, a first decompression unit is provided in data block checksum value receiving module 103 to enable data decompression for decompressing the data block checksum values.
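The pairing of the first compression unit and the first decompression unit can be illustrated with a round trip over the checksum sequence. The document does not name a compression algorithm or a wire format, so both are assumptions here: each record is packed as a fixed 36-byte structure (bid, offset, length, weak checksum, 16-byte MD5 digest) and the result is compressed with `zlib`.

```python
import struct
import zlib

# Assumed record layout: bid (u32), offset (u64), length (u32), weak (u32), MD5 (16 bytes).
RECORD = struct.Struct("<IQII16s")


def pack_sigs(sigs) -> bytes:
    """Sender side: serialize the checksum records, then compress them."""
    raw = b"".join(RECORD.pack(*sig) for sig in sigs)
    return zlib.compress(raw)


def unpack_sigs(payload: bytes):
    """Receiver side: decompress first, then parse the records."""
    raw = zlib.decompress(payload)
    return [tuple(rec) for rec in RECORD.iter_unpack(raw)]
```

The receiver must decompress before any lookup can start, which matches the ordering the method embodiment requires of step 5.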
The missing data piece is searched module 104 by the data block among the file A that has had among the hash algorithm locating file B, and data de-duplication is carried out in the data block set of disappearance, to the unduplicated missing data piece of server unit 12 requests.
The data block sending module 124 has the server device 12 send the requested data blocks to the client device 10. A second compression unit is provided in the data block sending module 124, so that the requested data blocks are compressed for transmission.
The file reconstruction module 106 receives the data blocks from the data block sending module 124 and reconstructs a copy of file A. A second decompression unit is provided in the file reconstruction module 106 to decompress the received data blocks.
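The reconstruction step can be illustrated with the following Python sketch. It assumes that the client holds a plan with one entry per block of file A, either ("local", index of a block it already has) or ("remote", index of a block received from the server), and that received blocks arrive zlib-compressed. The plan format, the use of zlib and the function name are assumptions of this sketch and are not taken from the description.

```python
import zlib


def rebuild(plan, local_blocks, received):
    """Reassemble a copy of the server's file on the client.

    plan         -- list of ("local", i) or ("remote", j) entries in file order
    local_blocks -- blocks of the client's own file, indexed by i
    received     -- dict mapping remote index j to the compressed block bytes
    """
    # Decompress every received block once, even if the plan uses it repeatedly.
    cache = {index: zlib.decompress(payload) for index, payload in received.items()}
    output = bytearray()
    for source, index in plan:
        output += local_blocks[index] if source == "local" else cache[index]
    return bytes(output)
```

Only the blocks marked "remote" cross the network; everything else is copied from the data the client already holds.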
The above embodiments are provided to enable those of ordinary skill in the art to make or use the present invention. Those of ordinary skill in the art may make various modifications or changes to the above embodiments without departing from the inventive concept of the present invention. The scope of protection of the present invention is therefore not limited by the above embodiments, but should be the broadest scope consistent with the inventive features recited in the claims.

Claims (9)

1. A method for synchronizing data between network nodes, comprising the following steps:
Step 1: a client sends a file synchronization request to a server;
Step 2: the client splits a first file into data blocks of different sizes using a variable-length data block cutting method, and calculates a weak checksum and a strong checksum for each data block;
Step 3: after receiving the synchronization request, the server splits a second file into data blocks of different sizes using the same variable-length data block cutting method as the client, and calculates a weak checksum and a strong checksum for each data block in the same manner as the client, step 3 and step 2 being carried out simultaneously;
Step 4: the server sends the weak check values and strong check values of the data blocks of the second file to the client;
Step 5: after receiving the weak check values and strong check values of the data blocks, the client uses a hash algorithm to find the data blocks of the second file that already exist in the first file, removes duplicates from the set of missing data blocks, and then requests the non-duplicate missing data blocks from the server;
Step 6: the server sends the requested data blocks to the client;
Step 7: after receiving the required data blocks, the client reconstructs a copy of the second file from the received data blocks and the data blocks it already has.
2. The method for synchronizing data between network nodes according to claim 1, characterized in that, in step 4, data compression is enabled to transmit the data block check values, and correspondingly, in step 5, the check values are decompressed before any subsequent processing.
3. The method for synchronizing data between network nodes according to claim 1, characterized in that, in step 6, the server transmits the requested data blocks with data compression enabled, and correspondingly, in step 7, the compressed data is decompressed first.
4. The method for synchronizing data between network nodes according to claim 1, characterized in that the weak checksum that the client and the server calculate for a data block is a rolling weak checksum, and the strong checksum that the client and the server calculate for a data block is an MD5 checksum.
5. The method for synchronizing data between network nodes according to claim 1, characterized in that the variable-length data block cutting method in step 2 and step 3 comprises the following steps:
calculating the size of the free space in the buffer, and reading the corresponding amount of data from the file to fill the buffer;
determining whether the current data block is the last data block; if so, ending the procedure after the last data block is processed; if not, determining whether the remaining space of the buffer is sufficient; if the space is insufficient, returning to the step of reading data to fill the buffer; if the space is sufficient, calculating the fingerprint of the window data block;
determining whether the window data block fingerprint satisfies the condition; if it does, processing a normal data block; if it does not, processing an abnormal data block;
performing the data handling for the buffer and the window, and then returning to the step of determining whether the remaining space of the buffer is sufficient.
6. A system for synchronizing data between network nodes, comprising a client device and a server device, wherein the client device comprises a file synchronization request sending module, a client data block cutting module, a client data block check module, a data block check value receiving module, a missing data block lookup module, a data block receiving module and a file reconstruction module, and the server device comprises a file synchronization request receiving module, a server data block cutting module, a server data block check module, a data block check value sending module and a data block sending module, wherein:
the file synchronization request sending module sends a file synchronization request from the client to the server;
the client data block cutting module splits, on the client, a first file into data blocks of different sizes using a variable-length data block cutting method;
the client data block check module is coupled to the client data block cutting module and calculates a weak checksum and a strong checksum for each data block produced by the split;
the file synchronization request receiving module establishes a communication connection with the file synchronization request sending module and receives, on the server, the file synchronization request from the client;
the server data block cutting module is coupled to the file synchronization request receiving module and, after the server receives the synchronization request, splits a second file into data blocks of different sizes using the same data block cutting method as the client data block cutting module;
the server data block check module is coupled to the server data block cutting module and calculates the weak checksum and the strong checksum of each data block split by the server, using the same checksum method as the client data block check module;
the data block check value sending module is coupled to the server data block check module and sends the weak check values and strong check values of the data blocks split by the server to the client;
the data block check value receiving module receives the weak check values and strong check values sent by the server;
the missing data block lookup module is coupled to the data block check value receiving module, uses a hash algorithm to find the data blocks of the second file that already exist in the first file, removes duplicates from the set of missing data blocks, and requests the non-duplicate missing data blocks from the server;
the data block sending module sends the requested data blocks from the server to the client;
the file reconstruction module receives the data blocks from the data block sending module and reconstructs a copy of the second file.
7. The system for synchronizing data between network nodes according to claim 6, characterized in that a first compression unit is provided in the data block check value sending module, which enables data compression to transmit the data block check values, and a first decompression unit is provided in the data block check value receiving module, which enables data decompression to decompress the data block check values.
8. The system for synchronizing data between network nodes according to claim 6, characterized in that a second compression unit is provided in the data block sending module, which enables data compression to transmit the requested data blocks, and a second decompression unit is provided in the file reconstruction module, which enables data decompression to decompress the data blocks.
9. The system for synchronizing data between network nodes according to claim 6, characterized in that the weak checksum that the client and the server calculate for a data block is a rolling weak checksum, and the strong checksum that the client and the server calculate for a data block is an MD5 checksum.
CN2010106221550A 2010-12-31 2010-12-31 Method and system for synchronizing data among network nodes Pending CN102065098A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010106221550A CN102065098A (en) 2010-12-31 2010-12-31 Method and system for synchronizing data among network nodes

Publications (1)

Publication Number Publication Date
CN102065098A true CN102065098A (en) 2011-05-18

Family

ID=44000199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010106221550A Pending CN102065098A (en) 2010-12-31 2010-12-31 Method and system for synchronizing data among network nodes

Country Status (1)

Country Link
CN (1) CN102065098A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101595459A (en) * 2006-12-01 2009-12-02 美国日本电气实验室公司 The method and system that is used for quick and efficient data management and/or processing
CN101539950A (en) * 2009-05-08 2009-09-23 成都市华为赛门铁克科技有限公司 Data storage method and device
CN101788976A (en) * 2010-02-10 2010-07-28 北京播思软件技术有限公司 File splitting method based on contents

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
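TANG Xiaodi (汤晓迪) et al.: "Design and Implementation of a Remote File Differential Synchronization System", Computer Engineering and Design (《计算机工程与设计》), no. 20, 31 October 2010 (2010-10-31) *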

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110518