WO2012159532A1 - Data processing method and device

Info

Publication number: WO2012159532A1
Application number: PCT/CN2012/075411
Authority: WO
Inventors: 任欣; 何非
Original assignee: 成都市华为赛门铁克科技有限公司
Priority date: 2011-05-25
Filing date: 2012-05-12
Publication date: 2012-11-29
Also published as: CN102202098A

Abstract

Disclosed is a data processing method, comprising: calculating fingerprint data of each data block of a file to be backed up, and fingerprint data of the data blocks obtained after the offset of the address of each data block of the file to be backed up is progressively incremented; sending both sets of fingerprint data to a server, for comparison with fingerprint data of each data block of a file stored by the server and fingerprint data of the data blocks obtained after the offset of the address of each data block of the stored file is progressively incremented; and sending changed data to the server according to a comparison result sent by the server. The data de-duplication rate can thereby be improved.

Description

Data processing method and device

This application claims priority to the Chinese patent application entitled "Data Processing Method and Apparatus", filed with the Chinese Patent Office on May 25, 2011, which is hereby incorporated by reference.

Technical field

The present invention relates to the field of storage, and in particular, to a data processing method and apparatus.

Background technique

As the amount of data in an enterprise continues to increase, a large amount of duplicate data poses a serious challenge to storage. Data de-duplication (De-Dupe), as an important technology to reduce data storage costs by effectively reducing data, has become the focus of attention.

In the implementation of data deduplication technology, the system calculates and checks the fingerprint data of a data block (or file), that is, data that uniquely identifies a file or a data block of a file, and determines whether the data block duplicates metadata that is already stored. If it is a duplicate, only a pointer to the metadata needs to be kept. If the fingerprint data shows that the data block is brand new, the data block is retained and used as metadata for later use.

In existing deduplication technology, most files to be backed up are cut using a fixed-length data block cutting method. Suppose that, after the client has performed the first backup, the file is modified at its head or in its middle, for example by an insertion, deletion, or update. With the conventional fixed-length cutting method, even if the amount of data modified relative to the original backup file is small, the existing data blocks of the original file all shift in sequence. The changed file therefore contains fewer data blocks that duplicate previously backed-up blocks, which reduces deduplication efficiency and causes more data blocks to be transmitted to the server. On the one hand this increases network bandwidth consumption; on the other hand it increases server-side data storage.

Summary of the invention

The embodiment of the invention provides a data processing method and device, which can effectively reduce server-side data storage and further improve the deduplication rate under the premise of ensuring the unique storage of the server-side backup file.

The data processing method provided by the embodiment of the present invention includes:

Calculating fingerprint data of each data block of the file to be backed up, and fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented;

Sending the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, to the server, for comparison with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented;

According to the comparison result sent by the server, sending to the server the data block in which the fingerprint data is inconsistent and the pointer of the data block in which the fingerprint data is inconsistent.

Another data processing method provided by the embodiment of the present invention includes:

Receiving fingerprint data of each data block of the file to be backed up sent by the client, and fingerprint data of the data block after the offset of the address of each data block of the to-be-backed file is incremented;

Comparing the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, with the fingerprint data of each data block of the saved file and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented;

Sending the comparison result to the client, and receiving, from the client, the data block in which the fingerprint data is inconsistent and the pointer of the data block in which the fingerprint data is inconsistent.

The client provided by the embodiment of the present invention includes:

a first calculation module, configured to calculate fingerprint data of each data block of the file to be backed up, and fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented;

a first sending module, configured to send the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, to the server, for comparison with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented;

a first receiving module, configured to receive a comparison result sent by the server;

And a second sending module, configured to send, according to the comparison result sent by the server, a data block in which the fingerprint data is inconsistent and a pointer of the data block in which the fingerprint data is inconsistent to the server.

The server provided by the embodiment of the present invention includes:

a third receiving module, configured to receive fingerprint data of each data block of the file to be backed up sent by the client, and fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented;

a first comparison module, configured to compare the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, with the fingerprint data of each data block of the saved file and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented;

And a fourth sending module, configured to send the comparison result to the client, and receive a data block in which the fingerprint data sent by the client is inconsistent and a pointer of the data block in which the fingerprint data is inconsistent.

As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:

In the embodiment of the present invention, when a file that the client has already backed up needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and the fingerprint data of the data blocks after the offset of the address of each data block of the file to be backed up is incremented, are compared with the fingerprint data of the data blocks of the file saved on the server side and the fingerprint data of the data blocks after the offset of the address of each data block of the saved file is incremented; the data corresponding to the data blocks whose fingerprint data has changed is then sent to the server. Therefore, on the premise of ensuring unique storage of the server-side backup file, server-side data storage can be effectively reduced and the deduplication ratio further improved.

Description of the drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art may obtain other drawings from them.

FIG. 1 is a flow chart of a data processing method according to an embodiment of the present invention;

FIG. 2 is a flow chart of another data processing method according to an embodiment of the present invention;

FIG. 3 is a flow chart of still another data processing method in an embodiment of the present invention;

FIG. 4 is a flow chart of still another data processing method in an embodiment of the present invention;

FIG. 5 is a data interaction diagram between a client and a server in an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a client according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of another client according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a server according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of another server according to an embodiment of the present invention;

FIG. 10(a) is a schematic diagram of data block division in the embodiment of the present invention;

FIG. 10(b) is a schematic diagram of one fingerprint data of a data block in the embodiment of the present invention;

FIG. 10(c) is a schematic diagram of another fingerprint data of the data block in the embodiment of the present invention;

FIG. 10(d) is a schematic diagram of still another fingerprint data of the data block in the embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of the present invention.

Referring to FIG. 1, a data processing method according to an embodiment of the present invention is shown. In this embodiment, the client has completed the first backup of the file; after a period of time the original file changes, so the changed file needs to be backed up to the server and the server needs to update the metadata. The method includes:

Step S401: Calculate fingerprint data of each data block of the file to be backed up, and fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented;

Specifically, the fingerprint data is generated by applying an algorithm such as SHA-1 or MD5 to each data block of the file to be backed up, producing a 32-bit or 128-bit hash value that corresponds one-to-one with the data of that block. The generated fingerprint data is a unique identifier of the file data to be backed up. In addition to the algorithms listed above, in this embodiment and the subsequent embodiments other algorithms may be used to generate the fingerprint data according to specific needs, as long as the algorithms used by the client and the server are consistent or correspond to each other, and the data generated by the algorithm can uniquely identify the data block to be backed up;

In the embodiment of the present invention, the file to be backed up is divided into data blocks of a given length. A data block is a physical record of data; here it can be understood as one or more records arranged contiguously in order, and it is the unit of data transferred between main memory and an input/output device or external memory;

Here, the fingerprint data of the data block after the offset of the address of each data block of the file is incremented refers to the fingerprint data of the data block obtained after the offset of the address of each data block of the file is increased by 1.
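The block division and offset-incremented fingerprints of step S401 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the helper names `fingerprint`, `split_blocks`, and `shifted_fingerprints` are hypothetical, MD5 is chosen only because it yields a 128-bit value, and keeping a short trailing block is a choice the text does not address.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Return a 128-bit MD5 digest (hex) used here as the block fingerprint."""
    return hashlib.md5(block).hexdigest()


def split_blocks(data: bytes, block_len: int, shift: int = 0) -> list[bytes]:
    """Cut data into fixed-length blocks, starting `shift` bytes into the file."""
    return [data[i:i + block_len] for i in range(shift, len(data), block_len)]


def shifted_fingerprints(data: bytes, block_len: int) -> dict[int, list[str]]:
    """Map each shift (0 .. block_len - 1) to the fingerprints of its blocks.

    Shift 0 is the ordinary fixed-length division; shift k corresponds to
    adding k to the address offset of every block (an assumed reading).
    """
    return {
        shift: [fingerprint(b) for b in split_blocks(data, block_len, shift)]
        for shift in range(block_len)
    }


if __name__ == "__main__":
    # With block length 2, "abcdef" is cut as ab|cd|ef (shift 0) and bc|de|f (shift 1).
    fps = shifted_fingerprints(b"abcdef", block_len=2)
    print(len(fps[0]), len(fps[1]))
```

For a block length of 2 only the shift of 1 is needed in addition to the aligned division, which matches the "increased by 1" case described above.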

Step S402: Send the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, to the server, for comparison with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented;

Specifically, the client sends the calculated fingerprint data of each data block, and the fingerprint data of the data block after the offset of the address of each data block of the file is incremented, to the server. The server compares them in sequence with the fingerprint data of each data block of the file it has saved and with the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented. Here, the fingerprint data of each data block of the file saved by the server refers to the fingerprint data of the data blocks, divided according to the given length, of the metadata that the client sent to the server at the first backup;

Here, the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented refers to the fingerprint data of the data block obtained after the offset of the address of each data block, divided according to the given length, of the file saved by the server is increased by 1.

Step S403: Send the data block and the pointer with inconsistent fingerprint data to the server according to the comparison result sent by the server.

Specifically, for the data block with the same comparison result, the server only needs to instruct the client to send the pointer of the data block to the server. For the data block with different comparison results, the server needs to instruct the client to send the data block.
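As a hedged sketch of step S403, the client can turn a per-block comparison result into either a pointer or the block content. The shape of the comparison result (one optional server-side reference per block) and the function name `build_upload` are assumptions made for illustration; the text does not specify a message format.

```python
from typing import Optional


def build_upload(blocks: list[bytes], matches: list[Optional[int]]) -> list[tuple]:
    """Turn a per-block comparison result into the messages the client sends.

    matches[i] is the (hypothetical) server-side reference of the stored block
    whose fingerprint equals that of blocks[i], or None if nothing matched.
    """
    if len(blocks) != len(matches):
        raise ValueError("one comparison result is required per block")
    upload = []
    for block, ref in zip(blocks, matches):
        if ref is None:
            upload.append(("data", block))    # changed block: send its content
        else:
            upload.append(("pointer", ref))   # unchanged block: send a pointer only
    return upload
```

Only blocks without a match carry their content, so the volume transmitted shrinks as more blocks are recognized as already stored.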

It should be noted that, in the foregoing embodiment, a file may be backed up, or a file set composed of multiple files may be backed up. The specific backup method is similar, and details are not described herein.

The beneficial effects of the embodiment of the present invention are as follows: when a file that the client has already backed up changes and needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and the fingerprint data of the data blocks after the offset of the address of each data block of the file to be backed up is incremented, are compared with the fingerprint data of the data blocks of the file saved on the server side and the fingerprint data of the data blocks after the offset of the address of each data block of the saved file is incremented; the data corresponding to the data blocks whose fingerprint data has changed is then sent to the server. Therefore, on the premise of ensuring unique storage of the server-side backup file, server-side data storage can be effectively reduced and the deduplication ratio further improved.

Referring to FIG. 2, a data processing method disclosed in an embodiment of the present invention is shown.

In this embodiment, the client has completed the first backup of the file; after a period of time the original file changes, so the changed file needs to be backed up to the server and the server needs to update the metadata. The difference from the first embodiment is that the first embodiment is described from the perspective of the client, whereas this embodiment is described from the perspective of the server;

Step S501: receiving fingerprint data of each data block of the file to be backed up sent by the client, and fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented;

Specifically, the fingerprint data is generated by applying an algorithm such as SHA-1 or MD5 to each data block of the file to be backed up, producing a 32-bit or 128-bit hash value in one-to-one correspondence with the data. The fingerprint data thus generated is a unique identifier of the file data to be backed up;

Step S502: Compare the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, with the fingerprint data of each data block of the saved file and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented;

Here, the fingerprint data of each data block of the file saved by the server refers to the fingerprint data of the data blocks, divided according to the given length, of the metadata that the client sent to the server at the first backup;

Here, the fingerprint data of the data block after the offset of each data block of the saved file is incremented refers to the fingerprint data of the data block obtained after that offset is increased by 1;

It should be noted that the server may use the rsync rolling checksum algorithm to compare the fingerprint data: the fingerprint data of the data blocks sent by the client, and the fingerprint data of the data blocks whose address offset has been incremented, are compared with the fingerprint data of each data block of the file saved by the server and with the fingerprint data of the data blocks obtained after the offset of the address of each data block of the saved file is incremented. It should also be pointed out that the purpose of this step is to compare the fingerprint data. The rsync algorithm is used as an example for convenience of explanation, but those skilled in the art may obviously select other algorithms according to the actual situation;
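The text names the rsync rolling checksum without giving its details. As a hedged illustration, the weak checksum used by rsync can be written as two 16-bit sums that are updated in constant time when the window slides by one byte. The function names below are hypothetical, and a real comparison would still confirm a weak match with a strong hash such as MD5.

```python
M = 1 << 16  # both sums are kept modulo 2**16, as in rsync's weak checksum


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the two sums (a, b) of the weak checksum over one block."""
    a = sum(block) % M
    # Byte i of the block is weighted by (block length - i).
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple[int, int]:
    """Slide the window one byte: drop out_byte at the front, add in_byte at the end."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


if __name__ == "__main__":
    data = b"abcdefgh"
    a, b = weak_checksum(data[0:4])
    # Rolling by one byte gives the same sums as recomputing over data[1:5].
    print(roll(a, b, data[0], data[4], 4) == weak_checksum(data[1:5]))
```

The constant-time update is what makes it affordable to test every byte offset rather than only block-aligned positions.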

For example, take data blocks with a fixed length of 2. In the first pass, the fingerprint data of the first data block sent by the client is compared with the fingerprint data of the first data block of the file saved by the server; if they differ, it is compared with the fingerprint data of the second data block of the saved file, and so on in sequence. If it still differs from the fingerprint data of the last data block of the saved file, it is then compared with the fingerprint data of the data block obtained by adding 1 to the offset of the address of the first data block of the saved file; if they differ, it is compared with the fingerprint data of the data block obtained by adding 1 to the offset of the address of the second data block of the saved file, and so on. If no matching fingerprint data is found, the client may be notified to send the data block. In the second pass, the fingerprint data of the data block obtained by adding 1 to the offset of the address of the first data block sent by the client is compared, in the same way as in the first pass, with the data blocks of the file saved by the server and with the data blocks whose offset has been increased by 1, in sequence.
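The effect of the extra offset can be seen in a small worked case with block length 2. The sketch below is illustrative only: the byte strings and the helper name `fingerprints` are made up, and MD5 stands in for whichever fingerprint algorithm is used.

```python
import hashlib


def fingerprints(data: bytes, block_len: int, shift: int) -> list[str]:
    """MD5 fingerprints of fixed-length blocks cut starting at byte `shift`."""
    return [
        hashlib.md5(data[i:i + block_len]).hexdigest()
        for i in range(shift, len(data), block_len)
    ]


stored = b"abcdef"    # file saved by the server, cut as ab|cd|ef
changed = b"Xabcdef"  # same file with one byte inserted at the head

stored_aligned = set(fingerprints(stored, 2, 0))

# Plain fixed-length cutting of the changed file gives Xa|bc|de|f.
plain_hits = [fp in stored_aligned for fp in fingerprints(changed, 2, 0)]
# Cutting with the offset increased by 1 gives ab|cd|ef again.
shifted_hits = [fp in stored_aligned for fp in fingerprints(changed, 2, 1)]

print(plain_hits)    # no aligned block of the changed file matches a stored block
print(shifted_hits)  # every block cut at offset +1 matches a stored block
```

A one-byte insertion defeats every aligned comparison, while the blocks taken at the incremented offset are all recognized as already stored.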

It should be noted that, in the embodiment of the present invention, in addition to dividing the data of the file into data blocks of length 2, the data may also be divided into data blocks of fixed length 3, 4, and so on. If the data is divided into blocks of length 3, then when the fingerprint data of a data block sent by the client is compared with the fingerprint data of each data block on the server side, it is necessary to compare in sequence not only the fingerprint data of the data blocks whose address offset is increased by 1, but also the fingerprint data of the data blocks whose address offset is increased by 2;

If the data is divided into blocks of length 4, then when the fingerprint data of a data block sent by the client is compared with the fingerprint data of each data block on the server side, it is necessary to compare in sequence not only the fingerprint data of the data blocks whose address offset is increased by 1, but also the fingerprint data of the data blocks whose address offset is increased by 2 and the fingerprint data of the data blocks whose address offset is increased by 3;

By analogy, if the data is divided into blocks of length N, where N is a natural number greater than 2, the fingerprint data of the data blocks after the offset of the address of each data block of the file to be backed up is incremented comprises the fingerprint data of the data blocks obtained by increasing the offset by 1 and then successively by each value up to N-1.

Similarly, the data blocks obtained by incrementing the offset of the address of each data block of the client also range from the data block with the offset increased by 1 to the data block with the offset increased by N-1.
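For a general block length N, the server-side comparison can be sketched as a lookup table built over every shift from 0 to N-1. This is an assumed realization rather than the patent's stated one: the names `build_index` and `compare` are hypothetical, a hash table replaces the sequential comparison described in the text, and MD5 is used as the fingerprint.

```python
import hashlib
from typing import Optional


def fingerprint(block: bytes) -> str:
    """128-bit MD5 digest (hex) used as the block fingerprint."""
    return hashlib.md5(block).hexdigest()


def build_index(stored: bytes, n: int) -> dict[str, tuple[int, int]]:
    """Index the stored file's blocks for every shift 0 .. n - 1.

    Maps fingerprint -> (shift, byte offset of the block in the stored file).
    If two blocks share a fingerprint, the first one seen is kept.
    """
    index: dict[str, tuple[int, int]] = {}
    for shift in range(n):
        for start in range(shift, len(stored), n):
            index.setdefault(fingerprint(stored[start:start + n]), (shift, start))
    return index


def compare(
    client_fps: list[str], index: dict[str, tuple[int, int]]
) -> list[Optional[tuple[int, int]]]:
    """For each client fingerprint, return the matching stored block or None."""
    return [index.get(fp) for fp in client_fps]
```

A `None` entry in the result marks a block the client must send in full; any other entry identifies a stored block that a pointer can reference.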

Step S503: Send the comparison result to the client, and receive the data block and the pointer that the fingerprint data sent by the client is inconsistent.

In step S502, the server obtains the changed data block by comparing, so that the comparison result can be sent to the client, and the client is instructed to send the data block and the pointer with inconsistent fingerprint data to the server.

The beneficial effects of the embodiment of the present invention are as follows: when a file that the client has already backed up changes and needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and the fingerprint data of the data blocks after the offset of the address of each data block of the file to be backed up is incremented, are compared with the fingerprint data of the data blocks of the file saved on the server side and the fingerprint data of the data blocks after the offset of the address of each data block of the saved file is incremented; the data corresponding to the data blocks whose fingerprint data has changed is then sent to the server. Therefore, on the premise of ensuring unique storage of the server-side backup file, server-side data storage can be effectively reduced and the deduplication ratio further improved.

Referring to FIG. 3, a data processing method according to another embodiment of the present invention is shown. In this embodiment, the client has completed the first backup of the file; the first backup means backing up the client file completely to the server, and the server saves the file as metadata.

After a period of time, the file may change, so the changed file needs to be backed up to the server for the server to update the metadata. The backup method after the file has changed is described in detail below:

Step S101: Calculate fingerprint data of the file to be backed up;

Specifically, the fingerprint data is generated by applying an algorithm such as SHA-1 or MD5 to the data to be backed up, producing a 32-bit or 128-bit hash value in one-to-one correspondence with the data. The fingerprint data thus generated is the unique identifier of the file data to be backed up.

Step S102: Send the fingerprint data of the file to be backed up to the server, and compare the fingerprint data of the to-be-backed file with the fingerprint data of the file saved by the server;

Specifically, the client sends the fingerprint data of the file to be backed up, obtained in step S101, to the server. The server obtained the fingerprint data of the file by calculation when the file was first backed up; after receiving the fingerprint data sent by the client, it compares the two. If the fingerprint data is the same, the file has not changed; if the fingerprint data is different, the file has changed.

The technical effect of step S102 and step S103 is that after a period of time t, if it is not determined whether the original file has changed, the fingerprint data of the current file can be compared with the fingerprint data of the original file to determine whether the original file has changed. If there is no change, the client sends the pointer of the file to the server, and does not need to perform the subsequent operations. However, in the embodiment of the present invention, the case where the original file changes is mainly discussed.
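The whole-file check that precedes the block-level comparison can be sketched in a few lines. The function names are hypothetical and SHA-1 is used only because it is one of the algorithms the text names; this is an illustration, not the patent's implementation.

```python
import hashlib


def file_fingerprint(data: bytes) -> str:
    """Fingerprint of the whole file, computed here with SHA-1."""
    return hashlib.sha1(data).hexdigest()


def needs_block_comparison(current: bytes, stored_fingerprint: str) -> bool:
    """Return True when the file changed and block-level comparison is required."""
    return file_fingerprint(current) != stored_fingerprint


if __name__ == "__main__":
    saved = file_fingerprint(b"original contents")
    print(needs_block_comparison(b"original contents", saved))
    print(needs_block_comparison(b"edited contents", saved))
```

When the function returns False, only a pointer to the file needs to be sent and the per-block steps can be skipped.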

Step S103: Receive the comparison result sent by the server; when the comparison result indicates a difference, calculate fingerprint data of each data block of the file to be backed up, and fingerprint data of the data block after the offset of the address of each data block is incremented;

Specifically, in the embodiment of the present invention, the file to be backed up is divided into data blocks of a given length. A data block is a physical record of data; here it can be understood as one or more records arranged contiguously in order, and it is the unit of data transferred between main memory and an input/output device or external memory;

The method for calculating the fingerprint data of each data block is the same as the method for calculating the fingerprint data of the file to be backed up in step S101, and is not described again here. In addition, when the comparison result indicates that the fingerprint data is the same, the file to be backed up has not changed compared with the metadata stored on the server, and the pointer of the backup file is sent to the server.

Step S104: Send the fingerprint data of each data block, and the fingerprint data of the data block after the offset of the address of each data block of the file is incremented, to the server, for comparison with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented;

Specifically, the client sends the calculated fingerprint data of each data block, and the fingerprint data of the data block after the offset of the address of each data block of the file is incremented, to the server. The server compares them in sequence with the fingerprint data of each data block of the file it has saved and with the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented. Here, the fingerprint data of each data block of the file saved by the server refers to the fingerprint data of the data blocks, divided according to the given length, of the metadata that the client sent to the server at the first backup;

Here, the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented refers to the fingerprint data of the data block obtained after that offset is increased by 1.

Step S105: Send, according to the comparison result sent by the server, the data block and the pointer with inconsistent fingerprint data to the server;

Specifically, for the data block with the same comparison result, the server only needs to instruct the client to send the pointer of the data block to the server. For the data with different comparison results, the server needs to instruct the client to send the data.

It should be noted that, in the foregoing embodiment, a file may be backed up, or multiple files in a file set may be backed up. The specific backup method is similar, and details are not described herein.

The beneficial effects of the embodiment of the present invention are as follows: when a file that the client has already backed up changes and needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and the fingerprint data of the data blocks after the offset of the address of each data block of the file to be backed up is incremented, are compared with the fingerprint data of the data blocks of the file saved on the server side and the fingerprint data of the data blocks after the offset of the address of each data block of the saved file is incremented; the data corresponding to the data blocks whose fingerprint data has changed is then sent to the server. Therefore, on the premise of ensuring unique storage of the server-side backup file, server-side data storage can be effectively reduced and the deduplication ratio further improved.

Referring to FIG. 4, a data processing method according to another embodiment of the present invention is shown. In this embodiment, the client has completed the first backup of the file; the first backup means backing up the client file completely to the server, and the server saves the file as metadata.

After a period of time, the file may change, that is, the changed file needs to be backed up to the server to complete the server's update of the metadata. The following describes the backup method after the file may be changed. The difference from the first embodiment is that the first embodiment is described from the perspective of the client, and the embodiment is described from the perspective of the server;

Step S201: Receive fingerprint data of the file to be backed up sent by the client;

Specifically, the fingerprint data is generated by applying an algorithm such as SHA-1 or MD5 to the data to be backed up, producing a 32-bit or 128-bit hash value. The resulting fingerprint data is the unique identifier of the file data to be backed up.

Step S202: Comparing the fingerprint data of the file to be backed up with the fingerprint data of the saved file, and sending the comparison result to the client;

Specifically, the server obtained the fingerprint data of the file by calculation when the file was first backed up, and compares the fingerprint data sent by the client with the previously stored fingerprint data. If the fingerprint data is the same, the file has not changed; if the fingerprint data is different, the file has changed.

The technical effect of step S202 and step S203 is that, after a period of time t, when it is not known whether the original file has changed, the fingerprint data of the current file can be compared with the fingerprint data of the original file to determine whether the original file has changed. If there is no change, the client sends the pointer of the file to the server and the subsequent operations need not be performed. However, the embodiment of the present invention mainly discusses the case where the original file changes.

Step S203: calculating fingerprint data of each data block of the saved file, and fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented;

The method for calculating the fingerprint data of each data block is the same as the method for calculating the fingerprint data of the file to be backed up in step S201, and details are not described herein again;

Here, the offset of the address of each data block of the saved file is incremented by 1 and the fingerprint data of the obtained data block is calculated in the same manner as before, and will not be described here.

Step S204: Receive the fingerprint data of each data block of the file to be backed up sent by the client, and the fingerprint data of the data block after the offset of the address of each data block of the file is incremented, and compare them with the fingerprint data of each data block of the saved file and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented.

Step S205: Send the comparison result to the client, and receive the data block and the pointer that the fingerprint data sent by the client is inconsistent;

Specifically, for the data block with the same comparison result, the server only needs to instruct the client to send the pointer of the data block to the server. For the data with different comparison results, the server needs to instruct the client to send the data block.

It should be noted that, in the foregoing embodiment, a file may be backed up, or multiple files included in the same file set may be backed up. The specific backup method is similar and will not be described again.

The beneficial effect of the embodiment of the present invention is that, when the file to be backed up on the client has changed and needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and the fingerprint data of the data blocks after the offset of the address of each data block of the file to be backed up is incremented, are compared with the fingerprint data of the data blocks of the file saved by the server and the fingerprint data of the data blocks after the offset of the address of each data block of the saved file is incremented, and then only the data corresponding to the data blocks whose fingerprint data has changed is sent to the server. Therefore, on the premise of ensuring that the backup file is stored only once on the server side, the amount of data stored on the server side can be effectively reduced, and the deduplication ratio can be further improved.

Referring to FIG. 5, FIG. 5 is a specific example for explaining a data processing method according to an embodiment of the present invention. It should be noted that, for an embodiment in which data is first backed up, the following steps S301 and S302 are not necessary;

Step S301: The client backs up the file data to the server for the first time;

Step S302: The server saves the file data of the first backup and the fingerprint data of the file. Specifically, the server saves the backup file data sent by the client as metadata, and saves the fingerprint data of the file obtained by calculation; the calculation method of the fingerprint data has been explained in the previous embodiment and is not described again here.

Step S303: After a period of time t, the client calculates and sends fingerprint data of the file;

Specifically, after a period of time t, the original file may change, and the client needs to send the changed data to the server to implement data synchronization;

However, at this time, the client does not know whether the original file has changed. Therefore, it is necessary to verify whether the original file is changed or not. The specific method of verification is to calculate the fingerprint data of the current file and send it to the server.

Step S304: The server compares the fingerprint data sent by the client with the saved fingerprint data.

Specifically, the server receives the fingerprint data of the current file sent by the client and compares it with the saved fingerprint data of the original file. If the two pieces of fingerprint data are the same, the file has not changed; if they are different, the file has changed. Because the technical problem to be solved by the embodiment of the present invention is the processing after the file changes, the following description focuses on the situation after the file changes.
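The whole-file check in steps S303 and S304 can be sketched as below. SHA-1 is assumed as the file-level fingerprint, and the function names `file_fingerprint` and `file_changed` are hypothetical.

```python
import hashlib


def file_fingerprint(data: bytes) -> str:
    # SHA-1 over the whole file, used here as an example fingerprint.
    return hashlib.sha1(data).hexdigest()


def file_changed(current: bytes, saved_fingerprint: str) -> bool:
    # The file is considered changed when the fingerprints differ.
    return file_fingerprint(current) != saved_fingerprint


saved_fp = file_fingerprint(b"1234ABC")     # stored at the first backup
print(file_changed(b"1234ABC", saved_fp))   # False
print(file_changed(b"01234ABC", saved_fp))  # True
```

Only when this check reports a change does the per-block comparison of the later steps need to run.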

Step S305: Send a comparison result;

Specifically, if the server finds through the comparison in step S304 that the file has changed, it sends the comparison result to the client, and the following steps are then performed. If the server finds through the comparison in step S304 that the file has not changed, the file saved in the server does not need to be updated.

Step S306: Calculate and send the fingerprint data of each data block, and the fingerprint data of the data block after the offset of the address of each data block of the file is incremented;

The following is an example of inserting a bit of data in the header of the original file:

The original document is: 1234ABC;

After inserting a bit of data 0 in the header of the original file: 01234ABC;

In the embodiment of the present invention, the data block is divided by the fixed length 2, that is, the above 01234ABC can be divided into four data blocks as shown in FIG. 10(a);

In step S306, the client calculates the fingerprint data of each data block, and the fingerprint data of the data block after the offset of the address of each data block of the file is incremented, specifically, the fingerprint data of data blocks (0, 1), (2, 3), (4, A), (B, C), and the fingerprint data of data blocks (1, 2), (3, 4), (A, B), (C), and sends the fingerprint data to the server;

Step S307: The server calculates the fingerprint data of each data block of the saved file and the fingerprint data of each data block after the address offset of each data block of the file is incremented by 1, and compares them in turn with the fingerprint data of each data block sent by the client;

Specifically, taking the file in step S306 as an example, the metadata saved by the server is 1234ABC; the fingerprint data of each data block, and the fingerprint data of each data block after the address offset of each data block of the file is incremented by 1, are as shown in FIG. 10(b). After receiving the fingerprint data of each data block of the current file sent by the client, the server compares the fingerprint data of the first data block (0, 1) in turn with the fingerprint data stored by the server: the fingerprint data FPA of data block (1, 2), the fingerprint data FPB of data block (3, 4), the fingerprint data FPC of data block (A, B), and the fingerprint data FPD of data block (C), as well as the fingerprint data FPE of data block (2, 3), the fingerprint data FPF of data block (4, A), and the fingerprint data FPG of data block (B, C), which are obtained with the offset incremented by 1;

In the prior art, the server only calculates and stores the fingerprint data FPA, FPB, FPC, FPD of each data block of the metadata. When these are compared with the fingerprint data of each data block of the current file, the fingerprint data of every data block of the current file is found to differ from FPA, FPB, FPC, and FPD, so all data blocks of the current file are backed up to the server, which reduces the deduplication ratio and increases both the amount of data stored on the server side and the network bandwidth consumption;
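The prior-art behaviour described above can be reproduced with a few lines. The sketch assumes MD5 fingerprints and a block length of 2; `aligned_fingerprints` is a hypothetical helper.

```python
import hashlib


def aligned_fingerprints(data: bytes, block_len: int = 2) -> list[str]:
    # Fingerprints of the aligned (offset 0) fixed-length blocks only.
    return [hashlib.md5(data[i:i + block_len]).hexdigest()
            for i in range(0, len(data), block_len)]


server_fps = set(aligned_fingerprints(b"1234ABC"))  # FPA, FPB, FPC, FPD
client_fps = aligned_fingerprints(b"01234ABC")      # blocks 01, 23, 4A, BC
matches = [fp for fp in client_fps if fp in server_fps]
print(len(matches))  # 0
```

Because a single inserted byte shifts every block boundary, none of the client's aligned blocks matches, and the whole file would be sent again.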

In the embodiment of the present invention, the server compares the fingerprint data of data block (0, 1) with FPA, FPB, FPC, FPD, FPE, FPF, and FPG and finds that they are all different. It then compares the fingerprint data of data block (1, 2), obtained by incrementing the offset by 1, and finds that it is the same as FPA, which means that one bit of data has been added to the header of the original file. The server sends the comparison result to the client, requesting the client to send data 0 to the server.

Next, the fingerprint data of the remaining data blocks is compared in turn, and it is found that the fingerprint data of data block (3, 4) is the same as FPB, the fingerprint data of data block (A, B) is the same as FPC, and the fingerprint data of data block (C) is the same as FPD. It can be concluded that the current file is the original file with one bit of data, 0, added to its header.
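The comparison for the header-insertion example can be checked with the following sketch. MD5 is an assumed fingerprint function, the labels FPA to FPG follow the text, and the helpers `fp` and `blocks` are hypothetical.

```python
import hashlib


def fp(block: bytes) -> str:
    return hashlib.md5(block).hexdigest()


def blocks(data: bytes, offset: int, block_len: int = 2) -> list[bytes]:
    # Fixed-length blocks starting at the given address offset.
    return [data[i:i + block_len] for i in range(offset, len(data), block_len)]


# Server side: FPA..FPD for the aligned blocks, FPE..FPG for offset + 1.
saved = b"1234ABC"
labels = ["FPA", "FPB", "FPC", "FPD", "FPE", "FPF", "FPG"]
server = {fp(b): name
          for b, name in zip(blocks(saved, 0) + blocks(saved, 1), labels)}

# Client side: aligned blocks, then blocks with the offset incremented by 1.
current = b"01234ABC"
result = {b.decode(): server.get(fp(b))
          for b in blocks(current, 0) + blocks(current, 1)}
print(result)
# {'01': None, '23': 'FPE', '4A': 'FPF', 'BC': 'FPG',
#  '12': 'FPA', '34': 'FPB', 'AB': 'FPC', 'C': 'FPD'}
```

Only block (0, 1) has no counterpart on the server, and since (1, 2) matches FPA, the new data is the leading 0.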

Step S308: Send the comparison result;

Step S309: Send the data block whose fingerprint data is inconsistent and its pointer.

Specifically, the client sends the data 0 inserted in the header of the original file to the server.

It can be seen that the embodiment of the present invention only sends the changed one-bit data and its pointer to the server, which improves the deduplication ratio compared with the prior art, and reduces the data storage and network bandwidth consumption of the server.

The data processing method of the embodiment of the present invention is described below by taking a modification of data in the middle of the original file as an example. For example: Modify the original file 1234ABC to 15D23C;

The client divides the current file 15D23C into data blocks of length 2, namely (1, 5), (D, 2), (3, C), as shown in FIG. 10(c). As in the above embodiment, the metadata saved by the server is 1234ABC; the fingerprint data of each data block, and the fingerprint data of each data block after the offset of each data block of the file is incremented by 1, are as shown in FIG. 10(d). The server compares the fingerprint data of data block (1, 5) sent by the client with the calculated fingerprint data FPA, FPB, FPC, FPD, FPE, FPF, FPG, and finds no matching fingerprint data;

The server then compares the fingerprint data of data block (5, D) sent by the client with the fingerprint data saved by the server, and still finds no matching fingerprint data, indicating that data block (1, 5) is a changed data block;

Then, the server compares the fingerprint data of data block (D, 2) sent by the client with the fingerprint data saved by the server, and finds no matching fingerprint data;

The server then compares the fingerprint data of data block (2, 3) sent by the client with the fingerprint data saved by the server, and finds that it matches FPE, indicating that one bit of data, D, has been added before the original data block (2, 3); the server can therefore instruct the client to send data D and its pointer to the server;

The server then compares the fingerprint data of the remaining data block (C) of the client with the fingerprint data saved by the server, and finds that it matches FPD, so data block (C) has not changed, and the client only needs to send the pointer of data block (C) to the server;

It should be noted that the client's division of the file according to the given length is a logical division, not a physical division of the file into several data blocks. Its purpose is to facilitate comparison with the file data saved on the server side in order to find the data that has changed, so the partitioning into data blocks is not fixed. Taking the above example, when matching fingerprint data is found on the server side for data block (2, 3), which is obtained by incrementing the offset of data block (D, 2) by 1, data block (2, 3) can logically be treated as one data block, with the preceding bit of data D and the following data C treated as separate data blocks.

After the server completes the comparison, it can, by sending the comparison result, instruct the client to send the changed data block (1, 5), data block (D), and the pointers of these two data blocks to the server.
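One way to turn the comparison above into a list of instructions is a greedy scan over the current file. The sketch below is an assumption about how such a scan could look, not the procedure of the embodiment: it uses MD5, carries the matched bytes in place of a real pointer, and merges adjacent changed bytes into one run, so the changed blocks (1, 5) and (D) appear together as 15D. The names `server_index` and `plan` are hypothetical.

```python
import hashlib


def fp(block: bytes) -> str:
    return hashlib.md5(block).hexdigest()


def server_index(saved: bytes, block_len: int = 2) -> set[str]:
    # Fingerprints of the aligned blocks and of the blocks whose address
    # offset is incremented by 1 .. block_len - 1.
    return {fp(saved[i:i + block_len])
            for offset in range(block_len)
            for i in range(offset, len(saved), block_len)}


def plan(current: bytes, index: set[str], block_len: int = 2) -> list[tuple[str, bytes]]:
    ops: list[tuple[str, bytes]] = []
    pos = 0
    while pos < len(current):
        block = current[pos:pos + block_len]
        if fp(block) in index:
            # Unchanged block: only a pointer needs to be sent.
            ops.append(("pointer", block))
            pos += len(block)
        else:
            # No match: mark one byte as changed, then retry at offset + 1.
            byte = current[pos:pos + 1]
            if ops and ops[-1][0] == "data":
                ops[-1] = ("data", ops[-1][1] + byte)
            else:
                ops.append(("data", byte))
            pos += 1
    return ops


index = server_index(b"1234ABC")
print(plan(b"15D23C", index))
# [('data', b'15D'), ('pointer', b'23'), ('pointer', b'C')]
print(plan(b"01234ABC", index)[0])
# ('data', b'0')
```

In both examples from the text, only the changed bytes are marked as data to be sent; every other block is covered by a pointer.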

It can be seen that, compared with the prior art, the embodiment of the present invention can improve the deduplication ratio and reduce the amount of data stored on the server and the consumption of network bandwidth when the head or the middle of the file changes.

Referring to FIG. 6, a client disclosed in an embodiment of the present invention;

The first calculation module 601 is configured to calculate fingerprint data of each data block of the file to be backed up, and fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented; the fingerprint data in the embodiment of the present invention is data used to uniquely identify a file or a certain data block of a file;

Here, the fingerprint data of the data block after the offset of the address of each data block of the file is incremented refers to the fingerprint data of the data block obtained after the offset of the address of each data block of the file is incremented.

The first sending module 602 is configured to send the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, to the server, for comparison with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented;

Specifically, the client sends the calculated fingerprint data of each data block, and the fingerprint data of the data block after the offset of the address of each data block of the file is incremented, to the server, where they are compared in turn with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented; here, the fingerprint data of each data block of the file saved by the server refers to the fingerprint data of the data blocks obtained by dividing, according to a given length, the metadata sent by the client to the server during the first backup;

Here, the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented refers to the fingerprint data of the data block obtained after the offset of the address of each data block, divided according to the given length, of the file saved by the server is increased by 1.

The first receiving module 603 is configured to receive a comparison result sent by the server.

The second sending module 604 is configured to send, according to the comparison result sent by the server, the data block and the pointer with inconsistent fingerprint data to the server.

The beneficial effects of the embodiment of the present invention are: when the file to be backed up on the client needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and the fingerprint data of the data blocks after the offset of the address of each data block of the file to be backed up is incremented, are compared with the fingerprint data of the data blocks of the file saved by the server and the fingerprint data of the data blocks after the offset of the address of each data block of the saved file is incremented. Then, only the data corresponding to the data blocks whose fingerprint data has changed is sent to the server. Therefore, on the premise of ensuring that the backup file is stored only once on the server side, the amount of data stored on the server side can be effectively reduced, and the deduplication ratio can be further improved.

The foregoing embodiment may further include: a second calculating module 605, a third sending module 606, and a second receiving module 607;

a second calculating module 605, configured to calculate fingerprint data of the file to be backed up;

The third sending module 606 is configured to send the fingerprint data of the file to be backed up to the server, for comparison with the fingerprint data of the file saved by the server; specifically, the client sends the calculated fingerprint data of the file to be backed up to the server. The server obtained the fingerprint data of the file by calculation when the file was first backed up; after receiving the fingerprint data sent by the client, it compares it with the previously saved fingerprint data. If the fingerprint data is the same, the file has not changed; if the fingerprint data is different, the file has changed.

The second receiving module 607 is configured to receive a comparison result of the server.

The difference between this embodiment and the previous embodiment is that, after a period of time t, if it is not known whether the original file has changed, the fingerprint data of the current file can be compared with the fingerprint data of the original file to determine whether the original file has changed; if there is no change, the client only sends the pointer of the file to the server and does not need to perform the following operations.

Referring to FIG. 6, a server disclosed in an embodiment of the present invention;

The third receiving module 701 is configured to: receive fingerprint data of each data block of the file to be backed up sent by the client, and fingerprint data of the data block with the offset of the address of each data block of the file to be backed up;

Specifically, the fingerprint data refers to a 32-bit or 128-bit hash value generated for each data block of the file to be backed up by an algorithm such as SHA-1 or MD5, or a similar algorithm, in one-to-one correspondence with the data; the fingerprint data thus generated is a unique identifier of the data of the file to be backed up;
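The fingerprint sizes mentioned above can be illustrated with Python's standard library. The choice of Adler-32 for the 32-bit value is an assumption made for this sketch (the text does not name a 32-bit algorithm); MD5 gives the 128-bit value. Note that SHA-1 as implemented in `hashlib` produces a 160-bit digest, so a 128-bit fingerprint would need MD5 or a truncated digest.

```python
import hashlib
import zlib

block = b"12"

weak = zlib.adler32(block)            # assumed 32-bit fingerprint
strong = hashlib.md5(block).digest()  # 128-bit fingerprint

print(weak.bit_length() <= 32, len(strong) * 8)  # True 128
print(hashlib.sha1(block).digest_size * 8)       # 160
```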

Here, the fingerprint data of the data block after the offset of the address of each data block of the file is incremented refers to the fingerprint data of the data block obtained after the offset of the address of each data block of the file is incremented.

The first comparison module 702 is configured to compare the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, with the fingerprint data of each data block of the saved file and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented;

It should be noted that the server can use the rsync rolling checksum algorithm to compare the fingerprint data: the fingerprint data of the data blocks sent by the client, and the fingerprint data of the data blocks whose address offset is incremented by 1, are compared in turn with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented. The purpose of this step is to compare the fingerprint data; for convenience, the description is based on the rsync algorithm, but it is obvious that, in addition to the rsync algorithm, those skilled in the art may select other algorithms according to actual conditions;
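A rolling checksum lets the fingerprint of the block at offset + 1 be derived from the fingerprint of the block at the current offset without rereading the whole block. The sketch below follows the style of rsync's weak checksum (two 16-bit sums); the modulus and the function names `weak_checksum` and `roll` are choices made for this illustration, not details given in the embodiment.

```python
M = 1 << 16  # both sums are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    n = len(block)
    a = sum(block) % M                                     # plain byte sum
    b = sum((n - i) * x for i, x in enumerate(block)) % M  # position-weighted sum
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    # Slide a window of length n forward by one byte.
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


data = b"01234ABC"
n = 2
a, b = weak_checksum(data[0:n])
for k in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
    # The rolled value equals the checksum recomputed from scratch.
    assert (a, b) == weak_checksum(data[k:k + n])
print("rolling checksum matches recomputation")
```

In rsync-like schemes, a cheap checksum of this kind is typically used as a first filter, with a stronger hash confirming candidate matches.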

For example, take data blocks with a fixed length of 2. In the first step, the fingerprint data of the first data block sent by the client is compared with the fingerprint data of the first data block of the file saved by the server. If they are not the same, it is compared with the fingerprint data of the second data block of the file saved by the server, and so on, up to the fingerprint data of the last data block of the file saved by the server. If no match has been found, it is compared with the fingerprint data of the data block obtained by adding 1 to the offset of the address of the first data block of the file saved by the server; if they are not the same, it is compared with the fingerprint data of the data block obtained by adding 1 to the offset of the address of the second data block of the file saved by the server, and so on. If matching fingerprint data is still not found, the client can be notified to send the data block. In the second step, the fingerprint data of the data block obtained by adding 1 to the offset of the address of the first data block sent by the client is compared, in the same way as in the first step, in turn with the fingerprint data of the data blocks of the file saved by the server and of the data blocks with the offset incremented by 1.
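The comparison order described in this example can be sketched as a simple sequential search. MD5 is assumed as the fingerprint function, and `find_match` is a hypothetical helper; a real implementation would more likely use a hash table than a linear scan.

```python
import hashlib


def fp(block: bytes) -> str:
    return hashlib.md5(block).hexdigest()


def find_match(client_fp, aligned_fps, shifted_fps):
    # First compare with the server's aligned blocks, in order ...
    for i, server_fp in enumerate(aligned_fps):
        if server_fp == client_fp:
            return ("aligned", i)
    # ... then with the server's blocks whose offset is incremented by 1.
    for i, server_fp in enumerate(shifted_fps):
        if server_fp == client_fp:
            return ("shifted", i)
    # No match: the client must be told to send this data block.
    return None


aligned = [fp(b) for b in (b"12", b"34", b"AB", b"C")]
shifted = [fp(b) for b in (b"23", b"4A", b"BC")]
print(find_match(fp(b"12"), aligned, shifted))  # ('aligned', 0)
print(find_match(fp(b"4A"), aligned, shifted))  # ('shifted', 1)
print(find_match(fp(b"01"), aligned, shifted))  # None
```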

It should be noted that, in the embodiment of the present invention, in addition to dividing the data of the file into data blocks of length 2, the data blocks may also be divided according to a fixed length of 3, a fixed length of 4, and so on;

If the data blocks are divided by a length of 3, when the fingerprint data of a data block sent by the client is compared with the fingerprint data of each data block on the server side, it is necessary to compare in turn not only the fingerprint data of the data blocks whose address offset is incremented by 1, but also the fingerprint data of the data blocks whose address offset is incremented by 2;

If the data blocks are divided by a length of 4, when the fingerprint data of a data block sent by the client is compared with the fingerprint data of each data block on the server side, it is necessary to compare in turn not only the fingerprint data of the data blocks whose address offset is incremented by 1, but also the fingerprint data of the data blocks whose address offset is incremented by 2 and the fingerprint data of the data blocks whose address offset is incremented by 3;

Similarly, for the data blocks of the client, the offset of the address of each data block is also incremented in turn from 1 up to N - 1, where N is the length of the data block.
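For a general block length N, the fingerprints for offsets 0 through N - 1 can be generated in one pass per offset. This is a minimal sketch assuming MD5; `shifted_fingerprints` is a hypothetical name.

```python
import hashlib


def shifted_fingerprints(data: bytes, n: int) -> dict[int, list[str]]:
    # For each offset 0 .. n-1, fingerprint the fixed-length blocks that
    # start at that offset; the last block of each list may be shorter.
    return {offset: [hashlib.md5(data[i:i + n]).hexdigest()
                     for i in range(offset, len(data), n)]
            for offset in range(n)}


table = shifted_fingerprints(b"1234ABC", 3)
print({offset: len(fps) for offset, fps in table.items()})
# {0: 3, 1: 2, 2: 2}
```

With N = 3 the saved file 1234ABC yields blocks 123, 4AB, C at offset 0, blocks 234, ABC at offset 1, and blocks 34A, BC at offset 2, so the number of stored fingerprints grows roughly in proportion to N.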

The fourth sending module 703 is configured to send the comparison result to the client, and receive the data blocks whose fingerprint data is inconsistent, and their pointers, sent by the client;

After the server obtains the changed data blocks through the comparison by the first comparison module 702, it can send the comparison result to the client, instructing the client to send the data blocks with inconsistent fingerprint data and their pointers to the server.

The beneficial effects of the embodiment of the present invention are: when the backup file of the client needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and the fingerprint data of the data blocks after the offset of the address of each data block of the file to be backed up is incremented, are compared with the fingerprint data of the data blocks of the file saved by the server and the fingerprint data of the data blocks after the offset of the address of each data block of the saved file is incremented, and then only the data corresponding to the data blocks whose fingerprint data has changed is sent to the server. Therefore, on the premise of ensuring that the backup file is stored only once on the server side, the amount of data stored on the server side can be effectively reduced, and the deduplication ratio can be further improved.

In the above embodiment, a fourth receiving module 705 and a second comparing module 706 may further be included; the fourth receiving module 705 is configured to receive the fingerprint data of the file to be backed up sent by the client; the second comparing module 706 is configured to compare the fingerprint data of the file to be backed up with the fingerprint data of the saved file and send the comparison result to the client, so that the client determines whether the file to be backed up has changed.

The difference between this embodiment and the previous embodiment is that, after a period of time t, if it is not known whether the original file has changed, the fingerprint data of the current file can be compared with the fingerprint data of the original file to determine whether the original file has changed; if there is no change, the client only sends the pointer of the file to the server and does not need to perform the following operations.

Further, in the above embodiment, the third calculating module 704 may be further included;

The third calculation module 704 is configured to calculate and save the fingerprint data of each data block of the saved file, and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented.

The above description is only the preferred embodiment of the present invention and is not intended to limit the scope of the invention; any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of the present invention.

A person skilled in the art can understand that all or part of the steps of the above embodiments can be completed by a program instructing related hardware, and the program can be stored in a computer readable storage medium; the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.

The data processing method and apparatus provided by the present invention are described in detail above. For those skilled in the art, based on the idea of the embodiments of the present invention, there may be changes in the specific implementation and the scope of application. Therefore, the content of this specification should not be construed as limiting the invention.

Claims

1. A data processing method, comprising:

According to the comparison result sent by the server, the data block in which the fingerprint data is inconsistent and the pointer of the data block in which the fingerprint data is inconsistent are sent to the server.

2. The method according to claim 1, further comprising:

Calculating fingerprint data of the file to be backed up;

Sending the fingerprint data of the file to be backed up to the server for comparison with the fingerprint data of the file saved by the server;

Receiving the comparison result sent by the server, and, when the file to be backed up has changed, performing the step of calculating fingerprint data of each data block of the file to be backed up.

3. The method according to claim 1, wherein when the length of each data block of the file to be backed up is 2 bytes, the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented includes the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is increased by 1;

When the length of each data block of the file to be backed up is N bytes, N being a natural number greater than 2, the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented includes the fingerprint data of the data blocks after the offset of the address of each data block of the file to be backed up is increased by 1 and then sequentially incremented up to N-1.

4. A data processing method, comprising:

Receiving fingerprint data of each data block of the file to be backed up sent by the client, and fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented;

The comparison result is sent to the client, and a data block in which the fingerprint data sent by the client is inconsistent and a pointer of the data block in which the fingerprint data is inconsistent are received.

5. The method according to claim 4, further comprising:

Receiving fingerprint data of the file to be backed up sent by the client;

Comparing the fingerprint data of the file to be backed up with the fingerprint data of the saved file, and sending the comparison result to the client, and, when the client determines that the file to be backed up has changed, performing the step of receiving the fingerprint data of each data block of the file to be backed up sent by the client.

6. The method according to claim 4, further comprising:

Calculating and saving the fingerprint data of each data block of the saved file, and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented.

7. The method according to claim 4, wherein when the length of each data block of the file to be backed up is 2 bytes, the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented includes the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is increased by 1;

8. A client, comprising:

a first calculation module, configured to calculate fingerprint data of each data block of the file to be backed up, and fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented;

a first sending module, configured to send the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, to the server, for comparison with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented;

And a second sending module, configured to send, according to the comparison result sent by the server, a data block in which the fingerprint data is inconsistent and a pointer of the data block in which the fingerprint data is inconsistent to the server.

9. The client according to claim 8, further comprising:

a second calculating module, configured to calculate fingerprint data of the file to be backed up;

a third sending module, configured to send fingerprint data of the file to be backed up to a server, for comparison with the fingerprint data of the file saved by the server;

And a second receiving module, configured to receive a comparison result sent by the server, and start the first computing module when the file to be backed up changes.

10. A server, comprising:

And a fourth sending module, configured to send the comparison result to the client, and receive a data block inconsistent with the fingerprint data sent by the client and a pointer of the data block in which the fingerprint data is inconsistent.

11. The server according to claim 10, further comprising:

a fourth receiving module, configured to receive fingerprint data of the file to be backed up sent by the client;

and a second comparison module, configured to compare the fingerprint data of the file to be backed up with the fingerprint data of the saved file, send the comparison result to the client, and start the third receiving module when the client determines that the file to be backed up has changed.

12. The server according to claim 10, further comprising:

And a third calculating module, configured to calculate and save fingerprint data of each data block of the saved file, and fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented.