WO2012159532A1 - Data processing method and device - Google Patents

Publication number
WO2012159532A1
WO2012159532A1 PCT/CN2012/075411 CN2012075411W WO2012159532A1 WO 2012159532 A1 WO2012159532 A1 WO 2012159532A1 CN 2012075411 W CN2012075411 W CN 2012075411W WO 2012159532 A1 WO2012159532 A1 WO 2012159532A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
file
data block
backed
fingerprint
Prior art date
Application number
PCT/CN2012/075411
Other languages
French (fr)
Chinese (zh)
Inventor
任欣
何非
Original Assignee
成都市华为赛门铁克科技有限公司
Application filed by 成都市华为赛门铁克科技有限公司
Publication of WO2012159532A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G06F11/1453 Management of the data involved in backup or backup restore using de-duplication of the data

Definitions

  • The present invention relates to the field of storage, and in particular, to a data processing method and apparatus.
  • Data de-duplication, an important technology for lowering data storage costs by effectively reducing the amount of data, has become a focus of attention.
  • The system calculates and checks the fingerprint data of a data block (or file), which is data that uniquely identifies a file or a data block of a file, and determines whether the data block duplicates metadata that is already stored.
  • If it is a duplicate, only a pointer to the metadata needs to be kept. If the fingerprint data shows that the data block is brand new, the data block is retained and used as metadata for later use.
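The fingerprint lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `DedupStore` class and its `put` method are hypothetical names, and SHA-1 is chosen only because the text later names it as one possible fingerprint algorithm.

```python
import hashlib


class DedupStore:
    """Toy fingerprint-indexed block store (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps fingerprint -> stored block (the "metadata" in the text).
        self.blocks = {}

    def put(self, block: bytes):
        """Store a block if it is new; return (fingerprint, was_new)."""
        fingerprint = hashlib.sha1(block).hexdigest()
        was_new = fingerprint not in self.blocks
        if was_new:
            # Brand-new block: keep the data itself.
            self.blocks[fingerprint] = block
        # Duplicate block: the fingerprint alone acts as the pointer.
        return fingerprint, was_new


store = DedupStore()
refs = [store.put(block) for block in (b"12", b"34", b"12")]
```

Here the third block repeats the first, so only two blocks are physically stored and the third entry in `refs` is just a pointer to existing data.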
  • the embodiment of the invention provides a data processing method and device, which can effectively reduce server-side data storage and further improve the deduplication rate under the premise of ensuring the unique storage of the server-side backup file.
  • the data processing method provided by the embodiment of the present invention includes:
  • sending the comparison result to the client, and receiving, from the client, the data block whose fingerprint data is inconsistent and the pointer of the data block whose fingerprint data is inconsistent.
  • the client provided by the embodiment of the present invention includes:
  • a first calculation module, configured to calculate fingerprint data of each data block of the file to be backed up, and fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented;
  • a first sending module, configured to send the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented, to the server, for comparison with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented;
  • a first receiving module, configured to receive a comparison result sent by the server;
  • a second sending module, configured to send to the server, according to the comparison result sent by the server, the data block whose fingerprint data is inconsistent and the pointer of the data block whose fingerprint data is inconsistent.
  • the server provided by the embodiment of the present invention includes:
  • a third receiving module configured to receive fingerprint data of each data block of the file to be backed up sent by the client, and fingerprint data of the data block with an offset of the address of each data block of the file to be backed up;
  • a first comparison module, configured to compare the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented, with the fingerprint data of each data block of the saved file and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented;
  • a fourth sending module configured to send the comparison result to the client, and receive a data block in which the fingerprint data sent by the client is inconsistent and a pointer of the data block in which the fingerprint data is inconsistent.
  • the embodiments of the present invention have the following advantages:
  • The fingerprint data of the data blocks of the file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each of those data blocks is incremented, are compared with the fingerprint data of the data blocks of the file saved on the server side and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented. The data corresponding to the data blocks whose fingerprint data has changed is then transmitted to the server. Therefore, while ensuring that the backup file is stored only once on the server side, server-side data storage can be effectively reduced and the deduplication ratio can be further improved.
  • FIG. 1 is a flow chart of a data processing method according to an embodiment of the present invention.
  • FIG. 2 is a flow chart of another data processing method according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of still another data processing method in an embodiment of the present invention.
  • FIG. 5 is a data interaction diagram between a client and a server in an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a client according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of another client according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of another server according to an embodiment of the present invention.
  • FIG. 10(a) is a schematic diagram of data block division in an embodiment of the present invention.
  • FIG. 10(b) is a schematic diagram of one set of fingerprint data of the data blocks in an embodiment of the present invention.
  • FIG. 10(c) is a schematic diagram of another set of fingerprint data of the data blocks in an embodiment of the present invention.
  • Figure 10 (d) is an implementation of the present invention
  • Referring to FIG. 1, a data processing method according to an embodiment of the present invention is described.
  • In this embodiment, the client has completed the first backup of the file. After a period of time the original file changes, so the changed file needs to be backed up to the server, and the server updates the metadata;
  • Step S401: Calculate fingerprint data of each data block of the file to be backed up, and fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented;
  • The fingerprint data is a hash value, for example of 32 or 128 bits, generated by an algorithm such as SHA-1 or MD5, in one-to-one correspondence with the data of each data block of the file to be backed up.
  • The generated fingerprint data is a unique identifier of the file data to be backed up.
  • Other algorithms may be used to generate fingerprint data according to specific needs, as long as the algorithms of the client and the server are consistent or corresponding and the data generated by the algorithm can uniquely identify the data block to be backed up;
  • The file to be backed up is divided into data blocks according to a given length. A data block is a physical record of data; here it can be understood as one or more records arranged contiguously in order, and it is the unit of data transmitted between main memory and an input or output device or external memory;
  • The fingerprint data of a data block with an incremented address offset is the fingerprint data of the data block obtained after the address offset of each data block of the file is increased by 1.
  • Step S402: Send the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented, to the server, for comparison with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented;
  • The client sends the calculated fingerprint data of each data block, and the fingerprint data of the data blocks with an incremented address offset, to the server. The server compares them in turn with the fingerprint data of each data block of the file it has saved and with the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented;
  • Here, the fingerprint data of each data block of the file saved by the server refers to the fingerprint data of the data blocks obtained by dividing, according to the given length, the metadata that the client sent to the server during the first backup;
  • The fingerprint data of the data blocks with an incremented address offset of the saved file refers to the fingerprint data of the data blocks obtained after the address offset of each data block, divided according to the given length, of the file saved by the server is increased by 1.
  • Step S403: Send the data block and the pointer with inconsistent fingerprint data to the server according to the comparison result sent by the server.
  • If the fingerprint data of a data block matches, the server only needs to instruct the client to send the pointer of the data block to the server.
  • If the fingerprint data of a data block does not match, the server needs to instruct the client to send the data block.
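The choice between sending a pointer and sending a data block in step S403 can be sketched as below. This is only an illustration: the function `build_upload`, the tuple layout, and the use of a list position as the "pointer" are assumptions made for the example and are not defined in the source.

```python
def build_upload(blocks, matched):
    """Decide what the client sends for each block.

    blocks  -- list of bytes, the client's data blocks in order
    matched -- list of bool, True where the server reported matching fingerprint data
    """
    upload = []
    for position, (block, same) in enumerate(zip(blocks, matched)):
        if same:
            # Server already holds this block: send only a pointer (here, its position).
            upload.append(("pointer", position))
        else:
            # Fingerprint data is inconsistent: send the block itself.
            upload.append(("data", position, block))
    return upload


result = build_upload([b"01", b"23", b"4A"], [False, True, True])
```

In this hypothetical run only the first block is transmitted in full; the other two are replaced by pointers.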
  • a file may be backed up, or a file set composed of multiple files may be backed up.
  • the specific backup method is similar, and details are not described herein.
  • The beneficial effects of the embodiment of the present invention are as follows: when a file already backed up by the client needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each of those data blocks is incremented, are compared with the fingerprint data of the data blocks of the file saved by the server and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented.
  • The data corresponding to the data blocks whose fingerprint data has changed is then sent to the server. Therefore, while ensuring that the backup file is stored only once on the server side, server-side data storage can be effectively reduced and the deduplication ratio can be further improved.
  • Referring to FIG. 2, another data processing method disclosed in an embodiment of the present invention is described.
  • In this embodiment, the client has completed the first backup of the file. After a period of time the original file changes, so the changed file needs to be backed up to the server so that the server completes the update of the metadata.
  • The difference from the first embodiment is that the first embodiment is described from the perspective of the client, whereas this embodiment is described from the perspective of the server;
  • Step S501: Receive the fingerprint data of each data block of the file to be backed up sent by the client, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented;
  • The fingerprint data is a hash value, for example of 32 or 128 bits, generated by an algorithm such as SHA-1 or MD5, in one-to-one correspondence with the data of each data block of the file to be backed up.
  • The fingerprint data thus generated is a unique identifier of the file data to be backed up;
  • The file to be backed up is divided into data blocks according to a given length. A data block is a physical record of data; here it can be understood as one or more records arranged contiguously in order, and it is the unit of data transmitted between main memory and an input or output device or external memory;
  • The fingerprint data of a data block with an incremented address offset is the fingerprint data of the data block obtained after the address offset of each data block of the file is increased by 1.
  • Step S502: Compare the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented, with the fingerprint data of each data block of the saved file and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented;
  • The fingerprint data of each data block of the file saved by the server refers to the fingerprint data of the data blocks obtained by dividing, according to the given length, the metadata that the client sent to the server during the first backup;
  • The fingerprint data of the data blocks with an incremented address offset of the saved file refers to the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is increased by 1;
  • The server can use the rsync rolling checksum algorithm to compare the fingerprint data, including the fingerprint data of the data blocks with an incremented address offset.
  • The purpose of this step is to compare the fingerprint data. The rsync algorithm is used here only as an example; a person skilled in the art can obviously select other algorithms according to the actual situation;
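Since the text names the rsync rolling check as one option, the sketch below shows an rsync-style weak rolling checksum as it is commonly described. It is an illustration only: the source does not specify how the checksum is computed, and the modulus `M`, the function names, and the two-component `(a, b)` form are assumptions of this example.

```python
M = 1 << 16  # Modulus commonly used in descriptions of the rsync weak checksum (assumed here).


def weak_checksum(block: bytes):
    """Compute the two checksum components for a block from scratch."""
    size = len(block)
    a = sum(block) % M
    # Each byte is weighted by its distance from the end of the block.
    b = sum((size - i) * value for i, value in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, size: int):
    """Slide the window one byte: drop out_byte on the left, add in_byte on the right."""
    a = (a - out_byte + in_byte) % M
    b = (b - size * out_byte + a) % M
    return a, b


data = b"01234ABC"
size = 2
a, b = weak_checksum(data[0:size])
rolled = roll(a, b, data[0], data[size], size)
```

Rolling lets the checksum of the block at offset + 1 be derived from the checksum at the current offset in constant time, which is what makes comparing offset-incremented blocks cheap.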
  • the client may be notified to send the data block; the second time, the first one sent by the client
  • the data blocks are compared.
  • In addition to dividing the data of the file into data blocks of length 2, the data may be divided into data blocks of a fixed length of 3, of 4, and so on;
  • When the data is divided into data blocks of length 3 and the fingerprint data of the data blocks sent by the client is compared with the fingerprint data of each data block on the server, it is necessary to compare in turn not only the fingerprint data of the data blocks whose address offset is increased by 1 but also the fingerprint data of the data blocks whose address offset is increased by 2;
  • When the data is divided into data blocks of length 4 and the fingerprint data of the data blocks sent by the client is compared with the fingerprint data of each data block on the server side, it is necessary to compare in turn not only the fingerprint data of the data blocks whose address offset is increased by 1 but also the fingerprint data of the data blocks whose address offset is increased by 2 and by 3.
  • In general, when the data is divided into data blocks of length N, where N is a natural number greater than 2, the fingerprint data of the data blocks with an incremented address offset of the file to be backed up is the fingerprint data of the data blocks obtained as the address offset is increased by 1 and then successively up to N-1.
  • The data blocks with an incremented address offset on the client are likewise the data blocks from offset plus 1 up to offset plus N-1.
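The generalisation to block length N can be sketched as follows: fingerprints are computed for the aligned division (shift 0) and for every shift from 1 to N-1. This is a minimal sketch under assumptions; the function name `fingerprints_for_all_shifts` and the use of SHA-1 are illustrative choices, not taken from the source.

```python
import hashlib


def fingerprints_for_all_shifts(data: bytes, n: int) -> dict:
    """Map each shift 0..n-1 to the fingerprints of the length-n blocks starting at that shift."""
    table = {}
    for shift in range(n):
        table[shift] = [
            hashlib.sha1(data[start:start + n]).hexdigest()
            for start in range(shift, len(data), n)
        ]
    return table


# Blocks of length 3 over the saved file used in the worked example of this document.
table = fingerprints_for_all_shifts(b"1234ABC", 3)
```

With N = 3 the table has three entries: shift 0 covers 123, 4AB, C; shift 1 covers 234, ABC; shift 2 covers 34A, BC.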
  • Step S503: Send the comparison result to the client, and receive from the client the data block and the pointer with inconsistent fingerprint data.
  • In step S502 the server identifies the changed data blocks by comparison, so it can send the comparison result to the client and instruct the client to send the data block and the pointer with inconsistent fingerprint data to the server.
  • The beneficial effects of the embodiment of the present invention are as follows: when a file already backed up by the client needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each of those data blocks is incremented, are compared with the fingerprint data of the data blocks of the file saved by the server and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented.
  • The data corresponding to the data blocks whose fingerprint data has changed is then sent to the server. Therefore, while ensuring that the backup file is stored only once on the server side, server-side data storage can be effectively reduced and the deduplication ratio can be further improved. Referring to FIG.
  • The client has completed the first backup of the file. The first backup completely backs up the client file to the server, and the server saves the file as metadata.
  • The file may later change, so the changed file needs to be backed up to the server so that the server completes the update of the metadata.
  • The backup method used after the file has changed is described in detail below:
  • Step S101 Calculate fingerprint data of the file to be backed up
  • The fingerprint data is a hash value, for example of 32 or 128 bits, generated by an algorithm such as SHA-1 or MD5, in one-to-one correspondence with the data to be backed up. The fingerprint data generated in this way is the unique identifier of the file data to be backed up.
  • Step S102 Send the fingerprint data of the file to be backed up to the server, and compare the fingerprint data of the to-be-backed file with the fingerprint data of the file saved by the server;
  • The client sends the fingerprint data of the file to be backed up, obtained in step S101, to the server. The server obtained the fingerprint data of the file by calculation when the file was first backed up; after receiving the fingerprint data sent by the client, it compares the two. If the fingerprint data is the same, the file has not changed. If the fingerprint data is different, the file has changed.
  • The technical effect of step S102 and step S103 is that, after a period of time t, when it is not known whether the original file has changed, the fingerprint data of the current file can be compared with the fingerprint data of the original file to determine whether the original file has changed. If there is no change, the client sends the pointer of the file to the server and does not need to perform the subsequent operations.
  • This embodiment mainly discusses the case where the original file has changed.
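The whole-file check that precedes the block-level comparison can be sketched as below. This is a hedged illustration: `file_fingerprint` and `needs_block_comparison` are hypothetical helper names, and SHA-1 stands in for whichever fingerprint algorithm the client and server agree on.

```python
import hashlib


def file_fingerprint(data: bytes) -> str:
    """Fingerprint of the whole file (SHA-1 chosen for illustration)."""
    return hashlib.sha1(data).hexdigest()


def needs_block_comparison(current: bytes, saved_fingerprint: str) -> bool:
    """True if the file has changed and the block-level steps must run."""
    return file_fingerprint(current) != saved_fingerprint


# Fingerprint the server computed at the first backup.
saved_fp = file_fingerprint(b"1234ABC")
unchanged = needs_block_comparison(b"1234ABC", saved_fp)
changed = needs_block_comparison(b"01234ABC", saved_fp)
```

When the result is false, only the pointer of the file needs to be sent and the per-block fingerprints are never calculated.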
  • Step S103: Receive the comparison result sent by the server; when the comparison result indicates that the fingerprint data is different, calculate the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each data block is incremented;
  • The file to be backed up is divided into data blocks according to a given length. A data block is a physical record of data; here it can be understood as one or more records arranged contiguously in order.
  • The data block is the unit of data transmitted between main memory and an input or output device or external memory;
  • The method for calculating the fingerprint data of each data block is the same as the method for calculating the fingerprint data of the file to be backed up in step S101, and is not described again here;
  • If the comparison result is the same, the file to be backed up has not changed compared with the metadata stored in the server, and the pointer of the backup file is sent to the server.
  • Step S104: Send the fingerprint data of each data block, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file is incremented, to the server, for comparison with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented;
  • The client sends the calculated fingerprint data of each data block, and the fingerprint data of the data blocks with an incremented address offset, to the server. The server compares them in turn with the fingerprint data of each data block of the file it has saved and with the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented. Here, the fingerprint data of each data block of the file saved by the server refers to the fingerprint data of the data blocks obtained by dividing, according to the given length, the metadata that the client sent to the server during the first backup;
  • The fingerprint data of the data blocks with an incremented address offset of the saved file refers to the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is increased by 1.
  • Step S105: Send, according to the comparison result sent by the server, the data block and the pointer with inconsistent fingerprint data to the server;
  • If the fingerprint data of a data block matches, the server only needs to instruct the client to send the pointer of the data block to the server.
  • If the fingerprint data of a data block does not match, the server needs to instruct the client to send the data.
  • a file may be backed up, or multiple files in a file set may be backed up.
  • the specific backup method is similar, and details are not described herein.
  • The beneficial effects of the embodiment of the present invention are as follows: when a file already backed up by the client needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each of those data blocks is incremented, are compared with the fingerprint data of the data blocks of the file saved on the server side and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented.
  • The data corresponding to the data blocks whose fingerprint data has changed is then transmitted to the server. Therefore, while ensuring that the backup file is stored only once on the server side, server-side data storage can be effectively reduced and the deduplication ratio can be further improved. Referring to FIG.
  • The client has completed the first backup of the file. The first backup completely backs up the client file to the server, and the server saves the file as metadata.
  • The file may later change, so the changed file needs to be backed up to the server so that the server completes the update of the metadata.
  • The difference from the first embodiment is that the first embodiment is described from the perspective of the client, whereas this embodiment is described from the perspective of the server;
  • Step S201: Receive the fingerprint data of the file to be backed up sent by the client;
  • The fingerprint data is a hash value, for example of 32 or 128 bits, generated by an algorithm such as SHA-1 or MD5, in one-to-one correspondence with the data to be backed up. The resulting fingerprint data is the unique identifier of the file data to be backed up.
  • Step S202: Compare the fingerprint data of the file to be backed up with the fingerprint data of the saved file, and send the comparison result to the client;
  • The server obtained the fingerprint data of the file by calculation when the file was first backed up, and compares the fingerprint data sent by the client with the previously stored fingerprint data. If the fingerprint data is the same, the file has not changed; if the fingerprint data is different, the file has changed.
  • The technical effect of step S202 and step S203 is that, after a period of time t, when it is not known whether the original file has changed, the fingerprint data of the current file can be compared with the fingerprint data of the original file to determine whether the original file has changed. If there is no change, the client sends the pointer of the file to the server and does not need to perform the following operations. The embodiment of the present invention, however, focuses on the case where the original file has changed.
  • Step S203: Calculate the fingerprint data of each data block of the saved file, and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented;
  • The file to be backed up is divided into data blocks according to a given length. A data block is a physical record of data; here it can be understood as one or more records arranged contiguously in order.
  • The data block is the unit of data transmitted between main memory and an input or output device or external memory;
  • The method for calculating the fingerprint data of each data block is the same as the method for calculating the fingerprint data of the file to be backed up in step S201, and is not described again here;
  • The fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is increased by 1 is calculated in the same manner as before, and is not described again here.
  • Step S204: Receive the fingerprint data of each data block of the file to be backed up sent by the client, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file is incremented, and compare them with the fingerprint data of each data block of the saved file and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented.
  • Step S205: Send the comparison result to the client, and receive from the client the data block and the pointer with inconsistent fingerprint data;
  • If the fingerprint data of a data block matches, the server only needs to instruct the client to send the pointer of the data block to the server.
  • If the fingerprint data of a data block does not match, the server needs to instruct the client to send the data block.
  • a file may be backed up, or multiple files included in the same file set may be backed up.
  • the specific backup method is similar and will not be described again.
  • FIG. 5 is a specific example for explaining a data processing method according to an embodiment of the present invention. It should be noted that, for an embodiment in which data is first backed up, the following steps S301 and S302 are not necessary;
  • Step S301: The client backs up the file data to the server for the first time.
  • Step S302: The server saves the first backup file data and the fingerprint data of the file. Specifically, the server saves the backup file data sent by the client as metadata, and saves the fingerprint data of the file obtained by calculation. The calculation method of the fingerprint data has been explained in the previous embodiment and is not described again here.
  • The original file may change, and the client needs to send the changed data to the server to implement data synchronization;
  • The client does not know whether the original file has changed, so it is necessary to verify whether the original file has changed. The specific method of verification is to calculate the fingerprint data of the current file and send it to the server.
  • Step S304: The server compares the fingerprint data sent by the client with the saved fingerprint data.
  • The server receives the fingerprint data of the current file sent by the client and compares it with the fingerprint data of the original file that it has saved. If the two are the same, the file has not changed; if they are different, the file has changed. Because the technical problem to be solved by the embodiment of the present invention is the processing after the file changes, the description here focuses on the case in which the file has changed.
  • Step S305: Send the comparison result.
  • If the server finds through the comparison in step S304 that the file has changed, it sends the comparison result to the client, and the following steps are performed. If the server finds through the comparison in step S304 that the file has not changed, the file saved in the server does not need to be updated.
  • Step S306: Calculate and send the fingerprint data of each data block, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file is incremented;
  • The original file is 1234ABC, and the current file, in which data 0 has been inserted at the header, is 01234ABC;
  • The data is divided into data blocks of fixed length 2, that is, the above 01234ABC is divided into four data blocks as shown in FIG. 10(a);
  • In step S306 the client calculates the fingerprint data of each data block and the fingerprint data of the data blocks obtained after the address offset of each data block of the file is incremented; specifically, the fingerprint data of data blocks (0, 1), (2, 3), (4, A), (B, C), and the fingerprint data of data blocks (1, 2), (3, 4), (A, B), (C). The client then sends this fingerprint data to the server;
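The block division in this example can be reproduced with a few lines of code. The helper name `divide` and the representation of each block as a tuple of characters are assumptions made for readability; only the division itself is shown, not the fingerprint calculation.

```python
def divide(data: str, length: int, shift: int = 0):
    """Divide data into blocks of the given length, starting at the given address offset."""
    return [tuple(data[start:start + length]) for start in range(shift, len(data), length)]


# Current file from the example, divided with fixed length 2.
aligned = divide("01234ABC", 2)           # address offset unchanged
shifted = divide("01234ABC", 2, shift=1)  # address offset incremented by 1
```

The two lists correspond to the block sets named in step S306; the last shifted block is shorter because the file ends there.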
  • Step S307: The server calculates the fingerprint data of each data block of the saved file and the fingerprint data of each data block obtained after the address offset of each data block of the file is increased by 1, and compares them in turn with the fingerprint data of each data block sent by the client;
  • The metadata saved by the server is 1234ABC; the server calculates the fingerprint data of each data block and the fingerprint data of each data block obtained after the address offset of each data block of the file is increased by 1.
  • If the server calculated and stored only the fingerprint data FPA, FPB, FPC, FPD of the data blocks of the metadata, then on comparison with the fingerprint data of each data block of the current file it would find that the fingerprint data of every data block of the current file differs from FPA, FPB, FPC, and FPD. All data blocks of the current file would therefore be backed up to the server, which reduces the deduplication ratio and increases the amount of data on the server side and the network bandwidth consumption.
  • The server compares the fingerprint data of data block (0, 1) with FPA, FPB, FPC, FPD, FPE, FPF, and FPG in turn and finds that it differs from all of them. It then compares the fingerprint data of data block (1, 2), whose address offset is incremented by 1, and finds that it is the same as FPA. This means that one bit of data has been added at the header of the original file, so the server sends the comparison result to the client, requesting the client to send data 0 to the server.
  • The fingerprint data of the remaining data blocks is compared in turn: the fingerprint data of data block (3, 4) is the same as FPB, the fingerprint data of data block (A, B) is the same as FPC, and the fingerprint data of data block (C) is the same as FPD. It can be concluded that the current file is the original file with one bit of data, 0, added at the header.
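The comparison in this example can be sketched as follows. It is a simplified illustration rather than the server logic of the source: it only checks whether the client's offset-incremented blocks line up exactly with the server's saved blocks, and the helper names `fp` and `blocks` and the use of SHA-1 are assumptions.

```python
import hashlib


def fp(block: bytes) -> str:
    """Fingerprint of one block (SHA-1 chosen for illustration)."""
    return hashlib.sha1(block).hexdigest()


def blocks(data: bytes, length: int, shift: int = 0):
    """Divide data into blocks of the given length, starting at the given address offset."""
    return [data[start:start + length] for start in range(shift, len(data), length)]


saved = b"1234ABC"     # metadata held by the server
current = b"01234ABC"  # file the client wants to back up

saved_fps = [fp(b) for b in blocks(saved, 2)]                # FPA, FPB, FPC, FPD in the text
shifted_fps = [fp(b) for b in blocks(current, 2, shift=1)]   # (1,2), (3,4), (A,B), (C)

# If every offset-incremented block matches, only the skipped leading data is new.
inserted = current[:1] if shifted_fps == saved_fps else None
```

In this run the shifted fingerprints match FPA to FPD one for one, so the only data the client has to send is the inserted 0.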
  • Step S308: Send the comparison result.
  • Step S309: Send the data block and the pointer with inconsistent fingerprint data.
  • The client sends the data 0 inserted at the header of the original file to the server.
  • The embodiment of the present invention sends only the changed one bit of data and its pointer to the server, which improves the deduplication ratio compared with the prior art and reduces the data storage and network bandwidth consumption of the server.
  • The data processing method of the embodiment of the present invention is described below by taking a change to the data in the middle of the original file as an example.
  • the data block that the client divides the current file 15D23C according to the length 2 is (1, 5), (D, 2), (3, C) as shown in FIG. 10(c): It can be known from the above embodiment that the server saves the element.
  • the data is: 1234ABC; the fingerprint data of each data block and the offset of each data block of the file plus one fingerprint data of each data block are as shown in FIG. 10(d):
  • the server compares the fingerprint data of the data block (1, 5) sent by the client with the calculated fingerprint data FPA, FPB, FPC, FPD, FPE, FPF, and FPG, and finds no matching fingerprint data;
  • the server compares the fingerprint data of the data block (5, D) sent by the client with the fingerprint data saved by the server, and still finds no matching fingerprint data, indicating that the data block (1, 5) is a changed data block;
  • the server compares the fingerprint data of the data block (D, 2) sent by the client with the fingerprint data saved by the server, and finds no matching fingerprint data;
  • the server compares the fingerprint data of the data block (2, 3) sent by the client with the fingerprint data saved by the server, and finds that it matches FPE, indicating that one bit of data, D, has been added before the original data block (2, 3); the server can therefore instruct the client to send the data D and its pointer to the server;
  • the server compares the fingerprint data of the client's remaining data block C with the fingerprint data saved by the server, and finds that it matches FPD, so the data block C has not changed, and the client only needs to send the pointer of the data block C to the server;
  • the client's division of the file according to the given length is a logical division, not an actual splitting of the file into several data blocks. Its purpose is to make it easier to compare against the file data saved on the server side and find the data that has changed, so the partitioning into data blocks is not fixed. In the above example, once the fingerprint data of the data block (2, 3), obtained by adding 1 to the offset of the data block (D, 2), is found to match fingerprint data on the server side, the data block (2, 3) can logically be treated as one data block, with the preceding data D and the following data C as separate data blocks.
  • based on the comparison result that is sent, the client may send the changed data block (1, 5), the data block (D), and the pointers of these two data blocks to the server.
  • compared with the prior art, the embodiment of the present invention can improve the deduplication rate and reduce the server's data storage and the network bandwidth consumption when the head or the middle of the file changes.
  • a client disclosed in an embodiment of the present invention
  • the first calculation module 601 is configured to calculate fingerprint data of each data block of the file to be backed up, and fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented. In the embodiment of the present invention, fingerprint data is data used to uniquely identify a file or a certain data block of a file;
  • the fingerprint data is obtained by applying SHA-1, MD-5, or a similar algorithm to each data block of the file to be backed up, so that each data block yields a 32-bit or 128-bit hash value in one-to-one correspondence with its data.
  • the generated fingerprint data is a unique identifier of the file data to be backed up.
  • other algorithms may also be used to generate fingerprint data according to specific needs, as long as the algorithms of the client and the server are consistent or correspond to each other, and the data generated by the algorithm can uniquely identify a data block of the file to be backed up;
  • the file to be backed up is divided into data blocks according to a given length. A data block is a physical record of data; here it can be understood as one group of records, or several groups of records arranged consecutively in order, and it is a unit of data transferred between main memory and an input or output device or external memory;
  • here, the fingerprint data of the data block after the offset of the address of each data block of the file is incremented refers to the fingerprint data of the data block obtained after that offset is incremented.
  • the first sending module 602 is configured to send the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, to the server, for comparison with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented. Specifically, the client sends the calculated fingerprint data of each data block, and the fingerprint data of the data block after the offset of the address of each data block of the file is incremented, to the server, so that the server compares them in sequence with the fingerprint data of each data block of the file it has saved and with the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented.
  • the first receiving module 603 is configured to receive a comparison result sent by the server.
  • the second sending module 604 is configured to send, according to the comparison result sent by the server, the data blocks whose fingerprint data is inconsistent, and their pointers, to the server.
  • for a data block whose comparison result is the same, the server only needs to instruct the client to send the pointer of the data block to the server; for a data block whose comparison result is different, the server needs to instruct the client to send the data block.
  • a file may be backed up, or a file set composed of multiple files may be backed up.
  • the specific backup method is similar, and details are not described herein.
  • the beneficial effect of the embodiment of the present invention is as follows: when a file to be backed up on the client has changed and needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and the fingerprint data of the data blocks after the offsets of their addresses are incremented, are compared with the fingerprint data of the data blocks of the file saved on the server side and with the fingerprint data of the data blocks of that file after the offsets of their addresses are incremented. The data corresponding to the data blocks whose fingerprint data has changed is then sent to the server. Server-side data storage can therefore be effectively reduced, and the deduplication rate further improved, while ensuring that backup files are stored uniquely on the server side.
  • the foregoing embodiment may further include: a second calculation module 605, a third sending module 606, and a second receiving module 607;
  • the second calculation module 605 is configured to calculate fingerprint data of the file to be backed up;
  • the third sending module 606 is configured to send the fingerprint data of the file to be backed up to the server, for comparison with the fingerprint data of the file saved by the server. Specifically, the client sends the calculated fingerprint data of the file to be backed up to the server. The server calculated and saved the fingerprint data of the file when the file was first backed up; after receiving the fingerprint data sent by the client, it compares it with the previously saved fingerprint data. If the fingerprint data is the same, the file has not changed; if it is different, the file has changed.
  • the second receiving module 607 is configured to receive a comparison result of the server.
  • the difference between this embodiment and the previous embodiment is that, if after a period of time t it is uncertain whether the original file has changed, the fingerprint data of the current file can be compared with the fingerprint data of the original file to determine whether the file has changed. If it has not changed, the client sends the pointer of the file to the server, and the subsequent operations are not needed.
  • FIG. 6 shows a server disclosed in an embodiment of the present invention.
  • the third receiving module 701 is configured to receive the fingerprint data of each data block of the file to be backed up sent by the client, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented;
  • the fingerprint data is obtained by applying SHA-1, MD-5, or a similar algorithm to each data block of the file to be backed up, so that each data block yields a 32-bit or 128-bit hash value in one-to-one correspondence with its data; the fingerprint data generated in this way is a unique identifier of the file data to be backed up;
  • the file to be backed up is divided into data blocks according to a given length. A data block is a physical record of data; here it can be understood as one group of records, or several groups of records arranged consecutively in order, and it is a unit of data transferred between main memory and an input or output device or external memory;
  • the first comparison module 702 is configured to compare the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, with the fingerprint data of each data block of the saved file and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented;
  • the fingerprint data of each data block of the file saved by the server refers to the fingerprint data of the data blocks obtained by dividing, according to the given length, the metadata that the client sent to the server at the first backup;
  • the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented refers to the fingerprint data of the data block obtained after 1 is added to the offset of the address of each data block into which the file previously saved by the server is divided according to the given length;
  • the server can use the rsync rolling checksum algorithm to compare the fingerprint data: it compares, in sequence, the fingerprint data of the data blocks sent by the client, and the fingerprint data of the data blocks with the address offset plus 1, with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented. It should be pointed out that this step is described in terms of the rsync algorithm only for convenience in explaining the fingerprint comparison; besides the rsync algorithm, those skilled in the art may obviously select other algorithms according to actual conditions;
  • first, the fingerprint data of the first data block sent by the client is compared with the fingerprint data of the first data block of the file saved by the server. If they are not the same, it is compared with the fingerprint data of the second data block of the file saved by the server, and so on until the fingerprint data of the last data block of the file saved by the server. If there is still no match, it is compared with the fingerprint data of the data block with the offset of the address of the first data block of the file saved by the server plus 1, and so on. If no match is found, the client can be notified to send the data block. Second, the fingerprint data of the data block with the offset of the address of the first data block sent by the client plus 1 is compared, as in the first step, in sequence with the fingerprint data of the data blocks of the file saved by the server and of the data blocks with the offset plus 1.
  • in addition to dividing the data of the file into data blocks of length 2, the data may also be divided into data blocks of a fixed length of 3, 4, and so on;
  • if the data blocks are divided with a length of 3, then when the fingerprint data of a data block sent by the client is compared with the fingerprint data of each data block on the server side, it must be compared in sequence not only with the fingerprint data of the data blocks whose address offset is increased by 1, but also with the fingerprint data of the data blocks whose address offset is increased by 2;
  • if the data blocks are divided with a length of 4, then when the fingerprint data of a data block sent by the client is compared with the fingerprint data of each data block on the server side, it must be compared in sequence not only with the fingerprint data of the data blocks whose address offset is increased by 1, but also with the fingerprint data of the data blocks whose address offset is increased by 2 and of the data blocks whose address offset is increased by 3;
  • by analogy, when the data blocks are divided with a length N, where N is a natural number greater than 2, the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented is the fingerprint data of the data blocks obtained by adding 1 to the offset of the address of each data block of the file to be backed up, and then successively more, up to N - 1.
  • likewise, the client's data blocks with incremented address offsets range from the data block with the offset plus 1 to the data block with the offset plus N - 1.
  • the fourth sending module 703 is configured to send the comparison result to the client, and to receive the data blocks whose fingerprint data is inconsistent, and their pointers, sent by the client. Once the first comparison module 702 has identified the changed data blocks, the server can send the comparison result to the client, instructing the client to send the data blocks whose fingerprint data is inconsistent, and their pointers, to the server.
  • the beneficial effect of the embodiment of the present invention is as follows: when a file to be backed up on the client has changed and needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and the fingerprint data of the data blocks after the offsets of their addresses are incremented, are compared with the fingerprint data of the data blocks of the file saved on the server side and with the fingerprint data of the data blocks of that file after the offsets of their addresses are incremented. The data corresponding to the data blocks whose fingerprint data has changed is then sent to the server. Server-side data storage can therefore be effectively reduced, and the deduplication rate further improved, while ensuring that backup files are stored uniquely on the server side.
  • a fourth receiving module 705 and a second comparison module 706 may further be included. The fourth receiving module 705 is configured to receive the fingerprint data of the file to be backed up sent by the client; the second comparison module 706 is configured to compare the fingerprint data of the file to be backed up with the fingerprint data of the saved file and send the comparison result to the client, so that the client can determine whether the file to be backed up has changed.
  • the difference between this embodiment and the previous embodiment is that, if after a period of time t it is uncertain whether the original file has changed, the fingerprint data of the current file can be compared with the fingerprint data of the original file to determine whether the file has changed. If it has not changed, the client sends the pointer of the file to the server, and the subsequent operations are not needed.
  • a third calculation module 704 may further be included;
  • the third calculation module 704 is configured to calculate and save the fingerprint data of each data block of the file, and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented.
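
The server-side roles described in the last few items can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `BackupServer`, its methods, and the file name "doc" are hypothetical, SHA-1 via Python's `hashlib` stands in for whichever fingerprint algorithm is chosen, and block fingerprints are stored for every address offset from 0 to N - 1 for a block length N.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Fingerprint of a file or block (SHA-1 is an assumption, see lead-in)."""
    return hashlib.sha1(data).hexdigest()


class BackupServer:
    """Toy server: remembers, per file name, the whole-file fingerprint and the
    block fingerprints for every address offset from 0 to N - 1."""

    def __init__(self, length: int = 2):
        self.length = length
        self.file_fp = {}    # name -> fingerprint of the whole file
        self.block_fps = {}  # name -> {shift: [block fingerprints]}

    def save(self, name: str, data: bytes) -> None:
        """Role of the third calculation module: calculate and save fingerprints."""
        n = self.length
        self.file_fp[name] = fingerprint(data)
        self.block_fps[name] = {
            shift: [fingerprint(data[i:i + n]) for i in range(shift, len(data), n)]
            for shift in range(n)  # offset plus 0, plus 1, ..., plus N - 1
        }

    def unchanged(self, name: str, client_file_fp: str) -> bool:
        """Role of the second comparison module: whole-file fingerprint check."""
        return self.file_fp.get(name) == client_file_fp


server = BackupServer(length=3)
server.save("doc", b"1234ABC")
same = server.unchanged("doc", fingerprint(b"1234ABC"))
changed = server.unchanged("doc", fingerprint(b"01234ABC"))
counts = {shift: len(fps) for shift, fps in server.block_fps["doc"].items()}
```

When `unchanged` returns true, the block-level comparison can be skipped and only a pointer to the file needs to be exchanged, which is the shortcut the text describes.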

Abstract

Disclosed is a data processing method, comprising: calculating fingerprint data of each data block of a file to be backed up and fingerprint data of the data block after an offset of an address of each data block of the file to be backed up is progressively increased; sending the fingerprint data of each data block of the file to be backed up and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is progressively increased to a server, for comparison with fingerprint data of each data block of a file stored by the server and fingerprint data of the data block after an offset of each data block of the stored file is progressively increased; and sending changed data to the server according to a comparison result sent by the server. The data de-duplication rate can be improved.

Description

Data processing method and device
This application claims priority to Chinese Patent Application No. 201110136079.7, filed with the Chinese Patent Office on May 25, 2011 and entitled "Data processing method and device", which is incorporated herein by reference in its entirety.
Technical field
The present invention relates to the field of storage, and in particular, to a data processing method and apparatus.
Background
As the amount of enterprise data keeps growing, large volumes of duplicate data pose a serious challenge to storage. Data de-duplication (De-Dupe for short), an important technology that lowers data storage costs by effectively reducing the amount of data, has become a focus of attention.
In implementations of data deduplication, the system calculates and checks the fingerprint data of a data block (or file), where fingerprint data is data that uniquely identifies a file or a certain data block of a file, to determine whether the data block duplicates metadata that is already stored. If it does, only a pointer to that metadata needs to be kept; if the fingerprint data shows that the data block is new, the data block is kept and serves as metadata for later use.
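The pointer-or-store decision described above can be illustrated with a small sketch. It is only an illustration under assumptions: the class `DedupStore` and its methods are hypothetical names, and SHA-1 from Python's `hashlib` is used as the fingerprint function.

```python
import hashlib


class DedupStore:
    """Toy block store: keeps each distinct block once, keyed by its fingerprint."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> block bytes (the stored "metadata")
        self.pointers = []  # one fingerprint per written block, in write order

    def write(self, block: bytes) -> bool:
        """Store a block; return True if it was new, False if it was a duplicate."""
        fp = hashlib.sha1(block).hexdigest()
        is_new = fp not in self.blocks
        if is_new:
            self.blocks[fp] = block  # brand-new block: keep the data itself
        self.pointers.append(fp)     # in every case, keep a pointer to the data
        return is_new

    def read(self) -> bytes:
        """Rebuild the written stream by following the pointers."""
        return b"".join(self.blocks[fp] for fp in self.pointers)


store = DedupStore()
results = [store.write(b) for b in (b"12", b"34", b"12", b"AB", b"34")]
```

Five blocks are written but only three distinct blocks are stored; the repeated blocks cost one pointer each.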
In existing deduplication technology, files to be backed up are mostly cut into fixed-length data blocks. If, after the client's first backup, the file is modified at its head or in its middle (for example by an insertion, deletion, or update), then with the conventional fixed-length cutting method the existing data blocks in the original file shift one after another, even when the amount of modified data is small. Fewer of the previously backed-up duplicate data blocks can therefore be found in the changed file, which lowers deduplication efficiency and causes more data blocks to be transmitted to the server side. This increases network bandwidth consumption on the one hand and server-side data storage on the other.
Summary of the invention
Embodiments of the present invention provide a data processing method and apparatus that can effectively reduce server-side data storage and further improve the deduplication rate while ensuring that backup files are stored uniquely on the server side.
A data processing method provided by an embodiment of the present invention includes:
calculating fingerprint data of each data block of a file to be backed up, and fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented;
sending the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, to a server, for comparison with fingerprint data of each data block of a file saved by the server and fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented; and
sending, according to a comparison result sent by the server, the data blocks whose fingerprint data is inconsistent, and the pointers of those data blocks, to the server.
A data processing method provided by an embodiment of the present invention includes:
receiving fingerprint data of each data block of a file to be backed up sent by a client, and fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented;
comparing the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, with fingerprint data of each data block of a saved file and fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented; and
sending a comparison result to the client, and receiving the data blocks whose fingerprint data is inconsistent, and the pointers of those data blocks, sent by the client.
A client provided by an embodiment of the present invention includes:
a first calculation module, configured to calculate fingerprint data of each data block of a file to be backed up, and fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented;
a first sending module, configured to send the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, to a server, for comparison with fingerprint data of each data block of a file saved by the server and fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented;
a first receiving module, configured to receive a comparison result sent by the server; and
a second sending module, configured to send, according to the comparison result sent by the server, the data blocks whose fingerprint data is inconsistent, and the pointers of those data blocks, to the server.
A server provided by an embodiment of the present invention includes:
a third receiving module, configured to receive fingerprint data of each data block of a file to be backed up sent by a client, and fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented;
a first comparison module, configured to compare the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, with fingerprint data of each data block of a saved file and fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented; and
a fourth sending module, configured to send a comparison result to the client, and to receive the data blocks whose fingerprint data is inconsistent, and the pointers of those data blocks, sent by the client.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
In the embodiments of the present invention, when a file to be backed up on the client has changed and needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and the fingerprint data of the data blocks after the offsets of their addresses are incremented, are compared with the fingerprint data of the data blocks of the file saved on the server side and with the fingerprint data of the data blocks of that file after the offsets of their addresses are incremented. The data corresponding to the data blocks whose fingerprint data has changed is then sent to the server. Server-side data storage can therefore be effectively reduced, and the deduplication rate further improved, while ensuring that backup files are stored uniquely on the server side.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another data processing method according to an embodiment of the present invention;
FIG. 3 is a flowchart of still another data processing method in an embodiment of the present invention;
FIG. 4 is a flowchart of yet another data processing method in an embodiment of the present invention;
FIG. 5 is a diagram of the data interaction between a client and a server in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a client according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of another client according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a server according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of another server according to an embodiment of the present invention;
FIG. 10(a) is a schematic diagram of data block division in an embodiment of the present invention;
FIG. 10(b) is a schematic diagram of fingerprint data of data blocks in an embodiment of the present invention;
FIG. 10(c) is another schematic diagram of fingerprint data of data blocks in an embodiment of the present invention;
FIG. 10(d) is a further schematic diagram of fingerprint data of data blocks in an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 shows a data processing method disclosed in an embodiment of the present invention. In this embodiment, the client has already completed the first backup of the file; after a period of time the original file changes, so the changed file needs to be backed up to the server again, which completes the server's update of the metadata;
Step S401: Calculate fingerprint data of each data block of the file to be backed up, and fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented;
Specifically, fingerprint data is obtained by applying SHA-1, MD-5, or a similar algorithm to each data block of the file to be backed up, so that each data block yields a 32-bit or 128-bit hash value in one-to-one correspondence with its data. The fingerprint data generated in this way is a unique identifier of the file data to be backed up. It should be noted that, besides the algorithms listed above, this embodiment and the subsequent embodiments may also use other algorithms to generate fingerprint data according to specific needs, as long as the algorithms of the client and the server are consistent or correspond to each other and the data generated by the algorithm can uniquely identify a data block of the file to be backed up;
In the embodiment of the present invention, the file to be backed up is divided into data blocks according to a given length. A data block is a physical record of data; here it can be understood as one group of records, or several groups of records arranged consecutively in order, and it is a unit of data transferred between main memory and an input or output device or external memory;
Here, the fingerprint data of the data block after the offset of the address of each data block of the file is incremented refers to the fingerprint data of the data block obtained after that offset is incremented.
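As a rough illustration of step S401, the sketch below computes both sets of fingerprints for a block length of 2. The function names are hypothetical, and SHA-1 through Python's `hashlib` (which yields a 160-bit digest) is used simply because SHA-1 is one of the algorithms the text names; the sample file 01234ABC is the current file from the header-insertion example.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Fingerprint of one block, here its SHA-1 digest in hexadecimal."""
    return hashlib.sha1(block).hexdigest()


def split_blocks(data: bytes, length: int, shift: int = 0) -> list[bytes]:
    """Logically divide data into blocks of `length`, starting at address `shift`."""
    return [data[i:i + length] for i in range(shift, len(data), length)]


def client_fingerprints(data: bytes, length: int = 2) -> dict[int, list[str]]:
    """Fingerprints of the plain blocks (shift 0) and of the blocks whose
    address offset is increased by 1 (shift 1)."""
    return {shift: [fingerprint(b) for b in split_blocks(data, length, shift)]
            for shift in (0, 1)}


current = b"01234ABC"
plain = split_blocks(current, 2)       # (0,1), (2,3), (4,A), (B,C)
shifted = split_blocks(current, 2, 1)  # (1,2), (3,4), (A,B), (C)
fps = client_fingerprints(current)
```

The division is purely logical: the file itself is never cut, only sliced for hashing.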
Step S402: Send the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data block after the offset of the address of each data block of the file to be backed up is incremented, to the server, for comparison with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented;
Specifically, the client sends the calculated fingerprint data of each data block, and the fingerprint data of the data block after the offset of the address of each data block of the file is incremented, to the server, so that the server compares them in sequence with the fingerprint data of each data block of the file it has saved and with the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented. Here, the fingerprint data of each data block of the file saved by the server refers to the fingerprint data of the data blocks obtained by dividing, according to the given length, the metadata that the client sent to the server at the first backup;
Here, the fingerprint data of the data block after the offset of the address of each data block of the saved file is incremented refers to the fingerprint data of the data block obtained after 1 is added to the offset of the address of each data block into which the file previously saved by the server is divided according to the given length.
Step S403: According to the comparison result sent by the server, send the data blocks whose fingerprint data do not match, together with the pointers, to the server.
Specifically, for a data block whose comparison result is a match, the server only needs to instruct the client to send the pointer of that data block; for a data block whose comparison result is a mismatch, the server instructs the client to send the data block itself.
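The rule above (pointer for a matched block, full data for an unmatched block) can be sketched as follows. This is only an illustration: the function name `build_upload`, the tuple format, and the representation of a "pointer" as the index of the matching server block are assumptions, since the text does not specify the pointer format.

```python
def build_upload(blocks: list[bytes], matched: dict[int, int]) -> list[tuple]:
    """Build the list of items the client sends after the comparison.

    blocks  -- the client's data blocks, in file order
    matched -- maps a client block index to the (hypothetical) index of the
               identical block already stored on the server
    """
    payload = []
    for index, block in enumerate(blocks):
        if index in matched:
            # Fingerprints matched: send only a reference to the stored block.
            payload.append(("pointer", matched[index]))
        else:
            # Fingerprints differed: the block itself must be transferred.
            payload.append(("data", block))
    return payload


upload = build_upload([b"0", b"12", b"34"], {1: 0, 2: 1})
```

Only the first item in `upload` carries file content; the other two are references, which is where the bandwidth saving comes from.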
It should be noted that the above embodiment may back up a single file or a file set composed of multiple files; the backup method is similar and is not described again.
The beneficial effect of this embodiment of the present invention is as follows. When a file on the client has changed and needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and of the data blocks obtained after their address offsets are incremented, is compared with the fingerprint data of the data blocks of the file saved on the server, and of the data blocks obtained after their address offsets are incremented; the data corresponding to the data blocks whose fingerprint data has changed is then sent to the server. Therefore, while ensuring that the backup file is stored only once on the server, the amount of data stored on the server is effectively reduced and the deduplication rate is further improved.
Referring to FIG. 2, a data processing method disclosed in an embodiment of the present invention is shown.
In this embodiment of the present invention, the client has already completed the first backup of the file. After a period of time the original file changes, so the changed file needs to be backed up to the server so that the server can update its metadata. This embodiment differs from Embodiment 1 in that Embodiment 1 is described from the perspective of the client, whereas this embodiment is described from the perspective of the server.
Step S501: Receive, from the client, the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented.
Specifically, fingerprint data is generated by applying SHA-1, MD5, or a similar algorithm to each data block of the file to be backed up, producing for each block a 32-bit or 128-bit hash value that is in one-to-one correspondence with the data. The fingerprint data generated in this way is the unique identifier of the data of the file to be backed up.
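As an illustration of fingerprint generation, the sketch below hashes a data block with Python's standard `hashlib` module. The function name `fingerprint` and its `algorithm` parameter are assumptions introduced here; the text only requires SHA-1, MD5, or a similar algorithm.

```python
import hashlib


def fingerprint(block: bytes, algorithm: str = "sha1") -> bytes:
    """Return the raw digest of a data block.

    `algorithm` is any name accepted by hashlib.new, for example "sha1"
    or "md5".
    """
    return hashlib.new(algorithm, block).digest()


same = fingerprint(b"12") == fingerprint(b"12")       # identical blocks agree
different = fingerprint(b"12") != fingerprint(b"21")  # reordered data differs

# Digest sizes as produced by hashlib: MD5 gives 16 bytes, SHA-1 gives 20 bytes.
md5_len = len(fingerprint(b"12", "md5"))
sha1_len = len(fingerprint(b"12", "sha1"))
```

Identical blocks always yield identical fingerprints, which is the property the comparison steps rely on.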
In this embodiment of the present invention, the file to be backed up is divided into data blocks of a given length. A data block is a physical record of data; it can be understood as one group of records, or several groups of records arranged consecutively in order, and is the unit of data transferred between main memory and an input device, an output device, or external storage.
Here, the fingerprint data of the data blocks obtained after the address offset of each data block of the file is incremented refers to the fingerprint data of the data blocks obtained after the file is divided into blocks of the given length and the offset of each block's address is increased by 1.
Step S502: Compare the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented, with the fingerprint data of each data block of the saved file and with the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented.
Here, the fingerprint data of each data block of the file saved by the server refers to the fingerprint data of the data blocks obtained by dividing, by the given length, the metadata that the client sent to the server during the first backup.
Here, the fingerprint data of the data blocks obtained after the offset of each data block of the saved file is incremented refers to the fingerprint data of the data blocks obtained after the saved file is divided by the given length and the offset of each block's address is increased by 1.
It should be noted that the server may use the rsync rolling checksum algorithm to compare the fingerprint data. It compares, in turn, the fingerprint data of the data blocks sent by the client, and the fingerprint data of the blocks whose address offsets are increased by 1, with the fingerprint data of each data block of the file saved by the server and with the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented. The purpose of this step is to compare fingerprint data; the rsync algorithm is used only as an example for convenience of description, and a person skilled in the art may select another algorithm according to the actual situation.
For example, take data blocks with a fixed length of 2. In the first pass, the fingerprint data of the first data block sent by the client is compared with the fingerprint data of the first data block of the file saved by the server. If they differ, it is compared with the fingerprint data of the second data block of the saved file, and so on in sequence up to the fingerprint data of the last data block of the saved file. If there is still no match, it is compared with the fingerprint data of the data block obtained by adding 1 to the address offset of the first data block of the saved file; if they differ, it is compared with the fingerprint data of the data block obtained by adding 1 to the address offset of the second data block of the saved file, and so on. If no matching fingerprint data is found, the server may notify the client to send that data block. In the second pass, the fingerprint data of the data block obtained by adding 1 to the address offset of the first data block sent by the client is compared, following the steps of the first pass, with the data blocks of the saved file and with the data blocks whose offsets are increased by 1.
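The comparison order in this example can be sketched as a simple linear scan. This is a hedged illustration, not an implementation of the rsync rolling checksum: the names `fp`, `partition_fps`, and `find_match`, the sample file content, and the use of SHA-1 are assumptions made here.

```python
import hashlib


def fp(block: bytes) -> str:
    # SHA-1 is one of the algorithms the text mentions; any hash would do here.
    return hashlib.sha1(block).hexdigest()


def partition_fps(data: bytes, block_len: int, shift: int) -> list[str]:
    """Fingerprints of the fixed-length blocks of data, starting at `shift`."""
    return [fp(data[i:i + block_len]) for i in range(shift, len(data), block_len)]


def find_match(client_fp: str, server_fps_by_shift: list[list[str]]):
    """Return (shift, index) of the first server block whose fingerprint
    equals client_fp, scanning shift 0 first and then shift 1; None if absent."""
    for shift, fps in enumerate(server_fps_by_shift):
        for index, server_fp in enumerate(fps):
            if server_fp == client_fp:
                return shift, index
    return None


saved = b"1234ABC"  # hypothetical file already stored on the server
server_fps = [partition_fps(saved, 2, 0), partition_fps(saved, 2, 1)]

hit = find_match(fp(b"4A"), server_fps)    # found among the offset+1 blocks
miss = find_match(fp(b"01"), server_fps)   # not on the server: block must be sent
```

A result of `None` corresponds to the case in which the server notifies the client to send the data block.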
It should be noted that, in this embodiment of the present invention, the data of the file may be divided into data blocks not only with a length of 2 but also with a length of 3, 4, and so on.
If data blocks of length 3 are used, then when the fingerprint data of the data blocks sent by the client is compared with the fingerprint data of the data blocks on the server, it is necessary to compare in turn not only the fingerprint data of the data blocks whose address offsets are increased by 1, but also the fingerprint data of the data blocks whose address offsets are increased by 2.
If data blocks of length 4 are used, then when the fingerprint data of the data blocks sent by the client is compared with the fingerprint data of the data blocks on the server, it is necessary to compare in turn not only the fingerprint data of the data blocks whose address offsets are increased by 1, but also the fingerprint data of the data blocks whose address offsets are increased by 2 and the fingerprint data of the data blocks whose address offsets are increased by 3.
By analogy, if data blocks of length N are used, where N is a natural number greater than 2, the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented is the fingerprint data of the data blocks obtained as the offset is increased by 1 and then successively up to N - 1.
Similarly, on the client, the data blocks obtained after the address offset of each data block is incremented are the data blocks whose offsets are increased from 1 up to N - 1.
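The generalization to block length N can be sketched as follows: for each shift from 0 (the unshifted blocks) to N - 1, the file is cut into blocks of length N starting at that shift. The function name `partitions_by_shift` and the sample data are assumptions made for illustration.

```python
def partitions_by_shift(data: bytes, n: int) -> dict[int, list[bytes]]:
    """Return, for every shift in 0..n-1, the blocks of length n that start
    at that shift. Shift 0 is the original partition; shifts 1..n-1 are the
    partitions with incremented offsets."""
    if n < 1:
        raise ValueError("block length must be at least 1")
    return {
        shift: [data[i:i + n] for i in range(shift, len(data), n)]
        for shift in range(n)
    }


parts = partitions_by_shift(b"abcdefg", 3)
```

The number of partitions, and therefore the number of fingerprints to calculate and compare, grows in proportion to N, which is a cost to weigh when choosing the block length.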
Step S503: Send the comparison result to the client, and receive from the client the data blocks whose fingerprint data do not match, together with the pointers.
In step S502, the server identifies the changed data blocks through comparison. It can therefore send the comparison result to the client, instructing the client to send the data blocks whose fingerprint data do not match, together with the pointers, to the server.
The beneficial effect of this embodiment of the present invention is as follows. When a file on the client has changed and needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and of the data blocks obtained after their address offsets are incremented, is compared with the fingerprint data of the data blocks of the file saved on the server, and of the data blocks obtained after their address offsets are incremented; the data corresponding to the data blocks whose fingerprint data has changed is then sent to the server. Therefore, while ensuring that the backup file is stored only once on the server, the amount of data stored on the server is effectively reduced and the deduplication rate is further improved.
Referring to FIG. 3, a data processing method disclosed in another embodiment of the present invention is shown.
In this embodiment of the present invention, the client has already completed the first backup of the file. In the first backup, the client's file is backed up to the server in full, and the server saves the file as metadata.
After a period of time, the file may have changed, so the changed file needs to be backed up to the server so that the server can update its metadata. The backup method used after the file may have changed is described in detail below.
Step S101: Calculate the fingerprint data of the file to be backed up.
Specifically, fingerprint data is generated by applying SHA-1, MD5, or a similar algorithm to the file to be backed up, producing a 32-bit or 128-bit hash value that is in one-to-one correspondence with the data. The fingerprint data generated in this way is the unique identifier of the data of the file to be backed up.
Step S102: Send the fingerprint data of the file to be backed up to the server, where it is compared with the fingerprint data of the file saved by the server.
Specifically, the client sends the fingerprint data of the file to be backed up, calculated in step S101, to the server. The server calculated the fingerprint data of the file during the first backup; after receiving the fingerprint data sent by the client, it compares it with the fingerprint data it saved earlier. If the fingerprint data is the same, the file has not changed; if the fingerprint data is different, the file has changed.
The technical effect of steps S102 and S103 is that, after a period of time t, if it is uncertain whether the original file has changed, the fingerprint data of the current file can be compared with the fingerprint data of the original file to determine whether it has changed. If it has not changed, the client only needs to send the pointer of the file to the server, and the subsequent operations are unnecessary. This embodiment of the present invention, however, focuses on the case in which the original file has changed.
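The whole-file check described above acts as a cheap gate before any block-level work. A minimal sketch, assuming SHA-1 and the hypothetical names `file_fingerprint` and `needs_block_level_backup`:

```python
import hashlib


def file_fingerprint(data: bytes) -> str:
    """Fingerprint of the entire file content."""
    return hashlib.sha1(data).hexdigest()


def needs_block_level_backup(current: bytes, stored_fingerprint: str) -> bool:
    """True if the current file differs from the one whose fingerprint the
    server stored at the first backup."""
    return file_fingerprint(current) != stored_fingerprint


stored = file_fingerprint(b"1234ABC")                         # kept since the first backup
unchanged = needs_block_level_backup(b"1234ABC", stored)      # only a file pointer is sent
changed = needs_block_level_backup(b"01234ABC", stored)       # proceed to block comparison
```

In the sketch the comparison runs in one process for brevity; in the described method the client sends its fingerprint and the server performs the comparison.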
Step S103: Receive the comparison result sent by the server. When the comparison result indicates a difference, calculate the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each data block is incremented.
Specifically, in this embodiment of the present invention, the file to be backed up is divided into data blocks of a given length. A data block is a physical record of data; it can be understood as one group of records, or several groups of records arranged consecutively in order, and is the unit of data transferred between main memory and an input device, an output device, or external storage.
The method for calculating the fingerprint data of each data block is the same as the method for calculating the fingerprint data of the file to be backed up in step S101 and is not described again here.
In addition, when the comparison result indicates no difference, the file to be backed up has not changed relative to the metadata saved on the server, and it is sufficient to send the pointer of the backup file to the server.
Step S104: Send the fingerprint data of each data block, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file is incremented, to the server, where they are compared with the fingerprint data of each data block of the file saved by the server and with the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented.
Specifically, the client sends the calculated fingerprint data of each data block, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file is incremented, to the server. The server compares them in turn with the fingerprint data of each data block of the file it has saved and with the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented.
Here, the fingerprint data of each data block of the file saved by the server refers to the fingerprint data of the data blocks obtained by dividing, by the given length, the metadata that the client sent to the server during the first backup.
Here, the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented refers to the fingerprint data of the data blocks obtained after the saved file is divided by the given length and the offset of each block's address is increased by 1.
Step S105: According to the comparison result sent by the server, send the data blocks whose fingerprint data do not match, together with the pointers, to the server.
Specifically, for a data block whose comparison result is a match, the server only needs to instruct the client to send the pointer of that data block; for data whose comparison result is a mismatch, the server instructs the client to send the data.
It should be noted that the above embodiment may back up a single file or multiple files in a file set; the backup method is similar and is not described again.
The beneficial effect of this embodiment of the present invention is as follows. When a file on the client has changed and needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and of the data blocks obtained after their address offsets are incremented, is compared with the fingerprint data of the data blocks of the file saved on the server, and of the data blocks obtained after their address offsets are incremented; the data corresponding to the data blocks whose fingerprint data has changed is then sent to the server. Therefore, while ensuring that the backup file is stored only once on the server, the amount of data stored on the server is effectively reduced and the deduplication rate is further improved.
Referring to FIG. 4, a data processing method disclosed in another embodiment of the present invention is shown.
In this embodiment of the present invention, the client has already completed the first backup of the file. In the first backup, the client's file is backed up to the server in full, and the server saves the file as metadata.
After a period of time, the file may have changed, so the changed file needs to be backed up to the server so that the server can update its metadata. The backup method used after the file may have changed is described in detail below. This embodiment differs from Embodiment 1 in that Embodiment 1 is described from the perspective of the client, whereas this embodiment is described from the perspective of the server.
Step S201: Receive the fingerprint data of the file to be backed up sent by the client.
Specifically, fingerprint data is generated by applying SHA-1, MD5, or a similar algorithm to the data to be backed up, producing a 32-bit or 128-bit hash value that is in one-to-one correspondence with the data. The fingerprint data generated in this way is the unique identifier of the data of the file to be backed up.
Step S202: Compare the fingerprint data of the file to be backed up with the fingerprint data of the saved file, and send the comparison result to the client.
Specifically, the server calculated the fingerprint data of the file during the first backup. After receiving the fingerprint data sent by the client, it compares it with the fingerprint data it saved earlier. If the fingerprint data is the same, the file has not changed; if the fingerprint data is different, the file has changed.
The technical effect of steps S202 and S203 is that, after a period of time t, if it is uncertain whether the original file has changed, the fingerprint data of the current file can be compared with the fingerprint data of the original file to determine whether it has changed. If it has not changed, the client only needs to send the pointer of the file to the server, and the subsequent operations are unnecessary. This embodiment of the present invention, however, focuses on the case in which the original file has changed.
Step S203: Calculate the fingerprint data of each data block of the saved file, and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented.
Specifically, in this embodiment of the present invention, the file to be backed up is divided into data blocks of a given length. A data block is a physical record of data; it can be understood as one group of records, or several groups of records arranged consecutively in order, and is the unit of data transferred between main memory and an input device, an output device, or external storage.
The method for calculating the fingerprint data of each data block is the same as the method for calculating the fingerprint data of the file to be backed up in step S201 and is not described again here.
Here, the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented refers to the fingerprint data of the data blocks obtained after the saved file is divided by the given length and the offset of each block's address is increased by 1. The calculation method is the same as described above and is not described again here.
Step S204: Receive, from the client, the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file is incremented, and compare them with the fingerprint data of each data block of the saved file and with the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented.
Step S205: Send the comparison result to the client, and receive from the client the data blocks whose fingerprint data do not match, together with the pointers.
Specifically, for a data block whose comparison result is a match, the server only needs to instruct the client to send the pointer of that data block; for data whose comparison result is a mismatch, the server instructs the client to send the data block.
It should be noted that the above embodiment may back up a single file or multiple files contained in the same file set; the backup method is similar and is not described again.
The beneficial effect of this embodiment of the present invention is as follows. When a file on the client has changed and needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, and of the data blocks obtained after their address offsets are incremented, is compared with the fingerprint data of the data blocks of the file saved on the server, and of the data blocks obtained after their address offsets are incremented; the data corresponding to the data blocks whose fingerprint data has changed is then sent to the server. Therefore, while ensuring that the backup file is stored only once on the server, the amount of data stored on the server is effectively reduced and the deduplication rate is further improved.
Referring to FIG. 5, a specific example is used to illustrate the data processing method disclosed in an embodiment of the present invention. It should be noted that, for an embodiment in which the first backup of the data has already been completed, steps S301 and S302 below are not required.
Step S301: The client backs up the file data to the server for the first time.
Step S302: The server saves the file data of the first backup and the fingerprint data of the file.
Specifically, the server saves the backup file data sent by the client as metadata, and also saves the fingerprint data of the file obtained through calculation. The method for calculating the fingerprint data has been described in the preceding embodiments and is not described again here.
Step S303: After a period of time t, the client calculates and sends the fingerprint data of the file.
Specifically, after a period of time t, the original file may have changed, and the client needs to send the changed data to the server to synchronize the data.
At this point, however, the client does not know whether the original file has changed, so this must be verified. The verification is performed by calculating the fingerprint data of the current file and sending it to the server.
Step S304: The server compares the fingerprint data sent by the client with the saved fingerprint data.
Specifically, the server receives the fingerprint data of the current file sent by the client and compares it with the fingerprint data of the original file that it has saved. If the two are the same, the file has not changed; if the two are different, the file has changed. Because the technical problem addressed by this embodiment of the present invention is the processing performed after a file has changed, the discussion here focuses on that case.
Step S305: Send the comparison result.
Specifically, if the server finds through the comparison in step S304 that the file has changed, it sends the comparison result to the client and the subsequent steps are performed. If the server finds through the comparison in step S304 that the file has not changed, the file saved on the server does not need to be updated.
Step S306: Calculate and send the fingerprint data of each data block, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file is incremented.
The following example inserts one data unit at the head of the original file.
The original file is: 1234ABC.
After one data unit, 0, is inserted at the head of the original file, it becomes: 01234ABC.
In this embodiment of the present invention, data blocks are divided with a fixed length of 2, so 01234ABC is divided into the four data blocks shown in FIG. 10(a).
In step S306, the client calculates the fingerprint data of each data block, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file is incremented. Specifically, it calculates the fingerprint data of data blocks (0, 1), (2, 3), (4, A), and (B, C), and the fingerprint data of data blocks (1, 2), (3, 4), (A, B), and (C), and sends this fingerprint data to the server.
Step S307: The server calculates the fingerprint data of each data block of the saved file, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file is increased by 1, and compares them in turn with the fingerprint data of the data blocks sent by the client.
具体地, 以步骤 S306 中的文件为例, 服务器保存的元数据同样为: 1234ABC;该各个数据块的指纹数据及文件的各个数据块的地址偏移量加 1 后的各个数据块的指纹数据为如图 10 (b) 所示: 服务器接收到客户端发送的当前文件的各个数据块的指纹数据后, 将 第一个数据块(0, 1 ) 的指纹数据 FP A 依次与服务器中保存的文件的 数据块(1, 2) 的指纹数据 FPA, 数据块(3, 4) 的指纹数据 FPB, 数据 块(A, B) 的指纹数据 FPC, 数据块(C) 的指纹数据 FPD, 以及偏移量 加 1的数据块(2, 3)的指纹数据 FPE, 数据块(4, A)的指纹数据 FPF, 数据块(B, C) 的指纹数据 FPG进行比对; Specifically, taking the file in step S306 as an example, the metadata saved by the server is also: 1234ABC; the fingerprint data of each data block and the fingerprint data of each data block after the address offset of each data block of the file is increased by one. As shown in Figure 10 (b): After receiving the fingerprint data of each data block of the current file sent by the client, the server sequentially stores the fingerprint data FP A of the first data block (0, 1) with the server. Fingerprint data FPA of data block (1, 2), fingerprint data FFP of data block (3, 4), data Fingerprint data FPC of block (A, B), fingerprint data FPD of data block (C), and fingerprint data FPE of data block (2, 3) with offset plus 1 and fingerprint data of data block (4, A) FPF, the fingerprint data FPG of the data block (B, C) is compared;
In the prior art, the server computes and stores only the fingerprint data FPA, FPB, FPC, and FPD of the data blocks of the original data. Comparing these with the fingerprint data of the data blocks of the current file shows that none of the current file's block fingerprints equals FPA, FPB, FPC, or FPD, so every data block of the current file is backed up to the server. This lowers the deduplication rate and increases both the amount of data stored on the server and the network bandwidth consumed.
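The prior-art behaviour described above can be reproduced with a small sketch. The helper names `fingerprint` and `fixed_blocks` are hypothetical, and using SHA-1 from Python's `hashlib` is an assumption made for illustration only.

```python
import hashlib


def fingerprint(block: str) -> str:
    """Hex digest used as a stand-in for the block's fingerprint data."""
    return hashlib.sha1(block.encode("utf-8")).hexdigest()


def fixed_blocks(data: str, size: int) -> list[str]:
    """Divide data into blocks of `size` starting at offset 0 only."""
    return [data[i:i + size] for i in range(0, len(data), size)]


# The server only knows the fingerprints of (1,2) (3,4) (A,B) (C).
stored = {fingerprint(b) for b in fixed_blocks("1234ABC", 2)}

# After one unit of data is inserted at the head, every block boundary shifts.
current_blocks = fixed_blocks("01234ABC", 2)
unmatched = [b for b in current_blocks if fingerprint(b) not in stored]
print(unmatched)
```

Every block of the current file ends up in `unmatched`, which is why the whole file would be sent again under fixed-boundary comparison.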
In this embodiment of the present invention, by contrast, the server compares the fingerprint data of data block (0, 1) with FPA, FPB, FPC, FPD, FPE, FPF, and FPG and finds no match. It then compares the fingerprint data of data block (1, 2), obtained by incrementing the file offset by 1, and finds that it equals FPA. This indicates that one unit of data was added at the head of the original file, so the server sends the comparison result to the client and asks the client to send data 0 to the server.
The fingerprint data of the remaining data blocks is then compared in turn: the fingerprint data of data block (3, 4) equals FPB, that of data block (A, B) equals FPC, and that of data block (C) equals FPD. It follows that the current file is the original file with one unit of data, 0, added at its head.
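The head-insertion comparison just described can be checked with a minimal sketch. The labels FPA to FPG are assigned in the order given in the text; `build_index`, `blocks_at_offset`, and `fingerprint` are hypothetical names, and SHA-1 is an assumed fingerprint function.

```python
import hashlib
from string import ascii_uppercase


def fingerprint(block: str) -> str:
    """Hex digest used as a stand-in for the block's fingerprint data."""
    return hashlib.sha1(block.encode("utf-8")).hexdigest()


def blocks_at_offset(data: str, size: int, offset: int) -> list[str]:
    """Blocks of `size` characters starting at `offset`; the tail may be short."""
    return [data[i:i + size] for i in range(offset, len(data), size)]


def build_index(stored: str, size: int) -> dict[str, str]:
    """Map fingerprint -> label (FPA, FPB, ...) for offsets 0 .. size-1."""
    index: dict[str, str] = {}
    labels = iter(ascii_uppercase)
    for offset in range(size):
        for block in blocks_at_offset(stored, size, offset):
            index[fingerprint(block)] = "FP" + next(labels)
    return index


index = build_index("1234ABC", 2)

# Block (0,1) of the current file has no counterpart on the server ...
head_hit = index.get(fingerprint("01"))
# ... but the blocks at offset plus 1 line up with FPA..FPD.
shifted_hits = [index.get(fingerprint(b)) for b in blocks_at_offset("01234ABC", 2, 1)]
print(head_hit, shifted_hits)
```

A dictionary lookup replaces the one-by-one comparison in the text; the outcome is the same, only the search is faster.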
Step S308: Send the comparison result.
Step S309: Send the data blocks whose fingerprint data is inconsistent, together with their pointers.
Specifically, the client sends the data 0 that was inserted at the head of the original file to the server.
It can be seen that this embodiment of the present invention sends only the one unit of changed data and its pointer to the server. Compared with the prior art, this improves the deduplication rate and reduces the server's data storage and the network bandwidth consumed.
The data processing method of this embodiment is described next using a change to the data in the middle of the original file as an example: the original file 1234ABC is modified to 15D23C.
The client divides the current file 15D23C into data blocks of length 2, giving (1, 5), (D, 2), (3, C), as shown in Figure 10(c).
As in the embodiment above, the original data saved by the server is 1234ABC. The fingerprint data of its data blocks, and of the data blocks obtained after the offset of each data block is incremented by 1, is shown in Figure 10(d).
The server compares the fingerprint data of data block (1, 5) sent by the client with its computed and saved fingerprint data FPA, FPB, FPC, FPD, FPE, FPF, and FPG, and finds no matching fingerprint data.
The server then compares the fingerprint data of data block (5, D) sent by the client with its saved fingerprint data and again finds no match, which indicates that data block (1, 5) is a changed data block.
The server then compares the fingerprint data of data block (D, 2) sent by the client with its saved fingerprint data and finds no match.
The server next compares the fingerprint data of data block (2, 3) sent by the client with its saved fingerprint data and finds that it matches FPE. This indicates that the current data adds one unit of data, D, to the original data block 23, so the server can instruct the client to send data D and its pointer to the server.
The server then compares the fingerprint data of the client's remaining data block C with its saved fingerprint data and finds that it matches FPD. Data block C is therefore unchanged, and the server only needs to instruct the client to send the pointer of data block C.
Note that the client's division of the file by the given length is a logical division; the file is not physically split into data blocks. Its purpose is to make comparison with the file data saved on the server convenient and to locate the data that has changed, so the division into data blocks is not fixed. In the example above, once the fingerprint data of data block (2, 3), obtained by incrementing the offset of data block (D, 2) by 1, is found to match fingerprint data on the server, data block (2, 3) can logically be treated as one data block, with the preceding unit of data D and the following data C each treated as a separate data block.
After completing the comparison, the server can send the comparison result to instruct the client to send the changed data block (1, 5), data block (D), and the pointers of these two data blocks to the server.
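The walk-through above amounts to a greedy scan over the current file. The sketch below is one possible reading of it rather than the procedure fixed by the patent: `server_fingerprints` and `plan_upload` are hypothetical names, SHA-1 is an assumed fingerprint function, and the scan is run in one place although the text splits the work between client and server.

```python
import hashlib


def fingerprint(block: str) -> str:
    """Hex digest used as a stand-in for the block's fingerprint data."""
    return hashlib.sha1(block.encode("utf-8")).hexdigest()


def server_fingerprints(stored: str, size: int) -> set[str]:
    """Fingerprints of the saved file's blocks at offsets 0 .. size-1."""
    return {
        fingerprint(stored[i:i + size])
        for offset in range(size)
        for i in range(offset, len(stored), size)
    }


def plan_upload(current: str, known: set[str], size: int = 2):
    """Return (literals, matches): data to send in full vs. blocks already stored."""
    literals, matches = [], []
    pos = 0
    while pos < len(current):
        block = current[pos:pos + size]
        if fingerprint(block) in known:
            matches.append(block)          # only a pointer is needed
            pos += len(block)
            continue
        # Try block boundaries shifted by 1 .. size-1 before giving up.
        for shift in range(1, size):
            shifted = current[pos + shift:pos + shift + size]
            if shifted and fingerprint(shifted) in known:
                literals.append(current[pos:pos + shift])  # inserted data
                matches.append(shifted)
                pos += shift + len(shifted)
                break
        else:
            literals.append(block)         # the whole block has changed
            pos += len(block)
    return literals, matches


known = server_fingerprints("1234ABC", 2)
middle = plan_upload("15D23C", known)
head = plan_upload("01234ABC", known)
print(middle, head)
```

For 15D23C the scan reports (1, 5) and D as data to send and (2, 3) and C as already stored, matching the result in the text; for 01234ABC only the inserted 0 has to be sent.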
It can thus be seen that, compared with the prior art, this embodiment of the present invention improves the deduplication rate and reduces the server's data storage and network bandwidth consumption when the head or the middle of a file changes.
Referring to Figure 6, a client disclosed in an embodiment of the present invention is shown:
First calculation module 601: configured to compute the fingerprint data of each data block of the file to be backed up, as well as the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented. In the embodiments of the present invention, fingerprint data is data that uniquely identifies a file or a data block of a file.
Specifically, fingerprint data is obtained by applying SHA-1, MD-5, or a similar algorithm to each data block of the file to be backed up, generating for each a 32-bit or 128-bit hash value that is in one-to-one correspondence with the data. The fingerprint data generated in this way uniquely identifies the data of the file to be backed up. Note that, besides the algorithms listed above, this embodiment and the subsequent embodiments may use other algorithms to generate fingerprint data as needed, provided that the client and the server use the same or corresponding algorithms and that the generated data uniquely identifies each data block of the file to be backed up.
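A fingerprint function of the kind described above might look as follows. This is a sketch using Python's standard `hashlib`; the function name `fingerprint` and the choice of a hex string as output are assumptions. In `hashlib`, MD5 digests are 128 bits long and SHA-1 digests are 160 bits long.

```python
import hashlib


def fingerprint(block: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of `block` under the named hashlib algorithm.

    Client and server must agree on `algorithm`, otherwise their
    fingerprints for identical blocks will not compare equal.
    """
    return hashlib.new(algorithm, block).hexdigest()


same = fingerprint(b"12") == fingerprint(b"12")       # identical blocks agree
different = fingerprint(b"12") != fingerprint(b"34")  # distinct blocks differ here
md5_bits = len(bytes.fromhex(fingerprint(b"12", "md5"))) * 8
sha1_bits = len(bytes.fromhex(fingerprint(b"12", "sha1"))) * 8
print(same, different, md5_bits, sha1_bits)
```

Hash functions identify blocks uniquely only up to collisions, so a real system may want an additional check before treating two blocks as identical.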
In the embodiments of the present invention, the file to be backed up is divided into data blocks of a given length. A data block is a physical record of data; here it can be understood as one group of records, or several groups arranged consecutively in order, and it is the unit of data transferred between main memory and input/output devices or external storage.
Here, the fingerprint data of the data blocks after the address offset of each data block of the file is incremented refers to the fingerprint data of the data blocks obtained after that address offset has been incremented.
First sending module 602: configured to send the fingerprint data of each data block of the file to be backed up, together with the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented, to the server, for comparison with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented.
Specifically, the client sends the computed fingerprint data of each data block, and of the data blocks obtained after the address offset of each data block of the file is incremented, to the server, so that the server can compare it in turn with the fingerprint data of each data block of the saved file and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented.
Here, the fingerprint data of each data block of the file saved by the server refers to the fingerprint data of the data blocks obtained by dividing, by the given length, the original data that the client sent to the server at the first backup.
Here, the fingerprint data of the data blocks after the address offset of each data block of the saved file is incremented refers to the fingerprint data of the data blocks obtained after the address offset of the data blocks, into which the file previously saved by the server is divided by the given length, is incremented by 1.
First receiving module 603: configured to receive the comparison result sent by the server.
Second sending module 604: configured to send, according to the comparison result sent by the server, the data blocks whose fingerprint data is inconsistent, together with their pointers, to the server.
Specifically, for a data block whose comparison result is identical, the server only needs to instruct the client to send the pointer of that data block; for a data block whose comparison result differs, the server needs to instruct the client to send the data block itself.
Note that the above embodiment may back up a single file or a file set composed of multiple files; the backup method is similar in both cases and is not repeated here.
The beneficial effect of this embodiment of the present invention is as follows. When a file to be backed up on the client has changed and needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, together with the fingerprint data of the data blocks obtained after the address offset of those data blocks is incremented, is compared with the fingerprint data of the data blocks of the file saved on the server, together with the fingerprint data of the data blocks obtained after the address offset of those data blocks is incremented. The data corresponding to the data blocks whose fingerprint data has changed is then sent to the server. While ensuring that the backup file is stored only once on the server, this effectively reduces server-side data storage and further improves the deduplication rate.
The above embodiment may further include a second calculation module 605, a third sending module 606, and a second receiving module 607.
Second calculation module 605: configured to compute the fingerprint data of the file to be backed up.
Third sending module 606: configured to send the fingerprint data of the file to be backed up to the server, so that it can be compared with the fingerprint data of the file saved by the server.
Specifically, the client sends the computed fingerprint data of the file to be backed up to the server. The server already computed the fingerprint data of the file at its first backup; on receiving the fingerprint data from the client, it compares it with the previously saved fingerprint data. If the fingerprint data is the same, the file has not changed; if it differs, the file has changed.
Second receiving module 607: configured to receive the comparison result from the server.
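The whole-file check performed by modules 605 to 607 can be sketched as a single comparison made before any block-level work. The names `file_fingerprint` and `needs_block_comparison` are hypothetical, and SHA-1 is an assumed fingerprint function.

```python
import hashlib


def file_fingerprint(data: bytes) -> str:
    """Fingerprint of the entire file contents."""
    return hashlib.sha1(data).hexdigest()


def needs_block_comparison(current: bytes, stored_fingerprint: str) -> bool:
    """True if the file changed and the block-level comparison should run."""
    return file_fingerprint(current) != stored_fingerprint


# Fingerprint the server kept from the first backup of 1234ABC.
stored_fp = file_fingerprint(b"1234ABC")

unchanged = needs_block_comparison(b"1234ABC", stored_fp)   # pointer only
changed = needs_block_comparison(b"01234ABC", stored_fp)    # go on to blocks
print(unchanged, changed)
```

The check costs one hash over the file, which is small next to computing and transmitting fingerprints for every block at every offset.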
This embodiment differs from the previous one in that, after a period of time t has elapsed and it is uncertain whether the original file has changed, the fingerprint data of the current file can be compared with that of the original file to determine whether the original file has changed. If it has not, the client only needs to send the pointer of the file to the server and can skip the subsequent operations.
Referring to Figure 6, a server disclosed in an embodiment of the present invention is shown:
Third receiving module 701: configured to receive, from the client, the fingerprint data of each data block of the file to be backed up, as well as the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented.
Specifically, fingerprint data is obtained by applying SHA-1, MD-5, or a similar algorithm to each data block of the file to be backed up, generating for each a 32-bit or 128-bit hash value that is in one-to-one correspondence with the data. The fingerprint data generated in this way uniquely identifies the data of the file to be backed up.
In the embodiments of the present invention, the file to be backed up is divided into data blocks of a given length. A data block is a physical record of data; here it can be understood as one group of records, or several groups arranged consecutively in order, and it is the unit of data transferred between main memory and input/output devices or external storage.
Here, the fingerprint data of the data blocks after the address offset of each data block of the file is incremented refers to the fingerprint data of the data blocks obtained after that address offset has been incremented.
First comparison module 702: configured to compare the fingerprint data of each data block of the file to be backed up, together with the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented, with the fingerprint data of each data block of the saved file, together with the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented.
Here, the fingerprint data of each data block of the file saved by the server refers to the fingerprint data of the data blocks obtained by dividing, by the given length, the original data that the client sent to the server at the first backup.
Here, the fingerprint data of the data blocks after the offset of each data block of the saved file is incremented refers to the fingerprint data of the data blocks obtained after that offset has been incremented.
Note that the server may compare fingerprint data using the rsync rolling checksum algorithm, comparing in turn the fingerprint data of the data blocks sent by the client, and the fingerprint data at address offset plus 1, with the fingerprint data of each data block of the file saved by the server and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented. The purpose of this step is to compare fingerprint data; the rsync algorithm is used as an example for ease of description, and a person skilled in the art may clearly choose other algorithms according to the actual situation.
For example, take data blocks of fixed length 2. In the first pass, the fingerprint data of the first data block sent by the client is compared with the fingerprint data of the first data block of the file saved by the server. If they differ, it is compared with the fingerprint data of the second data block of the saved file; if they still differ, the comparison continues in order up to the fingerprint data of the last data block of the saved file. If no match has been found, it is compared with the fingerprint data of the first data block of the saved file at address offset plus 1; if they differ, it is compared with the fingerprint data of the second data block of the saved file at address offset plus 1, and so on. If no matching fingerprint data is found, the server can notify the client to send that data block. In the second pass, the fingerprint data of the data block obtained by incrementing by 1 the address offset of the first data block sent by the client is compared, following the steps of the first pass, with the data blocks of the saved file and with the data blocks at offset plus 1.
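The rolling checksum mentioned above lets the fingerprint of a block at offset plus 1 be derived from the previous one without rehashing the whole block. The sketch below follows the general form of the weak checksum from the rsync algorithm in simplified form; it is not the checksum code of the rsync tool itself, and the names `weak_checksum` and `roll` are hypothetical.

```python
M = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(window: bytes) -> tuple[int, int]:
    """Compute (a, b) for a window: a is the byte sum, b the weighted sum."""
    length = len(window)
    a = sum(window) % M
    b = sum((length - i) * x for i, x in enumerate(window)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, length: int) -> tuple[int, int]:
    """Slide the window one byte: drop `out_byte`, append `in_byte`."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - length * out_byte + a_new) % M
    return a_new, b_new


data = b"01234ABC"
size = 2
a, b = weak_checksum(data[0:size])
rolled = []
for start in range(1, len(data) - size + 1):
    a, b = roll(a, b, data[start - 1], data[start + size - 1], size)
    rolled.append((a, b))

recomputed = [weak_checksum(data[s:s + size]) for s in range(1, len(data) - size + 1)]
print(rolled == recomputed)
```

Because a weak checksum can collide, a scheme of this kind would normally confirm a candidate match with a stronger hash before treating two blocks as equal.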
Note that, in the embodiments of the present invention, the data of a file may be divided into data blocks of length 3, 4, and so on, in addition to length 2.
If data blocks are divided with length 3, then when the fingerprint data of the data blocks sent by the client is compared with the fingerprint data of the data blocks on the server, the comparison must cover in turn not only the fingerprint data of the data blocks at address offset plus 1 but also the fingerprint data of the data blocks at address offset plus 2.
If data blocks are divided with length 4, then when the fingerprint data of the data blocks sent by the client is compared with the fingerprint data of the data blocks on the server, the comparison must cover in turn not only the fingerprint data of the data blocks at address offset plus 1 but also the fingerprint data of the data blocks at address offset plus 2 and at address offset plus 3.
By analogy, if data blocks are divided with length N, where N is a natural number greater than 2, the fingerprint data of the data blocks after the address offset of each data block of the file to be backed up is incremented is the fingerprint data of the data blocks obtained as the address offset of each data block is incremented from plus 1 up to plus N - 1.
Likewise, the client's data blocks with incremented address offsets are the data blocks obtained as the offset is incremented from plus 1 up to plus N - 1.
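For block length N, the divisions at offsets 0 through N - 1 can be generated together. The following sketch assumes the file is modelled as a string and uses the hypothetical helper name `offset_blocks`; fingerprints would then be computed over every block it returns.

```python
def offset_blocks(data: str, n: int) -> dict[int, list[str]]:
    """Return, for each offset 0 .. n-1, the blocks of length n from that offset.

    Offset 0 is the base division; offsets 1 .. n-1 are the incremented ones.
    The last block of each division may be shorter than n.
    """
    return {
        offset: [data[i:i + n] for i in range(offset, len(data), n)]
        for offset in range(n)
    }


# Saved file 1234ABC divided with block length 3.
divisions = offset_blocks("1234ABC", 3)
for offset, blocks in divisions.items():
    print(offset, blocks)
```

The number of fingerprints to compute and store grows roughly in proportion to N, which is the cost paid for detecting insertions of up to N - 1 units.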
Fourth sending module 703: configured to send the comparison result to the client and to receive from the client the data blocks whose fingerprint data is inconsistent, together with their pointers.
The server determines the changed data blocks through the comparison performed by the first comparison module 702, and can therefore send the comparison result to the client, instructing the client to send the data blocks with inconsistent fingerprint data and their pointers to the server.
The beneficial effect of this embodiment of the present invention is as follows. When a file to be backed up on the client has changed and needs to be backed up again, the fingerprint data of the data blocks of the file to be backed up, together with the fingerprint data of the data blocks obtained after the address offset of those data blocks is incremented, is compared with the fingerprint data of the data blocks of the file saved on the server, together with the fingerprint data of the data blocks obtained after the address offset of those data blocks is incremented. The data corresponding to the data blocks whose fingerprint data has changed is then sent to the server. While ensuring that the backup file is stored only once on the server, this effectively reduces server-side data storage and further improves the deduplication rate.
The above embodiment may further include a fourth receiving module 705 and a second comparison module 706.
Fourth receiving module 705: configured to receive the fingerprint data of the file to be backed up sent by the client.
Second comparison module 706: configured to compare the fingerprint data of the file to be backed up with the fingerprint data of the saved file and to send the comparison result to the client, so that the client can determine whether the file to be backed up has changed.
This embodiment differs from the previous one in that, after a period of time t has elapsed and it is uncertain whether the original file has changed, the fingerprint data of the current file can be compared with that of the original file to determine whether the original file has changed. If it has not, the client only needs to send the pointer of the file to the server and can skip the subsequent operations.
Further, the above embodiment may also include a third calculation module 704.
Third calculation module: configured to compute and save the fingerprint data of each data block of the saved file, as well as the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented.
The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.
A person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The data processing method and device provided by the present invention have been described in detail above. For a person of ordinary skill in the art, the specific implementations and the scope of application may vary in accordance with the ideas of the embodiments of the present invention. The content of this specification should therefore not be construed as limiting the present invention.

Claims

1. A data processing method, characterized by comprising:
computing the fingerprint data of each data block of a file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented;
sending the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented, to a server, for comparison with the fingerprint data of each data block of a file saved by the server and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented;
sending to the server, according to the comparison result sent by the server, the data blocks whose fingerprint data is inconsistent and the pointers of those data blocks.
2. The method according to claim 1, characterized by further comprising:
computing the fingerprint data of the file to be backed up;
sending the fingerprint data of the file to be backed up to the server, for comparison with the fingerprint data of the file saved by the server;
receiving the comparison result sent by the server and, when the file to be backed up has changed, performing the step of computing the fingerprint data of each data block of the file to be backed up.
3. The method according to claim 1, characterized in that, when the length of each data block of the file to be backed up is 2 bytes, the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented comprises the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented by 1;
when the length of each data block of the file to be backed up is N bytes, N being a natural number greater than 2, the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented comprises the fingerprint data of the data blocks obtained as the address offset of each data block of the file to be backed up is incremented from plus 1 up to plus N - 1.
4. A data processing method, characterized by comprising:
receiving, from a client, the fingerprint data of each data block of a file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented;
comparing the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data blocks obtained after the address offset of each data block of the file to be backed up is incremented, with the fingerprint data of each data block of a saved file and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented;
sending the comparison result to the client, and receiving from the client the data blocks whose fingerprint data is inconsistent and the pointers of those data blocks.
5. The method according to claim 4, characterized by further comprising:
receiving the fingerprint data of the file to be backed up sent by the client;
comparing the fingerprint data of the file to be backed up with the fingerprint data of the saved file, and sending the comparison result to the client, so that, when the client determines that the file to be backed up has changed, the step of receiving the fingerprint data of each data block of the file to be backed up sent by the client is performed.
6. The method according to claim 4, characterized by further comprising:
computing and saving the fingerprint data of each data block of the saved file, and the fingerprint data of the data blocks obtained after the address offset of each data block of the saved file is incremented.
7. The method according to claim 4, characterized in that, when the length of each data block of the file to be backed up is 2 bytes, the fingerprint data of the data blocks obtained by incrementing the address offset of each data block of the file to be backed up comprises the fingerprint data of the data blocks obtained by adding 1 to the address offset of each data block of the file to be backed up;
when the length of each data block of the file to be backed up is N bytes, N being a natural number greater than 2, the fingerprint data of the data blocks obtained by incrementing the address offset of each data block of the file to be backed up comprises the fingerprint data of the data blocks obtained as the address offset of each data block is increased by 1 and then incremented successively up to N - 1.
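The offset rule of claim 7 can be sketched as follows: for a block length of N bytes, fingerprints are computed for the aligned blocks (shift 0) and for the blocks whose start address is shifted by 1, 2, ..., N - 1 bytes. This is a minimal illustration under stated assumptions. SHA-1 is an assumed fingerprint function, only full-length blocks are fingerprinted, and the name `shifted_fingerprints` is hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(block: bytes) -> str:
    """Fingerprint of one data block. SHA-1 is an assumption."""
    return hashlib.sha1(block).hexdigest()


def shifted_fingerprints(data: bytes, block_size: int) -> Dict[int, List[str]]:
    """Map each shift in 0..block_size-1 to the fingerprints of the blocks
    that start at shift, shift + block_size, shift + 2 * block_size, ...

    Assumption: a trailing block shorter than block_size is ignored.
    """
    if block_size < 2:
        raise ValueError("block_size must be at least 2 bytes")
    result: Dict[int, List[str]] = {}
    for shift in range(block_size):
        starts = range(shift, len(data) - block_size + 1, block_size)
        result[shift] = [fingerprint(data[s:s + block_size]) for s in starts]
    return result


fps = shifted_fingerprints(b"abcdef", 2)
# Shift 0 covers the blocks "ab", "cd", "ef"; shift 1 covers "bc", "de".
```

The shifted fingerprints are what allow a match to be found even when inserted or deleted bytes have moved the remaining content off the original block boundaries.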
8. A client, characterized by comprising:
a first calculation module, configured to calculate fingerprint data of each data block of a file to be backed up, and fingerprint data of the data blocks obtained by incrementing the address offset of each data block of the file to be backed up;
a first sending module, configured to send the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data blocks obtained by incrementing the address offset of each data block of the file to be backed up, to a server for comparison with fingerprint data of each data block of a file saved by the server and fingerprint data of the data blocks obtained by incrementing the address offset of each data block of the saved file;
a first receiving module, configured to receive the comparison result sent by the server; and
a second sending module, configured to send to the server, according to the comparison result sent by the server, the data blocks whose fingerprint data is inconsistent and the pointers of those data blocks.
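The behaviour of the second sending module in claim 8 can be pictured with a short sketch. It assumes that the comparison result arrives as a list of indices of blocks whose fingerprints were inconsistent, and that the "pointer" of a block is its byte offset in the file. Neither detail is specified in the claim, and `blocks_to_send` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def blocks_to_send(data: bytes, block_size: int,
                   mismatched: Iterable[int]) -> List[Tuple[int, bytes]]:
    """Pair each inconsistent block with its pointer (assumed: byte offset).

    Indices that fall outside the file are skipped.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    payload: List[Tuple[int, bytes]] = []
    for index in sorted(set(mismatched)):
        offset = index * block_size
        block = data[offset:offset + block_size]
        if block:  # empty slice means the index is out of range
            payload.append((offset, block))
    return payload


payload = blocks_to_send(b"abcdef", 2, [2, 0])
```

Only the blocks listed in `payload` travel to the server. Blocks whose fingerprints matched are never transmitted.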
9. The client according to claim 8, characterized by further comprising:
a second calculation module, configured to calculate fingerprint data of the file to be backed up;
a third sending module, configured to send the fingerprint data of the file to be backed up to the server for comparison with fingerprint data of the file saved by the server; and
a second receiving module, configured to receive the comparison result sent by the server, and to start the first calculation module when the file to be backed up has changed.
10. A server, characterized by comprising:
a third receiving module, configured to receive, from a client, fingerprint data of each data block of a file to be backed up, and fingerprint data of the data blocks obtained by incrementing the address offset of each data block of the file to be backed up;
a first comparison module, configured to compare the fingerprint data of each data block of the file to be backed up, and the fingerprint data of the data blocks obtained by incrementing the address offset of each data block of the file to be backed up, with fingerprint data of each data block of a saved file and fingerprint data of the data blocks obtained by incrementing the address offset of each data block of the saved file; and
a fourth sending module, configured to send the comparison result to the client, and to receive, from the client, the data blocks whose fingerprint data is inconsistent and the pointers of those data blocks.
11. The server according to claim 10, characterized by further comprising:
a fourth receiving module, configured to receive fingerprint data of the file to be backed up sent by the client; and
a second comparison module, configured to compare the fingerprint data of the file to be backed up with fingerprint data of the saved file, and to send the comparison result to the client, so that the third receiving module is started when the client determines that the file to be backed up has changed.
12. The server according to claim 10, characterized by further comprising:
a third calculation module, configured to calculate and save the fingerprint data of each data block of the saved file, and the fingerprint data of the data blocks obtained by incrementing the address offset of each data block of the saved file.
PCT/CN2012/075411 2011-05-25 2012-05-12 Data processing method and device WO2012159532A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110136079.7 2011-05-25
CN2011101360797A CN102202098A (en) 2011-05-25 2011-05-25 Data processing method and device

Publications (1)

Publication Number Publication Date
WO2012159532A1 (en) 2012-11-29

Family

ID=44662488

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/075411 WO2012159532A1 (en) 2011-05-25 2012-05-12 Data processing method and device

Country Status (2)

Country Link
CN (1) CN102202098A (en)
WO (1) WO2012159532A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102202098A (en) * 2011-05-25 2011-09-28 成都市华为赛门铁克科技有限公司 Data processing method and device
US20130311433A1 (en) * 2012-05-17 2013-11-21 Akamai Technologies, Inc. Stream-based data deduplication in a multi-tenant shared infrastructure using asynchronous data dictionaries
US9451000B2 (en) 2012-12-27 2016-09-20 Akamai Technologies, Inc. Stream-based data deduplication with cache synchronization
CN104063377B (en) * 2013-03-18 2017-06-27 联想(北京)有限公司 Information processing method and use its electronic equipment
CN103942124A (en) * 2014-04-24 2014-07-23 深圳市中博科创信息技术有限公司 Method and device for data backup
CN106990914B (en) * 2017-02-17 2020-06-12 北京同有飞骥科技股份有限公司 Data deleting method and device
CN110659250B (en) * 2018-06-13 2022-02-22 中国电信股份有限公司 File processing method and system
CN113239001A (en) * 2021-05-21 2021-08-10 珠海金山网络游戏科技有限公司 Data storage method and device
CN115623016A (en) * 2022-09-20 2023-01-17 浪潮云信息技术股份公司 Backup breakpoint continuous transmission implementation method and system based on cloud storage technology

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216791A (en) * 2008-01-04 2008-07-09 华中科技大学 File backup method based on fingerprint
US20100205163A1 (en) * 2009-02-10 2010-08-12 Kave Eshghi System and method for segmenting a data stream
CN101989929A (en) * 2010-11-17 2011-03-23 中兴通讯股份有限公司 Disaster recovery data backup method and system
CN102202098A (en) * 2011-05-25 2011-09-28 成都市华为赛门铁克科技有限公司 Data processing method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8620877B2 (en) * 2008-04-30 2013-12-31 International Business Machines Corporation Tunable data fingerprinting for optimizing data deduplication
CN101290628B (en) * 2008-06-17 2010-06-16 中兴通讯股份有限公司 Data file updating storage method

Also Published As

Publication number Publication date
CN102202098A (en) 2011-09-28

Similar Documents

Publication Publication Date Title
WO2012159532A1 (en) Data processing method and device
US11416452B2 (en) Determining chunk boundaries for deduplication of storage objects
US11386120B2 (en) Data syncing in a distributed system
US9606869B2 (en) Retrieving data segments from a dispersed storage network
US8577850B1 (en) Techniques for global data deduplication
US10256978B2 (en) Content-based encryption keys
JP5204099B2 (en) Group-based full and incremental computer file backup systems, processing and equipment
Batten et al. pStore: A secure peer-to-peer backup system
US8285957B1 (en) System and method for preprocessing a data set to improve deduplication
US20130144840A1 (en) Optimizing restores of deduplicated data
US10416915B2 (en) Assisting data deduplication through in-memory computation
US7844652B2 (en) Efficient computation of sketches
WO2021068351A1 (en) Cloud-storage-based data transmission method and apparatus, and computer device
CN110651246B (en) Data reading and writing method and device and storage server
JP2009536418A5 (en)
WO2017215646A1 (en) Data transmission method and apparatus
US20210096757A1 (en) Data deduplication with collision resistant hash digest processes
Sun et al. Data backup and recovery based on data de-duplication
WO2017097106A1 (en) Method and apparatus for transmitting file difference
CN107391761B (en) Data management method and device based on repeated data deletion technology
WO2015055035A1 (en) Method and device for hashing metadata object
US20170124107A1 (en) Data deduplication storage system and process
JP2017142605A (en) Backup restoration system and restoration method
US20180246666A1 (en) Methods for performing data deduplication on data blocks at granularity level and devices thereof
Tian et al. Sed‐Dedup: An efficient secure deduplication system with data modifications

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12789775

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12789775

Country of ref document: EP

Kind code of ref document: A1