CN115357429A - Method and device for recovering data file and client - Google Patents

Publication number
CN115357429A
Authority
CN
China
Prior art keywords
fingerprint
backup
data
file
data block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210995696.0A
Other languages
Chinese (zh)
Other versions
CN115357429B (en)
Inventor
杨海锋
马立珂
王子骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Dingjia Computer Technology Co ltd
Original Assignee
Guangzhou Dingjia Computer Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Guangzhou Dingjia Computer Technology Co ltd
Priority to CN202210995696.0A
Publication of CN115357429A
Application granted
Publication of CN115357429B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1448: Management of the data involved in backup or backup restore

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of backup processing and provides a method, a device, and equipment for recovering a data file, such that only the data that differs is recovered. The method mainly comprises the following steps: determining the fingerprint file of each backup, wherein, compared with the previous backup, if the data in a data block is changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup, and if the data block is not changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the previous backup; if the fingerprints of a data block in the first fingerprint file and the second fingerprint file are inconsistent, or the fingerprint versions of the data block are inconsistent, taking the data block as a differentiated data block; and recovering the data file at the current moment according to the data of the differentiated data block on the backup corresponding to the target backup version.

Description

Method and device for recovering data file and client
Technical Field
The present application relates to the field of backup processing technologies, and in particular, to a method and an apparatus for recovering a data file, and a client.
Background
The data of a database can be stored in the form of data files. When the database uses a data file to store data, the data file can be divided into data blocks, and each data block serves as the minimum logical unit for storing the corresponding data.
Over time, for example from a first time to a second time, the data stored in a data file may change or become corrupted. Techniques for restoring data files have emerged in order to obtain the data as it was before the change or corruption. Such techniques can restore only the changed or damaged data, thereby achieving rapid restoration of the data file. However, for this kind of fast recovery there is currently no specific method for accurately locating the data before the change or the undamaged data.
Disclosure of Invention
In view of the above, it is necessary to provide a method, an apparatus, a client, a computer-readable storage medium, and a computer program product for recovering a data file in response to the above technical problem.
The application provides a method for recovering a data file, which comprises the following steps:
after the data file has been backed up multiple times, determining the fingerprint file of each backup; the fingerprint file of each backup comprises the backup version corresponding to that backup, and a fingerprint and a fingerprint version of each data block of the data file; compared with the previous backup, if the data in a data block is changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup, and if the data in the data block is not changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the previous backup;
if no backup was performed at the time to be restored, taking the fingerprint file of the backup that precedes and is closest to the time to be restored as a first fingerprint file;
if no backup was performed at the current moment and the data file is stored locally, obtaining the fingerprint file of the current moment according to the fingerprint file of the backup that precedes and is closest to the current moment, and taking the fingerprint file of the current moment as a second fingerprint file;
comparing whether the fingerprints of each data block in the first fingerprint file and the second fingerprint file are consistent, and whether the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are consistent;
if the fingerprints of the data block in the first fingerprint file and the second fingerprint file are inconsistent, or the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are inconsistent, taking the data block as a differentiated data block;
recovering the data file at the current moment according to the data of the differentiated data block on the backup corresponding to a target backup version; the target backup version is obtained according to the fingerprint version of the differentiated data block in the first fingerprint file.
The present application further provides an apparatus for restoring a data file, the apparatus comprising:
the backup module is used for determining the fingerprint file of each backup after the data file is backed up for multiple times; the fingerprint file of each backup comprises a backup version corresponding to the backup, a fingerprint of each data block of the data file and a fingerprint version; compared with the last backup, if the data in the data block is changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup, and if the data in the data block is not changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the last backup;
the first fingerprint file acquisition module is used for taking the fingerprint file of the backup that precedes and is closest to the time to be restored as a first fingerprint file if no backup was performed at the time to be restored;
the second fingerprint file acquisition module is used for obtaining the fingerprint file of the current moment according to the fingerprint file of the backup that precedes and is closest to the current moment if no backup was performed at the current moment and the data file is stored locally, and taking the fingerprint file of the current moment as a second fingerprint file;
the fingerprint file comparison module is used for comparing whether the fingerprints of the data blocks in the first fingerprint file and the second fingerprint file are consistent or not and whether the fingerprint versions in the first fingerprint file and the second fingerprint file are consistent or not;
the data block processing module is used for taking the data block as a differentiated data block if the fingerprints of the data block in the first fingerprint file and the second fingerprint file are inconsistent, or the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are inconsistent;
the data file recovery module is used for recovering the data file at the current moment according to the data of the differentiated data block on the backup corresponding to the target backup version; the target backup version is obtained according to the fingerprint version of the differentiated data block in the first fingerprint file.
The present application provides a client comprising a memory and a processor, the memory storing a computer program, and the processor performing the above method when executing the computer program.
The present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the above method.
The present application provides a computer program product having a computer program stored thereon which, when executed by a processor, performs the above method.
According to the method, the device, the client, the computer-readable storage medium, and the computer program product for restoring a data file, after the data file has been backed up multiple times, the fingerprint file of each backup is determined. The fingerprint file of each backup comprises the backup version corresponding to that backup, and a fingerprint and a fingerprint version of each data block of the data file. Compared with the previous backup, if the data in a data block is changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup; if the data in the data block is not changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the previous backup. In this way, a one-to-one correspondence between each data block and a backup version is determined.
When a recovery operation is performed, the time to be restored is determined. If no backup was performed at the time to be restored, the fingerprint file of the backup that precedes and is closest to that time is taken as the first fingerprint file. If no backup was performed at the current moment and the client stores the data file locally, the fingerprint file of the current moment is obtained according to the fingerprint file of the backup that precedes and is closest to the current moment, and is taken as the second fingerprint file. The fingerprints and the fingerprint versions of each data block in the first and second fingerprint files are then compared; if either the fingerprints or the fingerprint versions are inconsistent, the data block is taken as a differentiated data block, and the target backup version is determined according to the fingerprint version of the differentiated data block in the first fingerprint file. The data before the change, or the undamaged data, can thus be accurately found in the backup corresponding to the target backup version. The data file at the current moment is then recovered according to the data found, achieving rapid recovery of the data file.
Drawings
FIG. 1 is a diagram of an application environment for recovering data files in one embodiment;
FIG. 2 is a schematic flow chart illustrating recovery of a data file in one embodiment;
FIG. 3 is a schematic flow chart illustrating recovery of a data file in one embodiment;
FIG. 4 is a schematic flow chart illustrating recovery of a data file in one embodiment;
FIG. 5 is a schematic flow chart illustrating recovery of a data file in one embodiment;
FIG. 6 is a block diagram of the structure of an apparatus for recovering a data file in one embodiment;
FIG. 7 is a diagram illustrating the internal structure of a client in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The method for recovering the data file provided by the application can be applied to the application environment shown in fig. 1.
As shown in fig. 1, the distributed database (which may be but is not limited to a TDSQL-PG database) includes a plurality of database nodes, each database node has a corresponding client, and a database node and its corresponding client are regarded as a recovery node; for example, GTM and client a are considered to be recovery node 1. The backup management system comprises a backup server, and the backup storage system comprises a storage server.
When the backup process is started, the backup management system sends an instruction to each client, and after receiving the instruction, the client backs up the data file of the corresponding database node, for example, the client a backs up the data file of the GTM, and the client b backs up the data file of the CN. After the data files are backed up for multiple times, the related data obtained by each backup are stored in a backup storage system and are merged through a backup merging unit.
When the recovery process is started, the backup management system sends an instruction to the client. After receiving the instruction, the client generates differentiated fingerprint data and transmits it to the backup storage system. The storage server generates a differentiated data stream and returns it to the client, and the client completes the recovery of the data file by receiving and parsing the differentiated data stream.
In one embodiment, as shown in FIG. 2, there is provided a method of restoring a data file, in which the method is performed by a client, comprising the steps of:
Step S201: after the data file has been backed up multiple times, the fingerprint file of each backup is determined.
The multiple backups consist of multiple incremental backups performed on the basis of a full backup. Each backup generates a corresponding fingerprint file, which comprises the backup version corresponding to that backup, and a fingerprint and a fingerprint version of each data block of the data file. Compared with the previous backup, if the data in a data block is changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup; if the data in the data block is not changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the previous backup.
By way of example, with reference to FIG. 3:
In FIG. 3, a, b, e, and f respectively denote the fingerprint files corresponding to individual backups of the data file, and the version shown in each cell is the fingerprint version of the corresponding data block. FIG. 3a corresponds to the fingerprint file of the full backup at the 1st time; its backup version is v1, and the fingerprint version of every data block is also v1. FIG. 3b corresponds to the fingerprint file of the incremental backup at the 2nd time; its backup version is v2. Compared with the full backup at the 1st time, the data blocks at (row 2, column 4), (row 3, column 2), and (row 4, column 5) have changed in the incremental backup at the 2nd time, so the fingerprint version of each changed data block becomes v2, while the fingerprint version of each unchanged data block remains v1. In the same way, the fingerprint files of the incremental backups at the 8th and 9th times, shown in FIG. 3e and FIG. 3f, can be obtained.
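The version bookkeeping of step S201 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Entry`, `FingerprintFile`, and `build_fingerprint_file` are hypothetical, fingerprints are plain strings, and an unchanged block is assumed to carry forward the fingerprint version it already had, as the FIG. 3 example suggests ("remains v1").

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    fingerprint: str  # fingerprint of the block's data
    version: int      # fingerprint version: backup version that captured this data


@dataclass
class FingerprintFile:
    backup_version: int
    entries: List[Entry]


def build_fingerprint_file(prev: Optional[FingerprintFile],
                           fingerprints: List[str],
                           backup_version: int) -> FingerprintFile:
    """Build the fingerprint file of a backup from the previous one (hypothetical helper)."""
    entries = []
    for i, fp in enumerate(fingerprints):
        unchanged = (prev is not None
                     and i < len(prev.entries)
                     and prev.entries[i].fingerprint == fp)
        if unchanged:
            # Unchanged block: keep the fingerprint version it already had.
            entries.append(Entry(fp, prev.entries[i].version))
        else:
            # Changed (or new) block: stamp it with the current backup version.
            entries.append(Entry(fp, backup_version))
    return FingerprintFile(backup_version, entries)


# Full backup v1, then an incremental backup v2 in which blocks 1 and 3 changed.
v1 = build_fingerprint_file(None, ["a", "b", "c", "d"], 1)
v2 = build_fingerprint_file(v1, ["a", "B", "c", "D"], 2)
versions_v1 = [e.version for e in v1.entries]
versions_v2 = [e.version for e in v2.entries]
```

In this sketch the full backup stamps every block with v1, and the incremental backup stamps only the changed blocks with v2.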
Step S202: if no backup was performed at the time to be restored, the fingerprint file of the backup that precedes and is closest to the time to be restored is used as the first fingerprint file.
If a backup was performed at the time to be restored, the fingerprint file of that backup is used as the first fingerprint file. The time to be restored is the point in time to which the data file is to be restored. By way of example, with reference to FIG. 3:
FIG. 3e and FIG. 3f respectively correspond to the fingerprint files of the incremental backups at the 8th and 9th times. If the time to be restored lies between the 8th time and the 9th time, the fingerprint file of the incremental backup at the 8th time is used as the first fingerprint file. If the time to be restored is the 9th time, the fingerprint file of the incremental backup at the 9th time is used as the first fingerprint file.
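Selecting the first fingerprint file amounts to finding the latest backup at or before the time to be restored. A minimal sketch, assuming backups are kept as a time-sorted list of `(backup_time, fingerprint_file_id)` pairs; the function name and the identifiers such as `fp_v8` are hypothetical:

```python
from bisect import bisect_right
from typing import List, Tuple


def select_first_fingerprint_file(backups: List[Tuple[float, str]],
                                  restore_time: float) -> str:
    """Return the fingerprint file of the latest backup at or before restore_time.

    `backups` must be sorted by backup time (assumption of this sketch).
    """
    times = [t for t, _ in backups]
    idx = bisect_right(times, restore_time) - 1
    if idx < 0:
        raise ValueError("no backup exists at or before the requested restore time")
    return backups[idx][1]


backups = [(1, "fp_v1"), (2, "fp_v2"), (8, "fp_v8"), (9, "fp_v9")]
between = select_first_fingerprint_file(backups, 8.5)  # between the 8th and 9th times
exact = select_first_fingerprint_file(backups, 9)      # a backup exists at this time
```

A restore time between two backups resolves to the earlier backup, while a restore time that coincides with a backup resolves to that backup itself.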
Step S203: if no backup was performed at the current time and the data file is stored locally, the fingerprint file of the current time is obtained according to the fingerprint file of the backup that precedes and is closest to the current time, and the fingerprint file of the current time is used as the second fingerprint file.
If a backup was performed at the current time, the fingerprint file of that backup is used as the fingerprint file of the current time, that is, as the second fingerprint file. The current time is the time at which recovery is started. By way of example, with reference to FIG. 3:
FIG. 3e and FIG. 3f respectively correspond to the fingerprint files of the incremental backups at the 8th and 9th times. If the current time lies between the 8th time and the 9th time, no backup was performed at the current time, so the second fingerprint file is obtained from the fingerprint file of the incremental backup at the 8th time. If the current time is the 9th time, the fingerprint file of the incremental backup at the 9th time is used directly as the second fingerprint file.
Step S204: comparing whether the fingerprints of each data block in the first fingerprint file and the second fingerprint file are consistent, and whether the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are consistent. Each fingerprint corresponds uniquely to the data in the data block: if the data differs, the fingerprint differs.
Step S205: if the fingerprints of a data block in the first fingerprint file and the second fingerprint file are inconsistent, or the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are inconsistent, the data block is regarded as a differentiated data block. By way of example, with reference to FIG. 3: FIG. 3e and FIG. 3f respectively correspond to the fingerprint files of the incremental backups at the 8th and 9th times. Assuming that the 8th time is the time to be restored and the 9th time is the current time, the fingerprint file of the incremental backup at the 8th time is the first fingerprint file and the fingerprint file of the incremental backup at the 9th time is the second fingerprint file. Comparing FIG. 3e and FIG. 3f shows that the data blocks at (row 2, column 2) and (row 4, column 3) have changed; these changed data blocks are the differentiated data blocks.
Step S206: recovering the data file at the current time according to the data of the differentiated data block on the backup corresponding to the target backup version; the target backup version is obtained according to the fingerprint version of the differentiated data block in the first fingerprint file. By way of example, with reference to FIG. 3 and continuing the example of step S205: the fingerprint versions of the differentiated data blocks at (row 2, column 2) and (row 4, column 3) are v9 and v9, respectively, so the target backup versions of the differentiated data blocks at (row 2, column 2) and (row 4, column 3) are v9 and v9, respectively.
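Steps S204 to S206 can be sketched as a single pass over the two fingerprint files. This is an illustrative sketch only: each fingerprint file is modeled as a list of `(fingerprint, fingerprint_version)` tuples indexed by block, the function name is hypothetical, and the sample values are invented rather than taken from FIG. 3.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, int]  # (fingerprint, fingerprint version)


def find_differentiated_blocks(first: List[Entry],
                               second: List[Entry]) -> Dict[int, int]:
    """Map each differentiated block index to its target backup version.

    A block is differentiated if its fingerprint OR its fingerprint version
    differs between the two files. The target backup version is the block's
    fingerprint version in the FIRST fingerprint file.
    """
    if len(first) != len(second):
        raise ValueError("fingerprint files must describe the same number of blocks")
    targets: Dict[int, int] = {}
    for index, ((fp1, ver1), (fp2, ver2)) in enumerate(zip(first, second)):
        if fp1 != fp2 or ver1 != ver2:
            targets[index] = ver1
    return targets


# First file: state at the time to be restored; second file: state at the current time.
first = [("a", 1), ("b", 2), ("c", 8), ("d", 1)]
second = [("a", 1), ("x", 9), ("c", 8), ("d", 9)]
diff = find_differentiated_blocks(first, second)
```

Block 3 in the sample has the same fingerprint in both files but a different fingerprint version, which is the case the version comparison catches in addition to the fingerprint comparison.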
The recovery process mainly comprises:
the storage server retrieves the data of the differentiated data block from the backup corresponding to the target backup version and then generates a differentiated data stream according to a given format, wherein the differentiated data stream records the specific position in the data file of each data block that needs to be updated, that is, each differentiated data block, together with the data to be written;
the client receives and parses the differentiated data stream, updates the corresponding data blocks of the data file, and completes the recovery.
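The client-side update can be sketched as below. The source does not specify the stream format, so this sketch assumes a hypothetical one: an iterable of `(block_index, payload)` pairs with a fixed block size. `BLOCK_SIZE` and `apply_differentiated_stream` are illustrative names, and an in-memory buffer stands in for the local data file.

```python
import io
from typing import BinaryIO, Iterable, Tuple

BLOCK_SIZE = 4  # hypothetical block size, chosen small for the example


def apply_differentiated_stream(data_file: BinaryIO,
                                stream: Iterable[Tuple[int, bytes]],
                                block_size: int = BLOCK_SIZE) -> int:
    """Overwrite only the differentiated blocks of data_file; return how many were updated."""
    updated = 0
    for block_index, payload in stream:
        if len(payload) != block_size:
            raise ValueError("payload must be exactly one block long")
        # Seek to the block's position in the data file and overwrite it in place.
        data_file.seek(block_index * block_size)
        data_file.write(payload)
        updated += 1
    return updated


# Local file with blocks 1 and 3 changed or damaged since the time to be restored.
local = io.BytesIO(b"AAAAxxxxCCCCyyyy")
count = apply_differentiated_stream(local, [(1, b"BBBB"), (3, b"DDDD")])
restored = local.getvalue()
```

Only the two listed blocks are rewritten; the remaining blocks of the local file are left untouched, which is what makes the recovery fast.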
In the method for restoring the data file, after the data file is backed up for multiple times, the fingerprint file of each backup is determined, and the fingerprint file of each backup comprises a backup version corresponding to the backup, a fingerprint of each data block of the data file and a fingerprint version.
When a recovery operation is performed, the time to be restored is determined. If no backup was performed at the time to be restored, the fingerprint file of the backup that precedes and is closest to that time is taken as the first fingerprint file. If no backup was performed at the current time and the client stores the data file locally, the fingerprint file of the current time is obtained according to the fingerprint file of the backup that precedes and is closest to the current time, and is taken as the second fingerprint file. The fingerprints and the fingerprint versions of each data block in the first and second fingerprint files are then compared; if either the fingerprints or the fingerprint versions are inconsistent, the data block is taken as a differentiated data block, and the target backup version is determined according to the fingerprint version of the differentiated data block in the first fingerprint file. The data before the change, or the undamaged data, can thus be accurately found in the backup corresponding to the target backup version. The data file at the current time is then recovered according to the data found, achieving rapid recovery of the data file.
In one embodiment, if no backup was performed at the current time and the client stores the data file locally, obtaining the fingerprint file of the current time according to the fingerprint file of the backup that precedes and is closest to the current time includes:
if no backup was performed at the current time and the client stores the data file locally, acquiring the fingerprint file of the backup that precedes and is closest to the current time; taking each data block that has changed at the current time compared with the previous backup time as a target data block, where the previous backup time is the time of the backup that precedes and is closest to the current time; in the fingerprint file of the backup that precedes and is closest to the current time, changing the fingerprint version corresponding to each target data block into a non-backup version; and taking the fingerprint file obtained after this modification as the fingerprint file of the current time.
By way of example, with reference to FIG. 3:
FIG. 3a corresponds to the fingerprint file of the full backup at the 1st time, FIG. 3b corresponds to the fingerprint file of the incremental backup at the 2nd time, and FIG. 3e and FIG. 3f correspond to the fingerprint files of the incremental backups at the 8th and 9th times, respectively. If the current time is the 10th time, the fingerprints of the data blocks at the 10th time are obtained and compared with those at the 9th time, which shows that the data blocks in rows 1, 2, 3, and 4 of column 2 changed between the 9th time and the 10th time, as shown in FIG. 3g. These data blocks are the target data blocks, and their fingerprint version is changed to v0. Because v0 does not occur among any of the previous backup versions, it is called a non-backup version. The fingerprint file obtained after this change is used as the fingerprint file of the current time, that is, the second fingerprint file, as shown in FIG. 3h.
The above embodiment takes into account that data may be damaged or changed between the previous backup time and the current time. Therefore, after the fingerprint file of the previous backup time is obtained, it is not used directly as the fingerprint file of the current time. Instead, it is updated: the fingerprint version of each data block that has changed or been damaged is changed into a non-backup version. Because the non-backup version differs from every backup version, comparing the first fingerprint file with the second fingerprint file to determine the differentiated data blocks also finds the data blocks whose data was damaged or changed between the previous backup time and the current time.
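Deriving the fingerprint file of the current time can be sketched as follows. The sketch assumes the set of target data blocks is already known and, following the text, changes only the fingerprint version; whether the stored fingerprint is also refreshed is not stated in the source. `NON_BACKUP_VERSION` and the function name are hypothetical, with `0` standing in for v0.

```python
from typing import List, Set, Tuple

Entry = Tuple[str, int]  # (fingerprint, fingerprint version)

NON_BACKUP_VERSION = 0  # stands in for v0; must never be used as a real backup version


def derive_current_fingerprint_file(last_backup: List[Entry],
                                    target_blocks: Set[int]) -> List[Entry]:
    """Copy the last backup's fingerprint file, marking target blocks as non-backup."""
    return [
        (fp, NON_BACKUP_VERSION if index in target_blocks else version)
        for index, (fp, version) in enumerate(last_backup)
    ]


fp_v9 = [("a", 1), ("b", 9), ("c", 8), ("d", 2)]
current = derive_current_fingerprint_file(fp_v9, {1, 3})
untouched = derive_current_fingerprint_file(fp_v9, set())
```

When no block has changed since the previous backup, the derived file is identical to the previous backup's fingerprint file, so that file can be reused as is.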
Further, the step of determining a data block whose data has changed or been damaged at the current time compared with the previous backup time includes:
acquiring the fingerprint of the data block at the current time and its fingerprint at the previous backup time; and if the fingerprint of the data block at the current time is inconsistent with its fingerprint at the previous backup time, determining that the data block has changed at the current time compared with the previous backup time.
Because the data block is the storage unit of the data and the fingerprint of each data block is unique, the fingerprint of a data block changes whenever the data in it is changed or damaged. Whether the data in a data block has been changed or damaged can therefore be determined by comparing the fingerprints of the same data block at different times.
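The fingerprint comparison can be sketched as below. The source does not name a fingerprint function; SHA-256 is used here purely as an assumption, under which uniqueness holds only up to hash collisions. The function names and the block size are illustrative.

```python
import hashlib
from typing import List, Set


def block_fingerprints(data: bytes, block_size: int) -> List[str]:
    """Fingerprint each fixed-size block of `data` (SHA-256 is an assumption of this sketch)."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(previous: List[str], current: List[str]) -> Set[int]:
    """Indices of blocks whose fingerprint differs between two points in time."""
    if len(previous) != len(current):
        raise ValueError("both fingerprint lists must cover the same blocks")
    return {i for i, (old, new) in enumerate(zip(previous, current)) if old != new}


at_backup = block_fingerprints(b"AAAABBBBCCCCDDDD", 4)
now = block_fingerprints(b"AAAAbbbbCCCCDDDd", 4)
targets = changed_blocks(at_backup, now)
```

The resulting set of indices corresponds to the target data blocks whose fingerprint version would be changed to the non-backup version.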
In one embodiment, if no backup was performed at the current time and the client stores the data file locally, obtaining the fingerprint file of the current time according to the fingerprint file of the backup that precedes and is closest to the current time includes:
if no backup was performed at the current time and the client stores the data file locally, acquiring the fingerprint file of the backup that precedes and is closest to the current time; and if no data block has changed or been damaged at the current time compared with the previous backup time, taking the fingerprint file of the backup that precedes and is closest to the current time as the fingerprint file of the current time. By way of example, with reference to FIG. 3:
FIG. 3a corresponds to the fingerprint file of the full backup at the 1st time, FIG. 3b corresponds to the fingerprint file of the incremental backup at the 2nd time, and FIG. 3e and FIG. 3f correspond to the fingerprint files of the incremental backups at the 8th and 9th times, respectively. Suppose the current time is the 10th time and the previous backup time is the 9th time. If the fingerprint of each data block at the 10th time is consistent with its fingerprint at the 9th time, it can be determined that no data block was changed or damaged between the 9th time and the 10th time, and the fingerprint file of the 9th time is used as the fingerprint file of the current time, that is, the second fingerprint file, as shown in FIG. 3f.
In the above embodiment, the data in the data blocks may be neither changed nor damaged between the previous backup time and the current time, that is, the fingerprint of every data block is unchanged. The fingerprint file of the current time is then identical to the fingerprint file of the previous backup time, so the latter is used directly as the fingerprint file of the current time without regenerating it, which saves resources.
In one embodiment, before recovering the data file at the current time according to the data of the differentiated data block on the backup corresponding to the target backup version, the method includes:
sending the fingerprint data of the differentiated data block to a storage server, so that the storage server executes the following steps: acquiring the position of the differentiated data block in the backup bitmap corresponding to the target backup version; where the data blocks in the backup corresponding to the target backup version are stored in serialized form, obtaining the position of the differentiated data block in the serialized storage according to its position in the bitmap; and taking the data at that position in the serialized storage as the data of the differentiated data block on the backup corresponding to the target backup version.
Further, before the storage server performs the above steps, the method further includes:
acquiring the data of each data block in a target full backup and the data of each incremental backup between the target full backup and the time to be restored, where the target full backup is the full backup that precedes and is closest to the time to be restored; and determining the data at the position of the differentiated data block in the serialized storage from the data of each data block in the target full backup and the data of each incremental backup between the target full backup and the time to be restored.
Assume that the time to be restored is t0. Let Fn denote a full backup and In an incremental backup, where n is the backup version. Taking the backup sequence F1-F2-F3-I4-I5-I6-I7-I8 as an example and assuming that t0 lies between I6 and I7, the target full backup is F3 and the incremental backups between F3 and t0 are I4, I5, and I6, so all data that may be needed to restore to t0 can be found in the F3, I4, I5, and I6 backups. Obtaining the data at the position of the dissimilarity data block in the serialized storage is described below, by way of example, with reference to fig. 4. After the first fingerprint file is compared with the second fingerprint file, dissimilarity fingerprint data is generated for the dissimilarity data blocks. The dissimilarity fingerprint data records the fingerprint version of each dissimilarity data block, and the target backup version is found through that fingerprint version. The block-index-offset in the dissimilarity fingerprint data records which data block of the backup data file corresponding to the target backup version the dissimilarity data block occupies, and the position offset after actual sequential storage is calculated from the block-index-offset. In fig. 4, for example, block-index-offset = 17 means that, counting from left to right and from top to bottom, the data corresponding to the dissimilarity data block belongs to the 17th data block, whereas its position offset after actual sequential storage is 5.
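One way to read the offset calculation is that only blocks whose bitmap bit is 1 are stored, in order, so the stored position of a block equals the number of 1 bits up to and including it. The Python sketch below follows that reading. The function name `serialized_offset`, the 1-based indexing, and the sample bitmap are assumptions; the actual bitmap of fig. 4 is not reproduced here.

```python
from typing import Sequence


def serialized_offset(bitmap: Sequence[int], block_index: int) -> int:
    """Map a 1-based block index to its 1-based position in the
    sequentially stored backup data (only blocks with bit 1 are stored)."""
    if not 1 <= block_index <= len(bitmap):
        raise IndexError("block index outside bitmap")
    if bitmap[block_index - 1] != 1:
        raise ValueError("block was not stored in this backup")
    # Count the stored blocks up to and including the requested one.
    return sum(bitmap[:block_index])


# Hypothetical bitmap in which blocks 3, 8, 11, 14 and 17 were backed up.
bitmap = [0] * 20
for i in (3, 8, 11, 14, 17):
    bitmap[i - 1] = 1

print(serialized_offset(bitmap, 17))  # prints 5
```

With this sample bitmap, block 17 is the fifth stored block, which matches the shape of the fig. 4 example (block-index-offset = 17, position offset = 5).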
In the above embodiment, during the recovery operation, all the data that may be used can be found from the data of each incremental backup between the full backup and the full backup that is closest to the time that needs to be recovered and the time that needs to be recovered, so that the storage server only needs to find the data on the backup of the differentiated data block corresponding to the target backup version when finding the data on the backup, thereby improving the efficiency. Meanwhile, the target backup version is determined according to the fingerprint version of the dissimilarity data block in the first fingerprint file, so that the data before change or without damage can be accurately acquired according to the data of the dissimilarity data block on the backup corresponding to the target backup version.
In one embodiment, comparing whether the fingerprints of the data block in the first fingerprint file and the second fingerprint file are consistent and whether the fingerprint versions in the first fingerprint file and the second fingerprint file are consistent includes:
The fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are compared first, and the fingerprints of the data block in the two files are compared second; if the first comparison shows that the fingerprint versions differ, the subsequent comparison is skipped. In the above embodiment, the fingerprint versions are compared before the fingerprints themselves, so that as soon as the earlier comparison finds an inconsistency the later comparison is not performed, which allows data blocks to be compared quickly.
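The short-circuit comparison can be illustrated with a small Python sketch. The type `BlockFingerprint` and the function `is_differentiated` are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class BlockFingerprint(NamedTuple):
    version: int      # fingerprint version of the block
    fingerprint: str  # fingerprint of the block content


def is_differentiated(a: BlockFingerprint, b: BlockFingerprint) -> bool:
    """Compare the cheap version field first; only when the versions match
    are the (longer) fingerprints compared."""
    if a.version != b.version:
        return True  # versions differ, the fingerprint comparison is skipped
    return a.fingerprint != b.fingerprint
```

The version field is only two bytes, so checking it first avoids comparing the full fingerprints for any block whose versions already disagree.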
In an embodiment, before performing the recovery of the data file at the current time according to the data of the differentiated data block on the backup corresponding to the target backup version, the method includes:
and if the backup is not carried out at the current moment and the client does not locally store the data file, taking all data blocks of the data file corresponding to the first fingerprint file as differentiated data blocks.
Whether the data file exists locally on the client is determined by checking whether the index of the corresponding data file in the first fingerprint file exists at the current time. If it does not, the entire data file has been lost at the current time compared with the time to be restored, the entire data file needs to be restored, and all data blocks of the data file are differentiated data blocks. In this way, the differentiated data blocks are determined quickly.
In order to better understand the above method, an application example of the method for restoring the data file of the present application is described in detail below.
The main contents of the present embodiment include:
1. During backup, the changed data blocks are tracked through a bitmap file, and only the incrementally changed part of the data file is backed up. A fingerprint file containing fingerprint version information is also generated for the data file; the fingerprint version indicates in which backup the fingerprint was last updated.
2. The incremental backups are periodically merged onto the full backup on the storage server, generating a full backup at a new point in time. Therefore, after the database is subjected to full backup for the first time, only incremental backups of the database can be performed all the time, each incremental backup can be converted into the full backup of a new time point through merging operation in the background, and the fingerprint version information is updated at the same time.
3. When the data files are restored, the fingerprints of the data files in the current database are updated, the fingerprints of the data files which are backed up recently at the restoration time point are compared, and the dissimilarity fingerprint files and dissimilarity bitmap files are generated. The client transmits the dissimilarity fingerprint file with the version to the storage server, the storage server generates dissimilarity data streams of the data files, and the dissimilarity data streams are received and analyzed at the client and are updated to the corresponding data files. Thus, the data block in the same position is only written once, and is only for the data block which is abnormal or damaged. The data flow of the client and the storage server only comprises differentiated data file data and fingerprints thereof; it is not necessary to transfer all of the data of the full and incremental backed-up data files. Therefore, the data volume of transmission is reduced, the writing operation in the recovery process is greatly reduced, the number of WAL log files corresponding to each increment is small, and the time required by the forward rolling database to reach the consistent time point is correspondingly small. Finally, the aim of quickly recovering the database is fulfilled.
On the one hand, only changed data is backed up, and on the other hand, only damaged data is restored. The data volume generated by backup or recovery is small, and the requirement on service system resources is small. Meanwhile, the operation of generating the new full backup is finished by the back end, and the resources of a service system are not occupied. The recovery time is controllable and quick. Time and space are greatly shortened.
This embodiment specifically includes the following steps, which are described with reference to fig. 5:
a backup process:
1. creating a backup strategy task, declaring backup time, backup types and the like;
2. the scheduling engine schedules the execution of tasks;
3. the task distribution module distributes the tasks to all GTM nodes, CN nodes and DN nodes;
4. each node sends the task progress to a task management node, and a progress management module collects and outputs the total progress;
5. performing backup tasks
5.1 executing the statement that prohibits DDL (Data Definition Language), pgxc_lock_for_backup(), on each CN node;
5.2 executing pg_start_backup() on each DN backup node;
5.3 each DN backup node performs a full or incremental backup; the same applies to the GTM node and the CN nodes;
the above pgxc_lock_for_backup() and pg_start_backup() are program execution statements;
5.3.1 performing full volume backup
5.3.1.1, generating a timestamp mark t0, enumerating index files of all data files output data files, and storing the index files into a cluster metadata set;
5.3.1.2 generating a data block bitmap of each data file according to the index file of the data file, and outputting a database bitmap file;
|Datafile1-Index|timestamp|bit-map-length|bit-map-data|bit-map-eof-mark|
|Datafile2-Index|timestamp|bit-map-length|bit-map-data|....|
all bit positions of the bit-map-data are 1;
5.3.1.3 generating a fingerprint file for each data file;
the fingerprint format is as follows:
|Datafile1-index|2-bytes-current-version|integer-timestamp|
2-bytes-hist-incre-version|block1-fingerprint|
2-bytes-hist-incre-version|block2-fingerprint|
2-bytes-hist-incre-version|block3-fingerprint|...|
|Datafile2-index|2-bytes-current-version|integer-timestamp|
2-bytes-hist-incre-version|block1-fingerprint|
2-bytes-hist-incre-version|block2-fingerprint|
2-bytes-hist-incre-version|block3-fingerprint|...|
Notes:
2-bytes-current-version: a self-incrementing 2-byte fingerprint version field (starting at 1 and returning to 1 after FFFF). When a data block changes, the fingerprint of the corresponding block is updated to the new fingerprint, and its 2-bytes-hist-incre-version is set to the current 2-bytes-current-version. 2-bytes-current-version is fingerprint header information and represents the backup version; 2-bytes-hist-incre-version is fingerprint unit information and represents the fingerprint version of the corresponding data block, so that the data file containing the corresponding data block can be retrieved according to the fingerprint version. When the fingerprint file cannot be located after the fingerprint version has been reset, the integer-timestamp field can be used for alignment.
5.3.1.4 backup index files, bitmap files and fingerprint files; backing up each data file;
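The self-incrementing 2-byte version and the per-block fingerprint unit described in step 5.3.1.3 can be sketched in Python as follows. The byte order, the helper names `next_version` and `pack_block_unit`, and the idea of concatenating the raw fingerprint bytes after the version are assumptions; the text does not fix field widths other than the 2-byte version.

```python
import struct

MAX_VERSION = 0xFFFF  # largest value that fits in a 2-byte field


def next_version(current: int) -> int:
    """Advance the backup version; it starts at 1 and wraps back to 1."""
    return 1 if current >= MAX_VERSION else current + 1


def pack_block_unit(hist_version: int, fingerprint: bytes) -> bytes:
    """Serialize one fingerprint unit: 2-byte version (big-endian, assumed)
    followed by the block fingerprint bytes."""
    return struct.pack(">H", hist_version) + fingerprint
```

Because the counter wraps, two backups far apart in time can share a version number, which is why the format also carries a timestamp for alignment.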
5.3.2 performing a first incremental backup
5.3.2.1 generating a timestamp mark t1, and enumerating all data file update index files;
5.3.2.2 updating the database bitmap file; if a data block whose bit flag is 0 has changed since t0, its flag bit is set to 1; if a data block whose bit flag is 1 has not changed since t0, its flag bit is set to 0;
5.3.2.3 updating the fingerprint file for the changed data block according to the bitmap file, namely recording new fingerprint data and a fingerprint version of the current data block; unchanged data blocks do not recalculate fingerprints;
5.3.2.4 backup index files, bitmap files and fingerprint files; backing up the data block (block with bit of 1) changed by each data file according to the bitmap file;
5.3.3 performing a second incremental backup
5.3.3.1 generating a timestamp mark t2, and enumerating all data file update index files;
5.3.3.2 updating the database bitmap file; if a data block whose bit flag is 0 has changed since t1, its flag bit is set to 1; if a data block whose bit flag is 1 has not changed since t1, its flag bit is set to 0;
5.3.3.3, updating the fingerprint file for the changed data block according to the bitmap file, namely recording new fingerprint data and a fingerprint version of the current data block; unchanged data blocks do not recalculate fingerprints;
5.3.3.4 backup index files, bitmap files and fingerprint files; backing up a data block (a block with a bit of 1) changed by each data file according to the bitmap file;
5.3.4 subsequent incremental backups, and so on;
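The incremental steps above (update the bitmap, refresh fingerprints only for changed blocks, back up only blocks whose bit is 1) can be sketched in Python. This is an illustration under stated assumptions: how a change is detected is outside the sketch and is passed in as the set `changed`, the number of blocks is assumed not to change between backups, SHA-256 stands in for the unspecified fingerprint function, and `incremental_backup` is a hypothetical name.

```python
import hashlib
from typing import List, Set, Tuple

BlockUnit = Tuple[int, str]  # (fingerprint version, fingerprint)


def incremental_backup(blocks: List[bytes],
                       previous: List[BlockUnit],
                       changed: Set[int],
                       backup_version: int):
    """Return (bitmap, fingerprints, stored_blocks) for one incremental backup.

    `changed` holds the 0-based indexes of blocks modified since the
    previous backup.
    """
    # Bit 1 marks a block changed since the previous backup, bit 0 an unchanged one.
    bitmap = [1 if i in changed else 0 for i in range(len(blocks))]
    fingerprints = list(previous)
    stored_blocks = []
    for i, bit in enumerate(bitmap):
        if bit:
            # Changed block: record a new fingerprint under this backup's version.
            fingerprints[i] = (backup_version, hashlib.sha256(blocks[i]).hexdigest())
            stored_blocks.append(blocks[i])
        # Unchanged block: fingerprint and version are carried over untouched.
    return bitmap, fingerprints, stored_blocks
```

Only the blocks collected in `stored_blocks` would be written to the backup, in bitmap order, which is what makes the later bitmap-based offset lookup possible.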
5.4 executing pg_stop_backup() on each DN backup node;
5.5 each DN master node executes a log switch, pg_switch_wal(), and the WAL log files are backed up;
5.6 backing up the backup_label file;
5.7 backing up the metadata information of the database cluster, and generating a catalog record according to the backup information.
the above pg_stop_backup() and pg_switch_wal() are program execution statements;
the catalog is a database expression, records metadata information of each backup, and is mainly used for scanning backup pieces, copying files and the like;
the metadata information records the start time and the end time of the backup, the file generated by the backup and the storage position and the size of the file, the time point at which the backup can be recovered, and the like.
Incremental backup merging to generate a new full backup process:
1. creating a strategy task of merging backup sets on a storage server;
2. the scheduling engine schedules the task to execute;
3. executing a merging backup operation;
3.1 load the last full backup metadata information from catalog; metadata information of the current incremental backup; estimating the minimum required storage space, and if the available space is insufficient, reporting an error and exiting;
3.2 loading an index file of the data file of the incremental backup;
3.3 merge each incremental backup;
3.3.1 creating new data files according to the index files of the data files, and writing the merged new data into the data files;
3.3.2 executing clone merge for each data file, and loading the bitmap file of incremental backup; if the bit is 0, copying block data of the full-amount backup data file; if the bit is 1, copying the block data of the data file of the current incremental backup; synchronously generating a new fingerprint file (if the fingerprint version is not in the index set of the dependency chain table, it indicates that the current fingerprint data blocks are merged, and the fingerprint version is updated to be the current 2-bytes-current-version); finally generating a new full-amount backup data file; resetting to generate a new bitmap file;
the dependency chain table lists the backups that may be used in the recovery process;
3.3.3 if all the data files of the current incremental backup are merged, deleting the current incremental backup;
assuming that the backup sequence before the merge operation is F6-F7-F8-I9-I10-I11-I12 (F represents the full backup, I represents the incremental backup, and the number is the fingerprint version number), the backup sequence is changed to F6-F7-F8-F9-I10-I11-I12 after the merge operation of this round is finished.
3.3.4 insert the catalog information of the new full backup.
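The clone merge of step 3.3.2 can be sketched in Python as follows. The names `merge_incremental` and `remap_versions` are hypothetical, blocks are modelled as in-memory byte strings, and the all-ones bitmap returned for the new full backup is an assumption based on the full-backup rule that all bits of bit-map-data are 1.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def merge_incremental(full_blocks: Sequence[bytes],
                      incr_bitmap: Sequence[int],
                      incr_stored: Iterable[bytes]) -> Tuple[List[bytes], List[int]]:
    """Bit 0: copy the block from the full backup.
    Bit 1: copy the next block from the incremental backup's stored data."""
    stored = iter(incr_stored)
    merged = []
    for index, bit in enumerate(incr_bitmap):
        merged.append(next(stored) if bit else full_blocks[index])
    return merged, [1] * len(merged)  # new full backup, bitmap reset to all 1


def remap_versions(versions: Sequence[int],
                   chain_versions: Set[int],
                   current_version: int) -> List[int]:
    """A fingerprint version that is no longer an index on the dependency
    chain refers to an already merged backup, so it becomes the current version."""
    return [v if v in chain_versions else current_version for v in versions]
```

For example, merging an incremental backup with bitmap `[0, 1, 0]` and one stored block onto a three-block full backup replaces only the middle block.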
Recovery process:
1. selecting a time point which can be restored according to the catalog record, outputting a dependent linked list of a backup set required for restoration according to the catalog, and indexing the dependent linked list by using a fingerprint version;
assume that the time to be restored is t0. Let Fn denote a full backup and In an incremental backup, where n is the fingerprint version number. Taking the backup sequence F1-F2-F3-I4-I5-I6-I7-I8 as an example, assume that t0 is between I6 and I7. Then the recovery dependency linked list is
3-F3
4-I4
5-I5
6-I6
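The selection of the dependency linked list in step 1 can be sketched in Python. The catalog is modelled as a time-ordered list of hypothetical `(version, kind, timestamp)` tuples, and the function name `dependency_chain` and the numeric timestamps are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Backup = Tuple[int, str, float]  # (fingerprint version, "F" or "I", timestamp)


def dependency_chain(backups: List[Backup], restore_time: float) -> Dict[int, str]:
    """Return {version: backup name}, starting at the latest full backup
    taken at or before restore_time and ending at the last backup before it.

    `backups` must be sorted by timestamp.
    """
    eligible = [b for b in backups if b[2] <= restore_time]
    full_positions = [i for i, b in enumerate(eligible) if b[1] == "F"]
    if not full_positions:
        raise ValueError("no full backup at or before the restore time")
    start = full_positions[-1]
    return {version: f"{kind}{version}" for version, kind, _ in eligible[start:]}


catalog = [(1, "F", 1.0), (2, "F", 2.0), (3, "F", 3.0), (4, "I", 4.0),
           (5, "I", 5.0), (6, "I", 6.0), (7, "I", 7.0), (8, "I", 8.0)]
chain = dependency_chain(catalog, restore_time=6.5)
# chain == {3: "F3", 4: "I4", 5: "I5", 6: "I6"}
```

The resulting mapping is indexed by fingerprint version, as the step requires, and its first entry is the full backup at the head of the chain.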
2. The scheduling engine schedules the task to execute;
3. preparing the environment: stopping all nodes with pg_ctl stop -m immediate -D;
the pg_ctl stop -m immediate -D is a program execution statement.
4. The task distribution module distributes the tasks to all GTM nodes, CN nodes and DN nodes;
5. executing a recovery task;
5.1 restoring the index file, the bitmap file, and the first fingerprint file of the data files from the last backup in the dependency linked list, wherein the first fingerprint file is called fingerprint file A;
5.2, scanning the existing data file according to the index file of the data file to generate or update a second fingerprint file, and calling the second fingerprint file as a local fingerprint file B;
5.3 comparing the fingerprint file A of each t0 with the local fingerprint file B to generate dissimilarity fingerprint data, wherein the format is as follows:
|datafile-index|2-bytes-current-version|int-timestamp|
block-index-offset|2-bytes-hist-incre-version|fingerprint|
block-index-offset|2-bytes-hist-incre-version|fingerprint|...|
5.3.1 if the local fingerprint of the corresponding datafile-index (index of the data file) does not exist locally, the data file corresponding to the current fingerprint is the newly added file. And directly generating dissimilarity fingerprint data according to the current fingerprint file. The round of comparison is finished.
5.3.2 for each piece of fingerprint data, comparing whether the fingerprints of the data block in fingerprint file A and local fingerprint file B are consistent and whether the fingerprint versions in fingerprint file A and local fingerprint file B are consistent; if either of the two differs, the data block is a dissimilarity data block and dissimilarity fingerprint data is generated for it. The fingerprint versions are compared first and the specific fingerprints second, which speeds up the comparison of the fingerprint files.
The block-index-offset indicates that the differentiated data block is located on the block of the data file, the 2-bytes-hist-incre-version indicates the fingerprint version of the differentiated data block, and the fingerprint indicates the fingerprint.
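Steps 5.3.1 and 5.3.2 can be sketched together in Python. Fingerprint files are modelled as lists of (fingerprint version, fingerprint) pairs in block order, block-index-offset is taken to be 1-based, and the name `diff_fingerprints` is hypothetical; these are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

BlockUnit = Tuple[int, str]         # (fingerprint version, fingerprint)
DiffRecord = Tuple[int, int, str]   # (block-index-offset, version, fingerprint)


def diff_fingerprints(file_a: List[BlockUnit],
                      file_b: Optional[List[BlockUnit]]) -> List[DiffRecord]:
    """Produce dissimilarity fingerprint records from fingerprint file A
    (restore target) and local fingerprint file B (None if the data file
    does not exist locally)."""
    records = []
    for offset, (version_a, fp_a) in enumerate(file_a, start=1):
        if file_b is None or offset > len(file_b):
            # No local counterpart: every such block must be restored.
            records.append((offset, version_a, fp_a))
            continue
        version_b, fp_b = file_b[offset - 1]
        # Version first, fingerprint second; either mismatch marks the block.
        if version_a != version_b or fp_a != fp_b:
            records.append((offset, version_a, fp_a))
    return records
```

Each record carries the version taken from fingerprint file A, since that version is what later identifies the backup holding the block's data.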
And 5.4, the client sends the dissimilarity fingerprint data of each data file to the storage server.
The storage server generates a dissimilarity data stream according to the dissimilarity fingerprint data, and the stream format is as follows:
|datafile-index|2-bytes-current-version|int-timestamp|
block-index-offset|data-block|
block-index-offset|data-block|...|
the block-index-offset mentioned above indicates which data block of the data file is to be updated, and the data-block contains the specific data to be written.
5.4.1 according to the recovery dependency chain table, establishing a datafile table of each version by taking the fingerprint version as an index, such as:
3-datafile.3
4-datafile.4
5-datafile.5
6-datafile.6
5.4.2 parsing the dissimilarity fingerprint data stream and, for each dissimilarity fingerprint block, finding the corresponding data file backup data according to the fingerprint version 2-bytes-hist-incre-version. (Because incremental backups are merged on the storage server, the fingerprint version may not match any index value on the dependency linked list; in this example, that is a fingerprint version other than 3, 4, 5, or 6. This indicates that the data block has already been merged into the full backup at the head of the chain, so the corresponding data block on that full backup is taken.) The actual offset at which the block data is stored is calculated from the block-index-offset and the bitmap file in order to locate the data block of the data file (see fig. 4), and the block is then appended to the dissimilarity data stream in the stream format.
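The server-side lookup in steps 5.4.1 and 5.4.2 can be sketched in Python. Each backup is modelled as a hypothetical `(bitmap, stored_blocks)` pair held in memory, block-index-offset is taken to be 1-based, and `build_diff_stream` is an illustrative name rather than anything defined by the application.

```python
from typing import Dict, List, Sequence, Tuple

DiffRecord = Tuple[int, int, str]            # (block-index-offset, version, fingerprint)
BackupData = Tuple[Sequence[int], Sequence[bytes]]  # (bitmap, sequentially stored blocks)


def build_diff_stream(records: List[DiffRecord],
                      chain: Dict[int, str],
                      backups: Dict[str, BackupData]) -> List[Tuple[int, bytes]]:
    """Resolve each record to a backup and fetch the block data.

    `chain` maps fingerprint version -> backup name and lists the chain-head
    full backup first.
    """
    head = next(iter(chain.values()))
    stream = []
    for block_index, version, _fingerprint in records:
        # A version missing from the chain was merged into the head full backup.
        name = chain.get(version, head)
        bitmap, stored = backups[name]
        if bitmap[block_index - 1] != 1:
            raise ValueError(f"block {block_index} is not stored in {name}")
        position = sum(bitmap[:block_index])  # 1-based position in stored data
        stream.append((block_index, stored[position - 1]))
    return stream
```

For instance, a record with version 2 against a chain indexed by 3 and 4 falls back to the chain-head full backup.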
And 5.5 the client receives the dissimilarity data stream, analyzes the stream format while receiving the dissimilarity data stream, and updates the data blocks of the corresponding data file according to the block-index-offset.
5.6 go back to step 5.4 until all datafiles have finished updating.
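The client-side update of steps 5.5 and 5.6 can be sketched in Python. The sketch assumes fixed-size blocks, a 1-based block-index-offset, and a data file opened as a seekable binary file object; the name `apply_diff_stream` and the block size of 4 bytes in the example are hypothetical.

```python
import io
from typing import BinaryIO, Iterable, Tuple


def apply_diff_stream(data_file: BinaryIO,
                      stream: Iterable[Tuple[int, bytes]],
                      block_size: int) -> int:
    """Write each received block at its position in the data file.

    Returns the number of blocks written; each position is written once.
    """
    written = 0
    for block_index, data in stream:
        data_file.seek((block_index - 1) * block_size)
        data_file.write(data)
        written += 1
    return written


# Example with an in-memory file of three 4-byte blocks.
f = io.BytesIO(b"aaaabbbbcccc")
apply_diff_stream(f, [(2, b"BBBB")], block_size=4)
# f.getvalue() == b"aaaaBBBBcccc"
```

Because the stream can be consumed record by record, the client can update the file while data is still arriving, as step 5.5 describes.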
5.7 recovering the WAL log file;
5.8 performing PITR;
5.8.1 preparing the configuration files: modifying postgresql.conf.user, pg_xlog_archive.conf, and pg_hba.conf;
the postgresql.conf, postgresql.conf.user, pg_xlog_archive.conf, and pg_hba.conf are configuration files.
5.8.2 updating pgxc_node according to the ip and port information of each node of the cluster recorded in the catalog;
the ip and port information of a node is the ip address information and the port information of the node;
the pgxc_node is a system table.
5.8.3 restarting to make the routing information take effect: pg_ctl restart -Z coordinator -D;
the pg_ctl restart -Z coordinator -D is a program execution statement.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are displayed sequentially as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts related to the embodiments described above may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, there is provided an apparatus for restoring a data file, applied to a client, including:
the backup module 601 is configured to determine a fingerprint file for each backup after multiple backups are performed on a data file; the fingerprint file of each backup comprises a backup version corresponding to the backup, a fingerprint of each data block of the data file and a fingerprint version; compared with the last backup, if the data in the data block is changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup, and if the data in the data block is not changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the last backup;
a first fingerprint file obtaining module 602, configured to, if the backup is not performed at the time that needs to be restored, take a fingerprint file of a backup that is before the time that needs to be restored and is closest to the time that needs to be restored, as a first fingerprint file;
a second fingerprint file obtaining module 603, configured to, if the current time is not backed up and the data file is locally stored, obtain, according to a fingerprint file that is before the current time and is backed up most recently to the current time, the fingerprint file at the current time, and use the fingerprint file at the current time as a second fingerprint file;
a fingerprint file comparison module 604, configured to compare whether fingerprints of the data blocks in the first fingerprint file and the second fingerprint file are consistent, and whether fingerprint versions in the first fingerprint file and the second fingerprint file are consistent;
a differentiated data block obtaining module 605, configured to, if fingerprints of a data block in the first fingerprint file and the second fingerprint file are not consistent, or fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are not consistent, take the data block as a differentiated data block;
a data file recovery module 606, configured to perform recovery of the data file at the current time according to the data of the differentiated data block on the backup corresponding to the target backup version; the target backup version is obtained according to the fingerprint version of the dissimilarity data block in the first fingerprint file.
In an embodiment, the second fingerprint file obtaining module is further configured to, if the backup is not performed at the current time and the data file is locally stored in the client, obtain a fingerprint file of a backup that is previous to and closest to the current time; taking the data block which changes at the current moment compared with the last backup moment as a target data block; the last backup time of the current time is the time of the last backup before and closest to the current time; changing the fingerprint version corresponding to the target data block into a non-backup version in the fingerprint file of the one-time backup which is before and closest to the current time; and taking the fingerprint file obtained after the change as the fingerprint file at the current moment.
In an embodiment, the second fingerprint file obtaining module is further configured to obtain a fingerprint of the data block at the current time and a fingerprint of the previous backup time; and if the fingerprint of the data block at the current moment is inconsistent with the fingerprint of the previous backup moment, determining that the data block is changed or damaged when the current moment is compared with the previous backup moment.
In an embodiment, the second fingerprint file obtaining module is further configured to, if the backup is not performed at the current time and the data file is locally stored in the client, obtain a fingerprint file of a backup that is previous to and closest to the current time; and if the current time of each data block is not changed compared with the last backup time, taking the fingerprint file of the backup at the time before and closest to the current time as the fingerprint file of the current time.
In one embodiment, the apparatus further includes a dissimilarity data block fingerprint data sending module, configured to send the fingerprint data of the dissimilarity data block to the storage server, so that the storage server performs the following steps: acquiring the position of the dissimilarity data block in a backup bitmap corresponding to the target backup version; under the condition that the data blocks in the backup corresponding to the target backup version are stored in a serialized mode, obtaining the position of the dissimilarity data block in the serialized storage according to that position; and taking the data in the position of the dissimilarity data block in the serialized storage as the data of the dissimilarity data block on the backup corresponding to the target backup version.
In one embodiment, the storage server further performs the steps of: acquiring data of each data block in a target full backup and data of each incremental backup between the target full backup and the time needing to be restored; the target full backup is a one-time full backup before and closest to the time needing to be restored; and determining the data of the dissimilarity data block in the position in the serialized storage from the data of each data block in the target full backup and the data of each incremental backup between the target full backup and the time needing to be restored.
In an embodiment, the fingerprint file comparison module is further configured to compare the fingerprint versions of the corresponding data block in the first fingerprint file and the second fingerprint file, compare the fingerprints of the data block in the first fingerprint file and the second fingerprint file, and directly determine that the corresponding data block is a differentiated data block as long as the fingerprint versions of the corresponding data block in the first fingerprint file and the second fingerprint file are different, without comparing the fingerprints.
In an embodiment, the apparatus further includes a data file determining module, configured to, before performing recovery of the data file at the current time according to data of the differentiated data block on the backup corresponding to the target backup version, if the backup is not performed at the current time and the data file is not locally stored, use all data blocks of the data file corresponding to the first fingerprint file as differentiated data blocks. For specific limitations of the apparatus for restoring data files, reference may be made to the above limitations of the method for restoring data files, which are not described herein again. The respective modules in the above apparatus for restoring data files may be wholly or partially implemented by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent of a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a client is provided, the internal structure of which may be as shown in fig. 7. The client includes a processor, memory, and a network interface connected by a system bus. Wherein the processor of the client is configured to provide computing and control capabilities. The memory of the client includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the client is used for storing the data of the recovery data file. The network interface of the client is used for communicating with an external terminal through network connection. The client also comprises an input/output interface, wherein the input/output interface is a connecting circuit for exchanging information between the processor and external equipment, and is connected with the processor through a bus, and the input/output interface is called an I/O interface for short. The computer program when executed by a processor implements a method of recovering a data file.
Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a client is provided, comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the respective method embodiment as described above.
In an embodiment, a computer program product is provided, having a computer program stored thereon, the computer program being executed by a processor for performing the steps of the above-described respective method embodiments.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, any combination of them should be considered within the scope of this specification as long as it involves no contradiction.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for recovering a data file, applied to a client, the method comprising:
after a data file has been backed up multiple times, determining a fingerprint file of each backup; wherein the fingerprint file of each backup comprises the backup version corresponding to that backup and, for each data block of the data file, a fingerprint and a fingerprint version; and wherein, relative to a previous backup, if the data in a data block has changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup, and if the data in the data block has not changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the previous backup;
if no backup was performed at the time to be restored, taking the fingerprint file of the backup that precedes and is closest to the time to be restored as a first fingerprint file;
if no backup has been performed at the current time and the data file is stored locally, obtaining a fingerprint file of the current time from the fingerprint file of the backup that precedes and is closest to the current time, and taking the fingerprint file of the current time as a second fingerprint file;
comparing whether the fingerprints of the data blocks in the first fingerprint file and the second fingerprint file are consistent, and whether the fingerprint versions in the first fingerprint file and the second fingerprint file are consistent;
if the fingerprints of a data block in the first fingerprint file and the second fingerprint file are inconsistent, or the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are inconsistent, taking the data block as a dissimilarity data block;
restoring the data file at the current time according to the data of the dissimilarity data block in the backup corresponding to a target backup version; wherein the target backup version is obtained from the fingerprint version of the dissimilarity data block in the first fingerprint file.
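By way of illustration only, the steps of claim 1 may be sketched in Python as follows. The names `Entry`, `FingerprintFile`, `build_fingerprint_file` and `dissimilar_blocks`, the use of SHA-256 as the fingerprint, and integer backup versions are all assumptions made for this sketch and do not appear in the application. The sketch also assumes that the fingerprint version of an unchanged data block is carried forward from the previous fingerprint file, which coincides with the claim wording for two consecutive backups.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Entry:
    fingerprint: str  # digest of the block data (SHA-256 is an assumption)
    version: int      # fingerprint version, i.e. a backup version


@dataclass(frozen=True)
class FingerprintFile:
    backup_version: int
    entries: List[Entry]


def build_fingerprint_file(blocks: Sequence[bytes], backup_version: int,
                           previous: Optional[FingerprintFile] = None) -> FingerprintFile:
    """Build the fingerprint file of one backup from the block data."""
    entries = []
    for index, block in enumerate(blocks):
        digest = hashlib.sha256(block).hexdigest()
        old = None
        if previous is not None and index < len(previous.entries):
            old = previous.entries[index]
        if old is not None and old.fingerprint == digest:
            # Unchanged block: keep the version recorded by the previous file.
            entries.append(Entry(digest, old.version))
        else:
            # Changed or new block: stamp it with this backup's version.
            entries.append(Entry(digest, backup_version))
    return FingerprintFile(backup_version, entries)


def dissimilar_blocks(first: FingerprintFile, second: FingerprintFile) -> Dict[int, int]:
    """Map each dissimilarity block index to its target backup version.

    A block is dissimilar when its fingerprint or its fingerprint version
    differs between the two files; the target version is read from `first`.
    """
    result = {}
    for index, wanted in enumerate(first.entries):
        have = second.entries[index] if index < len(second.entries) else None
        if (have is None or have.fingerprint != wanted.fingerprint
                or have.version != wanted.version):
            result[index] = wanted.version
    return result


v1 = build_fingerprint_file([b"aaaa", b"bbbb", b"cccc"], 1)
v2 = build_fingerprint_file([b"aaaa", b"BBBB", b"cccc"], 2, previous=v1)
v3 = build_fingerprint_file([b"aaaa", b"BBBB", b"CCCC"], 3, previous=v2)
# Restoring to a time covered by v2 while the local file matches v3:
print(dissimilar_blocks(v2, v3))  # {2: 1}
```

In this example only block 2 has to be fetched, and it is fetched from the backup with version 1, because that is the fingerprint version the first fingerprint file records for it.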
2. The method of claim 1, wherein, if no backup has been performed at the current time and the data file is stored locally, obtaining the fingerprint file of the current time from the fingerprint file of the backup that precedes and is closest to the current time comprises:
if no backup has been performed at the current time and the data file is stored locally, acquiring the fingerprint file of the backup that precedes and is closest to the current time;
taking, as a target data block, a data block whose data has changed or been damaged at the current time relative to the previous backup time; wherein the previous backup time of the current time is the time of the backup that precedes and is closest to the current time;
in the fingerprint file of the backup that precedes and is closest to the current time, changing the fingerprint version corresponding to the target data block to a non-backup version;
and taking the fingerprint file obtained after the change as the fingerprint file of the current time.
3. The method of claim 2, wherein determining that the data of a data block has changed or been damaged at the current time relative to the previous backup time comprises:
acquiring the fingerprint of the data block at the current time and the fingerprint of the data block at the previous backup time;
and if the fingerprint of the data block at the current time is inconsistent with its fingerprint at the previous backup time, determining that the data block has changed at the current time relative to the previous backup time.
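As a non-authoritative sketch of claims 2 and 3, the fingerprint file of the current time can be derived from the most recent backup's fingerprint file by re-fingerprinting the local blocks. The function name `current_fingerprint_file`, the marker value `NOT_BACKED_UP`, the `(fingerprint, version)` tuple layout and the SHA-256 fingerprint are assumptions made for illustration; the sketch also assumes the local file has the same number of blocks as the last backup.

```python
import hashlib
from typing import List, Sequence, Tuple

NOT_BACKED_UP = -1  # hypothetical marker for the "non-backup version"

Entry = Tuple[str, int]  # (fingerprint, fingerprint version)


def current_fingerprint_file(last_backup: Sequence[Entry],
                             local_blocks: Sequence[bytes]) -> List[Entry]:
    """Copy the last backup's fingerprint file, marking changed blocks."""
    current = []
    for (fingerprint, version), block in zip(last_backup, local_blocks):
        local_fingerprint = hashlib.sha256(block).hexdigest()
        if local_fingerprint != fingerprint:
            # Target data block: only the fingerprint version is changed,
            # following the wording of claim 2.
            current.append((fingerprint, NOT_BACKED_UP))
        else:
            current.append((fingerprint, version))
    return current
```

Because only the version is rewritten, a changed block can still carry the same fingerprint as in the first fingerprint file; comparing fingerprint versions as well as fingerprints is what flags such a block as dissimilar.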
4. The method of claim 1, wherein, if no backup has been performed at the current time and the data file is stored locally, obtaining the fingerprint file of the current time from the fingerprint file of the backup that precedes and is closest to the current time comprises:
if no backup has been performed at the current time and the data file is stored locally, acquiring the fingerprint file of the backup that precedes and is closest to the current time;
and if no data change and no data damage has occurred in any data block at the current time relative to the previous backup time, taking the fingerprint file of the backup that precedes and is closest to the current time as the fingerprint file of the current time.
5. The method of claim 1, wherein, before restoring the data file at the current time according to the data of the dissimilarity data block in the backup corresponding to the target backup version, the method further comprises:
sending the fingerprint data of the dissimilarity data block to a storage server, so that the storage server performs the following steps: acquiring the position of the dissimilarity data block in a backup bitmap corresponding to the target backup version; where the data blocks in the backup corresponding to the target backup version are stored in serialized form, obtaining the position of the dissimilarity data block in the serialized storage from that position; and taking the data at the position of the dissimilarity data block in the serialized storage as the data of the dissimilarity data block in the backup corresponding to the target backup version.
6. The method of claim 5, wherein, before taking the data at the position of the dissimilarity data block in the serialized storage as the data of the dissimilarity data block in the backup corresponding to the target backup version, the storage server further performs the following steps:
acquiring the data of each data block in a target full backup and the data of each incremental backup between the target full backup and the time to be restored; wherein the target full backup is the full backup that precedes and is closest to the time to be restored;
and determining the data at the position of the dissimilarity data block in the serialized storage from the data of each data block in the target full backup and the data of each incremental backup between the target full backup and the time to be restored.
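The server-side lookup of claims 5 and 6 might be sketched as below. The claims do not specify the bitmap layout or the serialization format, so this sketch assumes one bitmap bit per data block (set when the backup stores that block), fixed-size blocks packed in index order, and a backup chain ordered from the target full backup to the newest incremental backup before the time to be restored. `BLOCK_SIZE`, `serialized_offset` and `read_block` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow

Backup = Tuple[Sequence[bool], bytes]  # (backup bitmap, serialized block data)


def serialized_offset(bitmap: Sequence[bool], block_index: int) -> Optional[int]:
    """Translate a bitmap position into a byte offset in serialized storage."""
    if not bitmap[block_index]:
        return None  # this backup does not store the block
    # Blocks are packed in order, so the offset is the number of stored
    # blocks that precede this one, times the block size.
    stored_before = sum(1 for bit in bitmap[:block_index] if bit)
    return stored_before * BLOCK_SIZE


def read_block(chain: List[Backup], block_index: int) -> bytes:
    """Return the newest stored copy of a block from a full + incremental chain."""
    for bitmap, payload in reversed(chain):
        offset = serialized_offset(bitmap, block_index)
        if offset is not None:
            return payload[offset:offset + BLOCK_SIZE]
    raise KeyError(f"block {block_index} is not stored in any backup of the chain")


full = ([True, True, True], b"aaaabbbbcccc")
incremental = ([False, True, False], b"BBBB")
print(read_block([full, incremental], 1))  # b'BBBB'
print(read_block([full, incremental], 2))  # b'cccc'
```

Walking the chain from newest to oldest is one simple way to combine the full backup with the incremental backups; a real storage server could equally precompute a merged index.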
7. The method of claim 1, wherein comparing whether the fingerprints of the data blocks in the first fingerprint file and the second fingerprint file are consistent and whether the fingerprint versions in the first fingerprint file and the second fingerprint file are consistent comprises:
where the fingerprint versions of a data block in the first fingerprint file and the second fingerprint file are compared first and the fingerprints of the data block in the first fingerprint file and the second fingerprint file are compared afterwards, if the first comparison shows that the fingerprint versions of the data block differ between the first fingerprint file and the second fingerprint file, skipping the later comparison.
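The comparison order of claim 7 amounts to a short-circuit test, sketched here under assumed names. `is_dissimilar` and the `fingerprint_checks` counter are hypothetical and exist only to make the skipped comparison visible; entries are assumed to be `(fingerprint, version)` tuples.

```python
from typing import List, Tuple

Entry = Tuple[str, int]  # (fingerprint, fingerprint version)


def is_dissimilar(first: Entry, second: Entry, fingerprint_checks: List[int]) -> bool:
    """Compare versions first; compare fingerprints only if versions match."""
    if first[1] != second[1]:
        # Versions differ: the block is already dissimilar, so the
        # fingerprint comparison is skipped.
        return True
    fingerprint_checks.append(1)  # record that fingerprints were compared
    return first[0] != second[0]
```

Comparing the short version field first avoids comparing the longer fingerprint strings whenever the versions already settle the result.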
8. The method of claim 1, wherein, before restoring the data file at the current time according to the data of the dissimilarity data block in the backup corresponding to the target backup version, the method further comprises:
if no backup has been performed at the current time and the data file is not stored locally, taking all data blocks of the data file corresponding to the first fingerprint file as dissimilarity data blocks.
9. An apparatus for recovering a data file, applied to a client, the apparatus comprising:
a backup module, configured to determine a fingerprint file of each backup after a data file has been backed up multiple times; wherein the fingerprint file of each backup comprises the backup version corresponding to that backup and, for each data block of the data file, a fingerprint and a fingerprint version; and wherein, relative to a previous backup, if the data in a data block has changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup, and if the data in the data block has not changed in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the previous backup;
a first fingerprint file acquisition module, configured to take the fingerprint file of the backup that precedes and is closest to the time to be restored as a first fingerprint file if no backup was performed at the time to be restored;
a second fingerprint file acquisition module, configured to, if no backup has been performed at the current time and the data file is stored locally, obtain a fingerprint file of the current time from the fingerprint file of the backup that precedes and is closest to the current time, and take the fingerprint file of the current time as a second fingerprint file;
a fingerprint file comparison module, configured to compare whether the fingerprints of the data blocks in the first fingerprint file and the second fingerprint file are consistent, and whether the fingerprint versions in the first fingerprint file and the second fingerprint file are consistent;
a data block processing module, configured to take a data block as a dissimilarity data block if the fingerprints of the data block in the first fingerprint file and the second fingerprint file are inconsistent, or the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are inconsistent;
a data file recovery module, configured to restore the data file at the current time according to the data of the dissimilarity data block in the backup corresponding to a target backup version; wherein the target backup version is obtained from the fingerprint version of the dissimilarity data block in the first fingerprint file.
10. A client comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the method of any one of claims 1 to 8 when executing the computer program.
CN202210995696.0A 2022-08-18 2022-08-18 Method, device and client for recovering data file Active CN115357429B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210995696.0A CN115357429B (en) 2022-08-18 2022-08-18 Method, device and client for recovering data file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210995696.0A CN115357429B (en) 2022-08-18 2022-08-18 Method, device and client for recovering data file

Publications (2)

Publication Number Publication Date
CN115357429A true CN115357429A (en) 2022-11-18
CN115357429B CN115357429B (en) 2023-06-27

Family

ID=84002852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210995696.0A Active CN115357429B (en) 2022-08-18 2022-08-18 Method, device and client for recovering data file

Country Status (1)

Country Link
CN (1) CN115357429B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101382885A (en) * 2007-09-06 2009-03-11 联想(北京)有限公司 Multi-edition control method and apparatus for data file
CN102184218A (en) * 2011-05-05 2011-09-14 华中科技大学 Repeated data delete method based on causal relationship
CN103617215A (en) * 2013-11-20 2014-03-05 上海爱数软件有限公司 Method for generating multi-version files by aid of data difference algorithm
CN103870362A (en) * 2014-03-21 2014-06-18 华为技术有限公司 Data recovery method, data recovery device and backup system
CN104166606A (en) * 2014-08-29 2014-11-26 华为技术有限公司 File backup method and main storage device
CN104572357A (en) * 2014-12-30 2015-04-29 清华大学 Backup and recovery method for HDFS (Hadoop distributed filesystem)
CN105302675A (en) * 2015-11-25 2016-02-03 上海爱数信息技术股份有限公司 Method and device for data backup

Also Published As

Publication number Publication date
CN115357429B (en) 2023-06-27

Similar Documents

Publication Publication Date Title
CN107357688B (en) Distributed system and fault recovery method and device thereof
US20050028026A1 (en) Method and system for backing up and restoring data of a node in a distributed system
CN111143133B (en) Virtual machine backup method and backup virtual machine recovery method
CN110727724B (en) Data extraction method and device, computer equipment and storage medium
CN110602165B (en) Government affair data synchronization method, device, system, computer equipment and storage medium
CN109063005B (en) Data migration method and system, storage medium and electronic device
CN109460438B (en) Message data storage method, device, computer equipment and storage medium
US8271454B2 (en) Circular log amnesia detection
WO2021082925A1 (en) Transaction processing method and apparatus
CN111966531B (en) Data snapshot method and device, computer equipment and storage medium
CN113419897A (en) File processing method and device, electronic equipment and storage medium thereof
CN115357429A (en) Method and device for recovering data file and client
CN115756955A (en) Data backup and data recovery method and device and computer equipment
CN113312309B (en) Snapshot chain management method, device and storage medium
CN111400243B (en) Development management system based on pipeline service and file storage method and device
CN114416689A (en) Data migration method and device, computer equipment and storage medium
CN108376104B (en) Node scheduling method and device and computer readable storage medium
CN113342824A (en) Data storage method, device, equipment and medium based on target storage equipment
JP2022070579A (en) Distributed ledger management method, distributed ledger system, and node
CN107844491B (en) Method and equipment for realizing strong consistency read operation in distributed system
CN108614838B (en) User group index processing method, device and system
CN115617580B (en) Incremental backup and recovery method and system based on Shared SST (SST) file
CN117076197A (en) Data recovery method, device, computer equipment and storage medium
CN115269270A (en) Backup merging method, device and equipment
CN117453454A (en) Data backup method, device, computer equipment, medium and product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant