CN115357429B - Method, device and client for recovering data file - Google Patents

Publication number
CN115357429B
Authority
CN
China
Prior art keywords
fingerprint
data
backup
file
data block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210995696.0A
Other languages
Chinese (zh)
Other versions
CN115357429A (en)
Inventor
杨海锋
马立珂
王子骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Dingjia Computer Technology Co ltd
Original Assignee
Guangzhou Dingjia Computer Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Dingjia Computer Technology Co ltd
Priority to CN202210995696.0A
Publication of CN115357429A
Application granted
Publication of CN115357429B
Legal status: Active
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1448: Management of the data involved in backup or backup restore

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of backup processing and provides a method, a device and equipment for recovering a data file, such that only the data that differs is recovered. The method mainly comprises the following. Compared with the previous backup, if the data in a data block changes in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup; if the data in the data block does not change in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the previous backup. If the fingerprints of a data block in the first fingerprint file and the second fingerprint file are inconsistent, or its fingerprint versions are inconsistent, the data block is taken as a dissimilated data block. The data file at the current moment is then recovered according to the data of the dissimilated data block in the backup corresponding to the target backup version.

Description

Method, device and client for recovering data file
Technical Field
The present invention relates to the field of backup processing technologies, and in particular, to a method, an apparatus, and a client for recovering a data file.
Background
The data of a database can be stored in the form of data files. When a database stores data in a data file, the data file can be divided into data blocks, and the data is stored with the data block as the minimum logical unit.
Over time, the data stored in a data file may change or be damaged, for example between a first moment and a second moment. Techniques for restoring data files have emerged in order to recover the data as it was before the change, or the undamaged data. Such a technique can restore only the changed or damaged data and thereby restore the data file quickly. However, there is currently no particularly feasible method for accurately locating the pre-change or undamaged data during such a quick recovery.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, apparatus, client, computer-readable storage medium, and computer program product for restoring a data file.
The application provides a method for recovering a data file, comprising the following steps:
after the data file has been backed up multiple times, determining the fingerprint file of each backup; the fingerprint file of each backup comprises the backup version corresponding to the backup, and the fingerprint and fingerprint version of each data block of the data file; compared with the previous backup, if the data in a data block changes in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup, and if the data in the data block does not change in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the previous backup;
if no backup was made at the moment to be restored, taking the fingerprint file of the backup that precedes the moment to be restored and is closest to it as a first fingerprint file;
if no backup was made at the current moment and the data file is stored locally, obtaining the fingerprint file of the current moment from the fingerprint file of the backup that precedes the current moment and is closest to it, and taking the fingerprint file of the current moment as a second fingerprint file;
comparing whether the fingerprints of a data block in the first fingerprint file and the second fingerprint file are consistent, and whether its fingerprint versions in the first fingerprint file and the second fingerprint file are consistent;
if the fingerprints of the data block in the first fingerprint file and the second fingerprint file are inconsistent, or its fingerprint versions in the first fingerprint file and the second fingerprint file are inconsistent, taking the data block as a dissimilated data block;
restoring the data file at the current moment according to the data of the dissimilated data block in the backup corresponding to a target backup version; the target backup version is obtained from the fingerprint version of the dissimilated data block in the first fingerprint file.
The application also provides a device for recovering a data file, comprising:
a backup module, used for determining the fingerprint file of each backup after the data file has been backed up multiple times; the fingerprint file of each backup comprises the backup version corresponding to the backup, and the fingerprint and fingerprint version of each data block of the data file; compared with the previous backup, if the data in a data block changes in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup, and if the data in the data block does not change in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the previous backup;
a first fingerprint file acquisition module, used for taking the fingerprint file of the backup that precedes the moment to be restored and is closest to it as a first fingerprint file if no backup was made at the moment to be restored;
a second fingerprint file acquisition module, used for obtaining, if no backup was made at the current moment and the data file is stored locally, the fingerprint file of the current moment from the fingerprint file of the backup that precedes the current moment and is closest to it, and taking the fingerprint file of the current moment as a second fingerprint file;
a fingerprint file comparison module, used for comparing whether the fingerprints of a data block in the first fingerprint file and the second fingerprint file are consistent, and whether its fingerprint versions in the first fingerprint file and the second fingerprint file are consistent;
a dissimilated data block acquisition module, used for taking the data block as a dissimilated data block if its fingerprints in the first fingerprint file and the second fingerprint file are inconsistent, or its fingerprint versions in the first fingerprint file and the second fingerprint file are inconsistent;
a data file recovery module, used for restoring the data file at the current moment according to the data of the dissimilated data block in the backup corresponding to a target backup version; the target backup version is obtained from the fingerprint version of the dissimilated data block in the first fingerprint file.
The application provides a client comprising a memory and a processor, the memory storing a computer program and the processor performing the above method when executing the computer program.
The present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the above method.
The present application provides a computer program product comprising a computer program which, when executed by a processor, performs the above method.
With the above method, device, client, computer-readable storage medium and computer program product for restoring a data file, the fingerprint file of each backup is determined after the data file has been backed up multiple times. The fingerprint file of each backup comprises the backup version corresponding to the backup, and the fingerprint and fingerprint version of each data block of the data file. Compared with the previous backup, if the data in a data block changes in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup; if the data does not change, the fingerprint version is the backup version corresponding to the previous backup. A one-to-one correspondence between each data block and a backup version is thereby established.
When a recovery operation is performed, the moment to be restored is determined. If no backup was made at the moment to be restored, the fingerprint file of the backup that precedes that moment and is closest to it is taken as the first fingerprint file. If no backup was made at the current moment and the client stores the data file locally, the fingerprint file of the current moment is obtained from the fingerprint file of the backup that precedes the current moment and is closest to it, and is taken as the second fingerprint file. The fingerprints and fingerprint versions of each data block in the first and second fingerprint files are then compared; if either is inconsistent, the data block is taken as a dissimilated data block, and the target backup version is determined from the fingerprint version of the dissimilated data block in the first fingerprint file. The data before the change, or the undamaged data, is thus accurately obtained from the data of the dissimilated data block in the backup corresponding to the target backup version. The data file at the current moment is then restored from that data, which achieves quick recovery of the data file.
Drawings
FIG. 1 is a diagram of an application environment for restoring data files in one embodiment;
FIG. 2 is a flow diagram of recovering a data file in one embodiment;
FIG. 3 is a flow diagram of recovering a data file in one embodiment;
FIG. 4 is a flow diagram of recovering a data file in one embodiment;
FIG. 5 is a flow diagram of recovering a data file in one embodiment;
FIG. 6 is a block diagram of the structure of a device for recovering a data file in one embodiment;
FIG. 7 is an internal structural diagram of a client in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly understand that the embodiments described herein may be combined with other embodiments.
The method for recovering the data file can be applied to an application environment shown in fig. 1.
As shown in fig. 1, the distributed database (which may be, but is not limited to, a TDSQL-PG database) includes a plurality of database nodes, each having a corresponding client, and one database node and its corresponding client are regarded as a recovery node; for example, GTM and client a are considered recovery node 1. The backup management system comprises a backup server, and the backup storage system comprises a storage server.
When the backup process is started, the backup management system sends an instruction to each client. On receiving the instruction, each client backs up the data files of its corresponding database node; for example, client a backs up the data files of the GTM, and client b backs up the data files of the CN. After the data files have been backed up multiple times, the data produced by each backup is stored in the backup storage system, and a backup merging unit merges that data.
When the recovery process is started, the backup management system sends an instruction to the client. On receiving the instruction, the client generates dissimilated fingerprint data and transmits it to the backup storage system. The storage server generates a dissimilated data stream and returns it to the client, which receives and parses the stream to complete the recovery of the data file.
In one embodiment, as shown in FIG. 2, a method of restoring a data file is provided. The method may be performed by a client and comprises the following steps:
Step S201: after the data file has been backed up multiple times, determine the fingerprint file of each backup.
Here, multiple backups means that several incremental backups are made on the basis of a full backup. Each backup generates a corresponding fingerprint file, which comprises the backup version corresponding to the backup, and the fingerprint and fingerprint version of each data block of the data file. Compared with the previous backup, if the data in a data block changes in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup; if the data in the data block does not change in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the previous backup.
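By way of illustration only, the version rule of step S201 may be sketched in Python as follows. The names `FingerprintFile` and `next_fingerprint_file`, the dictionary layout, and the use of integers for backup versions are assumptions made for this sketch and are not taken from the application. An unchanged block is assumed to keep the version already recorded for it in the previous fingerprint file, which is how the example of FIG. 3 describes it.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical in-memory model: block index -> (fingerprint, fingerprint version).
Entries = Dict[int, Tuple[str, int]]


@dataclass(frozen=True)
class FingerprintFile:
    backup_version: int
    entries: Entries


def next_fingerprint_file(prev: FingerprintFile, backup_version: int,
                          fingerprints: Dict[int, str]) -> FingerprintFile:
    """Build the fingerprint file of the next backup from the previous one."""
    entries: Entries = {}
    for index, fingerprint in fingerprints.items():
        old = prev.entries.get(index)
        if old is not None and old[0] == fingerprint:
            # Unchanged block: keep the version recorded in the previous file.
            entries[index] = old
        else:
            # Changed (or new) block: stamp it with this backup's version.
            entries[index] = (fingerprint, backup_version)
    return FingerprintFile(backup_version, entries)


# Full backup v1, then an incremental backup v2 in which only block 1 changed.
full = FingerprintFile(1, {0: ("aa", 1), 1: ("bb", 1), 2: ("cc", 1)})
incr = next_fingerprint_file(full, 2, {0: "aa", 1: "b2", 2: "cc"})
```

In this sketch only block 1 carries version 2 after the incremental backup; blocks 0 and 2 still point at version 1, the backup that holds their data.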
Illustratively, the description is provided in connection with FIG. 3:
FIGS. 3a, 3b, 3e and 3f show the fingerprint files of the respective backups of the data file; the version in each cell is the fingerprint version of the corresponding data block. FIG. 3a is the fingerprint file of the full backup at the 1st moment: the backup version is v1, and the fingerprint version of every data block is also v1. FIG. 3b is the fingerprint file of the incremental backup at the 2nd moment, whose backup version is v2. Compared with the full backup at the 1st moment, the data blocks at row 2, column 4; row 3, column 2; and row 4, column 5 changed in the incremental backup at the 2nd moment, so the fingerprint version of these data blocks is changed to v2, while the fingerprint version of the unchanged data blocks remains v1. The fingerprint files of the incremental backups at the 8th and 9th moments, shown in FIG. 3e and FIG. 3f respectively, are obtained in the same way.
Step S202: if no backup was made at the moment to be restored, the fingerprint file of the last backup before the moment to be restored is taken as the first fingerprint file.
If a backup was made at the moment to be restored, the fingerprint file of that backup is taken as the first fingerprint file. The moment to be restored is the point in time to which the data file is to be restored. Illustratively, with reference to FIG. 3:
FIGS. 3e and 3f are the fingerprint files of the incremental backups at the 8th and 9th moments respectively. If the moment to be restored lies between the 8th and 9th moments, the fingerprint file of the incremental backup at the 8th moment is taken as the first fingerprint file. If the moment to be restored is the 9th moment, the fingerprint file of the incremental backup at the 9th moment is taken as the first fingerprint file.
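The selection rule of step S202 amounts to finding the latest backup made at or before a given moment. A minimal sketch, assuming the backup moments are kept in an ascending list (the function name `latest_backup_at_or_before` and the numeric moments are hypothetical):

```python
from bisect import bisect_right
from typing import List, Optional


def latest_backup_at_or_before(backup_times: List[int], t: float) -> Optional[int]:
    """Return the index of the latest backup made at or before moment t."""
    pos = bisect_right(backup_times, t)
    return pos - 1 if pos > 0 else None


backup_times = [1, 2, 8, 9]  # hypothetical backup moments, sorted ascending

between_8_and_9 = latest_backup_at_or_before(backup_times, 8.5)
exactly_9 = latest_backup_at_or_before(backup_times, 9)
```

A moment between the 8th and 9th moments selects the backup of the 8th moment, and the 9th moment selects its own backup. The same lookup can serve as the starting point for the second fingerprint file in step S203.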
Step S203: if no backup was made at the current moment and the data file is stored locally, the fingerprint file of the current moment is obtained from the fingerprint file of the backup that precedes the current moment and is closest to it, and is taken as the second fingerprint file.
If a backup was made at the current moment, the fingerprint file of that backup is taken as the fingerprint file of the current moment, that is, as the second fingerprint file. The current moment is the moment at which recovery starts. Illustratively, with reference to FIG. 3:
FIGS. 3e and 3f are the fingerprint files of the incremental backups at the 8th and 9th moments respectively. If the current moment lies between the 8th and 9th moments, no backup was made at the current moment, so the fingerprint file of the incremental backup at the 8th moment is used to obtain the second fingerprint file. If the current moment is the 9th moment, the fingerprint file of the incremental backup at the 9th moment is taken directly as the second fingerprint file.
Step S204: compare whether the fingerprints of a data block in the first and second fingerprint files are consistent, and whether its fingerprint versions in the two files are consistent. The fingerprint of the data in a data block is unique: if the data differs, the fingerprint differs.
Step S205: if the fingerprints of a data block in the first and second fingerprint files are inconsistent, or its fingerprint versions in the two files are inconsistent, the data block is taken as a dissimilated data block. Illustratively, with reference to FIG. 3: FIGS. 3e and 3f are the fingerprint files of the incremental backups at the 8th and 9th moments respectively. Assuming that the 8th moment is the moment to be restored and the 9th moment is the current moment, the fingerprint file of the incremental backup at the 8th moment is the first fingerprint file and that of the 9th moment is the second fingerprint file. Comparing FIG. 3e with FIG. 3f shows that the data blocks at row 2, column 2 and at row 4, column 3 have changed; these changed data blocks are called dissimilated data blocks.
Step S206: restore the data file at the current moment according to the data of the dissimilated data block in the backup corresponding to the target backup version; the target backup version is obtained from the fingerprint version of the dissimilated data block in the first fingerprint file. Illustratively, with reference to FIG. 3 and continuing the example of step S205: the fingerprint versions of the dissimilated data blocks at row 2, column 2 and at row 4, column 3 are v9 and v9 respectively, so the target backup versions of these two dissimilated data blocks are v9 and v9 respectively.
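The comparison of steps S204 to S206 may be sketched as follows. This is an illustrative sketch only: the function name `dissimilated_blocks`, the dictionary layout and the sample values are hypothetical, and both fingerprint files are assumed to cover the same set of block indices.

```python
from typing import Dict, Tuple

# Hypothetical layout: block index -> (fingerprint, fingerprint version).
Entries = Dict[int, Tuple[str, int]]


def dissimilated_blocks(first: Entries, second: Entries) -> Dict[int, int]:
    """Map each dissimilated block index to its target backup version."""
    targets: Dict[int, int] = {}
    for index, (fingerprint, version) in first.items():
        if second.get(index) != (fingerprint, version):
            # Fingerprint or version differs: the block must be restored from
            # the backup named by its version in the FIRST fingerprint file.
            targets[index] = version
    return targets


first = {0: ("aa", 1), 1: ("bb", 8), 2: ("cc", 3)}
second = {0: ("aa", 1), 1: ("b9", 9), 2: ("cc", 0)}
targets = dissimilated_blocks(first, second)
```

Block 1 differs in both fingerprint and version, while block 2 differs in version only; both are reported, each with the version recorded for it in the first fingerprint file. Block 0 is identical in both files and is left alone.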
The recovery process mainly comprises the following steps:
the storage server fetches the data of the dissimilated data block from the backup corresponding to the target backup version and generates a formatted dissimilated data stream, which records the position in the data file of each data block that needs to be updated (that is, each dissimilated data block) together with the data to be written;
the client receives and parses the dissimilated data stream, updates the corresponding data blocks of the data file, and completes the recovery.
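The client-side half of this process may be sketched as follows. The application does not specify the format of the dissimilated data stream, so the stream is modeled here as an already-parsed sequence of (block index, data) records; that record shape, the fixed `BLOCK_SIZE` and the function name `apply_stream` are assumptions of this sketch.

```python
import io
from typing import BinaryIO, Iterable, Tuple

BLOCK_SIZE = 4  # hypothetical block size in bytes, kept tiny for illustration


def apply_stream(data_file: BinaryIO, records: Iterable[Tuple[int, bytes]]) -> None:
    """Overwrite each listed block of data_file with the data carried in the stream."""
    for block_index, data in records:
        data_file.seek(block_index * BLOCK_SIZE)
        data_file.write(data)


# Blocks 1 and 3 of the local file are dissimilated; blocks 0 and 2 are untouched.
local = io.BytesIO(b"AAAAxxxxCCCCyyyy")
apply_stream(local, [(1, b"BBBB"), (3, b"DDDD")])
restored = local.getvalue()
```

Only the two listed blocks are rewritten, which reflects the point of the method: the volume of data transferred and written scales with the number of dissimilated blocks rather than with the size of the data file.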
In the above method for restoring a data file, the fingerprint file of each backup is determined after the data file has been backed up multiple times. The fingerprint file of each backup comprises the backup version corresponding to the backup, and the fingerprint and fingerprint version of each data block of the data file. Compared with the previous backup, if the data in a data block changes or is damaged in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup; if the data neither changes nor is damaged, the fingerprint version is the backup version corresponding to the previous backup. A one-to-one correspondence between each data block and a backup version is thereby established.
When a recovery operation is performed, the moment to be restored is determined. If no backup was made at the moment to be restored, the fingerprint file of the backup that precedes that moment and is closest to it is taken as the first fingerprint file. If no backup was made at the current moment and the client stores the data file locally, the fingerprint file of the current moment is obtained from the fingerprint file of the backup that precedes the current moment and is closest to it, and is taken as the second fingerprint file. The fingerprints and fingerprint versions of each data block in the first and second fingerprint files are then compared; if either is inconsistent, the data block is taken as a dissimilated data block, and the target backup version is determined from the fingerprint version of the dissimilated data block in the first fingerprint file. The data before the change, or the undamaged data, is thus accurately obtained from the data of the dissimilated data block in the backup corresponding to the target backup version. The data file at the current moment is then restored from that data, which achieves quick recovery of the data file.
In one embodiment, if no backup was made at the current moment and the client stores the data file locally, obtaining the fingerprint file of the current moment from the fingerprint file of the backup that precedes the current moment and is closest to it includes:
if no backup was made at the current moment and the client stores the data file locally, acquiring the fingerprint file of the backup that precedes the current moment and is closest to it; taking each data block that has changed at the current moment, compared with the previous backup moment, as a target data block, where the previous backup moment is the moment of the backup that precedes the current moment and is closest to it; changing, in the acquired fingerprint file, the fingerprint version of each target data block to a non-backup version; and taking the resulting fingerprint file as the fingerprint file of the current moment.
Illustratively, the following is described in connection with FIG. 3:
FIG. 3a is the fingerprint file of the full backup at the 1st moment, FIG. 3b is the fingerprint file of the incremental backup at the 2nd moment, and FIGS. 3e and 3f are the fingerprint files of the incremental backups at the 8th and 9th moments respectively. If the current moment is the 10th moment, the fingerprints of the data blocks are acquired and compared, which shows that the data blocks in rows 1, 2, 3 and 4 of column 2 changed between the 9th and 10th moments, as shown in FIG. 3g. These data blocks are the target data blocks, and their fingerprint version is changed to v0. Because v0 does not occur among any of the previous backup versions, it is called a non-backup version. The resulting fingerprint file is taken as the fingerprint file of the current moment, that is, the second fingerprint file, as shown in FIG. 3h.
In the above embodiment, the data may have been damaged or changed between the previous backup moment and the current moment. Therefore, after the fingerprint file of the previous backup moment is obtained at the current moment, it is not used directly as the fingerprint file of the current moment. Instead it is updated: the fingerprint version of every data block that has changed or been damaged is changed to the non-backup version. Because the non-backup version differs from every backup version, comparing the first and second fingerprint files to determine the dissimilated data blocks also finds the data blocks whose data was damaged or changed between the previous backup moment and the current moment.
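As a rough sketch of this embodiment, the second fingerprint file can be derived from the fingerprint file of the previous backup by re-stamping the changed blocks. The names below are hypothetical, the non-backup version v0 is modeled as the integer 0, and storing the block's new fingerprint alongside v0 is an assumption of the sketch (the application only states that the fingerprint version is changed).

```python
from typing import Dict, Tuple

# Hypothetical layout: block index -> (fingerprint, fingerprint version).
Entries = Dict[int, Tuple[str, int]]

NON_BACKUP_VERSION = 0  # "v0": assumed never to be used by any real backup


def current_fingerprint_file(last_backup: Entries, current: Dict[int, str]) -> Entries:
    """Derive the fingerprint file of the current moment from the last backup's file."""
    result = dict(last_backup)
    for index, fingerprint in current.items():
        previous = last_backup.get(index)
        if previous is None or previous[0] != fingerprint:
            # Target data block: changed or damaged since the last backup.
            result[index] = (fingerprint, NON_BACKUP_VERSION)
    return result


last_backup = {0: ("aa", 1), 1: ("bb", 9), 2: ("cc", 3)}
second = current_fingerprint_file(last_backup, {0: "aa", 1: "zz", 2: "cc"})
```

Because no backup ever carries the non-backup version, block 1 is guaranteed to be reported as dissimilated when the first and second fingerprint files are compared, whichever backup the first fingerprint file comes from. If no block changed, the function returns a copy equal to the last backup's fingerprint file.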
Further, the step of determining the data blocks whose data has changed or been damaged at the current moment, compared with the previous backup moment, includes:
acquiring the fingerprint of the data block at the current moment and its fingerprint at the previous backup moment; if the two fingerprints are inconsistent, determining that the data block has changed at the current moment compared with the previous backup moment.
Because data blocks are the storage units of the data and the fingerprint of each data block is unique, the fingerprint of a data block changes whenever the data in it changes or is damaged. Whether the data in a data block has changed or been damaged can therefore be determined by comparing the fingerprints of the same data block at different moments.
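The application does not name a fingerprint algorithm or a block size. As an illustrative sketch only, the comparison can be modeled with SHA-256 from Python's `hashlib`; with a cryptographic hash, two different blocks producing the same fingerprint is practically negligible rather than strictly impossible. The function names and the default block size are assumptions.

```python
import hashlib
from typing import List

BLOCK_SIZE = 8192  # hypothetical block size in bytes


def block_fingerprints(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Fingerprint each fixed-size block of a data file's content."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(previous: List[str], current: List[str]) -> List[int]:
    """Return the indices of blocks whose fingerprint differs between two moments."""
    return [i for i, (p, c) in enumerate(zip(previous, current)) if p != c]


# One byte of the second 4-byte block is altered between the two moments.
before = block_fingerprints(b"AAAABBBBCCCC", block_size=4)
after = block_fingerprints(b"AAAAxBBBCCCC", block_size=4)
changed = changed_blocks(before, after)
```

A single altered byte changes the fingerprint of its block only, so `changed` lists just that block. The sketch assumes both moments cover the same number of blocks; a data file that grows or shrinks would need extra handling.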
In one embodiment, if no backup was made at the current moment and the client stores the data file locally, obtaining the fingerprint file of the current moment from the fingerprint file of the backup that precedes the current moment and is closest to it includes:
if no backup was made at the current moment and the client stores the data file locally, acquiring the fingerprint file of the backup that precedes the current moment and is closest to it; and if no data block has changed or been damaged at the current moment compared with the previous backup moment, taking that fingerprint file as the fingerprint file of the current moment. Illustratively, with reference to FIG. 3:
FIG. 3a is the fingerprint file of the full backup at the 1st moment, FIG. 3b is the fingerprint file of the incremental backup at the 2nd moment, and FIGS. 3e and 3f are the fingerprint files of the incremental backups at the 8th and 9th moments respectively. If the current moment is the 10th moment, the previous backup moment is the 9th moment. If the fingerprint of every data block at the 10th moment is consistent with its fingerprint at the 9th moment, it can be determined that no data block changed or was damaged between the 9th and 10th moments. The fingerprint file of the 9th moment is then taken as the fingerprint file of the current moment, that is, the second fingerprint file, as shown in FIG. 3f.
In the above embodiment, the data in the data blocks may be neither changed nor damaged between the previous backup moment and the current moment, in which case the fingerprint of every data block is unchanged and the fingerprint file of the current moment is identical to that of the previous backup moment. The fingerprint file of the previous backup moment is therefore used directly as the fingerprint file of the current moment; no new fingerprint file needs to be generated, which saves resources.
In one embodiment, before the data file at the current moment is restored according to the data of the dissimilated data block in the backup corresponding to the target backup version, the method includes:
sending the fingerprint data of the dissimilated data block to a storage server, so that the storage server performs the following steps: acquiring the position of the dissimilated data block in the backup bitmap corresponding to the target backup version; where the data blocks of the backup corresponding to the target backup version are stored in serialized form, obtaining from that position the position of the dissimilated data block in the serialized storage; and taking the data at that position in the serialized storage as the data of the dissimilated data block in the backup corresponding to the target backup version.
Further, before the storage server performs the above steps, the method further includes:
acquiring the data of each data block in a target full backup and the data of each incremental backup between the target full backup and the moment to be restored, where the target full backup is the full backup that precedes the moment to be restored and is closest to it; and determining the data of the dissimilated data block at its position in the serialized storage from within the data of the target full backup and of each incremental backup between the target full backup and the moment to be restored.
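Selecting the target full backup and the incremental backups that follow it may be sketched as below. The sketch uses the backup sequence F1-F2-F3-I4-I5-I6-I7-I8 from this description; the tuple representation, the function name `restore_chain` and the `last_version` parameter (the version of the last backup made at or before the moment to be restored) are assumptions.

```python
from typing import List, Tuple

# Hypothetical representation: (kind, version), kind "F" = full, "I" = incremental.
Backup = Tuple[str, int]


def restore_chain(backups: List[Backup], last_version: int) -> List[Backup]:
    """Return the nearest full backup at or before last_version and the backups after it."""
    usable = [b for b in backups if b[1] <= last_version]
    # Raises ValueError if no full backup precedes the moment to be restored.
    start = max(i for i, (kind, _) in enumerate(usable) if kind == "F")
    return usable[start:]


sequence: List[Backup] = [("F", 1), ("F", 2), ("F", 3), ("I", 4),
                          ("I", 5), ("I", 6), ("I", 7), ("I", 8)]

# The moment to be restored lies between I6 and I7, so the last usable backup is version 6.
chain = restore_chain(sequence, last_version=6)
```

Only the backups in `chain` need to be consulted when looking up the data of a dissimilated data block, since the fingerprint versions recorded in the first fingerprint file can only point into that range.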
Assume that the time to be restored is t0, that Fn denotes a full backup and In an incremental backup, and that n is the backup version. Taking the backup sequence F1-F2-F3-I4-I5-I6-I7-I8 as an example and assuming that t0 lies between I6 and I7, the target full backup is F3 and the incremental backups between F3 and t0 are I4, I5 and I6, so any data that may be needed to restore to t0 can be found in the backups F3, I4, I5 and I6. How the data of a dissimilated data block is located in the serialized storage is described, by way of example, with reference to fig. 4. After the first fingerprint file is compared with the second fingerprint file, dissimilated fingerprint data is generated for the dissimilated data blocks. The dissimilated fingerprint data records the fingerprint version of each dissimilated data block, and the target backup version is found from that fingerprint version. The block-index-offset recorded in the dissimilated fingerprint data identifies which data block of the backed-up data file corresponding to the target backup version holds the data, and the position offset in the actual sequential storage is calculated from the block-index-offset. In fig. 4, for example, block-index-offset=17 denotes the 17th data block counted from left to right, and offset=5 is the position of that block's data once it is actually stored sequentially.
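The offset calculation described with reference to fig. 4 can be sketched as follows. This is a minimal Python illustration, not the patented implementation. It assumes that the bitmap is available as a list of 0/1 values, that only blocks whose bit is 1 are written to the serialized storage, in order, and that both the block-index-offset and the resulting position are counted from 1. The function name `stored_offset` and the sample bitmap are hypothetical; the actual bitmap of fig. 4 is not reproduced here.

```python
def stored_offset(bitmap, block_index_offset):
    """Return the 1-based position of a block inside the serialized storage.

    bitmap             -- list of 0/1 flags, one per block of the data file
    block_index_offset -- 1-based index of the block within the data file

    Assumption: a backup stores only the blocks whose bit is 1, back to back,
    so the position of a block equals the number of set bits up to and
    including its own bit.
    """
    if not 1 <= block_index_offset <= len(bitmap):
        raise IndexError("block index outside the bitmap")
    if bitmap[block_index_offset - 1] != 1:
        raise ValueError("block is not stored in this backup")
    return sum(bitmap[:block_index_offset])


# Hypothetical bitmap with set bits at blocks 3, 8, 11, 14 and 17.
example_bitmap = [0] * 17
for stored_block in (3, 8, 11, 14, 17):
    example_bitmap[stored_block - 1] = 1

print(stored_offset(example_bitmap, 17))
```

With this sample bitmap, block 17 is the fifth stored block, which matches the block-index-offset=17, offset=5 relationship described for fig. 4.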
In the above embodiment, during a restore operation, any data that may be needed can be found in the full backup that precedes and is closest to the time to be restored and in the incremental backups between that full backup and the time to be restored, so the storage server only needs to look up the data of the dissimilated data block on the backup corresponding to the target backup version, which improves efficiency. In addition, because the target backup version is determined from the fingerprint version of the dissimilated data block in the first fingerprint file, the data as it was before the change or damage is obtained accurately from the data of the dissimilated data block on the backup corresponding to the target backup version.
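The selection of the backups that may be needed for a restore can be illustrated with a short sketch. It is only an assumption-laden example: the `Backup` record, the `dependency_chain` function and the numeric timestamps are hypothetical and are not defined in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    version: int   # backup version n
    kind: str      # "F" for a full backup, "I" for an incremental backup
    time: float    # time at which the backup was taken (hypothetical unit)


def dependency_chain(backups, restore_time):
    """Return the closest preceding full backup and the incrementals after it.

    Only backups taken at or before restore_time are considered.
    """
    earlier = [b for b in sorted(backups, key=lambda b: b.time)
               if b.time <= restore_time]
    full_positions = [i for i, b in enumerate(earlier) if b.kind == "F"]
    if not full_positions:
        raise ValueError("no full backup precedes the time to be restored")
    return earlier[full_positions[-1]:]


# Backup sequence F1-F2-F3-I4-I5-I6-I7-I8, taken at times 1..8.
sequence = [Backup(n, "F" if n <= 3 else "I", float(n)) for n in range(1, 9)]
chain = dependency_chain(sequence, restore_time=6.5)  # t0 between I6 and I7
print([f"{b.kind}{b.version}" for b in chain])
```

For the example sequence and a restore time between I6 and I7, the chain consists of F3, I4, I5 and I6, as in the text.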
In one embodiment, comparing whether the fingerprints of a data block in the first fingerprint file and the second fingerprint file are consistent and whether the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are consistent includes:
comparing the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file first and the fingerprints afterwards, and, if the earlier comparison shows that the fingerprint versions differ, skipping the subsequent fingerprint comparison. In the above embodiment, because the fingerprint versions are compared first and the specific fingerprints afterwards, the subsequent comparison is skipped as soon as the earlier comparison finds an inconsistency, so the data blocks are compared quickly.
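The version-first comparison can be sketched as below. The in-memory representation is an assumption made for illustration: each fingerprint file is modelled as a list of (fingerprint version, fingerprint) pairs, one per data block, and a missing local file is modelled as `None`. The function name `dissimilated_blocks` and the use of 0 as a "non-backup" version are hypothetical.

```python
def dissimilated_blocks(file_a, file_b):
    """Compare fingerprint file A (restore target) with local file B.

    Returns a list of (block_index_offset, fingerprint_version, fingerprint)
    taken from file A for every block that differs. Block indexes are 1-based.
    """
    if file_b is None:
        # No local fingerprints: every block of the data file is dissimilated.
        return [(i, ver, fp) for i, (ver, fp) in enumerate(file_a, start=1)]

    result = []
    for i, (ver_a, fp_a) in enumerate(file_a, start=1):
        if i > len(file_b):
            result.append((i, ver_a, fp_a))
            continue
        ver_b, fp_b = file_b[i - 1]
        # "or" short-circuits: the fingerprints are only compared when the
        # fingerprint versions are equal.
        if ver_a != ver_b or fp_a != fp_b:
            result.append((i, ver_a, fp_a))
    return result


file_a = [(3, "aa"), (4, "bb"), (6, "cc")]
file_b = [(3, "aa"), (0, "zz"), (6, "cx")]  # 0 stands for a non-backup version
print(dissimilated_blocks(file_a, file_b))
```

Block 2 is flagged by the version comparison alone, while block 3 has matching versions and is flagged only by the fingerprint comparison.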
In one embodiment, before the data file at the current time is restored according to the data of the dissimilated data block on the backup corresponding to the target backup version, the method includes:
if no backup was performed at the current time and the data file does not exist locally at the client, taking all data blocks of the data file corresponding to the first fingerprint file as dissimilated data blocks.
Whether the data file exists locally at the client is determined by whether the index of the corresponding data file recorded in the first fingerprint file exists at the current time. If that index does not exist at the current time, the data file does not exist locally; compared with the data file at the time to be restored, the whole data file needs to be restored, and all of its data blocks are dissimilated data blocks. In this way the dissimilated data blocks are determined quickly.
In order to better understand the above method, an application example of the method for restoring a data file of the present application is described in detail below.
The main contents of this embodiment include:
1. During backup, changed data blocks are tracked through the bitmap file, and only the incrementally changed parts of the data file are backed up. A fingerprint file is also generated for the data file; it contains fingerprint version information, that is, the number of the backup in which each fingerprint was last updated.
2. Incremental backups are periodically merged into the full backup on the storage server to generate a full backup at a new point in time. Therefore, after the first full backup of the database, only incremental backups ever need to be performed; each incremental backup can be converted into a full backup at a new point in time by a background merge operation, and the fingerprint version information is updated at the same time.
3. During recovery, the fingerprints of the data files of the current database are updated first and compared with the fingerprints of the data files in the most recent backup before the recovery point in time, generating a dissimilated fingerprint file and a dissimilated bitmap file. The client sends the versioned dissimilated fingerprint file to the storage server, the storage server generates a dissimilated data stream for the data file, and the client receives and parses that stream and updates the corresponding data file. In this way a data block at a given position is written only once, and only dissimilated or damaged data blocks are written. The data flow between the client and the storage server contains only the dissimilated data of the data files and their fingerprints; the data files of all full and incremental backups need not be transferred. The amount of data transferred is therefore reduced, write operations during recovery are greatly reduced, the number of WAL log files corresponding to each increment is small, and the time needed to roll the database forward to a consistent point in time is correspondingly short. The database is thus recovered quickly.
On the one hand, only changed data is backed up; on the other hand, only damaged data is restored. The amount of data backed up or restored is small, and the demand on service system resources is low. Meanwhile, the generation of new full backups is completed by the back end and does not occupy service system resources. The recovery time is controllable and short, and both time and space are greatly reduced.
The embodiment specifically includes the following steps, which are described with reference to fig. 5:
backup flow:
1. creating a backup strategy task, declaring backup time, backup type and the like;
2. the scheduling engine schedules task execution;
3. the task distribution module distributes tasks to all GTM nodes, CN nodes and DN nodes;
4. each node sends the task progress to a task management node, and the progress management module gathers and outputs the total progress;
5. performing backup tasks
5.1 executing a statement that forbids DDL (data definition language) on each CN node: pgxc_lock_for_backup();
5.2 executing pg_start_backup();
5.3 performing a full or incremental backup on each DN standby node; the GTM nodes and CN nodes are handled similarly;
the above pgxc_lock_for_backup() and pg_start_backup() are program execution statements;
5.3.1 performing full back-up
5.3.1.1 generating a timestamp mark t0, enumerating all data files, outputting the index files of the data files, and storing the index files in a cluster metadata set;
5.3.1.2 generating a data block bitmap of each data file according to the index file of the data file, and outputting a database bitmap file;
|Datafile1-Index|timestamp|bit-map-length|bit-map-data|bit-map-eof-mark|
|Datafile2-Index|timestamp|bit-map-length|bit-map-data|....|
all bits of the bit-map-data are set to 1;
5.3.1.3 generating a fingerprint file for each data file;
the fingerprint format is as follows:
|Datafile1-index|2-bytes-current-version|integer-timestamp|
2-bytes-hist-incre-version|block1-fingerprint|
2-bytes-hist-incre-version|block2-fingerprint|
2-bytes-hist-incre-version|block3-fingerprint|...|
|Datafile2-index|2-bytes-current-version|integer-timestamp|
2-bytes-hist-incre-version|block1-fingerprint|
2-bytes-hist-incre-version|block2-fingerprint|
2-bytes-hist-incre-version|block3-fingerprint|...|
description:
2-bytes-current-version: a self-incrementing 2-byte fingerprint version field (starting at 1; after reaching FFFF it wraps back to 1). When a data block changes, the fingerprint of the corresponding block is updated to the new fingerprint and its 2-bytes-hist-incre-version is set to the current 2-bytes-current-version. 2-bytes-current-version is fingerprint header information and represents the backup version; 2-bytes-hist-incre-version is fingerprint unit information and represents the fingerprint version of the corresponding data block, so that the backup data file holding the corresponding data block can be retrieved according to the fingerprint version. If the fingerprint file cannot be located after the fingerprint version has wrapped around, the integer-timestamp field can be used for alignment.
5.3.1.4 backing up the index file, bitmap file and fingerprint file; backing up each data file;
5.3.2 performing a first incremental backup
5.3.2.1 generating a timestamp mark t1, enumerating all data files and updating the index files;
5.3.2.2 updating the database bitmap file: for each data block whose bit was 0 at t0, setting the bit to 1 if the block has changed; for each data block whose bit was 1 at t0, setting the bit to 0 if the block has not changed;
5.3.2.3 updating the fingerprint file for the changed data blocks according to the bitmap file, namely recording the new fingerprint data and fingerprint version of each changed data block; fingerprints of unchanged data blocks are not recalculated;
5.3.2.4 backing up the index file, bitmap file and fingerprint file; backing up the changed data blocks (blocks whose bit is 1) of each data file according to the bitmap file;
5.3.3 performing a second incremental backup
5.3.3.1 generating a timestamp mark t2, enumerating all data files and updating the index files;
5.3.3.2 updating the database bitmap file: for each data block whose bit was 0 at t1, setting the bit to 1 if the block has changed; for each data block whose bit was 1 at t1, setting the bit to 0 if the block has not changed;
5.3.3.3 updating the fingerprint file for the changed data blocks according to the bitmap file, namely recording the new fingerprint data and fingerprint version of each changed data block; fingerprints of unchanged data blocks are not recalculated;
5.3.3.4 backing up the index file, bitmap file and fingerprint file; backing up the changed data blocks (blocks whose bit is 1) of each data file according to the bitmap file;
5.3.4 subsequent incremental backups, and so on;
5.4 each DN standby node performs pg_stop_backup();
5.5 each DN master node performs a log switch with pg_switch_wal(), and the WAL log files are backed up;
5.6 backing up the backup_label file;
5.7 backing up the metadata information of the database cluster, and generating a catalog record according to the backup information.
The above pg_stop_backup() and pg_switch_wal() are program execution statements;
the catalog is a database term; it records the metadata information of each backup and is mainly used for scanning backup pieces, file copies and the like;
the metadata information records the starting time and ending time of the current backup, the file generated by the backup, the storage position and size of the file, the time point when the current backup can be restored, and the like.
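The bitmap and fingerprint bookkeeping of the full and incremental backups above can be sketched in a few lines. This is an illustrative model only: backups are represented as Python dictionaries, the set of changed block indexes is assumed to be supplied by some change-tracking mechanism, the data file is assumed to keep the same number of blocks, and SHA-256 stands in for the fingerprint function, which the source does not name. All function names are hypothetical.

```python
import hashlib

MAX_VERSION = 0xFFFF  # largest value a 2-byte version field can hold


def next_version(current):
    """Self-incrementing version: starts at 1 and wraps back to 1."""
    return 1 if current >= MAX_VERSION else current + 1


def block_fingerprint(block):
    # SHA-256 is an assumption; the source does not specify a hash function.
    return hashlib.sha256(block).hexdigest()


def full_backup(blocks, version=1):
    """All bitmap bits are 1 and every block carries the backup version."""
    return {
        "version": version,
        "bitmap": [1] * len(blocks),
        "fingerprints": [(version, block_fingerprint(b)) for b in blocks],
        "data": list(blocks),
    }


def incremental_backup(previous, blocks, changed):
    """Back up only the blocks whose 0-based index is in `changed`."""
    version = next_version(previous["version"])
    bitmap = [1 if i in changed else 0 for i in range(len(blocks))]
    fingerprints = list(previous["fingerprints"])
    for i in sorted(changed):
        # Only changed blocks get a new fingerprint and fingerprint version.
        fingerprints[i] = (version, block_fingerprint(blocks[i]))
    return {
        "version": version,
        "bitmap": bitmap,
        "fingerprints": fingerprints,
        "data": [blocks[i] for i in range(len(blocks)) if bitmap[i]],
    }


full = full_backup([b"a", b"b", b"c"])
inc = incremental_backup(full, [b"a", b"B", b"c"], changed={1})
print(inc["bitmap"], [ver for ver, _ in inc["fingerprints"]])
```

In the usage example, only the second block changes, so the incremental backup stores a single block and the fingerprint versions of the other two blocks still point at the full backup.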
Flow for merging incremental backups to generate a new full backup:
1. creating a policy task of merging the backup sets on the storage server;
2. the scheduling engine schedules task execution;
3. executing a merging backup operation;
3.1 loading from the catalog the metadata information of the last full backup and of the current incremental backup; estimating the minimum required storage space, and reporting an error and exiting if the available space is insufficient;
3.2, loading index files of the data files of the incremental backup;
3.3 merging each incremental backup;
3.3.1 creating new data files according to the index files of the data files, and writing the combined new data into the data files;
3.3.2 performing a clone merge on each data file: loading the bitmap file of the incremental backup; if a bit is 0, copying the block data from the full backup data file; if a bit is 1, copying the block data from the current incremental backup data file; synchronously generating a new fingerprint file (if a fingerprint version is not in the index set of the dependency linked list, the corresponding data block has already been merged, and its fingerprint version is updated to the current 2-bytes-current-version); finally generating a new full backup data file; and resetting the bitmap to generate a new bitmap file;
the dependency linked list is the list of backups that may be used in the recovery process;
3.3.3 deleting the current incremental backup if all the data files of the current incremental backup are merged;
assuming that the backup sequence before the merge operation is F6-F7-F8-I9-I10-I11-I12 (F represents a full backup, I represents an incremental backup, and the number is the fingerprint version number), the backup sequence becomes F6-F7-F8-F9-I10-I11-I12 after the merge operation is finished.
3.3.4 inserting the catalog information of the new full backup.
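The clone merge of step 3.3.2 can be sketched as follows. The sketch rests on several assumptions that the source does not spell out: block data is held in Python lists, the incremental backup stores only the blocks whose bit is 1, in order, the set of versions indexed by the dependency linked list is passed in explicitly, and the reset bitmap of the new full backup has all bits set to 1. The function name `clone_merge` is hypothetical.

```python
def clone_merge(full_blocks, incr_bitmap, incr_blocks, incr_fingerprints,
                current_version, chain_versions):
    """Merge one incremental backup into a full backup.

    full_blocks       -- block data of the full backup, one entry per block
    incr_bitmap       -- 0/1 flag per block of the incremental backup
    incr_blocks       -- stored blocks of the incremental backup (bit == 1)
    incr_fingerprints -- (fingerprint version, fingerprint) per block
    current_version   -- version of the incremental backup being merged
    chain_versions    -- versions indexed by the dependency linked list
    """
    stored = iter(incr_blocks)
    merged = []
    for i, bit in enumerate(incr_bitmap):
        # bit 1: take the block from the incremental backup;
        # bit 0: take the block from the full backup.
        merged.append(next(stored) if bit else full_blocks[i])

    # A fingerprint version outside the dependency linked list means the
    # block has already been merged, so it is re-stamped with the current
    # version.
    new_fingerprints = [
        (ver if ver in chain_versions else current_version, fp)
        for ver, fp in incr_fingerprints
    ]
    new_bitmap = [1] * len(merged)  # assumption: a full backup has all bits 1
    return merged, new_fingerprints, new_bitmap


merged, fps, bitmap = clone_merge(
    full_blocks=[b"a", b"b", b"c"],
    incr_bitmap=[0, 1, 0],
    incr_blocks=[b"B"],
    incr_fingerprints=[(8, "fa"), (9, "fB"), (7, "fc")],
    current_version=9,
    chain_versions={9},
)
print(merged, fps, bitmap)
```

In the usage example, merging I9 into F8 yields a block list in which only the second block comes from the incremental backup, and every fingerprint version in the resulting fingerprint file is 9.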
Recovery flow:
1. selecting a recoverable time point according to the catalog records, outputting from the catalog the dependency linked list of the backup sets required for the recovery, and indexing the dependency linked list by fingerprint version;
let the time to be restored be t0. Suppose Fn represents a full backup, In represents an incremental backup, and n is the fingerprint version number. Taking the backup sequence F1-F2-F3-I4-I5-I6-I7-I8 as an example, it is assumed that t0 is between I6 and I7. The recovery dependency linked list is then:
3-F3
4-I4
5-I5
6-I6
2. The scheduling engine schedules task execution;
3. preparing the environment: stopping all nodes with pg_ctl stop -m immediate -D;
the above pg_ctl stop -m immediate -D is a program execution statement.
4. The task distribution module distributes tasks to all GTM nodes, CN nodes and DN nodes;
5. executing a recovery task;
5.1 restoring the index file, bitmap file and first fingerprint file of the data files from the last backup in the dependency linked list; the first fingerprint file is referred to as fingerprint file A;
5.2 scanning the existing data files according to the index files of the data files, and generating or updating a second fingerprint file, referred to as local fingerprint file B;
5.3 comparing each fingerprint file A for t0 with the local fingerprint file B to generate dissimilated fingerprint data in the following format:
|datafile-index|2-bytes-current-version|int-timestamp|
block-index-offset|2-bytes-hist-incre-version|fingerprint|
block-index-offset|2-bytes-hist-incre-version|fingerprint|...|
5.3.1 if there is no local fingerprint for the corresponding datafile-index (the index of the data file), the data file corresponding to the current fingerprint does not exist locally and must be added. Dissimilated fingerprint data is generated directly from the current fingerprint file, and this round of comparison ends.
5.3.2 for each block of fingerprint data, comparing whether the fingerprints of the data block in fingerprint file A and local fingerprint file B are consistent and whether the fingerprint versions of the data block in fingerprint file A and local fingerprint file B are consistent; if either differs, the data block is a dissimilated data block and dissimilated fingerprint data is generated for it. The fingerprint versions are compared first and the specific fingerprints afterwards, which speeds up the comparison of the fingerprint files.
The block-index-offset indicates which block of the data file the dissimilated data block is, the 2-bytes-hist-incre-version indicates the fingerprint version of the dissimilated data block, and fingerprint is the fingerprint itself.
5.4 the client sends the dissimilated fingerprint data for each data file to the storage server.
The storage server generates a dissimilated data stream according to the dissimilated fingerprint data, and the stream format is as follows:
|datafile-index|2-bytes-current-version|int-timestamp|
block-index-offset|data-block|
block-index-offset|data-block|...|
the block-index-offset indicates which data block of the data file is to be updated, and data-block is the specific data to be written.
5.4.1 establishing, according to the recovery dependency linked list and with the fingerprint version as the index, a table of the datafile of each version, for example:
3-datafile.3
4-datafile.4
5-datafile.5
6-datafile.6
5.4.2 parsing the dissimilated fingerprint data stream and, for each dissimilated fingerprint block, finding the backup data of the corresponding data file according to the fingerprint version 2-bytes-hist-incre-version. Because incremental backups are merged on the storage server, a fingerprint version recorded in an incremental backup may not match any index value on the dependency linked list; in this example, that is a fingerprint version other than 3, 4, 5 or 6. This means that the data block has already been merged into the full backup at the head of the linked list, and the corresponding data block is taken from that full backup. The actual offset at which the block data is stored is calculated from the block-index-offset using the bitmap file, so as to find the data block of the data file (see fig. 4), which is then appended to the dissimilated data stream in the format above.
5.5 the client receives the dissimilated data stream, parses the stream format as it is received, and updates the data block of the corresponding data file according to the block-index-offset.
5.6 returning to step 5.4 until all data files have been updated.
5.7 recovering the WAL log file;
5.8 performing PITR;
5.8.1 preparing the configuration files: modifying postgresql.conf, postgresql.conf.user, pg_xlog_archive.conf and pg_hba.conf;
The above postgresql.conf, postgresql.conf.user, pg_xlog_archive.conf and pg_hba.conf are configuration files.
5.8.2 updating pgxc_node according to the IP and port information of each cluster node recorded in the catalog;
the node IP and port information is the IP address information and port information of the node;
the pgxc_node is a system table.
5.8.3 restarting so that the routing information takes effect: pg_ctl restart -Z coordinator -D;
the above pg_ctl restart -Z coordinator -D is a program execution statement.
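Steps 5.4 and 5.5 of the recovery flow, in which the storage server builds the dissimilated data stream and the client applies it, can be sketched as below. The data layout is assumed for illustration: the recovery dependency linked list is a dictionary keyed by fingerprint version whose first entry is the full backup at the head of the list, each backup holds a 0/1 bitmap plus the stored blocks in order, and block indexes are 1-based. The names `build_stream` and `apply_stream` are hypothetical.

```python
def build_stream(dissimilated, chain):
    """Server side: turn dissimilated fingerprint data into (index, data) pairs.

    dissimilated -- list of (block_index_offset, fingerprint_version, fingerprint)
    chain        -- dict: fingerprint version -> {"bitmap": [...], "data": [...]};
                    the first key is the full backup at the head of the list
    """
    head_version = next(iter(chain))
    stream = []
    for block_index, version, _fingerprint in dissimilated:
        # A version that is not on the list has already been merged into the
        # full backup at the head of the list.
        backup = chain.get(version, chain[head_version])
        bitmap = backup["bitmap"]
        if bitmap[block_index - 1] != 1:
            raise ValueError(f"block {block_index} not stored in version {version}")
        offset = sum(bitmap[:block_index])  # 1-based position in the stored data
        stream.append((block_index, backup["data"][offset - 1]))
    return stream


def apply_stream(local_blocks, stream):
    """Client side: overwrite only the dissimilated blocks, each exactly once."""
    for block_index, data in stream:
        local_blocks[block_index - 1] = data
    return local_blocks


chain = {
    3: {"bitmap": [1, 1, 1, 1], "data": [b"a3", b"b3", b"c3", b"d3"]},
    4: {"bitmap": [0, 1, 0, 0], "data": [b"b4"]},
    6: {"bitmap": [0, 0, 1, 1], "data": [b"c6", b"d6"]},
}
dissimilated = [(1, 2, "x"), (2, 4, "y"), (4, 6, "z")]
stream = build_stream(dissimilated, chain)
restored = apply_stream([b"??", b"??", b"c6", b"??"], stream)
print(stream, restored)
```

In the usage example, block 1 carries fingerprint version 2, which is not on the list, so its data comes from the head full backup F3; blocks 2 and 4 come from the incremental backups with versions 4 and 6, and block 3 is left untouched because it was not dissimilated.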
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in the sequence indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, there is provided an apparatus for recovering a data file, applied to a client, including:
the backup module 601 is configured to determine fingerprint files of each backup after performing multiple backups on the data files; the fingerprint file of each backup comprises a backup version corresponding to the backup, fingerprints of each data block of the data file and fingerprint versions; compared with the previous backup, if the data in the data block changes in the next backup, the fingerprint version of the data block is the backup version corresponding to the next backup in the fingerprint file of the next backup, and if the data in the data block does not change in the next backup, the fingerprint version of the data block is the backup version corresponding to the previous backup in the fingerprint file of the next backup;
the first fingerprint file obtaining module 602 is configured to, if no backup was performed at the time to be restored, use the fingerprint file of the backup that precedes and is closest to the time to be restored as the first fingerprint file;
a second fingerprint file obtaining module 603, configured to, if no backup was performed at the current time and the data file exists locally, obtain the fingerprint file at the current time from the fingerprint file of the backup that precedes and is closest to the current time, and use the fingerprint file at the current time as the second fingerprint file;
a fingerprint file comparison module 604, configured to compare whether the fingerprints of a data block in the first fingerprint file and the second fingerprint file are consistent, and whether the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are consistent;
the dissimilated data block obtaining module 605 is configured to take a data block as a dissimilated data block if the fingerprints of the data block in the first fingerprint file and the second fingerprint file are inconsistent, or the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are inconsistent;
a data file recovery module 606, configured to recover the data file at the current time according to the data of the dissimilated data block on the backup corresponding to the target backup version; the target backup version is obtained according to the fingerprint version of the dissimilated data block in the first fingerprint file.
In one embodiment, the second fingerprint file obtaining module is further configured to: if no backup was performed at the current time and the data file exists locally at the client, obtain the fingerprint file of the backup that precedes and is closest to the current time; take each data block that has changed at the current time compared with the previous backup time as a target data block, where the previous backup time is the time of the backup that precedes and is closest to the current time; change the fingerprint version corresponding to the target data block to a non-backup version in that fingerprint file; and take the resulting fingerprint file as the fingerprint file at the current time.
In one embodiment, the second fingerprint file obtaining module is further configured to obtain the fingerprint of the data block at the current time and its fingerprint at the previous backup time; if the fingerprint of the data block at the current time is inconsistent with its fingerprint at the previous backup time, the data block is determined to have changed or been damaged at the current time compared with the previous backup time.
In one embodiment, the second fingerprint file obtaining module is further configured to: if no backup was performed at the current time and the data file exists locally at the client, obtain the fingerprint file of the backup that precedes and is closest to the current time; and if no data block has changed at the current time compared with the previous backup time, take that fingerprint file as the fingerprint file at the current time.
In one embodiment, the apparatus further includes a dissimilated data block fingerprint data sending module configured to send the fingerprint data of the dissimilated data block to a storage server, so that the storage server performs the following steps: acquiring the position of the dissimilated data block in the bitmap of the backup corresponding to the target backup version; when the data blocks in the backup corresponding to the target backup version are stored in serialized form, obtaining the position of the dissimilated data block in the serialized storage according to that bitmap position; and taking the data at that position in the serialized storage as the data of the dissimilated data block on the backup corresponding to the target backup version.
In one embodiment, the storage server further performs the steps of: acquiring the data of each data block in a target full backup and the data of each incremental backup between the target full backup and the time to be restored, where the target full backup is the full backup that precedes and is closest to the time to be restored; and determining the data of the dissimilated data block at its position in the serialized storage from the data of each data block in the target full backup and the data of each incremental backup between the target full backup and the time to be restored.
In one embodiment, the fingerprint file comparison module is further configured to compare the fingerprint versions of the corresponding data block in the first fingerprint file and the second fingerprint file first and then the fingerprints of the corresponding data block in the first fingerprint file and the second fingerprint file; if the fingerprint versions of the corresponding data block in the first fingerprint file and the second fingerprint file are different, the corresponding data block is directly determined to be a dissimilated data block, and no fingerprint comparison is needed.
In one embodiment, the apparatus further includes a data file determining module configured to, before the data file at the current time is restored according to the data of the dissimilated data block on the backup corresponding to the target backup version, take all data blocks of the data file corresponding to the first fingerprint file as dissimilated data blocks if no backup was performed at the current time and the data file does not exist locally.
For specific limitations on the apparatus for restoring the data file, reference may be made to the limitations of the method for restoring the data file above, which are not repeated here. The modules of the above apparatus for restoring the data file may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
In one embodiment, a client is provided, the internal structure of which may be as shown in FIG. 7. The client includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the client is configured to provide computing and control capabilities. The memory of the client includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the client is used for storing the data of the recovery data file. The network interface of the client is used for communicating with an external terminal through network connection. The client also comprises an input/output interface, wherein the input/output interface is a connecting circuit for exchanging information between the processor and external equipment, and the input/output interface is connected with the processor through a bus and is called as an I/O interface for short. The computer program is executed by a processor to implement a method of restoring a data file.
It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a client is provided, including a memory storing a computer program and a processor implementing the steps of the method embodiments described above when the processor executes the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the respective method embodiments described above.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, carries out the steps of the method embodiments described above.
It should be noted that, user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take a variety of forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this description.
The foregoing examples represent only a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims (10)

1. A method for restoring a data file, applied to a client, the method comprising:
after backing up the data file a plurality of times, determining the fingerprint file of each backup; the fingerprint file of each backup comprises the backup version corresponding to that backup, and the fingerprint and fingerprint version of each data block of the data file; relative to the previous backup, if the data in a data block changes in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup, and if the data in the data block does not change in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the previous backup;
if no backup was performed at the moment to be restored, taking the fingerprint file of the backup that precedes the moment to be restored and is closest to it as a first fingerprint file;
if no backup is performed at the current moment and the data file is stored locally, obtaining the fingerprint file at the current moment from the fingerprint file of the backup that precedes the current moment and is closest to it, and taking the fingerprint file at the current moment as a second fingerprint file;
comparing whether the fingerprints of a data block in the first fingerprint file and the second fingerprint file are consistent, and whether the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are consistent, comprising: where the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are compared first and the fingerprints of the data block in the first fingerprint file and the second fingerprint file are compared afterwards, stopping the later comparison if the earlier comparison shows that the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file differ;
if the fingerprints of the data block in the first fingerprint file and the second fingerprint file are inconsistent, or the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are inconsistent, taking the data block as a dissimilated data block;
restoring the data file at the current moment according to the data of the dissimilated data block in the backup corresponding to a target backup version; the target backup version is obtained from the fingerprint version of the dissimilated data block in the first fingerprint file.
2. The method according to claim 1, wherein, if no backup is performed at the current moment and the data file is stored locally, obtaining the fingerprint file at the current moment from the fingerprint file of the backup that precedes the current moment and is closest to it comprises:
if no backup is performed at the current moment and the data file is stored locally, acquiring the fingerprint file of the backup that precedes the current moment and is closest to it;
taking a data block whose data has changed or been damaged at the current moment relative to the last backup moment as a target data block; the last backup moment of the current moment is the moment of the backup that precedes the current moment and is closest to it;
in the fingerprint file of the backup that precedes the current moment and is closest to it, changing the fingerprint version corresponding to the target data block to a non-backup version;
and taking the resulting fingerprint file as the fingerprint file at the current moment.
3. The method of claim 2, wherein determining a data block whose data has changed or been damaged at the current moment relative to the last backup moment comprises:
acquiring the fingerprint of the data block at the current moment and its fingerprint at the last backup moment;
if the fingerprint of the data block at the current moment is inconsistent with its fingerprint at the last backup moment, determining that the data block has changed at the current moment relative to the last backup moment.
4. The method according to claim 1, wherein, if no backup is performed at the current moment and the data file is stored locally, obtaining the fingerprint file at the current moment from the fingerprint file of the backup that precedes the current moment and is closest to it comprises:
if no backup is performed at the current moment and the data file is stored locally, acquiring the fingerprint file of the backup that precedes the current moment and is closest to it;
and if no data block has changed or been damaged at the current moment relative to the last backup moment, taking the fingerprint file of the backup that precedes the current moment and is closest to it as the fingerprint file at the current moment.
5. The method of claim 1, wherein, before restoring the data file at the current moment according to the data of the dissimilated data block in the backup corresponding to the target backup version, the method further comprises:
sending the fingerprint data of the dissimilated data block to a storage server, so that the storage server performs the following steps: acquiring the position of the dissimilated data block in the backup bitmap corresponding to the target backup version; where the data blocks in the backup corresponding to the target backup version are stored in serialized form, obtaining the position of the dissimilated data block in the serialized storage from that position; and taking the data at that position in the serialized storage as the data of the dissimilated data block in the backup corresponding to the target backup version.
6. The method of claim 5, wherein, before taking the data of the dissimilated data block at the position in the serialized storage as the data of the dissimilated data block in the backup corresponding to the target backup version, the storage server further performs the following steps:
acquiring the data of each data block in a target full backup and the data of each incremental backup between the target full backup and the moment to be restored; the target full backup is the full backup that precedes the moment to be restored and is closest to it;
and determining the data of the dissimilated data block at the position in the serialized storage from the data of each data block in the target full backup and the data of each incremental backup between the target full backup and the moment to be restored.
7. The method of claim 1, wherein, before restoring the data file at the current moment according to the data of the dissimilated data block in the backup corresponding to the target backup version, the method further comprises:
if no backup is performed at the current moment and the data file is not stored locally, taking all data blocks of the data file corresponding to the first fingerprint file as dissimilated data blocks.
8. An apparatus for restoring a data file, applied to a client, the apparatus comprising:
a backup module, configured to determine the fingerprint file of each backup after the data file has been backed up a plurality of times; the fingerprint file of each backup comprises the backup version corresponding to that backup, and the fingerprint and fingerprint version of each data block of the data file; relative to the previous backup, if the data in a data block changes in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the next backup, and if the data in the data block does not change in the next backup, the fingerprint version of the data block in the fingerprint file of the next backup is the backup version corresponding to the previous backup;
a first fingerprint file acquisition module, configured to, if no backup was performed at the moment to be restored, take the fingerprint file of the backup that precedes the moment to be restored and is closest to it as a first fingerprint file;
a second fingerprint file acquisition module, configured to, if no backup is performed at the current moment and the data file is stored locally, obtain the fingerprint file at the current moment from the fingerprint file of the backup that precedes the current moment and is closest to it, and take the fingerprint file at the current moment as a second fingerprint file;
a fingerprint file comparison module, configured to compare whether the fingerprints of a data block in the first fingerprint file and the second fingerprint file are consistent, and whether the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are consistent, including: where the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are compared first and the fingerprints of the data block in the first fingerprint file and the second fingerprint file are compared afterwards, stopping the later comparison if the earlier comparison shows that the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file differ;
a dissimilated data block acquisition module, configured to take the data block as a dissimilated data block if the fingerprints of the data block in the first fingerprint file and the second fingerprint file are inconsistent, or the fingerprint versions of the data block in the first fingerprint file and the second fingerprint file are inconsistent;
a data file recovery module, configured to restore the data file at the current moment according to the data of the dissimilated data block in the backup corresponding to a target backup version; the target backup version is obtained from the fingerprint version of the dissimilated data block in the first fingerprint file.
9. The apparatus of claim 8, wherein the second fingerprint file acquisition module is further configured to:
if no backup is performed at the current moment and the data file is stored locally, acquire the fingerprint file of the backup that precedes the current moment and is closest to it;
take a data block whose data has changed or been damaged at the current moment relative to the last backup moment as a target data block; the last backup moment of the current moment is the moment of the backup that precedes the current moment and is closest to it;
in the fingerprint file of the backup that precedes the current moment and is closest to it, change the fingerprint version corresponding to the target data block to a non-backup version;
and take the resulting fingerprint file as the fingerprint file at the current moment.
10. A client comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the method of any one of claims 1 to 7 when executing the computer program.
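By way of illustration only, the client-side bookkeeping recited in claims 1 to 4 and 7 might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names `Entry`, `backup_file`, `current_file`, `dissimilated`, and the `UNBACKED` marker are hypothetical, SHA-256 is an arbitrary choice of fingerprint, every version of the file is assumed to have the same number of fixed-size blocks, and an unchanged block is assumed to carry over the fingerprint version recorded in the previous fingerprint file.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional

UNBACKED = -1  # hypothetical marker for the "non-backup version" of claim 2


@dataclass(frozen=True)
class Entry:
    fingerprint: str  # hash of the block's data
    version: int      # fingerprint version of the block


def fingerprint(block: bytes) -> str:
    # SHA-256 is an assumption; the claims do not name a hash function.
    return hashlib.sha256(block).hexdigest()


def backup_file(blocks: List[bytes], version: int,
                previous: Optional[List[Entry]] = None) -> List[Entry]:
    """Fingerprint file of one backup, one Entry per data block."""
    entries = []
    for i, block in enumerate(blocks):
        fp = fingerprint(block)
        if previous is not None and previous[i].fingerprint == fp:
            entries.append(previous[i])         # unchanged: keep earlier version
        else:
            entries.append(Entry(fp, version))  # changed: stamp this backup
    return entries


def current_file(blocks: List[bytes], last_backup: List[Entry]) -> List[Entry]:
    """Fingerprint file at the current moment, derived from the latest backup."""
    return [
        entry if fingerprint(block) == entry.fingerprint
        else Entry(fingerprint(block), UNBACKED)
        for block, entry in zip(blocks, last_backup)
    ]


def dissimilated(first: List[Entry],
                 second: Optional[List[Entry]]) -> Dict[int, int]:
    """Map each dissimilated block index to its target backup version."""
    targets = {}
    for i, old in enumerate(first):
        if second is None:                      # no local file: restore all blocks
            targets[i] = old.version
        elif old.version != second[i].version:  # versions differ: skip fingerprints
            targets[i] = old.version
        elif old.fingerprint != second[i].fingerprint:
            targets[i] = old.version
    return targets


f1 = backup_file([b"aaaa", b"bbbb", b"cccc"], version=1)
f2 = backup_file([b"aaaa", b"BBBB", b"cccc"], version=2, previous=f1)
f3 = backup_file([b"aaaa", b"BBBB", b"CCCC"], version=3, previous=f2)
now = current_file([b"aaaa", b"BBBB", b"CxCC"], f3)  # edited after backup 3
result = dissimilated(f2, now)  # restore to a moment between backups 2 and 3
```

In this example only the third block differs between the first fingerprint file (`f2`) and the second (`now`), so only that block needs to be fetched, and its target backup version is the one recorded for it in `f2`. Passing `None` as the second fingerprint file corresponds to the case where the data file is not stored locally and every block is treated as dissimilated.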
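The storage-server lookup recited in claims 5 and 6 could be pictured along the following lines. This is a hedged sketch, not the method of the patent: the claims do not say how a bitmap position maps to a position in the serialized storage, so the code assumes that each backup stores only the blocks marked in its bitmap, concatenated in ascending block order, and that the serialized position is the count of marked blocks before the requested one. `StoredBackup`, `make_backup`, `read_block`, `fetch`, and `BLOCK_SIZE` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

BLOCK_SIZE = 4  # hypothetical fixed block size in bytes


@dataclass
class StoredBackup:
    bitmap: list[bool]  # bitmap[i] is True if block i is stored in this backup
    payload: bytes      # stored blocks concatenated in ascending block order


def make_backup(blocks: dict[int, bytes], total_blocks: int) -> StoredBackup:
    """Serialize the given blocks (a full or an incremental backup)."""
    bitmap = [i in blocks for i in range(total_blocks)]
    payload = b"".join(blocks[i] for i in sorted(blocks))
    return StoredBackup(bitmap, payload)


def read_block(backup: StoredBackup, index: int) -> bytes:
    """Locate block `index` through the bitmap, then read it from the payload."""
    if not backup.bitmap[index]:
        raise KeyError(f"block {index} is not stored in this backup")
    slot = sum(backup.bitmap[:index])  # number of stored blocks before this one
    start = slot * BLOCK_SIZE
    return backup.payload[start:start + BLOCK_SIZE]


def fetch(backups: dict[int, StoredBackup],
          targets: dict[int, int]) -> dict[int, bytes]:
    """Return the data of each requested block from its target backup version."""
    return {index: read_block(backups[version], index)
            for index, version in targets.items()}


backups = {
    1: make_backup({0: b"aaaa", 1: b"bbbb", 2: b"cccc"}, total_blocks=3),  # full
    2: make_backup({1: b"BBBB"}, total_blocks=3),                          # incremental
}
restored = fetch(backups, {1: 2, 2: 1})
```

Here `targets` maps each dissimilated block index to its target backup version, so block 1 is read from the incremental backup and block 2 from the full backup. Because the target version already names the backup that holds the block, this sketch does not need to replay the incremental backups in order.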
CN202210995696.0A 2022-08-18 2022-08-18 Method, device and client for recovering data file Active CN115357429B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210995696.0A CN115357429B (en) 2022-08-18 2022-08-18 Method, device and client for recovering data file

Publications (2)

Publication Number Publication Date
CN115357429A CN115357429A (en) 2022-11-18
CN115357429B CN115357429B (en) 2023-06-27

Family

ID=84002852

Country Status (1)

Country Link
CN (1) CN115357429B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101382885A (en) * 2007-09-06 2009-03-11 联想(北京)有限公司 Multi-edition control method and apparatus for data file
CN103617215A (en) * 2013-11-20 2014-03-05 上海爱数软件有限公司 Method for generating multi-version files by aid of data difference algorithm

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102184218B (en) * 2011-05-05 2012-11-21 华中科技大学 Repeated data delete method based on causal relationship
CN103870362B (en) * 2014-03-21 2017-08-04 华为技术有限公司 A kind of data reconstruction method, device and standby system
CN104166606B (en) * 2014-08-29 2018-01-09 华为技术有限公司 File backup method and main storage device
CN104572357A (en) * 2014-12-30 2015-04-29 清华大学 Backup and recovery method for HDFS (Hadoop distributed filesystem)
CN105302675A (en) * 2015-11-25 2016-02-03 上海爱数信息技术股份有限公司 Method and device for data backup

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant