CDP backup and recovery method for ensuring database consistency
Technical Field
The invention relates to the technical field of data backup and recovery, in particular to a CDP backup and recovery method for ensuring database consistency.
Background
CDP (continuous data protection) is a methodology that captures or tracks changes to data and stores those changes separately from the production data, so that the data can be restored to any past point in time. A continuous data protection system may be implemented at the block, file, or application level, and can provide recovery granularity fine enough to offer an almost unlimited number of recovery points.
CDP products mainly comprise file-level CDP backups and application-level CDP backups.
IO replication technology is widely used in the industry for CDP product development and has, to a certain extent, promoted the development of the CDP industry. IO replication mainly covers file (system) IO and disk volume IO. CDP products based on IO-level replication can be applied to a variety of IT environments and adapt to them well; however, they fall short when dealing with database applications, particularly with respect to data consistency, a technical problem that has long troubled the industry.
The minimum storage unit of a database is generally called a data block and usually has a fixed size; for example, the default data block size of Oracle is 8 KB. In most cases each IO record contains structurally complete data blocks, so the restored database basically satisfies data consistency. This consistency is weak, however: it holds only at the data block level and does not extend to the logical structure of the database's transaction objects.
More often, CDP developers do not consider which transaction behavior one or more IO records correspond to, or whether the records contain valid data, so the check for logical consistency is frequently omitted. This is why a restored database, even when it appears to start normally, may show other problems in operation, such as data object corruption, inability to insert data, instance startup failure, or partial data loss.
CDP products based on IO replication mostly adopt IO filter driver technology, which requires developing a dedicated IO filter driver that is installed on the operating system to capture and replicate the application's IO records. IO records are generated continuously as database transactions run, and the database can be restored to the state at any IO time point by using them. Although this reduces the RPO (recovery point objective), in some cases it causes problems inside the database, for example leaving the database unable to open. Recovery can generally be attempted again with other IO records, but this lengthens the recovery window. Because the time points at which the transaction data is consistent are unknown, restoring to an arbitrary IO record easily leaves the logical organization of data objects incomplete and the objects unusable. This is the consequence of ignoring the logical consistency of the database.
The transaction behavior of a database generates IO records, which are the physical form of that behavior. Each IO record consists of one or more data blocks, and its data length is an integral multiple of the data block size, so it can be conveniently split into blocks for analysis.
The physical storage structure of a database is often complex. Oracle, for example, uses data files, control files, log files, and so on, and the data block structure differs for each type of file; analyzing the IO records of all these files would be a difficult and complex task. Database transaction behavior is generally recorded in a transaction log, such as the redo log of Oracle or the binlog of MySQL; each record in the transaction log file represents one transaction behavior, including DML, DDL, and DCL operations. A log record occupies one or more contiguous data blocks; as shown in FIG. 1, the transaction log file consists of 4 data blocks containing 5 records. Because the consistency of the transaction log determines the consistency of the entire database, it is the IO records of the transaction log file that are analyzed. The internal structure of the transaction logs of various databases has by now been made public, mainly through technical forums, database vendors, and open-source projects. It is therefore entirely feasible to determine the consistency state of a database by analyzing the IO records of its transaction log.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a CDP backup and recovery method for ensuring database consistency, which solves the problem that existing CDP technology based on IO replication can hardly guarantee the logical consistency of the database after recovery.
In order to achieve the above purpose, the invention adopts the following technical scheme: a CDP backup method for ensuring database consistency, characterized in that the method comprises the following steps:
1) configuring an IO filter driver, configuring a file list to be captured, and adding a database file into the file list;
2) the IO filter driver captures IO records generated by the database file according to the file list, analyzes whether the IO records have consistency, generates an IO record set and stores the IO record set into a backup storage;
3) making a complete backup of the database file to complete the initial backup of the CDP;
4) after the initial backup is finished, writing the backup set into the backup storage.
The CDP backup method for ensuring database consistency is characterized in that step 2), in which the IO filter driver captures the IO records generated by the database file according to the file list, analyzes whether the IO records are consistent, generates an IO record set, and stores the IO record set in the backup storage, specifically comprises the following steps:
(1) the IO filter driver starts to capture IO records of the database file;
(2) judging whether the currently captured IO record belongs to a transaction log file; if so, going to step (3); otherwise, marking the IO record as common, generating an IO record set, storing the IO record set in the backup storage, and returning to step (1) to continue capturing IO records;
(3) if the current IO record belongs to a transaction log file, dividing the current IO record into data blocks according to the block size of the transaction log file, and placing the resulting data blocks into a data block list;
(4) traversing the data block list, parsing the internal structure of each transaction log data block to obtain its transaction log record information, and judging whether the last transaction log record contained in the current data block is complete;
(5) if the record is complete, the current data block is consistent, and the information of the data block is recorded before the traversal of the data block list continues; if not, the current data block is not consistent, and the traversal of the data block list continues;
(6) after the data block list has been traversed, marking the current IO record according to the recorded data block information: if a consistent data block exists, marking the type of the current IO record as consistent and setting the offset position of the consistent data block; otherwise, marking the type as common; and finally generating an IO record set;
(7) after the traversal is finished, writing the IO record set into the backup storage, and returning to step (1) to continue capturing IO records.
The CDP backup method for ensuring database consistency is characterized in that: the IO record set includes IO record header information and IO record data, wherein the IO record header information comprises the total length of the record set, the type, the timestamp, and the offset position of the consistent data block;
the total length of the record set is the sum of the lengths of the IO record header information and the IO record data;
the type is either common or consistent; if the current IO record contains a consistent data block, the type of the IO record is consistent; otherwise, it is common;
the timestamp is the time at which the IO record was captured and is used for recovery to a specified time point;
the offset position of the consistent data block is the offset of the consistent data block within the IO record data; the offset position is valid only if the type is consistent, and the consistent data block is located through the offset position during recovery.
A CDP recovery method for ensuring database consistency is characterized in that: the method comprises the following steps:
1) the IO filter driver stops capturing the IO record;
2) stopping the operation of the database;
3) reading the initial backup set from the backup storage and restoring it;
4) recovering the file by using the IO record set;
5) restarting the database and the IO filter driver to complete the recovery.
The CDP recovery method for ensuring database consistency is characterized in that: the step 4) of restoring the file by using the IO record set comprises the following steps:
(1) traversing and reading an IO record set from backup storage;
(2) judging whether the current IO record set belongs to a transaction log file; if not, directly recovering the file by using the IO record set, and then returning to step (1) to continue traversing the IO record sets;
(3) if the current IO record set belongs to a transaction log file, judging whether the type of the current IO record set is consistent; if not, directly recovering by using the IO record set, and then returning to step (1) to continue traversing the IO record sets;
(4) if the type of the current IO record set is consistent, judging whether the timestamp field of the header has reached the specified recovery time point; if not, directly recovering by using the IO record set, and then returning to step (1) to continue traversing the IO record sets; if the specified recovery time point has been reached, recovering up to the offset position of the consistent data block of the IO record set;
(5) traversing the next IO record set until all the IO records have been recovered.
The CDP recovery method for ensuring database consistency is characterized in that: the IO record set includes IO record header information and IO record data, wherein the IO record header information comprises the total length of the record set, the type, the timestamp, and the offset position of the consistent data block;
the total length of the record set is the sum of the lengths of the IO record header information and the IO record data;
the type is either common or consistent; if the current IO record contains a consistent data block, the type of the IO record is consistent; otherwise, it is common;
the timestamp is the time at which the IO record was captured and is used for recovery to a specified time point;
the offset position of the consistent data block is the offset of the consistent data block within the IO record data; the offset position is valid only if the type is consistent, and the consistent data block is located through the offset position during recovery.
The invention achieves the following beneficial effects: the invention determines the consistency state of the database by analyzing the IO records of the transaction log, improves on the way traditional CDP products based on IO replication technology handle database consistency, and significantly improves performance, accuracy, and effectiveness.
Drawings
FIG. 1 is a schematic diagram of the IO record composition structure of a transaction log file;
FIG. 2 is a CDP backup flow diagram of the present invention;
FIG. 3 is a flow chart of the IO capture and analysis of the present invention;
FIG. 4 is a schematic diagram of the composition structure of an IO record set;
FIG. 5 is a flow chart of a CDP recovery process of the present invention;
FIG. 6 is a flow chart of IO record recovery in accordance with the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
In order to ensure the consistency of the database after CDP recovery, the IO records need to be analyzed during IO capture to find the data blocks that satisfy the consistency condition, so that during CDP recovery the valid cut-off IO record can be determined from the consistent data blocks. The condition for data block consistency is that the last transaction log record contained in the data block is complete. An IO record is generated by a database process and is composed of data blocks, which are organized in one of two ways:
1) the last transaction log record contained in the data block is complete, such as data blocks 1, 2, and 4 in FIG. 1;
2) the last transaction log record contained in the data block is incomplete, and its remainder lies in the next data block, such as data block 3 in FIG. 1, where the front portion of transaction log record 4 is in data block 3 and the back portion is in data block 4.
A data block that satisfies organization 1) is considered consistent.
The database writes to its files while running. The database files mainly comprise data files and log files, and the IO records of all of these files are captured.
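The consistency condition can be illustrated with a minimal Python sketch. It does not parse any real transaction log format; it assumes the record boundaries are already known as byte offsets, and the function name consistent_blocks, the 512-byte block size, and the specific offsets are hypothetical values chosen only to reproduce the layout described for FIG. 1 (4 data blocks, 5 records, record 4 spanning data blocks 3 and 4).

```python
from typing import List, Tuple


def consistent_blocks(
    records: List[Tuple[int, int]], block_size: int, block_count: int
) -> List[bool]:
    """Return, per data block, whether its last log record is complete.

    records holds (start, end) byte offsets of each log record within the
    log file, end exclusive, sorted by start (an assumed representation).
    """
    result = []
    for index in range(block_count):
        block_start = index * block_size
        block_end = block_start + block_size
        # Records that have at least one byte inside this block.
        touching = [(s, e) for s, e in records if s < block_end and e > block_start]
        if not touching:
            result.append(False)  # no record data at all: nothing complete
            continue
        _, last_end = touching[-1]
        # Organization 1): the last record ends inside this block.
        # Organization 2): it continues into the next block.
        result.append(last_end <= block_end)
    return result


# Hypothetical offsets mirroring FIG. 1.
fig1_layout = [(0, 400), (512, 900), (1024, 1300), (1300, 1700), (1700, 2000)]
print(consistent_blocks(fig1_layout, block_size=512, block_count=4))
# [True, True, False, True]
```

With these offsets, data blocks 1, 2, and 4 are reported as consistent and data block 3 is not, because record 4 starts in data block 3 and ends in data block 4.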
As shown in FIG. 2, a CDP backup method for ensuring database consistency includes the following specific steps:
1) configuring an IO filter driver, configuring a file list to be captured, and adding a database file into the file list;
2) the IO filter driver captures IO records generated by the database file according to the file list, analyzes whether the IO records are consistent, generates an IO record set, and stores the IO record set in the backup storage; the IO records generated by the database file are equivalent to a real-time incremental backup; IO capture begins first, and the initial backup is performed afterwards;
3) making a complete backup of the database file to complete the initial backup of the CDP;
4) after the initial backup is finished, writing the backup set into the backup storage.
As shown in FIG. 3, step 2), in which the IO filter driver captures the IO records generated by the database file according to the file list, analyzes whether the IO records are consistent, generates an IO record set, and stores the IO record set in the backup storage, includes the following specific steps:
1) the IO filter driver starts to capture IO records of the database file;
2) judging whether the currently captured IO record belongs to a transaction log file; if so, going to step 3); if not, marking the IO record as common, generating an IO record set, storing the IO record set in the backup storage, and returning to step 1) to continue capturing IO records;
3) if the current IO record belongs to a transaction log file, dividing the current IO record into data blocks according to the block size of the transaction log file, and placing the resulting data blocks into a data block list;
4) traversing the data block list, parsing the internal structure of each transaction log data block to obtain its transaction log record information, and judging whether the last transaction log record contained in the current data block is complete;
5) if the record is complete, the current data block is consistent, and the information of the data block is recorded before the traversal of the data block list continues; if not, the current data block is not consistent, and the traversal of the data block list continues;
6) after the data block list has been traversed, marking the current IO record according to the recorded data block information: if a consistent data block exists, marking the type of the current IO record as consistent and setting the offset position of the consistent data block; otherwise, marking the type as common; and finally generating an IO record set;
7) after the traversal is finished, writing the IO record set into the backup storage, and returning to step 1) to continue capturing IO records.
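Steps 2) to 6) above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation of the invention: the names IORecordSet, analyze_io_record, and toy_last_record_complete are hypothetical, the parsing of the log block is passed in as a callback because it is specific to each database, and the sketch assumes that when several data blocks are consistent the offset of the last one is kept, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Callable, Optional

TYPE_COMMON = "common"
TYPE_CONSISTENT = "consistent"


@dataclass
class IORecordSet:
    record_type: str
    timestamp: float
    consistent_offset: Optional[int]  # meaningful only when type is consistent
    data: bytes


def analyze_io_record(
    data: bytes,
    is_log_file: bool,
    block_size: int,
    timestamp: float,
    last_record_complete: Callable[[bytes], bool],
) -> IORecordSet:
    # Step 2: IO records of non-log files are marked common without analysis.
    if not is_log_file:
        return IORecordSet(TYPE_COMMON, timestamp, None, data)

    # Step 3: the IO data length is a whole multiple of the log block size.
    if block_size <= 0 or len(data) % block_size != 0:
        raise ValueError("IO data length must be a multiple of the block size")

    # Steps 4-5: examine every block and remember the consistent ones.
    consistent_offset = None
    for offset in range(0, len(data), block_size):
        if last_record_complete(data[offset:offset + block_size]):
            consistent_offset = offset  # assumption: keep the last one found

    # Step 6: mark the IO record according to what was recorded.
    found = consistent_offset is not None
    record_type = TYPE_CONSISTENT if found else TYPE_COMMON
    return IORecordSet(record_type, timestamp, consistent_offset, data)


def toy_last_record_complete(block: bytes) -> bool:
    # Stand-in for vendor-specific log parsing: in this toy format a block
    # whose final byte is 0x01 ends on a record boundary.
    return block.endswith(b"\x01")


io_data = b"aaa\x01" + b"bbb\x01" + b"ccc\x00"  # three 4-byte toy blocks
result = analyze_io_record(io_data, True, 4, 1000.0, toy_last_record_complete)
print(result.record_type, result.consistent_offset)  # consistent 4
```

Passing the block parser as a callback keeps the capture logic independent of any one database's log layout; a real parser for a specific transaction log would replace the toy function.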
As shown in FIG. 4, the IO record set includes IO record header information and IO record data. The IO record header information includes the total length of the record set, the type, the timestamp, and the offset position of the consistent data block. The header fields are described as follows:
Total length of record set: the sum of the lengths of the IO record header information and the IO record data.
Type: common or consistent; if the current IO record contains a consistent data block, the type of the IO record is consistent; otherwise it is common.
Timestamp: the time at which the IO record was captured, used to achieve recovery to a specified point in time.
Offset position of the consistent data block: the offset of the consistent data block within the IO record data; the offset is valid only if the type is consistent, and the consistent data block is located through this offset during recovery.
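One possible on-disk encoding of the IO record set is sketched below in Python. The text names the four header fields but not their widths or byte order, so the layout used here (big-endian, 4-byte total length, 1-byte type, 8-byte timestamp in microseconds, 4-byte offset) and the names HEADER, TYPE_COMMON, TYPE_CONSISTENT, and IORecordSet are assumptions made for illustration.

```python
import struct
from dataclasses import dataclass

# Assumed layout: total length (uint32), type (uint8),
# timestamp in microseconds (uint64), consistent block offset (uint32).
HEADER = struct.Struct(">IBQI")

TYPE_COMMON = 0
TYPE_CONSISTENT = 1


@dataclass
class IORecordSet:
    record_type: int
    timestamp_us: int
    consistent_offset: int  # valid only when record_type is TYPE_CONSISTENT
    data: bytes

    def pack(self) -> bytes:
        # Total length = header length + IO record data length.
        total_length = HEADER.size + len(self.data)
        header = HEADER.pack(
            total_length, self.record_type, self.timestamp_us, self.consistent_offset
        )
        return header + self.data

    @classmethod
    def unpack(cls, buf: bytes) -> "IORecordSet":
        total_length, record_type, timestamp_us, offset = HEADER.unpack_from(buf, 0)
        if total_length < HEADER.size or total_length > len(buf):
            raise ValueError("truncated or corrupt IO record set")
        return cls(record_type, timestamp_us, offset, bytes(buf[HEADER.size:total_length]))


record_set = IORecordSet(TYPE_CONSISTENT, 1_700_000_000_000_000, 512, bytes(1024))
raw = record_set.pack()
print(len(raw))  # 1041
```

Because the total length is stored first, a reader can walk a stream of record sets in the backup storage by reading one header, then skipping total length bytes to reach the next one.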
As shown in FIG. 5, a CDP recovery method for ensuring database consistency includes the steps of:
1) the IO filter driver stops capturing the IO record;
2) stopping the operation of the database;
3) reading the initial backup set from the backup storage and restoring it;
4) recovering the file by using the IO record set;
5) restarting the database and the IO filter driver to complete the recovery.
As shown in FIG. 6, the step 4) of restoring the file by using the IO record set includes the steps of:
1) traversing and reading an IO record set from backup storage;
2) judging whether the current IO record set belongs to a transaction log file; if not, directly recovering the file by using the IO record set, and then returning to step 1) to continue traversing the IO record sets;
3) if the current IO record set belongs to a transaction log file, judging whether the type of the current IO record set is consistent; if not, directly recovering by using the IO record set, and then returning to step 1) to continue traversing the IO record sets;
4) if the type of the current IO record set is consistent, judging whether the timestamp field of the header has reached the specified recovery time point; if not, directly recovering by using the IO record set, and then returning to step 1) to continue traversing the IO record sets; if the specified recovery time point has been reached, recovering up to the offset position of the consistent data block of the IO record set;
5) traversing the next IO record set until all the IO records have been recovered.
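The traversal of FIG. 6 can be sketched in Python as follows, with in-memory byte arrays standing in for the restored files. Several details are assumptions rather than statements of the text: the fields path and file_offset (the header described above does not say where an IO record is written back), the name replay, the reading of "reaches the recovery time point" as timestamp >= recovery_time, the choice to apply the data up to and including the consistent data block, and the choice to stop traversing once that cut-off record set has been applied.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class IORecordSet:
    path: str          # hypothetical: file the IO record belongs to
    file_offset: int   # hypothetical: where the IO data was written
    is_log_file: bool
    consistent: bool
    timestamp: float
    consistent_offset: Optional[int]
    data: bytes


def replay(
    record_sets: Iterable[IORecordSet],
    files: Dict[str, bytearray],
    recovery_time: float,
    log_block_size: int,
) -> int:
    """Apply record sets in order; return how many were applied."""
    applied = 0
    for rs in record_sets:
        # Steps 2-4: only a consistent log record set at or past the
        # recovery time point is treated as the cut-off.
        is_cutoff = rs.is_log_file and rs.consistent and rs.timestamp >= recovery_time
        payload = rs.data
        if is_cutoff:
            # Assumption: keep data through the end of the consistent block.
            payload = rs.data[: rs.consistent_offset + log_block_size]

        target = files.setdefault(rs.path, bytearray())
        end = rs.file_offset + len(payload)
        if len(target) < end:
            target.extend(bytes(end - len(target)))  # grow file with zeros
        target[rs.file_offset:end] = payload
        applied += 1

        if is_cutoff:
            break  # assumption: nothing after the cut-off is applied
    return applied


files: Dict[str, bytearray] = {}
history = [
    IORecordSet("data.dbf", 0, False, False, 1.0, None, b"dddd"),
    IORecordSet("redo.log", 0, True, True, 2.0, 4, b"11112222"),
    IORecordSet("redo.log", 8, True, True, 5.0, 4, b"AAAABBBBCCCC"),
    IORecordSet("data.dbf", 0, False, False, 6.0, None, b"XXXX"),
]
print(replay(history, files, recovery_time=4.0, log_block_size=4))  # 3
print(bytes(files["redo.log"]))  # b'11112222AAAABBBB'
```

In this run the third record set is the cut-off: its trailing block, which ends in an incomplete record, is dropped, and the later write to data.dbf is never applied.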
The invention determines the consistency state of the database by analyzing the IO records of the transaction log, improves on the way traditional CDP products based on IO replication technology handle database consistency, and significantly improves performance, accuracy, and effectiveness.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.