CN115016988B - CDP backup recovery method, system and storage medium based on binary tree log - Google Patents


Info

Publication number
CN115016988B
CN115016988B CN202210941069.9A CN202210941069A CN115016988B CN 115016988 B CN115016988 B CN 115016988B CN 202210941069 A CN202210941069 A CN 202210941069A CN 115016988 B CN115016988 B CN 115016988B
Authority
CN
China
Prior art keywords
check value
binary tree
metadata record
data
recovery
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210941069.9A
Other languages
Chinese (zh)
Other versions
CN115016988A (en
Inventor
胡晓勤
毛艺萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202210941069.9A
Publication of CN115016988A
Application granted
Publication of CN115016988B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1458: Management of the backup or restore process
    • G06F11/1469: Backup restoration techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466: Performance evaluation by tracing or monitoring
    • G06F11/3476: Data logging

Abstract

The invention relates to a CDP backup recovery method, system and storage medium based on binary tree logs, and belongs to the field of data protection. The method comprises the following steps: backing up initial data and initializing; monitoring and intercepting all write operations; judging the type of the write operation; generating a check value by XOR calculation; generating a metadata record; constructing and updating a binary tree structured log; and indexing and computing the recovery data. The system comprises an initialization module, a monitoring interception module, a first judgment module, a data backup preparation module, an exclusive-or calculation module, a metadata record generation module, a second judgment module, a third judgment module, a binary tree updating module, an index calculation module and a fourth judgment module. The invention provides a CDP backup recovery method based on a log constructed from a binary tree and check values, and implements a CDP system based on that method, so that the confidentiality, integrity and availability of data are protected at minimum cost, a large amount of storage space is saved, and the data recovery rate and overall operating efficiency are improved.

Description

CDP backup recovery method, system and storage medium based on binary tree log
Technical Field
The invention belongs to the field of data protection, and relates to a CDP backup recovery method, system and storage medium based on binary tree logs.
Background
Continuous Data Protection (CDP) is a data protection technique that captures and stores all data changes and can recover a copy of the data at any point in time, providing security and reliability guarantees for the data. Compared with conventional data protection technologies such as backup and snapshot, CDP only needs to restore the data changes made since system startup rather than the entire data set, so that CDP can provide arbitrarily fine-grained recovery points (a near-zero RPO) and an almost zero RTO.
At present, more and more enterprises use CDP technology in their data protection schemes, which effectively guarantees data security and provides reliable support for the continuous and stable development of business. The widely used CDP approach logs all data changes in time order and generates a log for each write operation. During backup, the sequence of change logs of each data block is stored; during recovery, the data at the recovery time point is obtained by traversing a large number of CDP logs. Backup is simple with this approach, but a large amount of data must be recorded and traversing the logs during recovery takes a long time, so storage consumption is high and recovery is slow. Another technique is Timely Recovery to Any Point-in-time (TRAP), which does not record every change of a data block; instead, during backup it stores the XOR check values of the changed blocks and links them into a log chain in timestamp order, and during recovery it reads the XOR check values and computes the data. This reduces the space required for backup, but when the log chain grows too long the recovery computation time increases, and a single-point failure can break the whole chain, making data recovery impossible.
Therefore, how to improve the data recovery rate while optimizing the storage space, improve the system operation efficiency, and efficiently ensure the data security has become the biggest challenge in the big data era.
Disclosure of Invention
In order to solve the technical problems in the background art, embodiments of the present invention provide a CDP backup recovery method, system and storage medium based on binary tree logs. The technical scheme is as follows:
In a first aspect, a CDP backup recovery method based on binary tree logs is provided, which includes the steps of:
an initialization step, namely backing up initial data and initializing;
a monitoring and intercepting step, namely monitoring and intercepting all write operations;
a first judgment step, namely judging whether the write operation is a recovery write operation; if not, executing a data backup preparation step; if yes, executing an index calculation step;
a data backup preparation step, namely acquiring a current time point T(n), reading original data BT(n-1) on a data block B, and writing changed data BT(n) into the data block B, wherein the data block B is the data block covered by the write operation;
an exclusive-or calculation step, namely performing an exclusive-or calculation on the original data BT(n-1) and the changed data BT(n), and adding a timestamp to the calculation result to generate a check value PT(n);
a metadata record generation step, namely generating a metadata record MT(n) according to the binary tree structured log and the check value PT(n);
a second judgment step, namely judging whether the check value PT(n) is the first check value PT(1); if not, executing a third judgment step; if yes, repeating the monitoring and intercepting step through the metadata record generation step;
a third judgment step, namely obtaining a check value PT(n-1) according to the metadata record MT(n) and judging whether n is an odd number; if not, performing an exclusive-or calculation on the check value PT(n) and the check value PT(n-1), and then repeating the metadata record generation step and the second judgment step in sequence on the calculation result; if yes, executing a binary tree updating step;
a binary tree updating step, namely updating the binary tree structured log according to the metadata record MT(n), and then executing a fourth judgment step;
an index calculation step, namely calculating and indexing a recovery check value according to the recovery time point and the updated binary tree structured log, and performing an exclusive-or calculation on the recovery check value and the initial data, the calculation result being the recovery data, and then executing the fourth judgment step;
a fourth judgment step, namely judging whether a write operation exists; if not, ending; if yes, repeating the monitoring and intercepting step.
In a second aspect, there is also provided a CDP backup and restore system based on binary tree logs, the system comprising:
the initialization module is used for backing up initial data and initializing;
the monitoring interception module is used for monitoring and intercepting all write operations;
the first judgment module is used for judging whether the write operation is a recovery write operation; if not, the data backup preparation module is executed; if yes, the index calculation module is executed;
the data backup preparation module is used for acquiring a current time point T (n), reading original data BT (n-1) on a data block B, and writing changed data BT (n) into the data block B, wherein the data block B is a write operation coverage data block;
the exclusive-or calculation module is used for carrying out exclusive-or calculation on the original data BT (n-1) and the changed data BT (n), and adding a time stamp to a calculation result to generate a check value PT (n);
a metadata record generating module, configured to generate a metadata record MT (n) according to a binary tree structured log and the check value PT (n);
the second judgment module is used for judging whether the check value PT(n) is the first check value PT(1); if not, the third judgment module is executed; if yes, the monitoring interception module through the metadata record generation module are executed again;
a third judgment module, configured to obtain a check value PT(n-1) according to the metadata record MT(n) and determine whether n is an odd number; if n is not an odd number, perform an exclusive-or calculation on the check value PT(n) and the check value PT(n-1), and then execute the metadata record generation module and the second judgment module again in sequence on the calculation result; if yes, execute the binary tree updating module;
a binary tree updating module, configured to update the binary tree structure log according to the metadata record MT (n), and then execute the fourth determining module;
the index calculation module is used for calculating and indexing a recovery check value according to the recovery time point and the updated binary tree structured log, and performing an exclusive-or calculation on the recovery check value and the initial data, the calculation result being the recovery data, and then executing the fourth judgment module;
a fourth judging module, configured to judge whether there is a write operation, and if not, end the process; if yes, the monitoring interception module is executed repeatedly.
In a third aspect, a computer readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, implements the CDP backup recovery method based on binary tree logs described above.
The invention has the beneficial effects that:
(1) The invention does not use the snapshot, and uses the check value to save the data difference in a check log mode, thereby saving the storage space;
(2) The invention realizes data backup by using XOR calculation, and the method is simple and easy to realize;
(3) The invention does not directly store data change, but stores the calculated data difference, thereby ensuring the confidentiality of the data;
(4) The method is realized by using exclusive-or calculation, so that the compression ratio of data is greatly improved, and a basis is provided for reducing the cost of storage space;
(5) According to the invention, a check log chain is not used, and the check value is subjected to XOR calculation for multiple times and stored by using a binary tree structure, so that the fault-tolerant capability is provided, the potential problem of crash of the check log chain is avoided, the reliability of the check log is improved, and the usability of data is ensured;
(6) The check values are subjected to reversible logic calculation multiple times and stored in a binary tree structure without adding extra storage space; only a very small number of check values need to be read during data recovery, which markedly improves the data recovery rate and optimizes the recovery process;
(7) The check log based on the binary tree structure supports dynamic insertion, deletion and search of the check log, realizes quick indexing of the check log, and improves the data recovery rate;
(8) The invention generates a metadata record for the data stored in the binary tree log file; the metadata records are used to index the check logs and to represent the structure of the binary tree, which greatly saves space and lays a foundation for data recovery.
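Effect (6) states that only a few check values need to be read during recovery. One way to see why is the following Python sketch, which rests on assumptions not spelled out at this point in the text: the check value at (height h, number k) is taken to equal BT((k-1)·2^h) XOR BT(k·2^h), which matches the node definitions given in Example 1, and recovery is taken to follow the binary decomposition of the target number t. The names `recovery_nodes`, `node_value` and `recover` are hypothetical, and 32-bit integers stand in for data blocks.

```python
import random


def recovery_nodes(t: int) -> list:
    """(height, number) check values whose XOR with BT(0) yields BT(t).

    Assumption: node (h, k) stores BT((k-1)*2**h) XOR BT(k*2**h).
    """
    path, pos = [], 0
    for height in range(t.bit_length() - 1, -1, -1):
        span = 1 << height
        if t & span:                       # this power of two is part of t
            path.append((height, pos // span + 1))
            pos += span                    # advance to the end of that node
    return path


# Toy data: BT(0)..BT(12) as random 32-bit integers instead of 4 KB blocks.
rng = random.Random(0)
bt = [rng.getrandbits(32) for _ in range(13)]


def node_value(height: int, number: int) -> int:
    """Check value of node (height, number) under the assumption above."""
    return bt[(number - 1) << height] ^ bt[number << height]


def recover(t: int) -> int:
    """XOR the initial data with the selected check values."""
    data = bt[0]
    for height, number in recovery_nodes(t):
        data ^= node_value(height, number)
    return data
```

For t = 7 the sketch selects (2, 1), (1, 3) and (0, 7), so three check values are XORed with BT(0) rather than the seven that a linear log chain would require. Every selected node has an odd number, which is consistent with the odd-numbered nodes being the ones retained in the example logs.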
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a flowchart of a CDP backup recovery method based on a binary tree log in embodiment 1 of the present invention.
Fig. 2 is a schematic structural diagram of a metadata record in embodiment 1 of the present invention.
Fig. 3 is a schematic structural diagram of a binary tree structured log at time T (7) in embodiment 1 of the present invention.
Fig. 4 is a schematic structural diagram of a binary tree structured log at time T (8) in embodiment 1 of the present invention.
Fig. 5 is a schematic structural diagram of a binary tree structured log at time T (9) in embodiment 1 of the present invention.
Fig. 6 is a schematic structural diagram of a binary tree structured log at time T (10) in embodiment 1 of the present invention.
Fig. 7 is a schematic structural diagram of a binary tree structured log at time T (11) in embodiment 1 of the present invention.
Fig. 8 is a schematic structural diagram of a binary tree structured log at time T (12) in embodiment 1 of the present invention.
Fig. 9 is a diagram of memory space consumption during data backup in embodiment 1 of the present invention.
Fig. 10 is a graph showing the time consumption of data recovery with a data block size of 4 KB in embodiment 1 of the present invention.
Fig. 11 is a graph showing the time consumption of data recovery with a data block size of 64 KB in embodiment 1 of the present invention.
Fig. 12 is a structural diagram of a CDP system based on binary tree logs according to embodiment 2 of the present invention.
Fig. 13 is a structural diagram of an initialization module in embodiment 2 of the present invention.
Fig. 14 is a structural diagram of a metadata recording module in embodiment 2 of the present invention.
Fig. 15 is a structural diagram of a binary tree updating module in embodiment 2 of the present invention.
Fig. 16 is a block diagram of an index calculation module in embodiment 2 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Interpretation of terms:
(1) Initial data: data on the data block at the beginning of continuous data protection.
(2) Original data: the data present on a data block immediately before changed data is written to that block.
(3) Changed data: the data written by a write operation when it occurs; this is the data that needs to be protected for data recovery.
(4) Data block: the basic unit of an I/O operation on data.
(5) Check log: the result of compressing and encoding the check value obtained from the XOR calculation.
(6) Recovery check value: the check value required for data recovery, calculated from the recovery time point.
(7) Recovery time point: any time point between the start of CDP data backup and the current time point.
Example 1
In one embodiment, as shown in fig. 1, a CDP backup recovery method based on binary tree log is provided, the method includes:
S001 backs up and initializes the initial data.
Optionally, step S001 includes:
S0101, acquiring initial data and backing it up;
S0102, creating a binary tree log file and a metadata record file;
S0103, constructing a binary tree structured log according to the initial data, the binary tree log file and the metadata record file.
It is worth to be noted that the binary tree log file is used for storing a check log, and the check log is generated by compressing and encoding check values; the metadata record file is used for storing metadata records, including metadata records of all child nodes and parent nodes in the binary tree structured log. The binary tree structured log is composed of the initial data, the check log and the metadata record together.
S002 monitors and intercepts all write operations.
S003, judging whether the write operation is a recovery write operation or not, and if not, executing a step S004; if yes, step S012 is executed.
S004, acquiring a current time point T (n), reading original data BT (n-1) on a data block B, and writing changed data BT (n) into the data block B, wherein the data block B is a write operation coverage data block.
For ease of understanding, n denotes the number of the check value, T is a function of n, and the current time point T(n) is the time as seen by the user.
S005, performing an exclusive-or calculation on the original data BT(n-1) and the changed data BT(n), and adding a timestamp to the calculation result to generate a check value PT(n).
It should be noted that the XOR calculation, also called a half-add calculation, is a reversible logic calculation. For each pair of bits, the result is 1 if the two bits differ and 0 if they are the same. The check value PT(n) is then compressed and re-encoded to generate a check log, which stores only the non-zero part of the calculation result.
Because only the data difference obtained by the XOR calculation is stored, the actual content of the data does not need to be transmitted directly, which protects the confidentiality of the data. Generally, only about 20% of the data protected by a CDP system changes each day, and each data block changes 5-10 times on average, so most of a check value obtained after the XOR calculation is 0, which greatly improves the compression ratio of the data and reduces the storage overhead.
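The XOR step and the idea of keeping only the non-zero part can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the text does not specify the compression encoding, so the (offset, run) scheme in `encode_nonzero` is a hypothetical stand-in, the timestamp is omitted, and all function names are illustrative.

```python
def xor_blocks(old: bytes, new: bytes) -> bytes:
    """Bytewise XOR of two equally sized data blocks."""
    if len(old) != len(new):
        raise ValueError("blocks must have the same length")
    return bytes(a ^ b for a, b in zip(old, new))


def encode_nonzero(check: bytes) -> list:
    """Keep only the non-zero runs of a check value as (offset, run) pairs.

    Hypothetical encoding; the source only says the non-zero part is stored.
    """
    runs, start = [], None
    for i, byte in enumerate(check):
        if byte and start is None:
            start = i                          # a non-zero run begins
        elif not byte and start is not None:
            runs.append((start, check[start:i]))
            start = None                       # the run ended just before i
    if start is not None:
        runs.append((start, check[start:]))    # run reaches the block end
    return runs


def decode_nonzero(runs: list, size: int) -> bytes:
    """Rebuild the full check value from its non-zero runs."""
    out = bytearray(size)
    for offset, run in runs:
        out[offset:offset + len(run)] = run
    return bytes(out)


BLOCK = 4096
bt_prev = bytes(BLOCK)                         # original data BT(n-1)
changed = bytearray(bt_prev)
changed[100:104] = b"\x01\x02\x03\x04"         # a small 4-byte change
bt_new = bytes(changed)                        # changed data BT(n)

check = xor_blocks(bt_prev, bt_new)            # check value without timestamp
log_entry = encode_nonzero(check)              # compact check log entry
```

Because XOR is reversible, `xor_blocks(bt_prev, decode_nonzero(log_entry, BLOCK))` gives back `bt_new`, and `xor_blocks(check, bt_new)` gives back `bt_prev`. In this toy case a 4096-byte block reduces to a single 4-byte run.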
S006 generates a metadata record MT (n) according to the binary tree structure log and the check value PT (n).
Optionally, step S006 includes:
S0601, according to the binary tree structured log, acquiring the height at which the check value PT(n) is located in the binary tree structured log, wherein the height of the binary tree is counted layer by layer from the bottom up starting from the leaf nodes, the result of the XOR calculation on the original data and the changed data is a leaf node of the binary tree structured log, and the height of a leaf node is 0.
S0602, according to the check value PT(n), acquiring the number n of the check value PT(n), wherein the binary tree structured log numbers all nodes located at the same height sequentially from 1, and the number n indicates that the check value PT(n) is the nth node among all nodes located at the same height as the check value PT(n).
It will be appreciated that a check value can be uniquely determined by a (height, number) pair. For ease of understanding, a (height, number) pair is used throughout to describe a check value.
S0603, obtaining basic information according to the binary tree structured log and the check value PT(n), wherein the basic information comprises: the current time point T(n), the number of the data block covered by the write operation, the number of the check log generated from the check value PT(n), the height of that check log, the position offset of that check log in the binary tree log file, and the position offset, in the metadata record file, of the metadata record of the parent node of that check log.
S0604 generates a metadata record MT (n) based on the basic information.
Fig. 2 is a schematic diagram of the structure of a metadata record, whose format is:
meta_record <timestamp, block_num, seq_num, height, log_offset, father_meta_offset>
wherein timestamp is the timestamp at which the changed data was generated; block_num is the data block number, which uniquely identifies each data block; seq_num is the number of the check log; height is the height of the check log; log_offset is the position offset of the check log in the binary tree log file; and father_meta_offset is the position offset, in the metadata record file, of the metadata record of the parent node of the check log.
For ease of understanding, an operation example is provided: the size of a data block is set to 4 KB, and a write operation occurs on the data block with block number 9 at time T(5). The changed data BT(5) written by the write operation is intercepted, the original data BT(4) on the data block with block number 9 is read, and BT(4) and BT(5) are XORed to obtain the check value (0, 5). The check log obtained by compressing and encoding the check value (0, 5) is stored in the binary tree log file at position offset 4096 × 8, and no parent node has been generated at this moment. The metadata record MT(5) is <T(5), 9, 5, 0, 4096 × 8, 0>.
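The record format and the MT(5) example can be mirrored in a small Python structure. This is only an illustrative sketch: the field names come from the meta_record format above, while the field types, the placeholder string used for the timestamp, and the helper `node_key` are assumptions.

```python
from typing import NamedTuple


class MetaRecord(NamedTuple):
    """One metadata record, with fields in the order of the meta_record format."""
    timestamp: str            # placeholder type; the real timestamp format is not given
    block_num: int            # unique number of the data block
    seq_num: int              # number of the check log at its height
    height: int               # height of the check log (leaf = 0)
    log_offset: int           # offset of the check log in the binary tree log file
    father_meta_offset: int   # offset of the parent's record in the metadata file


def node_key(record: MetaRecord) -> tuple:
    """Return the (height, number) pair that identifies the check value."""
    return (record.height, record.seq_num)


# MT(5) from the example: block 9, leaf number 5, no parent node yet.
mt5 = MetaRecord("T(5)", 9, 5, 0, 4096 * 8, 0)
```

Here `node_key(mt5)` is `(0, 5)`, the pair used in the text to name the check value, and `mt5.log_offset` is 32768.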
It should be noted that the basic information contained in the metadata record makes it possible not only to index a check value quickly, but also to obtain the metadata records and check values of all its parent nodes from the height and number of a single check value.
S007 judges whether the check value PT (n) is the first check value PT (1), if not, step S008 is executed; if yes, the steps S002 to S006 are repeatedly executed.
S008, obtaining a check value PT(n-1) according to the metadata record MT(n).
Optionally, step S008 includes:
S0801, acquiring the number n of the check value PT(n) according to the metadata record MT(n);
It is worth noting that a metadata record consists of a timestamp, a block number, a number, a height, a check log position offset and a parent node metadata record offset. The number n can therefore be retrieved quickly from the metadata record MT(n), which is more convenient and faster than obtaining the number n from the check value PT(n) as in step S0602.
S0802, subtracting one from the number n to obtain the metadata record MT(n-1) with the number (n-1);
S0803, obtaining a check value PT(n-1) according to the metadata record MT(n-1), wherein the check value PT(n-1) is the check value immediately preceding the check value PT(n).
S009 judges whether n is an odd number, if not, the step S010 is executed; if yes, go to step S011.
Optionally, step S009 includes:
S0901, judging, according to the number n, whether the check value PT(n) is an odd-numbered node among all nodes at the same height as the check value PT(n);
S0902, if not, executing step S010; if yes, executing step S011.
S010, performing an XOR calculation on the check value PT(n) and the check value PT(n-1) to obtain a check value P(n/2), that is, the check value numbered n/2 at the next height up, and then repeating steps S006 and S007 in sequence.
S011, updating the binary tree structured log according to the metadata record MT(n), and executing step S013.
Optionally, step S011 includes:
S0111, acquiring the number n of the check value PT(n) according to the metadata record MT(n);
S0112, adding one to the number n, and judging whether (n+1) is a multiple of 4;
S0113, if not, deleting the check value PT(n-1) and the metadata record MT(n-1) of the check value PT(n-1), and executing step S013;
S0114, if yes, subtracting one from the number (n-1) to obtain the metadata record MT(n-2) with the number (n-2);
S0115, obtaining a check value PT(n-2) according to the metadata record MT(n-2), wherein the check value PT(n-2) is the check value immediately preceding the check value PT(n-1);
S0116, taking the result of the XOR calculation on the check value PT(n-2) and the check value PT(n-1) as a parent node;
S0117, taking the check value PT(n-2) as the left child node of the parent node, and updating the metadata record MT(n-2);
S0118, taking the check value PT(n) as the right child node of the parent node, and updating the metadata record MT(n);
S0119, deleting the check value PT(n-1) and the metadata record MT(n-1), and executing step S013.
It should be noted that, because a binary tree is a non-linear structure, storing it directly may increase the storage overhead. In the binary tree log, the relationships between the check logs are instead recorded through the metadata records, which keeps operation simple and management convenient and reduces the consumption of additional storage space.
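The effect of steps S006 to S011 on which nodes the log keeps can be traced with a short Python sketch. It is a structural simulation only and makes simplifying assumptions: it tracks (height, number) pairs rather than real check values, ignores metadata offsets and parent links, and treats both branches of step S011 as deleting the previous node at the same height, as steps S0113 and S0119 state. The names `insert_leaf` and `log_after` are hypothetical.

```python
def insert_leaf(nodes: set, n: int) -> None:
    """Add the n-th leaf check value and apply steps S006-S011 (structure only)."""
    height, number = 0, n
    nodes.add((height, number))
    while number != 1:                       # S007: first node at its height?
        if number % 2 == 0:                  # S009/S010: even, so build the parent
            height, number = height + 1, number // 2
            nodes.add((height, number))
        else:                                # S011: odd, drop the previous node
            nodes.discard((height, number - 1))
            break


def log_after(writes: int) -> list:
    """Sorted (height, number) pairs kept in the log after `writes` writes."""
    nodes = set()
    for n in range(1, writes + 1):
        insert_leaf(nodes, n)
    return sorted(nodes)
```

Under these assumptions, `log_after(7)` returns the seven check values listed for Fig. 3 and `log_after(8)` returns the eleven listed for Fig. 4, with the even-numbered nodes removed as soon as their odd-numbered successor arrives.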
To facilitate understanding of steps S005 to S011, 5 operation examples are provided:
The size of one data block is set to 4 KB, and the length of one metadata record is set to 10 B. Figs. 3-8 illustrate the 5 operation examples. Each check value is described by a (height, number) pair and is obtained by an XOR calculation on the two data versions listed:
P1 = (0, 1): BT(0) and BT(1);
P3 = (0, 3): BT(2) and BT(3);
P5 = (0, 5): BT(4) and BT(5);
P7 = (0, 7): BT(6) and BT(7);
P8 = (0, 8): BT(7) and BT(8);
P9 = (0, 9): BT(8) and BT(9);
P10 = (0, 10): BT(9) and BT(10);
P11 = (0, 11): BT(10) and BT(11);
Q1 = (1, 1): BT(0) and BT(2);
Q3 = (1, 3): BT(4) and BT(6);
Q4 = (1, 4): BT(6) and BT(8);
Q5 = (1, 5): BT(8) and BT(10);
Q6 = (1, 6): BT(10) and BT(12);
R1 = (2, 1): BT(0) and BT(4);
R2 = (2, 2): BT(4) and BT(8);
R3 = (2, 3): BT(8) and BT(12);
J1 = (3, 1): BT(0) and BT(8).
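The definitions above follow a single pattern, which can be written compactly. The notation $C_{h,k}$ for the check value at (height $h$, number $k$) is introduced here for illustration only and does not appear in the original text; $\oplus$ denotes bitwise XOR.

```latex
C_{h,k} = BT\big((k-1)\,2^{h}\big) \oplus BT\big(k\,2^{h}\big),
\qquad
C_{h+1,k} = C_{h,2k-1} \oplus C_{h,2k}
```

The second identity holds because the shared term $BT\big((2k-1)\,2^{h}\big)$ cancels under XOR. For example, Q4 = (1, 4) is $BT(6) \oplus BT(8)$, which equals P7 $\oplus$ P8, and J1 = (3, 1) is $BT(0) \oplus BT(8)$, which equals R1 $\oplus$ R2.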
fig. 3 shows a binary tree structure log after occurrence of changed data BT (7) on a data block with block number 9, which includes an initial data BT (0), the changed data BT (7), a plurality of check values and an exclusive or calculation process of the check values, and the sequence of generation and storage of each check value is described by a dotted line and an arrow. At this time, the binary tree structured log has check values (0, 1), (0, 3), (0, 5), (0, 7), (1, 1), (1, 3), (2, 1).
At time T(8), changed data BT(8) occurs on the data block with block number 9. The changed data BT(7) and the changed data BT(8) are XORed and timestamped to obtain the check value (0, 8), and the metadata record <T(8), 9, 8, 0, 4096 × 12, 0> is generated.
The check value (0, 8) has an even number, so the check value (0, 7) is obtained and XORed with the check value (0, 8) to obtain the check value (1, 4), and the metadata record <T(8), 9, 4, 1, 4096 × 13, 0> is generated. The check value (1, 4) has an even number, so the check value (1, 3) is obtained and XORed with the check value (1, 4) to obtain the check value (2, 2), and the metadata record <T(8), 9, 2, 2, 4096 × 14, 0> is generated. The check value (2, 2) has an even number, so the check value (2, 1) is obtained and XORed with the check value (2, 2) to obtain the check value (3, 1), and the metadata record <T(8), 9, 1, 3, 4096 × 15, 0> is generated. The check value (3, 1) is the first check value at height 3, so the process ends.
Fig. 4 shows a binary tree structured log after the occurrence of the changed data BT (8), which includes the initial data BT (0), the changed data BT (8), a plurality of check values and an exclusive or calculation process of the check values, and the sequence of generating and storing each check value is described by a dotted line and an arrow. At this time, the binary tree structured log has check values (0, 1), (0, 3), (0, 5), (0, 7), (0, 8), (1, 1), (1, 3), (1, 4), (2, 1), (2, 2), (3, 1).
At time T(9), changed data BT(9) occurs on the data block with block number 9. The changed data BT(8) and the changed data BT(9) are XORed and timestamped to obtain the check value (0, 9), and the metadata record <T(9), 9, 9, 0, 4096 × 16, 0> is generated.
The check value (0, 9) has an odd number, and the number 9 plus 1 is not a multiple of 4, so the check value (0, 8) and its metadata record <T(8), 9, 8, 0, 4096 × 12, 0> are deleted, and the process ends.
Fig. 5 shows a binary tree structured log after the occurrence of the changed data BT (9), which includes the initial data BT (0), the changed data BT (9), a plurality of check values, and an exclusive or calculation process of the check values, and an arrow depicts an order of generation of each check value. At this time, the binary tree structured log has check values (0, 1), (0, 3), (0, 5), (0, 7), (0, 9), (1, 1), (1, 3), (1, 4), (2, 1), (2, 2), (3, 1).
At time T (10), changed data BT (10) occurs on the data block with block number 9. The changed data BT (9) and the changed data BT (10) are XORed and the result is time-stamped to obtain the check value (0, 10), and a metadata record <T (10), 9, 10, 0, 4096 x 17, 0> is generated.
The check value (0, 10) has an even number, so the check value (0, 9) is obtained and XORed with the check value (0, 10) to obtain the check value (1, 5), and a metadata record <T (10), 9, 5, 1, 4096 x 18, 0> is generated. The check value (1, 5) has an odd number, and the number 5 plus 1 is not a multiple of 4, so the check value (1, 4) and its metadata record <T (8), 9, 4, 1, 4096 x 13, 0> are deleted, and the process ends.
Fig. 6 shows a binary tree structured log after the occurrence of the changed data BT (10), which includes the initial data BT (0), the changed data BT (10), a plurality of check values and an exclusive or calculation process of the check values, and the sequence of generating and storing each check value is described by a dotted line and an arrow. At this time, the binary tree structured log has check values (0, 1), (0, 3), (0, 5), (0, 7), (0, 9), (0, 10), (1, 1), (1, 3), (1, 5), (2, 1), (2, 2), (3, 1).
At time T (11), changed data BT (11) occurs on the data block with block number 9. The changed data BT (10) and the changed data BT (11) are XORed and the result is time-stamped to obtain the check value (0, 11), and a metadata record <T (11), 9, 11, 0, 4096 x 19, 0> is generated.
The check value (0, 11) has an odd number, and the number 11 plus 1 is a multiple of 4. The check value (1, 5) is therefore the parent node. The check value (0, 9) is its left child node, and the metadata record of the check value (0, 9) is updated to <T (9), 9, 9, 0, 4096 x 16, 10 x 18>. The check value (0, 11) is its right child node, and the metadata record of the check value (0, 11) is updated to <T (11), 9, 11, 0, 4096 x 19, 10 x 18>. The check value (0, 10) and its metadata record <T (10), 9, 10, 0, 4096 x 17, 0> are then deleted, and the process ends.
Fig. 7 shows a binary tree structured log after the occurrence of the changed data BT (11), which includes the initial data BT (0), the changed data BT (11), a plurality of check values and an exclusive or calculation process of the check values, and the sequence of generating and storing each check value is described by a dotted line and an arrow. At this time, the binary tree structured log has check values (0, 1), (0, 3), (0, 5), (0, 7), (0, 9), (0, 11), (1, 1), (1, 3), (1, 5), (2, 1), (2, 2), (3, 1).
At time T (12), changed data BT (12) occurs on the data block with block number 9. The changed data BT (11) and the changed data BT (12) are XORed and the result is time-stamped to obtain the check value (0, 12), and a metadata record <T (12), 9, 12, 0, 4096 x 20, 0> is generated.
The check value (0, 12) has an even number, so the check value (0, 11) is obtained and XORed with the check value (0, 12) to obtain the check value (1, 6), and a metadata record <T (12), 9, 6, 1, 4096 x 21, 0> is generated. The check value (1, 6) has an even number, so the check value (1, 5) is obtained and XORed with the check value (1, 6) to obtain the check value (2, 3), and a metadata record <T (12), 9, 3, 2, 4096 x 22, 0> is generated. The check value (2, 3) has an odd number, and the number 3 plus 1 is a multiple of 4. The check value (3, 1) is therefore the parent node. The check value (2, 1) is its left child node, and the metadata record of the check value (2, 1) is updated to <T (4), 9, 1, 2, 4096 x 7, 10 x 15>. The check value (2, 3) is its right child node, and the metadata record of the check value (2, 3) is updated to <T (12), 9, 3, 2, 4096 x 22, 10 x 15>. The check value (2, 2) and its metadata record <T (8), 9, 2, 2, 4096 x 14, 0> are then deleted, and the process ends.
Fig. 8 shows a binary tree structured log after the occurrence of the changed data BT (12), which includes the initial data BT (0), the changed data BT (12), a plurality of check values and an exclusive or calculation process of the check values, and the sequence of generating and storing each check value is described by a dotted line and an arrow. At this time, the binary tree structured log has check values (0, 1), (0, 3), (0, 5), (0, 7), (0, 9), (0, 11), (0, 12), (1, 1), (1, 3), (1, 5), (1, 6), (2, 1), (2, 3), (3, 1).
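The walk-through from T (8) to T (12) can be replayed with a short simulation. The following Python sketch is illustrative only and is not taken from the patent: the names `xor_bytes`, `apply_write` and `replay`, the in-memory dictionary standing in for the binary tree log file, and the random 16-byte blocks are all assumptions. It tracks which check values (height, number) remain stored after each write.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def apply_write(log: dict, n: int, old: bytes, new: bytes) -> None:
    """Record the n-th write (n starts at 1) in `log`, keyed by (height, number)."""
    height, number = 0, n
    value = xor_bytes(old, new)              # leaf check value (0, n)
    while True:
        log[(height, number)] = value
        if number == 1:                      # first check value at this height: stop
            return
        if number % 2 == 1:                  # odd: the even predecessor is deleted
            del log[(height, number - 1)]
            return
        # even: combine with the odd predecessor to form the next height
        value = xor_bytes(log[(height, number - 1)], value)
        height, number = height + 1, number // 2

def replay(writes: int, block_size: int = 16) -> dict:
    """Apply `writes` random changes to one block and return the resulting log."""
    log, current = {}, os.urandom(block_size)
    for n in range(1, writes + 1):
        changed = os.urandom(block_size)
        apply_write(log, n, current, changed)
        current = changed
    return log

if __name__ == "__main__":
    for n in (8, 9, 10, 11, 12):
        print(n, sorted(replay(n)))
```

The sketch omits the metadata records. In the text, the multiple-of-4 test only decides whether parent links are written; the even-numbered predecessor is deleted in both branches, which is why a single deletion suffices here.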
S012, calculating and indexing the recovery check values according to the recovery time point and the updated binary tree structure log, performing XOR calculation on the recovery check values and the initial data to obtain the recovery data as the calculation result, and then executing step S013;
optionally, step S012 includes:
S0121, acquiring a recovery time point T (m), and creating recovery data for the recovery time point T (m);
S0122, according to the recovery time point T (m) and the updated binary tree structure log, judging whether the check value P (m) with height 0 and number m exists; if not, setting the recovery number r to (m + 1); if yes, setting the recovery number r to m;
It can be understood that the binary tree log file stores only the odd-numbered check values, and data recovery can be completed using the odd-numbered check values alone, which saves half of the storage space.
S0123, obtaining the metadata record MT (r) according to the recovery number r;
S0124, according to the metadata record MT (r), obtaining the check value PT (r) and the parent node metadata record of the check value PT (r), wherein the height of the parent node metadata record is the height of the check value PT (r) plus 1;
S0125, according to the parent node metadata record of the check value PT (r), judging whether the parent node check value of the check value PT (r) itself has a parent node; if not, ending this step; if yes, repeating step S0124 until all parent node metadata records of the check value PT (r) are obtained;
It should be understood that the recovery process needs only some of the parent node check values of the check value PT (r). However, because each metadata record stores the offset of its parent node's metadata record, repeatedly following these offsets makes it possible to index quickly to the recovery check values.
S0126, calculating a recovery check value according to the number m;
It is worth noting that the calculation of the recovery check values is based on the binary tree structured log. First, find the largest non-negative integer n such that the number m is divisible by 2^n; the check value (n, m/2^n) is a recovery check value. If the number m equals 2^n, the calculation of the recovery check values ends; if the number m is not equal to 2^n, set the number m to (m - 2^n) and repeat the calculation of the largest integer n until the calculation ends.
S0127, quickly indexing to the recovery check values according to the metadata record MT (r) and all parent node metadata records of the check value PT (r);
S0128, performing XOR calculation on the recovery check values and the initial data to obtain the recovery data;
S0129, executing step S013.
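The calculation in step S0126 amounts to repeatedly stripping the lowest set bit of m. The following Python sketch is an illustration under that reading; the function name `recovery_check_values` is hypothetical and does not appear in the patent.

```python
def recovery_check_values(m: int) -> list[tuple[int, int]]:
    """Return the (height, number) pairs whose XOR with BT(0) yields BT(m)."""
    if m < 1:
        raise ValueError("m must be a positive write number")
    result = []
    while m > 0:
        n = (m & -m).bit_length() - 1   # largest n such that 2**n divides m
        result.append((n, m >> n))      # check value (n, m / 2**n)
        m -= 1 << n                     # becomes 0 exactly when m was 2**n
    return result

print(recovery_check_values(20))  # [(2, 5), (4, 1)]
print(recovery_check_values(31))  # [(0, 31), (1, 15), (2, 7), (3, 3), (4, 1)]
```

Because m / 2^n is always odd for the largest such n, every pair returned refers to an odd-numbered check value, which is consistent with the log keeping only odd-numbered check values.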
For ease of understanding, an operation example is provided:
The recovery time point is acquired as T (20), the initial data BT (0) is acquired, and the recovery data BT (20) is created with every bit set to 0. Since the check value (0, 20) does not exist in the binary tree structured log, the recovery number is taken as 21. The check value (0, 21) and its metadata record are acquired, where the check value (0, 21) is obtained by XORing BT (21) and BT (20). All parent node check values (1, 11), (2, 5), (3, 3) and (4, 1) of the check value (0, 21) can be quickly indexed according to the metadata record of the check value (0, 21).
The largest n such that the number 20 is divisible by 2^n is 2, so the check value (2, 5) is a recovery check value. The number 20 is not equal to 2^2, so the number is set to 16. The largest n such that the number 16 is divisible by 2^n is 4, so the check value (4, 1) is a recovery check value. The number 16 equals 2^4, so all recovery check values have been found: the check values (2, 5) and (4, 1). The check value (2, 5) is obtained by XORing BT (20) and BT (16), and the check value (4, 1) is obtained by XORing BT (16) and BT (0). Therefore, XORing the initial data BT (0) with all recovery check values yields BT (20), which is the recovery data.
For ease of understanding, a second operation example is provided:
The recovery time point is acquired as T (31), the initial data BT (0) is acquired, and the recovery data BT (31) is created with every bit set to 0. Since the check value (0, 31) exists in the binary tree structured log, the recovery number is taken as 31. The check value (0, 31) and its metadata record are acquired, where the check value (0, 31) is obtained by XORing BT (31) and BT (30). All parent node check values (1, 15), (2, 7), (3, 3) and (4, 1) of the check value (0, 31) can be quickly indexed according to the metadata record of the check value (0, 31).
The largest n such that the number 31 is divisible by 2^n is 0, so the check value (0, 31) is a recovery check value. The number 31 is not equal to 2^0, so the number is set to 30. The largest n such that 30 is divisible by 2^n is 1, so the check value (1, 15) is a recovery check value. The number 30 is not equal to 2^1, so the number is set to 28. The largest n such that 28 is divisible by 2^n is 2, so the check value (2, 7) is a recovery check value. The number 28 is not equal to 2^2, so the number is set to 24. The largest n such that 24 is divisible by 2^n is 3, so the check value (3, 3) is a recovery check value. The number 24 is not equal to 2^3, so the number is set to 16. The largest n such that 16 is divisible by 2^n is 4, so the check value (4, 1) is a recovery check value. The number 16 equals 2^4, so all recovery check values have been found: the check values (0, 31), (1, 15), (2, 7), (3, 3) and (4, 1). The check value (0, 31) is obtained by XORing BT (31) and BT (30), the check value (1, 15) by XORing BT (30) and BT (28), the check value (2, 7) by XORing BT (28) and BT (24), the check value (3, 3) by XORing BT (24) and BT (16), and the check value (4, 1) by XORing BT (16) and BT (0). Therefore, XORing the initial data BT (0) with all recovery check values yields BT (31), which is the recovery data.
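Both examples rest on the fact that the XOR terms telescope. The Python sketch below checks this end to end on random data. It is an illustration only: the helper names, the 32-byte blocks and the in-memory `history` list are assumptions, and the log is built directly from the identity used in the examples, namely that check value (h, j) equals BT (j x 2^h) XOR BT ((j - 1) x 2^h), rather than by the incremental backup procedure.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def build_odd_log(history: list) -> dict:
    """Keep only odd-numbered check values, keyed by (height, number)."""
    total = len(history) - 1                 # number of writes after BT(0)
    log, height = {}, 0
    while (1 << height) <= total:
        span = 1 << height                   # writes covered by one node
        for number in range(1, total // span + 1, 2):
            log[(height, number)] = xor_bytes(
                history[number * span], history[(number - 1) * span])
        height += 1
    return log

def restore(initial: bytes, log: dict, m: int) -> bytes:
    """Rebuild BT(m) from BT(0) and the recovery check values."""
    data = bytes(len(initial))               # recovery buffer, every bit 0
    data = xor_bytes(data, initial)
    while m > 0:
        n = (m & -m).bit_length() - 1        # largest n with 2**n dividing m
        data = xor_bytes(data, log[(n, m >> n)])
        m -= 1 << n
    return data

history = [os.urandom(32) for _ in range(32)]    # BT(0) .. BT(31)
log = build_odd_log(history)
print(restore(history[0], log, 20) == history[20])
print(restore(history[0], log, 31) == history[31])
```

Every time point from T (1) to T (31) can be restored from this log even though it contains no even-numbered check value.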
S013 judges whether write operation exists or not, if not, the operation is ended; if yes, the step S002 is repeated.
In the following, we provide a set of comparative experiments to further illustrate the present example, which are as follows:
All experiments are performed on virtual machines started in a virtualization environment, and the experimental environment is built in a local area network with a bandwidth of 1000 Mbps. The client host uses an Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz processor, the operating system is Linux version 3.10.0-1160.el7.x86_64, the memory is 8GB, and the storage space is 256GB. The server host uses an Intel(R) Xeon(R) CPU E5-678 v3 @ 2.50GHz processor, the operating system is Linux version 3.10.0-1160.el7.x86_64, the memory is 32GB, and the storage space is 4TB. The experiment is based on the TPC-C benchmark, which simulates the read and write operations of a database in a real application environment, and the storage space overhead and the data recovery rate are tested through these operations. The host uses BenchmarkSQL, a database benchmark tool with built-in TPC-C test scripts, in version 5.1. A PostgreSQL database, version 9.3, is installed in the system.
In the experiment, BenchmarkSQL simulates 10 warehouses, each corresponding to 100MB of data, with 10 terminals; each terminal runs 10 transactions, for 100 transactions in total. Five data block sizes are used in the experiment: 4K, 8K, 16K, 32K and 64K. The test objects are the CDP of the present embodiment 1 and comparative examples 1, 2 and 3, wherein comparative example 1 is a CDP using the TRAP technology based on a check log chain, comparative example 2 is a CDP using the ST-CDP technology, which periodically inserts snapshots into the check log chain, and comparative example 3 is CDPFGL, which directly stores data changes.
Since the XOR calculation and the compression re-encoding cause additional computation overhead in the system, the time taken by the XOR calculation and the compression re-encoding in embodiment 1 is measured first, and the results are shown in Table 1.
Table 1. Additional calculation time (the table is provided as an image in the original publication)
As can be seen from Table 1, both the XOR calculation and the compression re-encoding take only tens to hundreds of microseconds, which is negligible compared with the time required to intercept a write operation and perform the associated reads and writes.
For each experiment, the CDP backup was run for three hours in the same TPC-C benchmark environment, and the results are shown in Table 2 and Fig. 9.
Table 2. Storage space consumed by backup (the table is provided as an image in the original publication)
Comparing the experimental results of embodiment 1 and comparative examples 1, 2 and 3 with reference to Table 2 and Fig. 9, the storage space consumed by the backup differs little across data block sizes for embodiment 1, comparative example 1 and comparative example 2. In contrast, the storage space consumed by comparative example 3 nearly doubles with each increase in data block size. The check-log-based CDP backup methods can therefore save a large amount of storage space and cope flexibly with data blocks of any size. Comparing the storage space consumed for data blocks of the same size, embodiment 1 differs little from comparative example 1, comparative example 2 consumes about 1.4 times as much as embodiment 1, and comparative example 3 consumes 2.2 to 11 times as much as embodiment 1. It can thus be seen that embodiment 1 reduces the space overhead of data backup.
The recovery experiment was performed with data blocks of 4KB and 64KB. Time points 30, 60, 90, 120, 150 and 180 minutes before the latest time point, at 30-minute intervals, were selected as recovery time points, and data recovery was completed for each. The experimental results are shown in Table 3, Fig. 10 and Fig. 11.
Table 3. Recovery time (the table is provided as an image in the original publication)
Comparing the experimental results of embodiment 1 and comparative examples 1, 2 and 3, the recovery times in Table 3 for different data block sizes show that 64KB data blocks perform better than 4KB data blocks, which indicates that the larger the data block, the faster the recovery. As can be seen from Table 3, Fig. 10 and Fig. 11, as the recovery time point moves farther from the latest time point, the recovery times of comparative examples 1 and 3 increase greatly, whereas those of embodiment 1 and comparative example 2 change little and both stabilize between 200s and 300s. Comparing the recovery time for data blocks of the same size, comparative example 1 takes 6 to 11 times as long as embodiment 1, and comparative example 3 takes 1.3 to 7 times as long as embodiment 1. It can thus be seen that embodiment 1 improves the rate of data recovery.
From table 2 and fig. 9, it can be seen that the consumed storage space of comparative example 1 is slightly smaller than that of example 1, but the recovery performance of example 1 is far better than that of comparative example 1, and the overall performance is better. The embodiment 1 and the comparative example 2 have almost the same recovery performance, but the embodiment 1 consumes about 70% of the space of the comparative example 2, and more storage space can be saved. Meanwhile, in embodiment 1, the recovery time consumed for data recovery of data blocks of different sizes is substantially the same, and the data recovery method has better stability.
The technical scheme of this embodiment provides a CDP backup recovery method based on binary tree logs. Check values are obtained through XOR calculation and time-stamped to generate check logs that store the data changes, which improves the data compression ratio and reduces the storage space overhead. The method does not need to back up all check logs: the relations between check logs are organized in a binary tree structure by means of the metadata records, so that check values can be quickly indexed for data recovery. This makes full use of the storage space, speeds up the indexing of recovery check values, and keeps the recovery time low.
Example 2
In one embodiment, as shown in fig. 12, there is provided a CDP backup and restore system based on binary tree logs, the system comprising:
an initialization module 1001 for backing up initial data and initializing;
a monitoring and interception module 1002, configured to monitor and intercept all write operations;
a first determining module 1003, configured to determine whether the write operation is a recovery write operation, and if not, execute the data backup preparing module 1004; if yes, the index calculation module 1010 is executed;
a data backup preparation module 1004, configured to obtain a current time point T (n), read original data BT (n-1) on a data block B, and write changed data BT (n) into the data block B, where the data block B is a data block covered by the write operation;
an exclusive-or calculation module 1005, configured to perform exclusive-or calculation on the original data BT (n-1) and the changed data BT (n), and add a timestamp to a calculation result to generate a check value PT (n);
a metadata record generating module 1006, configured to generate a metadata record MT (n) according to a binary tree structured log and the check value PT (n);
a second judging module 1007, configured to judge whether the check value PT (n) is the first check value PT (1), and if not, execute the third determining module 1008; if yes, repeat the execution from the monitoring and interception module 1002 through the metadata record generating module 1006;
a third determining module 1008, configured to obtain the check value PT (n-1) according to the metadata record MT (n) and determine whether n is an odd number; if n is not an odd number, perform XOR calculation on the check value PT (n) and the check value PT (n-1), and then repeat the metadata record generating module 1006 and the second judging module 1007 in sequence on the calculation result; if yes, execute the binary tree updating module 1009;
a binary tree updating module 1009, configured to acquire a check value PT (n-2), use a result of performing an exclusive or calculation on the check value PT (n-1) and the check value PT (n-2) as a parent node, update the binary tree structure log, and then execute the fourth determining module 1011;
an index calculation module 1010, configured to calculate and index a recovery check value according to the recovery time point and the updated binary tree structure log, perform XOR calculation on the recovery check value and the initial data to obtain the recovery data as the calculation result, and then execute the fourth determining module 1011;
a fourth determining module 1011, configured to determine whether a write operation exists, and if not, end the process; if yes, repeat the execution of the monitoring and interception module 1002.
Optionally, as shown in fig. 13, on the basis of this embodiment, the initialization module 1001 includes:
a backup initialization unit 10011, configured to obtain initial data and perform backup;
an initialization creating unit 10012 for creating a binary tree log file and a metadata record file;
a binary tree structure log constructing unit 10013, configured to construct a binary tree structure log according to the initial data, the binary tree log file, and the metadata record file.
Optionally, as shown in fig. 14, on the basis of this embodiment, the metadata record generating module 1006 includes:
a height obtaining unit 10061, configured to obtain, according to the binary tree structure log, the height at which the check value PT (n) is located in the binary tree structure log, where the height of the binary tree is accumulated layer by layer from bottom to top starting from the leaf nodes, the result of performing XOR calculation on the original data and the changed data is a leaf node of the binary tree structure log, and the height of a leaf node is 0;
a first number obtaining unit 10062, configured to obtain, according to the check value PT (n), a number n of the check value PT (n), where the binary tree structure log sequentially numbers all nodes located at the same height from 1, and the number n indicates that, in all nodes located at the same height as the check value PT (n), the check value PT (n) is an nth node;
a basic information obtaining unit 10063, configured to obtain basic information according to the binary tree structured log and the check value PT (n), where the basic information includes: a current time point T (n), a write operation coverage data block number, a number of a check value PT (n), a height of the check value PT (n), a position offset of the check value PT (n) on the binary tree log file, and a position offset of a parent node metadata record of the check value PT (n) on the metadata record file;
a check value metadata record generating unit 10064 is configured to generate a metadata record MT (n) according to the basic information.
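The six items of basic information listed above map naturally onto a small record type. The following Python sketch is one possible in-memory representation, not a format defined by the patent: the class and field names are hypothetical, the time point is stored as its index n, and treating a parent offset of 0 as "no parent linked yet" is an assumption drawn from the worked examples, where newly generated records end in 0.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class MetadataRecord:
    time_point: int         # index n of the time point T(n)
    block_number: int       # number of the data block covered by the write
    number: int             # number of the check value at its height
    height: int             # height of the check value (leaf = 0)
    log_offset: int         # position of the check value in the log file
    parent_offset: int = 0  # position of the parent's metadata record; 0 = none (assumed)

    def has_parent(self) -> bool:
        return self.parent_offset != 0

BLOCK_SIZE = 4096

# The record <T(11), 9, 11, 0, 4096 x 19, 0> generated for check value (0, 11).
record = MetadataRecord(time_point=11, block_number=9, number=11,
                        height=0, log_offset=BLOCK_SIZE * 19)

# Linking to a parent rewrites only the last field; 180 is a made-up offset.
linked = replace(record, parent_offset=180)
print(record.has_parent(), linked.has_parent())  # False True
```

A frozen dataclass is used so that an update produces a new record value, mirroring the way the text describes a metadata record being replaced when a child is linked to its parent.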
Optionally, as shown in fig. 15, on the basis of this embodiment, the binary tree updating module 1009 includes:
a second number obtaining unit 10091, configured to obtain the number n of the check value PT (n) according to the metadata record MT (n);
a number operation determination unit 10092, configured to add one to the number n, and determine whether (n + 1) is a multiple of 4;
a first deleting unit 10093, configured to, if not, delete the check value PT (n-1) and the metadata record MT (n-1) of the check value PT (n-1), and execute the fourth determining module;
a third number obtaining unit 10094, configured to, if yes, subtract one from the number (n-1) to obtain the metadata record MT (n-2) with the number (n-2);
a check value obtaining unit 10095, configured to obtain a check value PT (n-2) according to the metadata record MT (n-2), where the check value PT (n-2) is a previous check value of the check value PT (n-1);
a parent node generating unit 10096, configured to use a calculation result obtained by performing xor calculation on the check value PT (n-2) and the check value PT (n-1) as a parent node;
a left child node generating unit 10097, configured to update the metadata record MT (n-2) with the check value PT (n-2) as a left child node of the parent node;
a right child node generating unit 10098, configured to update the metadata record MT (n) with the check value PT (n) as a right child node of the parent node;
a second deleting unit 10099, configured to delete the check value PT (n-1) and the metadata record MT (n-1), and execute the fourth determining module.
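The decision made by units 10092 to 10099 can be summarized in a few lines. The Python sketch below is illustrative: the function name `update_step` and the dictionary it returns are assumptions, and the parent number (n - 1) / 2 follows from the parent being the XOR of the check values numbered (n-2) and (n-1) at the same height.

```python
def update_step(height: int, number: int) -> dict:
    """Describe the binary tree update for an odd-numbered check value."""
    if number % 2 == 0 or number < 3:
        raise ValueError("the update applies to odd numbers greater than 1")
    action = {
        "delete": (height, number - 1),   # even predecessor is always removed
        "parent": None, "left": None, "right": None,
    }
    if (number + 1) % 4 == 0:             # link both odd children to the parent
        action["parent"] = (height + 1, (number - 1) // 2)
        action["left"] = (height, number - 2)
        action["right"] = (height, number)
    return action

print(update_step(0, 9))
# {'delete': (0, 8), 'parent': None, 'left': None, 'right': None}
print(update_step(0, 11))
# {'delete': (0, 10), 'parent': (1, 5), 'left': (0, 9), 'right': (0, 11)}
```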
Optionally, as shown in fig. 16, on the basis of this embodiment, the index calculating module 1010 includes:
a recovery initialization unit 10101 for acquiring a recovery time point T (m), creating recovery data at the recovery time point T (m);
a recovery number obtaining unit 10102, configured to determine whether a check value P (m) with a height of 0 and a number of m exists according to the recovery time point T (m) and the updated binary tree structure log, and if not, take the value of the recovery number r as (m + 1); if yes, taking the value of the recovery number r as m;
a metadata record obtaining unit 10103, configured to obtain a metadata record MT (r) according to the retrieved recovery number r;
a first parent node obtaining unit 10104, configured to obtain a check value PT (r) and a parent node metadata record of the check value PT (r) according to a metadata record MT (r), where a height of the parent node metadata record is a height of the check value PT (r) plus 1;
a second parent node obtaining unit 10105, configured to determine, according to the parent node metadata record of the check value PT (r), whether the parent node check value of the check value PT (r) itself has a parent node, and if not, end the unit; if yes, repeat the previous unit until all parent node metadata records of the check value PT (r) are obtained;
a check value calculation unit 10106 for calculating a recovery check value according to the number m;
a check value indexing unit 10107, configured to quickly index to the recovered check value according to the obtained metadata record MT (r) and the parent node metadata records of all the check values PT (r);
a recovery data calculation unit 10108, configured to perform xor calculation on the recovery check value and the initial data to obtain the recovery data;
an executing unit 10109, configured to execute the fourth determining module.
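The choice of the recovery number and the walk up the parent links performed by units 10102 to 10105 can be sketched as follows. This Python example is an inference from the worked examples rather than a formula stated in the patent: the function names are hypothetical, the parent of an odd-numbered check value (h, j) is assumed to be (h + 1, 2 x floor(j / 4) + 1), and the recovery number rule assumes that even-numbered leaves have already been deleted.

```python
def recovery_number(m: int) -> int:
    """Leaf number r used as the entry point (assumes even leaves are gone)."""
    return m if m % 2 == 1 else m + 1

def parent_of(height: int, number: int) -> tuple[int, int]:
    """Parent of an odd-numbered check value, as linked by the update step."""
    if number % 2 == 0:
        raise ValueError("only odd-numbered check values are linked")
    return (height + 1, 2 * (number // 4) + 1)

def ancestors(height: int, number: int, top_height: int) -> list[tuple[int, int]]:
    """Follow parent links until the given top height is reached."""
    chain = []
    while height < top_height:
        height, number = parent_of(height, number)
        chain.append((height, number))
    return chain

print(ancestors(0, recovery_number(20), top_height=4))
# [(1, 11), (2, 5), (3, 3), (4, 1)]
print(ancestors(0, recovery_number(31), top_height=4))
# [(1, 15), (2, 7), (3, 3), (4, 1)]
```

In the stored log the walk would stop when a metadata record has no parent offset; the explicit `top_height` argument stands in for that condition here.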
The technical scheme of this embodiment provides a CDP backup and recovery system based on a binary tree log. By organizing the check logs in a binary tree structure, it addresses the excessive storage consumption and low recovery efficiency that arise as the amount of backup data in a CDP system grows over time. The system does not store the data contents directly but stores the check values obtained by XOR calculation, which provides confidentiality protection for the data. It uses a binary tree structured log instead of a log chain, which provides fault tolerance for data recovery, avoids the problem of a broken log chain, and provides integrity and availability protection for the data.
Example 3
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the binary tree log based CDP backup restoration method of embodiment 1 described above.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The above embodiments express only several implementations of the present invention, and although their description is specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (8)

1. A CDP backup recovery method based on binary tree log is characterized by comprising the following steps:
an initialization step of backing up initial data and performing initialization;
a monitoring and interception step of monitoring and intercepting all write operations;
a first judgment step of judging whether the write operation is a recovery write operation, and if not, executing a data backup preparation step; if yes, executing an index calculation step;
a data backup preparation step of acquiring a current time point T (n), reading original data BT (n-1) on a data block B, and writing changed data BT (n) into the data block B, wherein the data block B is the data block covered by the write operation;
an exclusive-or calculation step of performing exclusive-or calculation on the original data BT (n-1) and the changed data BT (n), and adding a time stamp to the calculation result to generate a check value PT (n);
a metadata record generation step of generating a metadata record MT (n) according to a binary tree structure log and the check value PT (n);
a second judgment step of judging whether the check value PT (n) is the first check value PT (1), and if not, executing a third judgment step; if yes, repeating the execution from the monitoring and interception step through the metadata record generation step;
a third judgment step of obtaining a check value PT (n-1) according to the metadata record MT (n) and judging whether n is an odd number; if not, performing exclusive-or calculation on the check value PT (n) and the check value PT (n-1), and then repeating the metadata record generation step and the second judgment step in sequence on the calculation result; if yes, executing a binary tree updating step;
a binary tree updating step of updating the binary tree structure log according to the metadata record MT (n), and then executing a fourth judgment step;
an index calculation step of calculating and indexing a recovery check value according to the recovery time point and the updated binary tree structure log, performing exclusive-or calculation on the recovery check value and the initial data to obtain recovery data as the calculation result, and then executing the fourth judgment step;
a fourth judgment step of judging whether a write operation exists; if not, ending; if yes, repeating the monitoring and interception step;
wherein the binary tree update step comprises:
acquiring the number n of the check value PT (n) according to the metadata record MT (n);
adding one to the number n, and judging whether the number (n + 1) is a multiple of 4;
if not, deleting the check value PT (n-1) and its metadata record MT (n-1), and executing the fourth judgment step;
if so, subtracting one from the number (n-1) to obtain the metadata record MT (n-2) numbered (n-2);
acquiring the check value PT (n-2) according to the metadata record MT (n-2), wherein the check value PT (n-2) is the check value immediately preceding the check value PT (n-1);
taking the result of the exclusive-or calculation on the check value PT (n-2) and the check value PT (n-1) as a parent node;
updating the metadata record MT (n-2) with the check value PT (n-2) as the left child node of the parent node;
updating the metadata record MT (n) with the check value PT (n) as the right child node of the parent node;
deleting the check value PT (n-1) and the metadata record MT (n-1), and executing the fourth judgment step.
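The exclusive-or calculation step of claim 1 can be illustrated with a minimal Python sketch. It assumes a fixed-size data block held as bytes; the names CheckValue, xor_bytes and make_check_value are hypothetical and do not come from the patent. The sketch only shows why a check value is enough to move between the original and the changed data: XOR is its own inverse.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CheckValue:
    """Hypothetical container for PT(n): a timestamp plus the XOR delta."""
    timestamp: float
    delta: bytes


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_check_value(original: bytes, changed: bytes,
                     timestamp: Optional[float] = None) -> CheckValue:
    """XOR BT(n-1) with BT(n) and attach a timestamp (assumed layout)."""
    ts = time.time() if timestamp is None else timestamp
    return CheckValue(ts, xor_bytes(original, changed))


if __name__ == "__main__":
    before = b"\x0f\xf0"
    after = b"\xff\x00"
    pt = make_check_value(before, after, timestamp=1.0)
    # Applying the delta to the original data yields the changed data.
    print(xor_bytes(before, pt.delta) == after)
```

Because the same delta also maps the changed data back to the original, a chain of such deltas can be combined by XOR, which is the property the later steps rely on.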
2. The binary tree log based CDP backup recovery method according to claim 1, wherein said initializing step comprises:
acquiring initial data and backing up the initial data;
creating a binary tree log file and a metadata record file;
constructing a binary tree structure log according to the initial data, the binary tree log file and the metadata record file;
in addition, the metadata record generating step further includes:
according to the binary tree structure log, acquiring the height of the verification value PT (n) in the binary tree structure log, wherein the height of a binary tree is accumulated layer by layer from bottom to top from leaf nodes, the calculation result of carrying out XOR calculation on the original data and the changed data is the leaf node of the binary tree structure log, and the height of the leaf node is 0;
acquiring a number n of the verification value PT (n) according to the verification value PT (n), wherein the binary tree structure log sequentially numbers all nodes located at the same height from 1, and the number n indicates that the verification value PT (n) is the nth node in all nodes located at the same height as the verification value PT (n);
acquiring basic information according to the binary tree structured log and the check value PT (n), wherein the basic information comprises: a current time point T (n), a write operation covering data block number, a number of a check value PT (n), a height of the check value PT (n), a position offset of the check value PT (n) on the binary tree log file, and a position offset of a parent node metadata record of the check value PT (n) on the metadata record file;
according to the basic information, a metadata record MT (n) is generated.
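The basic information listed in claim 2 maps naturally onto a fixed-size record. The following Python sketch is one possible layout, not the format used by the patent: the field names, the little-endian struct format and the use of -1 to mean "no parent record yet" are all assumptions made for illustration.

```python
import struct
from dataclasses import dataclass, astuple

# Assumed on-disk layout: double, uint64, uint64, uint32, uint64, int64.
RECORD_FORMAT = "<dQQIQq"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


@dataclass
class MetadataRecord:
    """Hypothetical MT(n) holding the six items named in claim 2."""
    time_point: float        # current time point T(n)
    block_number: int        # number of the data block covered by the write
    number: int              # number n of the check value at its height
    height: int              # height of the check value (leaf = 0)
    log_offset: int          # offset of PT(n) in the binary tree log file
    parent_meta_offset: int  # offset of the parent's record, -1 if none (assumption)


def pack_record(record: MetadataRecord) -> bytes:
    """Serialize a record into a fixed-size byte string."""
    return struct.pack(RECORD_FORMAT, *astuple(record))


def unpack_record(data: bytes) -> MetadataRecord:
    """Rebuild a record from bytes produced by pack_record."""
    return MetadataRecord(*struct.unpack(RECORD_FORMAT, data))
```

A fixed record size means the record numbered n at a given height can be located by arithmetic on the file offset, which is one plausible reason for storing position offsets rather than scanning the files.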
3. The binary tree log based CDP backup recovery method according to claim 1, wherein said third judgment step comprises:
acquiring the number n of the verification value PT (n) according to the metadata record MT (n);
subtracting the number n by one to obtain a metadata record MT (n-1) with the number (n-1);
obtaining a verification value PT (n-1) according to the metadata record MT (n-1), wherein the verification value PT (n-1) is a previous verification value of the verification value PT (n);
judging whether the check value PT (n) is an odd number node in all nodes with the same height as the check value PT (n) according to the number n;
if not, carrying out XOR calculation on the check value PT (n-1) and the check value PT (n), and then sequentially and repeatedly executing the metadata record generation step and the second judgment step according to the calculation result; if yes, executing the binary tree updating step.
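The odd/even test of claim 3 can be sketched as follows. This is a simplified in-memory model under stated assumptions: levels[h] is a hypothetical list of the nodes at height h in numbering order, every node is kept (the deletions performed by the binary tree update step are not modelled), and all writes target a single data block.

```python
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def append_check_value(levels: List[List[bytes]], leaf: bytes) -> None:
    """Add a leaf check value and propagate parents upward.

    An odd-numbered node waits for its sibling. An even-numbered node is
    XORed with the previous node at the same height, and the result is
    treated as a new node one level higher, where the test repeats.
    """
    node, height = leaf, 0
    while True:
        if height == len(levels):
            levels.append([])
        levels[height].append(node)
        number = len(levels[height])  # 1-based number at this height
        if number % 2 == 1:
            break
        node = xor_bytes(levels[height][number - 2], node)
        height += 1


if __name__ == "__main__":
    levels: List[List[bytes]] = []
    for delta in (b"\x01", b"\x02", b"\x04", b"\x08"):
        append_check_value(levels, delta)
    print([len(level) for level in levels])
```

With four leaves the model holds four nodes at height 0, two at height 1 and one at height 2, so each parent is the XOR of the leaves beneath it.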
4. The binary tree log based CDP backup recovery method according to claim 1, wherein said index calculating step further comprises:
acquiring a recovery time point T (m), and creating recovery data at the recovery time point T (m);
judging whether a check value PT (m) with a height of 0 and a number of m exists according to the recovery time point T (m) and the updated binary tree structure log; if not, setting the recovery number r to (m + 1); if so, setting the recovery number r to m;
acquiring a metadata record MT (r) according to the recovery number r after the value is taken;
acquiring a verification value PT (r) and a parent node metadata record of the verification value PT (r) according to a metadata record MT (r), wherein the height of the parent node metadata record is the height of the verification value PT (r) plus 1;
judging whether the parent node check value of the check value PT (r) itself has a parent node according to the parent node metadata record of the check value PT (r); if not, ending this step; if so, repeating the previous step until all parent node metadata records of the check value PT (r) are obtained;
calculating a recovery check value according to the number m;
indexing rapidly to the recovery check value according to the metadata record MT (r) and all parent node metadata records of the check value PT (r);
performing exclusive-or calculation on the recovery check value and the initial data to obtain recovery data;
and executing the fourth judging step.
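The recovery path of claim 4 ends with an XOR of the recovery check value and the initial data. The Python sketch below shows one way a tree of XOR parents lets that value be assembled from a logarithmic number of nodes. It is an illustration under assumptions, not the patent's indexing procedure: it keeps every node in memory, assumes all writes hit one data block of fixed size, and uses hypothetical names throughout.

```python
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def build_levels(deltas: List[bytes]) -> List[List[bytes]]:
    """levels[h][i] is the XOR of leaves i*2**h .. (i+1)*2**h - 1."""
    levels: List[List[bytes]] = []
    for leaf in deltas:
        node, height = leaf, 0
        while True:
            if height == len(levels):
                levels.append([])
            levels[height].append(node)
            count = len(levels[height])
            if count % 2 == 1:
                break
            node = xor_bytes(levels[height][count - 2], node)
            height += 1
    return levels


def recovery_check_value(levels: List[List[bytes]], m: int,
                         block_size: int) -> bytes:
    """XOR of the first m leaf check values, using whole subtrees where possible."""
    total = len(levels[0]) if levels else 0
    if not 0 <= m <= total:
        raise ValueError("recovery point outside the log")
    acc = bytes(block_size)  # all-zero block is the XOR identity
    pos = 0                  # number of leaves already covered
    for height in range(len(levels) - 1, -1, -1):
        if m & (1 << height):
            acc = xor_bytes(acc, levels[height][pos >> height])
            pos += 1 << height
    return acc


def recover(initial: bytes, levels: List[List[bytes]], m: int) -> bytes:
    """Recovery data = initial data XOR recovery check value."""
    return xor_bytes(initial, recovery_check_value(levels, m, len(initial)))
```

For m writes the loop touches at most one node per height, so the number of XOR operations grows with the logarithm of m rather than with m itself.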
5. A CDP backup and restore system based on binary tree logs, said system comprising:
the initialization module is used for backing up initial data and initializing;
the monitoring interception module is used for monitoring and intercepting all write operations;
the first judgment module is used for judging whether the write operation is a recovery write operation; if not, the data backup preparation module is executed; if so, the index calculation module is executed;
the data backup preparation module is used for acquiring a current time point T (n), reading original data BT (n-1) on a data block B, and writing changed data BT (n) into the data block B, wherein the data block B is a write operation coverage data block;
the exclusive-or calculation module is used for carrying out exclusive-or calculation on the original data BT (n-1) and the changed data BT (n), and adding a time stamp to a calculation result to generate a check value PT (n);
a metadata record generating module, configured to generate a metadata record MT (n) according to a binary tree structure log and the check value PT (n);
the second judging module is used for judging whether the check value PT (n) is the first check value PT (1); if not, the third judging module is executed; if so, the monitoring interception module through the metadata record generating module are executed again;
a third judging module, configured to obtain the check value PT (n-1) according to the metadata record MT (n) and judge whether n is an odd number; if not, perform an exclusive-or calculation on the check value PT (n) and the check value PT (n-1), and then repeat the metadata record generating module and the second judging module in sequence on the calculation result; if so, execute a binary tree updating module;
a binary tree updating module, configured to update the binary tree structure log according to the metadata record MT (n), and then execute the fourth judging module;
the index calculation module is used for calculating and indexing a recovery check value according to the recovery time point and the updated binary tree structure log, and performing an exclusive-or calculation on the recovery check value and the initial data, the calculation result being the recovery data, and then executing the fourth judging module;
the fourth judging module is used for judging whether a write operation exists; if not, ending the operation; if so, executing the monitoring interception module again;
wherein, the binary tree updating module comprises:
a second number obtaining unit, configured to obtain a number n of the check value PT (n) according to the metadata record MT (n);
a number operation judgment unit, configured to add one to the number n, and judge whether (n + 1) is a multiple of 4;
a first deleting unit, configured to, if not, delete the check value PT (n-1) and its metadata record MT (n-1), and execute the fourth judging module;
a third number obtaining unit, configured to, if so, subtract one from the number (n-1) to obtain the metadata record MT (n-2) numbered (n-2);
a check value obtaining unit, configured to obtain a check value PT (n-2) according to the metadata record MT (n-2), where the check value PT (n-2) is a previous check value of the check value PT (n-1);
a parent node generation unit configured to use a calculation result obtained by performing exclusive or calculation on the check value PT (n-2) and the check value PT (n-1) as a parent node;
a left child node generation unit configured to update the metadata record MT (n-2) with the check value PT (n-2) as a left child node of the parent node;
a right child node generation unit configured to update the metadata record MT (n) with the check value PT (n) as a right child node of the parent node;
and the second deleting unit is used for deleting the verification value PT (n-1) and the metadata record MT (n-1) and executing a fourth judging module.
6. The binary tree log based CDP backup-restore system of claim 5, wherein said initialization module comprises:
the backup initialization unit is used for acquiring initial data and backing up the initial data;
a creation initialization unit, configured to create a binary tree log file and a metadata record file;
a binary tree structure log constructing unit, configured to construct a binary tree structure log according to the initial data, the binary tree log file, and the metadata record file;
in addition, the metadata record generating module further includes:
a height obtaining unit, configured to obtain, according to the binary tree structure log, a height at which the check value PT (n) is located in the binary tree structure log, where the height of the binary tree is accumulated layer by layer from bottom to top starting from a leaf node, a calculation result of performing xor calculation on the original data and the changed data is the leaf node of the binary tree structure log, and the height of the leaf node is 0;
a first number obtaining unit, configured to obtain a number n of the check value PT (n) according to the check value PT (n), where the binary tree structured log sequentially numbers all nodes located at the same height from 1, and the number n indicates that, in all nodes located at the same height as the check value PT (n), the check value PT (n) is an nth node;
a basic information obtaining unit, configured to obtain basic information according to the binary tree structured log and the check value PT (n), where the basic information includes: a current time point T (n), a write operation covering data block number, a number of a check value PT (n), a height of the check value PT (n), a position offset of the check value PT (n) on the binary tree log file, and a position offset of a parent node metadata record of the check value PT (n) on the metadata record file;
and the check value metadata record generating unit is used for generating a metadata record MT (n) according to the basic information.
7. The binary tree log based CDP backup and restore system according to claim 5, wherein said index calculation module comprises:
a recovery initialization unit configured to acquire a recovery time point T (m), and create recovery data at the recovery time point T (m);
a recovery number obtaining unit, configured to determine, according to the recovery time point T (m) and the updated binary tree structure log, whether a check value PT (m) with a height of 0 and a number of m exists; if not, set the recovery number r to (m + 1); if so, set the recovery number r to m;
a metadata record obtaining unit, configured to obtain a metadata record MT (r) according to the valued recovery number r;
a first parent node obtaining unit, configured to obtain a check value PT (r) and a parent node metadata record of the check value PT (r) according to a metadata record MT (r), where a height of the parent node metadata record is a height of the check value PT (r) plus 1;
a second parent node obtaining unit, configured to determine, according to the parent node metadata record of the check value PT (r), whether the parent node check value of the check value PT (r) itself has a parent node; if not, end this unit; if so, repeat the previous unit until all parent node metadata records of the check value PT (r) are obtained;
the check value calculating unit is used for calculating a recovery check value according to the serial number m;
the check value indexing unit is used for indexing rapidly to the recovery check value according to the acquired metadata record MT (r) and all parent node metadata records of the check value PT (r);
a recovery data calculation unit, configured to perform xor calculation on the recovery check value and the initial data to obtain the recovery data;
and the execution unit is used for executing the fourth judgment module.
8. A computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the binary tree log based CDP backup restoration method of any of claims 1 to 4.
CN202210941069.9A 2022-08-08 2022-08-08 CDP backup recovery method, system and storage medium based on binary tree log Active CN115016988B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210941069.9A CN115016988B (en) 2022-08-08 2022-08-08 CDP backup recovery method, system and storage medium based on binary tree log

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210941069.9A CN115016988B (en) 2022-08-08 2022-08-08 CDP backup recovery method, system and storage medium based on binary tree log

Publications (2)

Publication Number Publication Date
CN115016988A CN115016988A (en) 2022-09-06
CN115016988B true CN115016988B (en) 2022-10-21

Family

ID=83065867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210941069.9A Active CN115016988B (en) 2022-08-08 2022-08-08 CDP backup recovery method, system and storage medium based on binary tree log

Country Status (1)

Country Link
CN (1) CN115016988B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113868273A (en) * 2021-09-23 2021-12-31 北京百度网讯科技有限公司 Metadata snapshot method and device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7577806B2 (en) * 2003-09-23 2009-08-18 Symantec Operating Corporation Systems and methods for time dependent data storage and recovery
CN101777016B (en) * 2010-02-08 2012-04-25 北京同有飞骥科技股份有限公司 Snapshot storage and data recovery method of continuous data protection system
US9946607B2 (en) * 2015-03-04 2018-04-17 Sandisk Technologies Llc Systems and methods for storage error management
CN110058969B (en) * 2019-04-18 2023-02-28 腾讯科技(深圳)有限公司 Data recovery method and device
CN110837650B (en) * 2019-10-25 2021-08-31 华中科技大学 Cloud storage ORAM access system and method under untrusted network environment
CN112464044B (en) * 2020-12-09 2023-04-07 上海爱数信息技术股份有限公司 File data block change information monitoring and management system and method thereof
CN112699123A (en) * 2020-12-30 2021-04-23 武汉大学 Method and system for verifying existence and integrity of data in data storage system
CN114461456B (en) * 2022-04-11 2022-06-21 成都云祺科技有限公司 CDP backup method, system, storage medium and recovery method based on continuous writing

Also Published As

Publication number Publication date
CN115016988A (en) 2022-09-06

Similar Documents

Publication Publication Date Title
US11960777B2 (en) Utilizing multiple redundancy schemes within a unified storage element
US8250043B2 (en) System and method for compression of partially ordered data sets
US8386443B2 (en) Representing and storing an optimized file system using a system of symlinks, hardlinks and file archives
US10936228B2 (en) Providing data deduplication in a data storage system with parallelized computation of crypto-digests for blocks of host I/O data
US10311083B2 (en) Search and analytics for storage systems
US8239706B1 (en) Data retrieval system and method that provides retrieval of data to any point in time
US10715184B2 (en) Techniques for fast IO and low memory consumption while using erasure codes
US11269817B2 (en) System and method for efficiently measuring physical space for an ad-hoc subset of files in protection storage filesystem with stream segmentation and data deduplication
US11656942B2 (en) Methods for data writing and for data recovery, electronic devices, and program products
CN114461456B (en) CDP backup method, system, storage medium and recovery method based on continuous writing
Venkatesan et al. Effect of codeword placement on the reliability of erasure coded data storage systems
Wang et al. Exalt: Empowering Researchers to Evaluate Large-Scale Storage Systems
US11797397B2 (en) Hybrid NVRAM logging in filesystem namespace
US10430383B1 (en) Efficiently estimating data compression ratio of ad-hoc set of files in protection storage filesystem with stream segmentation and data deduplication
Wu et al. PP: Popularity-based proactive data recovery for HDFS RAID systems
Zhang et al. Improving restore performance of packed datasets in deduplication systems via reducing persistent fragmented chunks
CN111857603B (en) Data processing method and related device
CN115016988B (en) CDP backup recovery method, system and storage medium based on binary tree log
Arafa et al. Fault tolerance performance evaluation of large-scale distributed storage systems HDFS and Ceph case study
CN112328433A (en) Processing method and device for restoring archived data, electronic device and storage medium
Meister Advanced data deduplication techniques and their application
CN114328373A (en) Method, electronic device and computer program product for managing a file system
Zhang et al. Reducing chunk fragmentation for in-line delta compressed and deduplicated backup systems
CN110688071A (en) Data synchronization method and system for reducing data synchronization quantity
CN114281246B (en) Cloud hard disk online migration method, device and equipment based on cloud management platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant