CN110309227B - Distributed data rollback method, device and computer readable storage medium - Google Patents

Info

Publication number
CN110309227B
Authority
CN
China
Prior art keywords
data
node
compensation
distributed
time point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810522885.XA
Other languages
Chinese (zh)
Other versions
CN110309227A (en)
Inventor
张文
雷海林
赵伟
潘安群
赵东志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd and Tencent Cloud Computing Beijing Co Ltd
Priority to CN201810522885.XA
Publication of CN110309227A
Application granted
Publication of CN110309227B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The application relates to a distributed database rollback method, a distributed database rollback apparatus, a computer-readable storage medium and a computer device. The method comprises the following steps: acquiring a target time point for data rollback; acquiring backup data of each data node corresponding to the target time point; rolling back the data of each data node to the target time point according to the backup data; acquiring the distributed data operation result records of each data node recorded in a distributed transaction log; and performing data compensation, according to the backup data of each data node and the distributed data operation result records, on the data obtained by rolling each data node back to the target time point, to obtain the rollback data of each data node at the target time point. The method eliminates the data inconsistency caused by inconsistent time across data nodes and ensures the global data consistency of the entire distributed database.

Description

Distributed data rollback method, device and computer readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a distributed data rollback method and apparatus, a computer-readable storage medium, and a computer device.
Background
For database systems, it is often necessary, for a variety of reasons, to perform a data rollback, i.e., to restore data to a specified point in time or period of time. For example, because of a misoperation the data may need to be rolled back to its state one hour earlier, or, for analysis and statistics, historical data may need to be examined, so that the cluster data needs to be rolled back to a target time point to make the analysis convenient.
In the conventional technology, data rollback is usually based on existing backups, and the common backup modes are full backup and incremental backup. A full backup is a complete data image of one data node, so it does not need to be taken in real time, and a periodic backup mechanism is usually adopted. An incremental backup is the content newly generated by a data node at each moment, and this data is usually backed up in real time. Full and incremental backups form the basic conditions for data rollback in a distributed database.
However, in a distributed database the time of each data node may be inconsistent, and the data rollback mode of the conventional technology may then produce inconsistent data. For example, when applied to financial scenarios, it may cause problems such as incorrect account entries, miscounting, and an unbalanced general ledger.
Disclosure of Invention
Based on this, to address the technical problem of data inconsistency, it is necessary to provide a distributed data rollback method, apparatus, computer-readable storage medium and computer device that improve data consistency.
A distributed database rollback method comprises the following steps:
acquiring a target time point for data rollback;
acquiring backup data of each data node corresponding to the target time point;
rolling back the data of each data node to the target time point according to the backup data;
acquiring the distributed data operation result records of each data node recorded in a distributed transaction log;
and performing data compensation, according to the backup data of each data node and the distributed data operation result records, on the data obtained by rolling each data node back to the target time point, to obtain the rollback data of each data node at the target time point.
A distributed database rollback apparatus, the apparatus comprising:
a backup data acquisition module, configured to acquire a target time point for data rollback, and to acquire backup data of each data node corresponding to the target time point;
a backup data rollback module, configured to roll back the data of each data node to the target time point according to the backup data;
a data compensation module, configured to acquire the distributed data operation result records of each data node recorded in the distributed transaction log, and to perform data compensation, according to the backup data of each data node and the distributed data operation result records, on the data obtained by rolling each data node back to the target time point, to obtain the rollback data of each data node at the target time point.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring a target time point for data rollback;
acquiring backup data of each data node corresponding to the target time point;
rolling back the data of each data node to the target time point according to the backup data;
acquiring the distributed data operation result records of each data node recorded in a distributed transaction log;
and performing data compensation, according to the backup data of each data node and the distributed data operation result records, on the data obtained by rolling each data node back to the target time point, to obtain the rollback data of each data node at the target time point.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a target time point for data rollback;
acquiring backup data of each data node corresponding to the target time point;
rolling back the data of each data node to the target time point according to the backup data;
acquiring the distributed data operation result records of each data node recorded in a distributed transaction log;
and performing data compensation, according to the backup data of each data node and the distributed data operation result records, on the data obtained by rolling each data node back to the target time point, to obtain the rollback data of each data node at the target time point.
According to the distributed database rollback method, the distributed database rollback apparatus, the computer-readable storage medium and the computer device described above, after a first data rollback operation is performed using the backup data of each data node corresponding to the target time point, the distributed data operation result records of each data node recorded in the distributed transaction log are acquired, and data compensation is performed on each data node that has undergone the first data rollback operation according to the backup data of each data node and the distributed data operation result records, thereby completing the data rollback operation of the distributed database. In this way, on the basis of the rollback performed with the backup data, the backup data of each data node and the distributed transaction log are used to perform data compensation on each data node separately, completing the whole data rollback operation. This eliminates the data inconsistency caused by inconsistent time across data nodes, ensures the global data consistency of the entire distributed database, and avoids the data disorder that inconsistent data across data nodes would cause.
Drawings
FIG. 1 is a schematic flow chart of a distributed database rollback method in one embodiment;
FIG. 2 is a schematic flow chart of the step of executing, for data compensation, the data operation record log that was successfully executed within a period of time in the incremental backup data of a data node, in one embodiment;
FIG. 3 is a flowchart of step 206 in one embodiment;
FIG. 4 is a schematic flow chart of step 106 in one embodiment;
FIG. 5 is a flowchart illustrating a distributed database rollback method according to another embodiment;
FIG. 6 is a flowchart illustrating a distributed database rollback method according to yet another embodiment;
FIG. 7 is a diagram illustrating data rollback for each data node based on full backup data and incremental backup data, in one embodiment;
FIG. 8 is a schematic diagram illustrating a time difference between data node 1 and data node 2 in one embodiment;
FIG. 9 is a block diagram of the distributed database rollback apparatus in one embodiment;
FIG. 10 is a block diagram of the structure of a data compensation module in one embodiment;
FIG. 11 is a block diagram showing the structure of a record execution module in one embodiment;
FIG. 12 is a block diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
As shown in FIG. 1, in one embodiment, a distributed database rollback method is provided. Referring to FIG. 1, the distributed database rollback method specifically includes the following steps:
Step 102, acquiring a target time point for data rollback.
For a database, it is often necessary to roll back data because of a misoperation, that is, to return the data to a specific time point, referred to here as the target time point. A distributed database is composed of a plurality of data nodes, and the data of all the data nodes together form the complete data of the distributed database. Therefore, when the distributed database is rolled back, the rollback operation of the whole distributed database system can be divided into rollback operations on the individual data nodes, and the rollback of the whole distributed database system is finished when all the data nodes have been rolled back.
Step 104, acquiring backup data of each data node corresponding to the target time point.
The backup data of each data node corresponding to the target time point is the data backed up by that data node as of the target time point, and it comprises all data of the data node before the target time point. The backup data may include full backup data and incremental backup data. Full backup data is a complete data image, and each data node in the distributed database has its own full backup data. That is, at a particular time, the full backup data of a data node includes all of the data of that data node at that time. A full backup therefore does not need to be taken in real time, and a periodic backup mechanism is usually adopted. Incremental backup data is the content generated by the data node at each moment, namely the changed data, so incremental backup adopts a real-time mechanism: whenever data changes, the change is backed up in real time. When the distributed database system needs to be rolled back, the full backup data and the incremental backup data of each data node for the target time point must be acquired first; together they form the basic condition for data rollback of the distributed database.
Step 106, rolling back the data of each data node to the target time point according to the backup data.
When the data of the distributed database needs to be rolled back to the target time point, the backup data of each data node corresponding to the target time point, such as the full backup data and the incremental backup data, is acquired first; the data of each data node can then be rolled back to the target time point according to its full backup data and incremental backup data.
For example, when the target time point of the data rollback is 12:00 and the periodic full backup was taken at 11:30, the full backup data of 11:30 and the incremental backup data between 11:30 and 12:00 can be acquired, and with them the data can be rolled back to 12:00. The same processing can be applied to every data node: the full backup data and the incremental backup data of each data node are acquired, the rollback operation is performed on each data node, and the data of each data node is rolled back to the target time point.
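The first rollback of a single data node can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a full backup is a key-value snapshot and that the incremental backup is a list of timestamped operations; the names `Op` and `restore_node` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Op:
    """One entry of a node's incremental backup (hypothetical layout)."""
    ts: float   # node-local timestamp, in seconds since midnight
    key: str
    value: Any  # None is used here to mean "delete the key"


def restore_node(full_backup: Dict[str, Any], full_backup_ts: float,
                 incremental_ops: List[Op], target_ts: float) -> Dict[str, Any]:
    """Rebuild a node's state at target_ts from a full backup plus increments."""
    state = dict(full_backup)  # start from the periodic full image
    for op in sorted(incremental_ops, key=lambda o: o.ts):
        # Replay only changes made after the full backup and up to the target.
        if full_backup_ts < op.ts <= target_ts:
            if op.value is None:
                state.pop(op.key, None)
            else:
                state[op.key] = op.value
    return state


# Full backup at 11:30 (41400 s), target time point 12:00 (43200 s).
snapshot = {"a": 1, "b": 5}
ops = [Op(41500, "a", 2), Op(43000, "b", None), Op(43300, "c", 9)]
restored = restore_node(snapshot, 41400, ops, 43200)
```

The operation at 43300 s lies after the target time point and is therefore not replayed in this first pass.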
Step 108, acquiring the distributed data operation result records of each data node recorded in the distributed transaction log.
The distributed transaction log is a file that records data modifications in the distributed database; that is, it records the distributed data operation result records of the plurality of data nodes in the distributed database. A distributed data operation result record is a record generated when a data node performs a data operation, and the data operation may be a data change operation such as modifying or deleting data. The distributed transaction log may be stored in a scattered manner across the data nodes, and when it is needed it can be acquired from the data nodes. The distributed transaction log holds the result records of the distributed data operations involving the relevant data nodes in the distributed database. That is, when any data node associated with a distributed transaction performs a data operation of that transaction, a distributed transaction identifier is generated, and the identifier is recorded both in the distributed transaction log and in the data operation log of the incremental backup data. In addition, the distributed transaction identifiers recorded for the several data nodes taking part in the same distributed transaction are identical. The distributed transaction log is therefore stored in the cluster atomically: a given distributed transaction either can be found in the log or cannot.
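The relationship between the distributed transaction log and the per-node data operation logs can be sketched as follows. This is a minimal illustration under assumed data layouts: the class `DistributedTransactionLog`, its methods, and the tuple format of the node logs are hypothetical, not the patent's storage format.

```python
from typing import Dict


class DistributedTransactionLog:
    """Cluster-wide record of distributed transactions (hypothetical layout)."""

    def __init__(self) -> None:
        self._records: Dict[str, float] = {}  # transaction id -> timestamp

    def record(self, txn_id: str, ts: float) -> None:
        # One entry per distributed transaction, shared by all participants.
        self._records[txn_id] = ts

    def contains(self, txn_id: str) -> bool:
        # Atomic visibility: a transaction is either in the log or it is not.
        return txn_id in self._records


txn_log = DistributedTransactionLog()
txn_log.record("tx-1", 43201.0)

# Each node's incremental backup carries the same identifier for its share
# of the transaction: (timestamp, transaction id, key, new value).
node1_ops = [(43200.5, "tx-1", "account_a", 90)]
node2_ops = [(43201.0, "tx-1", "account_b", 110), (43201.5, "tx-2", "account_c", 7)]
```

Here `tx-2` appears in a node log but not in the distributed transaction log, so a lookup for it fails, which is the signal later used to decide that its operations are not compensated.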
Step 110, performing data compensation, according to the backup data of each data node and the distributed data operation result records, on the data obtained by rolling each data node back to the target time point, to obtain the rollback data of each data node at the target time point.
The rollback data finally obtained for each data node is data that is consistent near the target time point. For example, when the target time point is 12:00, the data actually obtained may correspond to 12:02; once all data nodes have been rolled back to a point near 12:00, the data of all data nodes is mutually consistent. After the distributed data operation result records of each data node at the target time point are obtained, data compensation can be performed on the data obtained by rolling each data node back to the target time point, according to the backup data of each data node and the distributed data operation result records, to obtain the rollback data of each data node at the target time point. Specifically, the data compensation operation may be performed on each data node by combining the distributed data operation result records with the data operation record log in the incremental backup data of that data node near the target time point. Data compensation tidies up the data of the data nodes that have already been rolled back using the full backup data and the incremental backup data, and uniformly adjusts the data of every data node to a common global relative time. The global relative time may be a time point at which the data nodes are balanced, that is, at which the data of every data node in the distributed database is in a balanced state: distributed transactions involving the data nodes have either all succeeded or all failed, and no distributed transaction has been executed successfully on some data nodes while failing on others.
For example, when the data needs to be rolled back to 12:00, to avoid inconsistent data after the rollback caused by time errors between the data nodes, a data node whose timestamps run slow can be compensated for an additional period of time so as to even out the error relative to the other data nodes. For example, data at 12:00 is compensated up to 12:02, which eliminates the rollback error caused by inconsistent timestamps.
After the first data rollback operation is performed using the full backup data and the incremental backup data of each data node corresponding to the target time point, the distributed data operation result records of each data node at the target time point recorded in the distributed transaction log are acquired, and data compensation is performed on each data node that has undergone the first data rollback operation according to these records, thereby completing the data rollback operation of the distributed database. In this way, on the basis of the rollback performed with the full backup data and the incremental backup data, data compensation is performed using the distributed transaction log to complete the whole data rollback operation. This eliminates the data inconsistency caused by inconsistent time across data nodes, ensures the global data consistency of the entire distributed database, and avoids the data disorder that inconsistent data across data nodes would cause.
In one embodiment, performing data compensation, according to the backup data of each data node and the distributed data operation result records, on the data obtained by rolling each data node back to the target time point includes: when a data node needs data compensation, executing the data operation record log that was successfully executed within a period of time in the incremental backup data of that data node, so as to perform the data compensation.
Data compensation is performed according to the backup data of each data node and the distributed data operation result records. However, not every data node needs data compensation; it is performed only on the data nodes that need it. For example, when the data operation record log contained in the incremental backup data of data node A does not involve multiple data nodes, that is, only data node A performed the data operations and it did not interact with any other data node, data node A does not need data compensation.
Alternatively, when the data operation record log contained in the incremental backup data of data node A was not successfully executed, that is, no consistent data operation record can be found in the distributed transaction log for the data operation record log of data node A, data node A did not successfully execute those data operations and does not need a data compensation operation. Only when a data node needs data compensation is the data operation record log that was successfully executed within a period of time in its incremental backup data executed to perform the compensation.
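The two cases in which a data node needs no compensation can be expressed as a single predicate. The sketch below assumes, for illustration only, that each logged operation is a pair of an optional distributed transaction identifier and a description, and that the distributed transaction log is available as a set of identifiers; the function name `needs_compensation` is hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple


def needs_compensation(node_ops: Iterable[Tuple[Optional[str], str]],
                       committed_txn_ids: Set[str]) -> bool:
    """Return True if at least one logged operation belongs to a distributed
    transaction that can be found in the distributed transaction log."""
    return any(
        txn_id is not None and txn_id in committed_txn_ids
        for txn_id, _description in node_ops
    )


committed = {"tx-1"}

# Case 1: node A only performed local operations (no transaction identifier).
local_only = [(None, "update row 7"), (None, "delete row 9")]

# Case 2: node A took part in a transaction that is absent from the log.
not_in_log = [("tx-9", "update row 3")]

# Otherwise: node A took part in a transaction recorded in the log.
in_log = [(None, "update row 7"), ("tx-1", "update row 4")]
```

With these inputs, only `in_log` leads to compensation; the other two nodes are left as they were after the first rollback.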
In one embodiment, as shown in FIG. 2, executing the data operation record log that was successfully executed within a period of time in the incremental backup data of the data node to perform data compensation includes:
Step 202, acquiring the time differences between the data nodes, and taking the time difference with the largest value as the maximum compensation time.
Step 204, acquiring, from the distributed transaction log, the distributed data operation result records within the maximum compensation time that starts at the target time point.
Step 206, performing data compensation on the data obtained by rolling each data node back to the target time point, according to the distributed data operation result records acquired from the distributed transaction log and the backup data of the data node.
The specific process of performing data compensation on the data obtained by rolling each data node back to the target time point, according to the distributed data operation result records of each data node at the target time point, is as follows. The time differences between the current timestamps of the data nodes are acquired and compared, and the time difference with the largest value is taken as the maximum compensation time. The maximum compensation time can be derived from the time error between the data nodes, but in practice other errors may exist, such as network delay or errors due to machine performance, so the actual compensation time is longer than the time error between machines. After the maximum compensation time is obtained, the distributed data operation result records within the maximum compensation time that starts at the target time point are acquired from the distributed transaction log. These are the records of data operations belonging to distributed transactions within the time period of length d, the maximum compensation time, starting at the target time point T. In other words, the data operation records belonging to distributed transactions in the time period from T to T + d are extracted as the distributed data operation result records. These records are then combined with the data operation record log obtained from the incremental backup data of the data node, and the two types of data operation records together are used to perform data compensation on the data obtained by rolling each data node back to the target time point.
Acquiring the distributed data operation result records from the distributed transaction log using the maximum compensation time ensures that the distributed data operation result records of every data node are covered. Data compensation can therefore be performed on the data obtained by rolling each data node back to the target time point, according to the distributed data operation result records acquired from the distributed transaction log and the backup data, which ensures the consistency of the data as a whole.
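Extracting the records in the period from T to T + d can be sketched as a simple filter. The record layout (transaction identifier, timestamp) and the choice to treat both ends of the window as inclusive are assumptions made for this illustration; the function name `records_in_window` is hypothetical.

```python
from typing import List, Tuple

TxnRecord = Tuple[str, float]  # (distributed transaction id, timestamp in seconds)


def records_in_window(txn_log: List[TxnRecord], target_ts: float,
                      max_compensation: float) -> List[TxnRecord]:
    """Select the records whose timestamp lies in [T, T + d]."""
    end_ts = target_ts + max_compensation
    return [rec for rec in txn_log if target_ts <= rec[1] <= end_ts]


# T = 12:00 (43200 s) and d = 3 s.
log = [("tx-0", 43199.0), ("tx-1", 43200.0), ("tx-2", 43202.5), ("tx-3", 43204.0)]
window = records_in_window(log, 43200.0, 3.0)
```

In this example `tx-0` precedes the target time point and `tx-3` falls after T + d, so only `tx-1` and `tx-2` are candidates for compensation.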
In one embodiment, as shown in FIG. 3, step 206 includes:
Step 302, acquiring the incremental backup data of the data node within the maximum compensation time that starts at the target time point.
Step 304, when the data operation record log contained in the incremental backup data within the maximum compensation time that starts at the target time point contains a data operation record consistent with a distributed data operation result record in the distributed transaction log, executing the consistent data operation record log in the incremental backup data of the data node to perform data compensation.
The incremental backup data of each data node stores the data operation record log of that data node. For example, if data node 1 performs several data operations between 12:00 and 12:05, the incremental backup data of data node 1 for 12:00 to 12:05 stores the data operation record log of data node 1 for this period. What is stored in the incremental backup data is therefore the data operation record log generated by the data node when performing data operations.
After the time differences between the data nodes are acquired and the largest is taken as the maximum compensation time, the incremental backup data of each data node within the maximum compensation time that starts at the target time point can be acquired. Because the incremental backup data of each data node stores that node's data operation record log, acquiring this incremental backup data amounts to acquiring all data operation record logs of each data node within the maximum compensation time that starts at the target time point.
After the distributed data operation result records of each data node are obtained, the data operation record logs in the incremental backup data can each be compared with the distributed data operation result records, acquired from the distributed transaction log, within the maximum compensation time that starts at the target time point. When the data operation record log of the incremental backup data contains a data operation record that matches a distributed data operation result record in the distributed transaction log, the data operation record log in the incremental backup data of the data node that is consistent with the distributed transaction log is executed to modify the data, thereby achieving the compensation.
In one embodiment, as shown in FIG. 4, step 206 further includes:
Step 402, acquiring the time differences between the data nodes, and taking the time difference with the largest value as the maximum compensation time.
A distributed database has multiple data nodes, and there may be errors between the timestamps of the various data nodes, i.e., time differences. Therefore, the time differences between the current timestamps of the data nodes can be acquired and compared, and the time difference with the largest value is used as the maximum compensation time. For example, if there are errors between the timestamps of 3 data nodes in the distributed database and the time differences are 1 s (second), 2 s, and 3 s, respectively, then the largest time difference, 3 s, may be taken as the maximum compensation time.
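The choice of the maximum compensation time can be illustrated with a short Python sketch. It assumes that each node reports its current timestamp at the same real instant and that the time differences are taken pairwise; the function name `max_compensation_time` and the input layout are hypothetical.

```python
from itertools import combinations
from typing import Dict


def max_compensation_time(node_clocks: Dict[str, float]) -> float:
    """Return the largest pairwise difference between node timestamps."""
    diffs = [abs(a - b) for a, b in combinations(node_clocks.values(), 2)]
    # With fewer than two nodes there is no difference to compensate for.
    return max(diffs, default=0.0)


# Three nodes whose clocks read 0 s, 1 s and 3 s at the same real instant:
# the pairwise differences are 1 s, 3 s and 2 s, so the maximum is 3 s.
clocks = {"node1": 0.0, "node2": 1.0, "node3": 3.0}
d = max_compensation_time(clocks)
```

As noted earlier in the description, a real deployment may need a compensation time somewhat longer than this value to allow for network delay and machine performance.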
Step 404, acquiring the incremental backup data of the data node within the maximum compensation time that starts at the target time point.
After the maximum compensation time is determined, the incremental backup data of each data node within the maximum compensation time that starts at the target time point can be acquired. For example, if the maximum compensation time is 3 s and the target time point is X, the incremental backup data of each data node within the 3 s starting at time X, that is, in the time period from X to X + 3 s, may be acquired. For each data node, the incremental backup data for this time period stores all data operation record logs of that data node in the period.
Step 406, acquiring, from the distributed transaction log, the distributed data operation result records within the maximum compensation time that starts at the target time point.
After the maximum compensation time is determined, the distributed data operation result records of each data node recorded in the distributed transaction log within the maximum compensation time that starts at the target time point can also be acquired. The two logs differ as follows. Each data node has its own incremental backup data, which stores that data node's own data operation record log. The distributed transaction log is a file recording the modifications of every data node in the distributed database, so it records the distributed data operation result records of all data nodes and is a single overall log. That is, the incremental backup data records the specific data modifications of the corresponding data node, whereas the distributed transaction log records the results of the data operations of the data nodes.
The distributed transaction log records the completion status of the data operations of the data nodes participating in each distributed transaction. When data is to be rolled back to the target time point, the distributed data operation result records within the maximum compensation time that starts at the target time point can be extracted and combined with the distributed transaction log entries before that time point. Only the data operation logs of the data nodes are extended by the compensation period; the distributed transaction log receives no additional processing and remains at the state of the first rollback, that is, the state before compensation. This ensures that the data operation records in the extended data operation log cover the content at and after the target time point on every data node; in other words, the data operation log extended by the compensation time is a superset of the distributed transaction log.
Step 408, when there is a data operation record matching the distributed data operation result record in the distributed transaction log within the maximum compensation time with the target time point as the start time in the data operation record log included in the incremental backup data within the maximum compensation time with the target time point as the start time, executing a data operation record log consistent with the incremental backup data for data compensation.
Through the steps, a data operation record log in the incremental backup data of each data node in the maximum compensation time taking the target time point as the starting time and a log of the data operation completion condition of a plurality of nodes in the distributed transaction log, namely a distributed data operation result record of each data node can be obtained. That is, for each data node, two types of data operation records within the maximum compensation time taking the target time point as the starting time can be obtained, namely a data operation record log in the incremental backup data and a plurality of data node operation completion condition logs in the distributed transaction log. Therefore, for each data node, the data operation record log in the incremental backup data and the record related data node operation completion in the distributed transaction log can be compared respectively.
When the data operation record log in the incremental backup data of each data node within the maximum compensation time, with the target time point as the starting time, is compared with the operation completion status of the multiple data nodes recorded in the distributed transaction log, that is, with the distributed data operation result records in the distributed transaction log, the data operation log is first scanned and parsed record by record. The operation records that carry a distributed transaction identifier are picked out, and the distributed transaction log is checked for each of them. If the distributed transaction identifier can be found in the distributed transaction log, the operation log entries related to that distributed transaction should be compensated; otherwise, they are discarded and not replayed.
With the maximum compensation time, scanning the incremental backup within the compensation period for data operation identifiers related to distributed transactions, that is, to operations spanning multiple nodes, ensures that the data operation records of every data node are covered. The incremental backup data of each data node within the maximum compensation time, with the target time point as the starting time, is obtained, so that the data operation record log in the incremental backup data can be compared with the completion status of the distributed transaction identified by the distributed transaction identifier in the distributed data operation result records of the distributed transaction log. The records in the data operation record log of the incremental backup data that are consistent with the distributed data operation result records in the distributed transaction log are executed on the corresponding data nodes. Data compensation is performed in this way, and the consistency of the overall data is ensured. This guarantees that, for a transaction present in the distributed transaction log, all the data nodes involved have committed it, and that, for a transaction absent from the distributed transaction log, none of the data nodes involved has committed a data modification.
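The guarantee stated above can be expressed as a simple check. The following is a minimal sketch, not the patented implementation; the function name `is_consistent`, the identifiers such as `tx-1`, and the dictionary layouts are assumptions made only for illustration. It assumes each node's set contains only the identifiers of distributed transactions that the node has committed.

```python
def is_consistent(txn_log, node_commits):
    """Check the two conditions described in the text.

    txn_log:      {xid: set of participating node names} (hypothetical layout)
    node_commits: {node name: set of distributed xids committed on that node}
    """
    # Every transaction in the distributed transaction log must be
    # committed on all of the data nodes it involves.
    for xid, participants in txn_log.items():
        for node in participants:
            if xid not in node_commits.get(node, set()):
                return False
    # A transaction absent from the distributed transaction log must not
    # be committed on any data node.
    for xids in node_commits.values():
        if any(xid not in txn_log for xid in xids):
            return False
    return True


txn_log = {"tx-1": {"DataNode-1", "DataNode-2"}}
before_compensation = {"DataNode-1": set(), "DataNode-2": {"tx-1"}}
after_compensation = {"DataNode-1": {"tx-1"}, "DataNode-2": {"tx-1"}}
orphan_commit = {"DataNode-1": {"tx-1", "tx-9"}, "DataNode-2": {"tx-1"}}
```

In this sketch `before_compensation` fails the check because DataNode-1 has not yet replayed its part of `tx-1`, and `orphan_commit` fails because `tx-9` never appears in the distributed transaction log.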
In one embodiment, the step 110 includes: acquiring time difference among all data nodes, and taking the time difference with the largest value as the maximum compensation time; acquiring a data operation record log in the maximum compensation time taking a target time point as an initial time from the incremental backup data of each data node; when the data operation record logs in the maximum compensation time in the incremental backup data of the data node are data operation record logs generated by corresponding operations executed by different data nodes, the data node needs to perform data compensation.
One record in the distributed transaction log contains operation records related to multiple data nodes, so a record in the distributed transaction log is the set of data operation result records of all the data nodes involved in one distributed transaction; that is, the distributed transaction log records the distributed data operation result records of all the data nodes. However, a data node is not compensated merely because it has a data operation record related to a distributed transaction within the maximum compensation time with the target time point as the starting time. Instead, a judgment is needed: the data operation record log in the incremental backup data must match a distributed data operation result record in the distributed transaction log, that is, the operation carries the distributed transaction identifier of a transaction involving at least two data nodes, and that operation can be found in the distributed transaction log. Therefore, when a distributed data operation result record generated by the corresponding operation is recorded in the distributed transaction log, the data operation related to the distributed transaction is obtained from the data operation record log in the incremental backup data, and data compensation is performed on the data that the data node executing the corresponding operation obtained by rolling back to the target time point. That is, the corresponding distributed transaction is looked up in the distributed transaction log, and if it is found, the corresponding data compensation operation is performed, meaning that the data node needs data compensation.
In one embodiment, as shown in FIG. 5, a distributed database rollback method is provided. Referring to fig. 5, the distributed database rollback method specifically includes the following steps:
Step 502, obtaining a target time point of data rollback.
Step 504, acquiring the full backup data and the incremental backup data corresponding to the target time point of each data node.
Step 506, the data of each data node is restored to the target time point according to the full backup data and the incremental backup data.
A database system often needs a data rollback operation for various reasons. When data is rolled back, the first rollback operation may be based on the existing backup data, that is, the full backup data and the incremental backup data. The distributed database is composed of multiple data nodes, so rolling back the whole distributed database means rolling back each data node in it. When the rollback of every data node is finished, the rollback of the whole distributed database is finished. Therefore, the full backup data and the incremental backup data of each data node corresponding to the target time point can be acquired and combined, the first data rollback is performed on each data node, and the data of each data node is rolled back to the target time point.
Step 508, acquiring the data operation record log of each data node near the target time point.
Step 510, when the data operation record logs are data operation records generated by corresponding operations executed on different data nodes, performing data compensation on the data obtained by rolling each data node back to the target time point, according to the data operation record log of each data node within the maximum compensation time with the target time point as the starting time.
For a distributed transaction log, a transaction refers to a series of operations performed as a single logical unit of work. Any data modification operation involving multiple data nodes, that is, more than one node, is referred to as a distributed transaction. If a data modification operation involves only one data node, it is referred to as a non-distributed transaction. Once a distributed transaction is deemed successful, the operation is recorded in the distributed transaction log.
Therefore, when a data operation record log recorded in the incremental backup data of each data node is acquired, it needs to be detected whether the data operation record log spans multiple data nodes, that is, whether it belongs to a distributed transaction operation. If a data operation log is detected to be generated by operations on different data nodes, that is, it is a sub-operation of a distributed transaction operation, the distributed data operation result records in the distributed transaction log need to be queried for it to determine whether to perform data compensation. In other words, it is detected whether the same data operation record log relates to multiple data nodes; distributed and non-distributed transactions can be clearly marked and distinguished in the data operation log. If the data operation record log belongs to a distributed transaction, it needs to be extracted and parsed, and if a modification is required, the modification is applied as compensation. That is, the data operation log needs to be executed on the corresponding data node to complete the data compensation operation of that data node. For example, there are N data nodes in the distributed database, namely DataNode-1, DataNode-2, ..., DataNode-N. For a transfer operation at the target time point involving the two nodes DataNode-1 and DataNode-2, the deduction record of DataNode-1 and the posting record of DataNode-2 form one distributed transaction in the distributed transaction log, that is, one record.
In this case, it may be considered that a distributed data operation result record generated by corresponding operations of a plurality of data nodes is recorded in the distributed transaction log, that is, there is a record generated by corresponding operations executed for different data nodes in the data operation record at the target time point. Then, the deduction record and the posting record need to be respectively executed for the corresponding data nodes DataNode-1 and DataNode-2 to complete the data compensation operation of the two data nodes.
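The transfer example can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation described by the patent: the record layout (`xid`, `participants`), the per-node log format of `(xid, amount change)` pairs, and the function name `apply_distributed_transaction` are all hypothetical.

```python
def apply_distributed_transaction(balances, txn_record, node_logs):
    """Replay, on each participating node, the sub-operations of one
    distributed transaction found in that node's own operation log.

    balances:   {node: balance}
    txn_record: {"xid": str, "participants": [node, ...]} (assumed layout)
    node_logs:  {node: [(xid, amount_change), ...]} (assumed layout)
    """
    updated = dict(balances)
    for node in txn_record["participants"]:
        for xid, amount_change in node_logs.get(node, []):
            # Only sub-operations of this distributed transaction are replayed.
            if xid == txn_record["xid"]:
                updated[node] += amount_change
    return updated


balances = {"DataNode-1": 100, "DataNode-2": 100}
txn_record = {"xid": "tx-1", "participants": ["DataNode-1", "DataNode-2"]}
node_logs = {
    "DataNode-1": [("tx-1", -50), ("tx-2", -7)],  # deduction record, plus an unrelated entry
    "DataNode-2": [("tx-1", 50)],                 # posting record
}
after = apply_distributed_transaction(balances, txn_record, node_logs)
```

With these sample values the deduction and the posting are both applied, so the sum of the two balances is unchanged, which corresponds to the balanced state the text aims for.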
For each data operation record log, if multiple data nodes are not involved, the data operation record log belongs to a non-distributed transaction and does not need to be executed for data compensation. Data compensation is therefore performed only for distributed transactions, that is, when one data operation record log relates to multiple data nodes. The corresponding data operation record log is then executed on those data nodes for data compensation, so that balance, namely consistency of the overall data, is achieved.
In one embodiment, the target time point is a node time of each data node, and the distributed transaction log is discretely stored in each data node.
When data is rolled back, the target time point is the node time of each data node, that is, the timestamp of each data node. For example, if the target time is 12:00, it refers to the moment at which the timestamp of each data node reads 12:00, although the clocks of the data nodes may deviate from one another. The distributed transaction log records the data operation records of each data node; when the data of any data node is changed, an operation record is stored in the distributed transaction log. The distributed transaction log can be stored discretely on the data nodes and acquired through the data nodes.
In one embodiment, as shown in FIG. 6, a distributed database rollback method is provided. Referring to FIG. 6, the distributed database rollback method specifically includes the following steps:
Step 602, obtaining a target time point of data rollback.
Step 604, acquiring the full backup data and the incremental backup data of each data node corresponding to the target time point.
Step 606, rolling the data of each data node back to the target time point according to the full backup data and the incremental backup data.
The distributed database is composed of multiple data nodes, so the rollback of the whole distributed database can be split into the rollback of each data node; when the rollback of every data node is completed, the rollback of the whole distributed database is completed. Therefore, the full backup data and the incremental backup data of each data node corresponding to the target time point can be acquired, and the data of each data node is rolled back to the target time point, which completes the first part of the data rollback operation.
The full backup data refers to a complete data image, and each data node in the distributed database has corresponding full backup data. That is, the full backup data of a data node at a particular time includes all of the data of that data node at that time. For example, for data node 1, the full backup data of January 12, 2018 includes all the data of data node 1 as of January 12, 2018. Incremental backup data refers to the content that a data node generates over time, that is, the changed data. For example, if data node 1 performs a full data backup on January 12, 2018 and an incremental data backup on January 13, 2018, there will be corresponding incremental backup data for the 13th. This incremental backup data stores the data operation log of data node 1 for the period from the 12th to the 13th. Thus, what is stored in the incremental backup data may be a data operation log rather than the actual data.
As shown in FIG. 7, the distributed database includes n data nodes, and each data node has corresponding full backup data and incremental backup data. Once the target time point of the data rollback is determined, the full backup data and the incremental backup data corresponding to the target time point can be acquired. For example, when the target time point is time T1, data node 1 needs to acquire the full backup data corresponding to time T1 from its multiple full backups in order to restore its data to time T1.
That is, data node 1 may have multiple full backups, and after the target time point of the data rollback is determined, only the full backup data corresponding to the target time point needs to be used. The full backup data of data node 1 corresponding to time T1 and the incremental backup data corresponding to time T1 may be combined to complete the data rollback operation of data node 1. Similarly, the other data nodes 2, 3, ..., n may all use the same data rollback operation, so that the data of all the data nodes is rolled back to time T1. After all the data nodes have finished rolling back, the operation of rolling the data of the whole distributed database back to time T1 is finished.
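The first-stage rollback of a single data node can be sketched as below. This is a simplified sketch under assumptions: the source does not say how the full backup "corresponding to" T1 is chosen, so the code assumes it is the latest full backup taken at or before T1, and it models the incremental log as `(timestamp, key, value)` entries. The function name `restore_node` is hypothetical.

```python
def restore_node(full_backups, incremental_log, target):
    """Restore one node to `target` from a full backup plus incremental log.

    full_backups:    [(timestamp, {key: value}), ...]
    incremental_log: [(timestamp, key, value), ...]
    """
    # Assumption: use the latest full backup taken at or before the target.
    candidates = [backup for backup in full_backups if backup[0] <= target]
    if not candidates:
        raise ValueError("no full backup at or before the target time point")
    base_time, snapshot = max(candidates, key=lambda backup: backup[0])

    state = dict(snapshot)
    # Replay logged operations after the backup, up to the target time point.
    for timestamp, key, value in sorted(incremental_log, key=lambda e: e[0]):
        if base_time < timestamp <= target:
            state[key] = value
    return state


full_backups = [(0, {"a": 100}), (10, {"a": 80})]
incremental_log = [(5, "a", 90), (12, "a", 70), (15, "a", 60)]
restored = restore_node(full_backups, incremental_log, target=13)
```

Here the full backup at time 10 is selected, the entry at time 12 is replayed, and the entry at time 15 is left out because it is later than the target.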
Step 608, obtaining the distributed data operation result record of each data node recorded in the distributed transaction log.
Step 610, acquiring the time differences among the data nodes, and taking the time difference with the largest value as the maximum compensation time.
Step 612, obtaining the distributed data operation result record in the maximum compensation time with the target time point as the starting time from the distributed transaction log.
Step 614, acquiring the incremental backup data of each data node within the maximum compensation time with the target time point as the starting time.
The clocks of the data nodes in the distributed database may be inconsistent, that is, the timestamp of each data node may have an error, and some data nodes may run fast or slow. For example, when Beijing time is 12:00, the time of data node 1 is 12:01 and the time of data node 2 is 11:59. There is a time error between data node 1 and data node 2, so if the data rolled back to 12:00 is directly used as the final rollback data, the overall data may contain errors.
A financial application scenario is taken as an example. As shown in FIG. 8, taking data node 1 and data node 2 as an example, when Beijing time is 12:00, the time of data node 1 is 11:59 and the time of data node 2 is 12:01. Data node 1 holds the user accounts of area A, and data node 2 holds the user accounts of area B. At 12:00 Beijing time, a client in area B transfers money to a client in area A, so data node 1 generates a posting record and data node 2 generates a deduction record. Therefore, the posting record appears in the data operation record log of data node 1 at 11:59, and the deduction record appears in the data operation record log of data node 2 at 12:01.
If the data of data node 1 and data node 2 are both rolled back to 12:00, the deduction record of data node 2, which was generated at 12:01, is not restored. The general ledger then does not balance and the accounts are wrong, because the posting record of data node 1 has no corresponding deduction record on data node 2.
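The effect of the clock error in this example can be reproduced with a few lines. This is only an illustration: the calendar date and the node names `node1` and `node2` are arbitrary assumptions, and the offsets are the ones given in the example (data node 1 one minute slow, data node 2 one minute fast).

```python
from datetime import datetime, timedelta

# Standard (Beijing) time of the transfer; the date itself is arbitrary.
transfer_at = datetime(2018, 1, 12, 12, 0)
target = datetime(2018, 1, 12, 12, 0)

# Clock offsets relative to standard time, as in the example.
offsets = {"node1": timedelta(minutes=-1), "node2": timedelta(minutes=1)}

# Local timestamp under which each node logs its half of the transfer.
local_stamps = {node: transfer_at + offset for node, offset in offsets.items()}

# A record survives the first rollback only if its local timestamp
# is not later than the target time point.
survives = {node: stamp <= target for node, stamp in local_stamps.items()}
```

The posting record on `node1` (stamped 11:59) survives the rollback, while the deduction record on `node2` (stamped 12:01) does not, which is exactly the half-applied transfer described above.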
Errors in the timestamps, that is, time differences, may exist between the data nodes of the distributed database. Therefore, the time differences between the data nodes can be obtained and compared, and the time difference with the largest value is used as the maximum compensation time. For example, if 3 data nodes in the distributed database have clock errors, and the time differences are 1s (second), 2s, and 3s respectively, then the largest time difference, 3s, may be taken as the maximum compensation time. That is, the maximum compensation time is d = Max{d1, d2, d3, ..., dn}; the largest of the time differences of the data nodes is taken as the maximum compensation time.
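The rule d = Max{d1, ..., dn} can be written directly. In this small sketch the function name `max_compensation_time` is hypothetical, and returning a zero duration for an empty list is an assumption, since the source does not discuss that case.

```python
from datetime import timedelta


def max_compensation_time(time_differences):
    """Return the largest time difference among the data nodes."""
    if not time_differences:
        # Assumption: with no known time differences, no compensation window.
        return timedelta(0)
    return max(time_differences)


differences = [timedelta(seconds=1), timedelta(seconds=2), timedelta(seconds=3)]
d = max_compensation_time(differences)
```

For the three differences in the example, `d` is a duration of 3 seconds.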
The distributed transaction log is a file recording the data modifications in the distributed database. A transaction refers to a series of operations performed as a single logical unit of work, which are either performed entirely or not performed at all. To solve the data inconsistency caused by the time differences between data nodes after a rollback, the distributed transaction log can be used to perform data compensation on each data node that has completed the first rollback according to the full backup data and the incremental backup data. For example, a distributed database has N nodes DataNode-1, DataNode-2, DataNode-3, ..., DataNode-N. Assume that Beijing time is taken as the standard time and denoted T, and that the timestamps of two data nodes in the distributed database deviate from the standard time: DataNode-1 is s seconds faster than the standard time and DataNode-2 is s seconds slower. On each data node, the incremental backup up to time point T is superimposed on the full backup to complete the first rollback of the data.
At this point, the data is expected to be rolled back to time T, but because of the clock errors of DataNode-1 and DataNode-2, the data of DataNode-1 is at time T + s and the data of DataNode-2 is at time T - s. The data of DataNode-1 and DataNode-2 is therefore inconsistent, and the key to the inconsistency is that a distributed transaction involving both DataNode-1 and DataNode-2 has been cut in two. That is, for one distributed transaction involving two or more data nodes such as DataNode-1 and DataNode-2, the clock errors cause the operation logs of some nodes to record and commit the data modification, while the operation logs of other nodes lag in time and do not record it, so the data is not committed there and a consistent rollback cannot be completed.
For the above situation, if the operations of multiple data nodes involved in this distributed transaction can be completed, the consistency of the whole data can be achieved. For example, in a transfer operation involving two data nodes, namely, dataNode-1 and DataNode-2, for the posting record of DataNode-2, a deduction record of DataNode-1 recorded in a distributed transaction log can be found, and the record is the key for compensating DataNode-1 or DataNode-2. Therefore, the distributed data operation result record in the distributed transaction log is a criterion for compensation.
In addition, incremental backup data of each data node in the maximum compensation time with the target time point as the starting time needs to be acquired. Namely, for each data node, incremental backup data of the data node in the maximum compensation time taking the target time point as the starting time is obtained. The incremental backup data of each data node comprises all the distributed data operation result records of the data node in the period.
When the data is expected to be backed to the target time point, the distributed data operation result record in the distributed transaction log within the maximum compensation time taking the target time point as the starting time can be extracted.
Step 616, determining whether a distributed data operation result record matched with the distributed transaction log exists in a data operation record log contained in the incremental backup data within the maximum compensation time taking the target time point as the starting time, if so, executing step 618; if not, go to step 624.
Step 618, extracting a data operation log from the incremental backup data of the node, checking the distributed transaction log back according to the identification of the distributed transaction in the log, judging whether a corresponding distributed data operation result record can be inquired, if so, executing step 620; if not, go to step 622.
Step 620, the corresponding data node executes the data operation record logs in the incremental backup data that match the distributed transaction log, so as to perform data compensation.
Step 622, no processing is performed.
Step 624, no processing is performed.
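Steps 616 to 624 can be read as a decision made for each record in a node's compensation window. The sketch below is one possible reading, not the definitive flow: it assumes that step 616 checks whether a record carries a distributed transaction identifier and that step 618 looks that identifier up in the distributed transaction log. The `LogEntry` type, its field names, and the function `decide` are hypothetical.

```python
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    operation: str
    xid: Optional[str]  # hypothetical distributed transaction identifier


def decide(entries, committed_xids):
    """Return (operation, step) pairs for one node's compensation window."""
    decisions = []
    for entry in entries:
        if entry.xid is None:
            # Assumed reading of step 616 "no": not part of a distributed transaction.
            decisions.append((entry.operation, 624))
        elif entry.xid in committed_xids:
            # Step 618 "yes": found in the distributed transaction log, so replay it.
            decisions.append((entry.operation, 620))
        else:
            # Step 618 "no": the transaction was never recorded as complete.
            decisions.append((entry.operation, 622))
    return decisions


window = [
    LogEntry("deduct 50", "tx-1"),
    LogEntry("deduct 10", None),
    LogEntry("deduct 5", "tx-2"),
]
decisions = decide(window, committed_xids={"tx-1"})
```

Only the first entry reaches step 620 and is executed for compensation; the other two are left unprocessed.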
After acquiring a data operation record log in the incremental backup data of each data node within the maximum compensation time taking the target time point as the starting time and the completion condition of the distributed transaction recorded in the distributed transaction log, comparing the two types of data operation records. For incremental backup data of each data node in the maximum compensation time taking the target time point as the starting time, comparing data operation record logs in the incremental backup data one by one with distributed data operation result records in the distributed transaction logs. Taking the data node 1 as an example, incremental backup data of the data node 1 in the maximum compensation time taking the target time point as the starting time is B backup data, and a distributed transaction log record of the data node 1 is C.
The data operation records can be extracted one by one from backup data B, and those related to distributed transactions are compared with data operation record C. When a data operation record log related to a distributed transaction in backup data B can be found in data operation record C, that is, when the record in backup data B is consistent with data operation record C, backup data B can be applied to data node 1. In other words, data node 1 executes the data operation records in backup data B that are consistent with C. If the incremental backup data is found to contain no data operation record consistent with the distributed transaction log, no processing is performed; that is, the data operation record log in the incremental backup data is not executed.
For a distributed database, any data modification operation involving multiple data nodes is referred to as a distributed transaction, i.e., when the data modification operation involves more than one data node, the transaction is considered to be a distributed transaction. Each distributed transaction, once deemed to be successful, records the operation in a distributed transaction log. Therefore, a judging step can be added on the basis of the incremental backup data, and whether the consistent data operation records in the incremental backup data and the distributed transaction log belong to the distributed transaction or not can be judged. That is, it is determined whether the data operation records consistent in the incremental backup data and the distributed transaction log are data operation records generated by executing corresponding operations for different data nodes. As in the above example, if the backup data B matches the data operation record C, which includes two data operation records X1 and X2, it can be further determined whether the data operation records X1 and X2 relate to multiple data nodes.
For example, in a financial application scenario, the distributed database has N nodes DataNode-1, DataNode-2, DataNode-3, ..., DataNode-N. Suppose that, in the time period from time T to time T + d, a deduction record of DataNode-1 is recorded in the incremental backup data of DataNode-1, a posting record of DataNode-2 is recorded in the incremental backup data of DataNode-2, and the deduction record of DataNode-1 and the posting record of DataNode-2 are also recorded in the distributed transaction log. In this case, the data operation record logs in the incremental backup data of DataNode-1 and DataNode-2 in this time period are considered consistent with the transaction completion status recorded in the distributed transaction log. The records involve different data nodes and belong to a distributed transaction, so the deduction record of DataNode-1 and the posting record of DataNode-2 recorded for the distributed transaction can be executed.
In the compensation process, the data operation log of the DataNode-1 is analyzed, and the deduction record recorded with the DataNode-1 is retrieved, but the deduction record has no distributed transaction mark, which indicates that the deduction record of the node has no associated posting records of other nodes. That is, no accounting record of the DataNode-2 or other data nodes is recorded in the distributed transaction log, which indicates that the data operation record of the deduction record does not involve multiple data nodes, and the data operation record does not belong to the distributed transaction. The data node 1 does not perform this data operation logging for data compensation and skips directly for the operation.
For scenarios with high requirements on data consistency, such as financial application scenarios involving transactions, bill data and the like, some accuracy in time can be sacrificed during data rollback in order to keep the general ledger balanced; that is, an unbalanced ledger cannot be tolerated. In the distributed data rollback method of this embodiment, data rollback is performed in two stages. In the first stage, the data of each data node is rolled back to the target time point using the full backup data and the incremental backup data. In the second stage, the data of each data node may be compensated based on the distributed transaction log. For example, the distributed transaction log may direct a data node that lacks data to perform a compensation operation, so as to restore the overall data to a balanced state. In the balanced state, no data node shows wrong or disordered accounts, so the general ledger balances. A financial database involves money, so wrong accounts are not tolerated at all, and the distributed data rollback method of this embodiment can provide a reliable technical guarantee for the consistency of financial data.
The principle of the distributed database rollback method provided by the present application is described below with a specific example. Suppose the target time point is 12:00, that is, the data in the distributed database is to be rolled back to 12:00. After the target time point is obtained, the backup data of each data node in the distributed database corresponding to 12:00, including full backup data and incremental backup data, may be obtained first. The first data rollback can then be performed on each data node according to the backup data, that is, the data of each data node is rolled back to 12:00 in that data node's own time.
It can be understood that after the first data rollback, the data restored on each data node is the data as of 12:00 in that node's own node time. In practice, however, time differences may exist between the data nodes of the distributed database, so although each data node considers itself rolled back to 12:00, data differences still exist between the data nodes. For example, 12:00 on data node A corresponds to 11:58 on data node B, while 12:00 on data node B corresponds to 12:02 on data node A. Therefore, if data nodes A and B interact, that is, if they perform corresponding operations at 12:00, the times of the corresponding data operation records in the incremental backup data of data nodes A and B are inconsistent: the record on data node A is stamped 12:00, and the record on data node B is stamped 12:02. If both data node A and data node B are simply rolled back to 12:00, the data of the two data nodes will be inconsistent.
To make the data in the distributed database consistent after the data nodes are rolled back, data compensation can be performed once after the first data rollback. The specific process is as follows. The time differences between the data nodes in the distributed database are obtained first, and the time difference with the largest value is used as the maximum compensation time. Then the data operation record log recorded in the incremental backup data of each data node within the maximum compensation time, with the target time point as the starting time, is acquired. For example, if the target time point is 12:00 and the maximum compensation time is 10 minutes, the data operation record log recorded in the incremental backup data of each data node for the period from 12:00 to 12:10 needs to be acquired.
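Selecting the records in the compensation window can be sketched as follows. The function name `records_in_window` and the sample log are hypothetical, and the sketch assumes the window is open at the target time point and closed at its end, on the reasoning that records stamped at or before the target were already restored by the first rollback.

```python
from datetime import datetime, timedelta


def records_in_window(log, target, max_compensation):
    """Return the (timestamp, operation) entries that fall in the
    compensation window (target, target + max_compensation]."""
    end = target + max_compensation
    return [(stamp, op) for stamp, op in log if target < stamp <= end]


target = datetime(2018, 1, 12, 12, 0)  # the date is arbitrary
log = [
    (datetime(2018, 1, 12, 11, 59), "op-a"),
    (datetime(2018, 1, 12, 12, 2), "op-b"),
    (datetime(2018, 1, 12, 12, 10), "op-c"),
    (datetime(2018, 1, 12, 12, 11), "op-d"),
]
window = records_in_window(log, target, timedelta(minutes=10))
```

With a target of 12:00 and a maximum compensation time of 10 minutes, only `op-b` and `op-c` are candidates for compensation.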
Further, the data operation record logs of each data node in this period need to be checked. First, it is checked whether a data operation record in this period was generated by corresponding operations performed between different data nodes. If so, the data operation record log belongs to a distributed transaction; otherwise, if it was generated by an operation on a single data node, it belongs to a non-distributed transaction.
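This classification step can be sketched as follows. The `OperationRecord` type and its `participants` field are assumptions made for illustration; the original text does not specify how a log record identifies the nodes involved in an operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationRecord:
    """Hypothetical shape of one data operation record in a node's log."""
    txn_id: str
    node: str                # node whose incremental backup holds the record
    participants: frozenset  # all nodes involved in the operation


def is_distributed(record):
    """A record belongs to a distributed transaction if several nodes took part."""
    return len(record.participants) > 1


def nodes_needing_compensation(records):
    """Nodes whose logs in the window contain at least one distributed record."""
    return {r.node for r in records if is_distributed(r)}


# Usage: node A took part in a cross-node operation, node C only worked locally.
records = [
    OperationRecord("t1", "A", frozenset({"A", "B"})),
    OperationRecord("t2", "C", frozenset({"C"})),
]
to_compensate = nodes_needing_compensation(records)
```

A node that only has local records in the window, such as node C here, is left out of the compensation set.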
For data operation record logs belonging to a distributed transaction, the distributed transaction log needs to be acquired and checked in reverse for a distributed data operation result record consistent with the data node's operation record log. If a consistent data operation record exists in both the incremental backup data and the distributed transaction log, the operation was successfully executed before, and the data node needs to execute the consistent data operation record log in the incremental backup data for data compensation. That is, after the first rollback, a data node that needs data compensation executes the data operation logs within a period of time whose length is the maximum compensation time. In this way, the data nodes are considered to have been given enough time to perform data compensation, and after compensation over this period the data of the data nodes can be made consistent.
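The cross-check between a node's incremental backup log and the distributed transaction log can be sketched as a simple filter. The `LogRecord` type, the `committed_txn_ids` set (standing in for the successful result records found in the distributed transaction log), and matching by transaction identifier are all illustrative assumptions rather than details given in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical data operation record from a node's incremental backup."""
    txn_id: str
    timestamp: datetime  # node-local time of the operation
    distributed: bool    # True if the operation involved other data nodes
    statement: str       # the operation to re-execute during compensation


def select_compensation_records(incremental_log, committed_txn_ids, target, max_comp):
    """Pick the records a node must re-execute after the first rollback.

    A record qualifies when it belongs to a distributed transaction, falls
    within [target, target + max_comp], and has a matching successful result
    record in the distributed transaction log (committed_txn_ids).
    """
    window_end = target + max_comp
    return [
        r
        for r in sorted(incremental_log, key=lambda r: r.timestamp)
        if r.distributed
        and target <= r.timestamp <= window_end
        and r.txn_id in committed_txn_ids
    ]
```

Returning the records in timestamp order is a design choice of this sketch, made so that replaying them preserves the order in which the node originally logged them.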
It should be noted that, with the distributed database rollback method provided by the present application, after data compensation is performed the data in the data nodes is kept consistent, but the time of the data nodes may still be inconsistent. In many application scenarios, however, it is data consistency that matters most, and inconsistent node times do not affect the consistency of the data. In addition, when the distributed database is rolled back, although the target time point is the time point to which the rollback is initially expected, after data compensation is completed the time of some data nodes is no longer the target time point. For example, if the target time point is 12:00, the time of a data node that needs data compensation may be 12:05 after it has executed the required data operation logs. That is, in terms of time, after the whole rollback process of the distributed database is completed, not all data nodes are necessarily rolled back to the original target time point; in practice they are likely to be at a time near the target time point, which is what ensures the data consistency of the data nodes.
Fig. 1 to Fig. 6 are schematic flowcharts of the distributed database rollback method in various embodiments. It should be understood that, although the steps in the flowcharts are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 9, a distributed database rollback device is provided, including:
A backup data acquisition module 902, configured to acquire a target time point of the data rollback, and to acquire the backup data of each data node corresponding to the target time point.
A backup data rollback module 904, configured to roll back the data of each data node to the target time point according to the backup data.
A data compensation module 906, configured to acquire the distributed data operation result records of each data node recorded in the distributed transaction log, and to perform data compensation, according to the backup data of each data node and the distributed data operation result records, on the data obtained by rolling each data node back to the target time point, so as to obtain the rollback data of each data node at the target time point.
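How the three modules cooperate can be sketched as a small pipeline. The `RollbackDevice` class and its three callables are hypothetical stand-ins for modules 902, 904 and 906; the original text does not prescribe any particular programming interface.

```python
class RollbackDevice:
    """Illustrative wiring of the three modules; not the patented implementation."""

    def __init__(self, acquire_backup, restore, compensate):
        self.acquire_backup = acquire_backup  # stands in for module 902
        self.restore = restore                # stands in for module 904
        self.compensate = compensate          # stands in for module 906

    def roll_back(self, nodes, target):
        """Run acquisition, rollback, then compensation for a list of nodes."""
        # Step 1: acquire the backup data of every node for the target time point.
        backups = {node: self.acquire_backup(node, target) for node in nodes}
        # Step 2: roll every node back to the target time point.
        for node in nodes:
            self.restore(node, backups[node])
        # Step 3: compensate each node and collect its final rollback data.
        return {node: self.compensate(node, target) for node in nodes}
```

In this sketch compensation starts only after every node has completed the first rollback, which follows the order described in the text (first rollback, then one round of data compensation).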
In one embodiment, the data compensation module 906 is further configured to, when a data node needs data compensation, execute the data operation record logs that were successfully executed within a period of time in the incremental backup data of that data node, so as to perform data compensation.
In one embodiment, as shown in Fig. 10, the data compensation module 906 includes:
A time difference acquisition module 1002, configured to acquire the time differences between the data nodes and use the time difference with the largest value as the maximum compensation time.
A record acquisition module 1004, configured to acquire, from the distributed transaction log, the distributed data operation result records within the maximum compensation time that takes the target time point as the starting time.
A record execution module 1006, configured to perform data compensation on the data obtained by rolling each data node back to the target time point, according to the distributed data operation result records acquired from the distributed transaction log and the backup data of the data node.
In one embodiment, as shown in Fig. 11, the record execution module 1006 includes:
An incremental backup data acquisition module 1006A, configured to acquire the incremental backup data of the data node within the maximum compensation time that takes the target time point as the starting time.
A determining module 1006B, configured to, when the data operation record logs contained in that incremental backup data include a data operation record consistent with a distributed data operation result record in the distributed transaction log within the same maximum compensation time, execute the consistent data operation record log in the incremental backup data of the data node to perform data compensation.
In one embodiment, the data compensation module 906 is further configured to: acquire the time differences between the data nodes and use the time difference with the largest value as the maximum compensation time; acquire, from the incremental backup data of each data node, the data operation record logs within the maximum compensation time that takes the target time point as the starting time; and determine that a data node needs data compensation when the data operation record logs of that data node within the maximum compensation time were generated by corresponding operations executed between different data nodes.
In one embodiment, the target time point is a node time of each data node; the distributed transaction logs are stored discretely in the data nodes.
Fig. 12 is a diagram illustrating the internal structure of a computer device in one embodiment. The computer device may specifically be a server. As shown in Fig. 12, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the distributed database rollback method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the distributed database rollback method.
Those skilled in the art will appreciate that the architecture shown in Fig. 12 is merely a block diagram of some of the structures related to the disclosed solution and does not limit the computer devices to which the disclosed solution applies; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, the distributed database rollback device provided by the present application can be implemented in the form of a computer program that can run on a computer device as shown in Fig. 12. The memory of the computer device may store the program modules constituting the distributed database rollback device, such as the backup data acquisition module, the backup data rollback module, and the data compensation module shown in Fig. 9. The computer program constituted by these program modules causes the processor to execute the steps of the distributed database rollback method of the embodiments of the present application described in this specification.
For example, the computer device shown in Fig. 12 may, through the backup data acquisition module of the distributed database rollback device shown in Fig. 9, acquire the target time point of the data rollback and acquire the backup data of each data node corresponding to the target time point. Through the backup data rollback module, the computer device may roll back the data of each data node to the target time point according to the backup data. Through the data compensation module, the computer device may acquire the distributed data operation result records of each data node recorded in the distributed transaction log and perform data compensation, according to the backup data of each data node and the distributed data operation result records, on the data obtained by rolling each data node back to the target time point, so as to obtain the rollback data of each data node at the target time point.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program, and the processor performing the following steps when executing the computer program: acquiring a target time point of a data rollback; acquiring the backup data of each data node corresponding to the target time point; rolling back the data of each data node to the target time point according to the backup data; acquiring the distributed data operation result records of each data node recorded in a distributed transaction log; and performing data compensation, according to the backup data of each data node and the distributed data operation result records, on the data obtained by rolling each data node back to the target time point, so as to obtain the rollback data of each data node at the target time point.
In one embodiment, performing data compensation, according to the backup data of each data node and the distributed data operation result records, on the data obtained by rolling each data node back to the target time point includes: when a data node needs data compensation, executing the data operation record logs that were successfully executed within a period of time in the incremental backup data of that data node to perform data compensation.
In one embodiment, executing the data operation record logs that were successfully executed within a period of time in the incremental backup data of the data node to perform data compensation includes: acquiring the time differences between the data nodes, and using the time difference with the largest value as the maximum compensation time; acquiring, from the distributed transaction log, the distributed data operation result records within the maximum compensation time that takes the target time point as the starting time; and performing data compensation on the data obtained by rolling each data node back to the target time point according to the distributed data operation result records acquired from the distributed transaction log and the backup data of the data node.
In one embodiment, performing data compensation on the data obtained by rolling each data node back to the target time point according to the distributed data operation result records acquired from the distributed transaction log and the backup data of the data node includes: acquiring the incremental backup data of the data node within the maximum compensation time that takes the target time point as the starting time; and, when the data operation record logs contained in that incremental backup data include a data operation record consistent with the distributed data operation result records in the distributed transaction log within the same maximum compensation time, executing the consistent data operation record log in the incremental backup data of the data node to perform data compensation.
In one embodiment, performing data compensation, according to the backup data of each data node and the distributed data operation result records, on the data obtained by rolling each data node back to the target time point further includes: acquiring the time differences between the data nodes, and using the time difference with the largest value as the maximum compensation time; acquiring, from the incremental backup data of each data node, the data operation record logs within the maximum compensation time that takes the target time point as the starting time; and determining that a data node needs data compensation when the data operation record logs of that data node within the maximum compensation time were generated by corresponding operations executed between different data nodes.
In one embodiment, the target time point is a node time of each data node; the distributed transaction logs are stored discretely in the data nodes.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, performs the following steps: acquiring a target time point of a data rollback; acquiring the backup data of each data node corresponding to the target time point; rolling back the data of each data node to the target time point according to the backup data; acquiring the distributed data operation result records of each data node recorded in a distributed transaction log; and performing data compensation, according to the backup data of each data node and the distributed data operation result records, on the data obtained by rolling each data node back to the target time point, so as to obtain the rollback data of each data node at the target time point.
In one embodiment, the computer program, when executed by the processor, further performs the following step: when a data node needs data compensation, executing the data operation record logs that were successfully executed within a period of time in the incremental backup data of that data node to perform data compensation.
In one embodiment, the computer program, when executed by the processor, further performs the following steps: acquiring the time differences between the data nodes, and using the time difference with the largest value as the maximum compensation time; acquiring, from the distributed transaction log, the distributed data operation result records within the maximum compensation time that takes the target time point as the starting time; and performing data compensation on the data obtained by rolling each data node back to the target time point according to the distributed data operation result records acquired from the distributed transaction log and the backup data of the data node.
In one embodiment, the computer program, when executed by the processor, further performs the following steps: acquiring the incremental backup data of the data node within the maximum compensation time that takes the target time point as the starting time; and, when the data operation record logs contained in that incremental backup data include a data operation record consistent with the distributed data operation result records in the distributed transaction log within the same maximum compensation time, executing the consistent data operation record log in the incremental backup data of the data node to perform data compensation.
In one embodiment, the computer program, when executed by the processor, further performs the following steps: acquiring the time differences between the data nodes, and using the time difference with the largest value as the maximum compensation time; acquiring, from the incremental backup data of each data node, the data operation record logs within the maximum compensation time that takes the target time point as the starting time; and determining that a data node needs data compensation when the data operation record logs of that data node within the maximum compensation time were generated by corresponding operations executed between different data nodes.
In one embodiment, the target time point is a node time of each data node; the distributed transaction logs are stored discretely in the data nodes.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (14)

1. A distributed database rollback method, comprising:
acquiring a target time point of a data rollback;
acquiring backup data of each data node corresponding to the target time point;
rolling back the data of each data node to the target time point according to the backup data;
acquiring a distributed data operation result record of each data node recorded in a distributed transaction log;
performing data compensation, according to the backup data of each data node and the distributed data operation result record, on the data obtained by rolling each data node back to the target time point, which comprises: when a data node needs data compensation, executing the data operation record logs that were successfully executed within a period of time in the incremental backup data of that data node to perform data compensation, which comprises: acquiring the time differences between the data nodes, and using the time difference with the largest value as the maximum compensation time; acquiring, from the distributed transaction log, the distributed data operation result records within the maximum compensation time that takes the target time point as the starting time; and performing data compensation on the data obtained by rolling each data node back to the target time point according to the distributed data operation result records acquired from the distributed transaction log and the backup data of the data node, to obtain the rollback data of each data node at the target time point.
2. The method of claim 1, further comprising:
when the data operation record logs contained in the incremental backup data of a first data node do not involve a plurality of data nodes, that is, they record only data operations of the first data node itself without any data interaction with other data nodes, determining that the first data node does not need data compensation.
3. The method of claim 1, wherein the maximum compensation time comprises errors caused by network delay and machine performance.
4. The method according to claim 1, wherein performing data compensation on the data obtained by rolling each data node back to the target time point according to the distributed data operation result records acquired from the distributed transaction log and the backup data of the data node comprises:
acquiring the incremental backup data of the data node within the maximum compensation time that takes the target time point as the starting time;
and, when the data operation record logs contained in that incremental backup data include a data operation record consistent with the distributed data operation result records in the distributed transaction log within the maximum compensation time that takes the target time point as the starting time, executing the consistent data operation record log in the incremental backup data of the data node to perform data compensation.
5. The method of claim 1, wherein performing data compensation, according to the backup data of each data node and the distributed data operation result record, on the data obtained by rolling each data node back to the target time point further comprises:
acquiring the time differences between the data nodes, and using the time difference with the largest value as the maximum compensation time;
acquiring, from the incremental backup data of each data node, the data operation record logs within the maximum compensation time that takes the target time point as the starting time;
and determining that a data node needs data compensation when the data operation record logs of that data node within the maximum compensation time were generated by corresponding operations executed between different data nodes.
6. The method according to any one of claims 1 to 5, wherein the target time point is a node time of each data node; the distributed transaction logs are discretely stored in the data nodes.
7. A distributed database rollback device, the device comprising:
a backup data acquisition module, configured to acquire a target time point of a data rollback, and to acquire backup data of each data node corresponding to the target time point;
a backup data rollback module, configured to roll back the data of each data node to the target time point according to the backup data;
a data compensation module, configured to acquire a distributed data operation result record of each data node recorded in a distributed transaction log, and to perform data compensation, according to the backup data of each data node and the distributed data operation result record, on the data obtained by rolling each data node back to the target time point, which comprises: when a data node needs data compensation, executing the data operation record logs that were successfully executed within a period of time in the incremental backup data of that data node to perform data compensation, which comprises: acquiring the time differences between the data nodes, and using the time difference with the largest value as the maximum compensation time; acquiring, from the distributed transaction log, the distributed data operation result records within the maximum compensation time that takes the target time point as the starting time; and performing data compensation on the data obtained by rolling each data node back to the target time point according to the distributed data operation result records acquired from the distributed transaction log and the backup data of the data node, to obtain the rollback data of each data node at the target time point.
8. The apparatus of claim 7, wherein the data compensation module is further configured to determine that a first data node does not need data compensation when the data operation record logs contained in the incremental backup data of the first data node do not involve a plurality of data nodes, that is, they record only data operations of the first data node itself without any data interaction with other data nodes.
9. The apparatus of claim 7, wherein the maximum compensation time comprises errors caused by network delay and machine performance.
10. The apparatus of claim 7, wherein the data compensation module comprises:
an incremental backup data acquisition module, configured to acquire the incremental backup data of the data node within the maximum compensation time that takes the target time point as the starting time;
and a determining module, configured to, when the data operation record logs contained in that incremental backup data include a data operation record log consistent with the distributed data operation result records within the maximum compensation time that takes the target time point as the starting time, execute the consistent data operation record log in the incremental backup data of the data node to perform data compensation.
11. The apparatus according to claim 7, wherein the data compensation module is further configured to: acquire the time differences between the data nodes and use the time difference with the largest value as the maximum compensation time; acquire, from the incremental backup data of each data node, the data operation record logs within the maximum compensation time that takes the target time point as the starting time; and determine that a data node needs data compensation when the data operation record logs of that data node within the maximum compensation time were generated by corresponding operations executed between different data nodes.
12. The apparatus according to any one of claims 7 to 11, wherein the target time point is a node time of each data node; the distributed transaction logs are discretely stored in the data nodes.
13. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 6.
14. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 6.
Application number: CN201810522885.XA. Priority date: 2018-05-28. Filing date: 2018-05-28. Title: Distributed data rollback method, device and computer readable storage medium. Status: Active. Publication: CN110309227B (en)

Publications (2)

Publication Number Publication Date
CN110309227A CN110309227A (en) 2019-10-08
CN110309227B true CN110309227B (en) 2022-12-13

Family

ID=68073986

Country Status (1)

Country Link
CN (1) CN110309227B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant