WO2016180160A1

WO2016180160A1 - Data snapshot recovery method and apparatus

Info

Publication number: WO2016180160A1
Application number: PCT/CN2016/079475
Authority: WO
Inventors: 汪彦舒 (Wang Yanshu); 陈河堆 (Chen Hedui); 贾新华 (Jia Xinhua); 白涛 (Bai Tao); 郭龙波 (Guo Longbo); 张宗禹 (Zhang Zongyu)
Original assignee: ZTE Corporation (中兴通讯股份有限公司)
Priority date: 2015-10-23
Filing date: 2016-04-15
Publication date: 2016-11-17
Also published as: CN106610876A; CN106610876B

Abstract

A data snapshot recovery method and apparatus. The method comprises: a cluster manager in a distributed database restoring the data of each node connected to the cluster manager to the data at the end time of the latest physical backup before a designated moment (102); and the cluster manager, starting from the data at that physical backup end time, redoing the operations indicated by the transactions in a transaction log in order of the transactions' execution time, so as to obtain data snapshots of the nodes at the designated moment, wherein the transaction log records operations on data archived between the physical backup end time and the designated moment (104). The method solves the problem in the related art that, when recovering a database snapshot of a historical moment, a distributed database cannot recover a snapshot for an arbitrary designated moment that also satisfies global transaction consistency.

Description

Data snapshot recovery method and device

Technical field

Embodiments of the invention relate to the field of databases, and in particular to a method and a device for recovering data snapshots.

Background art

At present, the mainstream databases in the industry are stand-alone databases such as Oracle, DB2 and MySQL. As data volumes grow, stand-alone databases are increasingly unable to meet users' needs for large storage capacity and high performance, and distributed databases are being adopted ever more widely.

In the related art, there are two main schemes for recovering a distributed database snapshot of a historical moment:

Solution 1: Each node uses a single-node backup tool for logical backup and recovery or for direct disk-image backup, and transaction logs are archived daily. After each node's data is restored from its backup, each node replays its transaction log up to the specified time. This scheme has the following problems: (1) because each node is backed up and restored independently and no mechanism ensures the global consistency of the distributed database, a database snapshot recovered for an arbitrary moment is not guaranteed to be globally consistent; (2) logical backup works file by file and involves operations such as searching, which reduce disk throughput, strongly affect the performance of online services and make backup inefficient; (3) recovering the database from a disk image takes a very long time.

Solution 2: A distributed transaction manager coordinates the backup of the distributed database, and only single-node transactions and distributed transactions that have been successfully committed on every node are backed up. This guarantees global transaction consistency only for the backup data during the backup process and at the end of the backup (that is, database snapshots within that time period can be restored to a globally transaction-consistent state). To restore to an arbitrary moment, the scheme must rely on the transaction log and the distributed transaction manager, using mechanisms such as resource pools and locks during replay to achieve global transaction consistency at each replayed moment. In other words, during backup this scheme enforces data consistency at every moment, which degrades performance and strongly affects online services; and while redoing the transaction log it must likewise guarantee consistency at every moment, which requires locking resources and makes replay very slow.

No effective solution has yet been proposed for the problem in the related art that, when recovering a database snapshot of a historical moment, a distributed database cannot recover a snapshot for an arbitrary specified moment that satisfies global transaction consistency.

Summary of the invention

The following is an overview of the topics detailed in this document. This Summary is not intended to limit the scope of the claims.

The embodiments of the present invention aim to provide a data snapshot recovery method and device that at least solve the problem in the related art that, when recovering a database snapshot of a historical moment, a distributed database cannot recover a snapshot for an arbitrary specified moment that satisfies global transaction consistency.

According to one aspect of the embodiments of the present invention, a data snapshot recovery method is provided, including: a cluster manager in a distributed database restores the data of each node connected to it to the data at the end time of the last physical backup before a specified time; and the cluster manager, starting from the data at the last physical backup end time, redoes the operations indicated by the transactions in a transaction log in order of the transactions' execution time, to obtain a data snapshot of each node at the specified time; wherein the transaction log records operations on data archived between the physical backup end time and the specified time.

Optionally, the cluster manager in the distributed database restoring the data of each node connected to it to the data at the last physical backup end time before the specified time includes: the cluster manager acquires the active transaction list snapshot of the nodes that was last captured before the specified time, wherein the active transaction list snapshot records the transactions operating on data that are active on one or more nodes connected to the cluster manager when the snapshot is captured; the cluster manager looks up the start time of the first active transaction in the active transaction list; when the start time of the first active transaction is earlier than the physical backup end time, the cluster manager rolls back the transactions in the active transaction list snapshot that were still active before the physical backup end time, to obtain the data at the last physical backup end time; and when the start time is later than the physical backup end time, the cluster manager acquires the data of each node at the last physical backup end time before the specified time.

Optionally, the cluster manager redoing, starting from the data at the last physical backup end time, the operations indicated by the transactions in the transaction log in order of execution time to obtain a data snapshot of each node at the specified time includes: the cluster manager, starting from the data at the last physical backup end time, redoes the single-node transaction operations of each node indicated by the transactions in the transaction log in order of execution time, and redoes the distributed transaction operations indicated by the transactions in the transaction log between the physical backup end time and the specified time, also in order of execution time, to obtain the data snapshot of each node at the specified time.

Optionally, before the cluster manager in the distributed database restores the data of each node connected to it to the data at the last physical backup end time before the specified time, the method further includes: the cluster manager periodically sends a physical backup instruction to each node connected to it, wherein the physical backup instruction includes a full backup instruction and/or an incremental backup instruction.

Optionally, before the cluster manager in the distributed database restores the data of each node connected to it to the data at the last physical backup end time before the specified time, the method further includes: the cluster manager periodically performs a snapshot operation on the nodes to obtain the active transaction list snapshot.

According to another aspect of the embodiments of the present invention, a data snapshot recovery apparatus is provided, applied on the cluster manager side of a distributed database and including: a recovery module, configured to restore the data of each node connected to the cluster manager to the data at the last physical backup end time before a specified time; and a redo module, configured to redo, starting from the data at the last physical backup end time, the operations indicated by the transactions in a transaction log in order of the transactions' execution time, to obtain a data snapshot of each node at the specified time, wherein the transaction log records operations on data archived between the physical backup end time and the specified time.

Optionally, the recovery module includes: a first obtaining unit, configured to acquire the active transaction list snapshot of the nodes last captured before the specified time, wherein the active transaction list snapshot records the transactions operating on data that are active on one or more nodes connected to the cluster manager when the snapshot is captured; a lookup unit, configured to look up the start time of the first active transaction in the active transaction list; a rollback unit, configured to, when the start time of the first active transaction is earlier than the physical backup end time, roll back the transactions in the active transaction list snapshot that were still active before the physical backup end time, to obtain the data at the last physical backup end time; and a second obtaining unit, configured to, when the start time of the first active transaction is later than the physical backup end time, acquire the data of each node at the last physical backup end time before the specified time.

Optionally, the redo module is further configured to redo, starting from the data at the last physical backup end time and in order of execution time, the single-node transaction operations of each node indicated by the transactions in the transaction log, and to redo, in order of execution time, the distributed transaction operations indicated by the transactions in the transaction log between the physical backup end time and the specified time, to obtain a data snapshot of each node at the specified time.

Optionally, the apparatus further includes: a sending module, configured to periodically send a physical backup instruction to each node connected to the cluster manager, wherein the physical backup instruction includes a full backup instruction and/or an incremental backup instruction.

Optionally, the apparatus further includes: an execution module, configured to periodically perform a snapshot operation on the respective nodes to obtain the active transaction list snapshot.

An embodiment of the invention further provides a computer-readable storage medium storing computer-executable instructions for performing any of the data snapshot recovery methods described above.

According to the embodiments of the present invention, the cluster manager in the distributed database restores the data of each node connected to it to the data at the last physical backup end time before the specified time, and then redoes the operations indicated by the transactions in the transaction log in order of their execution time to obtain a data snapshot of each node at the specified time. That is, in these embodiments each node's data snapshot is obtained by redoing the logged transactions on top of the data at the last physical backup end time, so that the snapshot satisfies global consistency. This solves the problem that, when recovering a database snapshot of a historical moment, a distributed database cannot recover a snapshot for an arbitrary specified moment that satisfies global transaction consistency.

Other features and advantages of the embodiments of the invention will be set forth in the description that follows. The objectives and other advantages of the invention may be realized and obtained by means of the structure particularly pointed out in the appended claims.

Other aspects will be apparent upon reading and understanding the drawings and detailed description.

Brief description of the drawings

The drawings provide a further understanding of the embodiments of the present invention and constitute a part of this application; the drawings and their description are not intended to limit the invention. In the drawings:

FIG. 1 is a flowchart of a data snapshot recovery method according to an embodiment of the present invention;

FIG. 2 is a structural block diagram of a data snapshot recovery apparatus according to an embodiment of the present invention;

FIG. 3 is block diagram 1 of an optional structure of a data snapshot recovery apparatus according to an embodiment of the present invention;

FIG. 4 is block diagram 2 of an optional structure of a data snapshot recovery apparatus according to an embodiment of the present invention;

FIG. 5 is block diagram 3 of an optional structure of a data snapshot recovery apparatus according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a database snapshot recovery system according to an alternative embodiment of the present invention;

FIG. 7 is a flowchart of a database snapshot recovery method according to an alternative embodiment of the present invention;

FIG. 8 is a timing diagram of database snapshot recovery according to an alternative embodiment of the present invention.

Preferred embodiment of the invention

The embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with one another arbitrarily.

It should be noted that the terms "first", "second" and the like in the description and claims of the present invention are used to distinguish similar objects and do not necessarily describe a particular order or sequence.

A method for restoring a data snapshot is provided in this embodiment. FIG. 1 is a flowchart of a method for restoring a data snapshot according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:

Step 102: The cluster manager in the distributed database restores the data of each node connected to it to the data at the last physical backup end time before the specified time;

Step 104: The cluster manager, starting from the data at the last physical backup end time, redoes the operations indicated by the transactions in the transaction log in order of their execution time and obtains a data snapshot of each node at the specified time; the transaction log records operations on data archived between the physical backup end time and the specified time.
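Steps 102 and 104 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes a hypothetical, simplified model in which each node's data is a key/value dict, every archived log record is a single key write stamped with its execution time, and each node has its own backup end time (as in the alternative embodiment later in this description, where nodes finish backing up at different moments). `LogRecord` and `recover_snapshot` are illustrative names, and rollback of transactions still active at the backup end is left out here.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical record layout: one archived log entry = one key write.
@dataclass(frozen=True)
class LogRecord:
    exec_time: int  # execution time of the operation
    node: str       # node on which the operation ran
    txn_id: str     # transaction that issued the operation
    key: str
    value: str


def recover_snapshot(backups: Dict[str, Dict[str, str]],
                     backup_end: Dict[str, int],
                     log: List[LogRecord],
                     target: int) -> Dict[str, Dict[str, str]]:
    """Rebuild every node's data as of `target`.

    Step 102: start from each node's data at the end of its last physical
    backup before `target` (here simply copied from `backups`).
    Step 104: redo the archived operations executed after that node's backup
    end and no later than `target`, in order of execution time.
    """
    # Step 102: copy so the backup images themselves are never modified.
    nodes = {name: dict(data) for name, data in backups.items()}

    # Step 104: keep only operations not already contained in the backup.
    pending = [r for r in log
               if backup_end[r.node] < r.exec_time <= target]
    # sorted() is stable, so records with equal timestamps keep log order.
    for rec in sorted(pending, key=lambda r: r.exec_time):
        nodes[rec.node][rec.key] = rec.value
    return nodes
```

Records at or before a node's backup end time are skipped because the physical backup already contains them, and records after the target time are ignored.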

As steps 102 to 104 show, the cluster manager in the distributed database restores the data of each node connected to it to the data at the last physical backup end time before the specified time, and then redoes the operations indicated by the transactions in the transaction log in order of their execution time to obtain a data snapshot of each node at the specified time. Because each node's snapshot is obtained by redoing the logged transactions on top of the data at the last physical backup end time, the snapshot satisfies global consistency. This solves the problem in the related art that, when recovering a database snapshot of a historical moment, a distributed database cannot recover a snapshot for an arbitrary specified moment that satisfies global transaction consistency.

In an optional implementation of this embodiment, step 102, in which the cluster manager in the distributed database restores the data of each node connected to it to the data at the last physical backup end time before the specified time, may be implemented as follows:

Step 102-1: The cluster manager obtains the active transaction list snapshot of the nodes last captured before the specified time, wherein the active transaction list snapshot records the transactions operating on data that are active on one or more nodes connected to the cluster manager when the snapshot is captured;

Step 102-2: The cluster manager searches for the start time of the first active transaction in the active transaction list;

Step 102-3: When the start time of the first active transaction is earlier than the physical backup end time, the cluster manager rolls back the transactions in the active transaction list snapshot that were still active before the physical backup end time, to obtain the data at the last physical backup end time;

Step 102-4: When the start time of the first active transaction is later than the physical backup end time, the cluster manager acquires the data of each node at the last physical backup end time before the specified time.
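Steps 102-1 to 102-4 can be sketched in Python as follows. This is a hedged illustration only: `ActiveTxn`, `UndoRecord` and `restore_to_backup_end` are hypothetical names, a single backup end time is assumed for brevity, and rollback is modelled by restoring before-images of the writes. That is one common undo technique, not something the text specifies.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ActiveTxn:
    txn_id: str
    start_time: int


@dataclass(frozen=True)
class UndoRecord:
    exec_time: int  # when the write was executed
    node: str
    txn_id: str
    key: str
    before: str     # value of `key` before the write (assumed always present)


def restore_to_backup_end(backup_data: Dict[str, Dict[str, str]],
                          backup_end: int,
                          active_list: List[ActiveTxn],
                          undo_log: List[UndoRecord]) -> Dict[str, Dict[str, str]]:
    """Return per-node data as of `backup_end`.

    `active_list` is the active transaction list snapshot last captured
    before the specified time (step 102-1).
    """
    nodes = {n: dict(d) for n, d in backup_data.items()}
    if not active_list:
        return nodes
    # Step 102-2: start time of the first (earliest) active transaction.
    first_start = min(t.start_time for t in active_list)
    if first_start >= backup_end:
        # Step 102-4: nothing in the list predates the backup end, so the
        # backup data is used as it is.
        return nodes
    # Step 102-3: roll back the transactions already active before the backup
    # ended; any of their writes that reached the backup must be undone.
    victims = {t.txn_id for t in active_list if t.start_time < backup_end}
    undo = [r for r in undo_log
            if r.txn_id in victims and r.exec_time <= backup_end]
    # Undo newest first so the oldest before-image is what remains.
    for rec in sorted(undo, key=lambda r: r.exec_time, reverse=True):
        nodes[rec.node][rec.key] = rec.before
    return nodes
```

The text only covers the "earlier than" and "later than" cases; treating an exactly equal start time like step 102-4 is an assumption of this sketch.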

It should be noted that the active transaction list in step 102-1 is captured periodically, and the earliest active transaction in a captured list may have started either before or after the physical backup end time; the corresponding transaction log is obtained according to which case applies.

In an optional implementation of this embodiment, step 104, in which the cluster manager, starting from the data at the last physical backup end time, redoes the operations indicated by the transactions in the transaction log in order of their execution time to obtain a data snapshot of each node at the specified time, may be implemented as follows:

The cluster manager, starting from the data at the last physical backup end time, redoes the single-node transaction operations of each node indicated by the transactions in the transaction log in order of execution time, and redoes the distributed transaction operations indicated by the transactions in the transaction log between the physical backup end time and the specified time, also in order of execution time, to obtain a data snapshot of each node at the specified time.
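One way to honour both orderings in a single pass is to merge the per-node archived logs, each already sorted by execution time, into one stream ordered by execution time: single-node operations then stay in per-node order, and the operations of a distributed transaction are applied in global time order. The sketch below is an assumption-laden illustration, not the described implementation; the tuple layout of a log record, the per-node backup end times and the function names are hypothetical, while `heapq.merge` is the Python standard-library function.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

# Hypothetical record: (exec_time, node, txn_id, is_distributed, key, value)
Record = Tuple[int, str, str, bool, str, str]


def redo_stream(node_logs: Dict[str, List[Record]],
                backup_end: Dict[str, int],
                target: int) -> Iterator[Record]:
    """Yield the records to redo, ordered by execution time.

    Each per-node list must already be sorted by exec_time; heapq.merge
    then produces one globally ordered stream without re-sorting.
    """
    merged = heapq.merge(*node_logs.values(), key=lambda r: r[0])
    for rec in merged:
        exec_time, node = rec[0], rec[1]
        if exec_time > target:
            break  # the stream is sorted, so nothing later can qualify
        # Skip operations the node's physical backup already contains.
        if exec_time > backup_end[node]:
            yield rec


def redo(nodes: Dict[str, Dict[str, str]],
         node_logs: Dict[str, List[Record]],
         backup_end: Dict[str, int],
         target: int) -> Dict[str, Dict[str, str]]:
    """Apply single-node and distributed operations in one ordered pass."""
    for _, node, _, _, key, value in redo_stream(node_logs, backup_end, target):
        nodes[node][key] = value
    return nodes
```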

That is to say, if the first active transaction in the active transaction list started before the physical backup end time, then to obtain the database snapshot at the specified time, the transactions in the active transaction list that were active between their start time and the physical backup end time must be rolled back, which guarantees the global consistency of the snapshot up to the physical backup end time. For the period from the physical backup end time to the specified time, only the transaction log for that period needs to be rolled forward to obtain a globally consistent data snapshot.

Before step 102 of this embodiment, in which the cluster manager in the distributed database restores the data of each node connected to it to the data at the last physical backup end time before the specified time, the method of this embodiment may further include:

The cluster manager periodically sends a physical backup command to each node connected to it, where the physical backup command includes a full backup command and/or an incremental backup command. Backup and recovery in this physical mode are much faster than with logical backup. In addition, because the database is backed up with full plus incremental backups, recovery needs only one full backup and one incremental backup to restore the database quickly. Moreover, during a physical backup only the database files are copied, so the impact on online services is small.
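Restoring from exactly one full and one incremental backup presumably relies on each incremental backup capturing all changes since the last full backup (a cumulative, or differential, incremental); the text does not say so explicitly, so treat that as an assumption. Under it, choosing the backups to restore can be sketched as below. `Backup`, its `kind` labels and `pick_backups` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Backup:
    kind: str   # "full" or "incr" (hypothetical labels)
    start: int  # backup start time
    end: int    # backup end time


def pick_backups(backups: List[Backup],
                 target: int) -> Tuple[Backup, Optional[Backup]]:
    """Pick the latest full backup finished by `target` and, if any, the
    latest incremental backup taken after it and finished by `target`.

    Assumes incremental backups are cumulative since the last full backup,
    so a single incremental is enough on top of the full one.
    """
    finished = [b for b in backups if b.end <= target]
    fulls = [b for b in finished if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup finished before the target time")
    full = max(fulls, key=lambda b: b.end)
    incrs = [b for b in finished
             if b.kind == "incr" and b.start >= full.end]
    latest_incr = max(incrs, key=lambda b: b.end) if incrs else None
    return full, latest_incr
```

If incremental backups were instead chained (each relative to the previous one), every incremental since the last full backup would have to be applied, which is why the cumulative assumption matters here.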

Before step 102 of this embodiment, in which the cluster manager in the distributed database restores the data of each node connected to it to the data at the last physical backup end time before the specified time, the method of this embodiment may further include: the cluster manager periodically performs a snapshot operation on each node to obtain the active transaction list snapshot.

From the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to perform the methods of the embodiments of the present invention.

This embodiment also provides a database snapshot recovery device, which is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

FIG. 2 is a structural block diagram of a data snapshot recovery apparatus according to an alternative embodiment of the present invention; the apparatus is applied to a cluster manager in a distributed database. As shown in FIG. 2, the apparatus includes:

The recovery module 22 is configured to restore the data of each node connected to the cluster manager to the data at the last physical backup end time before the specified time;

The redo module 24 is coupled to the recovery module 22 and is configured to redo, starting from the data at the last physical backup end time, the operations indicated by the transactions in the transaction log in order of their execution time, to obtain a data snapshot of each node at the specified time, wherein the transaction log records operations on data archived between the physical backup end time and the specified time.

FIG. 3 is a block diagram of an optional structure of a data snapshot recovery apparatus according to an embodiment of the present invention. As shown in FIG. 3, the recovery module 22 shown in FIG. 2 includes:

The first obtaining unit 32 is configured to acquire the active transaction list snapshot of each node last captured before the specified time, wherein the active transaction list snapshot records the transactions operating on data that are active on one or more nodes connected to the cluster manager when the snapshot is captured;

The searching unit 34 is coupled to the first obtaining unit 32 and configured to search for a starting moment of the first active transaction in the active transaction list;

The rollback unit 36 is coupled to the search unit 34 and is configured to, when the start time is earlier than the physical backup end time, roll back the transactions in the active transaction list snapshot that were still active before the physical backup end time, to obtain the data at the last physical backup end time;

The second obtaining unit 38 is coupled to the rollback unit 36 and is configured to, when the start time is later than the physical backup end time, acquire the data of each node at the last physical backup end time before the specified time.

Building on the units of the recovery module 22 in FIG. 3, the redo module 24 in this embodiment is further configured to redo, starting from the data at the last physical backup end time and in order of execution time, the single-node transaction operations of each node indicated by the transactions in the transaction log, and to redo, in order of execution time, the distributed transaction operations indicated by the transactions in the transaction log between the physical backup end time and the specified time, to obtain a data snapshot of each node at the specified time.

FIG. 4 is block diagram 2 of an optional structure of a data snapshot recovery apparatus according to an embodiment of the present invention. As shown in FIG. 4, the recovery apparatus further includes the following module, used before the cluster manager in the distributed database restores the data of each node connected to it to the data at the last physical backup end time before the specified time:

The sending module 42 is coupled to the recovery module 22 and configured to periodically send a physical backup instruction to each node connected to the cluster manager, wherein the physical backup instruction comprises: a full backup instruction and/or an incremental backup instruction.

FIG. 5 is block diagram 3 of an optional structure of a data snapshot recovery apparatus according to an embodiment of the present invention. As shown in FIG. 5, the recovery apparatus further includes the following module, used before the cluster manager in the distributed database restores the data of each node connected to it to the data at the last physical backup end time before the specified time:

The execution module 52 is coupled to the recovery module 22 and is configured to perform a snapshot operation on each node periodically to obtain an active transaction list snapshot.

It should be noted that each of the above modules may be implemented in software or hardware. In the latter case, this may be done in, but is not limited to, the following ways: the modules are all located in the same processor, or the modules are distributed across multiple processors.

The invention is illustrated below with reference to alternative embodiments of the invention:

This alternative embodiment uses periodic physical backups of the distributed database (full and incremental), daily archived transaction logs, and active transaction list snapshots captured at a certain frequency. From these three kinds of base data, a database snapshot for any specified time that satisfies global transaction consistency can be restored, and none of these processes affects online services. In other words, this embodiment performs global consistency processing on the data recovered by each single node.

In this alternative embodiment, the scenarios in which distributed transactions that have not ended normally at the snapshot time cause global data inconsistency include: (1) a distributed transaction that has been committed successfully on some nodes but not yet committed on others; (2) a distributed transaction that was committed successfully on some nodes but failed to commit on others, and that the distributed database has not yet rolled back on the nodes where it committed; (3) transactions that have not been committed, or failed to commit, on every node.
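To make the three scenarios concrete, a hypothetical helper could classify one distributed transaction at the snapshot time from the state of its branch on each node. The enum values and return labels are illustrative only; the text defines no such interface, and the sketch does not decide how each case is repaired.

```python
from enum import Enum
from typing import Dict


class Branch(Enum):
    COMMITTED = "committed"  # committed successfully on this node
    PENDING = "pending"      # not yet committed on this node
    FAILED = "failed"        # commit failed on this node


def classify(branches: Dict[str, Branch]) -> str:
    """Map per-node branch states of one distributed transaction to the
    inconsistency scenarios (1) to (3), or "consistent" if none applies."""
    if not branches:
        raise ValueError("a distributed transaction needs at least one branch")
    states = set(branches.values())
    if states == {Branch.COMMITTED}:
        return "consistent"  # committed everywhere, nothing to repair
    if Branch.COMMITTED in states and Branch.FAILED in states:
        # Scenario (2): committed on some nodes, failed on others and not
        # yet rolled back where it committed.
        return "scenario 2"
    if Branch.COMMITTED in states:
        # Scenario (1): committed on some nodes, still pending on others.
        return "scenario 1"
    # Scenario (3): not committed, or failed, on every node.
    return "scenario 3"
```

A transaction that is committed on some nodes, failed on others and still pending elsewhere is reported as scenario 2 here; that tie-break is a choice of the sketch, not of the text.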

In addition, this alternative embodiment obtains database backups by physical hot backup, which has almost no negative impact on the execution of SQL (Structured Query Language) statements, and the data can be restored quickly from one full backup and one incremental backup.

FIG. 6 is a schematic diagram of a database snapshot recovery system according to an alternative embodiment of the present invention. As shown in FIG. 6, the system includes: Data Node Cluster, a cluster of data nodes that stores the data; SQL Node Cluster, a cluster of SQL nodes that splits and parses SQL statements; GTM (Global Transaction Manager), which manages global transactions; and Cluster Manager, which manages the Data Node Cluster and the SQL Node Cluster.

Based on the components of the system in FIG. 6, FIG. 7 is a flowchart of a database snapshot recovery method according to an alternative embodiment of the present invention. As shown in FIG. 7, the steps of the method include:

Step 41: obtain active transaction list snapshots;

Step 42: perform a physical hot backup of the database;

Step 43: periodically archive the transaction logs.

In other words, steps 41 to 43 generate the base data: the Cluster Manager periodically issues full or incremental backup commands to each node according to the backup policy; the Cluster Manager periodically (at a configurable interval) obtains an active transaction list snapshot from the GTM and persists it to a disk file (the active transaction list file); and the database nodes periodically archive the database transaction logs.
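The periodic capture of the active transaction list can be sketched as a loop that asks the GTM for its current list and appends it, with a timestamp, to a JSON-lines file that is flushed and fsync'ed each round. The GTM query is stood in for by a caller-supplied function because the text does not describe the GTM interface; the file format, `capture_active_lists`, `load_active_lists` and their parameters are likewise assumptions.

```python
import json
import os
import time
from typing import Callable, List


def capture_active_lists(get_active: Callable[[], List[dict]],
                         path: str,
                         period_s: float,
                         rounds: int,
                         clock: Callable[[], float] = time.time) -> None:
    """Append `rounds` active-transaction-list snapshots to `path`.

    `get_active` stands in for a (hypothetical) GTM query returning the
    currently active transactions; one JSON object is written per line.
    """
    for i in range(rounds):
        snapshot = {"taken_at": clock(), "active": get_active()}
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(snapshot) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the snapshot durable on disk
        if i + 1 < rounds:
            time.sleep(period_s)  # configurable capture period


def load_active_lists(path: str) -> List[dict]:
    """Read back all persisted snapshots in capture order."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

A real capture loop would run for the lifetime of the Cluster Manager; the `rounds` bound exists only so the sketch terminates.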

The method also includes:

Step 44: obtain the backup database files, the log files and the active transaction list file;

Step 45: restore single-node consistent data;

Step 46: restore globally consistent data.

That is to say, steps 44 to 46 restore a distributed database snapshot for any specified time: for each database node, the single-node database is restored from the full plus incremental backup data to a consistent state at the backup end time. The active transaction list snapshot for the specified time is retrieved from the active transaction list file, and the corresponding archived transaction logs are retrieved. Based on the active transaction list snapshot and the transaction logs, the database transactions from the backup time to the specified time are rolled forward or rolled back, so that the snapshot satisfies the global consistency of the distributed database.

With this alternative embodiment, the database is backed up by physical hot backup, so backup and recovery are much faster than with logical backup. In addition, the database is backed up with full plus incremental backups, so recovery needs only one full backup and one incremental backup to restore the database quickly. During physical hot backup only the database files are copied, which has little impact on online services. Furthermore, the recovered database snapshot satisfies global transaction consistency: at the specified time, the snapshot contains no transaction that is uncommitted or failed on some data nodes. A database snapshot can be restored for any specified time (at the granularity of the active transaction list capture period).

The present invention is described in detail below with reference to specific embodiments. The following alternative embodiments are all described with reference to FIG. 8, which is a timing diagram of database snapshot recovery according to an alternative embodiment of the present invention.

Alternative embodiment 1:

This alternative embodiment takes a MariaDB distributed system as an example. The method for obtaining a consistent snapshot of the database data consists of an initial basic data generation process (steps 201 to 203) and the restoration, based on that initial basic data, of the database snapshot at the specified time t7 (steps 204 to 211).

In step 201, the distributed database and its management programs (ClusterManager, GTM) are started at time t0. From t0 onward, the ClusterManager periodically obtains a snapshot of the active transaction list from the GTM, writes it to a file, and persists it.
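A minimal sketch of how the ClusterManager might persist each active transaction list snapshot so that later steps can look snapshots up by time. The `GTM` interface (`active_transactions()` returning a map from gtid to start time, and `next_gtid()`), the JSON Lines file format, and `persist_snapshot` are all assumptions; the document does not specify the GTM API or the snapshot file layout.

```python
import json
import os
from typing import Dict, Protocol


class GTM(Protocol):
    """Hypothetical interface to the global transaction manager."""
    def active_transactions(self) -> Dict[int, float]: ...  # gtid -> start time
    def next_gtid(self) -> int: ...


def persist_snapshot(gtm: GTM, now: float, path: str) -> None:
    """Append one active transaction list snapshot to `path` and force it to disk."""
    record = {
        "time": now,
        "active": {str(g): t for g, t in gtm.active_transactions().items()},
        "next_gtid": gtm.next_gtid(),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())  # persist: the snapshot should survive a crash
```

Calling this once per period (for example from a timer loop) makes that period the granularity of any later snapshot recovery.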

Step 202: At time t1, the Cluster Manager initiates a backup request to the database nodes according to the backup policy and the result file of the last backup. The backup of node 1 ends at t3, that of node 2 at t4, and that of node 3 at t5. After its backup completes, each node names the backup data by the backup start time and returns the backup result. The Cluster Manager records the results in the result file and persists it.
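One way the Cluster Manager could record these results: each backup round is named by its start time, and each node's backup end time is stored with it. The JSON Lines layout and the `record_backup_results` and `load_backup_results` helpers are hypothetical, shown only to make the result file concrete.

```python
import json
from typing import Dict, List


def record_backup_results(path: str, start_time: float, node_results: Dict[str, float]) -> None:
    """Append one backup round to the result file.

    The round is named by its start time; `node_results` maps node id -> backup end time.
    """
    entry = {
        "name": f"backup_{start_time:g}",  # backup data named by the backup start time
        "start": start_time,
        "nodes": node_results,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def load_backup_results(path: str) -> List[dict]:
    """Read every recorded backup round back from the result file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```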

In step 203, the data nodes routinely archive the transaction log files, and the Cluster Manager archives the active transaction list snapshot file; the archiving period is configurable.

Step 204: Obtain and analyze the backup result file to find the latest backup data before time t7, that is, the data files backed up at time t1;
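Step 204 can be sketched as a filter over the parsed result file. This sketch assumes each backup round is a dict with a `start` time and a `nodes` map from node id to backup end time, and it treats a round as usable only if every node's backup ended by t7; that completeness rule and the `latest_backup_before` name are assumptions, not stated in the document.

```python
from typing import List, Optional


def latest_backup_before(rounds: List[dict], t7: float) -> Optional[dict]:
    """Return the most recent backup round whose every node finished by t7.

    Each round is assumed to look like
    {"start": t1, "nodes": {"node1": t3, "node2": t4, "node3": t5}}.
    """
    usable = [r for r in rounds
              if r["nodes"] and max(r["nodes"].values()) <= t7]
    return max(usable, key=lambda r: r["start"], default=None)
```

With the round started at t1 (node end times t3, t4, t5) and a later round that has not finished on all nodes by t7, the function returns the t1 round.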

In step 205, the backup data of each node is restored from the physical backup data; node 1 is restored to time t3.

Step 206: Using the records in the active transaction list snapshot file, find time t6, at which the active transaction list snapshot was last obtained before the specified time t7. The granularity of the restored snapshot is therefore the period at which the active transaction list snapshot is obtained;
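Step 206 amounts to a predecessor search over the sorted snapshot times. The sketch below uses Python's standard `bisect` module; `last_snapshot_time` is a hypothetical name, and treating a snapshot taken exactly at t7 as usable is an interpretation of "before the specified time".

```python
import bisect
from typing import List


def last_snapshot_time(snapshot_times: List[float], t7: float) -> float:
    """Return t6: the latest snapshot time not after t7.

    `snapshot_times` must be sorted in ascending order.
    """
    i = bisect.bisect_right(snapshot_times, t7)
    if i == 0:
        raise ValueError("no active transaction list snapshot before the specified time")
    return snapshot_times[i - 1]
```

Because consistency is decided from the list captured at t6 rather than at t7 itself, the recoverable points are spaced by the snapshot period, which matches the granularity stated in step 206.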

Step 207: Analyze the active transaction list at time t6 and obtain the start time t2 of the first transaction in it;

Step 208: Obtain the node's transaction log for the time interval [tmin, tmax];

where tmin = min{t2, t3} and tmax = t7;
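Steps 207 and 208 reduce to computing the log window and filtering a node's log to it. A sketch, assuming a hypothetical `LogRecord` with an execution time and an optional gtid, and the active list at t6 given as a map from gtid to start time; t3 stands for the backup end time of the node being recovered. Falling back to t3 when no transaction is active at t6 is an assumption, since the document does not cover that case.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical transaction log entry on one node."""
    time: float            # execution time of the logged operation
    gtid: Optional[int]    # None for a single-machine transaction


def log_window(active_at_t6: Dict[int, float], t3: float, t7: float) -> Tuple[float, float]:
    """Return (tmin, tmax) with tmin = min{t2, t3} and tmax = t7, where t2 is the
    start time of the earliest transaction in the active list at t6."""
    # Assumption: with no active transaction at t6, t2 falls back to t3.
    t2 = min(active_at_t6.values(), default=t3)
    return min(t2, t3), t7


def records_in_window(records: List[LogRecord], tmin: float, tmax: float) -> List[LogRecord]:
    """Keep the records executed in [tmin, tmax], in execution-time order."""
    return sorted((r for r in records if tmin <= r.time <= tmax), key=lambda r: r.time)
```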

Step 209: When tmin is less than t3, use a tool to generate, from the log in [tmin, t3], rollback statements for the distributed transactions that are still active at time t6; when tmin is not less than t3, skip this step;
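The document does not name the tool used in step 209. A minimal sketch of the idea, assuming a row-based log in which each event carries the row image before and after the change (similar in spirit to a row-format binlog): changes made in [tmin, t3] by distributed transactions still active at t6 are inverted and returned newest-first. The `RowEvent` type and `rollback_ops` are hypothetical, and the inverse operations are returned as data rather than rendered as SQL text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

Row = Dict[str, object]


@dataclass(frozen=True)
class RowEvent:
    """Hypothetical row-based log event."""
    time: float
    gtid: Optional[int]    # None for single-machine transactions
    op: str                # "insert", "update" or "delete"
    table: str
    before: Optional[Row]  # row image before the change (None for insert)
    after: Optional[Row]   # row image after the change (None for delete)


def rollback_ops(events: List[RowEvent], active_at_t6: Set[int],
                 tmin: float, t3: float) -> List[RowEvent]:
    """Invert the changes made in [tmin, t3] by distributed transactions still
    active at t6. Results are newest-first so applying them in order undoes
    the changes correctly."""
    if tmin >= t3:
        return []  # step 209 is skipped when tmin is not less than t3
    inverse = {"insert": "delete", "delete": "insert", "update": "update"}
    picked = [e for e in events if e.gtid in active_at_t6 and tmin <= e.time <= t3]
    # Swapping the before/after images turns each change into its undo.
    return [RowEvent(e.time, e.gtid, inverse[e.op], e.table, e.after, e.before)
            for e in sorted(picked, key=lambda e: e.time, reverse=True)]
```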

Step 210: Use the tool to redo all single-machine transactions in [t3, t6] and all distributed transactions in the set {transactions whose gtid is not in the active transaction list at time t6} ∩ {transactions whose gtid is less than next_gtid at time t6};
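The selection rule in step 210 can be written as a filter over a node's log. In this sketch, `TxnEntry` and `redo_set` are hypothetical; the lower bound t3 is applied to distributed transactions as well, on the reading that anything executed before t3 is already contained in the physical backup. A distributed transaction qualifies only if it had finished globally by t6: its gtid was already issued (less than next_gtid) and it is no longer in the active list.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class TxnEntry:
    """Hypothetical transaction entry from one node's log."""
    time: float            # execution time on this node
    gtid: Optional[int]    # None for a single-machine transaction


def redo_set(entries: List[TxnEntry], active_at_t6: Set[int], next_gtid_t6: int,
             t3: float, t6: float) -> List[TxnEntry]:
    """Select the transactions step 210 redoes, in execution-time order."""
    chosen = []
    for e in entries:
        if e.time < t3:
            continue  # already contained in the physical backup
        if e.gtid is None:
            if e.time <= t6:  # single-machine transactions in [t3, t6]
                chosen.append(e)
        elif e.gtid not in active_at_t6 and e.gtid < next_gtid_t6:
            chosen.append(e)  # distributed and already finished globally by t6
    return sorted(chosen, key=lambda e: e.time)
```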

In step 211, the generated rollback statements are executed.

Alternative embodiment 2:

The application scenario of this alternative embodiment is disaster recovery for an online payment system based on a MySQL distributed cluster database.

In this alternative embodiment, a physical hot backup tool backs up single-node data, the ClusterManager periodically obtains snapshots of the active transaction list from the GTM, and the binlog is archived daily. During recovery, the single-node backup data, the binlog, and the active transaction list snapshots are obtained, and recovery, rollback, and redo operations are performed to obtain a database snapshot that is globally transaction-consistent at the specified time or at the backup time, without affecting the normal operation of the high-volume online payment system. The steps of this alternative embodiment include:

In step 301, the distributed database and its management programs (ClusterManager, GTM) are started at time t0. From t0 onward, the ClusterManager periodically obtains a snapshot of the active transaction list from the GTM, writes it to a file, and persists it.

Step 302: At time t1, the Cluster Manager initiates a backup request to the database nodes according to the backup policy and the result file of the last backup. The backup of node 1 ends at t3, that of node 2 at t4, and that of node 3 at t5. After its backup completes, each node names the backup data by the backup start time and returns the backup result. The Cluster Manager records the results in the result file and persists it.

In step 303, the data nodes archive the transaction log files daily, and the Cluster Manager archives the active transaction list snapshot file; the archiving period may be 10 minutes.

In this alternative embodiment, the snapshot of the cluster database at time t7 is to be restored in another production environment or on an intermediate machine. The recovery steps include:

Step 304: Obtain and analyze the backup result file to find the latest backup data before time t7, that is, the data files backed up at time t1;

Step 305: If data from the last ten minutes must be restored, the transaction logs that have not yet been archived must also be copied manually to the target machine;
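Step 305 follows from periodic archiving: any target time after the most recent completed archive needs log data that exists only on the live nodes. A sketch of that check, assuming a hypothetical fixed archiving schedule that starts at `start` and repeats every `period` seconds (600 seconds for the 10-minute period of step 303); the function names are illustrative only.

```python
def last_archive_time(now: float, start: float, period: float) -> float:
    """Time of the most recent completed archive under a fixed schedule
    (first archive at `start`, then every `period` seconds)."""
    if now < start:
        raise ValueError("archiving has not started yet")
    return start + ((now - start) // period) * period


def needs_manual_log_copy(t7: float, now: float, start: float, period: float = 600.0) -> bool:
    """True if reaching t7 needs log data newer than the last archive,
    which must then be copied to the target machine by hand."""
    return t7 > last_archive_time(now, start, period)
```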

Step 306: Restore the backup data of each node with the physical hot backup tool; the new node 1 is restored to time t3;

Step 307: Using the records in the active transaction list snapshot file, find time t6, at which the active transaction list snapshot was last obtained before the specified time t7. (The granularity of the restored snapshot is the period at which active transaction list snapshots are taken.)

Step 308: Analyze the active transaction list at time t6 and obtain the start time t2 of the first transaction in it.

Step 309: Obtain the node's transaction log for the time interval [tmin, tmax];

where tmin = min{t2, t3} and tmax = t7.

Step 310: When tmin is less than t3, use the tool to generate, from the log in [tmin, t3], rollback statements for the distributed transactions that are still active at time t6; when tmin is not less than t3, skip this step.

Step 311: Use the tool to redo all single-machine transactions in [t3, t6] and all distributed transactions in the set {transactions whose gtid is not in the active transaction list at time t6} ∩ {transactions whose gtid is less than next_gtid at time t6}.

With the above steps, this alternative embodiment can restore the database to the moment before it went down.

Alternative embodiment 3:

Recovery of a single data table in a user savings system based on a MariaDB distributed cluster database

In this alternative embodiment, a physical hot backup tool backs up single-node data, the ClusterManager periodically obtains snapshots of the active transaction list from the GTM, and the binlog is archived daily. During recovery, the single-node backup data, the binlog, and the active transaction list snapshots are used for recovery, rollback, and redo operations, yielding a globally transaction-consistent database snapshot from before the operation and maintenance staff deleted the data table by mistake. From this snapshot, the mistakenly deleted data table can be exported and manually imported into the online system. The specific steps are as follows.

In step 401, the distributed database and its management programs (ClusterManager, GTM) are started at time t0. From t0 onward, the ClusterManager periodically obtains a snapshot of the active transaction list from the GTM, writes it to a file, and persists it.

Step 402: At time t1, the Cluster Manager initiates a backup request to the database nodes according to the backup policy and the result file of the last backup. The backup of node 1 ends at t3, that of node 2 at t4, and that of node 3 at t5. After its backup completes, each node names the backup data by the backup start time and returns the backup result. The Cluster Manager records the results in the result file and persists it.

In step 403, the data nodes archive the transaction log files daily, and the Cluster Manager archives the active transaction list snapshot file with an archiving period of 10 minutes.

For this alternative embodiment, restoring the snapshot of the cluster database at time t7 in another production environment or on an intermediate machine includes the following steps:

Step 404: Obtain and analyze the backup result file to find the latest backup data before time t7, that is, the data files backed up at time t1.

Step 405: If data from the last ten minutes must be restored, the transaction logs that have not yet been archived must also be copied manually to the target machine.

In step 406, the backup data of each node is restored from the physical backup data; the new node 1 is restored to time t3.

Step 407: Using the records in the active transaction list snapshot file, find time t6, at which the active transaction list snapshot was last obtained before the specified time t7. (The granularity of the restored snapshot is the period at which active transaction list snapshots are taken.)

In step 408, the active transaction list at time t6 is analyzed, and the start time t2 of the first transaction in the active transaction list is obtained.

Step 409: Obtain the node's transaction log for the time interval [tmin, tmax];

where tmin = min{t2, t3} and tmax = t7.

Step 410: When tmin is less than t3, use the tool to generate, from the log in [tmin, t3], rollback statements for the distributed transactions that are still active at time t6; when tmin is not less than t3, skip this step.

In step 411, the tool is used to redo all single-machine transactions in [t3, t6] and all distributed transactions in the set {transactions whose gtid is not in the active transaction list at time t6} ∩ {transactions whose gtid is less than next_gtid at time t6}.

With the above steps, the accidentally deleted data table file can be recovered; that is, the database can be restored to the moment before the accidental deletion. When operation and maintenance staff mistakenly delete certain user information, this alternative embodiment can restore a snapshot as of a moment before the deletion. The deleted data table still exists in that snapshot, so it can be exported with a database tool and re-imported into the current database.

Alternative embodiment 4:

Online shopping system based on an Oracle distributed cluster database

In this alternative embodiment, a physical hot backup tool backs up single-node data, the ClusterManager periodically obtains snapshots of the active transaction list from the GTM, and the redo log is archived daily. During recovery, the single-node backup data, the redo log, and the active transaction list snapshots are obtained, and recovery, rollback, and redo operations are performed to obtain a globally transaction-consistent database snapshot from before the data changed. With this snapshot, the database and the machine room can be tested separately after the data distribution is improved. The steps of the process include:

In step 501, the distributed database and its management programs (ClusterManager, GTM) are started at time t0. From t0 onward, the ClusterManager periodically obtains a snapshot of the active transaction list from the GTM, writes it to a file, and persists it.

Step 502: At time t1, the Cluster Manager initiates a backup request to the database nodes according to the backup policy and the result file of the last backup. The backup of node 1 ends at t3, that of node 2 at t4, and that of node 3 at t5. After its backup completes, each node names the backup data by the backup start time and returns the backup result. The Cluster Manager records the results in the result file and persists it.

In step 503, the data nodes archive the transaction log files daily, and the Cluster Manager archives the active transaction list snapshot file with an archiving period of 10 minutes.

In this embodiment, restoring the snapshot of the cluster database at time t7 in another production environment or on an intermediate machine includes the following steps:

Step 504: Obtain and analyze the backup result file to find the latest backup data before time t7, that is, the data files backed up at time t1.

In step 505, if data from the last ten minutes must be restored, the transaction logs that have not yet been archived must also be copied manually to the target machine.

In step 506, the backup data of each node is restored from the physical backup data; the new node 1 is restored to time t3.

In step 507, the records in the active transaction list snapshot file are used to find time t6, at which the active transaction list snapshot was last obtained before the specified time t7. (The granularity of the restored snapshot is the period at which active transaction list snapshots are taken.)

Step 508: Analyze the active transaction list at time t6 and obtain the start time t2 of the first transaction in it.

Step 509: Obtain the node's transaction log for the time interval [tmin, tmax];

where tmin = min{t2, t3} and tmax = t7.

Step 510: When tmin is less than t3, use the tool to generate, from the log in [tmin, t3], rollback statements for the distributed transactions that are still active at time t6; when tmin is not less than t3, skip this step.

Step 511: Use the tool to redo all single-machine transactions in [t3, t6] and all distributed transactions in the set {transactions whose gtid is not in the active transaction list at time t6} ∩ {transactions whose gtid is less than next_gtid at time t6}.

Alternative embodiment 5:

Trading system based on a MySQL distributed cluster database

In this alternative embodiment, a physical hot backup tool backs up single-node data before an upgrade, the ClusterManager periodically obtains snapshots of the active transaction list from the GTM, and the binlog is archived daily. During recovery, the single-node backup data, the binlog, and the active transaction list snapshots are obtained, and recovery and rollback operations are performed to obtain a globally transaction-consistent database snapshot from before the upgrade. If the upgrade fails, the database can be rolled back to that moment. The steps of the process include:

In step 601, the distributed database and its management programs (ClusterManager, GTM) are started at time t0. From t0 onward, the ClusterManager periodically obtains a snapshot of the active transaction list from the GTM, writes it to a file, and persists it.

Step 602: At time t1, the Cluster Manager initiates a backup request to the database nodes according to the backup policy and the result file of the last backup. The backup of node 1 ends at t3, that of node 2 at t4, and that of node 3 at t5. After its backup completes, each node names the backup data by the backup start time and returns the backup result. The Cluster Manager records the results in the result file and persists it.

In step 603, the data nodes archive the transaction log files daily, and the Cluster Manager archives the active transaction list snapshot file with an archiving period of 10 minutes.

For this embodiment, restoring the snapshot of the cluster database at time t7 in another production environment or on an intermediate machine includes the following steps:

In step 604, the backup result file is obtained and analyzed to find the latest backup data before time t7, that is, the data files backed up at time t1.

In step 605, if data from the last ten minutes must be restored, the transaction logs that have not yet been archived must also be copied manually to the target machine.

In step 606, the backup data of each node is restored from the physical backup data; the new node 1 is restored to time t3.

Step 607: Using the records in the active transaction list snapshot file, find time t6, at which the active transaction list snapshot was last obtained before the specified time t7. (The granularity of the restored snapshot is the period at which active transaction list snapshots are taken.)

In step 608, the active transaction list at time t6 is analyzed, and the start time t2 of the first transaction in the active transaction list is obtained.

Step 609: Obtain the node's transaction log for the time interval [tmin, tmax];

where tmin = min{t2, t3} and tmax = t7.

In step 610, when tmin is less than t3, the tool is used to generate, from the log in [tmin, t3], rollback statements for the distributed transactions that are still active at time t6; when tmin is not less than t3, this step is skipped.

In step 611, the tool is used to redo all single-machine transactions in [t3, t6] and all distributed transactions in the set {transactions whose gtid is not in the active transaction list at time t6} ∩ {transactions whose gtid is less than next_gtid at time t6}.

An embodiment of the present invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:

S1. The cluster manager in the distributed database restores the data of each node connected to it to the data as of the end time of the last physical backup before the specified time;

S2. Based on the data as of the last physical backup end time, the cluster manager redoes the operations indicated by the transactions in the transaction log in the order of their execution times, and obtains a data snapshot of each node at the specified time, wherein the transaction log records the operations on the data archived between the physical backup end time and the specified time.
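Steps S1 and S2 can be illustrated on a toy key-value node. This minimal sketch assumes each logged operation is a (time, key, new value) triple and deliberately leaves out the global-consistency filtering by active transaction list shown in the embodiments; it only demonstrates "start from the data as of the backup end time, then redo the logged operations in execution-time order". `restore_snapshot` and the data shapes are hypothetical.

```python
from typing import Dict, List, Tuple

# A toy node: key -> value. A logged operation is (execution_time, key, new_value).
Op = Tuple[float, str, int]


def restore_snapshot(backup: Dict[str, int], backup_end: float,
                     log: List[Op], target: float) -> Dict[str, int]:
    """S1: start from the data as of the physical backup end time.
    S2: redo the logged operations executed in (backup_end, target],
    in order of execution time, to obtain the snapshot at `target`."""
    state = dict(backup)  # work on a copy; the backup itself stays untouched
    for t, key, value in sorted(log):  # tuples sort by execution time first
        if backup_end < t <= target:   # earlier operations are already in the backup
            state[key] = value
    return state
```

For example, starting from a backup {"a": 1, "b": 2} taken up to time 3 and a target of 6, operations logged at times 4 and 5 are redone, while those at times 2 and 7 are ignored.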

Optionally, for specific examples in this embodiment, refer to the examples described in the foregoing embodiments and optional embodiments; they are not repeated here.

The above is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and scope of the present invention are intended to be included within the scope of the present invention.

Industrial applicability

The embodiments of the present invention provide a method and device for restoring a data snapshot. In the method, the cluster manager in the distributed database restores the data of each node connected to it to the data as of the end time of the last physical backup before a specified time; then, based on that data, the cluster manager redoes the operations indicated by the transactions in the transaction log in the order of their execution times and obtains a data snapshot of each node at the specified time, where the transaction log records the operations on data archived between the physical backup end time and the specified time. This solves the problem in the related art that, when restoring a database snapshot of a historical moment, a distributed database cannot restore a snapshot at an arbitrary specified moment that satisfies global transaction consistency.

Claims

1. A method for restoring a data snapshot, comprising:

restoring, by a cluster manager in a distributed database, the data of each node connected to the cluster manager to the data as of the end time of the last physical backup before a specified time; and

redoing, by the cluster manager, based on the data as of the last physical backup end time, the operations indicated by the transactions in a transaction log in the order of the execution times of those transactions, to obtain a data snapshot of each node at the specified time; wherein the transaction log records operations on data archived between the physical backup end time and the specified time.
2. The recovery method according to claim 1, wherein restoring, by the cluster manager in the distributed database, the data of each node connected to the cluster manager to the data as of the end time of the last physical backup before the specified time comprises:

obtaining, by the cluster manager, the active transaction list snapshot of the respective nodes most recently acquired before the specified time, wherein the active transaction list snapshot records the transactions that, at the specified time, are active on one or more nodes connected to the cluster manager and operate on data;

searching, by the cluster manager, for the start time of the first active transaction in the active transaction list;

when the start time is less than the physical backup end time, rolling back, by the cluster manager, the transactions in the active transaction list snapshot that are still active before the physical backup end time, to obtain the data as of the last physical backup end time; and

when the start time of the first active transaction is greater than the physical backup end time, acquiring, by the cluster manager, the data of each node as of the last physical backup end time before the specified time.
3. The recovery method according to claim 2, wherein redoing, by the cluster manager, based on the data as of the last physical backup end time, the operations indicated by the transactions in the transaction log in the order of their execution times to obtain a data snapshot of each node at the specified time comprises:

redoing, by the cluster manager, based on the data as of the last physical backup end time, the single-machine transaction operations of the respective nodes indicated by the transactions in the transaction log, in the order of the execution times of those transactions, and redoing the distributed transaction operations indicated by the transactions in the transaction log, in the order of the execution times of the transactions in the transaction log between the physical backup end time and the specified time, to obtain the data snapshot of each node at the specified time.
4. The recovery method according to claim 1, wherein, before the cluster manager in the distributed database restores the data of each node connected to the cluster manager to the data as of the end time of the last physical backup before the specified time, the method further comprises:

periodically sending, by the cluster manager, a physical backup instruction to each node connected to the cluster manager, wherein the physical backup instruction comprises a full backup instruction and/or an incremental backup instruction.
5. The recovery method according to claim 2, wherein, before the cluster manager in the distributed database restores the data of each node connected to the cluster manager to the data as of the end time of the last physical backup before the specified time, the method further comprises:

periodically performing, by the cluster manager, a snapshot operation on the respective nodes to obtain the active transaction list snapshot.
6. A data snapshot recovery device, applied to a cluster manager side in a distributed database, comprising:

a recovery module, configured to restore the data of each node connected to the cluster manager to the data as of the end time of the last physical backup before a specified time; and

a redo module, configured to redo, based on the data as of the last physical backup end time, the operations indicated by the transactions in a transaction log in the order of the execution times of those transactions, to obtain a data snapshot of each node at the specified time; wherein the transaction log records operations on data archived between the physical backup end time and the specified time.
7. The recovery device according to claim 6, wherein the recovery module comprises:

a first obtaining unit, configured to obtain the active transaction list snapshot of the respective nodes most recently acquired before the specified time, wherein the active transaction list snapshot records the transactions that, at the specified time, are active on one or more nodes connected to the cluster manager and operate on data;

a lookup unit, configured to find the start time of the first active transaction in the active transaction list;

a rollback unit, configured to, when the start time of the first active transaction is less than the physical backup end time, roll back the transactions in the active transaction list snapshot that are still active before the physical backup end time, to obtain the data as of the last physical backup end time; and

a second obtaining unit, configured to, when the start time of the first active transaction is greater than the physical backup end time, acquire the data of each node as of the last physical backup end time before the specified time.
8. The recovery device according to claim 7, wherein

the redo module is further configured to redo, based on the data as of the last physical backup end time, the single-machine transaction operations of each node indicated by the transactions in the transaction log, in the order of the execution times of those transactions, and to redo the distributed transaction operations indicated by the transactions in the transaction log, in the order of the execution times of the transactions in the transaction log between the physical backup end time and the specified time, to obtain the data snapshot of each node at the specified time.
9. The recovery device according to claim 6, further comprising:

a sending module, configured to periodically send a physical backup instruction to each node connected to the cluster manager, wherein the physical backup instruction comprises a full backup instruction and/or an incremental backup instruction.
10. The recovery device according to claim 7, further comprising:

an execution module, configured to periodically perform a snapshot operation on the respective nodes to obtain the active transaction list snapshot.
11. A computer-readable storage medium storing computer-executable instructions for performing the method for restoring a data snapshot according to any one of claims 1 to 5.