WO2016180160A1 - Method and apparatus for recovering data snapshots - Google Patents

Method and apparatus for recovering data snapshots

Info

Publication number
WO2016180160A1
Authority
WO
WIPO (PCT)
Prior art keywords
transaction
data
time
snapshot
cluster manager
Prior art date
Application number
PCT/CN2016/079475
Other languages
English (en)
Chinese (zh)
Inventor
汪彦舒
陈河堆
贾新华
白涛
郭龙波
张宗禹
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016180160A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation

Definitions

  • The embodiments of the invention relate to the field of databases, and in particular to a method and a device for recovering data snapshots.
  • The mainstream databases in the industry are stand-alone databases, such as Oracle, DB2 and MySQL.
  • Stand-alone databases are increasingly unable to meet users' needs for large storage capacity and high performance, so distributed databases are being applied more and more widely.
  • Solution 1: Use a backup tool on each single node to perform logical backup and recovery, or direct disk-image backup, and archive the transaction logs daily. After the backup data of each node has been restored, each node uses the transaction log to redo operations up to the specified time.
  • The problems with this scheme are as follows: (1) Because each node is backed up and recovered separately, and no mechanism ensures the global consistency of the distributed database, a database snapshot recovered at an arbitrary specified time cannot be guaranteed to be globally consistent. (2) During backup, a logical backup works file by file and performs operations such as seeks that reduce disk throughput, which significantly affects the performance of the online service and makes the backup inefficient. (3) Recovering the database from an image takes a very long time.
  • Solution 2: Use the distributed transaction manager to coordinate the backup of the distributed database, and back up only single-node transactions and the distributed transactions that every node has committed successfully. This guarantees global transaction consistency only for the backup data during the backup process and at the end of the backup (that is, a database snapshot within that time period can be restored to a globally transaction-consistent state). Restoring to an arbitrary time requires the transaction log and the distributed transaction manager, which use mechanisms such as resource pools and locks during playback to achieve global transaction consistency at the moment being replayed.
  • This scheme enforces data consistency at every moment, which degrades performance and significantly affects the online service. While the transaction log is being redone, data consistency must also be guaranteed at each moment, so resources have to be locked, which makes playback very slow.
  • The object of the embodiments of the present invention is to provide a method and a device for recovering a data snapshot, so as to solve at least the problem in the related art that, when a distributed database recovers a database snapshot of a historical moment, it cannot recover a snapshot at an arbitrary specified time that also satisfies global transaction consistency.
  • A data snapshot recovery method is provided, including: a cluster manager in a distributed database restores the data of each node connected to the cluster manager to the data at the end time of the last physical backup before a specified time; based on the data at that physical backup end time, the cluster manager redoes the operations indicated by the transactions in a transaction log, in order of the transactions' execution times, to obtain a data snapshot of each node at the specified time; wherein the transaction log is the archived record of the operations performed on the data between the physical backup end time and the specified time.
  • Restoring the data of each node connected to the cluster manager to the data at the last physical backup end time before the specified time includes: the cluster manager acquires the active transaction list snapshot of the nodes that was most recently taken before the specified time, wherein the active transaction list snapshot records the transactions that are active, operating on data, on one or more nodes connected to the cluster manager at the specified time; the cluster manager looks up the start time of the first active transaction in the active transaction list; when the start time of the first active transaction is earlier than the physical backup end time, the cluster manager rolls back the transactions in the active transaction list snapshot that were still active before the physical backup end time, to obtain the data at the last physical backup end time; when the start time is later than the physical backup end time, the cluster manager obtains the data of each node at the last physical backup end time before the specified time.
  • Redoing the operations indicated by the transactions in the transaction log, in order of execution time and based on the data at the last physical backup end time, to obtain the data snapshot of each node at the specified time includes: based on the data at the last physical backup end time, the cluster manager redoes the single-node transaction operations of each node indicated by the transactions in the transaction log, in order of the transactions' execution times, and redoes the distributed transaction operations indicated by the transactions in the transaction log between the physical backup end time and the specified time, in order of their execution times, to obtain the data snapshot of each node at the specified time.
  • The method further includes: the cluster manager periodically sends a physical backup instruction to each node connected to the cluster manager, wherein the physical backup instruction includes a full backup instruction and/or an incremental backup instruction.
  • The method further includes: the cluster manager periodically performs a snapshot operation on the nodes to obtain the active transaction list snapshot.
  • A data snapshot recovery apparatus is provided, which is applied on the cluster manager side of a distributed database and includes: a recovery module, configured to restore the data of each node connected to the cluster manager to the data at the end time of the last physical backup before the specified time; and a redo module, configured to redo, based on the data at the last physical backup end time, the operations indicated by the transactions in the transaction log in order of the transactions' execution times, to obtain a data snapshot of each node at the specified time, wherein the transaction log is the archived record of the operations performed on the data between the physical backup end time and the specified time.
  • The recovery module includes: a first obtaining unit, configured to acquire the active transaction list snapshot of the nodes that was most recently taken before the specified time, where the active transaction list snapshot records the transactions that are active, operating on data, on one or more nodes connected to the cluster manager at the specified time; a lookup unit, configured to look up the start time of the first active transaction in the active transaction list; a rollback unit, configured to roll back, when the start time of the first active transaction is earlier than the physical backup end time, the transactions in the active transaction list snapshot that were still active before the physical backup end time, to obtain the data at the last physical backup end time; and a second obtaining unit, configured to acquire, when the start time of the first active transaction is later than the physical backup end time, the data of each node at the last physical backup end time before the specified time.
  • The redo module is further configured to redo, based on the data at the last physical backup end time, the single-node transaction operations of each node indicated by the transactions in the transaction log, in order of the transactions' execution times, and to redo the distributed transaction operations indicated by the transactions in the transaction log between the physical backup end time and the specified time, in order of their execution times, to obtain the data snapshot of each node at the specified time.
  • The apparatus further includes: a sending module, configured to periodically send a physical backup instruction to each node connected to the cluster manager, where the physical backup instruction includes a full backup instruction and/or an incremental backup instruction.
  • The apparatus further includes: an execution module, configured to periodically perform a snapshot operation on the nodes to obtain the active transaction list snapshot.
  • The embodiments of the invention further provide a computer-readable storage medium storing computer-executable instructions for performing any of the data snapshot recovery methods described above.
  • In the embodiments of the present invention, the cluster manager in the distributed database restores the data of each node connected to it to the data at the end time of the last physical backup before the specified time and, based on that data, redoes the operations indicated by the transactions in the transaction log in order of their execution times to obtain a data snapshot of each node at the specified time. That is, the transactions in the transaction log are redone on top of the data at the last physical backup end time to obtain each node's data snapshot, so that the snapshot is globally consistent. This solves the problem that, when a distributed database recovers a database snapshot of a historical moment, it cannot recover a snapshot at an arbitrary specified time that also satisfies global transaction consistency.
  • FIG. 1 is a flowchart of a method for restoring a data snapshot according to an embodiment of the present invention
  • FIG. 2 is a structural block diagram of a data snapshot recovery apparatus according to an embodiment of the present invention.
  • FIG. 3 is a first block diagram of an optional structure of a data snapshot recovery apparatus according to an embodiment of the present invention.
  • FIG. 4 is a second block diagram of an optional structure of a data snapshot recovery apparatus according to an embodiment of the present invention.
  • FIG. 5 is a third block diagram of an optional structure of a data snapshot recovery apparatus according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a database snapshot recovery system in accordance with an alternate embodiment of the present invention.
  • FIG. 7 is a flow chart of a database snapshot recovery method in accordance with an alternative embodiment of the present invention.
  • FIG. 8 is a timing diagram of database snapshot recovery in accordance with an alternate embodiment of the present invention.
  • FIG. 1 is a flowchart of a method for restoring a data snapshot according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
  • Step 102: The cluster manager in the distributed database restores the data of each node connected to it to the data at the end time of the last physical backup before the specified time.
  • Step 104: Based on the data at the last physical backup end time, the cluster manager redoes the operations indicated by the transactions in the transaction log, in order of the transactions' execution times, to obtain a data snapshot of each node at the specified time.
  • The transaction log is the archived record of the operations performed on the data between the physical backup end time and the specified time.
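  • Steps 102 and 104 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `LoggedOp` record, the key/value model of a node's data, and the choice to replay entries strictly after the backup end time are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedOp:
    """Hypothetical transaction-log entry: one write with its execution time."""
    exec_time: float
    key: str
    value: int


def recover_snapshot(backup_state, backup_end, log, specified_time):
    # Step 102: start from the data as of the last physical backup end time.
    state = dict(backup_state)
    # Keep only log entries between the backup end time and the specified time
    # (the open lower bound is an assumption of this sketch).
    window = [op for op in log if backup_end < op.exec_time <= specified_time]
    # Step 104: redo the logged operations in order of execution time.
    for op in sorted(window, key=lambda op: op.exec_time):
        state[op.key] = op.value
    return state


log = [LoggedOp(5, "a", 2), LoggedOp(3, "a", 1), LoggedOp(9, "b", 7)]
snapshot = recover_snapshot({"a": 0}, backup_end=2, log=log, specified_time=6)
```

  • Sorting by execution time before replay is what makes the later write to `a` win; the entry at time 9 lies after the specified time and is ignored.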
  • In this embodiment, the cluster manager in the distributed database restores the data of each node connected to it to the data at the end time of the last physical backup before the specified time and, based on that data, redoes the operations indicated by the transactions in the transaction log in order of their execution times to obtain a data snapshot of each node at the specified time. That is, the transactions in the transaction log are redone on top of the data at the last physical backup end time to obtain each node's data snapshot, so that the snapshot is globally consistent. This solves the problem in the related art that, when a distributed database recovers a database snapshot of a historical moment, it cannot recover a snapshot at an arbitrary specified time that also satisfies global transaction consistency.
  • Restoring the data of each node connected to the cluster manager to the data at the last physical backup end time before the specified time, as in Step 102, may be implemented as follows:
  • Step 102-1: The cluster manager obtains the active transaction list snapshot of each node that was most recently taken before the specified time, where the snapshot records the transactions that are active, operating on data, on one or more nodes connected to the cluster manager at the specified time.
  • Step 102-2: The cluster manager looks up the start time of the first active transaction in the active transaction list.
  • Step 102-3: When the start time of the first active transaction is earlier than the physical backup end time, the cluster manager rolls back the transactions in the active transaction list snapshot that were still active before the physical backup end time, to obtain the data at the last physical backup end time.
  • Step 102-4: When the start time of the first active transaction is later than the physical backup end time, the cluster manager obtains the data of each node at the last physical backup end time before the specified time.
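  • The branch between Steps 102-3 and 102-4 can be sketched as follows. This is an illustrative reading only: the snapshot is modelled as a hypothetical mapping from transaction id to start time, and the function name is invented for the example.

```python
def transactions_to_roll_back(active_snapshot, backup_end):
    """Return the ids of active transactions that must be rolled back.

    active_snapshot: hypothetical mapping {transaction id: start time},
    taken from the last active transaction list snapshot before the
    specified time.
    """
    if not active_snapshot:
        return []
    # Step 102-2: start time of the first (earliest) active transaction.
    first_start = min(active_snapshot.values())
    if first_start < backup_end:
        # Step 102-3: these transactions were already running when the
        # physical backup ended, so their partial effects are in the backup.
        return sorted(t for t, start in active_snapshot.items()
                      if start < backup_end)
    # Step 102-4: nothing active predates the backup; use the backup as is.
    return []


to_undo = transactions_to_roll_back({"g1": 2, "g2": 8}, backup_end=3)
```

  • With a backup that ended at time 3, only `g1` (started at time 2) is selected; `g2` started after the backup and is left to the redo phase.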
  • The active transaction list involved in Step 102-1 is obtained periodically. The start of the earliest active transaction in the obtained list may fall either before or after the physical backup end time, and the corresponding transaction log is obtained according to which case applies.
  • Redoing the operations indicated by the transactions in the transaction log, in order of the transactions' execution times, to obtain a data snapshot of each node at the specified time, as in Step 104, may be implemented as follows:
  • Based on the data at the last physical backup end time, the cluster manager redoes the single-node transaction operations of each node indicated by the transactions in the transaction log, in order of the transactions' execution times, and redoes the distributed transaction operations indicated by the transactions in the transaction log between the physical backup end time and the specified time, in order of their execution times, to obtain a data snapshot of each node at the specified time.
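  • One way to picture this split is a redo plan with two ordered lists. The sketch below is an assumption-laden illustration: the `Txn` record, its `distributed` flag, and the decision to build two separate ordered lists are not taken from the patent text.

```python
from typing import NamedTuple


class Txn(NamedTuple):
    """Hypothetical log record for one committed transaction."""
    exec_time: float
    node: str
    distributed: bool
    name: str


def redo_plan(log, backup_end, specified_time):
    # Only transactions logged between the backup end and the specified time.
    window = [t for t in log if backup_end < t.exec_time <= specified_time]
    by_time = sorted(window, key=lambda t: t.exec_time)
    # Single-node transactions and distributed transactions are each
    # replayed in order of execution time.
    single = [t.name for t in by_time if not t.distributed]
    dist = [t.name for t in by_time if t.distributed]
    return single, dist


log = [
    Txn(4, "n1", False, "s1"),
    Txn(3, "n2", True, "d1"),
    Txn(5, "n1", True, "d2"),
    Txn(1, "n1", False, "s0"),
    Txn(6, "n2", False, "s2"),
]
single, dist = redo_plan(log, backup_end=2, specified_time=5)
```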
  • If the first active transaction in the active transaction list started before the physical backup end time, then when the database snapshot at the specified time is obtained, the transactions in the active transaction list that were active between that start time and the physical backup end time must be rolled back, to ensure that the snapshot is globally consistent up to the physical backup end time. For the interval from the physical backup end time to the specified time, it is only necessary to roll forward through the transaction log of that period to obtain a globally consistent data snapshot.
  • Before the cluster manager in the distributed database restores the data of each node connected to it to the data at the last physical backup end time before the specified time, the method in this embodiment may further include: the cluster manager periodically sends a physical backup command to each node connected to it, where the physical backup command includes a full backup command and/or an incremental backup command. Backup and recovery performed this way are much faster than logical backup.
  • The database is backed up as a full backup plus incremental backups. For recovery, only one full backup and one incremental backup are needed to restore the database quickly. Because a physical backup is used, only the database files are copied during the backup, which has little impact on the online service.
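  • The "one full backup plus one incremental backup" selection can be sketched as below. The sketch assumes that each incremental backup is cumulative relative to the preceding full backup, which the text does not state; the `Backup` record and `pick_backups` are hypothetical names.

```python
from typing import NamedTuple


class Backup(NamedTuple):
    """Hypothetical backup catalogue entry."""
    kind: str        # "full" or "incremental"
    end_time: float  # time at which the physical backup finished


def pick_backups(backups, specified_time):
    # Latest full backup that finished no later than the specified time.
    fulls = [b for b in backups
             if b.kind == "full" and b.end_time <= specified_time]
    if not fulls:
        return None, None
    full = max(fulls, key=lambda b: b.end_time)
    # Latest incremental taken after that full backup (assumed cumulative).
    incs = [b for b in backups
            if b.kind == "incremental"
            and full.end_time < b.end_time <= specified_time]
    inc = max(incs, key=lambda b: b.end_time) if incs else None
    return full, inc


catalogue = [Backup("full", 10), Backup("incremental", 20),
             Backup("incremental", 30), Backup("full", 40)]
chosen = pick_backups(catalogue, specified_time=35)
```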
  • Before the cluster manager in the distributed database restores the data of each node connected to it to the data at the last physical backup end time before the specified time, the method in this embodiment may further include: the cluster manager periodically performs a snapshot operation on each node to obtain the active transaction list snapshot.
  • The method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation.
  • The technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product.
  • The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the invention.
  • A recovery device for the database snapshot is also provided.
  • The recovery device is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated here.
  • As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 2 is a structural block diagram of a data snapshot recovery apparatus according to an alternative embodiment of the present invention, which is applied to a cluster manager in a distributed database. As shown in FIG. 2, the apparatus includes:
  • The recovery module 22 is configured to restore the data of each node connected to the cluster manager to the data at the last physical backup end time before the specified time.
  • The redo module 24 is coupled to the recovery module 22 and is configured to redo, based on the data at the last physical backup end time, the operations indicated by the transactions in the transaction log in order of the transactions' execution times, to obtain a data snapshot of each node at the specified time.
  • FIG. 3 is a first block diagram of an optional structure of a data snapshot recovery apparatus according to an embodiment of the present invention.
  • The recovery module 22 shown in FIG. 2 includes:
  • The first obtaining unit 32 is configured to acquire the active transaction list snapshot of each node that was most recently taken before the specified time, where the snapshot records the transactions that are active, operating on data, on one or more nodes connected to the cluster manager at the specified time.
  • The searching unit 34 is coupled to the first obtaining unit 32 and is configured to look up the start time of the first active transaction in the active transaction list.
  • The rollback unit 36 is coupled to the searching unit 34 and is configured to roll back, when the start time is earlier than the physical backup end time, the transactions in the active transaction list snapshot that were still active before the physical backup end time, to obtain the data at the last physical backup end time.
  • The second obtaining unit 38 is coupled to the rollback unit 36 and is configured to acquire, when the start time is later than the physical backup end time, the data of each node at the last physical backup end time before the specified time.
  • The redo module 24 in this embodiment is further configured to redo, based on the data at the last physical backup end time, the single-node transaction operations of each node indicated by the transactions in the transaction log, in order of the transactions' execution times, and to redo the distributed transaction operations indicated by the transactions in the transaction log between the physical backup end time and the specified time, in order of their execution times, to obtain the data snapshot of each node at the specified time.
  • FIG. 4 is a second block diagram of an optional structure of a data snapshot recovery apparatus according to an embodiment of the present invention.
  • For use before the cluster manager in the distributed database restores the data of each node connected to it to the data at the last physical backup end time before the specified time, the recovery device further includes:
  • The sending module 42 is coupled to the recovery module 22 and is configured to periodically send a physical backup instruction to each node connected to the cluster manager, where the physical backup instruction includes a full backup instruction and/or an incremental backup instruction.
  • FIG. 5 is a third block diagram of an optional structure of a data snapshot recovery apparatus according to an embodiment of the present invention.
  • For use before the cluster manager in the distributed database restores the data of each node connected to it to the data at the last physical backup end time before the specified time, the recovery device further includes:
  • The execution module 52 is coupled to the recovery module 22 and is configured to periodically perform a snapshot operation on each node to obtain the active transaction list snapshot.
  • Each of the above modules may be implemented by software or hardware. For example, but without limitation, the modules may all be located in the same processor, or may be located in multiple processors.
  • The optional embodiment uses periodic physical backups of the distributed database (full and incremental), daily archived transaction logs, and active transaction list snapshots captured at a certain frequency. From these three kinds of basic data, a database snapshot that satisfies global transaction consistency can be restored for any specified time, and none of these processes affects the online service. That is, this embodiment applies global consistency processing on top of the data recovered by each single node.
  • In the optional embodiment, the global data consistency problems caused by distributed transactions that are cut off abnormally at the snapshot time include: (1) a distributed transaction that some nodes have committed successfully and other nodes have not yet committed; (2) a distributed transaction that some nodes have committed successfully and other nodes have failed to commit, where the distributed database has not yet rolled the transaction back on the nodes that committed it; (3) a transaction that all nodes have either not yet committed or failed to commit.
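  • The three scenarios can be told apart from the per-node outcome of one distributed transaction at the snapshot time. The classifier below is a hypothetical illustration; the outcome labels `committed`, `pending` and `failed` and the returned strings are assumptions, not terms from the patent.

```python
def classify(outcomes):
    """Classify one distributed transaction at the snapshot time.

    outcomes: hypothetical mapping {node: "committed" | "pending" | "failed"}.
    """
    states = set(outcomes.values())
    if states == {"committed"}:
        return "consistent"        # every node committed, nothing to repair
    if "committed" in states:
        if "failed" in states:
            return "scenario 2"    # some committed, some failed to commit
        return "scenario 1"        # some committed, the rest not yet committed
    return "scenario 3"            # no node committed (pending and/or failed)


mixed = classify({"n1": "committed", "n2": "pending"})
```

  • A transaction in any of the three scenarios leaves the nodes disagreeing or incomplete at the snapshot time, which is why it has to be excluded from the recovered snapshot.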
  • The database backup is obtained by physical hot backup, which has almost no negative impact on the execution of Structured Query Language (SQL) statements, and the backup data can be restored quickly using one full backup and one incremental backup.
  • FIG. 6 is a schematic diagram of a database snapshot recovery system according to an alternative embodiment of the present invention.
  • The system includes:
  • Data Node Cluster: a cluster of data nodes, used to store data.
  • SQL Node Cluster: a cluster of SQL nodes, used for SQL splitting and parsing.
  • Global Transaction Manager (GTM): used to manage global transactions.
  • Cluster Manager: used to manage the Data Node Cluster and the SQL Node Cluster.
  • FIG. 7 is a flowchart of a database snapshot recovery method according to an alternative embodiment of the present invention. As shown in FIG. 7, the steps of the method include:
  • Step 41: Obtain an active transaction list snapshot.
  • Step 42: Perform a physical hot backup of the database.
  • Step 43: Periodically archive the transaction log.
  • Steps 41 to 43 generate the basic data: the Cluster Manager periodically issues a full or incremental backup command to each node according to the backup policy; the Cluster Manager periodically (at a configurable interval) obtains an active transaction list snapshot from the GTM and persists it to a disk file (the active transaction list file); and the database nodes periodically archive their database transaction logs.
  • The method also includes:
  • Step 44: Obtain the backup database files, the log files, and the active transaction list file.
  • Step 45: Restore single-node consistent data.
  • Step 46: Restore globally consistent data.
  • Steps 44 to 46 restore a distributed database snapshot for any specified time. For each database node, the data of the single-node database is restored from the full and incremental backup data to a consistent state at the backup end time. The active transaction list snapshot for the specified time is retrieved from the active transaction list file, together with the corresponding archived transaction logs. According to the active transaction list snapshot and the transaction logs, all database transactions from the backup time to the specified time are rolled forward or rolled back, so that the snapshot satisfies the global consistency of the distributed database.
  • A physical hot backup of the database is used, so backup and recovery are much faster than with a logical backup.
  • The optional embodiment backs up the database as a full backup plus incremental backups. For recovery, only one full backup and one incremental backup are needed to restore the database quickly.
  • The optional embodiment uses physical hot backup for the database. During the backup, only the database files are copied, which has little impact on the online service.
  • The database snapshot satisfies global transaction consistency: the database snapshot at the specified time contains no transaction that some data nodes have not committed or have failed to commit. A database snapshot can be restored for any specified time, at a granularity equal to the period at which the active transaction list is obtained.
  • FIG. 8 is a timing diagram of database snapshot recovery according to an alternative embodiment of the present invention.
  • The optional embodiment takes a MariaDB distributed system as an example. The method for obtaining a consistent database snapshot consists of an initial basic-data generation process, steps 201 to 203, and the restoration of the database snapshot at the specified time t7 from that basic data, steps 204 to 211.
  • Step 201: The distributed database and the management programs (Cluster Manager, GTM) are started at time t0. The Cluster Manager periodically obtains a snapshot of the active transaction list from the GTM and persists it to a file.
  • Step 202: At time t1, the Cluster Manager initiates a backup request to the database nodes according to the backup policy and the result file of the last backup. The backup of node 1 ends at t3, that of node 2 at t4, and that of node 3 at t5. Each node names its backup data after the backup start time and returns the backup result. The Cluster Manager records the result and persists it.
  • Step 203: The data nodes routinely archive the transaction log files, and the Cluster Manager archives the active transaction list snapshot file; the archive period is configurable.
  • Step 204: Obtain the backup result file and analyze it to find the latest backup data before time t7, that is, the data files backed up at time t1.
  • Step 205: Restore the backup data of each node from the physical backup data; node 1 is restored to time t3.
  • Step 206: Match the records in the active transaction list snapshot file to find t6, the time at which the active transaction list snapshot was last obtained before the specified time t7. The granularity of the restored snapshot is therefore the period at which active transaction list snapshots are obtained.
  • Step 207: Analyze the active transaction list at time t6 to obtain t2, the start time of the first transaction in the active transaction list.
  • Step 208: Obtain the node's transaction log for the interval [tmin, tmax].
  • Step 209: When tmin is less than t3, use a tool to generate rollback statements for the distributed transactions within [tmin, t3] that are still active at time t6; when tmin is not less than t3, skip this step.
  • Step 210: Use the tool to redo all single-node transactions in [t3, t6] and all distributed transactions in the set {gtid not in the active transaction list at time t6} ∩ {gtid less than next_gtid at time t6}.
  • Step 211: Execute the generated rollback statements.
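  • The selection of distributed transactions to redo in Step 210 can be sketched as a simple filter. The sketch assumes that the set in Step 210 is the intersection of the two conditions and that gtids are integers allocated in increasing order by the GTM; the function name is hypothetical.

```python
def distributed_gtids_to_redo(logged_gtids, active_at_t6, next_gtid_at_t6):
    """Pick the distributed transactions that are safe to redo.

    A gtid qualifies when it is absent from the active transaction list
    taken at t6 (so it had finished by t6) and it is smaller than the
    next_gtid recorded at t6 (so it had started before t6).
    """
    active = set(active_at_t6)
    return sorted(g for g in set(logged_gtids)
                  if g not in active and g < next_gtid_at_t6)


redo = distributed_gtids_to_redo([1, 2, 3, 4, 5, 6],
                                 active_at_t6=[3],
                                 next_gtid_at_t6=5)
```

  • Here gtid 3 is skipped because it was still active at t6, and gtids 5 and 6 are skipped because they had not been allocated yet at t6.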
  • The application scenario of this optional embodiment is disaster recovery of an online payment system based on a MySQL distributed cluster database.
  • A physical hot backup tool is used to back up single-node data, the Cluster Manager periodically obtains snapshots of the active transaction list from the GTM, and the binlog is archived daily.
  • the steps of the alternative embodiment include:
  • Step 301: the distributed database and the management programs (ClusterManager, GTM) are started at time t0.
  • The ClusterManager periodically obtains a snapshot of the active transaction list from the GTM, writes it to a file, and persists it.
  • Step 302: at time t1, the ClusterManager initiates a backup request to the database nodes according to the backup policy and the result file of the last backup.
  • The backup end time of node 1 is t3, that of node 2 is t4, and that of node 3 is t5.
  • Each node names its backup data after the backup start time and returns the backup result.
  • The ClusterManager records the results in the result file and persists it.
  • Step 303: the data nodes archive the transaction log files daily, and the ClusterManager archives the active transaction list snapshot file; the archiving period can be 10 minutes.
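The snapshot persistence described in step 301 might look like the sketch below. The patent does not specify a file format, so the JSON-lines layout, the field names, and the helpers `append_snapshot` and `load_snapshots` are all hypothetical; the call to the GTM is replaced by literal values.

```python
import json
import os
import tempfile

def append_snapshot(path, taken_at, active_gtids, next_gtid):
    """Append one active-transaction-list snapshot as a JSON line and force it to disk."""
    record = {
        "time": taken_at,
        "active_gtids": sorted(active_gtids),
        "next_gtid": next_gtid,
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # persist before reporting success

def load_snapshots(path):
    """Read all snapshots back in the order they were written."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]

# Demo in a temporary directory; the values stand in for data returned by the GTM.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "active_txn_snapshots.jsonl")
    append_snapshot(path, taken_at=600, active_gtids={8, 3}, next_gtid=10)
    append_snapshot(path, taken_at=1200, active_gtids={8}, next_gtid=15)
    snapshots = load_snapshots(path)

print(snapshots[-1])
```

An append-only file keeps earlier snapshots intact, so any past acquisition time remains available as a recovery point.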
  • A snapshot of the cluster database at time t7 needs to be restored in another production environment or on an intermediate machine. The recovery steps include:
  • Step 304: obtain the backup result file and analyze it to find the latest backup data before time t7, that is, the data files backed up at time t1;
  • Step 305: if data from within the last ten minutes needs to be restored, the transaction logs that have not yet been archived must also be copied manually to the target machine;
  • Step 306: restore the backup data of each node using the physical hot backup tool; the new node 1 is restored to time t3;
  • Step 307: according to the records in the active transaction list snapshot file, find the time t6 at which a snapshot of the active transaction list was last obtained before the specified time t7. (The granularity of the restored snapshot is the period at which snapshots of the active transaction list are taken.)
  • Step 308: analyze the active transaction list at time t6 to obtain the start time t2 of the first transaction in the active transaction list.
  • Step 309: obtain the node's transaction log for the time interval [tmin, tmax];
  • Step 310: when tmin is less than t3, use the tool to generate rollback statements for the distributed transactions in [tmin, t3] that are still active at time t6; when tmin is not less than t3, skip this step.
  • Step 311: use the tool to redo all single-machine transactions in [t3, t6] and all distributed transactions in the set {gtid does not exist in the active transaction list at time t6} ∩ {gtid is less than next_gtid at time t6}.
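The selection rule in step 311 can be written as a small filter. This is a sketch under stated assumptions: the record type `LoggedTxn`, the use of `None` to mark single-machine transactions, and the choice to apply the [t3, t6] window to distributed transactions as well are illustrative and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LoggedTxn:
    time: float          # execution time recorded in the node's transaction log
    gtid: Optional[int]  # global transaction id; None marks a single-machine transaction

def select_redo(log, t3, t6, active_gtids, next_gtid):
    """Return the transactions to redo on one node, ordered by execution time."""
    chosen = []
    for txn in log:
        if not (t3 <= txn.time <= t6):
            continue  # outside the redo window
        if txn.gtid is None:
            chosen.append(txn)  # single-machine transaction inside [t3, t6]
        elif txn.gtid not in active_gtids and txn.gtid < next_gtid:
            chosen.append(txn)  # distributed transaction already finished at t6
    return sorted(chosen, key=lambda txn: txn.time)

log = [
    LoggedTxn(time=5, gtid=None),   # before t3: already contained in the backup
    LoggedTxn(time=12, gtid=None),  # redo
    LoggedTxn(time=14, gtid=7),     # finished distributed transaction: redo
    LoggedTxn(time=15, gtid=8),     # still active at t6: skip
    LoggedTxn(time=18, gtid=11),    # gtid >= next_gtid: skip
]
redo = select_redo(log, t3=10, t6=20, active_gtids={8}, next_gtid=10)
print([(t.time, t.gtid) for t in redo])  # [(12, None), (14, 7)]
```

Excluding transactions that were still active at t6 is what keeps every node at the same global cut: a distributed transaction is either redone on all nodes or on none.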
  • A physical hot backup tool is used to back up single-node data.
  • The ClusterManager periodically obtains a snapshot of the active transaction list data from the GTM, and the binlog is archived daily.
  • The single-node backup data, the binlog, and the active transaction list snapshot are used for recovery, rollback, and redo operations, yielding a globally transaction-consistent database snapshot from before the operation and maintenance personnel deleted the data table by mistake.
  • The mistakenly deleted data table can then be exported from this snapshot and imported into the online system manually. The specific steps are as follows.
  • Step 401: the distributed database and the management programs (ClusterManager, GTM) are started at time t0.
  • The ClusterManager periodically obtains a snapshot of the active transaction list from the GTM, writes it to a file, and persists it.
  • Step 402: at time t1, the ClusterManager initiates a backup request to the database nodes according to the backup policy and the result file of the last backup.
  • The backup end time of node 1 is t3, that of node 2 is t4, and that of node 3 is t5.
  • Each node names its backup data after the backup start time and returns the backup result.
  • The ClusterManager records the results in the result file and persists it.
  • Step 403: the data nodes archive the transaction log files daily, and the ClusterManager archives the active transaction list snapshot file; the archiving period is 10 minutes.
  • Restoring the snapshot of the cluster database at time t7 in another production environment or on an intermediate machine includes the following steps:
  • Step 404: obtain the backup result file and analyze it to find the latest backup data before time t7, that is, the data files backed up at time t1.
  • Step 405: if data from within the last ten minutes needs to be restored, the transaction logs that have not yet been archived must also be copied manually to the target machine.
  • Step 406: restore the backup data of each node using the physical hot backup tool; the new node 1 is restored to time t3.
  • Step 407: according to the records in the active transaction list snapshot file, find the time t6 at which a snapshot of the active transaction list was last obtained before the specified time t7. (The granularity of the restored snapshot is the period at which snapshots of the active transaction list are taken.)
  • Step 408: analyze the active transaction list at time t6 to obtain the start time t2 of the first transaction in the active transaction list.
  • Step 409: obtain the node's transaction log for the time interval [tmin, tmax];
  • Step 410: when tmin is less than t3, use the tool to generate rollback statements for the distributed transactions in [tmin, t3] that are still active at time t6; when tmin is not less than t3, skip this step.
  • Step 411: use the tool to redo all single-machine transactions in [t3, t6] and all distributed transactions in the set {gtid does not exist in the active transaction list at time t6} ∩ {gtid is less than next_gtid at time t6}.
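The conditional rollback generation of step 410 can be sketched as below. The patent only says that "a tool" generates the rollback statements, so this example assumes that each log entry already carries a precomputed inverse statement in a hypothetical `undo_sql` field; the function name, the dictionary layout, and the newest-first ordering are likewise assumptions.

```python
def rollback_statements(entries, tmin, t3, active_gtids):
    """Collect undo statements for distributed transactions still active at t6.

    entries: log records with 'time', 'gtid' and a precomputed 'undo_sql'.
    Returns an empty list when tmin is not less than t3 (the step is skipped).
    """
    if tmin >= t3:
        return []
    hits = [
        e for e in entries
        if tmin <= e["time"] <= t3 and e["gtid"] in active_gtids
    ]
    # Undo the most recent change first so earlier changes are reverted last.
    hits.sort(key=lambda e: e["time"], reverse=True)
    return [e["undo_sql"] for e in hits]

entries = [
    {"time": 6, "gtid": 8, "undo_sql": "DELETE FROM pay WHERE id = 1"},
    {"time": 8, "gtid": 7, "undo_sql": "DELETE FROM pay WHERE id = 2"},
    {"time": 9, "gtid": 8, "undo_sql": "UPDATE pay SET amt = 10 WHERE id = 1"},
]
stmts = rollback_statements(entries, tmin=5, t3=10, active_gtids={8})
print(stmts)
```

The rollback is needed because the physical backup taken up to t3 may already contain partial changes of a distributed transaction that had not finished by t6.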
  • The data table that was accidentally deleted can be recovered through the above steps of this optional embodiment; that is, the database can be restored to a time before the accidental deletion.
  • The snapshot from the time before the deletion can then be used.
  • The deleted data table still exists in the database snapshot at that time, so it can be exported with a database tool and re-imported into the current database.
  • A physical hot backup tool is used to back up single-node data.
  • The ClusterManager periodically obtains a snapshot of the active transaction list data from the GTM, and the redo log is archived daily.
  • The single-node backup data, the redo log, and the active transaction list snapshot are obtained, and recovery, rollback, and redo operations are performed to obtain a globally transaction-consistent database snapshot from before the data mutation. With this snapshot, it is possible to test the database and the room separately after improving the data distribution.
  • The steps of the process include:
  • Step 501: the distributed database and the management programs (ClusterManager, GTM) are started at time t0.
  • The ClusterManager periodically obtains a snapshot of the active transaction list from the GTM, writes it to a file, and persists it.
  • Step 502: at time t1, the ClusterManager initiates a backup request to the database nodes according to the backup policy and the result file of the last backup.
  • The backup end time of node 1 is t3, that of node 2 is t4, and that of node 3 is t5.
  • Each node names its backup data after the backup start time and returns the backup result.
  • The ClusterManager records the results in the result file and persists it.
  • Step 503: the data nodes archive the transaction log files daily, and the ClusterManager archives the active transaction list snapshot file; the archiving period is 10 minutes.
  • In this embodiment, restoring the snapshot of the cluster database at time t7 in another production environment or on an intermediate machine includes the following steps:
  • Step 504: obtain the backup result file and analyze it to find the latest backup data before time t7, that is, the data files backed up at time t1.
  • Step 505: if data from within the last ten minutes needs to be restored, the transaction logs that have not yet been archived must also be copied manually to the target machine.
  • Step 506: restore the backup data of each node using the physical hot backup tool; the new node 1 is restored to time t3.
  • Step 507: according to the records in the active transaction list snapshot file, find the time t6 at which a snapshot of the active transaction list was last obtained before the specified time t7. (The granularity of the restored snapshot is the period at which snapshots of the active transaction list are taken.)
  • Step 508: analyze the active transaction list at time t6 to obtain the start time t2 of the first transaction in the active transaction list.
  • Step 509: obtain the node's transaction log for the time interval [tmin, tmax];
  • Step 510: when tmin is less than t3, use the tool to generate rollback statements for the distributed transactions in [tmin, t3] that are still active at time t6; when tmin is not less than t3, skip this step.
  • Step 511: use the tool to redo all single-machine transactions in [t3, t6] and all distributed transactions in the set {gtid does not exist in the active transaction list at time t6} ∩ {gtid is less than next_gtid at time t6}.
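Choosing the backup in step 504 amounts to a search over the backup result file. The sketch below assumes the result file has been parsed into a list of records holding the backup start time and each node's end time; this layout, the function name `latest_backup_before`, and the requirement that every node's backup finished by t7 are assumptions made for illustration.

```python
def latest_backup_before(results, t7):
    """Return the most recent backup whose node backups all ended by t7, or None."""
    usable = [r for r in results if max(r["end_times"].values()) <= t7]
    return max(usable, key=lambda r: r["start"], default=None)

# Two backup rounds; each record is named by its start time, as in step 502.
results = [
    {"start": 100, "end_times": {"node1": 130, "node2": 140, "node3": 150}},
    {"start": 900, "end_times": {"node1": 930, "node2": 940, "node3": 950}},
]
chosen = latest_backup_before(results, t7=800)
print(chosen["start"])  # 100
```

The per-node end times matter later in the procedure, because each node's redo starts from its own backup end time (t3 for node 1 in the text).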
  • A physical hot backup tool is used to back up single-node data before the upgrade.
  • The ClusterManager periodically obtains a snapshot of the active transaction list data from the GTM, and the binlog is archived daily.
  • The single-node backup data, the binlog, and the active transaction list snapshot are obtained, and recovery and rollback operations are performed to obtain a globally transaction-consistent database snapshot from before the upgrade.
  • The steps of the process include:
  • Step 601: the distributed database and the management programs (ClusterManager, GTM) are started at time t0.
  • The ClusterManager periodically obtains a snapshot of the active transaction list from the GTM, writes it to a file, and persists it.
  • Step 602: at time t1, the ClusterManager initiates a backup request to the database nodes according to the backup policy and the result file of the last backup.
  • The backup end time of node 1 is t3, that of node 2 is t4, and that of node 3 is t5.
  • Each node names its backup data after the backup start time and returns the backup result.
  • The ClusterManager records the results in the result file and persists it.
  • Step 603: the data nodes archive the transaction log files daily, and the ClusterManager archives the active transaction list snapshot file; the archiving period is 10 minutes.
  • Restoring the snapshot of the cluster database at time t7 in another production environment or on an intermediate machine includes the following steps:
  • Step 604: obtain the backup result file and analyze it to find the latest backup data before time t7, that is, the data files backed up at time t1.
  • Step 605: if data from within the last ten minutes needs to be restored, the transaction logs that have not yet been archived must also be copied manually to the target machine.
  • Step 606: restore the backup data of each node using the physical hot backup tool; the new node 1 is restored to time t3.
  • Step 607: according to the records in the active transaction list snapshot file, find the time t6 at which a snapshot of the active transaction list was last obtained before the specified time t7. (The granularity of the restored snapshot is the period at which snapshots of the active transaction list are taken.)
  • Step 608: analyze the active transaction list at time t6 to obtain the start time t2 of the first transaction in the active transaction list.
  • Step 609: obtain the node's transaction log for the time interval [tmin, tmax];
  • Step 610: when tmin is less than t3, use the tool to generate rollback statements for the distributed transactions in [tmin, t3] that are still active at time t6; when tmin is not less than t3, skip this step.
  • Step 611: use the tool to redo all single-machine transactions in [t3, t6] and all distributed transactions in the set {gtid does not exist in the active transaction list at time t6} ∩ {gtid is less than next_gtid at time t6}.
  • Embodiments of the present invention also provide a storage medium.
  • The storage medium may be configured to store program code for performing the following steps:
  • the cluster manager in the distributed database restores the data of each node connected to the cluster manager to the data at the end time of the last physical backup before the specified time;
  • based on the data at the end time of the last physical backup, the cluster manager redoes the operations indicated by the transactions in the transaction log in order of the transactions' execution times, and obtains a data snapshot of each node at the specified time, wherein the transaction log relates to operations on data archived between the end of the physical backup and the specified time.
  • In the method and device for restoring a data snapshot proposed by the embodiments of the present invention, the cluster manager in the distributed database restores the data of each node connected to the cluster manager to the data at the end time of the last physical backup before the specified time; based on that data, the cluster manager then redoes the operations indicated by the transactions in the transaction log in order of their execution times, obtaining a data snapshot of each node at the specified time.
  • The invention thereby solves the problem in the related art that, when a distributed database restores a database snapshot of a historical moment, it cannot restore a database snapshot that corresponds to any specified time and satisfies global transaction consistency.
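The two claimed steps, restoring to the end of the last physical backup and then redoing logged operations in execution-time order, can be illustrated on a toy key-value state. This is only a sketch: the tuple layout `(time, key, value)`, the use of `None` to mean deletion, and the function name `restore_snapshot` are assumptions, and a real node would replay binlog or redo-log records instead.

```python
def restore_snapshot(backup_state, log, backup_end, target_time):
    """Rebuild one node's state at target_time from a backup plus its log.

    log holds (time, key, value) operations; value None means the key is deleted.
    """
    state = dict(backup_state)  # start from the data as of the backup end time
    pending = [op for op in log if backup_end < op[0] <= target_time]
    for _, key, value in sorted(pending, key=lambda op: op[0]):
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state

backup = {"a": 1, "b": 2}
log = [(15, "a", 5), (12, "c", 3), (18, "b", None), (25, "a", 9)]
snapshot = restore_snapshot(backup, log, backup_end=10, target_time=20)
print(snapshot)  # {'a': 5, 'c': 3}
```

Sorting by execution time before replay is what makes the result independent of the order in which archived log files happen to be read.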

Abstract

A method and apparatus for restoring data snapshots are disclosed. The method comprises: a cluster manager in a distributed database restoring the data of the nodes connected to the cluster manager to the data as of the end time of the last physical backup before a specified time (102); and the cluster manager, based on the data as of the end time of the last physical backup, redoing the operations indicated by the transactions in a transaction log in order of the transactions' execution times, so as to obtain data snapshots of the nodes at the specified time, the transaction log relating to operations on data archived between the end time of the physical backup and the specified time (104). The method solves the problem in the prior art that a distributed database, when restoring a database snapshot of a historical moment, cannot restore a database snapshot that is generated at any specified time and guarantees global transaction consistency.
PCT/CN2016/079475 2015-10-23 2016-04-15 Method and apparatus for restoring data snapshots WO2016180160A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510698124.6 2015-10-23
CN201510698124.6A CN106610876B (zh) 2015-10-23 2015-10-23 数据快照的恢复方法及装置

Publications (1)

Publication Number Publication Date
WO2016180160A1 (fr) 2016-11-17

Family

ID=57247762

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/079475 WO2016180160A1 (fr) 2015-10-23 2016-04-15 Method and apparatus for restoring data snapshots

Country Status (2)

Country Link
CN (1) CN106610876B (fr)
WO (1) WO2016180160A1 (fr)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3627359B1 (fr) * 2017-06-05 2023-10-04 Huawei Technologies Co., Ltd. Procédé, dispositif et équipement de traitement de transaction
CN107451013B (zh) * 2017-06-30 2020-12-25 北京奇虎科技有限公司 基于分布式系统的数据恢复方法、装置及系统
CN110121712B (zh) * 2017-12-05 2022-04-05 华为技术有限公司 一种日志管理方法、服务器和数据库系统
CN110019469B (zh) 2017-12-07 2022-06-21 金篆信科有限责任公司 分布式数据库数据处理方法、装置、存储介质及电子装置
CN111226200B (zh) * 2018-03-23 2023-06-27 华为云计算技术有限公司 为分布式应用创建一致性快照的方法、装置和分布式系统
CN110309227B (zh) * 2018-05-28 2022-12-13 腾讯科技(深圳)有限公司 分布式数据回档方法、装置和计算机可读存储介质
CN108959547B (zh) * 2018-07-02 2022-02-18 上海浪潮云计算服务有限公司 一种pv快照分布式数据库集群恢复方法
CN109144785B (zh) * 2018-08-27 2020-07-28 北京百度网讯科技有限公司 用于备份数据的方法和装置
CN109144790A (zh) * 2018-09-30 2019-01-04 广州鼎甲计算机科技有限公司 MySQL数据库的合成备份方法和装置
CN109271398B (zh) * 2018-10-29 2020-06-23 东软集团股份有限公司 数据库事务处理方法、装置、设备和计算机可读存储介质
CN111177141A (zh) * 2018-11-09 2020-05-19 上海擎感智能科技有限公司 利用MySQL并行复制恢复数据方法、设备及系统
CN109885427A (zh) * 2019-01-31 2019-06-14 郑州云海信息技术有限公司 一种数据库短期数据保护方法、装置、存储器及设备
US11080257B2 (en) * 2019-05-13 2021-08-03 Snowflake Inc. Journaled tables in database systems
CN110807064B (zh) * 2019-10-28 2022-08-26 北京优炫软件股份有限公司 Rac分布式数据库集群系统中的数据恢复装置
CN111124751B (zh) * 2019-11-12 2023-11-17 华为云计算技术有限公司 数据恢复方法及系统、数据存储节点、数据库管理节点
CN111338845B (zh) * 2020-02-16 2021-05-07 西安奥卡云数据科技有限公司 一种细粒度的本地数据保护方法
CN111522631B (zh) * 2020-03-23 2024-02-06 支付宝(杭州)信息技术有限公司 分布式事务处理方法、装置、服务器及介质
CN111611108A (zh) * 2020-05-21 2020-09-01 云和恩墨(北京)信息技术有限公司 虚拟数据库还原的方法及装置
CN113297230B (zh) * 2020-07-27 2024-03-08 阿里巴巴集团控股有限公司 数据验证方法及装置
CN112000521B (zh) * 2020-08-24 2021-08-27 中国银联股份有限公司 分布式数据库系统的全量备份方法、装置及计算机可读存储介质
CN112286870A (zh) * 2020-11-02 2021-01-29 四川长虹电器股份有限公司 一种获取数据库一致性快照的方法
CN112579613B (zh) * 2020-12-31 2023-02-17 华东计算技术研究所(中国电子科技集团公司第三十二研究所) 数据库集群差异比对与数据同步的方法、系统及介质
CN115658239B (zh) * 2022-12-23 2023-04-28 安超云软件有限公司 一种快照管理方法、系统及计算机可读介质
CN116107807B (zh) * 2023-01-10 2023-10-13 北京万里开源软件有限公司 数据库中数据备份时获取全局一致性点位的方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110184915A1 (en) * 2010-01-28 2011-07-28 Microsoft Corporation Cluster restore and rebuild
CN102419758A (zh) * 2010-09-28 2012-04-18 金蝶软件(中国)有限公司 数据处理系统及方法
CN102662793A (zh) * 2012-03-07 2012-09-12 江苏引跑网络科技有限公司 一种可保证数据一致性的分布式数据库热备份与恢复方法
CN103699548A (zh) * 2012-09-27 2014-04-02 阿里巴巴集团控股有限公司 一种通过使用日志恢复数据库数据的方法及设备
CN103885856A (zh) * 2014-03-10 2014-06-25 北京大学 一种基于消息再生机制的图计算容错方法及系统

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8818960B2 (en) * 2011-03-18 2014-08-26 Microsoft Corporation Tracking redo completion at a page level
US8949190B2 (en) * 2011-11-07 2015-02-03 Sap Se Point-in-time database recovery using log holes
CN103198159B (zh) * 2013-04-27 2016-01-06 国家计算机网络与信息安全管理中心 一种基于事务重做的异构集群多副本一致性维护方法
CN103412803B (zh) * 2013-08-15 2016-08-10 华为技术有限公司 数据恢复的方法及装置
US9317379B2 (en) * 2014-01-24 2016-04-19 International Business Machines Corporation Using transactional execution for reliability and recovery of transient failures

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304527A (zh) * 2018-01-25 2018-07-20 杭州哲信信息技术有限公司 一种数据提取方法
CN109408289A (zh) * 2018-10-16 2019-03-01 国网山东省电力公司信息通信公司 一种云容灾数据处理方法
CN110286732A (zh) * 2019-06-27 2019-09-27 无锡华云数据技术服务有限公司 高可用集群掉电自动恢复方法、装置、设备及存储介质
CN110286732B (zh) * 2019-06-27 2021-01-12 华云数据控股集团有限公司 高可用集群掉电自动恢复方法、装置、设备及存储介质
CN111651303A (zh) * 2020-07-07 2020-09-11 南京云信达科技有限公司 一种分布式架构的数据库在线备份和恢复方法技术领域
CN112463447A (zh) * 2020-11-25 2021-03-09 浪潮云信息技术股份公司 一种基于分布式数据库实现物理备份的优化方法
CN116541206A (zh) * 2023-04-10 2023-08-04 泽拓科技(深圳)有限责任公司 分布式数据集群的数据恢复方法、装置和电子设备
CN116541206B (zh) * 2023-04-10 2024-05-07 泽拓科技(深圳)有限责任公司 分布式数据集群的数据恢复方法、装置和电子设备

Also Published As

Publication number Publication date
CN106610876A (zh) 2017-05-03
CN106610876B (zh) 2020-11-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16792010

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16792010

Country of ref document: EP

Kind code of ref document: A1