WO2017122060A1 - Parallel recovery for shared-disk databases - Google Patents

Parallel recovery for shared-disk databases

Info

Publication number
WO2017122060A1
WO2017122060A1 PCT/IB2016/055065
Authority
WO
WIPO (PCT)
Prior art keywords
log
data recovery
shared
database
modify
Prior art date
Application number
PCT/IB2016/055065
Other languages
English (en)
Inventor
Nirmala Sreekantaiah
Yuanyuan NIE
Haifeng Li
Original Assignee
Huawei Technologies India Pvt. Ltd.
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies India Pvt. Ltd., Huawei Technologies Co., Ltd. filed Critical Huawei Technologies India Pvt. Ltd.
Priority to PCT/IB2016/055065 priority Critical patent/WO2017122060A1/fr
Publication of WO2017122060A1 publication Critical patent/WO2017122060A1/fr

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 - Error detection or correction of the data by redundancy in hardware
    • G06F11/1658 - Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
    • G06F11/1662 - Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit, the resynchronized component or unit being a persistent storage device
    • G06F11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements, where processing functionality is redundant
    • G06F11/2023 - Failover techniques
    • G06F11/2038 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements, where processing functionality is redundant, with a single idle spare processing component
    • G06F11/2046 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements, where processing functionality is redundant, where the redundant components share persistent storage
    • G06F11/2097 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements, maintaining the standby controller/processing unit updated
    • G06F2201/00 - Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80 - Database-specific techniques

Definitions

  • TECHNICAL FIELD [001] The present subject matter described herein, in general, relates to database technologies, and more particularly, to parallel recovery for shared-disk databases.
  • a database system provides a high-level view of data, but ultimately the data have to be stored as bits on one or more storage nodes.
  • a vast majority of databases today store data on magnetic disk (and, increasingly, on flash storage) and fetch data into main memory for processing, or copy data onto tapes and other backup nodes for archival storage.
  • the physical characteristics of storage nodes play a major role in the way data are stored, in particular because access to a random piece of data on disk is much slower than memory access: Disk access takes tens of milliseconds, whereas memory access takes a tenth of a microsecond.
  • the database system can be a distributed database system, wherein the database is distributed over multiple disparate computers or nodes. Shared-disk databases fall into a general category where multiple database instances share some physical storage resource. With a shared-disk architecture, multiple nodes coordinate access to a shared storage system at a block level.
  • a database management system is generally system software for creating and managing databases.
  • the DBMS provides users and programmers with a systematic way to create, retrieve, update and manage data.
  • the DBMS is a collection of programs that enables you to store, modify, and extract information from a database.
  • DBMSs have had crash recovery for many years.
  • ACID stands for Atomicity, Consistency, Isolation, and Durability.
  • a single logical operation on the data is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.
  • High availability features in a DBMS are about ensuring that a database system remains operational during both planned and unplanned outages, such as maintenance operations, hardware/network failures, and the like. Further, database replication is a process of ensuring that a copy of the data exists on a different machine, to improve reliability, fault tolerance and availability.
  • the database replication is the frequent electronic copying of data from a database on one computer or server to a database on another, so that all users can access the information in the event of failure of the place where the original data modification took place.
  • the place where the change originates is termed the master, and the place to which the data is replicated is termed the standby.
  • the database replication can either be physical, i.e., log-shipping, or logical, i.e., command-shipping.
  • the database replication can also be synchronous, wherein an application's wait time includes the change on the originator node and the time to safely commit it on the replica, or asynchronous, wherein an application gets a response immediately after the data is safely committed on the originator node and the originator takes responsibility for asynchronously committing the data on the standby. Replication can also be done on distributed database clusters, ensuring the availability of a fully functional standby cluster in the event of failure of the master cluster.
  • Each of these clusters can be made up of one or more computers/servers/devices/nodes and can host single/multiple databases. Additionally, these clusters can also have a centralized coordinator, which coordinates all the activities related to data management.
  • Failover is a scenario where the master, or the node that was transferring the logs, is not available to the application (say the master has crashed) and the standby has to take over the role of master.
  • Switchover is a scenario where the application (or coordinator) instructs the master to become the standby, and the existing standby to become the new master.
  • a main objective of the present invention is to provide a system and method for faster data recovery in shared-disk databases.
  • the present invention also provides a system and method for fast data recovery during switchover/failover where the clusters are present across geographical locations.
  • the present invention also provides a system and method for fast data recovery during switchover/failover that reduces the data recovery time, so that the database can be brought on-line in much less time.
  • At least two data recovery nodes/threads wait for each other if they share a page between them.
  • the nodes/threads decide which nodes/threads go ahead, and which need to wait.
  • for this purpose, a special WAL log record is inserted, which acts as a holding point (condition).
  • a data recovery node/thread looks at the other node/thread processing data recovery, to check if the corresponding log is replayed. If yes, it will go ahead and commit; otherwise it will continue to wait.
  • multiple nodes/threads can participate in data recovery during switchover/failover thereby sharing the load and reducing the data recovery time.
  • the present invention provides a database system for data recovery in at least one shared-disk database.
  • the database system comprises a master cluster comprising a first device, and a standby cluster comprising a second device.
  • the first device of the master cluster is adapted to transmit at least a log to the second device of the standby cluster, wherein the log contains at least information for modifying the shared-disk database and at least a pre-defined condition for recovery during switchover/failover.
  • the second device of the standby cluster is adapted to perform recovery based on the log received from the first device.
  • the present invention provides a first device for data recovery in at least one shared-disk database.
  • the first device includes a processor, and a memory coupled to the processor for executing a plurality of modules present in the memory.
  • the plurality of modules includes a log generation module and a transmitting module.
  • the log generation module is configured to generate at least one log containing at least information for modifying the shared-disk database and at least one pre-defined condition for recovery during switchover/failover.
  • the transmitting module is configured to transmit the at least one log to at least one device of a standby cluster.
  • the present invention provides a second device for data recovery in at least one shared-disk database.
  • the second device includes a processor, and a memory coupled to the processor for executing a plurality of modules present in the memory.
  • the plurality of modules includes a receiving module, a checking module, and an execution module.
  • the receiving module is configured to receive at least a log from at least one device, the log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover.
  • the checking module is configured to check if the information received in the log to modify the shared-disk database is replayed.
  • the execution module is configured to commit the information to modify the shared-disk database, or to hold the commit based on the pre-defined condition for data recovery.
  • the present invention provides a method for data recovery in at least one shared-disk database.
  • the method comprises transmitting, by a first device comprised in a master cluster, at least a log to at least one second device comprised in a standby cluster, the log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover; and performing, by the second device, the data recovery based on the log received from the first device.
  • the present invention also provides a method, performed by a first device, to achieve data recovery in at least one shared-disk database.
  • the method includes generating at least a log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover; and transmitting the log generated to at least one second device to perform data recovery based on the log received.
  • a method, performed by a second device, for data recovery in at least one shared-disk database includes: receiving at least a log from at least one first device, the log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover; checking if the information received in the log to modify the shared-disk database is replayed; and committing the information to modify the shared-disk database, or holding the commit based on the pre-defined condition for data recovery. The pre-defined condition comprises a data recovery condition indicating that, upon determining that the first device of the master cluster lags in transmitting the log, the second device holds the data recovery of the log received from the first device until at least one further log is obtained from the first device, i.e., the second device holds until the data recovery is completed at the first device.
  • the present invention enables the usage of multiple nodes in a cluster data recovery during switchover/failover thereby sharing the load and reducing the data recovery time. Further, the present invention assists in disaster data recovery scenarios where the clusters are present across geographical locations.
  • Figure 1 illustrates a first device from a master cluster of at least two or more devices to transmit a log, in accordance with an embodiment of the present subject matter.
  • Figure 2 illustrates a second device from a standby cluster of at least two or more devices to achieve data recovery in at least one shared-disk database, in accordance with an embodiment of the present subject matter.
  • Figure 3 illustrates a method comprising a first device of the master cluster and a second device of the standby cluster to provide availability, fault tolerance and reliability.
  • Figure 4 illustrates a method, performed by a second device, from a standby cluster of at least two or more devices, for data recovery in at least one shared-disk database, in accordance with an embodiment of the present subject matter.
  • Figure 5 illustrates an overall processing of multiple nodes which can participate in data recovery during switchover/failover, in accordance with an embodiment of the present subject matter.
  • the invention can be implemented in numerous ways, as a process, an apparatus, a system, a composition of matter, a computer readable medium such as a computer readable storage medium or a computer network wherein program instructions are sent over optical or electronic communication links.
  • these implementations, or any other form that the invention may take, may be referred to as techniques.
  • the order of the steps of disclosed processes may be altered within the scope of the invention.
  • a detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents.
  • the shared-disk databases fall into a general category where multiple database instances share the same physical storage resource.
  • multiple nodes coordinate access to a shared storage system at a block level.
  • Redo logs are generated in shared-disk clusters. It may be understood that the redo logs are in a proprietary format which log a history of all changes made to the database.
  • Each redo log file consists of redo records.
  • a redo record, also called a redo entry holds a group of change vectors, each of which describes or represents a change made to a single block in the database.
  • WAL stands for write-ahead logging.
  • each WAL contains an LSN (Log Sequence Number).
  • a WAL with LSN 'x' has to be applied to the database before a WAL with LSN 'x'+'y' can be applied, wherein both 'x' and 'y' are positive integers.
  • LSN is replaced by GLSN (Global Log Sequence Number), the uses of which are described in further sections.
  • the Redo logs (WAL) generated in shared-disk clusters are ordered by a particular number, a GLSN.
  • GLSN is global across all nodes of the cluster.
  • the GLSN may be used to define the data recovery point for a restore operation.
  • the GLSNs are used internally during a RESTORE sequence to track the point in time to which data has been restored. Every record in the transaction log is uniquely identified by the GLSN.
  • GLSNs are ordered such that if GLSN 2 is greater than GLSN 1, the change described by the log record referred to by GLSN 2 occurred after the change described by the log record referred to by GLSN 1.
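  • by way of illustration only, the following minimal Python sketch models a WAL record ordered by its GLSN; the field names and layout are assumptions made for this sketch and do not represent the actual log format.

    from dataclasses import dataclass, field

    @dataclass(order=True)
    class WalRecord:
        """Illustrative WAL record; comparison uses only the cluster-wide GLSN."""
        glsn: int                              # Global Log Sequence Number
        node_id: str = field(compare=False)    # node of the master cluster that produced it
        change: bytes = field(compare=False)   # opaque change vector for a single block

    # If the GLSN of r2 is greater than the GLSN of r1, the change in r2 occurred after
    # the change in r1, so r1 must be replayed before r2.
    r1 = WalRecord(glsn=100, node_id="master-A", change=b"...")
    r2 = WalRecord(glsn=101, node_id="master-B", change=b"...")
    assert r1 < r2
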
  • the logs generated in the cluster are ordered using GLSN.
  • the WAL with GLSN from the master to the standby can be sent by a single dedicated node, or alternatively each node of the master cluster can take the responsibility of transmitting the WAL generated by that node.
  • the entities which send WAL may be termed as WAL senders.
  • a cluster has multiple WAL senders for load balancing and efficiency. Consequently, on the standby cluster, there are nodes which receive the WAL logs and write them to shared storage. These entities are termed WAL receivers.
  • there can be multiple WAL receivers in a standby cluster. Generally there is a one-to-one mapping between a WAL sender and a WAL receiver; however, this is not mandatory.
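  • a minimal Python sketch of one such WAL receiver is given below; the queue-based hand-off and the in-memory list standing in for the shared disk are purely illustrative assumptions.

    import queue
    from typing import Optional, Tuple

    # One WAL receiver on the standby cluster: it drains records shipped by its paired WAL
    # sender and appends them to shared storage, from where recovery later replays them.
    def wal_receiver(incoming: "queue.Queue[Optional[Tuple[int, bytes]]]",
                     shared_wal: list) -> None:
        while True:
            record = incoming.get()        # (GLSN, change vector) from the paired sender
            if record is None:             # sentinel: the sender has finished
                break
            shared_wal.append(record)      # stands in for a write to the shared disk

    inbox: queue.Queue = queue.Queue()
    shared_wal: list = []
    for item in [(100, b"change-A"), (101, b"change-B"), None]:
        inbox.put(item)
    wal_receiver(inbox, shared_wal)        # shared_wal now holds the two received records
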
  • the WAL logs received by the WAL receivers may be applied to the database to bring it to the same state as the master cluster. This process is termed data recovery.
  • the data recovery may be of two types: a cold data recovery (where the logs are already present, as in the case of failover), and an online data recovery (where the logs are being continuously replayed from multiple nodes in the master to the standby, as in the case of switchover).
  • in cold data recovery, the WAL records to be replayed are already present on the standby shared-disk.
  • the replay order may be just a sort-merge, according to the ascending GLSN order of the log records. This is the procedure followed during failover operations.
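  • a minimal Python sketch of such a sort-merge replay is given below, assuming an illustrative (GLSN, change) record format and an apply_change callback; it is a sketch of the idea, not the actual recovery code.

    import heapq

    # Cold (failover) recovery: the per-sender WAL streams already sit on the standby shared
    # disk, each internally ordered by GLSN, and are replayed in one global ascending GLSN
    # order by merging the sorted streams.
    def cold_recovery(wal_streams, apply_change):
        for glsn, change in heapq.merge(*wal_streams, key=lambda rec: rec[0]):
            apply_change(glsn, change)

    # Two senders' logs interleave into the global order 100, 101, 102, 103.
    stream_a = [(100, "update page 7"), (102, "update page 3")]
    stream_b = [(101, "update page 9"), (103, "update page 7")]
    cold_recovery([stream_a, stream_b],
                  lambda glsn, change: print(f"replay GLSN {glsn}: {change}"))
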
  • in online data recovery, the WAL is being continuously transmitted from the sender process.
  • the replay order is again according to the ascending order of the GLSN, but it is possible that one or more WAL senders lag behind.
  • the GLSN of each WAL record may not be strictly incremented by 1; there might be some gaps. The gaps, once they occur, may not be skipped; the system may have to wait until the WAL sender which is lagging behind catches up.
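  • the following minimal Python sketch illustrates this waiting rule; the buffered-record dictionary, the per-sender progress map and all names are assumptions made only for illustration.

    # Online recovery: a buffered record may only be replayed once every WAL sender has
    # shipped past its GLSN, because only then can no lagging sender still deliver an
    # earlier record; records above that bound stay buffered until the laggard catches up.
    def replay_ready_records(buffered, last_shipped, apply_change):
        """buffered: {glsn: change}; last_shipped: {sender_id: highest GLSN shipped so far}."""
        safe_bound = min(last_shipped.values())          # the lagging sender decides the bound
        for glsn in sorted(g for g in buffered if g <= safe_bound):
            apply_change(glsn, buffered.pop(glsn))

    # Example: sender-B lags at GLSN 101, so the record at GLSN 103 from sender-A must wait.
    buffered = {100: "change-1", 101: "change-2", 103: "change-3"}
    last_shipped = {"sender-A": 103, "sender-B": 101}
    replay_ready_records(buffered, last_shipped,
                         lambda glsn, change: print(f"replay GLSN {glsn}: {change}"))
    # prints GLSN 100 and 101; GLSN 103 stays buffered until sender-B advances past it
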
  • the present invention may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and the like.
  • the database system may be accessed by multiple users, or applications residing on the database system.
  • Examples of the database system may include, but are not limited to, a portable computer, a personal digital assistant, a handheld node, sensors, routers, gateways and a workstation.
  • the database system may be communicatively coupled to other nodes or apparatuses to form a network (not shown).
  • the database system may comprise at least one first device 100 and at least one second device 200.
  • first device 100 and the second device 200 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, and the like. It will be understood that the first device 100 and the second device 200 may be accessed by multiple users, or applications residing on the database system.
  • Examples of the first device 100 and the second device 200 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld node, sensors, routers, gateways and a workstation.
  • the first device 100 and the second device 200 may be communicatively coupled to each other and/or to other nodes or apparatuses to form a network (not shown).
  • the database system, the first device 100 and the second device 200 are communicatively coupled to each other and/or to other nodes or apparatuses to form a network (not shown).
  • the network may be a wireless network, a wired network or a combination thereof.
  • the network can be implemented as one of the different types of networks, such as GSM, CDMA, LTE, UMTS, intranet, local area network (LAN), wide area network (WAN), the internet, and the like.
  • the network may either be a dedicated network or a shared network.
  • the shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another.
  • the network may include a variety of network nodes, including routers, bridges, servers, computing nodes, storage nodes, and the like.
  • the database system, the first device 100 and the second device 200 may include one or more processors, an interface, and a memory.
  • the processor may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any nodes that manipulate signals based on operational instructions.
  • the at least one processor is configured to fetch and execute computer-readable instructions or modules stored in the memory.
  • the interface, for example interface 104/204, may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like.
  • the I/O interface may allow the database system, the first device 100 and the second device 200 to interact with a user directly. Further, the I/O interface may enable the database system, the first device 100 and the second device 200 to communicate with other nodes or apparatuses, computing nodes such as web servers, and external data servers (not shown).
  • the I/O interface can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, GSM, CDMA, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite.
  • the I/O interface may include one or more ports for connecting a number of nodes to one another or to another server.
  • the I/O interface may provide interaction between the user and database system, the first device 100 and the second device 200 via, a screen provided for the interface.
  • the memory may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
  • the memory may include plurality of instructions or modules or applications to perform various functionalities.
  • the memory includes routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types.
  • each of the clusters can be made up of one or more computers/servers/devices/nodes and can host single/multiple databases. Additionally, these clusters can also have a centralized coordinator, which coordinates all the activities related to data management.
  • the present invention provides a database system for data recovery in at least one shared-disk database.
  • the database system comprises at least one cluster comprising at least two or more devices.
  • At least one first device 100 from the two or more devices is adapted to transmit at least a log to at least one second device 200 from the two or more devices, the log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover.
  • the second device 200 is adapted to perform data recovery based on the log received from the first device 100.
  • the present invention provides a database system for data recovery in at least one shared-disk database.
  • the database system comprises a master cluster comprising a first device 100, and a standby cluster comprising a second device 200.
  • the first device 100 of the master cluster is adapted to transmit at least a log to the second device 200 of the standby cluster, wherein the log contains at least information for modifying the shared-disk database and at least a pre-defined condition for recovery during switchover/failover.
  • the second device of the standby cluster is adapted to perform recovery based on the log received from the first device.
  • the pre-defined condition may include a data recovery condition indicating that, upon determining that the first device of the master cluster lags in transmitting the log, the second device holds the data recovery of the log received from the first device until at least one further log is obtained from the first device, i.e., the second device holds until the data recovery is completed at the first device.
  • the log is at least a write-ahead logging (WAL) pre-arranged using at least a Global Log Sequence Number (GLSN).
  • the pre-defined condition for data recovery may include checking if the information received in the log to modify the shared-disk database is replayed based on the GLSN.
  • the second device is adapted to commit the information to modify the shared-disk database or hold the commit based on the pre-defined condition for data recovery.
  • the second device is adapted to utilize one or more data recovery threads/nodes to replay the information received in the log to modify the shared-disk database; commit the information to modify the shared-disk database; or hold the commit based on the pre-defined condition for data recovery.
  • the database system is thus characterized by parallel data recovery in at least one shared-disk database.
  • the first device 100 from the master cluster comprises a processor 102, a memory 106 coupled to the processor 102 for executing a plurality of modules present in the memory 106.
  • the plurality of modules may include a log generation module 110 and a transmitting module 112.
  • the memory 106 may further include a database storage 108 configured to store database information.
  • the log generation module 110 is configured to generate at least a log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover.
  • the transmitting module 112 is configured to transmit the log to at least one second device from a standby cluster having two or more devices.
  • the pre-defined condition may include a data recovery condition indicating that, upon determining that the first device of the master cluster lags in transmitting the log, the second device holds the data recovery of the log received from the first device until at least one further log is obtained from the first device, i.e., the second device holds until the data recovery is completed at the first device.
  • the log is at least a write-ahead logging (WAL) pre-arranged using at least a Global Log Sequence Number (GLSN).
  • the pre-defined condition for data recovery comprises: checking if the information received in the log to modify the shared-disk database is replayed based on the GLSN.
  • a second device 200, from a standby cluster of at least two or more devices, for data recovery in at least one shared-disk database comprises a processor 202 and a memory 206 coupled to the processor 202 for executing a plurality of modules present in the memory 206.
  • the plurality of modules comprises a receiving module 210, a checking module 212, and an execution module 214.
  • the receiving module 210 is configured to receive at least a log from at least one first device from a master cluster comprising the two or more devices, the log contains at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover.
  • the checking module 212 is configured to check if the information received in the log to modify the shared-disk database is replayed in the database information stored in the database storage 208.
  • the execution module 214 is configured to commit the information to modify the shared-disk database, or to hold the commit based on the pre-defined condition for data recovery.
  • the pre-defined condition may include a data recovery condition indicating that, upon determining that the first device of the master cluster lags in transmitting the log, the second device holds the data recovery of the log received from the first device until at least one further log is obtained from the first device, i.e., the second device holds until the data recovery is completed at the first device.
  • the log is at least a write-ahead logging (WAL) pre-arranged using at least a Global Log Sequence Number (GLSN).
  • the pre-defined condition for data recovery may include checking if the information received in the log to modify the shared-disk database is replayed based on the GLSN.
  • the second device 200 may further utilize one or more data recovery threads/nodes to replay the information received in the log to modify the shared-disk database, commit the information to modify the shared-disk database, or hold the commit based on the pre-defined condition for data recovery.
  • in FIG 3, a method comprising a first device of the master cluster and a second device of the standby cluster to provide availability, fault tolerance and reliability is illustrated.
  • replication can be done on distributed database clusters, ensuring availability of a fully functional standby cluster in event of failure of the master cluster.
  • Each of these clusters can be made up of one or more computers/servers and can host single/multiple databases. Additionally, these clusters can also have a centralized coordinator, which coordinates all the activities related to data management.
  • a method, performed by a second device, from a standby cluster of at least two or more devices, for data recovery in at least one shared-disk database is disclosed, in accordance with an embodiment of the present subject matter.
  • the method may be described in the general context of computer executable instructions.
  • computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types.
  • the method may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network.
  • computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
  • when the data recovery thread encounters this particular log, it will look at the other node processing data recovery, to see if the corresponding log is replayed. If yes, it will go ahead to commit; otherwise it will continue to wait.
  • a method, performed by the second device from a standby cluster of at least two or more devices, for data recovery in at least one shared-disk database is disclosed.
  • a node/thread starts out by reading a WAL from the WAL file.
  • the WAL is checked to see if it contains the holding point. [0073] If the WAL does not contain a holding point, at block 403, the changes mentioned in the WAL are committed to the database.
  • the holding point may include a data recovery condition indicating that, upon determining that the first device of the master cluster lags in transmitting the log, the second device holds the data recovery of the log received from the first device until at least one further log is obtained from the first device, i.e., the second device holds until the data recovery is completed at the first device.
  • if the WAL contains a holding point, it is checked whether the GLSN mentioned in the holding point is already applied to the page or not. If it is already applied, at block 403, the WAL is committed; otherwise the node waits at block 405 for the WAL to be applied.
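  • a minimal Python sketch of this per-record decision is shown below; the record layout, the shared set of applied GLSNs and the polling wait are illustrative assumptions rather than the actual implementation.

    import time

    # For each WAL record read from the WAL file: if it carries a holding point, wait until
    # the GLSN named in the holding point has been applied (the applied_glsns set is assumed
    # to be shared with, and updated by, the other recovery nodes/threads), then commit the
    # record (block 403); until then, keep waiting (block 405).
    def replay_wal_file(wal_records, applied_glsns, commit, poll_interval=0.1):
        for record in wal_records:                          # read the next WAL record
            holding_glsn = record.get("holding_point")      # does it contain a holding point?
            while holding_glsn is not None and holding_glsn not in applied_glsns:
                time.sleep(poll_interval)                   # block 405: wait
            commit(record)                                  # block 403: commit the changes
            applied_glsns.add(record["glsn"])

    # Example: the holding point on GLSN 100 is already applied, so GLSN 105 commits at once.
    applied = {100}
    replay_wal_file([{"glsn": 105, "holding_point": 100}], applied,
                    lambda rec: print(f"commit GLSN {rec['glsn']}"))
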
  • a method for data recovery in at least one shared-disk database comprises transmitting, by at least one first device from at least one master cluster comprising at least two or more devices, at least a log.
  • the log generated by the first device of the master cluster is transmitted to at least a second device, from a standby cluster of at least two or more devices, the log containing at least information to modify the shared-disk database and at least a pre-defined condition for data recovery during switchover/failover.
  • the method further comprises performing the data recovery in the shared-disk database, by the second device, based on the log received from the first device of the master cluster.
  • the present invention reduces the overall data recovery time, which enables the database to be back on-line in much less time.
  • the present invention is beneficial in disaster data recovery scenarios where the clusters are present across geographical locations.
  • the present invention allows multiple nodes to participate in data recovery during switchover/failover thereby sharing the load and reducing the data recovery time.
  • in FIG 5, an overall processing of multiple nodes/devices participating in data recovery during switchover/failover as a part of the standby cluster is illustrated, in accordance with an embodiment of the present subject matter.
  • for each page transfer between two nodes, a special WAL record is inserted at the receiver.
  • when the data recovery thread encounters the special WAL record, it determines if the other data recovery threads are also processing data recovery on this page, and checks if the corresponding log is replayed. If the corresponding log is replayed, it will proceed to commit with the application of the WAL; otherwise the receiver will continue to wait.
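  • the sketch below illustrates this holding-point mechanism with two recovery threads in Python; the record dictionaries, the shared replayed set and the chosen GLSN values are assumptions made only for illustration.

    import threading
    import time

    replayed = set()                  # GLSNs replayed so far, shared by all recovery threads
    replayed_lock = threading.Lock()

    # When a page moves from node A to node B, a special record carrying the GLSN of A's last
    # change to that page is inserted into B's WAL stream, so B's recovery thread cannot
    # commit its own change to the page before A's change has been replayed.
    def recovery_thread(wal_stream, apply_change, poll_interval=0.05):
        for rec in wal_stream:
            wait_for = rec.get("holding_point")      # special WAL record inserted at the receiver
            if wait_for is not None:
                while True:                          # wait for the other thread to replay it
                    with replayed_lock:
                        if wait_for in replayed:
                            break
                    time.sleep(poll_interval)
            apply_change(rec)
            with replayed_lock:
                replayed.add(rec["glsn"])

    # Node B received page 7 from node A after A's change at GLSN 100, so B's change at
    # GLSN 105 carries a holding point on GLSN 100 and commits only after A replays it.
    wal_a = [{"glsn": 100, "page": 7}]
    wal_b = [{"glsn": 105, "page": 7, "holding_point": 100}]
    show = lambda rec: print(f"commit GLSN {rec['glsn']} on page {rec['page']}")
    t_a = threading.Thread(target=recovery_thread, args=(wal_a, show))
    t_b = threading.Thread(target=recovery_thread, args=(wal_b, show))
    t_b.start(); t_a.start(); t_a.join(); t_b.join()
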
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the described apparatus embodiment is merely exemplary.
  • the unit division is merely a logical function division, and there may be another division in an actual implementation.
  • a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces.
  • the indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
  • when the functions are implemented in a form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or a part of the technical solutions may be implemented in a form of a software product.
  • the computer software product is stored in a storage medium, and includes several instructions for instructing a computer node (which may be a personal computer, a server, or a network node) to perform all or a part of the steps of the methods described in the embodiment of the present invention.
  • the foregoing storage medium includes: any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In the online replay procedure, the replay order is likewise according to the ascending order of the GLSN, but it is possible that one or more WAL senders lag behind. Likewise, the GLSN of each WAL record is not strictly incremented by 1 and may have gaps. Once they occur, these gaps cannot simply be skipped. Consequently, if it can be known which WAL sender is lagging behind, data recovery can be performed in parallel. The present invention relates to a mechanism for achieving parallel data recovery for shared-disk databases. According to the present invention, for each page transfer between two nodes, a special WAL record is inserted at the receiver, acting as a holding point (condition). When the data recovery thread encounters this particular log, it checks with the other node processing the data recovery whether the corresponding log has been replayed. If so, it will proceed to commit; otherwise it will continue to wait.
PCT/IB2016/055065 2016-08-25 2016-08-25 Récupération parallèle pour bases de données à disque partagé WO2017122060A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2016/055065 WO2017122060A1 (fr) 2016-08-25 2016-08-25 Récupération parallèle pour bases de données à disque partagé

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2016/055065 WO2017122060A1 (fr) 2016-08-25 2016-08-25 Récupération parallèle pour bases de données à disque partagé

Publications (1)

Publication Number Publication Date
WO2017122060A1 true WO2017122060A1 (fr) 2017-07-20

Family

ID=59311759

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2016/055065 WO2017122060A1 (fr) 2016-08-25 2016-08-25 Récupération parallèle pour bases de données à disque partagé

Country Status (1)

Country Link
WO (1) WO2017122060A1 (fr)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6594676B1 (en) * 2000-04-10 2003-07-15 International Business Machines Corporation System and method for recovery of multiple shared database data sets using multiple change accumulation data sets as inputs

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918178A (zh) * 2019-03-06 2019-06-21 Hundsun Technologies Inc. Transaction commit method and related apparatus
CN110532123A (zh) * 2019-08-30 2019-12-03 Beijing Xiaomi Mobile Software Co., Ltd. Failover method and apparatus for an HBase system
CN110532123B (zh) * 2019-08-30 2023-08-04 Beijing Xiaomi Mobile Software Co., Ltd. Failover method and apparatus for an HBase system
CN111124754A (zh) * 2019-11-30 2020-05-08 Inspur Electronic Information Industry Co., Ltd. Data recovery method, apparatus, device and medium
CN114422336A (zh) * 2021-12-22 2022-04-29 Sangfor Technologies Inc. Control plane debugging method, apparatus, node and storage medium
WO2023160077A1 (fr) * 2022-02-25 2023-08-31 Ant Blockchain Technology (Shanghai) Co., Ltd. Blockchain data recovery method and apparatus, and electronic device

Similar Documents

Publication Publication Date Title
US8433869B1 (en) Virtualized consistency group using an enhanced splitter
US8392680B1 (en) Accessing a volume in a distributed environment
US7627775B2 (en) Managing failures in mirrored systems
US8478955B1 (en) Virtualized consistency group using more than one data protection appliance
US7206911B2 (en) Method, system, and program for a system architecture for an arbitrary number of backup components
US8027952B2 (en) System and article of manufacture for mirroring data at storage locations
US8898409B1 (en) Journal-based replication without journal loss
US8788772B2 (en) Maintaining mirror and storage system copies of volumes at multiple remote sites
US6934877B2 (en) Data backup/recovery system
US8521694B1 (en) Leveraging array snapshots for immediate continuous data protection
US8805786B1 (en) Replicating selected snapshots from one storage array to another, with minimal data transmission
US7188222B2 (en) Method, system, and program for mirroring data among storage sites
US9588858B2 (en) Periodic data replication
CN101755257B (zh) 管理在不同的网络上将写入从首要存储器拷贝到次要存储器
US20040260899A1 (en) Method, system, and program for handling a failover to a remote storage location
WO2017122060A1 (fr) Récupération parallèle pour bases de données à disque partagé
US20080189498A1 (en) Method for auditing data integrity in a high availability database
US7761431B2 (en) Consolidating session information for a cluster of sessions in a coupled session environment
US20130166505A1 (en) Monitoring replication lag between geographically dispersed sites
WO2010068570A1 (fr) Procédé et système de gestion de données de base dupliquées
US10235145B1 (en) Distributed scale-out replication
CN115858236A (zh) 一种数据备份方法和数据库集群
US11494271B2 (en) Dynamically updating database archive log dependency and backup copy recoverability
CN110121694B (zh) 一种日志管理方法、服务器和数据库系统
US11966297B2 (en) Identifying database archive log dependency and backup copy recoverability

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16884821

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16884821

Country of ref document: EP

Kind code of ref document: A1