WO2020228712A1 - Fault repair method for a database system, database system, and computing device (数据库系统的故障修复方法、数据库系统和计算设备) - Google Patents

Fault repair method for a database system, database system, and computing device

Info

Publication number
WO2020228712A1
WO2020228712A1 · application PCT/CN2020/089909
Authority
WO
WIPO (PCT)
Prior art keywords
gbp
node
point
page
disk
Prior art date
Application number
PCT/CN2020/089909
Other languages
English (en)
French (fr)
Inventor
王传廷
朱仲楚
邢玉辉
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CA3137745A priority Critical patent/CA3137745C/en
Priority to EP20806356.0A priority patent/EP3961400B1/en
Publication of WO2020228712A1 publication Critical patent/WO2020228712A1/zh
Priority to US17/525,415 priority patent/US11829260B2/en

Classifications

    • GPHYSICS — G06 COMPUTING; CALCULATING OR COUNTING — G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring — G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/1469 Backup restoration techniques
    • G06F11/1471 Saving, restoring, recovering or retrying involving logging of persistent data for recovery
    • G06F11/1425 Reconfiguring to eliminate the error by reconfiguration of node membership
    • G06F11/2028 Failover techniques eliminating a faulty processor or activating a spare
    • G06F11/2038 Failover techniques with a single idle spare processing component
    • G06F11/2043 Failover techniques where the redundant components share a common memory address space
    • G06F11/2094 Redundant storage or storage space
    • G06F11/2097 Maintaining the standby controller/processing unit updated
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/273 Asynchronous replication or reconciliation
    • G06F2201/80 Database-specific techniques (indexing scheme)

Definitions

  • This application relates to the field of database technology, in particular to a method for repairing a fault in a database system, and to a corresponding database system and computing device.
  • FIG. 1 shows a database system that includes a host 110 and a standby machine 130.
  • The host 110 and the standby machine 130 are configured together to ensure the reliability of the database system.
  • The host 110 and the standby machine 130 each have their own data storage and log storage.
  • When the host 110 modifies a page, it generates a redo log.
  • The host 110 transmits the redo log to the standby machine 130, which receives the redo log and replays it, thereby keeping the standby machine 130 synchronized with the host 110.
  • On the standby machine 130, receiving redo logs and replaying redo logs are two parallel processes.
  • The standby machine 130 can receive redo logs in batches and write them into local memory while, at the same time, replaying the redo logs one by one. Under normal circumstances, replay is slower than reception: for example, of 10 GB of received logs, only 8 GB may have been replayed, leaving 2 GB still to be replayed.
  • After the host 110 fails, the standby machine 130 must replay all the redo logs it has received before it is synchronized with the pre-failure host 110 and can replace it as the new host (a process also called "failover" or "database system recovery").
  • The Recovery Time Objective (RTO) is the time required for the standby machine 130 to be promoted to the new host. As the main-standby switchover process above shows, the RTO depends on the amount of logs still to be replayed: the larger the backlog, the larger the RTO, which affects business continuity.
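To make the RTO dependence concrete, here is a back-of-envelope sketch; the function name and the throughput figure are our illustrative assumptions, not values from this application. The RTO is roughly the un-replayed log backlog divided by the replay throughput.

```python
def estimated_rto_seconds(backlog_bytes: float, replay_bytes_per_sec: float) -> float:
    # RTO ~= un-replayed log backlog / replay throughput
    return backlog_bytes / replay_bytes_per_sec

# For the 10 GB received / 8 GB replayed example above, the 2 GB backlog
# replayed at an assumed 100 MB/s would take roughly 20 seconds.
rto = estimated_rto_seconds(2 * 1024**3, 100 * 1024**2)
```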
  • This application relates to a fault repair method of a database system, which is used to reduce the time required for the database system to perform fault repair and improve the efficiency of fault repair when the database system fails.
  • this application also provides a corresponding database system and computing equipment.
  • this application provides a fault repair method for a database system.
  • the method includes the following.
  • The host uses a first data transfer protocol to send multiple pages to the global page pool (GBP) node.
  • The GBP node writes the multiple pages into the cache queue of the GBP node.
  • The log sequence numbers (LSNs) of the multiple pages increase in order from the head of the cache queue to its tail.
  • The standby machine determines the GBP start point, the GBP recovery point, and the GBP end point.
  • The GBP start point indicates the smallest LSN among all pages stored on the GBP node.
  • The GBP recovery point indicates the smallest LSN in the batch of pages most recently received by the GBP node.
  • The GBP end point indicates the largest LSN in that most recently received batch.
  • The standby machine replays all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, and is thereby promoted to the new host.
  • The disk recovery point indicates the smallest LSN among the multiple pages most recently written to the disk of the standby machine.
  • The disk end point indicates the LSN of the last redo log received by the standby machine.
  • The page buffer of the GBP node includes one or more cache queues, each of which stores multiple pages; within one cache queue, the LSN of each page increases in order from the head of the queue to its tail.
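The relationship between the three GBP points and the cached pages can be sketched with a simplified single-queue model; all identifiers here are ours, not the application's.

```python
class GBPNode:
    """Toy model of the GBP node: one cache queue of (page_id, lsn) pairs,
    with LSNs growing from head to tail."""

    def __init__(self):
        self.queue = []           # head = oldest page
        self.gbp_start = None     # smallest LSN among all stored pages
        self.gbp_recovery = None  # smallest LSN in the last received batch
        self.gbp_end = None       # largest LSN in the last received batch

    def receive_batch(self, batch):
        # batch: list of (page_id, lsn), already in increasing LSN order
        self.queue.extend(batch)
        lsns = [lsn for _, lsn in batch]
        self.gbp_recovery = min(lsns)
        self.gbp_end = max(lsns)
        self.gbp_start = self.queue[0][1]  # LSN of the head page
```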
  • Replaying the redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point specifically means that the standby machine replays the redo log at the GBP recovery point, the redo log at the disk end point, and all redo logs in between. In other words, the replayed redo logs form a closed interval, and the standby machine also replays the logs at both ends of that interval.
  • After the host fails, the standby machine no longer replays all redo logs that have not yet been replayed. Instead, it determines the GBP start point, GBP recovery point, GBP end point, disk recovery point, and disk end point; then, when the disk recovery point is greater than or equal to the GBP start point and the disk end point is greater than or equal to the GBP end point (below, "when the conditions are met"), it replays only the redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, thereby repairing the fault in the database system.
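The precondition and the reduced replay interval can be written down directly; this is a sketch with function names of our choosing.

```python
def can_fast_recover(gbp_start, gbp_end, disk_recovery, disk_end):
    # "The conditions are met": the standby's disk state overlaps the
    # window of pages cached on the GBP node.
    return disk_recovery >= gbp_start and disk_end >= gbp_end

def replay_interval(gbp_recovery, disk_end):
    # Closed interval of redo logs the standby replays; everything
    # before gbp_recovery is skipped.
    return (gbp_recovery, disk_end)

# Illustrative LSNs: without the GBP, the standby would replay from its
# disk recovery point (300); with the GBP, only from the GBP recovery
# point (480) up to the disk end point (530).
ok = can_fast_recover(gbp_start=100, gbp_end=520, disk_recovery=300, disk_end=530)
interval = replay_interval(gbp_recovery=480, disk_end=530)
```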
  • When the host modifies a page, it generates a redo log for the modified page and sends that log to the standby machine; by replaying the log, the standby machine obtains the same modified page. That is, the standby machine stays synchronized with the host by replaying redo logs.
  • In the conventional approach, after a host failure the standby machine keeps replaying all the redo logs that the host transferred before the failure and that have not yet been replayed; only once every received redo log has been replayed is the standby machine synchronized with the pre-failure host and able to replace it as the new host.
  • With this method, the standby machine no longer replays everything: it replays only the redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, and skips the redo logs between the disk recovery point and the GBP recovery point.
  • Since the standby machine replays only a small part of the outstanding redo logs, the technical solution of this embodiment improves the efficiency of database system failure recovery.
  • The host transmits modified pages to the GBP node through the first data transfer protocol (such as the RDMA protocol). Based on this protocol, pages reach the GBP node very quickly.
  • The standby machine therefore does not need to replay the redo logs whose modified pages already exist, sequentially arranged, in the cache queue of the GBP node; those pages can be pulled directly from the GBP node to the standby machine. Only the redo logs whose modified pages are absent from the GBP node, or not sequentially arranged in its cache queue, still need to be replayed.
  • In short, the standby machine skips all redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point, and replays all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • Because the standby machine replays only part of the outstanding logs rather than all of them, the fault repair efficiency of the database system improves.
  • In a possible implementation, the fault repair method further includes: the GBP node updates the GBP recovery point and the GBP end point according to the multiple pages.
  • Determining the GBP recovery point and the GBP end point by the standby machine then means that the standby machine obtains the updated GBP recovery point and GBP end point from the GBP node.
  • After the GBP node writes the received pages into its cache queue, it updates the GBP recovery point and the GBP end point according to those pages, and the standby machine then obtains the updated values from the GBP node. Since it is the GBP node that writes the pages into its cache queue, having the GBP node maintain these two points ensures that they are updated in time.
  • If the GBP node maintains the GBP recovery point and the GBP end point, then after the GBP node writes the multiple pages into its cache queue, it updates the two points according to those pages, and the standby machine obtains the updated GBP recovery point and GBP end point from the GBP node. Since it is the GBP node that writes the pages into its cache queue, having the GBP node maintain these two points ensures that they are updated in time.
  • If the GBP node maintains the GBP start point, then when the GBP node receives a new page that does not exist in its page buffer and the page buffer is full, the GBP node evicts the page at the head of the cache queue, writes the new page into the cache queue, and updates the GBP start point to the LSN of the new head page of the cache queue.
  • Correspondingly, the standby machine obtains the updated GBP start point from the GBP node.
  • "New page" refers to the page currently received by the GBP node.
  • A new page that does not exist in the page buffer of the GBP node means that the currently received page is absent from the buffer. For example, the page currently received is page M, and page M does not exist in the page buffer of the GBP node.
  • Having the GBP node maintain the GBP start point ensures that the start point is updated in time.
  • When the GBP node receives a new page that does not exist in its page buffer, it places the new page at the tail of the cache queue.
  • When the GBP node receives a new page that already exists in its page buffer, it updates the existing copy with the received page and moves the updated page to the tail of the cache queue.
  • Here, "new page" again means the page currently received by the GBP node.
  • For example, suppose the GBP node currently receives page M, whose LSN is T. If page M does not exist in the page buffer, it is a "new page" and is placed at the tail of the cache queue.
  • If page M already exists in the page buffer — say it is located in cache queue R with LSN K — then the received copy of page M is used to update the existing one, and the updated page M is moved to the tail of cache queue R.
  • K and T are integers greater than or equal to 0, and T is greater than K.
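The placement rule of this example — a fresh page appended at the tail, an existing page updated and moved to the tail — can be sketched with an ordered map; the identifiers are ours.

```python
from collections import OrderedDict

class CacheQueue:
    def __init__(self):
        self.pages = OrderedDict()  # page_id -> lsn; first entry = queue head

    def put(self, page_id, lsn):
        if page_id in self.pages:
            # Page M already present with LSN K; the received copy carries
            # LSN T > K, so update it and move it to the tail.
            assert lsn > self.pages[page_id]
            self.pages[page_id] = lsn
            self.pages.move_to_end(page_id)
        else:
            # A genuinely new page is appended at the tail.
            self.pages[page_id] = lsn
```

Because every insertion and update lands at the tail, the LSNs still grow from the head of the queue to its tail.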
  • Because pages are placed into the cache queue in order, all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the GBP end point form the last segment of all redo logs sent by the host to the standby machine. This embodiment therefore ensures that the standby machine is synchronized with the host once the replay step completes.
  • When the disk recovery point is greater than or equal to the GBP start point and the disk end point is greater than or equal to the GBP end point, the standby machine also starts a background thread that pulls all pages stored on the GBP node into the page buffer of the standby machine.
  • The background thread pulls the pages through a second data transfer protocol, which is likewise a low-latency, high-throughput protocol, so the pull completes quickly.
  • The standby machine starts the background thread while replay is in progress; pulling pages from the GBP node and replaying logs thus run in parallel, which saves time and improves fault repair efficiency.
  • As pages pulled from the GBP node arrive in its page buffer, the standby machine compares them with the pages it already maintains, keeps the newer copy of each page, and discards the older one.
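The keep-the-newer-copy comparison can be sketched as follows; the names are ours, and a page counts as "newer" when it carries the larger LSN.

```python
def merge_pulled_pages(local, pulled):
    """local / pulled: dict of page_id -> (lsn, data).
    Keep, for each page, whichever copy carries the larger LSN."""
    for page_id, (lsn, data) in pulled.items():
        cur = local.get(page_id)
        if cur is None or lsn > cur[0]:
            local[page_id] = (lsn, data)
    return local
```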
  • In a sixth possible implementation, after the standby machine has performed the replay step, if a page that an application on the standby machine needs to access is still located in the page buffer of the GBP node, the application reads that page from the GBP node's page buffer through the second data transfer protocol.
  • In a seventh possible implementation, after the host fails and before the standby machine executes the replay step, the standby machine also obtains the disk recovery point and the disk end point locally.
  • The standby machine obtains these two points in order to determine whether the fault repair method described in the foregoing embodiments can be used.
  • During normal operation, the host also sends redo logs to the standby machine.
  • The standby machine replays the received redo logs to obtain the corresponding pages.
  • The standby machine also flushes the obtained pages to its local disk in batches.
  • The host starts a page sending thread, which uses the first data transfer protocol to send, in batches, the pages in a send queue to the GBP node in order from the head of the queue to its tail.
  • The send queue is located in the host, and the LSNs of the pages in it increase from head to tail.
  • Sending the pages in head-to-tail order ensures that the GBP node receives them in order: the LSN of a page received earlier is smaller than that of a page received later.
  • The GBP node can therefore write the pages into its cache queue in the order they arrive, so that the LSNs in the cache queue increase from head to tail. This is a relatively simple way of keeping the pages in the cache queue in increasing LSN order.
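The head-to-tail batched send — which is what guarantees the GBP node sees non-decreasing LSNs — can be sketched as below; the `transmit` callback stands in for the first data transfer protocol, and the names are ours.

```python
def send_in_batches(send_queue, batch_size, transmit):
    # Pages are taken from the head of the send queue, so each batch's
    # LSNs are no smaller than those of the previous batch.
    while send_queue:
        batch, send_queue = send_queue[:batch_size], send_queue[batch_size:]
        transmit(batch)

# The receiver appends batches in arrival order, so the resulting
# sequence of LSNs stays sorted.
received = []
send_in_batches([10, 20, 30, 40, 50], 2, received.extend)
```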
  • The host may start multiple page sending threads, with a one-to-one correspondence between the sending threads and the send queues in the host.
  • The advantage of this embodiment is that, because each sending thread handles exactly one send queue, the operation is relatively simple and not prone to errors.
  • The GBP node starts a page receiving thread, which receives the pages in batches and writes them into the cache queue of the GBP node.
  • The GBP node may start multiple page receiving threads, with a one-to-one correspondence between the receiving threads and the cache queues of the GBP node, and a one-to-one correspondence between the host's page sending threads and the GBP node's page receiving threads.
  • The advantage of this embodiment is that, because receiving threads, cache queues, and sending threads all correspond one-to-one, the operation is relatively simple and not prone to errors.
  • In another aspect, this application provides a database system.
  • The database system includes a host, a standby machine, and a GBP node, where the host and the GBP node are communicatively connected through the first data transfer protocol.
  • the host is used to send multiple pages to the GBP node.
  • the GBP node is used to write the multiple pages into the cache queue of the GBP node.
  • the LSNs corresponding to the multiple pages increase in order from the head to the tail of the cache queue.
  • the backup machine is used to determine the GBP start point, GBP recovery point, and GBP end point.
  • the GBP starting point indicates the smallest log sequence number LSN included in all pages stored on the GBP node.
  • the GBP recovery point indicates the smallest LSN included in a batch of pages last received by the GBP node.
  • the GBP end point indicates the largest LSN included in the batch of pages last received by the GBP node.
  • The standby machine is also used to replay all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, and is thereby promoted to the new host.
  • the disk recovery point indicates the smallest LSN contained in the most recently written multiple pages in the disk of the backup machine.
  • the end point of the disk indicates the LSN of the last redo log received by the standby machine.
  • After the host of the database system fails, and once the conditions are met (that is, the disk recovery point is greater than or equal to the GBP start point and the disk end point is greater than or equal to the GBP end point), the standby machine can be promoted to the new host by replaying only the small portion of redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point. In this database system, then, only a short time elapses between the failure of a host and the creation of a new host, so using the database system improves business continuity.
  • This embodiment further clarifies that, of all the remaining redo logs that have not been replayed, the standby machine replays only one part and skips the rest; failover between the standby machine and the host therefore takes less time, which is to say the fault repair efficiency of the database system is relatively high.
  • After the multiple pages are written into the cache queue of the GBP node, the GBP node is also used to update the GBP recovery point and the GBP end point according to those pages, and correspondingly the standby machine is used to obtain the updated values from the GBP node. Since it is the GBP node that writes the pages into its cache queue, having it maintain these two points ensures that they are updated in time.
  • When the GBP node receives a new page that does not exist in its page buffer and the page buffer is full, the GBP node is also used to evict the page at the head of the cache queue and update the GBP start point to the LSN of the new head page of the cache queue.
  • The standby machine is also used to obtain the updated GBP start point from the GBP node. Having the GBP node maintain the start point ensures that it is updated in time.
  • When the GBP node receives a new page that does not exist in its page buffer, the GBP node is also used to place the new page at the tail of the cache queue.
  • When the GBP node receives a new page that already exists in its page buffer, the GBP node is further configured to update the existing copy with the received page and move the updated page to the tail of the cache queue.
  • When the disk recovery point is greater than or equal to the GBP start point and the disk end point is greater than or equal to the GBP end point, the standby machine is also used to start a background thread, which pulls all pages stored on the GBP node into the page buffer of the standby machine through the second data transfer protocol.
  • Since the standby machine replays only part of the redo logs, the skipped redo logs need not be replayed because their corresponding pages are stored on the GBP node.
  • By pulling all pages from the GBP node, it is guaranteed that the pages corresponding to the skipped redo logs also reach the standby machine, so the standby machine can fully synchronize with the failed host.
  • The standby machine is also used to obtain the disk recovery point and the disk end point locally.
  • It obtains these two points to determine whether the conditions for performing replay are met; only when they are met can the standby machine perform the replay and realize the improved failure recovery efficiency this application provides.
  • During normal operation, the host is also used to send redo logs to the standby machine.
  • The standby machine is also used to replay the redo logs, obtain the corresponding pages, and flush the pages to its local disk in batches.
  • the host is used to start a page sending thread, and the page sending The thread uses the first data transfer protocol to send multiple pages in the sending queue to the GBP node in batches in order from the head to the tail of the sending queue, and the sending queue is located in the host, and From the head to the tail of the sending queue, the LSNs corresponding to the multiple pages in the sending queue increase.
  • the page sending thread sends the multiple pages to the GBP node in order from the head to the tail of the sending queue, which ensures that the GBP node also receives the pages in order. Specifically, the LSN of a page received earlier is smaller than the LSN of a page received later.
  • the GBP node may write the multiple pages into the cache queue of the GBP node in the order in which the pages are received, so that the respective LSNs of the multiple pages in the cache queue increase in order from the head to the tail of the cache queue. That is, with this solution, the pages in the cache queue can be kept in increasing-LSN order from head to tail in a relatively simple way.
  • the host is configured to start multiple page sending threads, and the multiple page sending threads correspond one-to-one with multiple sending queues included in the host.
  • the GBP node is used to start a page receiving thread, and the page receiving thread receives the multiple pages in batches and writes the multiple pages into the cache queue of the GBP node.
  • the GBP node is used to start multiple page receiving threads; the multiple page receiving threads correspond one-to-one with the multiple cache queues included in the GBP node, and the multiple page sending threads started by the host correspond one-to-one with the multiple page receiving threads started by the GBP node.
  • the advantage of this embodiment is that because the page receiving thread and the cache queue are one-to-one, and the page sending thread and the page receiving thread are also one-to-one, the operation is relatively simple and not prone to errors.
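The ordered send-queue mechanism above can be sketched as follows. This is a minimal single-threaded model: `Page`, `drain_send_queue`, and `fill_cache_queue` are illustrative names, and plain deques stand in for the first data transfer protocol's channel.

```python
from collections import deque

class Page:
    """A data page tagged with its log sequence number (LSN)."""
    def __init__(self, page_id, lsn):
        self.page_id = page_id
        self.lsn = lsn

def drain_send_queue(send_queue, channel):
    # The page sending thread drains its queue head-to-tail, so pages
    # leave the host in increasing-LSN order.
    while send_queue:
        channel.append(send_queue.popleft())

def fill_cache_queue(channel, cache_queue):
    # The page receiving thread appends pages in arrival order, so LSNs
    # increase from the head to the tail of the cache queue.
    while channel:
        cache_queue.append(channel.popleft())

# One send queue per sending thread, one cache queue per receiving thread.
send_queue = deque(Page(i, lsn) for i, lsn in enumerate((10, 20, 30)))
channel, cache_queue = deque(), deque()
drain_send_queue(send_queue, channel)
fill_cache_queue(channel, cache_queue)
assert [p.lsn for p in cache_queue] == [10, 20, 30]
```

Because each sending thread owns exactly one queue and each receiving thread owns exactly one cache queue, no cross-thread reordering can occur, which is what keeps the LSN ordering invariant simple.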
  • this application provides another method for repairing faults in a database system.
  • the method includes the following steps.
  • when the host fails, determine the GBP start point, GBP recovery point, and GBP end point of the global buffer pool (GBP).
  • when the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, replay all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, so as to be promoted to the new host.
  • the GBP starting point indicates the smallest log sequence number LSN included in all pages stored on the GBP node.
  • the GBP recovery point indicates the smallest LSN included in a batch of pages last received by the GBP node.
  • the GBP end point indicates the largest LSN included in the batch of pages last received by the GBP node.
  • the disk recovery point indicates the smallest LSN contained in the most recently written multiple pages in the disk of the backup machine.
  • the disk end point indicates the LSN of the last redo log received by the standby machine.
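The recovery conditions and the shortened replay range defined above can be expressed as a small decision sketch. The function and parameter names are illustrative, not from the patent; all values are LSNs.

```python
def can_recover_from_gbp(disk_recovery_point, disk_end_point,
                         gbp_start_point, gbp_end_point):
    # The shortened replay is valid only when the pages held on the GBP
    # node cover the gap between the local disk state and the GBP state.
    return (disk_recovery_point >= gbp_start_point
            and disk_end_point >= gbp_end_point)

def replay_range(disk_recovery_point, gbp_recovery_point, disk_end_point,
                 gbp_start_point, gbp_end_point):
    if can_recover_from_gbp(disk_recovery_point, disk_end_point,
                            gbp_start_point, gbp_end_point):
        # Logs in [disk_recovery_point, gbp_recovery_point) are skipped:
        # their pages already exist on the GBP node.
        return (gbp_recovery_point, disk_end_point)
    # Otherwise fall back to ordinary recovery from the disk recovery point.
    return (disk_recovery_point, disk_end_point)

# Example: GBP holds pages covering LSNs 100..180, the standby's disk is
# at LSN 120, and the last redo log received has LSN 200.
assert replay_range(120, 180, 200, 100, 180) == (180, 200)
# If the GBP pages no longer cover the disk state, no shortcut applies.
assert replay_range(90, 180, 200, 100, 180) == (90, 200)
```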
  • the method further includes: starting a background thread, the background thread being used to pull all pages stored on the GBP node to a page buffer.
  • the background thread is used to pull all pages stored on the GBP node to a page buffer through a second data transfer protocol.
  • the method further includes: reading the page to be accessed from the page buffer of the GBP node.
  • in a fourth possible implementation manner, after the host fails and before the replay step is executed, obtain the disk recovery point and the disk end point.
  • receive the redo log sent by the host, replay the redo log to obtain the corresponding pages, and flush the obtained pages to the local disk in batches.
  • the executing entity of the fault repair method described in the third aspect is the standby machine in the fault repair method described in the first aspect.
  • each embodiment of the third aspect is described from the standpoint of the standby machine. Since the fault repair method described in the third aspect has many similarities with the fault repair method described in the first aspect, for the beneficial effects of each embodiment of the third aspect, refer to the beneficial effects of the corresponding embodiments of the first aspect; to keep this application concise, the beneficial effects are not described again for each embodiment of the third aspect.
  • the present application also provides a computing device, which includes a determining unit and a playback unit.
  • the determining unit is used to determine the GBP starting point, GBP recovery point, and GBP ending point of the global page buffer pool.
  • the playback unit is used to play back all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • the GBP starting point indicates the smallest log sequence number LSN included in all pages stored on the GBP node.
  • the GBP recovery point indicates the smallest LSN included in a batch of pages last received by the GBP node.
  • the GBP end point indicates the largest LSN included in the batch of pages last received by the GBP node.
  • the disk recovery point indicates the smallest LSN contained in the most recently written multiple pages in the disk of the backup machine.
  • the disk end point indicates the LSN of the last redo log received by the standby machine.
  • the computing device further includes a starting unit.
  • the starting unit is used to start a background thread, and the background thread is used to pull all pages stored on the GBP node to the page buffer.
  • the background thread is used to pull all pages stored on the GBP node to a page buffer through a second data transfer protocol.
  • the computing device further includes a reading unit. After the redo log is played back by the playback unit, and when the page to be accessed is still located in the page buffer of the GBP node, the reading unit is configured to read the page to be accessed from the page buffer of the GBP node.
  • the determining unit is further configured to obtain the disk recovery point and the disk end point.
  • the computing device further includes a receiving unit.
  • the receiving unit is configured to receive the redo log sent by the host.
  • the replay unit is used to replay the redo log to obtain the corresponding pages, and to flush the obtained pages to a local disk in batches.
  • the computing device described in the fourth aspect can execute each embodiment of the third aspect, and can implement the function of the standby machine in the database system described in the second aspect. Therefore, for the beneficial effects of each embodiment of the fourth aspect, refer to the beneficial effects of the corresponding embodiments of the second aspect; they are not repeated in this application.
  • this application provides another computing device, which includes at least a processor and a memory.
  • the memory is used to store disk recovery points and disk end points.
  • the processor is used to determine the GBP start point, GBP recovery point, and GBP end point.
  • the processor is also used to play back all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • the GBP starting point indicates the smallest LSN included in all pages stored on the GBP node.
  • the GBP recovery point indicates the smallest LSN included in a batch of pages last received by the GBP node.
  • the GBP end point indicates the largest LSN included in the batch of pages last received by the GBP node.
  • the disk recovery point indicates the smallest LSN contained in the most recently written multiple pages in the disk of the backup machine.
  • the disk end point indicates the LSN of the last redo log in the memory of the standby machine.
  • when the disk recovery point is greater than or equal to the GBP starting point, and the disk end point is greater than or equal to the GBP end point, the processor is also used to start a background thread, and the background thread is used to pull all pages stored on the GBP node to a page buffer.
  • the background thread is used to pull all pages stored on the GBP node to a page buffer through a second data transfer protocol.
  • the processor is configured to read the page to be accessed from the page buffer of the GBP node.
  • the processor is further configured to obtain the disk recovery point and the disk end point.
  • the computing device further includes an I/O interface.
  • the I/O interface is used to receive the redo log sent by the host.
  • the processor is configured to replay the redo log to obtain the corresponding pages, and to flush the obtained pages to the local disk in batches.
  • the computing device provided in each embodiment of the fifth aspect can execute the method described in the corresponding embodiment of the third aspect, and the computing device described in the fifth aspect can realize the same function as the computing device described in the fourth aspect; that is, the computing device described in the fifth aspect can also realize the function of the standby machine in the database system described in the second aspect. Therefore, for the beneficial effects of each embodiment of the fifth aspect, refer to the beneficial effects of the corresponding embodiments of the second aspect; the description is not repeated here.
  • this application also provides a data backup method.
  • the method includes: during the transmission of the redo log to the standby machine, sending the pages to the GBP node by using the remote direct memory access (RDMA) protocol, so that when a failure occurs, the pages in the GBP node are used for fault repair.
  • the RDMA protocol is also used to send the modified pages to the GBP node for backup on the GBP node. Because using the RDMA protocol enables most of the modified pages corresponding to the redo logs sent to the standby machine to also be sent to the GBP node, when the host fails, the remaining redo logs that have not been replayed by the standby machine comprise two parts: the first part refers to all the redo logs located between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point, and the second part refers to all the redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • the backup machine only needs to replay the second part of the redo log to obtain the corresponding page to implement fault repair, because the page corresponding to the first part of the redo log can be directly pulled from the GBP node. It can be seen that using the data backup method provided in this embodiment can improve the efficiency of fault repair.
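The dual-path backup described above can be sketched under stated assumptions: plain Python containers stand in for the redo-log channel to the standby and the RDMA channel to the GBP node, and `commit_transaction` is an illustrative name.

```python
standby_log_stream = []   # redo logs shipped to the standby machine
gbp_page_store = {}       # page_id -> (lsn, payload) held on the GBP node

def commit_transaction(lsn, page_id, payload):
    # 1) Ship the redo log to the standby machine.
    standby_log_stream.append((lsn, page_id, payload))
    # 2) Ship the modified page to the GBP node (RDMA in the patent).
    gbp_page_store[page_id] = (lsn, payload)

commit_transaction(101, "p1", "v1")
commit_transaction(102, "p2", "v1")
commit_transaction(103, "p1", "v2")

# After a host failure, pages whose latest version is already on the GBP
# node need not be rebuilt by replaying their redo logs on the standby.
assert gbp_page_store["p1"] == (103, "v2")
assert len(standby_log_stream) == 3
```

The point of the sketch is the asymmetry: the standby accumulates every redo log, while the GBP node accumulates only the latest page images, which is exactly what allows the first part of the unreplayed logs to be skipped during recovery.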
  • the present application provides a computing device for executing the data backup method described in the sixth aspect.
  • the computing device includes a first transmission interface and a second transmission interface.
  • the first transmission interface is used to transmit the redo log to the standby machine.
  • the second transmission interface is used to send pages to the GBP node based on the remote direct memory access (RDMA) protocol, so that in the event of a failure, fault repair is performed by using the pages in the GBP node. It should be noted that using the computing device provided in this embodiment in a database system can improve the fault repair efficiency of the database system.
  • this application provides another fault repair method.
  • the method includes the following steps.
  • the host uses the first data transfer protocol to send multiple pages to the global buffer pool (GBP) node.
  • the GBP node writes the multiple pages into the cache queue of the GBP node.
  • the log sequence numbers LSN corresponding to the multiple pages increase in order from the head to the tail of the cache queue.
  • the host determines the GBP start point, GBP recovery point, and GBP end point.
  • the host plays back all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, so that the host can be pulled up again.
  • the GBP starting point indicates the smallest LSN included in all pages stored on the GBP node.
  • the GBP recovery point indicates the smallest LSN included in a batch of pages last received by the GBP node.
  • the GBP end point indicates the largest LSN included in the batch of pages last received by the GBP node.
  • the disk recovery point indicates the smallest LSN included in the multiple pages written in the most recent batch of the local disk.
  • the disk end point indicates the LSN of the last redo log received.
  • the embodiment corresponding to the eighth aspect is different from the embodiment corresponding to the first aspect.
  • in the first aspect, the standby machine replays some of the redo logs and is promoted to the new host, thereby implementing fault repair.
  • this fault repair is actually a kind of failover, because after the fault repair, the original standby machine replaces the original host to perform the functions of the original host.
  • in the eighth aspect, the host is pulled up again by replaying part of the redo logs; that is, after the host is repaired, the host continues to perform its previous function.
  • in the embodiment corresponding to the first aspect, the host and the standby machine are switched, and the standby machine after switching is called the new host; but in the embodiment corresponding to the eighth aspect, after the host fails, the host is pulled up again.
  • the host stores redo logs. It is known that each transaction (an addition, deletion, or modification) corresponds to a redo log. In this embodiment, the host sends these redo logs to the standby machine, and sends the modified pages corresponding to the redo logs to the GBP node. In particular, it should be noted that the host also backs up these redo logs locally: for example, it sends these redo logs to the standby machine on the one hand, and on the other hand caches them in the page buffer of the host or flushes them to the local disk, so that when the host encounters a failure, part of these redo logs can be replayed and the host can be pulled up again.
  • the failure of the host is a software failure.
  • the host skips all redo logs located between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point, and plays back all the redo logs located at the GBP recovery point. All redo logs between the corresponding redo log and the redo log corresponding to the end point of the disk. In other words, in this embodiment, the host only replays part of the logs that have not been replayed but not all of them, so the fault repair efficiency of the database system will be improved.
  • when the disk recovery point is greater than or equal to the GBP starting point, and the disk end point is greater than or equal to the GBP end point, the fault repair method further includes: the host starts a background thread, and the background thread is used to pull all the pages located on the GBP node to a page buffer.
  • the background thread is used to pull all the pages located on the GBP node to the page buffer through the first data transfer protocol.
  • the background thread pulls the page from the GBP node to the page buffer of the host and the execution of the playback step can be performed in parallel, thereby saving time and improving the efficiency of fault repair.
  • the host also compares the pages pulled to the page buffer of the host with the pages it maintains, retains the newer pages, and discards the older pages.
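The comparison described above — keep the newer copy, discard the older — can be sketched using the LSN as the freshness criterion. All names are illustrative, not from the patent.

```python
def merge_pulled_pages(page_buffer, pulled_pages):
    """Merge pages pulled from the GBP node into the host's page buffer.

    Both mappings are page_id -> (lsn, payload); for each page the copy
    with the larger LSN wins.
    """
    for page_id, (lsn, payload) in pulled_pages.items():
        local = page_buffer.get(page_id)
        if local is None or lsn > local[0]:
            page_buffer[page_id] = (lsn, payload)  # pulled copy is newer
        # else: the local copy is at least as new; discard the pulled one

buffer = {"p1": (50, "old"), "p2": (90, "newer-local")}
pulled = {"p1": (70, "from-gbp"), "p2": (80, "stale"), "p3": (60, "new")}
merge_pulled_pages(buffer, pulled)
assert buffer == {"p1": (70, "from-gbp"),
                  "p2": (90, "newer-local"),
                  "p3": (60, "new")}
```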
  • the fault repair method further includes: the host reads the page to be accessed from the page buffer of the GBP node.
  • the fault repair method further includes: the host obtains the disk recovery point and the disk end point locally.
  • the host sends multiple pages to the GBP node, which specifically includes: the host starts a page sending thread, and the page sending thread uses the first data transfer protocol to send the multiple pages in the sending queue to the GBP node in order from the head to the tail of the sending queue, where the LSNs corresponding to the multiple pages in the sending queue increase from the head to the tail.
  • a database system includes a host and a GBP node.
  • the host is used to send multiple pages to the GBP node through the first data transmission protocol.
  • the GBP node is used to write the multiple pages into the cache queue of the GBP node.
  • the log sequence number LSN contained in each of the multiple pages increases in order from the head to the tail of the cache queue
  • when the host fails, the host is also used to determine the GBP starting point, the GBP recovery point, and the GBP ending point. When the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the host is also used to play back all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • the GBP starting point indicates the smallest LSN included in all pages stored on the GBP node.
  • the GBP recovery point indicates the smallest LSN included in a batch of pages last received by the GBP node.
  • the GBP end point indicates the largest LSN included in the batch of pages last received by the GBP node.
  • the disk recovery point indicates the smallest LSN included in the multiple pages written in the last batch of the local disk, and the disk end point indicates the LSN of the last received redo log.
  • when the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the host is also used to start a background thread, and the background thread is used to pull all pages located on the GBP node to a page buffer.
  • with reference to the ninth aspect, the first possible implementation manner of the ninth aspect, or the second possible implementation manner of the ninth aspect, in a third possible implementation manner, after the host performs the replay step, and when the page to be accessed is still located in the page buffer of the GBP node, the host is also used to read the page to be accessed from the page buffer of the GBP node.
  • each embodiment of the ninth aspect can execute the fault repair method described in the corresponding embodiment of the eighth aspect. Therefore, for the beneficial effects of each embodiment of the ninth aspect, please refer to the beneficial effects of the corresponding embodiments of the eighth aspect, and the description will not be repeated here.
  • this application also provides another method for repairing faults in a database system.
  • the fault repair method includes the following steps.
  • the GBP starting point indicates the smallest LSN included in all pages stored on the GBP node.
  • the GBP recovery point indicates the smallest LSN included in a batch of pages last received by the GBP node.
  • the GBP end point indicates the largest LSN included in the batch of pages last received by the GBP node.
  • the disk recovery point indicates the smallest LSN included in the multiple pages written in the most recent batch of the local disk.
  • the disk end point indicates the LSN of the last redo log received.
  • the host skips all redo logs located between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point, and plays back all the redo logs located at the GBP recovery point. All redo logs between the corresponding redo log and the redo log corresponding to the end point of the disk. In other words, in this embodiment, the host only replays part of the logs that have not been replayed but not all of them, so the fault repair efficiency of the database system will be improved.
  • when the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the fault repair method further includes: starting a background thread, the background thread being used to pull all the pages located on the GBP node to a page buffer.
  • the background thread is used to pull all pages located on the GBP node to a page buffer through the first data transfer protocol.
  • the background thread pulls the page from the GBP node to the page buffer of the host and the execution of the playback step can be performed in parallel, thereby saving time and improving the efficiency of fault repair.
  • the host also compares the pages pulled to the page buffer of the host with the pages it maintains, retains the newer pages, and discards the older pages.
  • the fault repair method further includes: reading the page that needs to be accessed from the page buffer of the GBP node.
  • the fault repair method further includes: obtaining the disk recovery point and the disk end point locally.
  • the host sends multiple pages to the GBP node, which specifically includes: the host starts a page sending thread, and the page sending thread uses the first data transfer protocol to send the multiple pages in the sending queue to the GBP node in order from the head to the tail of the sending queue, where the LSNs corresponding to the multiple pages in the sending queue increase from the head to the tail.
  • the executing entity of the fault repair method described in the tenth aspect is the host in the fault repair method described in the eighth aspect.
  • each embodiment of the tenth aspect is described from the perspective of the host. Since the fault repair method described in the tenth aspect has many similarities with the fault repair method described in the eighth aspect, for the beneficial effects of each embodiment of the tenth aspect, refer to the beneficial effects of the corresponding embodiments of the eighth aspect; they are not repeated here.
  • this application provides another computing device.
  • the computing device at least includes a transmission unit, a determination unit, and a playback unit.
  • the transmission unit is configured to send multiple pages to the GBP node through the first data transmission protocol.
  • the determining unit is used to determine the GBP start point, GBP recovery point, and GBP end point.
  • when the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the playback unit is used to play back all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • the multiple pages are written into the cache queue of the GBP node, and the LSNs corresponding to the multiple pages increase in order from the head to the tail of the cache queue.
  • the GBP starting point indicates the smallest LSN included in all pages stored on the GBP node.
  • the GBP recovery point indicates the smallest LSN included in a batch of pages last received by the GBP node.
  • the GBP end point indicates the largest LSN included in the batch of pages last received by the GBP node.
  • the disk recovery point indicates the smallest LSN included in the multiple pages written in the most recent batch of the local disk.
  • the disk end point indicates the LSN of the last redo log received.
  • the computing device further includes a starting unit.
  • the starting unit is used to start a background thread, and the background thread is used to pull all pages on the GBP node to the page buffer.
  • the background thread pulls all pages located on the GBP node to a page buffer through the first data transfer protocol.
  • the computing device further includes a reading unit. After the replay step is executed, and when the page to be accessed is still located in the GBP node, the reading unit is configured to read the page to be accessed from the GBP node.
  • each embodiment of the eleventh aspect can execute the fault repair method described in the corresponding embodiment of the tenth aspect, and can realize the function of the host in the database system as described in the eighth aspect .
  • the beneficial effects of each embodiment of the tenth aspect can be referred to the beneficial effects of the corresponding embodiment of the eighth aspect. Therefore, for the beneficial effects of each embodiment of the eleventh aspect, please also refer to the beneficial effects of the corresponding embodiments of the eighth aspect.
  • this application provides another computing device, which includes at least a memory and a processor.
  • the memory is used to store GBP start point, GBP recovery point, GBP end point, disk recovery point, and disk end point.
  • the processor is configured to send multiple pages to the GBP node through the first data transmission protocol.
  • the processor is used to determine the GBP start point, GBP recovery point, and GBP end point.
  • when the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the processor is used to play back all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • the multiple pages are written into the cache queue of the GBP node, and the LSNs corresponding to the multiple pages increase in order from the head to the tail of the cache queue.
  • the GBP starting point indicates the smallest LSN included in all pages stored on the GBP node.
  • the GBP recovery point indicates the smallest LSN included in a batch of pages last received by the GBP node.
  • the GBP end point indicates the largest LSN included in the batch of pages last received by the GBP node.
  • the disk recovery point indicates the smallest LSN included in the multiple pages written in the most recent batch of the local disk.
  • the disk end point indicates the LSN of the last redo log received.
  • when the disk recovery point is greater than or equal to the GBP starting point, and the disk ending point is greater than or equal to the GBP end point, the processor is also used to start a background thread, and the background thread is used to pull all pages located on the GBP node to a page buffer.
  • the background thread pulls all pages located on the GBP node to a page buffer through the first data transfer protocol.
  • the processor is further configured to read the page that needs to be accessed from the GBP node.
  • each embodiment of the twelfth aspect can execute the fault repair method described in the corresponding embodiment of the tenth aspect, and can realize the function of the host in the database system as described in the eighth aspect .
  • the beneficial effects of each embodiment of the tenth aspect can be referred to the beneficial effects of the corresponding embodiment of the eighth aspect. Therefore, for the beneficial effects of each embodiment of the twelfth aspect, please also refer to the beneficial effects of the corresponding embodiments of the eighth aspect.
  • this application provides another data backup method.
  • the data backup method includes the following steps.
  • multiple pages are received and written into the cache queue, and the LSNs corresponding to the multiple pages increase in order from the head to the tail of the cache queue.
  • the GBP start point, GBP recovery point, and GBP end point are maintained, so that when the host fails, fault repair can be performed according to the GBP start point, the GBP recovery point, and the GBP end point.
  • the GBP starting point indicates the smallest LSN included in all pages stored in the memory.
  • the GBP recovery point indicates the smallest LSN included in a batch of pages received last time.
  • the GBP end point indicates the largest LSN included in a batch of pages received last time.
  • the execution subject of the data backup method described in this embodiment is the GBP node in the database system described in the ninth aspect.
  • because the modified pages sent by the host are received through the RDMA protocol, the host can be considered to send almost all of the modified pages to the GBP node. Therefore, when the host fails, there is no need to replay all of the remaining redo logs to obtain the corresponding pages, because the pages corresponding to most of these redo logs already exist in the GBP node. Using the data backup method provided by this embodiment can therefore improve the efficiency of fault repair.
  • the data backup method further includes: when a new page that does not exist in the page buffer is received and the page buffer is full, the page at the head of the cache queue is evicted, the new page is placed at the tail of the cache queue, and the GBP starting point is updated to the LSN of the page newly located at the head of the cache queue.
  • updating the GBP starting point in this way ensures that the GBP starting point is updated in time.
  • the data backup method further includes: after a new page is received, updating the GBP recovery point and the GBP end point according to the LSN corresponding to the new page.
  • updating the GBP recovery point and the GBP end point in this way ensures that they are updated in time.
  • the data backup method further includes: when a new page that does not exist in the page buffer is received, putting the new page at the tail of the cache queue.
  • the data backup method further includes: when a page that already exists in the page buffer is received, using the new page to update the existing corresponding page, and placing the updated page at the tail of the cache queue.
  • the pages are placed in order in the cache queue of the GBP node. Therefore, all redo logs located between the redo log corresponding to the GBP recovery point and the redo log corresponding to the GBP end point are the last segment of redo logs among all redo logs sent by the host to the standby machine.
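The cache-queue maintenance described in the preceding embodiments can be sketched as a minimal model. An `OrderedDict` stands in for the cache queue (head = oldest entry), pages are keyed by page id, and `GbpCache` and its fields are illustrative names.

```python
from collections import OrderedDict

class GbpCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = OrderedDict()   # page_id -> lsn; head is oldest
        self.start_point = None      # smallest LSN still cached
        self.recovery_point = None   # smallest LSN of the last batch
        self.end_point = None        # largest LSN of the last batch

    def receive_batch(self, batch):
        # batch: list of (page_id, lsn) pairs in increasing-LSN order
        for page_id, lsn in batch:
            if page_id in self.queue:
                # Existing page: update it and move it to the tail.
                del self.queue[page_id]
            elif len(self.queue) >= self.capacity:
                # Buffer full: evict the page at the head of the queue.
                self.queue.popitem(last=False)
            self.queue[page_id] = lsn
        # The start point tracks the LSN of the page now at the head.
        self.start_point = next(iter(self.queue.values()))
        self.recovery_point = min(lsn for _, lsn in batch)
        self.end_point = max(lsn for _, lsn in batch)

cache = GbpCache(capacity=2)
cache.receive_batch([("p1", 10), ("p2", 20)])
cache.receive_batch([("p3", 30)])            # evicts p1, updates points
assert list(cache.queue.items()) == [("p2", 20), ("p3", 30)]
assert (cache.start_point, cache.recovery_point, cache.end_point) == (20, 30, 30)
```

Because every insertion and update lands at the tail, the queue preserves the head-to-tail increasing-LSN property on which the recovery-point arithmetic relies.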
  • the data backup method further includes: receiving multiple redo logs, and by replaying the multiple redo logs, obtaining the page corresponding to each of the multiple redo logs.
  • the obtained pages are also flushed to the local disk in batches.
  • the GBP node can also implement the function of a backup machine.
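Replay on the GBP node, as just described, can be sketched under a simplifying assumption: each redo record carries the page image after the change, so replay reduces to "apply if newer". The names are illustrative.

```python
def replay_redo_logs(pages, redo_logs):
    """Apply LSN-ordered redo records to a page_id -> (lsn, payload) map."""
    for lsn, page_id, payload in redo_logs:
        current = pages.get(page_id)
        if current is None or lsn > current[0]:
            pages[page_id] = (lsn, payload)  # record is newer: apply it
    return pages

disk_pages = {}
logs = [(1, "a", "a1"), (2, "b", "b1"), (3, "a", "a2")]
replay_redo_logs(disk_pages, logs)
assert disk_pages == {"a": (3, "a2"), "b": (2, "b1")}
```

In a real system the resulting pages would then be flushed to the local disk in batches, which is what lets the GBP node double as a standby machine.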
  • this application provides another computing device, which at least includes a receiving unit, a writing unit, and a maintenance unit.
  • the receiving unit is used to receive multiple pages through the RDMA protocol.
  • the writing unit is used for writing the multiple pages into the cache queue.
  • the LSNs corresponding to the multiple pages increase in order from the head to the tail of the cache queue.
  • the maintenance unit is configured to maintain the GBP starting point, GBP recovery point, and GBP ending point according to the LSN contained in each of the multiple pages, so that when the host fails, fault repair can be performed according to the GBP starting point, the GBP recovery point, and the GBP ending point.
  • the GBP starting point indicates the smallest LSN included in all pages stored in the memory.
  • the GBP recovery point indicates the smallest LSN included in a batch of pages received last time.
  • the GBP end point indicates the largest LSN included in a batch of pages received last time.
  • when a new page that does not exist in the page buffer is received and the page buffer is full, the maintenance unit is further configured to evict the page at the head of the cache queue and to update the GBP starting point to the LSN corresponding to the page newly at the head of the cache queue.
  • when a new page is received, the maintenance unit is further configured to update the GBP recovery point and the GBP end point according to the LSN corresponding to the new page.
  • when a new page that does not exist in the page buffer is received, the writing unit is further configured to place the new page at the tail of the cache queue.
  • the writing unit is further configured to use the new page to update the existing corresponding page, and to place the updated page at the tail of the cache queue.
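As a rough illustration of how such a maintenance unit could behave, the sketch below derives the three GBP points from each received batch of pages. All class and method names are hypothetical; the patent does not specify an implementation.

```python
# Illustrative sketch of GBP point maintenance; names are not from the patent.
class PointMaintenance:
    def __init__(self):
        self.gbp_start = None     # smallest LSN among all pages still cached
        self.gbp_recovery = None  # smallest LSN in the most recent batch
        self.gbp_end = None       # largest LSN in the most recent batch

    def on_batch(self, batch_lsns, head_lsn):
        """batch_lsns: LSNs of the batch just received;
        head_lsn: LSN of the page currently at the head of the cache queue."""
        self.gbp_recovery = min(batch_lsns)
        self.gbp_end = max(batch_lsns)
        self.gbp_start = head_lsn  # the oldest page still in the cache queue

m = PointMaintenance()
m.on_batch([1, 2, 3], head_lsn=1)
m.on_batch([4, 5, 6], head_lsn=1)  # start stays 1; recovery/end track the batch
```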
  • the computing device further includes a playback unit.
  • the receiving unit is also used to receive multiple redo logs.
  • the replay unit also replays the multiple redo logs to obtain a page corresponding to each of the multiple redo logs.
  • each embodiment of the fourteenth aspect can execute the data backup method described in the corresponding embodiment of the thirteenth aspect. Therefore, the beneficial effects of each embodiment of the fourteenth aspect can be referred to the beneficial effects of the corresponding embodiments of the thirteenth aspect.
  • this application provides another computing device, which includes at least an I/O interface and a processor.
  • the I/O interface is used to receive multiple pages through the RDMA protocol.
  • the processor is configured to sequentially write the multiple pages into the cache queue, and to maintain the GBP starting point, the GBP recovery point, and the GBP end point according to the LSN contained in each of the multiple pages, so that when the host fails, the failure can be repaired according to the GBP starting point, the GBP recovery point, and the GBP end point.
  • the GBP starting point indicates the smallest LSN included in all pages stored in the memory.
  • the GBP recovery point indicates the smallest LSN included in a batch of pages received last time.
  • the GBP end point indicates the largest LSN included in a batch of pages received last time.
  • when a new page that does not exist in the page buffer is received and the page buffer is full, the processor is further configured to evict the page at the head of the cache queue, and to update the GBP starting point to the LSN corresponding to the new page at the head of the cache queue.
  • when a new page is received, the processor is further configured to update the GBP recovery point and the GBP end point according to the LSN corresponding to the new page.
  • when a new page that does not exist in the page buffer is received, the processor is further configured to place the new page at the tail of the cache queue.
  • the processor is further configured to use the new page to update the existing corresponding page, and to place the updated page at the tail of the cache queue.
  • the I/O interface is also used to receive multiple redo logs.
  • the processor also replays the multiple redo logs to obtain a page corresponding to each of the multiple redo logs, and flushes the obtained pages to the local disk in batches.
  • each embodiment of the fourteenth aspect can execute the data backup method described in the corresponding embodiment of the thirteenth aspect. Therefore, the beneficial effects of each embodiment of the fourteenth aspect can be referred to the beneficial effects of the corresponding embodiments of the thirteenth aspect.
  • FIG. 1 is an architecture diagram of a database system.
  • FIG. 2 is an architecture diagram of a database system provided by this application.
  • FIG. 3 is a schematic diagram of the checkpoint mechanism.
  • FIG. 4 is a flowchart of a method for repairing a fault in a database system provided by the present application.
  • FIG. 5 is a schematic structural diagram of the host sending a page to the GBP node.
  • FIG. 6 is a schematic diagram of a cache queue in the GBP node.
  • FIGS. 7A-7C are diagrams of the change process of the GBP starting point, GBP recovery point, and GBP end point as the host sends pages to the GBP node.
  • FIG. 8 is a structural diagram of the standby machine pulling pages from the GBP node.
  • FIG. 9 is a diagram of the distribution structure of redo logs in the method for repairing the fault of the database system provided by this application.
  • FIG. 10 is a flowchart of another method for repairing faults in a database system provided by the present application.
  • FIG. 11A is a structural diagram of a computing device provided by this application.
  • FIG. 11B is a structural diagram of another computing device provided by this application.
  • FIG. 12 is a structural diagram of another computing device provided by this application.
  • FIG. 13 is a structural diagram of another computing device provided by this application.
  • FIG. 14 is an architecture diagram of another database system provided by this application.
  • FIG. 15 is a flowchart of another method for repairing a fault in a database system provided by the present application.
  • FIG. 16 is a flowchart of another method for repairing faults in a database system provided by the present application.
  • FIG. 17 is a structural diagram of another computing device provided by this application.
  • FIG. 18 is a structural diagram of another computing device provided by this application.
  • FIG. 19 is a flowchart of a data backup method provided by this application.
  • FIG. 20 is a structural diagram of another computing device provided by this application.
  • FIG. 21 is a structural diagram of another computing device provided by this application.
  • WAL protocol: write-ahead logging, that is, redo logs are persisted before the corresponding modified pages.
  • Redo logs are flushed to disk in order to ensure the durability of modified pages. After the redo logs are flushed to disk, even if the host goes down, the standby machine can restore itself to the same state the host was in before the downtime by replaying the redo logs.
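The durability argument above can be illustrated with a toy write-ahead-logging sketch. All names are hypothetical, and real redo logs carry change records rather than whole values; this only shows why a durable log alone suffices for recovery.

```python
# Illustrative WAL sketch: a commit durably writes the redo log first;
# pages are flushed later in the background.
durable_log = []    # stands in for redo logs flushed to disk
durable_pages = {}  # stands in for pages flushed to disk (empty: no flush yet)

def commit(page_id, new_value, lsn):
    # The redo log lands on disk at commit time; the page itself does not.
    durable_log.append((lsn, page_id, new_value))

def recover():
    # Replaying the durable log in LSN order rebuilds every committed page,
    # even if no page was flushed before the crash.
    pages = dict(durable_pages)
    for lsn, page_id, value in durable_log:
        pages[page_id] = value
    return pages

commit("P1", "V1", lsn=1)
commit("P1", "V2", lsn=2)
# Crash here, before any page flush: recovery still reproduces P1 == "V2".
```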
  • Dirty pages: pages located in the data buffer that have been modified after being read from the disk are called dirty pages. A dirty page is a concept within the data buffer. In this application, a modified page located in the data buffer of the host is called a dirty page, and a page written from the host to the global buffer pool (GBP) node is called a modified page.
  • Modified page: when a modified page is located in the data buffer of the host, it is called a dirty page; a page written from the host to the global buffer pool (GBP) node is called a modified page.
  • Global buffer pool (GBP).
  • Recovery time objective (RTO): the length of time for which the customer can tolerate service interruption. For example, if the service needs to be restored within half a day after a disaster, the RTO is twelve hours.
  • Log sequence number (LSN): each log has a unique LSN; in other words, logs and LSNs are in one-to-one correspondence, so a log can be uniquely determined from its LSN. It should be noted that since each log corresponds to one modified page (that is, a page sent by the host to the GBP node, referred to simply as a page below), each page includes only one LSN, and pages and LSNs are also in one-to-one correspondence. Therefore, the "LSN corresponding to a page" and the "LSN contained in a page" mentioned in this application have the same meaning.
  • Disk recovery point: the smallest log sequence number (LSN) contained in the latest batch of data pages written to the local disk.
  • Disk end point: the LSN of the last redo log on the local disk.
  • An embodiment of the present application provides a first fault repair method for a database system (referred to as "first fault repair method").
  • the first fault repair method can be applied to the database system shown in FIG. 2.
  • the database system includes a master (master) 210, a GBP node 220, and a standby (standby) 230.
  • the host 210 and the GBP node 220 perform data transmission through the first data transmission protocol.
  • the first data transmission protocol is a low-latency and high-throughput data transmission protocol.
  • the first data transmission protocol is a remote direct memory access (remote direct memory access, RDMA) protocol.
  • the host 210 has a 10 Gigabit Ethernet card or an InfiniBand network card supporting the RDMA protocol.
  • the RDMA protocol has the characteristics of low delay (for example, a delay less than or equal to 10 μs) and does not require direct CPU participation.
  • the modified page in the host 210 can be remotely written into the page buffer (or memory) of the GBP node 220 based on the RDMA protocol.
  • the modified page is written into the page buffer of the GBP node 220 in a remote atomic write manner.
  • the modified page is written to the GBP node 220 in atomic units.
  • An atom usually includes multiple modified pages; therefore, multiple modified pages are grouped into one atom and then written into the page buffer of the GBP node.
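A minimal sketch of grouping modified pages into atoms before they are written to the GBP node. The atom size of four pages is an assumed value, and the real transport is an RDMA atomic write rather than a Python list operation.

```python
# Illustrative only: pack modified pages into fixed-size atomic write units.
ATOM_SIZE = 4  # pages per atom; an assumed value, not specified by the patent

def make_atoms(pages):
    # Each sublist stands for one unit written atomically to the GBP node.
    return [pages[i:i + ATOM_SIZE] for i in range(0, len(pages), ATOM_SIZE)]

atoms = make_atoms([f"page{i}" for i in range(10)])
# 10 pages -> atoms of 4, 4, and 2 pages; each atom is written as one unit.
```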
  • the first data transmission protocol may also be 40G Ethernet (40GE).
  • a checkpoint is a database event, and its fundamental significance is to reduce crash recovery (crash recovery) time.
  • The database itself has a checkpoint mechanism. Based on the checkpoint mechanism, one or more background threads continuously flush dirty pages from memory to the local disk. Limited by the speed of the local disk, dirty pages are flushed from memory to the local disk relatively slowly. The last flushed page corresponds to the disk recovery point. Because dirty pages are flushed slowly, there are a large number of redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the disk end point.
  • Figure 3 shows the role of the checkpoint mechanism when the database fails and needs to be repaired.
  • page P1 is sequentially modified from version V0 to versions V1, V2, and V3. From V0 to V1, the redo log generated is log 1; from V1 to V2, the redo log generated is log 2; from V2 to V3, the redo log generated is log 3.
  • According to the WAL protocol, when each modification transaction is committed, the host only flushes the corresponding redo log to the local disk, and page flushing is done in the background.
  • If the node needs to be restored after a failure, as shown in FIG. 3, the redo log corresponding to the disk recovery point is log 0, so it is necessary to start from log 0 and replay all redo logs generated after log 0. Assuming that the redo logs that need to be replayed in sequence are log 1, log 2, and log 3, after all of them are replayed, the version of P1 is restored to V3, that is, the page is restored to its state before the failure.
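The FIG. 3 example can be expressed as a small replay sketch. The log numbering and version labels follow the text; the data structures (a list of dicts carrying a post-image version) are illustrative only, not the patent's log format.

```python
# Illustrative replay of the FIG. 3 example: restore P1 from the disk
# recovery point (log 0 -> V0) by applying the logs generated after it.
def replay(page_version, logs):
    for log in logs:
        page_version = log["to"]  # each toy log carries the resulting version
    return page_version

logs_after_recovery_point = [
    {"log": 1, "to": "V1"},
    {"log": 2, "to": "V2"},
    {"log": 3, "to": "V3"},
]
restored = replay("V0", logs_after_recovery_point)
# After replaying log 1..log 3, P1 is back at V3, its pre-failure state.
```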
  • the shared-nothing architecture is a distributed computing architecture in which each node is independent, that is, each node has its own CPU, memory, and hard disk, and there are no shared resources.
  • the key device that can realize rapid recovery of the database system is the GBP node 220.
  • the GBP node may be a device installed with an application program capable of realizing the global page cache function.
  • “applications that can implement the global page cache function” are referred to as “target applications” in the following.
  • the target application may be deployed on any device other than the host 210 and the backup machine 230, and any device on which the target application is deployed is the GBP node 220. It is worth noting that, in this embodiment, it is also necessary to configure, according to the location of the device where the target application is deployed, where the host 210 will write modified pages and where the backup machine 230 will obtain pages.
  • After the host 210 and the standby machine 230 establish a primary-standby relationship, they connect to the GBP node according to their respective configuration information. The host 210 and the GBP node 220 are connected through the first data transmission protocol. During normal operation of the host 210, a heartbeat needs to be maintained between the backup machine 230 and the host 210 and between the GBP node 220 and the host 210. When the host 210 fails (or crashes) and the database system consequently fails, a failover is performed between the host 210 and the standby machine 230. After the failover, the standby machine 230 is promoted to the new host, thereby realizing the fault repair of the database system.
  • the first fault repair method described in this embodiment will be described in detail below. Please refer to FIG. 4, which shows a schematic flow chart of the first fault repair method. Specifically, the first fault repair method includes the following steps.
  • the host uses a first data transmission protocol to send multiple pages to the GBP node.
  • the host also sends the redo logs corresponding to all modified transactions to the standby machine.
  • the backup machine obtains the corresponding pages by replaying these redo logs, and flushes these pages to the local disk of the backup machine in batches.
  • redo logs are also transferred from the host to the standby machine in batches, for example, a batch of redo logs may be 8 MB.
  • the host needs to send redo logs to more than N/2 (rounded up) backup machines, where N is an integer greater than 1.
  • the host starts a page sending thread, and the page sending thread uses the first data transmission protocol to send the multiple pages in the sending queue in order from head to tail.
  • the sending queue is located in the host, and from the head to the tail of the sending queue, LSNs corresponding to multiple pages located in the sending queue are increasing.
  • the host may start multiple page sending threads, and the multiple page sending threads correspond one-to-one to the multiple sending queues included in the host.
  • the host includes multiple sending queues
  • which sending queue the modified page will be put into can be determined according to a hash algorithm.
  • multiple pages placed in the same sending queue can be placed in the sending queue Q according to the order in which the multiple pages are modified.
  • the LSN of multiple pages increases from head to tail.
  • the page that is modified first is in front of the page that is modified later.
  • the respective LSNs of the multiple pages are also determined according to the order in which the multiple pages are modified, where the LSN of the page that is modified first is smaller than the LSN of the page that is modified later.
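One possible sketch of the hash-based queue assignment described above. The actual hash algorithm is not specified in this application, so Python's built-in `hash` stands in for it; the invariant shown is that each queue's LSNs increase from head to tail because pages are enqueued in modification order.

```python
# Illustrative: a hash over the page id picks the send queue, and pages enter
# each queue in modification order, so per-queue LSNs increase head to tail.
NUM_QUEUES = 3  # assumed count, matching the three-queue example in the text
send_queues = [[] for _ in range(NUM_QUEUES)]

def enqueue_modified_page(page_id, lsn):
    q = hash(page_id) % NUM_QUEUES  # stand-in for the patent's hash algorithm
    send_queues[q].append((page_id, lsn))

for i, lsn in enumerate(range(1, 8)):  # pages modified in increasing LSN order
    enqueue_modified_page(f"P{i}", lsn)

# Every queue is internally sorted by LSN (head to tail), whichever queue
# each page landed in.
assert all(q == sorted(q, key=lambda p: p[1]) for q in send_queues)
```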
  • the GBP node writes the multiple pages into the cache queue of the GBP node.
  • the LSN corresponding to each of the multiple pages increases in an order from the head to the tail of the cache queue.
  • the page buffer of the GBP node includes one or more cache queues.
  • Each cache queue has multiple pages, and the multiple pages located in the same cache queue are written in the order of the cache queue (or in the order from the head to the tail of the cache queue).
  • from the head to the tail of the cache queue, the LSN included in each page increases.
  • the GBP node starts a page receiving thread, and the page receiving thread receives the multiple pages in batches, and writes the multiple pages into the cache queue of the GBP node.
  • the GBP node may start multiple page receiving threads, and the multiple page receiving threads correspond one-to-one to the multiple cache queues included in the GBP node.
  • the multiple page sending threads started by the host and the multiple page receiving threads started by the GBP node correspond one-to-one.
  • the sending queues on the host and the cache queues on the GBP node also correspond one-to-one; the pages in each sending queue are sent by the corresponding page sending thread, received by the corresponding page receiving thread, and then stored in the corresponding cache queue.
  • the host 200 has sending queues 1-3, and page sending threads 1-3 are also started.
  • the page sending thread 1 is used to send pages in the sending queue 1
  • the page sending thread 2 is used to send pages located in the sending queue 2
  • the page sending thread 3 is used to send the pages in the sending queue 3.
  • GBP node 300 starts page receiving threads 1-3 and also has cache queues 1-3. The page received by page receiving thread 1 is put into cache queue 1, the page received by page receiving thread 2 is put into cache queue 2, and the page received by page receiving thread 3 is put into cache queue 3. Thus, in the embodiment corresponding to FIG. 5, a page located in sending queue 1, after being sent by page sending thread 1 to page receiving thread 1, is put into cache queue 1; a page located in sending queue 2, after being sent by page sending thread 2 to page receiving thread 2, is put into cache queue 2; and a page located in sending queue 3, after being sent by page sending thread 3 to page receiving thread 3, is put into cache queue 3.
  • the rate at which the host writes modified pages to the GBP node through the first data transmission protocol is much greater than the rate at which the backup machine generates the corresponding modified pages by replaying the redo logs and flushes those modified pages to the local disk. Therefore, the number of modified pages stored in the GBP node is much larger than the number of modified pages flushed to the local disk of the standby machine. Consequently, when the host fails and the database system needs to be repaired, the first partial pages can be pulled directly from the GBP node to the page buffer of the backup machine, and the backup machine only needs to replay the redo logs corresponding to the second partial pages to obtain the second partial pages. Therefore, with this embodiment, the repair efficiency of the database system can be improved.
  • all pages located between the page containing the disk recovery point and the page containing the disk end point are divided into first partial pages and second partial pages.
  • the first partial pages refer to all pages located between the page containing the disk recovery point and the page containing the GBP recovery point, or to the modified pages corresponding to all redo logs located between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point.
  • the second partial pages refer to all pages located between the page containing the GBP recovery point and the page containing the disk end point, or to the modified pages corresponding to all redo logs located between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • the first partial page may include the page including the disk recovery point, or may not include the page including the disk recovery point.
  • the first part of the page may include the page including the GBP recovery point, or may not include the page including the GBP recovery point.
  • the second part of the page may not include the page that includes the GBP recovery point, and naturally, it may also include the page that includes the GBP recovery point.
  • the first partial page does not include the page including the GBP recovery point
  • the second partial page includes the page including the GBP recovery point. It should be understood that the second partial page includes the page containing the end point of the disk.
  • the standby machine determines the GBP starting point, the GBP recovery point, and the GBP ending point.
  • the GBP starting point indicates the smallest LSN included in all pages stored on the GBP node.
  • the GBP recovery point indicates the smallest LSN included in a batch of pages last received by the GBP node.
  • the GBP end point indicates the largest LSN included in the batch of pages last received by the GBP node.
  • the GBP node itself maintains the GBP starting point, the GBP recovery point, and the GBP ending point, and the backup machine obtains these three points from the GBP node.
  • After the GBP node receives a new page, it updates the GBP recovery point and the GBP end point.
  • the new page is placed at the tail of the cache queue.
  • If the GBP node receives a new page, but the new page already exists in the page buffer of the GBP node, the GBP node uses the received new page to update the existing corresponding page and places the updated new page at the tail of the cache queue; in other words, the GBP node deletes the existing corresponding page and places the new page at the tail of the cache queue.
  • the so-called “new page” refers to the page currently received by the GBP node.
  • For example, if the page currently received by the GBP node is page M, then page M is a "new page".
  • the page M is placed at the end of one of the cache queues.
  • Suppose page M already exists in the page buffer of the GBP node before page M is received, and the existing page M is in cache queue R.
  • the existing page M contains an LSN of K.
  • the currently received page M contains an LSN of T.
  • K and T are both integers greater than or equal to 0, and T is greater than K.
  • the currently received page M is used to update the existing page M, and the updated page M is placed at the tail of the cache queue R. Alternatively, the existing page M is discarded, and the currently received page M is placed at the tail of the cache queue R.
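The page M example can be sketched as follows. The LSN values K = 5 and T = 9 are illustrative, and the ordered-dictionary representation of cache queue R is an implementation convenience, not the patent's design.

```python
# Sketch of the page M update: if a newly received page already exists in the
# cache queue, the old copy is dropped and the new copy goes to the tail.
from collections import OrderedDict

cache_queue_r = OrderedDict()  # page id -> LSN; first entry is the queue head

def receive(page_id, lsn):
    if page_id in cache_queue_r:
        del cache_queue_r[page_id]  # discard the existing copy (LSN K)
    cache_queue_r[page_id] = lsn    # place the new copy (LSN T) at the tail

receive("M", 5)   # K = 5: page M enters the queue
receive("X", 6)   # another page arrives after M
receive("M", 9)   # T = 9: page M is updated and moves to the tail
```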
  • judging (or determining) whether the new page already exists in the page buffer of the GBP node may be executed by the GBP node itself, or may be executed by the host.
  • When the GBP node receives a new page that does not exist in the page buffer of the GBP node, and the page buffer of the GBP node is full, the GBP node evicts the page at the head of the cache queue and updates the GBP starting point to the LSN corresponding to the new page at the head of the cache queue.
  • the GBP node evicts the page at the head of the cache queue, puts page Y at the tail of the cache queue, and at the same time the GBP starting point is updated (advanced) to the LSN corresponding to the new page at the head of the cache queue.
  • When the backup machine obtains the GBP starting point, the GBP recovery point, and the GBP end point from the GBP node, it obtains the most recently updated GBP starting point, GBP recovery point, and GBP end point.
  • The GBP node typically receives pages from the host in batches; for example, a batch may include at most 100 pages and at least one page.
  • Suppose the background thread of the host sends a batch of pages to the GBP node every 5 ms. If there are M (M is an integer greater than 100) pages to be sent in the host, the background thread of the host will send M/100 (rounded up) batches in succession. If there is only one page in the host, the background thread of the host will send only one page to the GBP node.
  • the last batch of pages received by the GBP node may include one or more pages.
  • the number of the multiple pages is not greater than the maximum number of pages allowed to be sent at one time (for example, 100 pages).
  • A cache elimination algorithm manages all pages in the cache queue of the GBP node. Specifically, assuming that the cache queue of the GBP node is a sliding window (as shown in FIG.
  • FIGS. 7A to 7C show how, after each batch of pages is received, the GBP node manages the pages stored in its cache queue based on the sliding-window cache elimination algorithm, and how it maintains the GBP starting point, the GBP recovery point, and the GBP end point.
  • the GBP start point and GBP recovery point are both 1, and the GBP end point is 3.
  • the GBP start point is 1, the GBP end point is 6, and the GBP recovery point is 4.
  • the GBP start point is 3, the GBP end point is 8, and the GBP recovery point is 7.
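The point transitions in FIGS. 7A-7C can be reproduced with a small sliding-window simulation. The window capacity of six pages and the one-page-per-LSN model are assumptions chosen so that the illustrated values come out; the real GBP node is not specified at this level of detail.

```python
# Illustrative sliding-window simulation of the GBP cache queue and its points.
from collections import deque

WINDOW = 6                  # assumed window capacity, to match FIGS. 7A-7C
queue = deque()             # LSNs in arrival order; queue[0] is the head
points = {}

def receive_batch(lsns):
    for lsn in lsns:
        if len(queue) == WINDOW:
            queue.popleft()           # evict the page at the head of the queue
        queue.append(lsn)
    points["start"] = queue[0]        # smallest LSN still cached
    points["recovery"] = min(lsns)    # smallest LSN in the latest batch
    points["end"] = max(lsns)         # largest LSN in the latest batch

receive_batch([1, 2, 3])   # FIG. 7A: start 1, recovery 1, end 3
receive_batch([4, 5, 6])   # FIG. 7B: start 1, recovery 4, end 6
receive_batch([7, 8])      # FIG. 7C: pages 1 and 2 evicted -> start 3,
                           #          recovery 7, end 8
```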
  • the backup machine replays all redo logs located between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, so that the standby machine is switched to the new host, thereby realizing fault repair of the database system.
  • the disk recovery point indicates the smallest LSN contained in the multiple pages most recently written to the disk of the backup machine.
  • the disk end point indicates the LSN of the last redo log received by the standby machine.
  • When the disk recovery point is greater than or equal to the GBP starting point, and the disk end point is greater than or equal to the GBP end point, the backup machine also starts a background thread, which is used to pull all pages stored on the GBP node to the page buffer of the standby machine. Subsequently, the backup machine starts another background thread to flush these pages from its page buffer to its local disk.
  • the standby machine starts the background thread almost simultaneously with performing the playback step (S105).
  • In step S105, the host is referred to as the original host to distinguish it from the new host in this embodiment. It should be understood that after the first fault repair method is executed, the standby machine (the original standby machine) is promoted (switched) to become the new host.
  • the background thread is used to pull all pages stored on the GBP node to the page buffer of the backup machine through a second data transmission protocol.
  • the second data transmission protocol may be a data transmission protocol with low latency and high throughput.
  • the first data transmission protocol and the second data transmission protocol may be the same protocol.
  • the second data transmission protocol is an RDMA protocol.
  • the standby machine has a 10 Gigabit Ethernet card or an InfiniBand network card that supports the RDMA protocol.
  • the second data transmission protocol may also be 40G Ethernet (40GE).
  • the first data transmission protocol and the second data transmission protocol may both be RDMA protocols, or both may be 40GE.
  • the first data transmission protocol and the second data transmission protocol may also be one of the RDMA protocol and the other 40GE.
  • the first data transmission protocol is the RDMA protocol
  • the second data transmission protocol is 40GE.
  • After the backup machine pulls all pages stored on the GBP node to its page buffer through the second data transmission protocol, it compares the pulled pages with the pages it maintains itself, discarding the old versions and keeping the new ones. As shown in FIG. 8, if the version of P1 pulled from the GBP node is V3 and the version of P1 maintained by the backup machine itself is V2, then V2 is discarded and V3 is retained. Likewise, as shown in FIG. 8, if the version of P2 pulled from the GBP node is V0 and the version of P2 maintained by the standby machine itself is V1, then V0 is discarded and V1 is retained.
  • Before the backup machine performs the playback step (S105), the version of a page pulled from the GBP node is typically newer than the version of the page maintained by the backup machine itself; after the backup machine performs the playback step, the version of the page maintained by the backup machine is newer than, or as new as, the version of the page pulled from the GBP node.
  • the version of the page maintained by the backup machine itself may be generated by the backup machine by playing back the redo log, or it may be directly read from the local disk of the backup machine.
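A sketch of the version arbitration described above, using the LSN as the version ordering. The pairing of the V-labels from FIG. 8 with concrete LSNs here is illustrative; only the rule "keep the newer copy of each page" comes from the text.

```python
# Illustrative merge of pages pulled from the GBP node with the pages the
# standby already maintains: for each page, the copy with the larger LSN wins.
def merge(local_pages, pulled_pages):
    merged = dict(local_pages)
    for page_id, (version, lsn) in pulled_pages.items():
        if page_id not in merged or lsn > merged[page_id][1]:
            merged[page_id] = (version, lsn)  # pulled copy is newer: keep it
    return merged

local  = {"P1": ("V2", 2), "P2": ("V1", 1)}   # maintained by the standby
pulled = {"P1": ("V3", 3), "P2": ("V0", 0)}   # pulled from the GBP node
result = merge(local, pulled)
# P1 keeps V3 (pulled copy is newer); P2 keeps V1 (local copy is newer),
# matching the FIG. 8 example.
```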
  • After the host fails and before the backup machine executes the playback step, the backup machine also obtains the disk recovery point and the disk end point locally. Naturally, the purpose of obtaining the disk recovery point and the disk end point is to determine whether the conditions defined in step S105 are satisfied.
  • The standby machine can then be switched (upgraded) to the new host; that is, the fault repair of the database system described in this embodiment is completed. Therefore, the efficiency with which the backup machine is switched (upgraded) to the new host is related only to the rate at which the backup machine replays all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, and is unrelated to the rate at which all pages stored on the GBP node are pulled to the page buffer of the backup machine. Therefore, pulling all pages stored on the GBP node to the page buffer of the backup machine can be completed asynchronously in the background of the backup machine.
  • the backup machine only replays all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point. However, all redo logs located between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point are not played back (as shown in FIG. 9). In other words, in this embodiment, the backup machine skips all redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point, and only plays back All redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • The number of redo logs that the backup machine needs to replay is relatively small, so adopting this embodiment can improve the efficiency with which the standby machine is switched to the new host, that is, the efficiency of fault repair of the database system.
  • The backup machine does not continue to replay the remaining redo logs that have not been replayed; instead, it determines the GBP starting point, the GBP recovery point, the GBP end point, the disk recovery point, and the disk end point, then compares the disk recovery point with the GBP starting point and compares the disk end point with the GBP end point. When the disk recovery point is greater than or equal to the GBP starting point, and the disk end point is greater than or equal to the GBP end point, it replays all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, so as to achieve failover or failure recovery of the database system.
  • the standby machine only replays a small part of all the remaining redo logs that have not been played back. Therefore, the technical solution provided by this embodiment can improve the recovery efficiency of the database system.
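The failover condition and the reduced replay range described above can be sketched as below. Whether the boundary logs themselves are replayed is not specified here, so the half-open interval is an assumption, and all names are illustrative.

```python
# Illustrative failover check and reduced replay range for the standby machine.
def can_fast_recover(disk_recovery, disk_end, gbp_start, gbp_end):
    # The condition defined for step S105 in the text.
    return disk_recovery >= gbp_start and disk_end >= gbp_end

def logs_to_replay(all_lsns, gbp_recovery, disk_end):
    # Skip every log up to the GBP recovery point (its pages already sit on
    # the GBP node); replay only the tail segment up to the disk end point.
    return [lsn for lsn in all_lsns if gbp_recovery < lsn <= disk_end]

all_lsns = list(range(1, 11))  # LSNs 1..10 known to the standby
assert can_fast_recover(disk_recovery=3, disk_end=10, gbp_start=2, gbp_end=9)
replayed = logs_to_replay(all_lsns, gbp_recovery=7, disk_end=10)
# Only LSNs 8..10 are replayed; logs 4..7 are skipped because their modified
# pages can be pulled from the GBP node instead.
```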
  • the application reads the page to be accessed from the page buffer of the GBP node.
  • FIG. 2 is an architecture diagram of a database system.
  • the database system can be used to execute the aforementioned first fault repair method. Since the database system has already been described extensively in the foregoing embodiments, this embodiment only describes the parts not mentioned there; for the parts that have been described, this embodiment and the other embodiments of the database system can refer directly to the relevant descriptions of the foregoing embodiments, so the details are not repeated here.
  • the database system includes a host 210, a backup machine 230, and a GBP node 220, and the host 210 and the GBP node 220 are communicatively connected through a first data transmission protocol.
  • the host 210 is configured to send multiple pages to the GBP node 220 using the first data transmission protocol.
  • the GBP node 220 is used to write the multiple pages into the cache queue of the GBP node. It should be noted that the LSNs corresponding to the multiple pages increase in order from the head to the tail of the cache queue.
  • the backup machine 230 is used to determine the GBP start point, the GBP recovery point, and the GBP end point.
  • the backup machine 230 is also used to play back all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • the GBP node 220 is used to receive a new page, and update the GBP start point, the GBP recovery point, and the GBP end point according to the new page.
  • the backup machine 230 is also used to obtain the GBP start point, the GBP recovery point, and the GBP end point from the GBP node 220.
  • when the GBP node 220 receives a new page, and the new page does not exist in the page buffer of the GBP node, the GBP node 220 is further configured to put the new page at the tail of the cache queue.
  • when the GBP node 220 receives a new page, and the new page already exists in the page buffer of the GBP node, the GBP node 220 is further configured to update the existing corresponding page according to the received new page, and place the updated new page at the tail of the cache queue.
  • when the GBP node 220 receives a new page, and the new page already exists in the page buffer of the GBP node, the GBP node 220 is also used to discard the existing page corresponding to the new page, and place the new page at the tail of the cache queue.
  • when the GBP node 220 receives a new page that does not exist in the page buffer of the GBP node, and the page buffer of the GBP node is full, the GBP node 220 is further configured to evict the page at the head of the cache queue, and update the GBP start point to the LSN corresponding to the new head page of the cache queue. Naturally, after the page at the head of the cache queue is evicted, the GBP node 220 is also used to put the new page, which does not exist in the page buffer of the GBP node, at the tail of the cache queue.
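The queue maintenance described in the last few bullets can be modeled as a small LRU-style structure. The sketch below is illustrative only (the class and method names are made up, and each page is reduced to a page id plus an LSN); it shows the three behaviors above: insert at the tail, update-and-move-to-tail, and evict-at-head with a GBP start point update.

```python
from collections import OrderedDict

class GbpCacheQueue:
    """Toy model of one cache queue in the GBP node's page buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = OrderedDict()   # head = oldest entry, tail = newest
        self.gbp_start = None        # LSN of the current head page

    def put(self, page_id, lsn):
        if page_id in self.queue:
            # Page already buffered: update it and move it to the tail.
            del self.queue[page_id]
        elif len(self.queue) >= self.capacity:
            # Buffer full: evict the head page; the GBP start point will
            # advance to the LSN of the new head page below.
            self.queue.popitem(last=False)
        self.queue[page_id] = lsn
        head_id = next(iter(self.queue))
        self.gbp_start = self.queue[head_id]

q = GbpCacheQueue(capacity=2)
q.put("A", 10)
q.put("B", 11)
q.put("C", 12)           # full: "A" is evicted, head becomes "B"
print(q.gbp_start)       # 11
```

Because pages arrive with increasing LSNs, the head of the queue always carries the smallest LSN still buffered, which is exactly what the GBP start point tracks.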
  • when the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the backup machine 230 is also used to start a background thread, and the background thread is used to pull all pages stored on the GBP node 220 to the page buffer of the standby machine.
  • the background thread is used to pull all the pages stored on the GBP node 220 to the page buffer of the backup machine through a second data transfer protocol.
  • the backup machine 230 playing back all redo logs located between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, and the backup machine 230 pulling all pages stored on the GBP node 220 to the page buffer of the backup machine, can be completed asynchronously.
  • the backup machine 230 is also used to determine, or obtain locally, the disk recovery point and the disk end point.
  • the host 210 is also used to send redo logs to the backup machine 230.
  • the backup machine 230 is also used to play back the redo log to obtain the corresponding page.
  • the host 210 is configured to start a page sending thread, and the page sending thread may use the first data transfer protocol to send the multiple pages in the sending queue to the GBP node 220 in batches, in order from the head to the tail of the sending queue.
  • the sending queue is located in the host 210, and, from the head to the tail of the sending queue, the LSN contained in each of the multiple pages increases.
  • the host 210 is also used to start multiple page sending threads, and the host 210 may include multiple sending queues, where the multiple page sending threads correspond one-to-one to the multiple sending queues.
  • the GBP node 220 is used to start a page receiving thread, and the page receiving thread may use the first data transfer protocol to receive the multiple pages in batches, and write the multiple pages into the cache queue of the GBP node.
  • the GBP node 220 is also used to start multiple page receiving threads, and the page buffer of the GBP node has multiple cache queues, where the multiple page receiving threads correspond one-to-one to the multiple cache queues.
  • the multiple page sending threads started by the host 210 and the multiple page receiving threads started by the GBP node 220 may also correspond one-to-one. It should be understood that, in this case, the multiple sending queues and the multiple cache queues also correspond one-to-one; that is, the multiple pages in each sending queue can be sent to a corresponding cache queue.
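The one-to-one pairing of sending queues and cache queues can be sketched as below. This toy model assumes a page's queue is chosen by hashing its page id (the description later mentions hashing only as one possible method), so the same page always lands in the same queue and per-queue LSN order is preserved; all names are illustrative.

```python
# Minimal sketch: pages are partitioned into per-queue batches on the host,
# and each sending queue i maps one-to-one onto cache queue i on the GBP node.
NUM_QUEUES = 4
send_queues = [[] for _ in range(NUM_QUEUES)]

def enqueue_page(page_id, lsn):
    # The same page id always maps to the same queue, so within each queue
    # LSNs increase from head to tail.
    idx = hash(page_id) % NUM_QUEUES
    send_queues[idx].append((page_id, lsn))

for i, pid in enumerate(["A", "B", "A", "C", "B"]):
    enqueue_page(pid, 100 + i)

# Each receiving thread i drains send_queues[i] into cache_queues[i]:
cache_queues = [list(q) for q in send_queues]   # 1:1 transfer, order kept
for q in cache_queues:
    assert [lsn for _, lsn in q] == sorted(lsn for _, lsn in q)
```

The per-queue ordering invariant checked at the end is what lets the GBP node write each batch straight to the tail of its cache queue.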
  • FIG. 10 is a flowchart of the second method for repairing a fault in a database system provided by this application.
  • the second fault repair method of the database system (referred to as the "second fault repair method") is described from the perspective of the standby machine, while the foregoing first fault repair method (referred to as the "first fault repair method") is described from the perspective of the system. Since the standby machine is a part of the system, the second fault repair method has many similarities with the first fault repair method. Based on this, in the following embodiments of the second fault repair method, only the parts that differ from the first fault repair method are described; for the parts that are the same as in the first fault repair method, please refer to the aforementioned related embodiments.
  • the second fault repair method includes the following steps.
  • all pages stored on the GBP node are sent by the host to the GBP node through the first data transfer protocol during the normal operation of the host, and are written into the cache queue of the GBP node by the GBP node.
  • the log sequence numbers LSN corresponding to the multiple pages increase in order from the head to the tail of the cache queue.
  • the fault repair method provided in this embodiment further includes: starting a background thread, where the background thread is used to pull all the pages stored on the GBP node to the page buffer.
  • the background thread pulls all pages stored on the GBP node to a page buffer through a second data transfer protocol.
  • the failure repair method provided in this embodiment further includes: obtaining the disk recovery point and the disk end point; and, during the normal operation of the host, receiving the redo log sent by the host, replaying the redo log to obtain the corresponding page, and flushing the obtained pages to the local disk in batches.
  • the fault repair method provided in this embodiment further includes: reading the page to be accessed from the page buffer of the GBP node.
  • FIG. 11A and FIG. 11B are schematic structural diagrams of the first computing device 500 provided by this application.
  • the computing device 500 may be the backup machine mentioned in the foregoing second fault repair method, and the computing device 500 may execute the foregoing fault repair method described from the perspective of the backup machine.
  • the standby machine and the host mentioned in the foregoing second fault repair method may be two independent nodes.
  • the computing device 500 includes at least a determining unit 510 and a playback unit 530.
  • the determining unit 510 is used to determine the GBP starting point, the GBP recovery point, and the GBP ending point.
  • the playback unit 530 is configured to play back all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • for the disk recovery point and the disk end point, please also refer to the previous description.
  • the computing device 500 further includes a starting unit 540.
  • the starting unit 540 is used to start a background thread, and the background thread is used to pull all pages stored on the GBP node to the page buffer.
  • the determining unit 510 is also used to obtain the disk recovery point and the disk end point.
  • the computing device further includes a receiving unit 520.
  • the receiving unit 520 is configured to receive the redo log sent by the host.
  • the playback unit 530 is configured to play back the redo log to obtain the corresponding page.
  • the computing device further includes a reading unit 550.
  • after the playback step is performed, and when the page that needs to be accessed is still located in the page buffer of the GBP node, the reading unit 550 is configured to read the page that needs to be accessed from the page buffer of the GBP node.
  • FIG. 12 is a schematic structural diagram of the second computing device 600 provided by this application.
  • the computing device 600 may be the backup machine mentioned in the foregoing second fault repair method, and the computing device 600 may execute the foregoing second fault repair method described from the perspective of the backup machine.
  • an operating system 620 runs on the hardware layer 610 of the computing device 600
  • an application program 630 runs on the operating system 620.
  • the hardware layer 610 includes a processor 611, a memory 612, an input/output (I/O) interface 613, and so on.
  • the memory 612 stores executable code, and the executable code is configured to implement the components and functions of the computing device 600 when executed by the processor 611. In this embodiment, the memory 612 is used to store the disk recovery point and the disk end point.
  • the processor 611 is used to determine the GBP starting point, the GBP recovery point, and the GBP ending point.
  • the processor 611 is further configured to play back all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • the processor 611 when the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the processor 611 is further configured to start a background thread, The background thread is used to pull all pages stored on the GBP node to the page buffer.
  • the processor 611 is further configured to read the page to be accessed from the page buffer of the GBP node.
  • the processor 611 is further configured to obtain the disk recovery point and the disk end point from the memory.
  • the I/O interface 613 is used to receive the redo log sent by the host.
  • the processor 611 is configured to play back the redo log to obtain the corresponding page.
  • the first data backup method at least includes: sending the page to the GBP node using the RDMA protocol during the transmission of the redo log to the standby machine, so that when a failure occurs, the page in the GBP node is used for fault repair.
  • the RDMA protocol is also used to send the modified page to the GBP node for backup on the GBP node. Since using the RDMA protocol enables most of the modified pages corresponding to the redo logs sent to the standby machine to also be sent to the GBP node, when the host fails, the remaining redo logs that have not been played back by the standby machine include two parts: the first part refers to all redo logs located between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point, and the second part refers to all redo logs located between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • the backup machine only needs to replay the second part of the redo log to obtain the corresponding page to implement fault repair, because the page corresponding to the first part of the redo log can be directly pulled from the GBP node. It can be seen that using the data backup method provided in this embodiment can improve the efficiency of fault repair.
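The first data backup method can be sketched as a toy model in which the host ships each redo log to the standby machine and, in parallel, ships the modified page to the GBP node. The class and channel representations below are entirely illustrative; a real system would use an RDMA transport rather than in-memory lists.

```python
# Toy model of the first data backup method: two outbound paths per change.
class Host:
    def __init__(self, standby_logs, gbp_pages):
        self.standby_logs = standby_logs   # stands in for the log channel
        self.gbp_pages = gbp_pages         # stands in for the RDMA channel

    def modify_page(self, page_id, lsn):
        redo = {"page_id": page_id, "lsn": lsn}
        self.standby_logs.append(redo)           # to the standby machine
        self.gbp_pages.append((page_id, lsn))    # to the GBP node

standby_logs, gbp_pages = [], []
host = Host(standby_logs, gbp_pages)
for lsn, pid in enumerate(["A", "B", "A"], start=1):
    host.modify_page(pid, lsn)
print(len(standby_logs), len(gbp_pages))   # 3 3
```

Because the page channel keeps pace with the log channel, the pages backing most shipped redo logs already exist on the GBP node at failure time, which is what lets the standby machine skip the first part of the pending logs.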
  • the present application also provides a third computing device 700, which can execute the foregoing first data backup method.
  • an operating system 720 runs on the hardware layer 710 of the computing device 700
  • an application program 730 runs on the operating system 720.
  • the hardware layer 710 includes a processor 711, a memory 712, a first transmission interface 713, a second transmission interface 714, and so on.
  • the memory 712 stores executable code, and the executable code is configured to implement the components and functions of the computing device 700 when executed by the processor 711.
  • the first transmission interface 713 is used to transmit the redo log to the standby machine.
  • the second transmission interface 714 is used to transmit the page to the GBP node based on the RDMA protocol, so that when a failure occurs, the page is used for fault repair.
  • the database system using the computing device 700 will have relatively high fault repair efficiency.
  • the third fault repair method can be applied to the database system shown in FIG. 14, and the database system includes a host 800 and a GBP node 900.
  • the third fault repair method is also described from a system perspective, but the third fault repair method is different from the aforementioned first fault repair method. The difference between them is that in the first fault repair method, the host 210, the standby machine 230, and the GBP node 220 are involved.
  • the standby machine is promoted to be a new host by replaying the logs; that is, in the first fault repair method, after the host fails, the standby machine is promoted to be a new host.
  • the third method for repairing the fault of the database system only the host 800 and the GBP node 900 are involved. When the host 800 fails, the host 800 will be pulled up again by replaying the redo log.
  • the foregoing first fault repair method can be used when the software of the host is faulty, and can also be used when the hardware of the host is faulty.
  • the third fault repair method can usually only be used when the host's software fails.
  • this third fault repair method has many similarities with the aforementioned first fault repair method. Therefore, when the third fault repair method is described below, only the parts that differ from the aforementioned first fault repair method are described; for the same parts, please refer directly to the previous description.
  • FIG. 15 is a flowchart of the third fault repair method.
  • the execution subject of S301, S303, and S305 is the host 800
  • the execution subject of S302 is the GBP node 900.
  • S101 is executed by the host 210
  • S102 is executed by the GBP node 220
  • both S103 and S105 are executed by the standby machine 230. It is easy to see that S301 and S101 are almost the same; S302 and S102 are almost the same; S303 and S103 are almost the same except for the execution subject; and S305 and S105 are also almost the same except for the execution subject.
  • the third fault repair method includes the following steps.
  • the GBP node writes the multiple pages into a cache queue of the GBP node.
  • the third fault repair method further includes: S306, starting a background thread, which is used to pull all pages located on the GBP node to a page buffer. It should be noted that the pages pulled to the page buffer will also be flushed to the local disk.
  • the third failure repair method further includes: S304, obtaining the disk recovery point and the disk end point.
  • the third fault repair method further includes: S307, reading the page that needs to be accessed from the GBP node.
  • the database system includes a host 800 and a GBP node 900, and the host 800 and the GBP node 900 are in communication connection through a first data transmission protocol.
  • the database system can be used to execute the aforementioned third fault repair method.
  • the host 800 is configured to send multiple pages to the GBP node 900 through the first data transmission protocol.
  • the GBP node 900 is used to write the multiple pages into the cache queue of the GBP node.
  • the log sequence number (LSN) contained in each of the multiple pages increases in order from the head to the tail of the cache queue
  • the host 800 is also used to determine the GBP start point, GBP recovery point, and GBP end point.
  • the host 800 is also used to play back all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • when the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the host 800 is also used to start a background thread, and the background thread is used to pull all pages located on the GBP node to the page buffer.
  • the background thread pulls all pages located on the GBP node to a page buffer through the first data transfer protocol.
  • the host 800 is also used to read the page that needs to be accessed from the page buffer of the GBP node.
  • this application also provides a fourth method for repairing a fault in a database system.
  • the execution subject of the fourth fault repair method is the host 800 in FIG. 14.
  • the fault repair method includes the following steps.
  • S311 During normal operation, send multiple pages to the GBP node through the first data transmission protocol.
  • this application also provides a fourth computing device 1000, which can be used to perform the foregoing fourth fault repair method; that is, the fourth computing device 1000 can implement the function of the host in the foregoing fourth fault repair method.
  • the computing device 1000 includes at least a sending unit 1010, a determining unit 1020, and a playback unit 1030.
  • the sending unit 1010 is configured to send multiple pages to the GBP node using the first data transmission protocol.
  • the log sequence numbers LSN corresponding to the multiple pages increase in order from the head to the tail of the cache queue.
  • the determining unit 1020 is used to determine the GBP starting point, the GBP recovery point, and the GBP ending point.
  • the playback unit 1030 is configured to play back all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • the computing device further includes a starting unit 1040.
  • the starting unit 1040 is used to start a background thread, where the background thread is used to pull all the pages on the GBP node to the page buffer of the computing device.
  • the computing device further includes a reading unit 1050.
  • the reading unit 1050 is configured to read the page that needs to be accessed from the GBP node.
  • the present application also provides a fifth computing device 2000, which can be used to perform the aforementioned third fault repair method.
  • an operating system 2020 runs on the hardware layer 2010 of the computing device 2000
  • an application program 2030 runs on the operating system 2020.
  • the hardware layer 2010 includes a processor 2011, a memory 2012, and an I/O interface 2013.
  • the memory 2012 stores executable code, which is configured to implement the components and functions of the computing device 2000 when executed by the processor 2011.
  • the memory 2012 is used to store GBP start point, GBP recovery point, GBP end point, disk recovery point, and disk end point.
  • the processor 2011 is configured to use the first data transmission protocol to send multiple pages to the GBP node, where the log sequence numbers (LSNs) corresponding to the multiple pages increase in order from the head to the tail of the cache queue.
  • the processor 2011 is further configured to determine the GBP starting point, the GBP recovery point, and the GBP ending point.
  • the processor 2011 is further configured to play back all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
  • the processor 2011 is also configured to start a background thread, and the background thread is used to pull all the pages located on the GBP node to the page buffer.
  • the processor 2011 is further configured to read the page that needs to be accessed from the GBP node.
  • This application also provides a second data backup method.
  • the execution subject of the second data backup method is the GBP node.
  • the GBP node can be either the GBP node described in the first failure repair method, or the GBP node described in the third failure repair method.
  • the second data backup method includes the following steps.
  • S401 Receive multiple pages from the host through the RDMA protocol.
  • when a new page that does not exist in the memory is received, S403 specifically includes: putting the new page at the tail of the cache queue.
  • S403 specifically includes: evicting the page at the head of the cache queue, storing the new page at the tail of the cache queue, and updating the GBP start point to the LSN corresponding to the new page at the head of the cache queue.
  • S403 specifically includes: using the new page to update the existing corresponding page, and placing the updated page at the tail of the cache queue.
  • the GBP node and the standby machine of the host may be combined; that is, the standby machine can implement both the function of the standby machine in the first fault repair method and the function of the GBP node in the first fault repair method. In other words, the standby machine is installed with an application program that can implement the global page cache function.
  • the second data backup method further includes: receiving multiple redo logs, and, by replaying the multiple redo logs, obtaining the page corresponding to each of the multiple redo logs.
  • this application also provides a sixth computing device 3000, which can execute the aforementioned second data backup method; that is, the sixth computing device 3000 can implement the function of the GBP node described in the aforementioned embodiments.
  • the sixth computing device 3000 includes at least a receiving unit 3010, a writing unit 3020, and a maintenance unit 3030.
  • the receiving unit 3010 is configured to receive multiple pages from the host through the RDMA protocol.
  • the writing unit 3020 is used to write the multiple pages into the cache queue. It is worth noting that the log sequence numbers (LSNs) corresponding to the multiple pages increase in order from the head to the tail of the cache queue.
  • the maintenance unit 3030 is configured to maintain the GBP start point, the GBP recovery point, and the GBP end point according to the LSN of each of the multiple pages, so that when the host fails, fault repair is performed according to the GBP start point, the GBP recovery point, and the GBP end point.
  • the writing unit 3020 is also used to put the new page into the tail of the cache queue.
  • the writing unit 3020 is also used to evict the page at the head of the cache queue, and store the new page at the tail of the cache queue.
  • the maintenance unit 3030 is further configured to update the GBP start point to the LSN corresponding to the new page at the head of the cache queue.
  • the writing unit 3020 is further configured to use the new page to update the existing corresponding page, and put the updated page at the tail of the cache queue.
  • the maintenance unit 3030 is further configured to update the GBP recovery point and the GBP end point according to the received pages.
  • the receiving unit is also used to receive multiple redo logs, and the sixth computing device also includes a playback unit.
  • the playback unit is configured to play back the multiple redo logs to obtain the page corresponding to each of the multiple redo logs.
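How the maintenance step might derive the GBP recovery point and GBP end point from the most recently received batch can be sketched as follows, using the definitions given elsewhere in this application (smallest and largest LSN in the latest batch). The function name and the sample LSNs are hypothetical.

```python
# Hedged sketch: recompute the two batch-derived points on each batch arrival.
def update_points_for_batch(batch_lsns):
    gbp_recovery = min(batch_lsns)   # smallest LSN in the latest batch
    gbp_end = max(batch_lsns)        # largest LSN in the latest batch
    return gbp_recovery, gbp_end

recovery, end = update_points_for_batch([310, 305, 312, 309])
print(recovery, end)   # 305 312
```

The GBP start point, by contrast, is not batch-derived: it tracks the head of the cache queue and only changes when a head page is evicted.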
  • This application also provides a seventh computing device 4000.
  • the seventh computing device 4000 can also execute the aforementioned second data backup method.
  • the seventh computing device 4000 can implement the function of the GBP node described in the aforementioned embodiments. Specifically, as shown in FIG. 21, an operating system 4020 runs on the hardware layer 4010 of the computing device 4000, and an application program 4030 runs on the operating system 4020.
  • the hardware layer 4010 includes a processor 4011, a memory 4012, an I/O interface 4013, and so on.
  • the memory 4012 stores executable code, which is configured to implement the components and functions of the computing device 4000 when executed by the processor 4011.
  • the I/O interface 4013 is used to receive multiple pages from the host through the RDMA protocol.
  • the processor 4011 is configured to sequentially write the multiple pages into the cache queue, and maintain the GBP start point, the GBP recovery point, and the GBP end point according to the LSN contained in each of the multiple pages.
  • the LSNs corresponding to the multiple pages increase in order from the head to the tail of the cache queue.
  • the purpose of maintaining the GBP start point, the GBP recovery point, and the GBP end point is to use the GBP start point, the GBP recovery point, and the GBP end point to perform fault repair when the host fails.
  • the processor 4011 is also configured to put the new page at the tail of the cache queue.
  • the processor 4011 is also used to evict the page at the head of the cache queue, store the new page at the tail of the cache queue, and update the GBP start point to the LSN corresponding to the new page at the head of the cache queue.
  • when receiving a new page that already exists in the memory, the processor 4011 is further configured to use the new page to update the existing corresponding page, and put the updated page at the tail of the cache queue.
  • the processor 4011 is further configured to update the GBP recovery point and the GBP end point according to the received pages.
  • the processor 4011 is also configured to receive multiple redo logs, and play back the multiple redo logs to obtain the page corresponding to each of the multiple redo logs.
  • this application involves multiple protected subjects, each of which corresponds to multiple embodiments, but the protected subjects and the embodiments are all related.
  • before describing the fault repair method of the database system including the host, the backup machine, and the GBP node, this application describes a large amount of general content, which applies to all of the related embodiments that follow.
  • the descriptions of the other embodiments are relatively brief. It should be understood that all other embodiments can be understood with reference to the content of any relevant part of this application, and the various embodiments in this application can be referred to against each other.


Abstract

A fault repair method for a database system, which employs a global buffer pool (GBP) node. Specifically, during normal operation, the host backs up pages modified by transactions to the GBP node through a low-latency, high-throughput data transmission protocol (such as the remote direct memory access (RDMA) protocol). In this way, when the host fails, the standby machine does not need to replay all the remaining redo logs that have not been replayed; it only needs to replay the redo logs corresponding to pages that do not exist on the GBP node or that are not arranged in order, and to obtain those pages. The standby machine can then be promoted to be the new host; in other words, the fault of the database system can be repaired.

Description

Fault repair method for a database system, database system, and computing device
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201910395371.7, filed with the China National Intellectual Property Administration on May 13, 2019 and entitled "Fault repair method for a database system, database system, and computing device", which is incorporated herein by reference in its entirety.
Technical field
This application relates to the field of database technologies, and in particular to a fault repair method for a database system, as well as a corresponding database system and computing devices.
Background
FIG. 1 shows a database system including a host 110 and a standby machine 130, which are provided to ensure the reliability of the database system. The host 110 and the standby machine 130 each have their own data storage and log storage. When the host 110 modifies a page, a redo log is generated; the host 110 transmits the redo log to the standby machine 130, which receives and replays it, so that the data of the standby machine 130 stays synchronized with the host 110. Receiving redo logs and replaying redo logs are two parallel processes on the standby machine 130: it can receive redo logs in batches and write them into local memory while replaying them one by one. Usually, log replay is slower than log reception; for example, of 10 GB of received logs, perhaps only 8 GB have been replayed, leaving 2 GB still to be replayed. When the host 110 fails, the standby machine 130 must finish replaying all received redo logs before it can be synchronized with the pre-failure host 110 and replace it as the new host (also known as "failover" or "database system recovery"). The recovery time objective (RTO) is the time required for the standby machine 130 to be promoted to be the new host. From the above active/standby switchover process, it can be seen that the RTO depends on the amount of logs to be replayed: the larger that amount, the larger the RTO, which in turn affects business continuity.
Summary
This application relates to a fault repair method for a database system, used to reduce the time required for fault repair when the database system fails and to improve fault repair efficiency. In addition, this application further provides a corresponding database system and computing devices.
According to a first aspect, this application provides a fault repair method for a database system. The method includes the following. While the host is working normally, the host sends multiple pages to a global buffer pool (GBP) node using a first data transmission protocol. The GBP node writes the multiple pages into a cache queue of the GBP node, where the log sequence numbers (LSNs) corresponding to the multiple pages increase in order from the head to the tail of the cache queue.
When the host fails, the standby machine determines a GBP start point, a GBP recovery point, and a GBP end point.
The GBP start point indicates the smallest LSN included in all pages stored on the GBP node. The GBP recovery point indicates the smallest LSN included in the batch of pages most recently received by the GBP node. The GBP end point indicates the largest LSN included in the batch of pages most recently received by the GBP node.
When the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the standby machine replays all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, so that the standby machine is promoted to be the new host.
The disk recovery point indicates the smallest LSN contained in the latest batch of pages written to the disk of the standby machine. The disk end point indicates the LSN of the last redo log received by the standby machine.
It should be noted that the page buffer of the GBP node includes one or more cache queues, each of which stores multiple pages; for the pages in the same cache queue, the LSNs they include increase in order from the head to the tail of that cache queue.
It is worth noting that, in this embodiment, that the standby machine replays the redo logs located between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point specifically means: the standby machine replays the redo log corresponding to the GBP recovery point, the redo log corresponding to the disk end point, and all other redo logs located between them. In other words, the redo logs replayed by the standby machine lie within a closed interval, so the standby machine also replays the redo logs at both ends of that interval.
With reference to the foregoing embodiments, it is easy to see that when the host fails, the standby machine no longer continues to replay all the redo logs that have not yet been replayed. Instead, it determines the GBP start point, the GBP recovery point, the GBP end point, the disk recovery point, and the disk end point; then, when the disk recovery point is greater than or equal to the GBP start point and the disk end point is greater than or equal to the GBP end point (or, for short, "when the condition is satisfied"), it replays all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, so as to perform fault recovery of the database system.
It is known that after the host modifies a page, a redo log corresponding to the modified page is generated, and the host then sends that redo log to the standby machine; by replaying the redo log, the standby machine can obtain the corresponding modified page. That is, the standby machine achieves synchronization with the host by replaying redo logs.
Currently, if the host fails, the standby machine continues to replay all the remaining, not-yet-replayed redo logs transmitted by the host before the failure occurred, until all received redo logs have been replayed; the standby machine can then be synchronized with the pre-failure host, after which it replaces the failed host as the new host.
In this embodiment, however, after the host fails, the standby machine no longer continues to replay all the redo logs that have not been replayed; it only replays the redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, and no longer replays the redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point. Simply put, in this embodiment, after the host fails, the standby machine replays only a small part of all the redo logs that have not yet been replayed. Therefore, the technical solution provided by this embodiment can improve the fault recovery efficiency of the database system.
The reason why only a small part, rather than all, of the logs needs to be replayed in this embodiment is that the host transmits the modified pages to the GBP node through the first data transmission protocol (for example, the RDMA protocol). Based on the first data transmission protocol, the host sends pages to the GBP node very quickly, so when the host fails, most of the modified pages on the host have already been sent to the GBP node and written in order into the cache queue of the GBP node. Therefore, the standby machine does not need to replay the redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point; it only needs to replay the redo logs corresponding to modified pages that do not exist on the GBP node, and the redo logs corresponding to modified pages that were not written into the cache queue of the GBP node in order. The redo logs corresponding to pages arranged sequentially in the cache queue of the GBP node do not need to be replayed, and those pages can be pulled directly from the GBP node to the standby machine.
With reference to the first aspect, in a first possible implementation, all redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point are not replayed.
It is easy to see that, in this embodiment, the standby machine skips all redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point, and replays all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point. In other words, in this embodiment, the standby machine replays only part, rather than all, of the logs that have not yet been replayed, so the fault repair efficiency of the database system is improved.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation, the GBP node maintains the GBP recovery point and the GBP end point; after the GBP node writes the multiple pages into the cache queue of the GBP node, the fault repair method further includes: the GBP node updates the GBP recovery point and the GBP end point according to the multiple pages.
Correspondingly, that the standby machine determines the GBP recovery point and the GBP end point includes: the standby machine obtains the updated GBP recovery point and GBP end point from the GBP node.
In this embodiment, after writing the received multiple pages into the cache queue of the GBP node, the GBP node further updates the GBP recovery point and the GBP end point according to the multiple pages. The standby machine then obtains the updated GBP recovery point and GBP end point from the GBP node. Since it is the GBP node that writes the multiple pages into its cache queue, having the GBP node maintain the GBP recovery point and the GBP end point can ensure that they are updated in a timely manner.
结合第一方面或第一方面的第一种可能的实现方式,在第三种可能的实现方式下,所述GBP节点维护了所述GBP起始点,在所述GBP节点接收到所述GBP节点的页面缓冲区中不存在的新页面且所述GBP节点的页面缓冲区已满时,所述GBP节点将位于所述缓存队列头部的页面淘汰掉,将所述新页面写到所述缓存队列的尾部,并将所述GBP起始点更新为所述缓存队列的新的头部页面对应的LSN。
相应的,所述备机从所述GBP节点中获取更新后的所述GBP起始点。
需要解释的是,所谓“新页面”是指所述GBP节点当前接收到的页面。“所述GBP节点的页面缓冲区中不存在的新页面”是指当前接收的页面在所述GBP节点的页面缓冲区中不存在。例如,当前接收的是页面M,而所述GBP节点的页面缓冲区中不存在页面M。
在本实施例中,由于是所述GBP节点将所述多个页面写入所述GBP节点的缓存队列的,因此所述GBP节点维护所述GBP起始点可以保证所述GBP起始点能够被及时更新。
结合第一方面或第一方面的第一种可能的实现方式,在第四种可能的实现方式下,在所述GBP节点接收到所述GBP节点的页面缓冲区中不存在的新页面时,则所述GBP节点将所述新页面放入所述缓存队列的尾部。
在所述GBP节点接收到所述GBP节点的页面缓冲区中已经存在的新页面时,则所述GBP节点根据接收到的所述新页面对已经存在的对应页面进行更新,并将更新后的所述新页面放在所述缓存队列的尾部。
如上所述,所谓“新页面”是指所述GBP节点当前接收到的页面。例如,所述GBP节点当前接收到的页面是页面M,且该页面M包含的LSN为T,则页面M就是“新页面”。相应的,在所述GBP节点的页面缓冲区中不存在页面M时,则将该页面M放入缓存队列的尾部。反之,在所述GBP节点的页面缓冲区中已经存在页面M(该页面M位于缓存队列R中且包含的LSN为K)时,则使用当前接收的页面M对已经存在的页面M进行更新,并将更新后的页面M放入该缓存队列R的尾部。值得注意的是,K和T均为大于或等于0的整数,且T大于K。
需要说明的是,在该GBP节点的页面缓冲区中在接收页面M之前不存在页面M时,该页面M将被放入哪一个缓存队列,可以通过哈希算法确定,也可以通过其他方法确定。
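上述“新页面写入缓存队列尾部、已存在的页面更新后移到尾部、缓冲区已满时淘汰头部页面并推进GBP起始点”的过程，可以用如下单个缓存队列的Python草图示意（队列容量与数据结构均为示意性假设，实际的GBP节点可以包括多个缓存队列）：

```python
from collections import OrderedDict


class CacheQueue:
    """一个缓存队列：从头部到尾部页面LSN递增；缓冲区满时淘汰头部页面。"""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # 页面ID -> LSN，保持写入顺序
        self.gbp_start_point = None  # 头部页面对应的LSN

    def put(self, page_id, lsn):
        if page_id in self.pages:
            # 页面已存在：用新页面更新对应页面，并放到队列尾部
            del self.pages[page_id]
        elif len(self.pages) >= self.capacity:
            # 页面缓冲区已满：淘汰位于队列头部的页面
            self.pages.popitem(last=False)
        self.pages[page_id] = lsn
        # 将GBP起始点维护为新的头部页面对应的LSN
        self.gbp_start_point = next(iter(self.pages.values()))


q = CacheQueue(capacity=3)
for pid, lsn in [("P1", 1), ("P2", 2), ("P3", 3), ("P4", 4)]:
    q.put(pid, lsn)              # 写入P4时队列已满，P1被淘汰
assert q.gbp_start_point == 2    # 新的头部页面是P2，其LSN为2
```

由于队列按写入顺序保存页面且LSN随修改顺序递增，头部页面的LSN天然就是该队列内最小的LSN。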
根据本实施例可知,页面在GBP节点的缓存队列中是按照顺序放入的,因此,位于所述GBP恢复点对应的重做日志和所述GBP结束点对应的重做日志之间的所有重做日志是位于主机发送给备机的所有重做日志中最后的一段重做日志。可知,采用本实施例,可以保证该备机在执行完回放步骤之后可以与该主机同步。
结合第一方面或第一方面的第一种至第四种可能的实现方式中任一种可能的实现方式,在第五种可能的实现方式下,在所述磁盘恢复点大于或等于所述GBP起始点,且所述磁盘结束点大于或等于所述GBP结束点时,所述备机还启动后台线程,所述后台线程用于将所述GBP节点上存储的所有页面拉取到所述备机的页面缓冲区。
可选的，所述后台线程用于通过第二数据传输协议将所述GBP节点上存储的所有页面拉取到所述备机的页面缓冲区。其中，该第二数据传输协议也是低时延且高吞吐的数据传输协议，因此该后台线程可以快速地将所述GBP节点上存储的所有页面拉取到所述备机。
可选的,所述备机是在执行回放的过程中启动后台线程的。也即该后台线程从所述GBP节点拉取页面到所述备机的页面缓冲区和回放可以并行做,从而可以节省时间,提高故障修复的效率。
需要说明的是，在备机将所述GBP节点上存储的所有页面拉取到所述备机的页面缓冲区后，所述备机还会将拉取到所述备机的页面缓冲区的页面与自己维护的页面进行比较，保留新的页面并丢弃旧的页面。
结合第一方面或第一方面的第一种至第五种可能的实现方式中任一种可能的实现方式，在第六种可能的实现方式下，在所述备机执行完回放步骤之后，以及在所述备机上的应用需要访问的页面还位于所述GBP节点的页面缓冲区时，则所述应用从所述GBP节点的页面缓冲区中读取所述需要访问的页面。
可选的,所述应用通过第二数据传输协议从所述GBP节点的页面缓冲区中读取所述需要访问的页面。
结合第一方面或第一方面的第一种至第六种可能的实现方式中任一种可能的实现方式,在第七种可能的实现方式下,在所述主机发生故障之后,以及在所述备机执行所述回放步骤之前,所述备机还从本地获取所述磁盘恢复点和所述磁盘结束点。所述备机获取所述磁盘恢复点和所述磁盘结束点的目的是为了判断是否可以使用前述实施例所述的故障修复方法。
结合第一方面或第一方面的第一种至第七种可能的实现方式中任一种可能的实现方式,在第八种可能的实现方式下,在所述主机正常工作期间,所述主机还将重做日志发送给所述备机。所述备机回放接收到的所述重做日志,得到对应的页面。
进一步地,所述备机还将得到的所述页面成批地刷到本地磁盘。
结合第一方面或第一方面的第一种至第八种可能的实现方式中任一种可能的实现方式,在第九种可能的实现方式下,所述主机启动页面发送线程,所述页面发送线程使用第一数据传输协议按照从发送队列的头部到尾部的顺序,将位于所述发送队列的多个页面成批发送到所述GBP节点。所述发送队列位于所述主机内,且从所述发送队列的头部到尾部,位于所述发送队列的多个页面所对应的LSN是递增的。
由于位于发送队列的多个页面各自对应的LSN递增，所以页面发送线程按照从发送队列的头部到尾部的顺序将所述多个页面发送给所述GBP节点，是为了保证所述GBP节点在接收页面时也是按照顺序接收的，具体的，先接收的页面的LSN小于后接收到的页面的LSN。在所述GBP节点将接收到的多个页面写入到所述GBP节点的缓存队列时，所述GBP节点可以按照接收页面的顺序将多个页面写入到所述GBP节点的缓存队列中，这样所述缓存队列内的多个页面各自的LSN按照从所述缓存队列的头部到尾部的顺序是递增的。也即，采用本方案，可以比较简单地实现位于缓存队列内的多个页面的LSN按照从所述缓存队列的头部到尾部的顺序递增。
结合第一方面的第九种可能的实现方式,在第十种可能的实现方式下,所述主机启动多个页面发送线程,所述多个页面发送线程与所述主机包括的多个发送队列是一对一的。本实施例的好处在于由于页面发送线程和发送队列是一对一的,因此操作起来比较简单且不容易出错。
结合第一方面的第九种或第十种可能的实现方式,在第十一种可能的实现方式下,所述GBP节点启动页面接收线程,所述页面接收线程成批地接收所述多个页面,并将所述多个页面写入所述GBP节点的缓存队列。
结合第一方面的第十一种可能的实现方式,在第十二种可能的实现方式下,所述GBP节点启动多个页面接收线程,所述多个页面接收线程与所述GBP节点包括的多个缓存队列是一对一的,且所述主机启动多个页面发送线程与所述GBP节点启动多个页面接收线程是一对一的。本实施例的好处在于由于页面接收线程和缓存队列是一对一的,并且页面发送线程和页面接收线程也是一对一的,因此操作起来比较简单且不容易出错。
第二方面,本申请提供了一种数据库系统。该数据库系统包括主机、备机和GBP节点。其中,主机与GBP节点之间通过第一数据传输协议通信连接。
所述主机用于将多个页面发送给所述GBP节点。所述GBP节点用于将所述多个页面写入到所述GBP节点的缓存队列。所述多个页面对应的LSN按照从所述缓存队列的头部到尾部的顺序递增。
在所述主机发生故障时,所述备机用于确定GBP起始点、GBP恢复点和GBP结束点。
值得注意的是,所述GBP起始点指示所述GBP节点上存储的所有页面中所包括的最小的日志序列号LSN。所述GBP恢复点指示所述GBP节点最近一次接收到的一批页面中所包括的最小的LSN。所述GBP结束点指示所述GBP节点最近一次接收到的所述一批页面中所包括的最大的LSN。
在磁盘恢复点大于或等于所述GBP起始点,且磁盘结束点大于或等于所述GBP结束点的情况下,所述备机还用于回放位于所述GBP恢复点所对应的重做日志和所述磁盘结束点所对应的重做日志之间的所有重做日志,以被提升为新的主机。
其中，所述磁盘恢复点指示所述备机的磁盘中最近一批被写入的多个页面所包含的最小的LSN。所述磁盘结束点指示所述备机所接收的最后一条重做日志的LSN。
结合前述实施例可知,在数据库系统的主机发生故障,以及在条件(所谓的条件是指所述磁盘恢复点大于或等于所述GBP起始点,且所述磁盘结束点大于或等于所述GBP结束点)满足时,备机通过回放位于所述GBP恢复点所对应的重做日志和所述磁盘结束点所对应的重做日志之间的一小部分重做日志,就可以被提升为新的主机。由上可知,在本实施例提供的数据库系统中,从主机发生故障,到新的主机产生,只需要很少的时间,所以利用该数据库系统可以提高业务的连续性。
结合第二方面,在第一种可能的实现方式下,位于所述磁盘恢复点所对应的重做日志和所述GBP恢复点所对应的重做日志之间的所有重做日志没有被回放。
本实施例是为了进一步明确,对于剩余的没有被回放的所有重做日志,所述备机只回放了一部分,而不再回放另一部分,因此所述备机和所述主机之间进行故障切换(failover)需要的时间比较少,进而该数据库系统进行故障修复需要的时间也比较少,或者说,故障修复的效率比较高。
结合第二方面或第二方面的第一种可能的实现方式,在第二种可能的实现方式下,在将所述多个页面写入到所述GBP节点的缓存队列之后,所述GBP节点还用于根据所述多个页面,更新所述GBP恢复点和所述GBP结束点。相应的,所述备机还用于从所述GBP节点中获取更新后的所述GBP恢复点和所述GBP结束点。由于是GBP节点将所述多个页面写入所述GBP节点的缓存队列的,因此所述GBP节点维护所述GBP恢复点和所述GBP结束点可以保证所述GBP恢复点和所述GBP结束点能够被及时更新。
结合第二方面或第二方面的第一种可能的实现方式,在第三种可能的实现方式下,在所述GBP节点接收到所述GBP节点的页面缓冲区中不存在的新页面且所述GBP节点的页面缓冲区已满时,所述GBP节点还用于将位于所述缓存队列头部的页面淘汰掉,并将所述GBP起始点更新为所述缓存队列的新的头部页面对应的LSN。相应的,所述备机还用于从所述GBP节点中获取更新后的所述GBP起始点。由于是所述GBP节点将所述多个页面写入所述GBP节点的缓存队列的,因此所述GBP节点维护所述GBP起始点可以保证所述GBP起始点能够被及时更新。
结合第二方面或第二方面的第一种可能的实现方式,在第四种可能的实现方式下,在所述GBP节点接收到所述GBP节点的页面缓冲区中不存在的新页面时,所述GBP节点还用 于将所述新页面放入所述缓存队列的尾部。或者,在所述GBP节点接收到所述GBP节点的页面缓冲区中已经存在的新页面时,所述GBP节点还用于根据接收到的所述新页面对已经存在的对应页面进行更新,并将更新后的所述新页面放在所述缓存队列的尾部。
由上可知,页面在GBP节点的缓存队列中是按照顺序放入的,因此,位于所述GBP恢复点对应的重做日志和所述GBP结束点对应的重做日志之间的所有重做日志是位于主机发送给备机的所有重做日志中最后的一段重做日志。所以采用本实施例,可以保证该备机在执行完回放步骤之后可以与该主机同步。
结合第二方面或第二方面的第一至第四种可能的实现方式中任一种实现方式，在第五种可能的实现方式下，在所述磁盘恢复点大于或等于所述GBP起始点，且所述磁盘结束点大于或等于所述GBP结束点时，所述备机还用于启动后台线程，所述后台线程用于将所述GBP节点上存储的所有页面拉取到所述备机的页面缓冲区。
可选的，所述后台线程用于通过第二数据传输协议将所述GBP节点上存储的所有页面拉取到所述备机的页面缓冲区。
由于备机只回放了部分重做日志,对于没有被回放的重做日志来说,之所以不需要回放,是因为它们对应的页面被存在GBP节点中。在本实施例中,通过从所述GBP节点上拉取所述GBP节点中的全部页面,可以保证将没被回放的重做日志对应的页面也拉取到所述备机,进而保证该备机可以实现与发生故障的主机完全同步。
结合第二方面或第二方面的第一至第五种可能的实现方式中任一种实现方式,在第六种可能的实现方式下,在所述主机发生故障之后,以及在所述备机回放所述重做日志之前,所述备机还用于获取(或从本地获取)所述磁盘恢复点和所述磁盘结束点。获取所述磁盘恢复点和所述磁盘结束点是为了判断执行回放的条件是否满足,只有在条件满足时,所述备机才能执行回放,或者说,本申请提供的数据库系统的故障修复效率才能被提高。
结合第二方面或第二方面的第一至第六种可能的实现方式中任一种实现方式,在第七种可能的实现方式下,在所述主机正常工作期间,所述主机还用于将重做日志发送给所述备机。相应的,所述备机还用于回放所述重做日志,得到对应的页面,并将所述页面成批地刷到本地磁盘。
结合第二方面或第二方面的第一至第七种可能的实现方式中任一种实现方式,在第八种可能的实现方式下,所述主机用于启动页面发送线程,所述页面发送线程使用第一数据传输协议按照从发送队列的头部到尾部的顺序,将位于所述发送队列的多个页面成批地发送给所述GBP节点,所述发送队列位于所述主机内,且从所述发送队列的头部到尾部,位于所述发送队列的多个页面所对应的LSN是递增的。
由于位于发送队列的多个页面各自对应的LSN递增，所以页面发送线程按照从发送队列的头部到尾部的顺序将所述多个页面发送给所述GBP节点，是为了保证所述GBP节点在接收页面时也是按照顺序接收的，具体的，先接收的页面的LSN小于后接收到的页面的LSN。在所述GBP节点将接收到的多个页面写入到所述GBP节点的缓存队列时，所述GBP节点可以按照接收页面的顺序将多个页面写入到所述GBP节点的缓存队列中，这样所述缓存队列内的多个页面各自的LSN按照从所述缓存队列的头部到尾部的顺序是递增的。也即，采用本方案，可以比较简单地实现位于缓存队列内的多个页面的LSN按照从所述缓存队列的头部到尾部的顺序递增。
结合第二方面的第八种可能的实现方式,在第九种可能的实现方式下,所述主机用于启动多个页面发送线程,所述多个页面发送线程与所述主机包括的多个发送队列是一对一的。本实施例的好处在于由于页面发送线程和发送队列是一对一的,因此操作起来比较简单且不容易出错。
结合第二方面或第二方面的第一至第九种可能的实现方式中任一种实现方式,在第十种可能的实现方式下,所述GBP节点用于启动页面接收线程,所述页面接收线程成批地接收所述多个页面,并将所述多个页面写入所述GBP节点的缓存队列。
结合第二方面的第十种可能的实现方式,在第十一种可能的实现方式下,所述GBP节点用于启动多个页面接收线程,所述多个页面接收线程与所述GBP节点包括的多个缓存队列是一对一的,且所述主机启动多个页面发送线程与所述GBP节点启动多个页面接收线程是一对一的。本实施例的好处在于由于页面接收线程和缓存队列是一对一的,并且页面发送线程和页面接收线程也是一对一的,因此操作起来比较简单且不容易出错。
第三方面,本申请提供另一种数据库系统的故障修复方法。具体的,该方法包括如下步骤。在主机发生故障时,确定全局页面缓冲池GBP起始点、GBP恢复点和GBP结束点。在磁盘恢复点大于或等于所述GBP起始点,且磁盘结束点大于或等于所述GBP结束点时,回放位于所述GBP恢复点所对应的重做日志和所述磁盘结束点所对应的重做日志之间的所有重做日志,以提升为新的主机。
值得注意的是,所述GBP起始点指示GBP节点上存储的所有页面中所包括的最小的日志序列号LSN。所述GBP恢复点指示所述GBP节点最近一次接收到的一批页面中所包括的最小的LSN。所述GBP结束点指示所述GBP节点最近一次接收到的所述一批页面中所包括的最大的LSN。
需要说明的是,所述GBP节点上存储的所有页面均是所述主机在正常工作期间,通过第一数据传输协议发送给所述GBP节点,并由所述GBP节点写入到所述GBP节点的缓存队列的,所述多个页面对应的LSN按照从所述缓存队列的头部到尾部的顺序递增。
其中,所述磁盘恢复点指示所述备机的磁盘中最近一批被写入的多个页面所包含的最小的LSN。所述磁盘结束点指示所述备机所接收的最后一条重做日志的LSN。
结合第三方面,在第一种可能的实现方式下,位于所述磁盘恢复点所对应的重做日志以及所述GBP恢复点所对应的重做日志之间的所有重做日志没有被回放。
结合第三方面或第三方面的第一种可能的实现方式,在第二种可能的实现方式下,在所述磁盘恢复点大于或等于所述GBP起始点,且所述磁盘结束点大于或等于所述GBP结束点时,所述方法还包括:启动后台线程,所述后台线程用于将所述GBP节点上存储的所有页面拉取到页面缓冲区。
可选的,所述后台线程用于通过第二数据传输协议将所述GBP节点上存储的所有页面拉取到页面缓冲区。
结合第三方面、第三方面的第一种可能的实现方式或第三方面的第二种可能的实现方式,在第三种可能的实现方式下,在执行完所述回放步骤之后,以及在需要被访问的页面还位于所述GBP节点的页面缓冲区时,所述方法还包括:从所述GBP节点的页面缓冲区中读取所述需要被访问的页面。
结合第三方面或第三方面的第一种至第三种可能的实现方式中任一种实现方式,在第四种可能的实现方式下,在所述主机发生故障之后以及在执行所述回放步骤之前,获取所述磁盘恢复点和所述磁盘结束点。
结合第三方面或第三方面的第一种至第四种可能的实现方式中任一种实现方式,在第五种可能的实现方式下,在所述主机正常工作期间,接收所述主机发送的重做日志,回放所述重做日志得到对应的页面,并将得到的页面成批地刷到本地磁盘。
值得注意的是，第三方面所述的故障修复方法的执行主体是第一方面所述的故障修复方法中的备机。第三方面的每一实施例均是从备机的角度描述的。由于第三方面所述的故障修复方法与第一方面所述的故障修复方法有很多相同或相似之处，因此关于第三方面的每一实施例的有益效果，均请参见第一方面的对应实施例所具有的有益效果，为了使本申请更简洁，针对第三方面的每一实施例均不再重复描述其有益效果。
第四方面,本申请还提供了一种计算设备,该计算设备包括确定单元和回放单元。在主机发生故障时,所述确定单元用于确定全局页面缓冲池GBP起始点、GBP恢复点和GBP结束点。在磁盘恢复点大于或等于所述GBP起始点,且磁盘结束点大于或等于所述GBP结束点时,所述回放单元用于回放位于所述GBP恢复点所对应的重做日志和所述磁盘结束点所对应的重做日志之间的所有重做日志。
需要解释的是,所述GBP起始点指示GBP节点上存储的所有页面中所包括的最小的日志序列号LSN。所述GBP恢复点指示所述GBP节点最近一次接收到的一批页面中所包括的最小的LSN。所述GBP结束点指示所述GBP节点最近一次接收到的所述一批页面中所包括的最大的LSN。所述磁盘恢复点指示所述备机的磁盘中最近一批被写入的多个页面所包含的最小的LSN。所述磁盘结束点指示所述备机所接收的最后一条重做日志的LSN。
值得注意的是,所述GBP节点上存储的所有页面均是所述主机在正常工作期间,通过第一数据传输协议发送给所述GBP节点,并由所述GBP节点写入到所述GBP节点的缓存队列的,所述多个页面对应的LSN按照从所述缓存队列的头部到尾部的顺序递增。
结合第四方面,在第一种可能的实现方式下,位于所述磁盘恢复点所对应的重做日志以及所述GBP恢复点所对应的重做日志之间的所有重做日志没有被回放。
结合第四方面或第四方面的第一种可能的实现方式,在第二种可能的实现方式下,该计算设备还包括启动单元。在所述磁盘恢复点大于或等于所述GBP起始点,且所述磁盘结束点大于或等于所述GBP结束点时,所述启动单元用于启动后台线程,所述后台线程用于将所述GBP节点上存储的所有页面拉取到页面缓冲区。
可选的,所述后台线程用于通过第二数据传输协议将所述GBP节点上存储的所有页面拉取到页面缓冲区。
结合第四方面、第四方面的第一种可能的实现方式或第四方面的第二种可能的实现方式,在第三种可能的实现方式下,该计算设备还包括读取单元。在所述回放单元回放完重做日志之后,以及在需要被访问的页面还位于所述GBP节点的页面缓冲区时,所述读取单元用于从所述GBP节点的页面缓冲区中读取所述需要被访问的页面。
结合第四方面或第四方面的第一种至第三种可能的实现方式中任一种实现方式,在第四种可能的实现方式下,在所述主机发生故障之后以及在所述回放单元执行回放步骤之前,所述确定单元还用于获取所述磁盘恢复点和所述磁盘结束点。
结合第四方面或第四方面的第一种至第四种可能的实现方式中任一种实现方式,在第五种可能的实现方式下,该计算设备还包括接收单元。在所述主机正常工作期间,所述接收单元用于接收所述主机发送的重做日志。对应的,所述回放单元用于回放所述重做日志得到对应的页面,并将得到的所述页面成批地刷到本地磁盘。
值得注意的是,第四方面所述的计算设备可以执行第三方面的每一实施例,且第四方面所述的计算设备可以实现第二方面所述的数据库系统中备机的功能。因此,第四方面的每一实施例的有益效果,请参见第二方面的对应实施例所具有的有益效果,针对第四方面的每一实施例的有益效果,本申请均不再重复描述。
第五方面,本申请提供了另一种计算设备,该计算设备至少包括处理器和存储器。所述存储器用于存储磁盘恢复点和磁盘结束点。在主机发生故障时,所述处理器用于确定GBP起始点、GBP恢复点和GBP结束点。在磁盘恢复点大于或等于所述GBP起始点,且磁盘结束点大于或等于所述GBP结束点时,所述处理器还用于回放位于所述GBP恢复点所对应的重做日志和所述磁盘结束点所对应的重做日志之间的所有重做日志。
值得注意的是,所述GBP起始点指示GBP节点上存储的所有页面中所包括的最小的LSN。所述GBP恢复点指示所述GBP节点最近一次接收到的一批页面中所包括的最小的LSN。所述GBP结束点指示所述GBP节点最近一次接收到的所述一批页面中所包括的最大的LSN。所述磁盘恢复点指示所述备机的磁盘中最近一批被写入的多个页面所包含的最小的LSN。所述磁盘结束点指示所述备机的内存上的最后一条重做日志的LSN。
需要说明的是,所述GBP节点上存储的所有页面均是所述主机在正常工作期间,通过第一数据传输协议发送给所述GBP节点,并由所述GBP节点写入到所述GBP节点的缓存队列的,所述多个页面对应的LSN按照从所述缓存队列的头部到尾部的顺序递增。
结合第五方面,在第一种可能的实现方式下,位于所述磁盘恢复点所对应的重做日志以及所述GBP恢复点所对应的重做日志之间的所有重做日志没有被回放。
结合第五方面或第五方面的第一种可能的实现方式,在第二种可能的实现方式下,在所述磁盘恢复点大于或等于所述GBP起始点,且所述磁盘结束点大于或等于所述GBP结束点时,所述处理器还用于启动后台线程,所述后台线程用于将所述GBP节点上存储的所有页面拉取到页面缓冲区。
可选的,所述后台线程用于通过第二数据传输协议将所述GBP节点上存储的所有页面拉取到页面缓冲区。
结合第五方面、第五方面的第一种可能的实现方式或第五方面的第二种可能的实现方式，在第三种可能的实现方式下，在所述处理器回放完重做日志之后，以及在需要被访问的页面还位于所述GBP节点的页面缓冲区时，所述处理器用于从所述GBP节点的页面缓冲区中读取所述需要被访问的页面。
结合第五方面或第五方面的第一种至第三种可能的实现方式中任一种实现方式，在第四种可能的实现方式下，在所述主机发生故障之后以及在所述处理器执行回放步骤之前，所述处理器还用于获取所述磁盘恢复点和所述磁盘结束点。
结合第五方面或第五方面的第一种至第四种可能的实现方式中任一种实现方式,在第五种可能的实现方式下,所述计算设备还包括I/O接口。在所述主机正常工作期间,所述I/O接口用于接收所述主机发送的重做日志。对应的,所述处理器用于回放所述重做日志得到对应的页面,并将得到的所述页面成批地刷到本地磁盘。
应当知道的是,第五方面中每一实施例提供的计算设备可以执行第三方面的对应实施例所述的方法,且第五方面所述的计算设备与第四方面所述的计算设备可以实现相同的功能,也就是说,第五方面所述的计算设备也可以实现第二方面所述的数据库系统中备机的功能。因此,第五方面的每一实施例的有益效果,请参见第二方面的对应实施例所具有的有益效果,此处不再重复描述。
第六方面,本申请还提供了一种数据备份方法。该方法包括:在将重做日志传送给备机期间,使用远程直接内存访问RDMA协议将页面发送给GBP节点,以便在发生故障时,利用所述GBP节点中的页面进行故障修复。
在本实施例中,在将重做日志传送给备机期间,还使用RDMA协议将被修改页面发给GBP节点,用于在所述GBP节点上做备份。由于使用RDMA协议可以使得大部分被发送给备机的重做日志对应的被修改页面被发送给了GBP节点,因此在本机发生故障时,所述备机尚未回放的剩余重做日志包括两部分,第一部分重做日志是指位于所述磁盘恢复点所对应的重做日志和所述GBP恢复点所对应的重做日志之间的全部重做日志,第二部分重做日志是指位于所述GBP恢复点所对应的重做日志和所述磁盘结束点所对应的重做日志之间的全部重做日志。所述备机只需要回放该第二部分重做日志以得到对应的页面就可以实现故障修复了,因为第一部分重做日志对应的页面可以直接从所述GBP节点上拉取。可知,采用本实施例提供的数据备份方法能够提高故障修复效率。
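上述两部分重做日志的划分可以用如下Python草图示意（函数名以及LSN区间的开闭边界均为示意性假设）：

```python
def split_pending_redo_logs(lsns, disk_recovery_point,
                            gbp_recovery_point, disk_end_point):
    """把尚未回放的重做日志（以LSN列表表示）划分为两部分。

    第一部分：磁盘恢复点与GBP恢复点之间，对应页面已在GBP节点，可直接拉取；
    第二部分：GBP恢复点与磁盘结束点之间，需要备机回放。
    """
    part1 = [l for l in lsns if disk_recovery_point <= l < gbp_recovery_point]
    part2 = [l for l in lsns if gbp_recovery_point <= l <= disk_end_point]
    return part1, part2


# 示意数据：11条尚未回放的重做日志，LSN为100..110，GBP恢复点为108
part1, part2 = split_pending_redo_logs(list(range(100, 111)), 100, 108, 110)
assert part2 == [108, 109, 110]   # 备机只需回放这3条
```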
第七方面,本申请提供了一种计算设备,用于执行第六方面所述的数据备份方法。其中,该计算设备包括第一传输接口和第二传输接口。第一传输接口,用于将重做日志传送给备机。在所述第一传输接口将所述重做日志传送给所述备机期间,第二传输接口,用于基于远程直接内存访问RDMA协议将页面发送给GBP节点,以便在发生故障时,利用所述GBP节点中的页面进行故障修复。应当知道的是,将本实施例提供的计算设备用于数据库系统中,可以提高该数据库系统进行故障修复的效率。
第八方面,本申请提供了另一种故障修复方法。该方法包括如下步骤。
在主机正常工作时，所述主机使用第一数据传输协议将多个页面发送给全局页面缓冲池GBP节点。
所述GBP节点将所述多个页面写入所述GBP节点的缓存队列中。所述多个页面对应的日志序列号LSN按照从所述缓存队列的头部到尾部的顺序递增。
在所述主机发生故障时,所述主机确定GBP起始点、GBP恢复点和GBP结束点。
在磁盘恢复点大于或等于所述GBP起始点,且磁盘结束点大于或等于所述GBP结束点时,所述主机回放位于所述GBP恢复点对应的重做日志和所述磁盘结束点对应的重做日志之间的所有重做日志,以被重新拉起。
值得注意的是,所述GBP起始点指示所述GBP节点上存储的所有页面中所包括的最小的LSN。所述GBP恢复点指示所述GBP节点最近一次接收的一批页面中所包括的最小的LSN。所述GBP结束点指示所述GBP节点最近一次接收到的所述一批页面中所包括的最大的LSN。所述磁盘恢复点指示本地磁盘最近一批被写入的多个页面中所包括的最小的LSN。所述磁盘结束点指示所接收到的最后一条重做日志的LSN。
需要解释的是,第八方面对应的实施例不同于第一方面对应的实施例。在第一方面对应的实施例中,主机发生故障后,备机回放部分重做日志后被提升为新的主机,进而实现故障修复,这种故障修复实际是一种故障转移,因为经过故障修复后,原来的备机替代原来的主机执行原来的主机的功能。但是,在第八方面对应的实施例中,主机发生故障后, 该主机通过回放部分重做日志后被重新拉起,也即在该主机被故障修复后,该主机会继续执行以前的功能。简单来说,在第一方面对应的实施例中,在主机发生故障后,该主机和备机之间会进行切换,切换后的备机称为新的主机;但是在第八方面对应的实施例中,在主机发生故障后,该主机将会被重新拉起。
由上可知，在主机发生故障后，该主机仅回放位于该GBP恢复点对应的重做日志与该磁盘结束点对应的重做日志之间的全部重做日志，就能重新被拉起。对于位于该磁盘恢复点对应的重做日志和该GBP恢复点对应的重做日志之间的所有重做日志不再回放。简单来说，在本实施例中，在该主机发生故障后，该主机只回放很小的一部分。因此，本实施例提供的技术方案能够提高该数据库系统故障恢复的效率。
应当知道的是，所述主机存储重做日志。已知的是，每一个（增加、删除或修改）事务均将对应一个重做日志，在本实施例中，主机会将这些重做日志发送给备机，并将与重做日志对应的被修改页面发送给GBP节点。尤其需要注意的是，所述主机还会在本地备份这些重做日志，譬如一方面将这些重做日志发送给备机，一方面将它们缓存在所述主机的页面缓冲区，或者将它们刷到本地磁盘，以便在该主机发生故障时，回放这些重做日志中的部分重做日志，以实现该主机被重新拉起。
值得解释的是，在本实施例中，所述主机的故障为软件故障。
结合第八方面,在第一种可能的实现方式下,位于所述磁盘恢复点对应的重做日志和所述GBP恢复点对应的重做日志之间的所有重做日志没有被回放。
容易知道,在本实施例中,该主机跳过位于该磁盘恢复点所对应的重做日志和该GBP恢复点所对应的重做日志之间的所有重做日志,回放位于该GBP恢复点所对应的重做日志和该磁盘结束点所对应的重做日志之间的所有重做日志。换句话说,在本实施例中,该主机仅回放部分尚未回放完的日志而非全部,因此该数据库系统的故障修复效率会被提升。
结合第八方面或第八方面的第一种可能的实现方式,在第二种可能的实现方式下,在所述磁盘恢复点大于或等于所述GBP起始点,且所述磁盘结束点大于或等于所述GBP结束点时,所述故障修复方法还包括:所述主机启动后台线程,所述后台线程用于将位于所述GBP节点上的所有页面拉取到页面缓冲区。
应当知道的是,所述后台线程用于通过所述第一数据传输协议将位于所述GBP节点上的所有页面拉取到页面缓冲区。
可选的,该后台线程从所述GBP节点拉取页面到所述主机的页面缓冲区和所述回放步骤的执行可以并行做,从而可以节省时间,提高故障修复的效率。
需要说明的是，在主机将所述GBP节点上存储的所有页面拉取到所述主机的页面缓冲区后，所述主机还会将拉取到所述主机的页面缓冲区的页面与自己维护的页面进行比较，保留新的页面并丢弃旧的页面。
结合第八方面、第八方面的第一种可能的实现方式或第八方面的第二种可能的实现方式,在第三种可能的实现方式下,在所述主机执行完回放步骤之后,以及在需要被访问的页面还位于所述GBP节点的页面缓冲区时,所述故障修复方法还包括:所述主机从所述GBP节点的页面缓冲区中读取所述需要被访问的页面。
结合第八方面或第八方面的第一种至第三种可能的实现方式中任一种实现方式,在第四种可能的实现方式下,在所述主机发生故障之后,以及在执行所述回放步骤之前,所述故障修复方法还包括:所述主机从本地获取所述磁盘恢复点和所述磁盘结束点。
结合第八方面或第八方面的第一种至第四种可能的实现方式中任一种实现方式,在第五种可能的实现方式下,所述主机将多个页面发送给所述GBP节点,具体包括:所述主机启动页面发送线程,所述页面发送线程使用第一数据传输协议按照从发送队列的头部到尾部的顺序,将位于所述发送队列的多个页面成批地发送给所述GBP节点,从所述发送队列的头部到尾部,位于所述发送队列的多个页面所对应的LSN是递增的。
第九方面，本申请提供了一种数据库系统。该数据库系统包括主机和GBP节点。其中，所述主机用于通过第一数据传输协议向所述GBP节点发送多个页面。所述GBP节点用于将所述多个页面写入所述GBP节点的缓存队列。所述多个页面各自包含的日志序列号LSN按照从所述缓存队列的头部到尾部的顺序递增。
在所述主机发生故障时,所述主机还用于确定GBP起始点、GBP恢复点和GBP结束点。在磁盘恢复点大于或等于所述GBP起始点,且磁盘结束点大于或等于所述GBP结束点时,所述主机还用于回放位于所述GBP恢复点对应的重做日志和所述磁盘结束点对应的重做日志之间的所有重做日志。
所述GBP起始点指示所述GBP节点上存储的所有页面中所包括的最小的LSN。所述GBP恢复点指示所述GBP节点最近一次接收的一批页面中所包括的最小的LSN。所述GBP结束点指示所述GBP节点最近一次接收到的所述一批页面中所包括的最大的LSN。所述磁盘恢复点指示本地磁盘最近一批被写入的多个页面中所包括的最小的LSN,所述磁盘结束点指示所接收到的最后一条重做日志的LSN。
结合第九方面,在第一种可能的实现方式下,位于所述磁盘恢复点对应的重做日志和所述GBP恢复点对应的重做日志之间的所有重做日志没有被回放。
结合第九方面或第九方面的第一种可能的实现方式,在第二种可能的实现方式下,在所述磁盘恢复点大于或等于所述GBP起始点,且所述磁盘结束点大于或等于所述GBP结束点时,所述主机还用于启动后台线程,所述后台线程用于将位于所述GBP节点上的所有页面拉取到页面缓冲区。
结合第九方面、第九方面的第一种可能的实现方式或第九方面的第二种可能的实现方式,在第三种可能的实现方式下,在所述主机执行完回放步骤之后,以及在需要被访问的页面还位于所述GBP节点的页面缓冲区时,所述主机还用于从所述GBP节点的页面缓冲区中读取所述需要被访问的页面。
应当知道的是,第九方面中每一实施例提供的数据库系统可以执行第八方面的对应实施例所述的故障修复方法。因此,第九方面的每一实施例的有益效果,请参见第八方面的对应实施例所具有的有益效果,此处不再重复描述。
第十方面,本申请还提供了另一种数据库系统的故障修复方法。其中,该故障修复方法包括如下步骤。
在正常工作时,通过第一数据传输协议将多个页面发送给GBP节点。
在发生故障时,确定GBP起始点、GBP恢复点和GBP结束点。
在磁盘恢复点大于或等于所述GBP起始点,且磁盘结束点大于或等于所述GBP结束点时,回放位于所述GBP恢复点对应的重做日志和所述磁盘结束点对应的重做日志之间的所有重做日志。
值得注意的是,所述多个页面被写入所述GBP节点的缓存队列,所述多个页面对应的日志序列号LSN按照从所述缓存队列的头部到尾部的顺序递增。
需要解释的是,所述GBP起始点指示所述GBP节点上存储的所有页面中所包括的最小的LSN。所述GBP恢复点指示所述GBP节点最近一次接收的一批页面中所包括的最小的LSN。所述GBP结束点指示所述GBP节点最近一次接收到的所述一批页面中所包括的最大的LSN。所述磁盘恢复点指示本地磁盘最近一批被写入的多个页面中所包括的最小的LSN。所述磁盘结束点指示所接收到的最后一条重做日志的LSN。
结合第十方面,在第一种可能的实现方式下,位于所述磁盘恢复点对应的重做日志和所述GBP恢复点对应的重做日志之间的所有重做日志没有被回放。
容易知道,在本实施例中,该主机跳过位于该磁盘恢复点所对应的重做日志和该GBP恢复点所对应的重做日志之间的所有重做日志,回放位于该GBP恢复点所对应的重做日志和该磁盘结束点所对应的重做日志之间的所有重做日志。换句话说,在本实施例中,该主机仅回放部分尚未回放完的日志而非全部,因此该数据库系统的故障修复效率会被提升。
结合第十方面或第十方面的第一种可能的实现方式，在第二种可能的实现方式下，在所述磁盘恢复点大于或等于所述GBP起始点，且所述磁盘结束点大于或等于所述GBP结束点时，所述故障修复方法还包括：启动后台线程，所述后台线程用于将位于所述GBP节点上的所有页面拉取到页面缓冲区。
可选的,所述后台线程用于通过所述第一数据传输协议将位于所述GBP节点上的所有页面拉取到页面缓冲区。
可选的,该后台线程从所述GBP节点拉取页面到所述主机的页面缓冲区和所述回放步骤的执行可以并行做,从而可以节省时间,提高故障修复的效率。
需要说明的是，在主机将所述GBP节点上存储的所有页面拉取到所述主机的页面缓冲区后，所述主机还会将拉取到所述主机的页面缓冲区的页面与自己维护的页面进行比较，保留新的页面并丢弃旧的页面。
结合第十方面、第十方面的第一种可能的实现方式或第十方面的第二种可能的实现方式,在第三种可能的实现方式下,在执行完回放步骤之后,以及在需要被访问的页面还位于所述GBP节点的页面缓冲区时,所述故障修复方法还包括:从所述GBP节点的页面缓冲区中读取所述需要被访问的页面。
结合第十方面或第十方面的第一种至第三种可能的实现方式中任一种实现方式,在第四种可能的实现方式下,在所述主机发生故障之后,以及在执行所述回放步骤之前,所述故障修复方法还包括:从本地获取所述磁盘恢复点和所述磁盘结束点。
结合第十方面或第十方面的第一种至第四种可能的实现方式中任一种实现方式,在第五种可能的实现方式下,所述主机将多个页面发送给所述GBP节点,具体包括:所述主机启动页面发送线程,所述页面发送线程使用第一数据传输协议按照从发送队列的头部到尾部的顺序,将位于所述发送队列的多个页面成批地发送给所述GBP节点,从所述发送队列的头部到尾部,位于所述发送队列的多个页面所对应的LSN是递增的。
值得注意的是,第十方面所述的故障修复方法的执行主体是第八方面所述的故障修复方法中的主机。第十方面的每一实施例均是从主机的角度描述的。由于第十方面所述的故障修复方法与第八方面所述的故障修复方法有很多相同或相似之处,因此关于第十方面的每一实施例的有益效果,均请参见第八方面的对应实施例所具有的有益效果,此处不再赘述。
第十一方面,本申请提供另一种计算设备。该计算设备至少包括传输单元、确定单元和回放单元。
所述传输单元,用于通过第一数据传输协议将多个页面发送给GBP节点。
在发生故障时,所述确定单元用于确定GBP起始点、GBP恢复点和GBP结束点。
在磁盘恢复点大于或等于所述GBP起始点,且磁盘结束点大于或等于所述GBP结束点时,所述回放单元用于回放位于所述GBP恢复点对应的重做日志和所述磁盘结束点对应的重做日志之间的所有重做日志。
值得注意的是,所述多个页面被写入所述GBP节点的缓存队列,所述多个页面对应的LSN按照从所述缓存队列的头部到尾部的顺序递增。
需要解释的是,所述GBP起始点指示所述GBP节点上存储的所有页面中所包括的最小的LSN。所述GBP恢复点指示所述GBP节点最近一次接收的一批页面中所包括的最小的LSN。所述GBP结束点指示所述GBP节点最近一次接收到的所述一批页面中所包括的最大的LSN。所述磁盘恢复点指示本地磁盘最近一批被写入的多个页面中所包括的最小的LSN。所述磁盘结束点指示所接收到的最后一条重做日志的LSN。
结合第十一方面,在第一种可能的实现方式下,位于所述磁盘恢复点对应的重做日志和所述GBP恢复点对应的重做日志之间的所有重做日志没有被回放。
结合第十一方面或第十一方面的第一种可能的实现方式,在第二种可能的实现方式下,该计算设备还包括启动单元。在所述磁盘恢复点大于或等于所述GBP起始点,且所述磁盘结束点大于或等于所述GBP结束点时,所述启动单元用于启动后台线程,所述后台线程用于将位于所述GBP节点上的所有页面拉取到页面缓冲区。
可选的，所述后台线程通过所述第一数据传输协议将位于所述GBP节点上的所有页面拉取到页面缓冲区。
结合第十一方面、第十一方面的第一种可能的实现方式或第十一方面的第二种可能的实现方式,在第三种可能的实现方式下,该计算设备还包括读取单元。在所述回放步骤执行完之后,以及在需要被访问的页面还位于所述GBP节点中时,所述读取单元用于从所述GBP节点中读取所述需要被访问的页面。
应当知道的是,第十一方面中每一实施例提供的计算设备可以执行第十方面的对应实施例所述的故障修复方法,且可以实现如第八方面所述的数据库系统中主机的功能。如前所述,第十方面的每一实施例的有益效果,均可以参见第八方面的对应实施例所具有的有益效果。因此,第十一方面的每一实施例的有益效果,也请参见第八方面的对应实施例所具有的有益效果。
第十二方面,本申请提供了另一种计算设备,该计算设备至少包括存储器和处理器。其中,所述存储器用于存储GBP起始点、GBP恢复点、GBP结束点、磁盘恢复点和磁盘结束点。
在正常工作时,所述处理器用于通过第一数据传输协议将多个页面发送给GBP节点。在发生故障时,所述处理器用于确定GBP起始点、GBP恢复点和GBP结束点。
在磁盘恢复点大于或等于所述GBP起始点,且磁盘结束点大于或等于所述GBP结束点时,所述处理器用于回放位于所述GBP恢复点对应的重做日志和所述磁盘结束点对应的重做日志之间的所有重做日志。
值得注意的是,所述多个页面被写入所述GBP节点的缓存队列,所述多个页面对应的LSN按照从所述缓存队列的头部到尾部的顺序递增。
需要解释的是,所述GBP起始点指示所述GBP节点上存储的所有页面中所包括的最小的LSN。所述GBP恢复点指示所述GBP节点最近一次接收的一批页面中所包括的最小的LSN。所述GBP结束点指示所述GBP节点最近一次接收到的所述一批页面中所包括的最大的LSN。所述磁盘恢复点指示本地磁盘最近一批被写入的多个页面中所包括的最小的LSN。所述磁盘结束点指示所接收到的最后一条重做日志的LSN。
结合第十二方面,在第一种可能的实现方式下,位于所述磁盘恢复点对应的重做日志和所述GBP恢复点对应的重做日志之间的所有重做日志没有被回放。
结合第十二方面或第十二方面的第一种可能的实现方式,在第二种可能的实现方式下,在所述磁盘恢复点大于或等于所述GBP起始点,且所述磁盘结束点大于或等于所述GBP结束点时,所述处理器还用于启动后台线程,所述后台线程用于将位于所述GBP节点上的所有页面拉取到页面缓冲区。
可选的,所述后台线程通过所述第一数据传输协议将位于所述GBP节点上的所有页面拉取到页面缓冲区。
结合第十二方面、第十二方面的第一种可能的实现方式或第十二方面的第二种可能的实现方式,在第三种可能的实现方式下,在所述回放步骤执行完之后,以及在需要被访问的页面还位于所述GBP节点中时,所述处理器还用于从所述GBP节点中读取所述需要被访问的页面。
应当知道的是,第十二方面中每一实施例提供的计算设备可以执行第十方面的对应实施例所述的故障修复方法,且可以实现如第八方面所述的数据库系统中主机的功能。如前所述,第十方面的每一实施例的有益效果,均可以参见第八方面的对应实施例所具有的有益效果。因此,第十二方面的每一实施例的有益效果,也请参见第八方面的对应实施例所具有的有益效果。
第十三方面,本申请提供了另一种数据备份方法。该数据备份方法包括如下步骤。
通过RDMA协议接收主机发送的多个页面。
将所述多个页面写入缓存队列中,所述多个页面对应的LSN按照从所述缓存队列的头部到尾部的顺序递增。
根据所述多个页面中每个页面包含的LSN,维护GBP起始点、GBP恢复点和GBP结束点,以便在所述主机发生故障的时候,能够根据所述GBP起始点、所述GBP恢复点和所述GBP结束点进行故障修复。
其中,所述GBP起始点指示内存中存储的所有页面中所包括的最小的LSN。所述GBP恢复点指示最近一次接收到的一批页面中所包括的最小的LSN。所述GBP结束点指示最近一次接收到的一批页面中所包括的最大的LSN。
需要说明的是,本实施例所述的数据备份方法的执行主体是第九方面所述的数据库系统中的GBP节点。在本实施例中,由于是通过RDMA协议接收主机发送的被修改页面的,因此可以认为主机几乎将全部的被修改页面发送给了GBP节点,这样在主机发生故障时,就不需要通过回放所有的剩余重做日志得到对应的页面了,因为大部分重做日志对应的页面在所述GBP节点中已经存在。因此,采用本实施例提供的数据备份方法能够提高故障修复效率。
结合第十三方面,在第一种可能的实现方式下,所述维护GBP起始点,则所述数据备份方法还包括:在接收到页面缓冲区内不存在的新页面且该页面缓冲区已满时,将位于所述缓存队列头部的页面淘汰掉,将所述新页面放入所述缓存队列的尾部,并将GBP起始点更新为新的位于所述缓存队列头部的页面对应的LSN。
在本实施例中,在将新页面写入所述GBP节点的缓存队列之后,更新所述GBP起始点,可以保证所述GBP起始点被及时更新。
结合第十三方面,在第二种可能的实现方式下,所述维护GBP恢复点和GBP结束点,则所述数据备份方法还包括:在接收到新页面后,根据所述新页面所对应的LSN,更新所述GBP恢复点和所述GBP结束点。
在本实施例中,在接收到新页面并将所述新页面写入所述GBP节点的缓存队列之后,更新所述GBP恢复点和所述GBP结束点,可以保证所述GBP恢复点和所述GBP结束点被及时更新。
结合第十三方面，在第三种可能的实现方式下，所述数据备份方法还包括：在接收到页面缓冲区内不存在的新的页面时，则将所述新的页面放入所述缓存队列的尾部。或，所述数据备份方法还包括：在接收到页面缓冲区内已经存在的新的页面时，则使用所述新的页面对已经存在的对应页面进行更新，并将更新后的页面放入所述缓存队列的尾部。
根据本实施例可知,页面在GBP节点的缓存队列中是按照顺序放入的,因此,位于所述GBP恢复点对应的重做日志和所述GBP结束点对应的重做日志之间的所有重做日志是位于主机发送给备机的所有重做日志中最后的一段重做日志。
结合第十三方面或第十三方面的第一种至第三种可能的实现方式中任一种实现方式,在第四种可能的实现方式下,所述数据备份方法还包括:接收多个重做日志,并通过回放所述多个重做日志,得到与所述多个重做日志中每一重做日志对应的页面。
可选的,在得到与所述多个重做日志中每一重做日志对应的页面之后,还将得到的页面成批地刷到本地磁盘。
可知,在本实施例中,该GBP节点还可以实现备机的功能。
第十四方面,本申请提供了另一种计算设备,该计算设备至少包括接收单元、写入单元和维护单元。
所述接收单元用于通过RDMA协议接收多个页面。
所述写入单元用于将所述多个页面写入缓存队列。所述多个页面对应的LSN按照从所述缓存队列的头部到尾部的顺序递增。
所述维护单元用于根据所述多个页面中每个页面包含的LSN,维护GBP起始点、GBP恢复点和GBP结束点,以便在所述主机发生故障的时候,能够根据所述GBP起始点、所述GBP恢复点和所述GBP结束点进行故障修复。
其中,所述GBP起始点指示内存中存储的所有页面中所包括的最小的LSN。所述GBP恢复点指示最近一次接收到的一批页面中所包括的最小的LSN。所述GBP结束点指示最近一次接收到的一批页面中所包括的最大的LSN。
结合第十四方面,在第一种可能的实现方式下,在接收到页面缓冲区内不存在的新的页面且该页面缓冲区已满时,所述维护单元还用于将位于所述缓存队列头部的页面淘汰掉,并将GBP起始点更新为新的位于所述缓存队列头部的页面对应的LSN。
结合第十四方面,在第二种可能的实现方式下,在接收到新的页面时,所述维护单元还用于根据所述新的页面对应的LSN更新所述GBP恢复点和所述GBP结束点。
结合第十四方面，在第三种可能的实现方式下，在接收到页面缓冲区内不存在的新的页面时，所述写入单元还用于将所述新的页面放入所述缓存队列的尾部。在接收到页面缓冲区内已经存在的新的页面时，所述写入单元还用于使用所述新的页面对已经存在的对应页面进行更新，并将更新后的页面放入所述缓存队列的尾部。
结合第十四方面或第十四方面的第一种至第三种可能的实现方式中任一种实现方式，在第四种可能的实现方式下，该计算设备还包括回放单元。值得注意的是，所述接收单元还用于接收多个重做日志。则相应的，所述回放单元还用于回放所述多个重做日志，得到与所述多个重做日志中每一重做日志对应的页面。
应当知道的是,第十四方面中每一实施例提供的计算设备可以执行第十三方面的对应实施例所述的数据备份方法。因此,第十四方面的每一实施例的有益效果,可以参见第十三方面的对应实施例所具有的有益效果。
第十五方面,本申请提供了另一种计算设备,该计算设备至少包括I/O接口和处理器。所述I/O接口用于通过RDMA协议接收多个页面。
所述处理器用于将所述多个页面依次写入缓存队列中,根据所述多个页面中每个页面包含的LSN,维护GBP起始点、GBP恢复点和GBP结束点,以便在所述主机发生故障的时候,能够根据所述GBP起始点、所述GBP恢复点和所述GBP结束点进行故障修复。
值得注意的是，所述多个页面对应的LSN按照从所述缓存队列的头部到尾部的顺序递增。
需要说明的是,所述GBP起始点指示内存中存储的所有页面中所包括的最小的LSN。所述GBP恢复点指示最近一次接收到的一批页面中所包括的最小的LSN。所述GBP结束点指示最近一次接收到的一批页面中所包括的最大的LSN。
结合第十五方面,在第一种可能的实现方式下,在接收到页面缓冲区内不存在的新的页面且该页面缓冲区已满时,所述处理器还用于将位于所述缓存队列头部的页面淘汰掉,并将GBP起始点更新为新的位于所述缓存队列头部的页面对应的LSN。
结合第十五方面,在第二种可能的实现方式下,在接收到新的页面时,所述处理器还用于根据所述新的页面对应的LSN更新所述GBP恢复点和所述GBP结束点。
结合第十五方面，在第三种可能的实现方式下，在接收到页面缓冲区内不存在的新的页面时，所述处理器还用于将所述新的页面放入所述缓存队列的尾部。在接收到页面缓冲区内已经存在的新的页面时，所述处理器还用于使用所述新的页面对已经存在的对应页面进行更新，并将更新后的页面放入所述缓存队列的尾部。
结合第十五方面或第十五方面的第一种至第三种可能的实现方式中任一种实现方式，在第四种可能的实现方式下，值得注意的是，所述I/O接口还用于接收多个重做日志。相应的，所述处理器还用于回放所述多个重做日志，得到与所述多个重做日志中每一重做日志对应的页面，并将得到的页面成批地刷到本地磁盘。
应当知道的是，第十五方面中每一实施例提供的计算设备可以执行第十三方面的对应实施例所述的数据备份方法。因此，第十五方面的每一实施例的有益效果，可以参见第十三方面的对应实施例所具有的有益效果。
附图说明
图1是一种数据库系统的架构图。
图2是本申请提供的一个数据库系统的架构图。
图3示出了checkpoint机制的示意图。
图4是本申请提供的一种数据库系统的故障修复方法的流程图。
图5是主机向GBP节点发送页面时的结构示意图。
图6是GBP节点内的一个缓存队列的示意图。
图7A-7C是主机向GBP节点发送页面时GBP起始点、GBP恢复点和GBP结束点的变化过程图。
图8是从GBP节点拉取页面到备机的结构图。
图9是本申请提供的数据库系统的故障修复方法中重做日志的分布结构图。
图10是本申请提供的另一种数据库系统的故障修复方法的流程图。
图11A是本申请提供的一种计算设备的结构图。
图11B是本申请提供的另一种计算设备的结构图。
图12是本申请提供的另一种计算设备的结构图。
图13是本申请提供的另一种计算设备的结构图。
图14是本申请提供的另一种数据库系统的架构图。
图15是本申请提供的另一种数据库系统的故障修复方法的流程图。
图16是本申请提供的另一种数据库系统的故障修复方法的流程图。
图17是本申请提供的另一种计算设备的结构图。
图18是本申请提供的另一种计算设备的结构图。
图19是本申请提供的一种数据备份方法的流程图。
图20是本申请提供的另一种计算设备的结构图。
图21是本申请提供的另一种计算设备的结构图。
具体实施方式
在对本申请所述的实施例进行描述之前,首先对本申请文件中将要出现的部分名词进行解释。
WAL协议:又称为预写重做日志,为了保证事务修改的持久性和一致性,先将重做日志顺序落盘以保证修改页面的持久性,重做日志落盘后即使主机宕机,备机也能通过回放重做日志将备机恢复到与宕机前的主机一致的状态。
脏页面：位于数据缓冲区（data buffer）的页面，如果从磁盘读上来以后被修改，则这样的页面被称为脏页面。脏页面是数据缓冲区内的概念，在本申请中，被修改的页面位于主机的数据缓冲区时被称为脏页面，而从所述主机被写入到全局页面缓冲池（global buffer pool，GBP）节点的页面被称为是被修改页面。
复原时间目标（recovery time objective，RTO）：是指客户容许服务中断的时间长度，比如说灾难发生后半天内便需要恢复服务，则RTO就是十二小时。
日志序列号LSN：每一条日志具有唯一的一个LSN，或者说，日志和LSN是一对一的，因此根据LSN能够唯一地确定出一条日志。需要说明的是，由于每一条日志会对应一个被修改页面（也即主机发送给GBP节点的页面，后面简称为页面），所以每一页面也只包括一个LSN，且页面和LSN也是一对一的关系，所以本申请中提及的“页面对应的LSN”或“页面包含的LSN”或“页面具有的LSN”均是相同的含义。
磁盘恢复点:本地磁盘内最近一批被写入的数据页所包含的最小的日志序列号LSN。
磁盘结束点:本地磁盘上最后一条重做日志的LSN。
本申请的一个实施例提供了第一种数据库系统的故障修复方法(简称为“第一种故障修复方法”)。具体的,该第一种故障修复方法可以应用于如图2所示的数据库系统中。如图2所示,该数据库系统包括主机(master)210、GBP节点220和备机(standby)230。其中,主机210和GBP节点220之间通过第一数据传输协议进行数据传输。
需要说明的是,该第一数据传输协议是低时延且高吞吐的数据传输协议。可选的,该第一数据传输协议为远程直接内存访问(remote direct memory access,RDMA)协议。在这种情况下,该主机210具有支持RDMA协议的万兆以太网卡或者无限带宽(infiniBand)网卡。
RDMA协议具有低时延(例如时延小于或等于10μs)和不需要CPU直接参与的特点。在本实施例中,主机210中被修改的页面,可以基于RDMA协议被远程写入GBP节点220的页面缓冲区(或内存)中。
需要说明的是,该被修改的页面是以远程原子写入的方式被写入到该GBP节点220的页面缓冲区中的。也就是说,该被修改的页面是以原子为单位被写入该GBP节点220的。一个原子通常包括多个被修改的页面,因此,在多个被修改的页面凑够一个原子后,才会被写入该GBP节点220的页面缓冲区。
另外,由于源于主机210且被写入GBP节点220的页面一定是该主机210中被修改的页面,所以为了简化表述,在本申请的很多地方,将这种页面直接简称为页面。
可选的，该第一数据传输协议还可以为40G以太网（40GE）。
在对本实施例进行描述之前，首先需要明确什么是检查点（checkpoint）。检查点是一个数据库事件，它存在的根本意义在于减少崩溃恢复（crash recovery）时间。数据库本身都具有检查点（checkpoint）机制，基于checkpoint机制，通过一个或多个后台线程，持续不断地将脏页面从内存刷到本地磁盘，受本地磁盘本身速度的限制，脏页面从内存刷到本地磁盘的速度会比较慢。落盘的最后一个页面对应于磁盘恢复点，由于脏页面落盘的速度比较慢，所以导致磁盘恢复点对应的重做日志和磁盘结束点对应的重做日志之间存在大量的重做日志，且这批重做日志对应的脏页面都没有落盘。在主机发生故障需要恢复的时候，由于大量的重做日志对应的脏页面没有落盘，因此该大量的重做日志均需要回放。容易知道，检查点的主要用处是通过不断的把脏页面刷到本地磁盘，推进磁盘恢复点，来达到在数据库崩溃且需要恢复时，减少需要恢复的重做日志量，降低RTO的目的。
请参见附图3，它示出了checkpoint机制在数据库发生故障需要修复时所起的作用。具体的，如图3所示，页面P1由V0版本依次被修改为V1、V2和V3版本，从V0修改到V1，对应生成的重做日志是log 1；从V1修改到V2，对应生成的重做日志是log 2；从V2修改到V3，对应生成的重做日志是log 3。根据WAL协议，在每一修改事务被提交（commit）时，主机只会将对应的重做日志刷到本地磁盘，而页面刷盘则是在后台完成的。假设P1的V0版本刷盘之后，节点发生故障需要恢复，如图3所示，那么磁盘恢复点对应的重做日志就是log 0，于是需要从log 0开始依次回放发生在log 0之后的全部重做日志，假设需要依次回放的重做日志为log 1、log 2和log 3，则在需要回放的全部重做日志回放完成之后，P1的版本将会被恢复到V3，也就是将页面恢复到故障之前的状态。
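图3所示的回放过程可以用如下Python草图复现（页面版本与日志内容均为示意数据，数据结构为示意性假设）：

```python
def recover_page(version_on_disk, redo_logs, disk_recovery_point):
    """从落盘的页面版本出发，按LSN顺序回放磁盘恢复点之后的全部重做日志。

    redo_logs: LSN -> 回放该日志后页面的版本（示意结构）。
    """
    version = version_on_disk
    for lsn, new_version in sorted(redo_logs.items()):
        if lsn > disk_recovery_point:   # 仅回放位于磁盘恢复点之后的日志
            version = new_version
    return version


# P1的V0版本落盘（对应log 0，LSN=0），之后log 1/2/3分别把P1改为V1/V2/V3
assert recover_page("V0", {1: "V1", 2: "V2", 3: "V3"}, 0) == "V3"
```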
可选的,在本实施例中,主机210和备机230之间是一种无共享(shared nothing)架构。无共享架构是一种分布式计算架构,这种架构中的每一个节点(node)都是独立,也即每个节点都有自己的CPU/内存/硬盘等,不存在共享资源。
值得注意的是,在本实施例涉及的数据库系统中,能够实现该数据库系统快速恢复的关键装置是GBP节点220。该GBP节点可以是安装有能够实现全局页面缓存功能的应用程序的设备。为了表述方便,下文将“能够实现全局页面缓存功能的应用程序”简称为“目标应用”。在本实施例中,该目标应用可以被部署在除该主机210和该备机230之外的其他任一设备上,则该部署有目标应用的其他任一设备就是该GBP节点220。值得注意的是,在本实施例中,还需要根据部署有该目标应用的设备的位置,配置所述主机210将被修改页面写入何处,以及所述备机230将从何处获取页面。
在本实施例中,主机210和备机230建立关系后,都会分别根据各自的配置信息连接到GBP节点上。其中,主机210和GBP节点220之间通过第一数据传输协议连通。在主机210正常工作的过程中,备机230与主机210之间以及GBP节点220与主机210之间均需要保持心跳。在主机210发生故障(或崩溃)导致该数据库系统故障时,主机210和备机230之间将会进行故障切换,在该故障切换之后,备机230将被提升为新的主机,从而实现该数据库系统的故障修复。
下面将详细描述本实施例所述的第一种故障修复方法。请参见附图4,它示出了该第一种故障修复方法的流程示意图。具体的,该第一种故障修复方法包括下述步骤。
S101、在所述主机正常工作时,所述主机使用第一数据传输协议将多个页面发送给所述GBP节点。
在本实施例中,在所述主机正常工作期间,所述主机还将所有修改事务对应的重做日志发送给所述备机。对应的,所述备机通过回放这些重做日志得到对应的页面,并将这些页面成批地刷到所述备机的本地磁盘。
需要说明的是,所述重做日志也是成批地从所述主机传送到所述备机的,例如一批重做日志可以为8MB。
在本实施例所述的数据库系统包括多个备机时,通常所述主机需要将重做日志发送给大于N/2(向上取整)个备机,N为大于1的整数。
作为本实施例的一种具体实现方式,所述主机启动页面发送线程,且所述页面发送线程使用所述第一数据传输协议将位于发送队列内的多个页面,按照从头部到尾部的顺序,成批地发送给所述GBP节点。其中,所述发送队列位于所述主机内,且从所述发送队列的头部到尾部,位于所述发送队列的多个页面所对应的LSN是递增的。
进一步地,所述主机可以启动多个页面发送线程,且所述多个页面发送线程与所述主机包括多个发送队列是一对一的。
值得注意的是,在所述主机包括多个发送队列时,被修改页面将会被放入哪一个发送队列可以根据哈希算法确定。而被放入同一发送队列(例如发送队列Q)的多个页面,可以根据该多个页面被修改的先后顺序,将该多个页面放入发送队列Q中。具体的,在同一发送队列中,从头部到尾部,多个页面的LSN递增。或者说,在同一发送队列中,先被修改的页面位于后被修改的页面的前面。应当知道的是,该多个页面各自的LSN也是根据该 多个页面被修改的先后顺序确定的,其中,先被修改的页面的LSN小于后被修改的页面的LSN。
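被修改页面按哈希分配到发送队列、且同一发送队列内从头部到尾部LSN递增的过程，可以用如下Python草图示意（队列数量与哈希方式均为示意性假设，此处以内置hash函数代表哈希算法）：

```python
def enqueue_modified_pages(pages, num_queues):
    """pages: [(页面ID, LSN), ...]，按被修改的先后顺序（LSN递增）依次到来。

    同一页面固定落入同一发送队列；由于入队顺序即修改顺序，
    每个发送队列内从头部到尾部LSN递增。
    """
    queues = [[] for _ in range(num_queues)]
    for page_id, lsn in pages:
        idx = hash(page_id) % num_queues   # 示意性的哈希选队
        queues[idx].append((page_id, lsn))
    return queues


pages = [("P1", 1), ("P2", 2), ("P3", 3), ("P1", 4)]
for queue in enqueue_modified_pages(pages, num_queues=2):
    lsns = [lsn for _, lsn in queue]
    assert lsns == sorted(lsns)            # 每个发送队列内LSN递增
```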
S102、所述GBP节点将所述多个页面写入到所述GBP节点的缓存队列。
其中,所述多个页面各自对应的LSN按照从所述缓存队列的头部到尾部的顺序递增。
其中,所述GBP节点的页面缓冲区中包括一个或多个缓存队列。每一缓存队列内均具有多个页面,并且位于同一缓存队列内的多个页面,按照写入该缓存队列的顺序(或者,按照从该缓存队列的头部到尾部的顺序),该多个页面各自包括的LSN越来越大。
可选的,所述GBP节点启动页面接收线程,所述页面接收线程成批地接收所述多个页面,并将所述多个页面写入所述GBP节点的缓存队列。
进一步地,所述GBP节点可以启动多个页面接收线程,且所述多个页面接收线程与所述GBP节点包括的多个缓存队列是一对一的。
再进一步地,所述主机启动的多个页面发送线程与所述GBP节点启动的多个页面接收线程是一对一的。在这种情况下,容易知道,位于所述主机上的发送队列和位于所述GBP节点上的缓存队列也是一对一的,位于每个发送队列内的页面,在被对应的页面发送线程发送以及被对应的页面接收线程接收后,将被存入对应的缓存队列中。如图5所示,主机200中具有发送队列1-3,还启动了页面发送线程1-3,其中,页面发送线程1用于发送位于发送队列1中的页面,页面发送线程2用于发送位于发送队列2中的页面,页面发送线程3用于发送位于发送队列3中的页面。进一步地,如图5所示,GBP节点300中启动了页面接收线程1-3,另外还具有缓存队列1-3,其中,页面接收线程1接收的页面将放入缓存队列1中,页面接收线程2接收的页面将放入缓存队列2中,页面接收线程3接收的页面将放入缓存队列3中。则在图5对应的实施例中,位于发送队列1内的页面,被页面发送线程1发送给页面接收线程1之后,将被放入缓存队列1中。位于发送队列2内的页面,被页面发送线程2发送给页面接收线程2之后,将被放入缓存队列2中。或位于发送队列3内的页面,被页面发送线程3发送给页面接收线程3之后,将被放入缓存队列3中。
在本实施例中,由于所述主机通过第一数据传输协议将被修改页面写入所述GBP节点的速率,要远大于,所述备机通过回放重做日志生成对应的被修改页面并将被修改页面刷入本地磁盘的速率。因此所述GBP节点内存储的被修改页面的数量要远大于所述备机的本地磁盘中被刷入的被修改页面,从而在所述主机发生故障且需要修复所述数据库系统时,对于第一部分页面,可以直接从所述GBP节点拉取到所述备机的页面缓冲区,而所述备机仅需要回放对应于第二部分页面的重做日志并得到该第二部分页面即可。因此,采用本实施例,可以提高该数据库系统的修复效率。
需要注意的是,为了表述的更加清楚简洁,在本申请中将位于包含所述磁盘恢复点的页面和包含所述磁盘结束点的页面之间的全部页面分为第一部分页面和第二部分页面。具体的,第一部分页面是指位于包含所述磁盘恢复点的页面和包含所述GBP恢复点的页面之间的全部页面,或者,是指位于所述磁盘恢复点对应的重做日志和所述GBP恢复点对应的重做日志之间的所有重做日志对应的被修改页面。第二部分页面是指位于包含所述GBP恢复点的页面和包含所述磁盘结束点的页面之间的全部页面,或者,是指位于所述GBP恢复点对应的重做日志和所述磁盘结束点对应的重做日志之间的所有重做日志对应的被修改页面。
其中,所述第一部分页面可以包括包含所述磁盘恢复点的页面,也可以不包括包含所 述磁盘恢复点的页面。所述第一部分页面可以包括包含所述GBP恢复点的页面,也可以不包括包含所述GBP恢复点的页面。
在所述第一部分页面包括包含所述GBP恢复点的页面时,所述第二部分页面可以不包括包含所述GBP恢复点的页面,自然,也可以包括包含所述GBP恢复点的页面。在所述第一部分页面不包括包含所述GBP恢复点的页面时,所述第二部分页面包括包含所述GBP恢复点的页面。应当知道的是,所述第二部分页面是包括包含所述磁盘结束点的页面的。
S103、在所述主机发生故障时,所述备机确定GBP起始点、GBP恢复点和GBP结束点。
其中,所述GBP起始点指示所述GBP节点上存储的所有页面中所包括的最小的LSN。所述GBP恢复点指示所述GBP节点最近一次接收到的一批页面中所包括的最小的LSN。所述GBP结束点指示所述GBP节点最近一次接收到的所述一批页面中所包括的最大的LSN。
可选的,所述GBP节点本身维护了所述GBP起始点、所述GBP恢复点和所述GBP结束点,且所述备机是从所述GBP节点中获取这三个点的。
具体的,所述GBP节点在接收到新的页面后,会更新所述GBP恢复点和所述GBP结束点。
作为本实施例的一种具体实现方式,所述GBP节点在接收到新页面,且所述新页面在所述GBP节点的页面缓冲区中不存在,则将所述新页面放入所述缓存队列的尾部。
作为本实施例的另一种具体实现方式,所述GBP节点在接收到新页面,但所述新页面在所述GBP节点的页面缓冲区已经存在,则所述GBP节点会根据接收到的所述新页面对已经存在的对应页面进行更新,并将更新后的所述新页面放在所述缓存队列的尾部,或者说,所述GBP节点删除已经存在的对应页面,并将所述新页面放在所述缓存队列的尾部。
需要解释的是，所谓“新页面”是指所述GBP节点当前接收到的页面。例如，所述GBP节点当前接收到的页面是页面M，则页面M就是“新页面”。相应的，如果GBP节点的页面缓冲区中在接收页面M之前不存在该页面M，则将该页面M放入其中一个缓存队列的尾部。反之，如果GBP节点的页面缓冲区中在接收页面M之前已经存在页面M（该页面M位于缓存队列R中），不过该已经存在的页面M包含的LSN为K，当前接收的页面M包含的LSN为T。其中，K和T均为大于或等于0的整数，且T大于K。则使用当前接收的页面M对已经存在的页面M进行更新，并将更新后的页面M放入该缓存队列R的尾部。或者，丢弃掉已经存在的页面M，并将当前接收的页面M放入该缓存队列R的尾部。
应当知道的是,在该GBP节点的页面缓冲区中在接收页面M之前不存在页面M时,该页面M将被放入哪一个缓存队列,可以通过哈希算法确定,也可以通过其他方法确定。
在本实施例中,判断(或确定)所述新页面在所述GBP节点的页面缓冲区中是否已经存在可以是由所述GBP节点本身执行的,也可以是由所述主机执行的。
值得注意的是，在本实施例中，在所述GBP节点接收到所述GBP节点的页面缓冲区中不存在的新页面，且所述GBP节点的页面缓冲区已满时，所述GBP节点将位于所述缓存队列头部的页面淘汰掉，并将所述GBP起始点更新为所述缓存队列的新的头部页面对应的LSN。例如，所述GBP节点当前接收的页面为页面Y，且该页面Y在所述GBP节点的页面缓冲区中不存在，则所述GBP节点会淘汰掉位于所述缓存队列头部的页面，并将所述页面Y放入所述缓存队列的尾部，同时所述GBP起始点将会被更新（或推进）为所述缓存队列的新的头部页面对应的LSN。
需要说明的是,在所述备机从所述GBP节点中获取所述GBP起始点、所述GBP恢复点 和所述GBP结束点时,所述备机获取的是最近更新后的所述GBP起始点、所述GBP恢复点和所述GBP结束点。
通常所述GBP节点是成批（batch）接收来自所述主机的页面的，例如，一批页面可以最多包括100个页面，最少包括1个页面。例如，所述主机的后台线程是每5ms发送一批页面到所述GBP节点的，则如果所述主机内有M（M为大于100的整数）个页面待发送，则所述主机的后台线程会连续发送M/100（向上取整）次，如果所述主机内只有1个页面，则所述主机的后台线程就只发1个页面给所述GBP节点。
值得注意的是,所述GBP节点最近一次接收到的一批页面可以包括一个或多个页面。自然,在所述GBP节点最近一次接收到的一批页面包括多个页面时,该多个页面的数量不大于一次最多允许发送的页面的数量(例如100个页面)。
结合上述描述,容易知道,所述GBP节点在每接收到一批来自所述主机的页面后,是基于滑动窗口的缓存淘汰算法管理该批页面的,或者更准确地说,是基于滑动窗口的缓存淘汰算法管理位于所述GBP节点的缓存队列中的所有页面的。具体的,假设所述GBP节点的缓存队列是一个的窗口(如图6所示,它示出的是所述GBP节点内的一个缓存队列的示意图,例如为缓存队列1),则在所述GBP节点接收到新页面需要被写入到缓存队列1且缓存队列1没满时,需要维护的是缓存队列1的右边界;而在所述GBP节点接收到新页面需要被写入到缓存队列1且缓存队列1已满时,由于需要将位于缓存队列1头部的页面淘汰掉,因此需要维护的是缓存队列1的左边界。
请参见图7A至图7C，它们示出了所述GBP节点每接收一批页面后，所述GBP节点如何基于滑动窗口的缓存淘汰算法管理所述GBP节点的缓存队列中存入的页面，以及如何维护所述GBP起始点、所述GBP恢复点以及所述GBP结束点。
值得注意的是，在图7A至图7C所示的实施例中，所述主机每批发送给所述GBP节点的页面最多不超过3个（也即batch=3）。
如图7A所示,发送队列按照从头到尾的顺序依次包括页面1(P1)、页面2(P2)、页面3(P3)、和页面4(P4)四个被修改页面,由于batch=3,因此本次(假设本次为第一次)发送队列中的P1、P2和P3将会从主机传送到GBP节点中,假设P1的LSN为1,P2的LSN为2,P3的LSN为3。
在第一次发送完之后,GBP起始点和GBP恢复点均为1,GBP结束点为3。
如图7B所示，发送队列按照从头到尾的顺序依次包括页面4（P4）、页面5（P5）、P2、P1和页面6（P6）五个被修改页面，由于batch=3，因此本次（假设本次为第二次）发送队列中的P4、P5和P2将会从主机传送到GBP节点中，假设P4的LSN为4，P5的LSN为5，P2的LSN为6。容易知道的是，在第一次发送时，P2的LSN为2，但是在第二次发送时，P2的LSN被刷新为6。之所以会出现这种情况，是因为P2又被修改了，因此导致P2对应的LSN变大了。
在第二次发送完之后，GBP起始点为1，GBP结束点为6，GBP恢复点为4。
如图7C所示，发送队列按照从头到尾的顺序依次包括页面1和页面6，由于batch=3，因此本次（假设本次为第三次）发送队列中的页面1和页面6将会从主机传送到GBP节点中，假设页面1的LSN为7，页面6的LSN为8。容易知道的是，在第一次发送时，页面1的LSN为1，但是在第三次发送时，页面1的LSN被刷新为7，之所以会出现这种情况，是因为页面1又被修改了，因此导致页面1对应的LSN变大了。
在第三次发送完之后,GBP起始点为3,GBP结束点为8,GBP恢复点为7。
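图7A至图7C中三个点的变化过程可以用如下Python草图复现（页面缓冲区未满，因此不涉及淘汰头部页面；函数名与数据结构均为示意性假设）：

```python
def apply_batch(stored, batch):
    """stored: 页面ID -> LSN；batch: 本次接收的一批页面 [(页面ID, LSN), ...]。

    返回写入该批页面后的 (GBP起始点, GBP恢复点, GBP结束点)。
    """
    for page_id, lsn in batch:
        stored[page_id] = lsn              # 已存在的页面被新版本覆盖
    batch_lsns = [lsn for _, lsn in batch]
    # GBP起始点：所有已存页面中最小的LSN；恢复点/结束点：最近一批页面的最小/最大LSN
    return min(stored.values()), min(batch_lsns), max(batch_lsns)


stored = {}
assert apply_batch(stored, [("P1", 1), ("P2", 2), ("P3", 3)]) == (1, 1, 3)  # 图7A
assert apply_batch(stored, [("P4", 4), ("P5", 5), ("P2", 6)]) == (1, 4, 6)  # 图7B
assert apply_batch(stored, [("P1", 7), ("P6", 8)]) == (3, 7, 8)             # 图7C
```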
S105、在磁盘恢复点大于或等于所述GBP起始点,且磁盘结束点大于或等于所述GBP结束点时,所述备机回放位于所述GBP恢复点所对应的重做日志和所述磁盘结束点所对应的重做日志之间的所有重做日志,以使所述备机切换为新的主机,进而实现所述数据库系统的故障修复。
其中,所述磁盘恢复点指示所述备机的磁盘中最近一批被写入的多个页面所包含的最小的LSN。所述磁盘结束点指示所述备机所接收的最后一条重做日志的LSN。
在所述磁盘恢复点大于或等于所述GBP起始点,且所述磁盘结束点大于或等于所述GBP结束点时,所述备机还启动后台线程,所述后台线程用于将所述GBP节点上存储的所有页面拉取到所述备机的页面缓冲区。随后,所述备机还将启动后台线程将这些页面从所述备机的页面缓冲区被刷到所述备机的本地磁盘。
值得注意的是,在备机提升为主机之前,并不需要等待位于该GBP节点上的所有页面均被拉取到所述备机的页面缓冲区,页面拉取可以在后台异步完成。
可选的,所述备机启动后台线程几乎与所述备机开始执行回放步骤(S105)同时进行。
另外,步骤S105中提到“新的主机”,这是为了与本实施例中的原主机相区分。应当知道,在执行完所述第一种故障修复方法之后,所述备机(或原备机)被提升(或被切换)成为所述新的主机。
可选的,所述后台线程用于通过第二数据传输协议将所述GBP上存储的所有页面拉取到所述备机的页面缓冲区。
值得注意的是，该第二数据传输协议可以是低时延且高吞吐的数据传输协议。可选的，第一数据传输协议和第二数据传输协议可以是相同的协议。
可选的,该第二数据传输协议为RDMA协议。在这种情况下,该备机具有支持RDMA协议的万兆以太网卡或者无限带宽(infiniBand)网卡。
可选的，该第二数据传输协议还可以为40G以太网（40GE）。
由上可知,该第一数据传输协议和该第二数据传输协议可以均为RDMA协议,或均为40GE。该第一数据传输协议和该第二数据传输协议也可以其中一个为RDMA协议,另一个为40GE,例如该第一数据传输协议为RDMA协议,该第二数据传输协议为40GE。
所述备机在将所述GBP节点上存储的所有页面通过第二数据传输协议拉取到所述备机的页面缓冲区后，还会将拉取到页面缓冲区的页面与所述备机自己维护的页面进行比较，丢弃掉旧的页面并留下新的页面。如图8所示，如果从GBP节点上拉取的P1的版本是V3，备机自己维护的P1的版本是V2，则抛弃掉V2并保留V3。另外，如图8所示，如果从GBP节点上拉取的P2的版本是V0，备机自己维护的该P2的版本是V1，则抛弃掉V0并保留V1。应当知道的是，在所述备机执行回放步骤（S105）的过程中，通常从所述GBP节点拉取的页面的版本比所述备机自己维护的页面的版本新，而在所述备机执行完回放步骤（S105）之后，则所述备机自己维护的页面的版本要比从所述GBP节点拉取的页面的版本新，或者与从所述GBP节点拉取的页面的版本一样新。
其中,所述备机自己维护的页面的版本,可以是所述备机通过回放重做日志生成的,也可以从所述备机的本地磁盘上直接读取上来的。
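上述“保留新页面、丢弃旧页面”的比较过程可以用如下Python草图示意（以版本号代表页面的新旧，数据结构为示意性假设）：

```python
def merge_pulled_pages(local_pages, pulled_pages):
    """local_pages/pulled_pages: 页面ID -> (版本号, 页面内容)。

    对从GBP节点拉取的每个页面，与备机自己维护的版本比较，保留较新者。
    """
    for page_id, (version, content) in pulled_pages.items():
        if page_id not in local_pages or version > local_pages[page_id][0]:
            local_pages[page_id] = (version, content)
    return local_pages


# 对应图8的示意：GBP上的P1（V3）比本地的P1（V2）新；本地的P2（V1）比GBP的P2（V0）新
merged = merge_pulled_pages({"P1": (2, "V2"), "P2": (1, "V1")},
                            {"P1": (3, "V3"), "P2": (0, "V0")})
assert merged["P1"] == (3, "V3")
assert merged["P2"] == (1, "V1")
```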
应当知道的是,在所述主机发生故障之后,以及在所述备机执行所述回放步骤之前,所述备机还从本地获取所述磁盘恢复点和所述磁盘结束点。自然,获取所述磁盘恢复点和所述磁盘结束点的目的是为了判断步骤S105中限定的条件是否满足。
需要关注的是，在本实施例中，所述备机回放完位于所述GBP恢复点对应的重做日志和所述磁盘结束点所对应的重做日志之间的所有重做日志之后，所述备机就可以被切换（提升）为所述新的主机，也即，本实施例所述的数据库系统的故障修复就完成了。因此，所述备机被切换（提升）为所述新的主机的效率仅与所述备机回放位于所述GBP恢复点对应的重做日志和所述磁盘结束点所对应的重做日志之间的所有重做日志的速率有关，而与将所述GBP节点上存储的所有页面拉取到所述备机的页面缓冲区的速率无关。因此，将所述GBP节点上存储的所有页面拉取到所述备机的页面缓冲区可以在所述备机的后台异步完成。
值得注意的是,在本实施例中,所述备机仅回放了位于所述GBP恢复点所对应的重做日志和所述磁盘结束点所对应的重做日志之间的所有重做日志,而没有回放位于所述磁盘恢复点所对应的重做日志和所述GBP恢复点所对应的重做日志之间的所有重做日志(如图9所示)。或者说,在本实施例中,所述备机跳过位于所述磁盘恢复点所对应的重做日志和所述GBP恢复点所对应的重做日志之间的所有重做日志,仅回放位于所述GBP恢复点所对应的重做日志和所述磁盘结束点所对应的重做日志之间的所有重做日志。相对于需要回放位于所述磁盘恢复点所对应的重做日志和所述磁盘结束点所对应的重做日志之间的所有重做日志的技术方案来说,本实施例中,由于备机需要回放的重做日志数量比较少,所以采用本实施例,能够提高该备机被切换为新的主机的效率,也即能够提高该数据库系统的故障修复效率。
需要关注的是,在本实施例中,在所述主机发生故障后,所述备机不再继续回放剩余的没有回放的重做日志,而是确定所述GBP起始点、所述GBP恢复点、所述GBP结束点、所述磁盘恢复点以及所述磁盘结束点,然后比较所述磁盘恢复点与所述GBP起始点的大小,以及比较所述磁盘结束点和所述GBP结束点的大小,并在所述磁盘恢复点大于或等于所述GBP起始点,以及所述磁盘结束点大于或等于所述GBP结束点时,回放位于所述GBP恢复点对应的重做日志与所述磁盘结束点对应的重做日志之间的所有重做日志,以便实现故障转移,或者实现该数据库系统的故障恢复。简单来说,在本实施例中,在该主机发生故障后,对于剩余的没有回放完的所有重做日志,所述备机只回放很小的一部分。因此采用本实施例提供的技术方案能够提高该数据库系统恢复的效率。
可选的,在所述备机执行完回放步骤之后,或者在该故障修复方法执行完且所述备机被提升为新的主机之后,如果所述备机上的应用需要访问的页面还位于所述GBP节点的页面缓冲区中,则所述应用从所述GBP节点的页面缓冲区中读取所述需要访问的页面。
应当知道的是，在所述备机被切换成新的主机后，所述新的主机可以提供读写服务。另外，在所述备机被切换成新的主机后，如果还需要做回滚（undo），则所述新的主机会启动后台线程做回滚。由于回滚是在后台做的，因此不会阻塞所述新的主机的其他业务。
本申请还提供了一种数据库系统。请参见附图2，它为一种数据库系统的架构图。该数据库系统可以用于执行前述第一种故障修复方法。由于该数据库系统在前述实施例中已经有了比较多的描述，因此本实施例仅就前述实施例没有提及的部分进行描述，至于前述实施例中已经描述的部分，由于本实施例以及其他关于数据库系统的实施例均可以直接参见前述实施例的相关描述，因此不再赘述。
It is readily seen that the database system includes a host 210, a standby 230, and a GBP node 220, and the host 210 and the GBP node 220 are communicatively connected over a first data transfer protocol.
While the host 210 operates normally, the host 210 is configured to send a plurality of pages to the GBP node 220 using the first data transfer protocol.
The GBP node 220 is configured to write the plurality of pages into the cache queue of the GBP node. Note that the LSNs corresponding to the plurality of pages increase from the head of the cache queue to its tail.
When the host 210 fails, the standby 230 is configured to determine the GBP start point, the GBP recovery point, and the GBP end point.
For the definitions of the GBP start point, GBP recovery point, and GBP end point, refer to the foregoing description.
When the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the standby 230 is further configured to replay all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
For the definitions of the disk recovery point and the disk end point, also refer to the foregoing description.
As an embodiment of this application, in the database system, the GBP node 220 is configured to receive new pages and, based on the new pages, update the GBP start point, the GBP recovery point, and the GBP end point.
It is worth noting that when the GBP node 220 maintains the GBP start point, the GBP recovery point, and the GBP end point, the standby 230 may, optionally, obtain these three points from the GBP node.
Optionally, when the GBP node 220 receives a new page that does not exist in the GBP node's page buffer, the GBP node 220 is further configured to place the new page at the tail of the cache queue.
Optionally, when the GBP node 220 receives a new page that already exists in the GBP node's page buffer, the GBP node 220 is further configured to update the existing corresponding page according to the received new page and place the updated page at the tail of the cache queue.
Optionally, when the GBP node 220 receives a new page that already exists in the GBP node's page buffer, the GBP node 220 is further configured to discard the existing page corresponding to the new page and place the new page at the tail of the cache queue.
Optionally, when the GBP node 220 receives a new page that does not exist in the GBP node's page buffer and the page buffer is full, the GBP node 220 is further configured to evict the page at the head of the cache queue and update the GBP start point to the LSN corresponding to the new head page of the cache queue. Naturally, after the head page has been evicted, the GBP node 220 is further configured to place the new page at the tail of the cache queue.
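The cache-queue behavior above (append new pages at the tail, move updated pages to the tail, evict the head when full, and track the GBP start point as the head page's LSN) can be sketched with an ordered map. This is an illustrative sketch only; the class name and the `capacity` parameter are assumptions not present in the source.

```python
from collections import OrderedDict


class GbpPageBuffer:
    """Sketch of the GBP node's cache queue: head = oldest page, tail = newest,
    LSNs increasing from head to tail."""

    def __init__(self, capacity):
        self.capacity = capacity        # hypothetical buffer size limit
        self.queue = OrderedDict()      # page_id -> lsn; insertion order = queue order
        self.start_point = None         # GBP start point: LSN of the head page

    def put(self, page_id, lsn):
        if page_id in self.queue:
            # Existing page: update it and move it to the tail of the queue.
            del self.queue[page_id]
        elif len(self.queue) >= self.capacity:
            # New page but buffer full: evict the page at the head.
            self.queue.popitem(last=False)
        self.queue[page_id] = lsn
        # The GBP start point always tracks the current head page's LSN.
        self.start_point = next(iter(self.queue.values()))
```

With `capacity=2`, inserting p1 (LSN 10) and p2 (LSN 20) and then p3 (LSN 30) evicts p1 and advances the start point to 20, matching the eviction rule in the text.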
As another embodiment of this application, when the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the standby 230 is further configured to start a background thread, and the background thread pulls all pages stored on the GBP node 220 into the standby's page buffer.
Optionally, the background thread pulls all pages stored on the GBP node 220 into the standby's page buffer over a second data transfer protocol.
It should be noted that the standby 230 replaying all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, and the standby 230 pulling all pages stored on the GBP node 220 into the standby's page buffer, can proceed asynchronously.
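The asynchronous pull described above can be sketched with a daemon thread that merges GBP pages into the standby's buffer without blocking replay. This is only a sketch under stated assumptions: the in-memory dictionaries stand in for the real page buffers, the function name is hypothetical, and the actual transfer would use the second data transfer protocol (e.g., RDMA) rather than local dictionary access.

```python
import threading


def start_background_pull(gbp_pages, page_buffer, lock):
    """Start a background thread that pulls every page stored on the GBP
    node into the standby's page buffer, keeping the newer version of each
    page (judged by LSN). Returns the thread so callers may join it."""

    def pull():
        for page_id, (lsn, data) in list(gbp_pages.items()):
            with lock:
                local = page_buffer.get(page_id)
                if local is None or lsn > local[0]:
                    # Pulled page is newer than the local one: keep it.
                    page_buffer[page_id] = (lsn, data)

    t = threading.Thread(target=pull, daemon=True)
    t.start()
    return t
```

Because the pull runs on its own thread, redo-log replay can proceed concurrently; the lock stands in for whatever page-level synchronization the real buffer would need.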
Notably, after the host 210 fails and before the standby 230 replays the redo logs, the standby 230 is further configured to determine, or obtain locally, the disk recovery point and the disk end point.
It should be understood that while the host 210 operates normally, the host 210 is further configured to send redo logs to the standby 230. Correspondingly, the standby 230 is further configured to replay the redo logs to obtain the corresponding pages.
Optionally, the host 210 is configured to start a page-sending thread, and the page-sending thread may use the first data transfer protocol to send the pages in a send queue to the GBP node 220 in batches, in order from the head of the send queue to its tail. The send queue resides in the host 210, and the LSNs contained in the pages increase from the head of the send queue to its tail.
Further, the host 210 may start multiple page-sending threads and may include multiple send queues, with a one-to-one correspondence between the page-sending threads and the send queues.
Optionally, the GBP node 220 is configured to start a page-receiving thread, and the page-receiving thread may use the first data transfer protocol to receive the pages in batches and write them into the GBP node's cache queue.
Further, the GBP node 220 may start multiple page-receiving threads, and the GBP node's page buffer may contain multiple cache queues, with a one-to-one correspondence between the page-receiving threads and the cache queues.
In addition, the page-sending threads started by the host 210 and the page-receiving threads started by the GBP node 220 may also correspond one to one. It should be understood that in this case the send queues and the cache queues also correspond one to one; that is, the pages in each send queue are sent to the corresponding cache queue.
Refer to FIG. 10, a flowchart of the second fault-repair method for a database system provided by this application. Note that this second fault-repair method ("second fault-repair method" for short) is described from the perspective of the standby, whereas the aforementioned first fault-repair method ("first fault-repair method" for short) is described from the perspective of the system. Because the standby is part of the system, the two methods have much in common. Accordingly, the embodiments of the second fault-repair method below describe only what differs from the first fault-repair method; for the parts they share, refer to the foregoing related embodiments.
As shown in FIG. 10, the second fault-repair method includes the following steps.
S201: When the host fails, determine the GBP start point, the GBP recovery point, and the GBP end point.
It should be noted that all pages stored on the GBP node were sent by the host to the GBP node over the first data transfer protocol while the host operated normally, and were written by the GBP node into the GBP node's cache queue. The log sequence numbers (LSNs) corresponding to the pages increase from the head of the cache queue to its tail.
S203: When the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, replay all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
For the definitions of the GBP start point, GBP recovery point, GBP end point, disk recovery point, and disk end point, refer to the foregoing description.
It is worth noting that the redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point are not replayed.
It should be noted that when the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the fault-repair method of this embodiment further includes: starting a background thread, where the background thread pulls all pages stored on the GBP node into the page buffer.
Optionally, the background thread pulls all pages stored on the GBP node into the page buffer over a second data transfer protocol.
It should be understood that after the host fails and before the replay step is performed, the fault-repair method of this embodiment further includes: obtaining the disk recovery point and the disk end point. Also, while the host operates normally, the method includes: receiving the redo logs sent by the host, replaying them to obtain the corresponding pages, and flushing the resulting pages to the local disk in batches.
Optionally, after the replay step has been performed, and when a page that needs to be accessed is still in the page buffer of the GBP node, the fault-repair method of this embodiment further includes: reading the page from the GBP node's page buffer.
As shown in FIGS. 11A and 11B, which are schematic structural diagrams of a first computing device 500 provided by this application, the computing device 500 may be the standby mentioned in the aforementioned second fault-repair method, and the computing device 500 can perform the fault-repair method described from the standby's perspective. The standby and the host mentioned in the second fault-repair method may be two independent nodes.
Specifically, as shown in FIG. 11A, the computing device 500 includes at least a determining unit 510 and a replay unit 530. When the host fails, the determining unit 510 is configured to determine the GBP start point, the GBP recovery point, and the GBP end point.
For the definitions of the GBP start point, GBP recovery point, and GBP end point, refer to the foregoing description.
It is worth noting that all pages stored on the GBP node were sent by the host to the GBP node over the first data transfer protocol while the host operated normally, and were written by the GBP node into the GBP node's cache queue. The LSNs corresponding to the pages increase from the head of the cache queue to its tail.
When the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the replay unit 530 is configured to replay all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point. For the definitions of the disk recovery point and the disk end point, also refer to the foregoing description.
As an embodiment of this application, as shown in FIG. 11B, the computing device 500 further includes a starting unit 540. When the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the starting unit 540 is configured to start a background thread, where the background thread pulls all pages stored on the GBP node into the page buffer.
It should be understood that after the host fails and before the replay unit 530 performs the replay step, the determining unit 510 is further configured to obtain the disk recovery point and the disk end point.
It is worth noting that, as shown in FIG. 11B, the computing device further includes a receiving unit 520. While the host operates normally, the receiving unit 520 is configured to receive the redo logs sent by the host. Correspondingly, the replay unit 530 is configured to replay the redo logs to obtain the corresponding pages.
As another embodiment of this application, as shown in FIG. 11B, the computing device further includes a reading unit 550. After the replay step has been performed, and when a page that needs to be accessed is still in the page buffer of the GBP node, the reading unit 550 is configured to read the page from the GBP node's page buffer.
As shown in FIG. 12, which is a schematic structural diagram of a second computing device 600 provided by this application, the computing device 600 may be the standby mentioned in the aforementioned second fault-repair method, and the computing device 600 can perform the second fault-repair method described from the standby's perspective. Specifically, as shown in FIG. 12, an operating system 620 runs on the hardware layer 610 of the computing device 600, and an application 630 runs on the operating system 620. The hardware layer 610 includes a processor 611, a memory 612, and an input/output (I/O) interface 613, among others. The memory 612 stores executable code which, when executed by the processor 611, is configured to implement the components and functions of the computing device 600. In this embodiment, the memory 612 is configured to store the disk recovery point and the disk end point.
Specifically, when the host fails, the processor 611 is configured to determine the GBP start point, the GBP recovery point, and the GBP end point.
It is worth noting that all pages stored on the GBP node were sent by the host to the GBP node over the first data transfer protocol while the host operated normally, and were written by the GBP node into the GBP node's cache queue. The LSNs corresponding to the pages increase from the head of the cache queue to its tail.
When the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the processor 611 is further configured to replay all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
It should be emphasized that in this embodiment, the redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point are not replayed.
As another embodiment of this application, when the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the processor 611 is further configured to start a background thread, where the background thread pulls all pages stored on the GBP node into the page buffer.
As a further embodiment of this application, after the replay step has been performed, and when a page that needs to be accessed is still in the page buffer of the GBP node, the processor 611 is further configured to read the page from the GBP node's page buffer.
It should be understood that after the host fails and before the processor performs the replay step, the processor 611 is further configured to obtain the disk recovery point and the disk end point from the memory.
As yet another embodiment of this application, while the host operates normally, the I/O interface 613 is configured to receive the redo logs sent by the host. Correspondingly, the processor 611 is configured to replay the redo logs to obtain the corresponding pages.
It should be noted that this application provides a first data backup method. The first data backup method at least includes: while transferring redo logs to the standby, sending pages to the GBP node using the RDMA protocol, so that when a fault occurs, the pages on the GBP node can be used for fault repair.
In this embodiment, while the redo logs are transferred to the standby, the modified pages are also sent to the GBP node over the RDMA protocol, to be backed up on the GBP node. Because RDMA allows the modified pages corresponding to most of the redo logs sent to the standby to also reach the GBP node, when the local machine fails, the remaining redo logs not yet replayed by the standby fall into two parts: the first part comprises all redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point, and the second part comprises all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point. The standby only needs to replay the second part to obtain the corresponding pages and complete fault repair, because the pages corresponding to the first part can be pulled directly from the GBP node. Clearly, the data backup method provided by this embodiment improves fault-repair efficiency.
Corresponding to the first data backup method above, this application further provides a third computing device 700, which can perform the first data backup method. As shown in FIG. 13, an operating system 720 runs on the hardware layer 710 of the computing device 700, and an application 730 runs on the operating system 720. The hardware layer 710 includes a processor 711, a memory 712, a first transfer interface 713, and a second transfer interface 714, among others. The memory 712 stores executable code which, when executed by the processor 711, is configured to implement the components and functions of the computing device 700.
In this embodiment, the first transfer interface 713 is configured to transfer redo logs to the standby. While the first transfer interface 713 transfers the redo logs to the standby, the second transfer interface 714 is configured to send pages to the GBP node based on the RDMA protocol, so that when a fault occurs, the pages on the GBP node can be used for fault repair.
Similarly, a database system employing the computing device 700 achieves relatively high fault-repair efficiency.
This application further provides a third fault-repair method for a database system ("third fault-repair method" for short). First, the third fault-repair method can be applied to the database system shown in FIG. 14, which includes a host 800 and a GBP node 900. The third fault-repair method is also described from the perspective of the system, but it differs from the aforementioned first fault-repair method. The difference is that the first fault-repair method involves three parties, namely the host 210, the standby 230, and the GBP node 220, and when the host fails, the standby is promoted to the new host by replaying logs; that is, in the first fault-repair method, after the host fails, the standby is promoted to the new host. In the third fault-repair method, by contrast, only two parties are involved, the host 800 and the GBP node 900, and when the host 800 fails, the host 800 itself is restarted by replaying redo logs.
It is worth noting that if the host suffers a software fault, the host can usually be restarted, whereas if the host suffers a hardware fault, it usually cannot. The first fault-repair method can therefore be used both when the host's software fails and when its hardware fails, while the third fault-repair method can generally be used only when the host's software fails.
It should be noted that the third fault-repair method has much in common with the first fault-repair method; the description below therefore covers only what differs from the first fault-repair method, and for the parts they share, refer directly to the foregoing description.
Refer to FIG. 15, a flowchart of the third fault-repair method. Note that in this method, steps S301, S303, and S305 are all performed by the host 800, and S302 is performed by the GBP node 900, whereas in the first fault-repair method, S101 is performed by the host 210, S102 by the GBP node 220, and S103 and S105 by the standby 230. It is easy to see that S301 is almost identical to S101, and S302 to S102; S303 is almost identical to S103 except for the executing entity, as is S305 to S105.
Specifically, the third fault-repair method includes the following steps.
S301: During normal operation, send a plurality of pages to the GBP node using the first data transfer protocol.
S302: The GBP node writes the plurality of pages into the GBP node's cache queue.
It is worth noting that the LSNs corresponding to the pages increase from the head of the cache queue to its tail.
S303: When a fault occurs, determine the GBP start point, the GBP recovery point, and the GBP end point.
S305: When the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, replay all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, so that the host is restarted.
It is worth noting that in this embodiment, the redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point are not replayed.
As another embodiment of this application, when the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the third fault-repair method further includes: S306: starting a background thread, where the background thread pulls all pages on the GBP node into the page buffer. It should be understood that the pages pulled into the page buffer are also flushed to the local disk.
It is easy to see that after the fault occurs and before replay is performed, the third fault-repair method further includes: S304: obtaining the disk recovery point and the disk end point.
As a further embodiment of this application, after the host has performed the replay step, and when a page that needs to be accessed is still on the GBP node, the third fault-repair method further includes: S307: reading the page from the GBP node.
This application further provides a database system. Referring to FIG. 14, the database system includes a host 800 and a GBP node 900, and the host 800 and the GBP node 900 are communicatively connected over a first data transfer protocol. This database system can be used to perform the aforementioned third fault-repair method.
The host 800 is configured to send a plurality of pages to the GBP node 900 over the first data transfer protocol.
The GBP node 900 is configured to write the plurality of pages into the GBP node's cache queue.
The log sequence numbers (LSNs) contained in the pages increase from the head of the cache queue to its tail.
When the host fails, the host 800 is further configured to determine the GBP start point, the GBP recovery point, and the GBP end point. When the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the host 800 is further configured to replay all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
For the definitions of the GBP start point, GBP recovery point, GBP end point, disk recovery point, and disk end point, refer to the foregoing description.
It should be understood that the redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point are not replayed.
As another embodiment of this application, when the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the host 800 is further configured to start a background thread, where the background thread pulls all pages on the GBP node into the page buffer.
Optionally, the background thread pulls all pages on the GBP node into the page buffer over the first data transfer protocol.
It should be noted that after the host has performed the replay step, and when a page that needs to be accessed is still in the page buffer of the GBP node, the host 800 is further configured to read the page from the GBP node's page buffer.
As shown in FIG. 16, this application further provides a fourth fault-repair method for a database system. The fourth fault-repair method is performed by the host 800 in FIG. 14 and includes the following steps.
S311: During normal operation, send a plurality of pages to the GBP node over the first data transfer protocol.
S313: When a fault occurs, determine the GBP start point, the GBP recovery point, and the GBP end point.
S315: When the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, replay all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
It is worth noting that the plurality of pages are written into the GBP node's cache queue, and the LSNs corresponding to the pages increase from the head of the cache queue to its tail.
For the definitions of the GBP start point, GBP recovery point, GBP end point, disk recovery point, and disk end point, refer to the relevant foregoing description; they are not repeated here.
Because the Summary has already explained the fourth fault-repair method at length, it is not repeated here. It is worth noting that this application describes in greatest detail the fault-repair method performed by the database system composed of the host, the standby, and the GBP node; because the other embodiments are all closely related to that embodiment, they may all refer to it. To avoid repeating the same content, all subsequent embodiments are described relatively briefly; it should be understood, however, that each briefly described embodiment can be understood by referring to the Summary and to the embodiment described in greatest detail.
This application further provides a fourth computing device 1000, which can perform the aforementioned fourth fault-repair method; that is, the fourth computing device 1000 can implement the functions of the host in the fourth fault-repair method. As shown in FIG. 17, the computing device 1000 includes at least a sending unit 1010, a determining unit 1020, and a replay unit 1030.
Specifically, during normal operation, the sending unit 1010 is configured to send a plurality of pages to the GBP node using the first data transfer protocol, where the LSNs corresponding to the pages increase from the head of the cache queue to its tail.
When a fault occurs, the determining unit 1020 is configured to determine the GBP start point, the GBP recovery point, and the GBP end point.
When the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the replay unit 1030 is configured to replay all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
Optionally, the computing device further includes a starting unit 1040. When the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the starting unit 1040 is configured to start a background thread, where the background thread pulls all pages on the GBP node into the computing device's page buffer.
Further, the computing device includes a reading unit 1050. After the replay step has been performed, and when a page that needs to be accessed is still on the GBP node, the reading unit 1050 is configured to read the page from the GBP node.
This application further provides a fifth computing device 2000, which can perform the aforementioned third fault-repair method. As shown in FIG. 18, an operating system 2020 runs on the hardware layer 2010 of the computing device 2000, and an application 2030 runs on the operating system 2020. The hardware layer 2010 includes a processor 2011, a memory 2012, and an I/O interface 2013, among others. The memory 2012 stores executable code which, when executed by the processor 2011, is configured to implement the components and functions of the computing device 2000.
In this embodiment, the memory 2012 is configured to store the GBP start point, GBP recovery point, GBP end point, disk recovery point, and disk end point.
During normal operation, the processor 2011 is configured to send a plurality of pages to the GBP node using the first data transfer protocol, where the LSNs corresponding to the pages increase from the head of the cache queue to its tail. When a fault occurs, the processor 2011 is further configured to determine the GBP start point, the GBP recovery point, and the GBP end point.
When the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the processor 2011 is further configured to replay all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point.
It should be noted that when the disk recovery point is greater than or equal to the GBP start point, and the disk end point is greater than or equal to the GBP end point, the processor 2011 is further configured to start a background thread, where the background thread pulls all pages on the GBP node into the page buffer.
Optionally, after the replay step has been performed (after the host has been restarted), and when a page that needs to be accessed is still on the GBP node, the processor 2011 is further configured to read the page from the GBP node.
This application further provides a second data backup method. Unlike the aforementioned first data backup method, the second data backup method is performed by the GBP node, which may be either the GBP node described in the first fault-repair method or the GBP node described in the third fault-repair method. As shown in FIG. 19, the second data backup method includes the following steps.
S401: Receive a plurality of pages from the host over the RDMA protocol.
S403: Write the plurality of pages into a cache queue, where the LSNs contained in the pages increase from the head of the cache queue to its tail.
S405: Maintain the GBP start point, the GBP recovery point, and the GBP end point according to the LSNs of the pages, so that when the host fails, fault repair can be performed based on the GBP start point, the GBP recovery point, and the GBP end point.
As an embodiment of the second data backup method, when a new page that does not exist in memory is received, S403 specifically includes: placing the new page at the tail of the cache queue.
For the explanation of "new page", see the explanation in the embodiment corresponding to the first fault-repair method; it is not repeated here.
As another embodiment, when a new page that does not exist in memory is received and the cache queue is full, S403 specifically includes: evicting the page at the head of the cache queue, storing the new page at the tail of the cache queue, and updating the GBP start point to the LSN corresponding to the new head page of the cache queue.
As yet another embodiment, when a new page that already exists in memory is received, S403 specifically includes: updating the existing corresponding page with the new page, and placing the updated page at the tail of the cache queue.
It should be understood that each time a batch of pages is received, the GBP recovery point and the GBP end point are always updated, while the GBP start point may be updated. Since the definitions of the GBP start point, GBP recovery point, and GBP end point have been given above, they are not repeated here; it suffices to update each point according to its definition and the received pages.
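The per-batch point maintenance described above follows directly from the definitions: the recovery point is the smallest LSN of the most recently received batch, the end point is its largest LSN, and the start point is the smallest LSN still buffered. A minimal sketch, with the function name and dictionary layout as assumptions:

```python
def update_gbp_points(points, batch_lsns, buffered_lsns):
    """Update the three GBP points after receiving one batch of pages.

    points is a dict with keys "start", "recovery", "end";
    batch_lsns are the LSNs of the just-received batch;
    buffered_lsns are the LSNs of all pages currently in the buffer.
    """
    points["recovery"] = min(batch_lsns)   # smallest LSN of the latest batch
    points["end"] = max(batch_lsns)        # largest LSN of the latest batch
    # The start point only moves when the head of the cache queue is evicted,
    # i.e. when the smallest buffered LSN changes.
    points["start"] = min(buffered_lsns)
    return points
```

For instance, receiving a batch with LSNs {15, 18, 17} while the buffer still holds a page with LSN 5 yields recovery point 15, end point 18, and start point 5.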
In the embodiment corresponding to the third fault-repair method, the GBP node and the host's standby may coincide; that is, the standby can implement both the functions of the standby in the first fault-repair method and the functions of the GBP node in the first fault-repair method. In other words, the standby has installed an application capable of providing the global page buffer function. When the GBP node and the standby coincide, the second data backup method further includes: receiving a plurality of redo logs, and replaying the plurality of redo logs to obtain the page corresponding to each redo log.
This application further provides a sixth computing device 3000, which can perform the aforementioned second data backup method; that is, the sixth computing device 3000 can implement the functions of the GBP node described in the foregoing embodiments.
Specifically, as shown in FIG. 20, the sixth computing device 3000 includes at least a receiving unit 3010, a writing unit 3020, and a maintenance unit 3030. The receiving unit 3010 is configured to receive a plurality of pages from the host over the RDMA protocol. The writing unit 3020 is configured to write the plurality of pages into a cache queue. It is worth noting that the LSNs corresponding to the pages increase from the head of the cache queue to its tail. The maintenance unit 3030 is configured to maintain the GBP start point, the GBP recovery point, and the GBP end point according to the LSN of each page, so that when the host fails, fault repair can be performed based on the GBP start point, the GBP recovery point, and the GBP end point.
It should be understood that when a new page that does not exist in memory is received, the writing unit 3020 is further configured to place the new page at the tail of the cache queue.
It is worth noting that when a new page that does not exist in memory is received and the cache queue is full, the writing unit 3020 is further configured to evict the page at the head of the cache queue and store the new page at the tail of the cache queue. Correspondingly, the maintenance unit 3030 is further configured to update the GBP start point to the LSN corresponding to the new head page of the cache queue.
Further, when a new page that already exists in memory is received, the writing unit 3020 is further configured to update the existing corresponding page with the new page and place the updated page at the tail of the cache queue.
It should be understood that each time a batch of pages is received, the maintenance unit 3030 is further configured to update the GBP recovery point and the GBP end point according to the received pages.
When the sixth computing device can implement both the functions of the GBP node and the functions of the standby described in the foregoing embodiments, the receiving unit is further configured to receive a plurality of redo logs, and the sixth computing device further includes a replay unit configured to replay the plurality of redo logs to obtain the page corresponding to each redo log.
This application further provides a seventh computing device 4000, which can also perform the aforementioned second data backup method; in other words, the seventh computing device 4000 can implement the functions of the GBP node described in the foregoing embodiments. Specifically, as shown in FIG. 21, an operating system 4020 runs on the hardware layer 4010 of the computing device 4000, and an application 4030 runs on the operating system 4020. The hardware layer 4010 includes a processor 4011, a memory 4012, and an I/O interface 4013, among others. The memory 4012 stores executable code which, when executed by the processor 4011, is configured to implement the components and functions of the computing device 4000.
In this embodiment, the I/O interface 4013 is configured to receive a plurality of pages from the host over the RDMA protocol. The processor 4011 is configured to write the plurality of pages into a cache queue in sequence and to maintain the GBP start point, the GBP recovery point, and the GBP end point according to the LSN contained in each page.
It is worth noting that the LSNs corresponding to the pages increase from the head of the cache queue to its tail. In addition, the purpose of maintaining the GBP start point, the GBP recovery point, and the GBP end point is that, when the host fails, fault repair can be performed based on these three points.
It should be understood that when a new page that does not exist in memory is received, the processor 4011 is further configured to place the new page at the tail of the cache queue.
It is worth noting that when a new page that does not exist in memory is received and the cache queue is full, the processor 4011 is further configured to evict the page at the head of the cache queue, store the new page at the tail of the cache queue, and update the GBP start point to the LSN corresponding to the new head page of the cache queue.
Further, when a new page that already exists in memory is received, the processor 4011 is further configured to update the existing corresponding page with the new page and place the updated page at the tail of the cache queue.
It should be understood that each time a batch of pages is received, the processor 4011 is further configured to update the GBP recovery point and the GBP end point according to the received pages.
When the seventh computing device 4000 can implement both the functions of the GBP node and the functions of the standby described in the foregoing embodiments, the processor 4011 is further configured to receive a plurality of redo logs and replay them to obtain the page corresponding to each redo log.
It should be noted that this application involves multiple protected subjects, each corresponding to multiple embodiments, but these subjects, and these embodiments, are all interrelated. Before describing the fault-repair method performed by the database system including the host, the standby, and the GBP node, this application describes much general content, and that content applies to all subsequent related embodiments. In addition, apart from the relatively detailed description of the fault-repair method for the database system including the host, the standby, and the GBP node, the other embodiments are described only briefly. It should be understood that all other embodiments can be understood by referring to any relevant part of this application; the embodiments may refer to one another.

Claims (35)

  1. A fault repair method for a database system, comprising:
    while a host operates normally, sending, by the host, a plurality of pages to a global page buffer pool (GBP) node using a first data transfer protocol;
    writing, by the GBP node, the plurality of pages into a cache queue of the GBP node, wherein log sequence numbers (LSNs) corresponding to the plurality of pages increase from the head of the cache queue to its tail;
    when the host fails, determining, by a standby, a GBP start point, a GBP recovery point, and a GBP end point, wherein the GBP start point indicates the smallest LSN among all pages stored on the GBP node, the GBP recovery point indicates the smallest LSN in the batch of pages most recently received by the GBP node, and the GBP end point indicates the largest LSN in the batch of pages most recently received by the GBP node; and
    when a disk recovery point is greater than or equal to the GBP start point and a disk end point is greater than or equal to the GBP end point, replaying, by the standby, all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, wherein the disk recovery point indicates the smallest LSN among the most recent batch of pages written to the standby's disk, and the disk end point indicates the LSN of the last redo log received by the standby.
  2. The fault repair method according to claim 1, wherein all redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point are not replayed.
  3. The fault repair method according to claim 1 or 2, wherein the GBP node maintains the GBP recovery point and the GBP end point; after the GBP node writes the plurality of pages into the cache queue of the GBP node, the method further comprises:
    updating, by the GBP node, the GBP recovery point and the GBP end point according to the plurality of pages; and
    the determining, by the standby, of the GBP recovery point and the GBP end point comprises:
    obtaining, by the standby, the updated GBP recovery point and GBP end point from the GBP node.
  4. The fault repair method according to claim 1 or 2, wherein the GBP node maintains the GBP start point;
    when the GBP node receives a new page that does not exist in the GBP node's page buffer and the page buffer is full, the method further comprises:
    evicting, by the GBP node, the page at the head of the cache queue, and updating the GBP start point to the LSN corresponding to the new head page of the cache queue; and
    the determining, by the standby, of the GBP start point comprises: obtaining, by the standby, the updated GBP start point from the GBP node.
  5. The fault repair method according to claim 1 or 2, wherein:
    when the GBP node receives a new page that does not exist in the GBP node's page buffer,
    the writing, by the GBP node, of the plurality of pages into the cache queue of the GBP node comprises:
    placing, by the GBP node, the new page at the tail of the cache queue; and
    when the GBP node receives a new page that already exists in the GBP node's page buffer,
    the writing, by the GBP node, of the plurality of pages into the cache queue of the GBP node comprises:
    updating, by the GBP node, the existing corresponding page according to the received new page, and placing the updated page at the tail of the cache queue.
  6. The fault repair method according to any one of claims 1 to 5, further comprising: when the disk recovery point is greater than or equal to the GBP start point and the disk end point is greater than or equal to the GBP end point, starting, by the standby, a background thread, wherein the background thread pulls all pages stored on the GBP node into the standby's page buffer.
  7. The fault repair method according to claim 6, wherein the background thread pulls all pages stored on the GBP node into the standby's page buffer over a second data transfer protocol.
  8. The fault repair method according to any one of claims 1 to 7, wherein after the standby completes the replay step, and when a page that an application on the standby needs to access is still in the page buffer of the GBP node, the application reads the page from the GBP node's page buffer.
  9. A database system, comprising a host, a standby, and a global page buffer pool (GBP) node, wherein:
    the host is configured to send a plurality of pages to the GBP node over a first data transfer protocol;
    the GBP node is configured to write the plurality of pages into a cache queue of the GBP node, wherein log sequence numbers (LSNs) corresponding to the plurality of pages increase from the head of the cache queue to its tail;
    when the host fails, the standby is configured to determine a GBP start point, a GBP recovery point, and a GBP end point, wherein the GBP start point indicates the smallest LSN among all pages stored on the GBP node, the GBP recovery point indicates the smallest LSN in the batch of pages most recently received by the GBP node, and the GBP end point indicates the largest LSN in the batch of pages most recently received by the GBP node; and
    when a disk recovery point is greater than or equal to the GBP start point and a disk end point is greater than or equal to the GBP end point, the standby is further configured to replay all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, wherein the disk recovery point indicates the smallest LSN among the most recent batch of pages written to the standby's disk, and the disk end point indicates the LSN of the last redo log received by the standby.
  10. The system according to claim 9, wherein all redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point are not replayed.
  11. The system according to claim 9 or 10, wherein after writing the plurality of pages into the cache queue of the GBP node, the GBP node is further configured to update the GBP recovery point and the GBP end point according to the plurality of pages; and
    correspondingly, the standby is further configured to obtain the updated GBP recovery point and GBP end point from the GBP node.
  12. The system according to claim 9 or 10, wherein when the GBP node receives a new page that does not exist in the GBP node's page buffer and the page buffer is full, the GBP node is further configured to evict the page at the head of the cache queue and update the GBP start point to the LSN corresponding to the new head page of the cache queue; and
    correspondingly, the standby is further configured to obtain the updated GBP start point from the GBP node.
  13. The system according to claim 9 or 10, wherein:
    when the GBP node receives a new page that does not exist in the GBP node's page buffer, the GBP node is further configured to place the new page at the tail of the cache queue; or
    when the GBP node receives a new page that already exists in the GBP node's page buffer, the GBP node is further configured to update the existing corresponding page according to the received new page, and place the updated page at the tail of the cache queue.
  14. The system according to any one of claims 9 to 13, wherein when the disk recovery point is greater than or equal to the GBP start point and the disk end point is greater than or equal to the GBP end point, the standby is further configured to start a background thread, and the background thread pulls all pages stored on the GBP node into the standby's page buffer.
  15. The system according to claim 14, wherein the background thread pulls all pages stored on the GBP node into the standby's page buffer over a second data transfer protocol.
  16. A fault repair method for a database system, comprising:
    when a host fails, determining a global page buffer pool (GBP) start point, a GBP recovery point, and a GBP end point, wherein the GBP start point indicates the smallest log sequence number (LSN) among all pages stored on a GBP node, the GBP recovery point indicates the smallest LSN in the batch of pages most recently received by the GBP node, and the GBP end point indicates the largest LSN in the batch of pages most recently received by the GBP node; all pages stored on the GBP node were sent by the host to the GBP node over a first data transfer protocol during normal operation and were written by the GBP node into a cache queue of the GBP node, and the LSNs corresponding to the pages increase from the head of the cache queue to its tail; and
    when a disk recovery point is greater than or equal to the GBP start point and a disk end point is greater than or equal to the GBP end point, replaying all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, wherein the disk recovery point indicates the smallest LSN among the most recent batch of pages written to the standby's disk, and the disk end point indicates the LSN of the last redo log received by the standby.
  17. The fault repair method according to claim 16, wherein all redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point are not replayed.
  18. The fault repair method according to claim 16 or 17, wherein when the disk recovery point is greater than or equal to the GBP start point and the disk end point is greater than or equal to the GBP end point, the method further comprises: starting a background thread, wherein the background thread pulls all pages stored on the GBP node into a page buffer.
  19. The fault repair method according to claim 18, wherein the background thread pulls all pages stored on the GBP node into the page buffer over a second data transfer protocol.
  20. The fault repair method according to any one of claims 16 to 19, wherein after the replay step has been performed, and when a page that needs to be accessed is still in the page buffer of the GBP node, the method further comprises: reading the page from the GBP node's page buffer.
  21. A computing device, comprising a determining unit and a replay unit, wherein:
    when a host fails, the determining unit is configured to determine a global page buffer pool (GBP) start point, a GBP recovery point, and a GBP end point, wherein the GBP start point indicates the smallest log sequence number (LSN) among all pages stored on a GBP node, the GBP recovery point indicates the smallest LSN in the batch of pages most recently received by the GBP node, and the GBP end point indicates the largest LSN in the batch of pages most recently received by the GBP node; all pages stored on the GBP node were sent by the host to the GBP node over a first data transfer protocol during normal operation and were written by the GBP node into a cache queue of the GBP node, and the LSNs corresponding to the pages increase from the head of the cache queue to its tail; and
    when a disk recovery point is greater than or equal to the GBP start point and a disk end point is greater than or equal to the GBP end point, the replay unit is configured to replay all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, wherein the disk recovery point indicates the smallest LSN among the most recent batch of pages written to the standby's disk, and the disk end point indicates the LSN of the last redo log received by the standby.
  22. The device according to claim 21, wherein all redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point are not replayed.
  23. The device according to claim 21 or 22, further comprising a starting unit, wherein when the disk recovery point is greater than or equal to the GBP start point and the disk end point is greater than or equal to the GBP end point, the starting unit is configured to start a background thread, and the background thread pulls all pages stored on the GBP node into a page buffer.
  24. The device according to claim 23, wherein the background thread pulls all pages stored on the GBP node into the page buffer over a second data transfer protocol.
  25. The device according to any one of claims 21 to 24, further comprising a reading unit, wherein after the replay unit finishes replaying the redo logs, and when a page that needs to be accessed is still in the page buffer of the GBP node, the reading unit is configured to read the page from the GBP node's page buffer.
  26. A fault repair method for a database system, comprising:
    while a host operates normally, sending, by the host, a plurality of pages to a global page buffer pool (GBP) node using a first data transfer protocol;
    writing, by the GBP node, the plurality of pages into a cache queue of the GBP node, wherein log sequence numbers (LSNs) corresponding to the plurality of pages increase from the head of the cache queue to its tail;
    when the host fails, determining, by the host, a GBP start point, a GBP recovery point, and a GBP end point, wherein the GBP start point indicates the smallest LSN among all pages stored on the GBP node, the GBP recovery point indicates the smallest LSN in the batch of pages most recently received by the GBP node, and the GBP end point indicates the largest LSN in the batch of pages most recently received by the GBP node; and
    when a disk recovery point is greater than or equal to the GBP start point and a disk end point is greater than or equal to the GBP end point, replaying, by the host, all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, wherein the disk recovery point indicates the smallest LSN among the most recent batch of pages written to the local disk, and the disk end point indicates the LSN of the last redo log received.
  27. The fault repair method according to claim 26, wherein all redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point are not replayed.
  28. The fault repair method according to claim 26 or 27, wherein when the disk recovery point is greater than or equal to the GBP start point and the disk end point is greater than or equal to the GBP end point, the method further comprises: starting, by the host, a background thread, wherein the background thread pulls all pages on the GBP node into a page buffer.
  29. The fault repair method according to any one of claims 26 to 28, wherein after the host completes the replay step, and when a page that needs to be accessed is still in the page buffer of the GBP node, the method further comprises:
    reading, by the host, the page from the GBP node's page buffer.
  30. A database system, comprising a host and a global page buffer pool (GBP) node, wherein:
    the host is configured to send a plurality of pages to the GBP node over a first data transfer protocol;
    the GBP node is configured to write the plurality of pages into a cache queue of the GBP node, wherein the log sequence numbers (LSNs) contained in the pages increase from the head of the cache queue to its tail;
    when the host fails, the host is further configured to determine a GBP start point, a GBP recovery point, and a GBP end point, wherein the GBP start point indicates the smallest LSN among all pages stored on the GBP node, the GBP recovery point indicates the smallest LSN in the batch of pages most recently received by the GBP node, and the GBP end point indicates the largest LSN in the batch of pages most recently received by the GBP node; and
    when a disk recovery point is greater than or equal to the GBP start point and a disk end point is greater than or equal to the GBP end point, the host is further configured to replay all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, wherein the disk recovery point indicates the smallest LSN among the most recent batch of pages written to the local disk, and the disk end point indicates the LSN of the last redo log received.
  31. The database system according to claim 30, wherein all redo logs between the redo log corresponding to the disk recovery point and the redo log corresponding to the GBP recovery point are not replayed.
  32. The database system according to claim 30 or 31, wherein when the disk recovery point is greater than or equal to the GBP start point and the disk end point is greater than or equal to the GBP end point, the host is further configured to start a background thread, and the background thread pulls all pages on the GBP node into a page buffer.
  33. The database system according to any one of claims 30 to 32, wherein after the host completes the replay step, and when a page that needs to be accessed is still in the page buffer of the GBP node, the host is further configured to read the page from the GBP node's page buffer.
  34. A fault repair method for a database system, comprising:
    during normal operation, sending a plurality of pages to a global page buffer pool (GBP) node over a first data transfer protocol, wherein the plurality of pages are written into a cache queue of the GBP node, and the log sequence numbers (LSNs) corresponding to the pages increase from the head of the cache queue to its tail;
    when a fault occurs, determining a GBP start point, a GBP recovery point, and a GBP end point, wherein the GBP start point indicates the smallest LSN among all pages stored on the GBP node, the GBP recovery point indicates the smallest LSN in the batch of pages most recently received by the GBP node, and the GBP end point indicates the largest LSN in the batch of pages most recently received by the GBP node; and
    when a disk recovery point is greater than or equal to the GBP start point and a disk end point is greater than or equal to the GBP end point, replaying all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, wherein the disk recovery point indicates the smallest LSN among the most recent batch of pages written to the local disk, and the disk end point indicates the LSN of the last redo log received.
  35. A computing device, comprising:
    a transmission unit, configured to send a plurality of pages to a global page buffer pool (GBP) node over a first data transfer protocol, wherein the plurality of pages are written into a cache queue of the GBP node, and the LSNs corresponding to the pages increase from the head of the cache queue to its tail;
    a determining unit, configured to determine, when a fault occurs, a GBP start point, a GBP recovery point, and a GBP end point, wherein the GBP start point indicates the smallest LSN among all pages stored on the GBP node, the GBP recovery point indicates the smallest LSN in the batch of pages most recently received by the GBP node, and the GBP end point indicates the largest LSN in the batch of pages most recently received by the GBP node; and
    a replay unit, configured to replay, when a disk recovery point is greater than or equal to the GBP start point and a disk end point is greater than or equal to the GBP end point, all redo logs between the redo log corresponding to the GBP recovery point and the redo log corresponding to the disk end point, wherein the disk recovery point indicates the smallest LSN among the most recent batch of pages written to the local disk, and the disk end point indicates the LSN of the last redo log received.
PCT/CN2020/089909 2019-05-13 2020-05-13 Fault repair method for database system, database system and computing device WO2020228712A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CA3137745A CA3137745C (en) 2019-05-13 2020-05-13 Fault repair method for database system, database system, and computing device
EP20806356.0A EP3961400B1 (en) 2019-05-13 2020-05-13 Method for repairing database system failures, database system and computing device
US17/525,415 US11829260B2 (en) 2019-05-13 2021-11-12 Fault repair method for database system, database system, and computing device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910395371.7 2019-05-13
CN201910395371.7A CN111930558B (zh) 2019-05-13 Fault repair method for database system, database system and computing device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/525,415 Continuation US11829260B2 (en) 2019-05-13 2021-11-12 Fault repair method for database system, database system, and computing device

Publications (1)

Publication Number Publication Date
WO2020228712A1 true WO2020228712A1 (zh) 2020-11-19

Family

ID=73282691

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/089909 WO2020228712A1 (zh) 2020-05-13 Fault repair method for database system, database system and computing device

Country Status (5)

Country Link
US (1) US11829260B2 (zh)
EP (1) EP3961400B1 (zh)
CN (1) CN111930558B (zh)
CA (1) CA3137745C (zh)
WO (1) WO2020228712A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7363413B2 (ja) * 2019-11-27 2023-10-18 Fujitsu Ltd. Information processing device, information processing system, and program
CN112395141B (zh) * 2020-11-25 2023-07-21 Shanghai DMeng Database Co., Ltd. Data page management method and apparatus, electronic device, and storage medium
US20220261356A1 (en) * 2021-02-16 2022-08-18 Nyriad, Inc. Cache operation for a persistent storage device
CN113254277B (zh) * 2021-06-15 2021-11-02 Winhong Information Technology Co., Ltd. Storage cluster OSD fault repair method, storage medium, monitor, and storage cluster

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150254264A1 (en) * 2013-12-30 2015-09-10 Huawei Technologies Co., Ltd. Method for Recording Transaction Log, and Database Engine
CN106815275A (zh) * 2015-12-02 2017-06-09 Alibaba Group Holding Ltd. Method and device for synchronizing a primary database and a standby database via the standby database
US20170300391A1 (en) * 2016-04-14 2017-10-19 Sap Se Scalable Log Partitioning System
CN108874588A (zh) * 2018-06-08 2018-11-23 Zhengzhou Yunhai Information Technology Co., Ltd. Database instance recovery method and apparatus

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5333303A (en) * 1991-03-28 1994-07-26 International Business Machines Corporation Method for providing data availability in a transaction-oriented system during restart after a failure
US5951695A (en) * 1997-07-25 1999-09-14 Hewlett-Packard Company Fast database failover
US7277905B2 (en) 2004-03-31 2007-10-02 Microsoft Corporation System and method for a consistency check of a database backup
US7831772B2 (en) * 2006-12-12 2010-11-09 Sybase, Inc. System and methodology providing multiple heterogeneous buffer caches
US9135123B1 (en) * 2011-12-28 2015-09-15 Emc Corporation Managing global data caches for file system
CN103530253B (zh) 2013-09-30 2016-08-17 Huawei Technologies Co., Ltd. Cluster multi-global-buffer-pool system, central node, computing node, and management method
US9767178B2 (en) * 2013-10-30 2017-09-19 Oracle International Corporation Multi-instance redo apply
CN104216806B (zh) * 2014-07-24 2016-04-06 Shanghai Infomation2 Software Inc. Method and apparatus for capturing and transmitting serialized file system operation logs
CN104573112B (zh) * 2015-01-30 2018-03-02 Huawei Technologies Co., Ltd. Page query method and data processing node in an OLTP cluster database
US9772911B2 (en) * 2015-03-27 2017-09-26 International Business Machines Corporation Pooling work across multiple transactions for reducing contention in operational analytics systems
CN108363641B (zh) * 2017-01-26 2022-01-14 Huawei Technologies Co., Ltd. Primary/standby data transfer method, control node, and database system
US10769034B2 (en) * 2017-03-07 2020-09-08 Sap Se Caching DML statement context during asynchronous database system replication

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3961400A4 *

Also Published As

Publication number Publication date
CN111930558A (zh) 2020-11-13
EP3961400B1 (en) 2023-07-05
CA3137745C (en) 2024-04-02
US20220066886A1 (en) 2022-03-03
CN111930558B (zh) 2023-03-03
US11829260B2 (en) 2023-11-28
EP3961400A1 (en) 2022-03-02
EP3961400A4 (en) 2022-07-06
CA3137745A1 (en) 2020-11-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20806356

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3137745

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020806356

Country of ref document: EP

Effective date: 20211126