CN110807064A

CN110807064A - Data recovery device in RAC distributed database cluster system

Info

Publication number: CN110807064A
Application number: CN201911032746.XA
Authority: CN
Inventors: 梁继良; 孙家彦; 张震阳; 赵宗鹏; 赵健; 曹宝峰; 张争; 陈坤坤
Original assignee: BEIJING UXSINO SOFTWARE Co Ltd
Current assignee: BEIJING UXSINO SOFTWARE Co Ltd
Priority date: 2019-10-28
Filing date: 2019-10-28
Publication date: 2020-02-18
Anticipated expiration: 2039-10-28
Also published as: CN110807064B

Abstract

The embodiment of the invention provides a data recovery device in an RAC distributed database cluster system. The device is applied to each database node in the RAC distributed database cluster system and comprises a control sub-service, a log scanning sub-service, and a recovery execution sub-service. The control sub-service acquires a fault processing request of the RAC distributed database cluster system and controls the execution of the log scanning sub-service and the recovery execution sub-service. According to a log scanning command sent by the control sub-service, the log scanning sub-service combines the log records of a plurality of database nodes, determines the data items to be recovered and their recovery order, and generates a recovery log. Based on a recovery execution command sent by the control sub-service, the recovery execution sub-service performs a data recovery operation on the data items to be recovered according to the recovery log and writes the contents of all recovered data items to disk. Data recovery in the RAC distributed database cluster system is thereby achieved, and the high availability of the RAC distributed database cluster system is improved.

Description

Data recovery device in RAC distributed database cluster system

Technical Field

The invention relates to the technical field of computers, in particular to a data recovery device in an RAC distributed database cluster system.

Background

High availability is one of the factors that must be considered in the architectural design of a RAC (Real Application Clusters) distributed database cluster system; it generally means reducing, by design, the time during which the system cannot provide service. A system that can provide service without interruption has an availability of 100%. If a system fails to provide service for 1 time unit out of every 100 time units of operation, its availability is 99%.
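The availability figure above is simply uptime divided by total operating time. A minimal Python sketch of that arithmetic follows; the function name `availability` is illustrative and does not come from the patent.

```python
def availability(total_units: float, downtime_units: float) -> float:
    """Share of operating time during which the system can provide service."""
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    if not 0 <= downtime_units <= total_units:
        raise ValueError("downtime_units must lie between 0 and total_units")
    return (total_units - downtime_units) / total_units


# 1 unit of outage in every 100 units of operation -> 99 % availability
print(f"{availability(100, 1):.0%}")   # 99%
print(f"{availability(100, 0):.0%}")   # 100%
```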

To achieve high availability in a RAC distributed database cluster system, the core principle of the architectural design is redundancy. The system must run continuously, 24 hours a day, so a redundancy mechanism is needed to keep the service accessible when a machine goes down; such redundancy achieves high availability of the service by deploying at least two servers as a cluster.

Redundancy mechanisms commonly used for database systems include "one primary, multiple backups" and "two sites, three data centers". These provide several redundant-backup modes and effective high-availability schemes for service-availability requirements, but the prior art does not provide a data recovery scheme that can effectively improve the high availability of a RAC distributed database cluster system.

Disclosure of Invention

Aiming at the problems in the prior art, the embodiment of the invention provides a data recovery device in an RAC distributed database cluster system.

The embodiment of the invention provides a data recovery device in an RAC distributed database cluster system, which is applied to each database node in the RAC distributed database cluster system and comprises: a control sub-service, a log scanning sub-service, and a recovery execution sub-service;

the control sub-service is used for acquiring a fault processing request of the RAC distributed database cluster system and sending a log scanning command to the log scanning sub-service;

the log scanning sub-service is configured to, upon receiving a log scanning command sent by the control sub-service, combine the log records of a plurality of database nodes in the RAC distributed database cluster system to determine the data items to be recovered and their recovery order, and generate a recovery log;

the control sub-service is further configured to send a recovery execution command to the recovery execution sub-service if it is detected that the log scanning sub-service has generated a recovery log;

and the recovery execution sub-service is configured to, if a recovery execution command sent by the control sub-service is received, perform a data recovery operation on the data items needing to be recovered in the recovery log according to the recovery sequence of the data items needing to be recovered in the recovery log, and write the contents of all the data items that have completed recovery into a disk.

Optionally, the control sub-service is further configured to:

keep communication with the recovery execution sub-service and record the contents of the data items recovered by the data recovery operation performed by the recovery execution sub-service; and, upon detecting that the recovery execution sub-service has completed the data recovery operation on the data items to be recovered, send the contents of those recovered data items to the RAC distributed database cluster system, so that the RAC distributed database cluster system switches the recovered data items in global data resource management from the unavailable state to the available state.

Optionally, the recovery execution sub-service is specifically configured to:

upon receiving a recovery execution command sent by the control sub-service, where the recovery execution command carries a failed database node of the RAC distributed database cluster system, start a database connection and a recovery transaction for the failed database node, and, within the recovery transaction, perform the data recovery operation on the data items to be recovered in the recovery log according to their recovery order in the recovery log.

Optionally, the log scanning sub-service is specifically configured to:

upon receiving a log scanning command sent by the control sub-service, where the log scanning command carries a failed database node of the RAC distributed database cluster system, scan and filter the log records of the failed database node in combination with the log records of the other, healthy database nodes in the RAC distributed database cluster system, determine the data items to be recovered and their recovery order, and generate a recovery log.

Optionally, the log scanning sub-service includes:

the scanning unit is configured to, upon receiving a log scanning command sent by the control sub-service, where the log scanning command carries the failed database node of the RAC distributed database cluster system, scan the log records of the failed database node and take all data items in them that were modified but not yet written to disk as the data items to be recovered; for each data item to be recovered in the log records of the failed database node, determine whether the modifications in its redo records involve another database node (a cross-node modification): if so, store its redo records in a cross-node log record list, and if not, store them in a single-node log record list; and, after all data items in the log records of the failed database node have been scanned, identify from the redo records in the cross-node log record list the other database nodes involved, scan the log records of those nodes, and add the redo records of the data items to be recovered found there to the cross-node log record list;

the sorting unit is used for combining and sorting the redo records of all the data items needing to be recovered in the single-node log record list and the redo records of all the data items needing to be recovered in the cross-node log record list, and determining the recovery sequence of the data items needing to be recovered;

and the generating unit is used for generating a recovery log according to the data items needing to be recovered and the recovery sequence of the data items needing to be recovered.
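The two-pass behaviour of the scanning unit can be sketched in Python. This is a minimal illustration under assumed data structures, not the patent's implementation: `RedoRecord`, its fields (in particular `peers`, which stands in for the cross-node transfer markers), and `scan_failed_node` are hypothetical names, and each node's log is modelled as an in-memory list rather than a WAL file on shared storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoRecord:                    # hypothetical record layout
    node: str                        # node whose WAL contains this record
    wal_seq: int                     # WAL log sequence number
    item: str                        # identifier of the modified data item
    committed: bool                  # modification completed
    flushed: bool                    # modified item already written to disk
    peers: frozenset = frozenset()   # other nodes the item was passed to/from


def scan_failed_node(logs: dict[str, list[RedoRecord]], failed: str):
    """Two-pass scan: returns (single_node_list, cross_node_list)."""
    # Pass 1: collect, per data item, the failed node's records needing recovery
    by_item: dict[str, list[RedoRecord]] = {}
    for rec in logs.get(failed, []):
        if rec.committed and not rec.flushed:
            by_item.setdefault(rec.item, []).append(rec)

    single, cross = [], []
    peers_of: dict[str, set[str]] = {}
    for item, recs in by_item.items():
        peers = set().union(*(r.peers for r in recs))
        if peers:                      # modification involved another node
            cross.extend(recs)
            peers_of[item] = peers
        else:
            single.extend(recs)

    # Pass 2: scan only the logs of the peer nodes named in cross-node records
    for item, peers in peers_of.items():
        for peer in sorted(peers - {failed}):
            cross.extend(r for r in logs.get(peer, [])
                         if r.item == item and r.committed and not r.flushed)
    return single, cross


logs = {
    "A": [RedoRecord("A", 1, "K", True, False),
          RedoRecord("A", 2, "L", True, False, frozenset({"B"})),
          RedoRecord("A", 4, "M", True, True)],          # already on disk
    "B": [RedoRecord("B", 3, "L", True, False, frozenset({"A"})),
          RedoRecord("B", 5, "N", True, False)],         # unrelated to A
}
single, cross = scan_failed_node(logs, "A")
print([r.item for r in single])                  # ['K']
print([(r.node, r.wal_seq) for r in cross])      # [('A', 2), ('B', 3)]
```

The second pass only visits nodes named in cross-node records, which is what keeps the scan depth small; this sketch follows a single hop and does not chase an item that was passed on again (A to B to C).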

Optionally, the log records are write-ahead (WAL) logs, each of which includes a transaction ID number and a WAL log sequence number, where the transaction ID number describes the order in which database transactions start and the WAL log sequence number describes the order in which database transactions end;

accordingly, the sorting unit is specifically configured to:

merge the redo records of all data items to be recovered in the single-node log record list with those in the cross-node log record list, sort them by WAL log sequence number, and thereby determine the recovery order of the data items to be recovered.

Optionally, the control sub-service is further configured to:

resend a log scanning command to the log scanning sub-service if the recovery execution sub-service fails to complete the data recovery operation on the data items to be recovered, thereby controlling the execution of the log scanning sub-service and triggering the recovery execution sub-service to perform a new round of data recovery, after which the contents of all recovered data items are written to disk.

Optionally, the control sub-service operates in a master-standby mode: if the database node hosting the control sub-service that acts as the master control service goes down or becomes abnormal, one control sub-service is selected as the new master control service from the standby control sub-services on all database nodes of the RAC distributed database cluster system that are not down or abnormal.

In the data recovery device in the RAC distributed database cluster system according to the embodiment of the present invention, the control sub-service obtains a fault processing request of the RAC distributed database cluster system and controls the execution of the log scanning sub-service and the recovery execution sub-service. The log scanning sub-service combines the log records of a plurality of database nodes according to the log scanning command sent by the control sub-service, determines the data items to be recovered and their recovery order, and generates a recovery log; the recovery execution sub-service performs the data recovery operation on those data items according to the recovery log and writes the contents of all recovered data items to disk. Data recovery in the RAC distributed database cluster system can thus be achieved, effectively improving the high availability of the RAC distributed database cluster system.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

Fig. 1 is a schematic structural diagram of a data recovery apparatus in an RAC distributed database cluster system according to an embodiment of the present invention;

fig. 2 is a schematic flowchart of the data recovery operation performed by the recovery execution sub-service in fig. 1.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 shows a schematic structural diagram of a data recovery device in an RAC distributed database cluster system according to an embodiment of the present invention. The device of this embodiment is applied to each database node in the RAC distributed database cluster system and, as shown in fig. 1, includes: a control sub-service 1, a log scanning sub-service 2, and a recovery execution sub-service 3;

the control sub-service 1 is used for acquiring a fault processing request of the RAC distributed database cluster system and sending a log scanning command to the log scanning sub-service 2;

the log scanning sub-service 2 is configured to, upon receiving a log scanning command sent by the control sub-service 1, combine the log records of a plurality of database nodes in the RAC distributed database cluster system, determine the data items to be recovered and their recovery order, and generate a recovery log;

the control sub-service 1 is further configured to send a recovery execution command to the recovery execution sub-service 3 if it is detected that the log scanning sub-service 2 has generated a recovery log;

the recovery execution sub-service 3 is configured to, if a recovery execution command sent by the control sub-service 1 is received, perform a data recovery operation on the data items that need to be recovered in the recovery log according to the recovery sequence of the data items that need to be recovered in the recovery log, and write the contents of all the data items that have completed recovery into a disk.
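The command sequence among the three sub-services (scan command, detection of the recovery log, execution command) can be illustrated with a small Python sketch. It is a hypothetical, in-process model: `ControlSubService`, `RecoveryLog`, and `on_fault_request` are invented names, and the two worker sub-services are injected as plain callables instead of separate services on the database node.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RecoveryLog:                    # hypothetical recovery-log shape
    failed_node: str
    ordered_items: List[str]          # data items in recovery order


class ControlSubService:
    """Hypothetical coordinator; the two sub-services are injected as callables."""

    def __init__(self,
                 scan: Callable[[str], Optional[RecoveryLog]],
                 execute: Callable[[RecoveryLog], List[str]]):
        self.scan = scan              # log scanning sub-service
        self.execute = execute        # recovery execution sub-service
        self.trace: List[str] = []

    def on_fault_request(self, failed_node: str) -> List[str]:
        self.trace.append(f"log-scan command -> {failed_node}")
        recovery_log = self.scan(failed_node)
        if recovery_log is None:
            raise RuntimeError("log scanning produced no recovery log")
        # Recovery log detected: issue the recovery execution command
        self.trace.append(f"recovery-execution command -> {failed_node}")
        return self.execute(recovery_log)


ctl = ControlSubService(
    scan=lambda node: RecoveryLog(node, ["K", "L"]),
    execute=lambda log: list(log.ordered_items),  # pretend every item recovers
)
print(ctl.on_fault_request("A"))   # ['K', 'L']
print(ctl.trace)
```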

In a specific application, the control sub-service in this embodiment is in a master-standby mode, and if a database node where the control sub-service serving as the master control service is located is down or abnormal, one control sub-service is selected as a new master control service from the control sub-services serving as the backup services of all database nodes in the RAC distributed database cluster system that are not down or abnormal.
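The patent does not say how the new master control service is chosen, so the following Python sketch uses an assumed rule purely for illustration: keep the current master while its node is healthy, otherwise promote the healthy standby with the smallest node id. The function name `elect_master` and the health-map input are hypothetical.

```python
def elect_master(health: dict[str, bool], current_master: str) -> str:
    """Return the node whose control sub-service should act as master.

    Assumed rule (not specified in the patent): keep the current master while
    its node is healthy; otherwise promote the healthy standby with the
    smallest node id.
    """
    if health.get(current_master, False):
        return current_master
    standbys = sorted(node for node, healthy in health.items() if healthy)
    if not standbys:
        raise RuntimeError("no healthy node left to host the control sub-service")
    return standbys[0]


print(elect_master({"n1": False, "n2": True, "n3": True}, "n1"))  # n2
print(elect_master({"n1": True, "n2": True}, "n1"))               # n1
```

The sketch assumes every node sees the same health map; a real cluster would need some agreement mechanism so that two nodes do not both promote themselves.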

It can be understood that the fault processing request of the RAC distributed database cluster system may carry the failed database node of the RAC distributed database cluster system. When an abnormality is detected, the control sub-service establishes a communication and coordination mechanism with the RAC distributed database cluster system and controls the execution of the log scanning sub-service and the recovery execution sub-service, which receive the commands of the control sub-service, execute their specific operation flows, and carry out the data recovery.

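In a specific application, the recovery execution sub-service 3 may be specifically configured to, upon receiving a recovery execution command sent by the control sub-service 1 that carries the failed database node of the RAC distributed database cluster system, start a database connection and a recovery transaction for the failed database node and, within the recovery transaction, perform the data recovery operation on the data items to be recovered in the recovery log according to their recovery order.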

It is understood that, in this embodiment, after receiving from the log scanning sub-service 2 the recovery log of the data to be recovered, the recovery execution sub-service 3 performs the data recovery in the form of a database transaction.

It will be appreciated that, when a database node in the RAC distributed database cluster system goes down or fails, transactions that were still incomplete when the node terminated do not need to be recovered. Data that had been modified but not yet written to disk may reside in the cache of the RAC distributed database cluster system; however, every such modification has a corresponding write-ahead log (WAL) record that has already been written to disk, so the data can be recovered from the WAL logs.

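In a specific application, the log scanning sub-service 2 may be specifically configured to, upon receiving a log scanning command sent by the control sub-service 1 that carries the failed database node of the RAC distributed database cluster system, scan and filter the log records of the failed database node in combination with the log records of the other, healthy database nodes, determine the data items to be recovered and their recovery order, and generate a recovery log.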

It can be understood that, in the RAC distributed database cluster system, each database node records its own log, stored in a separate directory of the shared storage. Therefore, when a machine goes down or fails, the log scanning sub-service 2 can read the log records of all database nodes and scan, sort, and filter them.

It can be understood that, while the recovery execution sub-service 3 of this embodiment performs the data recovery operation, the normal operation of the other healthy database nodes is not affected. In the RAC distributed database cluster system, the recovery execution sub-service 3 of any healthy node can execute data recovery based on the processing result of the log scanning sub-service 2, and the control sub-service 1 may select, through a specific mechanism, the recovery execution sub-service 3 of a healthy node to execute the data recovery process.

It is understood that the recovery execution sub-service 3 of this embodiment performs, within the database recovery transaction, the data recovery operation on each data item to be recovered, one by one in order. After all recovery items have been executed, the recovered data item contents in memory are written to disk and the data recovery is complete; see fig. 2.
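The flow of fig. 2 (apply each redo record in order inside one recovery transaction, then write everything to disk) can be illustrated with a small Python sketch. The redo format `(item, op, operand)`, the operations `"set"` and `"add"`, and the function names are hypothetical; a plain dict stands in for the data files on disk, which also serve as the base data for recovery.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical redo format: (data item, operation, operand)
Redo = Tuple[str, str, int]


def apply_redo(value: Optional[int], op: str, operand: int) -> int:
    if op == "set":
        return operand
    if op == "add":
        if value is None:
            raise ValueError("'add' redo needs an existing base value")
        return value + operand
    raise ValueError(f"unknown redo operation {op!r}")


def run_recovery_transaction(disk: Dict[str, int], recovery_log: List[Redo]) -> Dict[str, int]:
    """Apply redo records one by one in recovery order to in-memory copies,
    then write all recovered items to 'disk' only after every record succeeded."""
    memory: Dict[str, int] = {}
    for item, op, operand in recovery_log:
        base = memory[item] if item in memory else disk.get(item)  # base data from disk
        memory[item] = apply_redo(base, op, operand)
    disk.update(memory)   # reached only if no redo raised: persist everything
    return memory


disk = {"K": 10, "L": 5}
run_recovery_transaction(disk, [("K", "add", 1), ("L", "set", 7), ("L", "add", 3)])
print(disk)   # {'K': 11, 'L': 10}

disk2 = {"K": 10}
try:
    run_recovery_transaction(disk2, [("K", "add", 1), ("X", "add", 1)])
except ValueError:
    pass
print(disk2)  # {'K': 10}  (nothing written: the transaction did not finish)
```

`dict.update` is not crash-atomic; the sketch only mimics the all-or-nothing property, which a real engine would get from its own commit and flush machinery.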

It can be understood that, when a machine goes down or fails, the data recovery device in the RAC distributed database cluster system scans the log records of multiple database nodes, sorts them appropriately, determines the data content to be recovered and its order, and performs the recovery. Multiple nodes in the system can therefore back each other up, the failure-and-recovery flow does not affect normal services, and the high availability of the system is effectively improved.

Further, the log scanning sub-service 2 may include:

a scanning unit, configured to, upon receiving a log scanning command sent by the control sub-service that carries the failed database node of the RAC distributed database cluster system, scan the log records of the failed database node (the first scan) and take all data items in those log records that have been modified but not written to disk as the data items to be recovered; for each data item to be recovered in the log records of the failed database node, determine whether the modifications in its redo records involve another database node: if so, store its redo records in a cross-node log record list, and if not, store them in a single-node log record list; after all data items in the log records of the failed database node have been scanned, identify the other database nodes involved in the modifications recorded in the cross-node log record list, scan their log records (the second scan), and add the redo records of the data items to be recovered found there to the cross-node log record list;

It is understood that, when scanning the log records of the failed database node: data items whose modification has completed and has been written to disk do not need to be recovered; data items whose modification did not complete do not need to be recovered; and data items whose modification has completed but has not been written to disk do need to be recovered.
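The three-way rule above can be written as a tiny Python decision function. The names `Action` and `classify` are hypothetical. The source does not discuss a modification that never completed but was nevertheless written to disk (which many engines would handle with undo), so the sketch simply follows the stated rule and skips every incomplete modification.

```python
from enum import Enum


class Action(Enum):
    SKIP_ON_DISK = "modification completed and already on disk"
    SKIP_INCOMPLETE = "modification never completed"
    RECOVER = "modification completed but not yet on disk"


def classify(completed: bool, written_to_disk: bool) -> Action:
    """Three-way rule applied while scanning the failed node's log."""
    if not completed:
        return Action.SKIP_INCOMPLETE
    return Action.SKIP_ON_DISK if written_to_disk else Action.RECOVER


for completed, on_disk in [(True, True), (False, False), (True, False)]:
    print(completed, on_disk, classify(completed, on_disk).name)
```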

In a specific application, the log records may be write-ahead (WAL) logs, each including a transaction ID (identification) number and a WAL log sequence number, where the transaction ID number describes the order in which database transactions begin and the WAL log sequence number describes the order in which they end;

accordingly, the sorting unit can be used to determine the recovery order from the WAL log sequence numbers.

Specifically, the sorting unit may merge the redo records of all data items to be recovered in the single-node log record list with those in the cross-node log record list, sort them in ascending order of WAL log sequence number, and thereby determine the recovery order of the data items to be recovered. That is, data items with smaller WAL log sequence numbers are recovered first, followed by those with larger WAL log sequence numbers. If multiple failed nodes need to be recovered, a unique recovery order for the data items can be determined from the globally and uniformly managed WAL log sequence numbers.
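The sorting unit's merge-then-sort step can be sketched in a few lines of Python. This assumes, as the paragraph above suggests for the multi-node case, that WAL log sequence numbers are globally unique, so they also serve as a de-duplication key; the `Redo` tuple and `recovery_order` function are hypothetical names.

```python
from typing import List, NamedTuple


class Redo(NamedTuple):
    wal_seq: int    # globally managed WAL log sequence number (assumed unique)
    node: str
    item: str


def recovery_order(single_node: List[Redo], cross_node: List[Redo]) -> List[Redo]:
    """Merge both lists and order redo records by ascending WAL sequence number."""
    unique = {r.wal_seq: r for r in single_node + cross_node}  # drop duplicates
    return sorted(unique.values(), key=lambda r: r.wal_seq)


single = [Redo(1, "A", "K"), Redo(5, "A", "M")]
cross = [Redo(4, "B", "L"), Redo(2, "A", "L")]
print([(r.wal_seq, r.node, r.item) for r in recovery_order(single, cross)])
# [(1, 'A', 'K'), (2, 'A', 'L'), (4, 'B', 'L'), (5, 'A', 'M')]
```

Note how item L's change on node A (sequence 2) is replayed before its later change on node B (sequence 4), which is exactly the ordering the data item L example below relies on.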

It can be understood that recovering each data item requires both the base data and the WAL log of its modifications: the data is recovered by redoing, on top of the base data, the modifications described in the WAL log. This embodiment uses the file on the storage device as the base data in the recovery flow.

For example, in a RAC distributed database cluster system, a data item L may first be modified and committed by database node A and, while the modified data item has not yet been written to disk, be passed over the network to another database node B, which continues to modify it in its own database transactions. Such a transfer may involve two database nodes or more. Taking two database nodes A and B as an example, the redo records of both node A and node B for data item L must be obtained.

It will be appreciated that recovering a data item may involve the log records of a single database node (the single-node log record list described above) or of multiple database nodes (the cross-node log record list described above). In the multi-node case, the second scan must scan, across nodes, all log records of the data items to be recovered.

In recovering data item L, starting from the copy of L in disk storage, the changes to L described in the WAL log of database node A must be redone first, followed by the changes described in the WAL log of database node B, to recover L completely. It is therefore necessary to determine that data item L needs to be recovered, then determine that its recovery order runs from the WAL log information of node A to that of node B, and then scan the WAL logs of nodes A and B for the changes (redo records) related to L.

To reduce the depth of log scanning (the amount of data scanned), speed up log scanning, and simplify the log-combination algorithm, this embodiment handles the case in which data item L is modified and committed by database node A, passed over the network to database node B before being written to disk, and then further modified by node B in its database transactions, as follows: a record is written in the WAL log of node A meaning "data item L has been passed to database node B; data item L is modified but not written to disk". At the same time, in the message that transfers data item L to node B, the item is marked as "modified but not written to disk". When node B receives an item so marked, it records in its own WAL log: "received data item L from database node A, modified but not written to disk".
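The marking scheme described above can be modelled in a short Python sketch. Everything here is hypothetical: `Node`, `TransferMessage`, the marker tuples `("SENT_DIRTY", ...)` and `("RECEIVED_DIRTY", ...)`, and the in-memory `wal` list stand in for the real WAL records, which would also carry transaction IDs and WAL log sequence numbers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    wal: List[Tuple[str, str, str]] = field(default_factory=list)
    cache: Dict[str, int] = field(default_factory=dict)


@dataclass(frozen=True)
class TransferMessage:
    item: str
    value: int
    dirty: bool          # "modified but not written to disk" mark


def send_item(src: Node, dst: Node, item: str, written_to_disk: bool) -> None:
    dirty = not written_to_disk
    if dirty:
        # Sender's WAL: "item passed to dst, modified but not on disk"
        src.wal.append(("SENT_DIRTY", item, dst.name))
    receive_item(dst, TransferMessage(item, src.cache[item], dirty), src.name)


def receive_item(dst: Node, msg: TransferMessage, src_name: str) -> None:
    if msg.dirty:
        # Receiver's WAL: "received dirty item from src_name"
        dst.wal.append(("RECEIVED_DIRTY", msg.item, src_name))
    dst.cache[msg.item] = msg.value


a, b = Node("A", cache={"L": 6}), Node("B")
send_item(a, b, "L", written_to_disk=False)
print(a.wal)   # [('SENT_DIRTY', 'L', 'B')]
print(b.wal)   # [('RECEIVED_DIRTY', 'L', 'A')]
```

Because both sides log the hand-off, a scanner that starts from either node's log learns directly which peer log to read next, instead of scanning every node's log in full.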

It can be understood that, by adding the database transaction number and the WAL log sequence number to the WAL logs of database node A and database node B according to a defined rule, the complete flow and content of the redo required for data item L can be determined quickly.

Further, the control sub-service 1 may also be used to:

keep communication with the recovery execution sub-service 3 and record the contents of the data items recovered by the data recovery operation of the recovery execution sub-service 3; and, upon detecting that the recovery execution sub-service 3 has completed the data recovery operation on the data items to be recovered, send the contents of the recovered data items to the RAC distributed database cluster system, so that the RAC distributed database cluster system switches the recovered data items in global data resource management from the unavailable state to the available state.

It is understood that, in the RAC distributed database cluster system, if a database node fails, the data items it currently holds are marked "occupied" or "unavailable" in global resource management. Any request to access data awaiting recovery is "blocked": such data cannot be accessed in any way until its recovery is complete. This access control is handled internally by the RAC distributed database cluster system. After the data of the failed node has been recovered, the control sub-service 1 releases the data items held by the failed node back to the global resource management of the RAC distributed database cluster system. At that point the whole RAC distributed database cluster system has returned to normal, and all processes blocked by the node failure can continue.
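The unavailable-to-available transition can be illustrated with a much-simplified Python model of global resource management. The class `GlobalResourceManager` and its methods are hypothetical; in this sketch a blocked request is shown only as `can_access` returning `False`, whereas a real system would queue the request until the item is released.

```python
from enum import Enum
from typing import Dict, Iterable


class State(Enum):
    AVAILABLE = "available"
    UNAVAILABLE = "unavailable"     # held by a failed node, awaiting recovery


class GlobalResourceManager:
    """Hypothetical, much-simplified view of global resource management."""

    def __init__(self) -> None:
        self.owner: Dict[str, str] = {}
        self.state: Dict[str, State] = {}

    def acquire(self, item: str, node: str) -> None:
        self.owner[item] = node
        self.state[item] = State.AVAILABLE

    def on_node_failure(self, node: str) -> None:
        for item, owner in self.owner.items():
            if owner == node:
                self.state[item] = State.UNAVAILABLE   # requests now block

    def can_access(self, item: str) -> bool:
        return self.state.get(item, State.AVAILABLE) is State.AVAILABLE

    def release_recovered(self, node: str, recovered: Iterable[str]) -> None:
        # Called by the control sub-service once recovery has finished
        for item in recovered:
            if self.owner.get(item) == node:
                del self.owner[item]
                self.state[item] = State.AVAILABLE


grm = GlobalResourceManager()
grm.acquire("L", "A")
grm.acquire("M", "B")
grm.on_node_failure("A")
print(grm.can_access("L"), grm.can_access("M"))   # False True
grm.release_recovered("A", ["L"])
print(grm.can_access("L"))                         # True
```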

In this embodiment, the control sub-service may receive a fault processing request sent by Uxdb SRAC, coordinate the execution flows of the log scanning sub-service and the recovery execution sub-service, and direct them in turn to perform their specific operations. After the data recovery is complete, the control sub-service can communicate with the RAC distributed database cluster system, report the recovered data content, and direct the system to clean up the management information of the data resources that the node failure left unavailable under the failed node, so that the data can be accessed again.

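Further, the control sub-service 1 may also be used to resend a log scanning command to the log scanning sub-service 2 if the recovery execution sub-service 3 fails to complete the data recovery operation on the data items to be recovered, thereby triggering the recovery execution sub-service 3 to perform a new round of data recovery and write the contents of all recovered data items to disk.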

It can be understood that a transaction is the basic unit of business logic in a database system: the operations in a transaction either all execute or all fail and are rolled back. The database recovery transaction can end once all of its recovery operations have been executed. If, while recovery is in progress, the recovering node fails due to human or unknown factors and the recovery transaction cannot be completed, the recovery execution sub-service 3 has failed to perform the data recovery operation on the data items to be recovered; the control sub-service 1 in this embodiment can detect such a failure and initiate a new fault recovery process.
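The retry behaviour (re-issue the log scan and start a new recovery round when the recovery transaction fails) can be sketched in Python. The function `recover_with_retry` and its parameters are hypothetical, and `max_rounds` is an assumed bound: the patent does not limit the number of rounds.

```python
from typing import Callable, List, Tuple


def recover_with_retry(failed_node: str,
                       scan: Callable[[str], List[str]],
                       execute: Callable[[List[str]], List[str]],
                       max_rounds: int = 3) -> Tuple[int, List[str]]:
    """Re-issue the log scan and start a new recovery round whenever the
    recovery transaction fails. max_rounds is an assumed bound."""
    last_error = None
    for round_no in range(1, max_rounds + 1):
        recovery_log = scan(failed_node)            # (re)send log scanning command
        try:
            return round_no, execute(recovery_log)  # all-or-nothing recovery transaction
        except Exception as exc:                    # e.g. recovering node went down
            last_error = exc
    raise RuntimeError(
        f"recovery of {failed_node} failed after {max_rounds} rounds") from last_error


calls = {"n": 0}


def flaky_execute(recovery_log: List[str]) -> List[str]:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("recovering node failed mid-transaction")
    return recovery_log


print(recover_with_retry("A", scan=lambda n: [n + ":K", n + ":L"], execute=flaky_execute))
# (2, ['A:K', 'A:L'])
```

Rescanning on every round, rather than reusing the old recovery log, mirrors the text: a failed recovery transaction rolls back entirely, so the next round starts again from the logs.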

The data recovery device in the RAC distributed database cluster system according to this embodiment essentially applies a high-availability model to the RAC distributed database cluster system. When some database nodes go down, it allows complete data recovery to be performed online from the WAL logs of multiple nodes, without disturbing the operation of the database nodes that have not failed. The whole data recovery process can be completed within one to several seconds, and the internal fault of the RAC distributed database cluster system is transparent to, and imperceptible from, the outside.

The data recovery device in the RAC distributed database cluster system provided by the embodiment of the invention can quickly recover the contents of the data items of the RAC distributed database cluster system; database recovery can be performed without affecting the normal operation of the RAC distributed database cluster system; the recovery is imperceptible to external users, since the data to be recovered is blocked until recovery completes and becomes available once the recovery process finishes; the database nodes in the Uxdb SRAC system can back each other up, so if some nodes go down or fail, the recovery process can run on the remaining nodes without extra machines and resources configured as backups, saving software and hardware cost; and the high availability of the RAC distributed database cluster system can be effectively improved.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A data recovery device in a RAC distributed database cluster system, applied to each database node in the RAC distributed database cluster system, characterized by comprising: a control sub-service, a log scanning sub-service, and a recovery execution sub-service;

2. The apparatus for data recovery in a RAC distributed database cluster system according to claim 1, wherein the control sub-service is further configured to

3. The data recovery device in a RAC distributed database cluster system according to claim 1, wherein the recovery execution sub-service is specifically configured to

4. A data recovery arrangement in a RAC distributed database cluster system according to claim 1, wherein the log scanning sub-service is specifically adapted to

5. The apparatus for data recovery in a RAC distributed database cluster system according to claim 4, wherein the log scanning sub-service comprises:

6. The apparatus for data recovery in a RAC distributed database cluster system of claim 1, wherein the log records are write-ahead (WAL) logs, each including a transaction ID number and a WAL log sequence number, where the transaction ID number describes the order in which database transactions start and the WAL log sequence number describes the order in which database transactions end;

accordingly, the sorting unit is specifically configured to

7. The apparatus for data recovery in a RAC distributed database cluster system according to claim 1, wherein the control sub-service is further configured to

8. The apparatus for data recovery in a RAC distributed database cluster system according to claim 1, wherein the control sub-service operates in a master-standby mode, and if the database node hosting the control sub-service acting as the master control service goes down or becomes abnormal, one control sub-service is selected as a new master control service from the standby control sub-services on all database nodes of the RAC distributed database cluster system that are not down or abnormal.