CN115934417A

CN115934417A - Data backup method, system and equipment

Info

Publication number: CN115934417A
Application number: CN202211490140.2A
Authority: CN
Inventors: 陈琪; 姜广耀
Original assignee: XFusion Digital Technologies Co Ltd
Current assignee: XFusion Digital Technologies Co Ltd
Priority date: 2022-11-25
Filing date: 2022-11-25
Publication date: 2023-04-07
Also published as: WO2024109253A1

Abstract

The application discloses a data backup method, system and device, relating to the field of database data backup between a first node and a second node. The data backup system comprises a first node, a second node and a shared memory device. The method comprises the following steps: the first node operates on first data based on a first operation instruction to obtain an execution result; the first node records, in a redo log, the processing procedure by which it obtains the execution result based on the first operation instruction; the first node sends the redo log to the second node; and the second node acquires the execution result from the shared memory device based on the redo log. With this method, the second node can quickly synchronize data with the first node, and the service interruption time is shortened when the first node fails.

Description

Data backup method, system and equipment

Technical Field

The application relates to the technical field of computers, in particular to the field of data backup and disaster recovery of databases.

Background

With the progress and development of society, data has become increasingly important. In particular, how to ensure the accuracy and timeliness of data transmission, and how to ensure that data is neither lost nor tampered with while being transmitted between two computing devices, has become a popular research topic for database software and the related computing and storage equipment. Today, data has become an asset for entities and is increasingly important. Reliability and availability have become core requirements for the database software that carries these data assets.

Disclosure of Invention

Embodiments of the present application provide a method and a system for data backup, which can effectively reduce service interruption time between a master node and a backup node, and improve service reliability.

In order to realize the purpose, the following technical scheme is provided:

In a first aspect, an embodiment of the present application provides a data backup method. The method is applied to a data backup system that includes a first node, a second node and a shared memory device;

the first node operates on the first data based on the first operation instruction to obtain an execution result, and sends the execution result to the shared memory device; the first node records, in a redo log, the process by which the first node obtains the execution result based on the first operation instruction, and sends the redo log to the second node; and the second node obtains the execution result from the shared memory device based on the redo log.

According to the embodiment of the application, the first node stores the execution result in the shared memory device, records in the redo log the process of obtaining the execution result based on the first operation instruction, and sends the redo log to the second node. While synchronizing its data with the first node, the second node can obtain the execution result directly from the shared memory device according to the redo log. This saves the time the second node would otherwise spend parsing the log and recompiling the operation instruction to regenerate the execution result, so the service interruption time is shortened, service continuity is ensured, and service reliability is improved.

In some embodiments, the redo log has an identifier corresponding to the execution result, and the second node obtains the execution result from the shared memory device based on the redo log, which specifically includes: and the second node acquires the execution result from the shared memory device based on the identification of the redo log.

In the above embodiment, the redo log has the identifier corresponding to the execution result, so that the second node may directly query the execution result corresponding to the identifier in the shared memory device based on the identifier of the redo log, and directly read the execution result from the shared memory device, thereby improving the efficiency of obtaining the execution result by the second node, and further reducing the service interruption time.
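The identifier-based lookup described above can be illustrated with a minimal sketch. It assumes the shared memory device behaves like a key-value store indexed by the identifier carried in the redo log; the names RedoLogEntry, SharedMemoryDevice, first_node_execute and second_node_sync are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoLogEntry:
    """One redo log record; result_id identifies the execution result."""
    result_id: int
    description: str  # hypothetical free-text note on how the result was produced


class SharedMemoryDevice:
    """Stand-in for the shared memory device: an in-process dictionary."""

    def __init__(self):
        self._results = {}

    def put_result(self, result_id, result):
        self._results[result_id] = result

    def get_result(self, result_id):
        return self._results.get(result_id)


def first_node_execute(shm, result_id, data, instruction):
    """First node: run the instruction, publish the result, return the redo log entry."""
    result = instruction(data)
    shm.put_result(result_id, result)
    return RedoLogEntry(result_id, f"applied {instruction.__name__} to {data!r}")


def second_node_sync(shm, entry):
    """Second node: read the result by identifier instead of replaying the log."""
    return shm.get_result(entry.result_id)


def double(value):
    return value * 2


shm = SharedMemoryDevice()
entry = first_node_execute(shm, result_id=1, data=21, instruction=double)
synced = second_node_sync(shm, entry)
```

In this sketch the second node never re-executes the instruction; it only dereferences the identifier, which is the saving the embodiment attributes to the shared memory device.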

In some embodiments, the first data comprises a plurality of data items and the first operation instruction comprises a plurality of operation instructions. The first node operating on the first data based on the first operation instruction to obtain an execution result includes: the first node sequentially executes each of the plurality of operation instructions on each data item of the first data to sequentially obtain a plurality of execution results.

In the above embodiment, the first node may sequentially perform operations on the plurality of data in the first data based on the first operation instruction, where the operations may include, but are not limited to, modifying, deleting, adding other data to the first data, and the like, so as to sequentially obtain a plurality of execution results.

In some embodiments, the redo log has a plurality of logs, wherein each log has an identifier corresponding to the execution result one to one; the second node obtains the execution result from the shared memory device based on the redo log, and specifically includes: and the second node sequentially acquires a plurality of execution results from the shared memory device based on the identification of each log in the redo log.

In the above embodiment, when the master node executes the operation instructions in sequence to obtain multiple execution results, each execution result obtained by executing an operation instruction is recorded in the redo log together with an identifier corresponding to that execution result. The redo log therefore includes multiple logs, each carrying an identifier that corresponds to one of the execution results. When the second node obtains the execution results from the shared memory device based on the redo log, it can obtain the multiple execution results in sequence according to the identifier of each log. This ensures that every execution result is read completely and in a definite order, and thus ensures the integrity and accuracy of the obtained execution results.
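As a hedged illustration of the one-to-one correspondence between logs and execution results, the sketch below pairs the i-th operation instruction with the i-th data item, which is only one possible reading of the embodiment. The shared memory device is modelled as a dictionary, and the function names are hypothetical.

```python
def first_node_run(shm, data_items, instructions):
    """Execute instruction i on data item i, store each result, log its identifier."""
    redo_log = []
    for result_id, (item, instruction) in enumerate(zip(data_items, instructions)):
        shm[result_id] = instruction(item)   # publish the execution result
        redo_log.append(result_id)           # one log per execution result
    return redo_log


def second_node_fetch(shm, redo_log):
    """Read the execution results in the order given by the log identifiers."""
    return [shm[result_id] for result_id in redo_log]


shm = {}
redo_log = first_node_run(
    shm,
    data_items=[1, 2, 3],
    instructions=[lambda x: x + 10, lambda x: x * 2, lambda x: -x],
)
results = second_node_fetch(shm, redo_log)
```

Because the redo log is an ordered list of identifiers, the second node recovers both the complete set of results and their order without any extra bookkeeping.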

In some embodiments, the second node has the first data stored therein; before the first node sends the redo log to the second node, the first node also sends the first operation instruction to the shared memory device; the second node obtaining the execution result from the shared memory device based on the redo log specifically includes: the second node queries the execution result from the shared memory device based on the redo log, and if the second node cannot find the execution result, the second node acquires the first operation instruction from the shared memory device based on the redo log and operates on the first data based on the first operation instruction to obtain the execution result.

In the above embodiment, the execution result may be lost because data in the shared memory device may be overwritten by new data or because the shared memory device is prone to losing data. In such a case, if the second node cannot find the execution result in the shared memory device, the second node may obtain the first operation instruction from the shared memory device based on the redo log and operate on the first data stored in the second node based on that instruction, thereby obtaining the execution result. In this way, even if the execution result is lost, the second node can still reuse the first operation instruction that the first node already used and apply it to the first data to obtain the execution result. The second node thus also avoids the time needed to parse the redo log to recover the first operation instruction, which likewise shortens the service interruption time.
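The fallback path can be sketched as follows. This is a minimal illustration that assumes both the execution results and the operation instructions are stored in the shared memory device under the same identifier taken from the redo log; the dictionaries and the function second_node_get_result are hypothetical stand-ins.

```python
def second_node_get_result(log_id, shm_results, shm_instructions, local_first_data):
    """Return (execution_result, source) for the redo log entry identified by log_id."""
    if log_id in shm_results:
        # Fast path: the execution result is still present in shared memory.
        return shm_results[log_id], "shared memory"
    # Fallback: the result was overwritten or lost, so fetch the stored
    # operation instruction and re-run it on the local copy of the first data.
    instruction = shm_instructions[log_id]
    return instruction(local_first_data), "re-executed"


def add_five(value):
    return value + 5


shm_instructions = {7: add_five}   # written by the first node before it sends the log
local_first_data = 10              # the second node's own copy of the first data

hit = second_node_get_result(7, {7: 15}, shm_instructions, local_first_data)
miss = second_node_get_result(7, {}, shm_instructions, local_first_data)
```

Both paths yield the same execution result; only the cost differs, since the fallback re-executes the instruction but still avoids parsing the redo log to reconstruct it.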

In some embodiments, the redo log has an identification corresponding to the first operational instruction; the second node acquires a first operation instruction from the shared memory device based on the redo log, and operates the first data based on the first operation instruction to obtain the execution result, including: and the second node acquires the first operation instruction from the shared memory device based on the identification of the redo log, and operates the first data based on the first operation instruction to obtain an execution result.

In the above embodiment, the redo log has the identifier corresponding to the first operation instruction, and the second node may directly obtain the first operation instruction from the shared memory device through the identifier and operate the first data based on the first operation instruction, so as to obtain the execution result, thereby improving the efficiency of generating the execution result by the second node.

In some embodiments, the first data comprises a plurality of data items and the first operation instruction comprises a plurality of operation instructions. The first node operating on the first data based on the first operation instruction to obtain an execution result includes: the first node sequentially executes each of the operation instructions on each data item of the first data to sequentially obtain a plurality of execution results.

In the above embodiment, the first node may sequentially perform operations on multiple pieces of data in the first data based on the first operation instruction, where the operations may include, but are not limited to, modifying, deleting, adding other data to the first data, and so on, so as to sequentially obtain multiple execution results.

In some embodiments, the redo log has a plurality of logs, wherein each log has an identifier corresponding to each operation instruction one to one; the second node obtains a first operation instruction from the shared memory device based on the redo log, and operates the first data based on the first operation instruction to obtain the execution result, which specifically includes: and the second node sequentially acquires the operation instructions corresponding to the identification of each log from the shared memory device based on the identification of each log in the redo log, and sequentially executes each operation instruction in the plurality of operation instructions for each data in the first data according to the identification sequence, thereby sequentially acquiring a plurality of execution results.

In the above embodiment, when the master node executes the operation instructions in sequence to obtain a plurality of execution results, each execution of an operation instruction is recorded in the redo log together with an identifier corresponding to that operation instruction. The redo log therefore includes a plurality of logs, each carrying an identifier that corresponds one-to-one to an operation instruction. When the second node obtains the execution results based on the redo log, it can fetch the operation instructions from the shared memory device in sequence according to the identifier of each log and execute each of them in sequence on the first data to obtain a plurality of execution results. This ensures that every operation instruction is executed completely and in a definite order, and thus ensures the integrity and accuracy of the obtained execution results.

In some embodiments, after the first node sends the redo log to the second node, the method further comprises: the second node writes the redo log to disk to obtain a second log file, and sends a first message to the first node, wherein the first message indicates that the second node has finished writing the redo log to disk.

In the foregoing embodiment, after acquiring the redo log, the second node writes the redo log to disk to obtain a second log file, so that the redo log is permanently stored, and sends a first message to the first node to notify it that the second node has completed the local disk write of the redo log.

In some embodiments, the method further comprises: and the first node receives the first message, writes the redo log to a disk to obtain a first log file, acquires an execution result from the shared memory based on the first log file, and writes the execution result to the disk to obtain a first data file.

In the above embodiment, after receiving the first message sent by the second node, the first node writes the redo log to disk to obtain a first log file, so as to ensure that the redo log is permanently saved, and obtains the execution result from the shared memory based on the first log file, where the method of obtaining the execution result is the same as the method of obtaining the execution result from the shared memory device by the second node, and writes the execution result to disk to obtain a first data file, so that the execution result is permanently saved.
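The ordering of the disk writes in the two embodiments above can be sketched as follows. This is a single-process illustration under stated assumptions: the "first message" is modelled as a return value rather than a network message, the shared memory device is a dictionary, the first node looks up the result directly by identifier, and the class names, file names and the REDO_LOG_FLUSHED token are hypothetical.

```python
import os
import tempfile


def write_to_disk(path, payload):
    """Write bytes and fsync so the data survives a crash."""
    with open(path, "wb") as handle:
        handle.write(payload)
        handle.flush()
        os.fsync(handle.fileno())


class SecondNode:
    def __init__(self, directory):
        self.log_path = os.path.join(directory, "second_log_file")

    def receive_redo_log(self, redo_log):
        write_to_disk(self.log_path, redo_log)
        return "REDO_LOG_FLUSHED"  # stands in for the "first message"


class FirstNode:
    def __init__(self, directory, shared_memory):
        self.log_path = os.path.join(directory, "first_log_file")
        self.data_path = os.path.join(directory, "first_data_file")
        self.shared_memory = shared_memory

    def commit(self, redo_log, result_id, second_node):
        # Step 1: the second node persists the redo log and confirms.
        first_message = second_node.receive_redo_log(redo_log)
        if first_message != "REDO_LOG_FLUSHED":
            raise RuntimeError("second node did not confirm the disk write")
        # Step 2: the first node persists its own copy of the redo log.
        write_to_disk(self.log_path, redo_log)
        # Step 3: the first node persists the execution result from shared memory.
        write_to_disk(self.data_path, self.shared_memory[result_id])


workdir = tempfile.mkdtemp()
shared_memory = {1: b"balance=42"}
first = FirstNode(workdir, shared_memory)
second = SecondNode(workdir)
first.commit(b"id=1;op=update", 1, second)
```

The point of the ordering is that the first node writes its log file and data file only after the second node has confirmed that its own copy of the redo log is on disk.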

In some embodiments, after the second node obtains the execution result from the shared memory device based on the redo log, the method further includes: the second node writes the execution result to disk to obtain a second data file.

In the foregoing embodiment, after obtaining the execution result, the second node performs local disk writing processing on the execution result to obtain a second data file, and it is ensured that the execution result is permanently saved.

In some embodiments, the first node is connected to the shared memory device via a CXL protocol, and the second node is connected to the shared memory device via a CXL protocol.

In the above embodiment, the shared memory device is connected to the first node and the second node through the CXL protocol, which can further increase the speed of reading or writing data from or to the shared memory device by the first node and the second node, and further improve the processing efficiency.

In some embodiments, the shared memory device is any one of a persistent memory PMEM, a dynamic random access memory DRAM, a static random access memory SRAM, and a cache memory.

In the above implementation manner, the shared memory device is set as a persistent memory PMEM, a dynamic random access memory DRAM, a static random access memory SRAM, or a cache memory, and particularly, the persistent memory PMEM can further improve the reliability of data storage and the IO read-write rate.

In some embodiments, the first node is a primary database node and the second node is a standby database node. In this embodiment, the first node and the second node serve as the primary database node and the standby database node respectively, so that when the primary database node fails, the system can quickly switch the service to the standby database node, and the service interruption time is shortened.

In a second aspect, an embodiment of the present application further provides another data backup method. The method is applied to a data backup system that includes a first node, a second node and a shared memory device, wherein a controller is disposed in the shared memory device;

the first node operates on the first data based on the first operation instruction to obtain an execution result, and the first node records in a redo log the processing procedure by which the first node obtains the execution result based on the first operation instruction;

the first node sends an execution result and the redo log to the shared memory device;

and the controller of the shared memory device sends the redo log to the second node, and the second node acquires an execution result from the shared memory device based on the redo log.

According to the embodiment of the application, the controller is arranged in the shared memory device, and the redo log is stored in the shared memory device by the first node, so that the shared memory device can actively send the redo log to the standby node, and the standby node can obtain the execution result from the shared memory device based on the redo log. By the method, even if the master node and the standby node asynchronously transmit the redo log, when the master node fails, the standby node can still obtain the redo log from the shared memory device, so that an execution result is obtained, and the service data is ensured not to be lost.
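A minimal sketch of the second aspect follows, assuming the controller can be modelled as a method of the shared memory device that forwards each redo log entry to attached standby nodes. All class and method names are hypothetical, and a real device would forward entries over the interconnect rather than through a direct method call.

```python
class StandbyNode:
    def __init__(self, device):
        self.device = device
        self.redo_log = []
        self.data = {}

    def on_redo_log(self, result_id):
        """Called by the device controller when a new redo log entry arrives."""
        self.redo_log.append(result_id)
        # Fetch the execution result from the shared memory device by identifier.
        self.data[result_id] = self.device.results[result_id]


class SharedMemoryDevice:
    def __init__(self):
        self.results = {}
        self.redo_log = []
        self.standby_nodes = []

    def write(self, result_id, result):
        """Called by the first node: store the result and its redo log entry."""
        self.results[result_id] = result
        self.redo_log.append(result_id)
        # Controller role: forward the entry to every attached standby node.
        for node in self.standby_nodes:
            node.on_redo_log(result_id)


device = SharedMemoryDevice()
standby = StandbyNode(device)
device.standby_nodes.append(standby)
device.write(1, "row-v2")   # the first node publishes a result and its log entry
```

Because the redo log lives in the device rather than only in the master node, the standby node in this sketch still receives the entry even if the master node stops immediately after the write.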

In a third aspect, embodiments of the present application provide a data backup system, the data backup system comprising a first node, a second node and a shared memory device;

the first node is configured to: operating the first data based on the first operation instruction to obtain an execution result, and sending the execution result to the shared memory device;

the first node is further configured to: record, in a redo log, the processing procedure by which the first node obtains the execution result based on the first operation instruction, and send the redo log to the second node;

the second node is configured to: and acquiring an execution result from the shared memory device based on the redo log.

In some embodiments, the redo log also has an identification corresponding to the execution results; the second node is further configured to: and acquiring an execution result from the shared memory device based on the identification of the redo log.

In some embodiments, the first data comprises a plurality of data items, and the first operation instruction comprises a plurality of operation instructions; the first node is further configured to: sequentially execute each of the plurality of operation instructions on each data item of the first data to obtain a plurality of execution results.

In some embodiments, the redo log has a plurality of logs, wherein each log has an identification corresponding to the execution result one to one; the second node is further configured to: and acquiring an execution result from the shared memory device in sequence based on the identification of each log in the redo log.

In some embodiments, the second node has the first data stored therein, and the first node is further configured to: sending the first operation instruction to the shared memory device; the second node is further configured to: and inquiring an execution result from the shared memory device based on the redo log, acquiring a first operation instruction from the shared memory device based on the redo log under the condition that the execution result cannot be inquired, and operating the first data based on the first operation instruction to obtain the execution result.

In some embodiments, the redo log has an identification corresponding to the first operational instruction; the second node is further configured to: and acquiring a first operation instruction from the shared memory device based on the identification of the redo log, and operating the first data based on the first operation instruction to obtain an execution result.

In some embodiments, the first data comprises a plurality of data items, and the first operation instruction comprises a plurality of operation instructions; the first node is further configured to: sequentially execute each of the operation instructions on each data item of the first data, in the order of the identifiers, to sequentially obtain a plurality of execution results.

In some embodiments, the redo log has multiple logs, where each log has an identifier corresponding to each operation instruction one-to-one; the second node is further configured to: based on the identification of each log in the redo log, obtaining the operation instructions corresponding to the identification of each log one by one from the shared memory device, and sequentially executing each of the operation instructions on each data in the first data to obtain a plurality of execution results in sequence.

The data backup system provided by the embodiment of the application comprises the first node, the second node and the shared memory device in the data backup method, so that all the beneficial effects are achieved, and details are not repeated herein.

In a fourth aspect, an embodiment of the present application provides a master database node, where the master database node includes a first control module and a first log management module; the master database node is connected with the shared memory device; the first control module is used for: operating on the first data based on the first operation instruction to obtain an execution result, and sending the execution result to the shared memory device; the first log management module is used for: recording, in a redo log, the processing procedure by which the first control module obtains the execution result based on the first operation instruction, and sending the redo log to the backup database node.

In some embodiments, the first data comprises a plurality of data items, and the first operation instruction comprises a plurality of operation instructions; the first control module is further configured to: sequentially execute each of the operation instructions on each data item of the first data to sequentially obtain a plurality of execution results.

In some embodiments, the first control module is further configured to: and sending the first operation instruction to the shared memory device.

The primary database node provided in the embodiment of the present application is the first node in the data backup method, and therefore has all the above beneficial effects, and details are not repeated here.

In a fifth aspect, an embodiment of the present application provides a backup database node, where the backup database node includes: the second control module and the second log management module; the backup database node is connected with the shared memory device; the second log management module is used for: receiving a redo log sent by a master database node; the redo log records a processing process that the master database node operates the first data based on the first operation instruction to obtain an execution result; the second control module is used for: and acquiring an execution result from the shared memory device based on the redo log.

In some embodiments, the redo log also has an identification corresponding to the execution results; the second control module is further configured to: and acquiring the execution result from the shared memory device based on the identification of the redo log.

In some embodiments, the backup database node has the first data stored therein; the second control module is further configured to: query the execution result from the shared memory device based on the redo log, acquire the first operation instruction from the shared memory device based on the redo log if the execution result is not found, and operate on the first data based on the first operation instruction to obtain the execution result; wherein the first operation instruction is sent from the master database node to the shared memory device.

In some embodiments, the redo log has an identifier corresponding to the first operation instruction; the second control module is further configured to: acquire the first operation instruction from the shared memory device based on the identifier of the redo log, and operate on the first data based on the first operation instruction to obtain the execution result.

The backup database node provided in the embodiment of the present application is the second node in the data backup method, and therefore has all the above beneficial effects, and is not described herein again.

Drawings

FIG. 1 is a schematic diagram of a data backup system in the related art;

FIG. 2 is a hardware architecture diagram of a data backup system according to an embodiment of the present application;

FIG. 3 is a flowchart of a data backup method provided in an embodiment of the present application;

FIG. 4 is a flowchart of another data backup method provided by an embodiment of the present application;

FIG. 5 is a software architecture diagram of a data backup system according to an embodiment of the present application;

FIG. 6 is a hardware architecture diagram of a master node according to an embodiment of the present application;

FIG. 7 is a hardware architecture diagram of a standby node according to an embodiment of the present application.

Detailed Description

The term "and/or" herein describes an association between associated objects and means that three relationships may exist. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The symbol "/" herein denotes an "or" relationship between the associated objects; for example, "A/B" denotes A or B.

The terms "first" and "second," and the like, in the description and in the claims herein are used for distinguishing between different objects and not for describing a particular order of the objects. For example, the first response message and the second response message, etc. are for distinguishing different response messages, not for describing a specific order of the response messages.

In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "such as" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.

In the description of the embodiments of the present application, unless otherwise specified, "a plurality" means two or more, for example, a plurality of processing units means two or more processing units or the like; plural elements means two or more elements, and the like.

In order to facilitate understanding of the technical aspects of the present application, technical terms referred to herein are explained below.

Database: a database is an ordered collection of structured information or data, typically stored in electronic form in a computer system and usually controlled by a database management system (DBMS). In practice, the data, the DBMS and the associated applications are collectively referred to as a database system, often simply a database. A database can be regarded as an electronic filing cabinet in which the user can add, update, delete and retrieve data.

Redo log: the redo log is a key component that databases use for crash recovery and for keeping multiple copies of data consistent. The redo log is a physical log that records the full history of the modifications the database makes to its data; specifically, it records the result of each write operation. The most recent version of the data can be restored by replaying the redo log entry by entry, starting from a given persistent data version.

SQL language: SQL (Structured Query Language) is a special-purpose programming language used to manage a relational database management system (RDBMS), or to perform stream processing in a relational data stream management system (RDSMS). SQL is based on relational algebra and tuple relational calculus, and includes a data definition language and a data manipulation language. The scope of SQL includes data insertion, query, update and deletion, database schema creation and modification, and data access control.

Memory: memory, also called internal memory or main memory, is an important part of a computer. It temporarily stores the data being operated on by the CPU and the data exchanged with external storage such as a hard disk. Memory is the bridge between external storage and the CPU: all programs in a computer run in memory, so memory performance affects the overall performance of the computer. Whenever the computer is running, the operating system transfers the data to be operated on from memory to the CPU, and when the operation is finished the CPU outputs the result.

CXL: CXL (Compute Express Link) is a new open interconnect standard aimed at CPU-intensive and accelerator-intensive workloads, all of which require efficient and stable memory access between a host and a device.

PMEM: persistent memory (PMEM), also called non-volatile memory (NVM), refers to a type of storage hardware that is byte-addressable, can be operated on directly by CPU instructions, and does not lose data after power-off. PMEM provides sub-microsecond latency. In terms of cost, performance and capacity, PMEM is a storage layer between the direct memory bus random access memory DRAM (Direct Rambus RAM, DRRAM) and the solid-state disk SSD (Solid State Disk or Solid State Drive), and it can fully replace the SSD in a database to improve IO performance and thereby database performance.

TCP/IP transport protocol: TCP/IP (Transmission Control Protocol/Internet Protocol), also called the network communication protocol, refers to a protocol suite capable of transmitting information among a plurality of different networks. It is the most basic communication protocol in network use, and it specifies the standards and methods by which the various parts of the internet communicate.

At present, the reliability and availability of nodes bearing data assets are crucial. Reliability means that data, as a key asset of an enterprise, must not be lost and must not be maliciously tampered with by a third party. Availability means that the nodes and the data assets they carry can keep the service running continuously to the greatest possible extent, which is also the first problem IT personnel consider when building an enterprise IT architecture. Service continuity is generally ensured by providing the cluster with a high availability (HA) capability.

In the related art, as shown in fig. 1 below, fig. 1 shows a schematic diagram of communication among a master node 111, a standby node 112 and a client 101, in a normal case, since the master node receives a data request of the client 101 and executes corresponding operations, and operations of performing data processing involve frequent operations, such as data operation, copy, memory access, and the like, the master node 111 only undertakes the above operations, that is, when the master node 111 operates normally, data requests sent by the client 101 are all received and responded by the master node 111, and processing of local service data of the master node 111 is completed and stored in a local disk, so that the data can be permanently stored; at this time, the standby node 112 does not receive the data request of the client. When the host fails, if the failure cannot be recovered in a short time, in order to not affect the operation of the service, the cluster software needs to automatically switch the service to the standby node 112 to execute the relevant service, thereby ensuring the continuous operation of the service, that is, the standby node 112 assumes the function of the master node 111 at this time. In order to enable the standby node 112 to assume the functions of the master node 111, the premise is that the same data as the master node 111 is stored in the standby node 112 at any time, so that when the master node 111 performs data processing according to a data request of the client 101 in a normal operation state of the master node 111, the original data in the master node 111 generates corresponding changed data, and the master node 111 needs to copy the changed data to the standby node 112, thereby ensuring that the stored data in the standby node 112 and the data in the master node 111 have consistency. It should be noted that the dotted line in fig. 1 indicates that the standby node 112 is switched to operate when the primary node 111 fails.

However, the new data generated by the master node 111 when processing data is generally large and complex, and transmission is limited by the delay and bandwidth of the network between the master node 111 and the standby node 112, so the new data generated by the master node 111 is generally not transmitted directly to the standby node 112.

Next, the redo log mechanism is introduced. The redo log is divided into:

1. A redo log buffer, which is stored in memory and is volatile.

2. A redo log file, which is stored on disk and is persistent. The redo log may be generated by an MTR (mini-transaction) and placed in a page in memory called a block.

Because disk writes are slow, redo logs are not written directly to disk as they are generated. Instead, at startup the node requests from the operating system (OS) a contiguous region of memory called the redo log buffer. Each redo log is first held temporarily in memory as it is generated, and when the MTR finishes, the group of redo logs generated during that MTR is copied into the redo log buffer as a whole. The node writes redo logs into the redo log buffer sequentially: it fills one block page first and moves to the next block page once the current one is full. By default, the logs in the redo log buffer are flushed to the files ib_logfile0 and ib_logfile1 in the MySQL data directory.

The redo log files on disk form a log file group whose files are named ib_logfile[number]. Redo logs are written to the group starting from ib_logfile0; when a file is full, writing continues in the next file, and when the last file is full, writing wraps around to ib_logfile0.
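The wrap-around writing described above can be sketched as follows. This is a minimal toy model, not MySQL's implementation: the class name, the per-file `capacity` measured in records, and the choice to simply clear a file before reusing it are all assumptions made for illustration.

```python
class RedoLogFileGroup:
    """Toy model of a fixed-size group of redo log files written as a ring."""

    def __init__(self, file_count=2, capacity=4):
        # File names follow the ib_logfile[number] pattern from the text.
        self.names = [f"ib_logfile{i}" for i in range(file_count)]
        self.capacity = capacity  # hypothetical: max records per file
        self.files = {name: [] for name in self.names}
        self.current = 0  # index of the file currently being written

    def append(self, record):
        name = self.names[self.current]
        if len(self.files[name]) >= self.capacity:
            # Current file is full: move on, wrapping back to file 0 at the end.
            self.current = (self.current + 1) % len(self.names)
            name = self.names[self.current]
            self.files[name] = []  # simplification: old contents are discarded
        self.files[name].append(record)
        return name


group = RedoLogFileGroup(file_count=2, capacity=2)
written_to = [group.append(f"rec{i}") for i in range(5)]
```

With two files of two records each, the fifth record lands in ib_logfile0 again, replacing what was written there first.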

In the related art, when a data request from the client causes a data change on the master node, the master node records in a redo log the operations it performed on its data in response to the request and sends the redo log to the standby node. On receiving the redo log, the standby node reconstructs the data locally by replaying the redo log, and thereby becomes consistent with the data of the master node. In synchronous transmission, the master node generally waits until the standby node has written the redo log to the standby node's local disk; only after receiving the standby node's confirmation that the redo log has been flushed to disk does the master node write the redo log to its own local disk, and it then writes the generated new data to its local disk according to the flushed redo log, so that the data is stored persistently. In asynchronous transmission, the master node writes the redo log to its local disk without waiting for the standby node's flush confirmation, and then writes the generated new data to its local disk according to the flushed redo log, so that the data is stored persistently. These approaches may have the following disadvantages:

For synchronous transmission of the redo log, the redo logs flushed to disk on the standby node still need to be replayed to reconstruct the data there. If the standby node replays logs more slowly than the master node sends them, the data processing of the standby node lags behind the master node. If the master node then fails, service can be switched to the standby node only once its data is fully consistent with that of the master node, so the switch has to wait until data reconstruction on the standby node is complete. The further the standby node's replay lags behind the master node, the longer the service interruption time (RTO, recovery time objective) when switching to the standby node. As a result, the client waits too long or the RTO is difficult to control, which wastes hardware resources and affects the progress of the service.

For asynchronous transmission of the redo log, the standby node holds fewer redo logs than the master node. If the master node fails before all of its redo logs have been transmitted to the standby node, service data is lost, that is, the recovery point objective (RPO) is not 0.
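The two disadvantages can be illustrated with a small sketch. The function names and the representation of logs as lists of entry identifiers are assumptions made here for illustration; the sketch only counts logs and does not model real timing.

```python
def lost_on_failover(master_committed, standby_received):
    """Asynchronous case: logs committed on the master but never received
    by the standby are lost if the master fails (RPO is not 0)."""
    received = set(standby_received)
    return [entry for entry in master_committed if entry not in received]


def replay_backlog(standby_received, standby_replayed):
    """Synchronous case: logs already on the standby's disk but not yet
    replayed must be replayed before failover, which lengthens the RTO."""
    return len(standby_received) - len(standby_replayed)


# Asynchronous transmission: the standby is one log behind when the master fails.
lost = lost_on_failover(["001", "002", "003"], ["001", "002"])

# Synchronous transmission: all logs arrived, but only one has been replayed.
backlog = replay_backlog(["001", "002", "003"], ["001"])
```

In the first call one log entry is unrecoverable; in the second nothing is lost, but two logs remain to be replayed before the standby node can take over.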

To solve the above problems, an embodiment of the present application provides a data backup method applied to a data backup system. As shown in fig. 2, the data backup system in this embodiment includes a master node 111, a standby node 112 and a shared memory device 113, where the shared memory device 113 is communicatively connected to the master node 111 and the standby node 112. The shared memory device 113 may be any one of persistent memory (PMEM), dynamic random access memory (DRAM), static random access memory (SRAM) and cache memory (cache).

In one embodiment, the shared memory device 113 is connected to the master node 111 and the standby node 112 by CXL (Compute Express Link).

When the master node 111 operates normally, the client 101 may exchange data with the master node 111; when the master node 111 fails, the system may automatically switch to the standby node 112, which takes over the transactions of the master node and thereby ensures the continuity of the service. The client 101 may be any suitable carrier of software, hardware, firmware, or a combination thereof, including but not limited to electronic devices, hosts, servers and computing devices; the client 101 may also be a single computing device or a combination of multiple computing devices. The master node and the standby node may each be one or more computing devices, servers, or a combination thereof, where a server may be a file server, a domain control server, a database server, a mail server, a Web server, a multimedia server, a communication server, a terminal server, an infrastructure server, a virtualization server, etc. The server may be a tower server, a rack server, a blade server, a high-density server, a high-performance computing (HPC) server, etc., and may be based on, but is not limited to, an X86 architecture, a reduced instruction set computer (RISC) architecture, an advanced RISC machine (ARM) architecture, etc.

In the embodiment of the present application, the master node and the standby node may or may not be located at the same physical address, which is not particularly limited herein.

How the data backup system implements data backup between the master node 111 and the standby node 112 is described in detail below with reference to fig. 3, taking synchronous transmission of the redo log as an example. Fig. 3 is a schematic flow chart of data backup between the master node 111 and the standby node 112.

When the data backup system is connected in the manner shown in fig. 2 and the master node and the standby node perform data backup, the master node needs to change the cache address of the data from its original memory address to an address in the shared memory device; that is, the shared memory device takes over the function of part of the master node's original memory. In this configuration the master node has read-write permission on the shared memory device, and the standby node has read-only permission on it.

It should be noted that, in the above embodiment, the shared memory device is an external memory located outside the master node and the standby node, each of which also has its own original memory. The shared memory device differs from a disk or other permanent storage in that the master node and the standby node can both access the data in it, and its IO speed is higher than that of a disk, that is, the master node reads and writes the shared memory device faster than a hard disk. In this embodiment the shared memory device stores only the data updated by the master node, that is, the updated data that the master node needs to synchronize to the standby node, while the master node's own memory may cache other data of the master node (data unrelated to the update).
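The access rules for the shared memory device (read-write for the master node, read-only for the standby node) can be sketched as below. This is a plain in-process stand-in, not a CXL interface; the class, the role names "master" and "standby", and the string addresses are hypothetical.

```python
class SharedMemoryDevice:
    """Toy stand-in for the shared memory device 113."""

    # Permissions as described in the text; the role names are hypothetical.
    PERMISSIONS = {"master": {"read", "write"}, "standby": {"read"}}

    def __init__(self):
        self._cells = {}

    def _check(self, role, action):
        if action not in self.PERMISSIONS.get(role, set()):
            raise PermissionError(f"{role} has no {action} permission")

    def write(self, role, address, value):
        self._check(role, "write")
        self._cells[address] = value

    def read(self, role, address):
        self._check(role, "read")
        return self._cells[address]


device = SharedMemoryDevice()
device.write("master", "Data11", "updated row")
seen_by_standby = device.read("standby", "Data11")

try:
    device.write("standby", "Data11", "tampered")
    standby_write_rejected = False
except PermissionError:
    standby_write_rejected = True
```

The standby node sees what the master node wrote without any copy over the network, and its attempt to write is refused.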

S101, the master node receives a first operation request from the client;

When the master node works normally, it may receive a data operation request from the client. The data operation request includes, but is not limited to, modifying data in the database of the master node, adding data, deleting data, changing a logical relationship between data in the database, or modifying the storage location of data in the database; that is, the first operation request may include a plurality of operation instructions.

In one implementation, the first operation request includes an instruction of an SQL statement.

S102, the master node responds to the first operation request and operates on first data based on a first operation instruction to obtain an execution result, where the execution result includes second data and the first operation instruction is the result of the master node parsing the first operation request;

The master node commits a transaction. The transaction instructs the master node to respond to the first operation request and operate on the first data to obtain the execution result, and it also instructs the master node to record, in a group of logs, how the master node processed the first data based on the first operation instruction. In one implementation, the log may be a redo log, and the method of this application is described below taking the redo log as an example.

If the first operation request includes a plurality of (N) operation instructions, the master node commits N transactions in sequence during the time t1-tn. Each transaction instructs the master node to respond to the first operation request and to operate on the first data in order, based on the operation instructions in the first operation request, to obtain an execution result. The master node writes a redo log each time it executes an operation instruction, so that once all instructions have been executed the master node has generated a group of redo logs comprising a plurality of logs. Each log has a log entry and records the specific instruction of the first operation request that the master node executed on each operation object in the first data, together with the specific execution time. It can be understood that the log entry of each log is unique and is generated in the order in which the logs are recorded. This order can be understood as follows: the redo logs in a group can be regarded as a redo log queue sorted by the timestamp of each redo log. Each transaction also has a unique transaction id, which is the unique identification information of the transaction; a transaction ends once the master node has generated an execution result for the first data, and each transaction has a transaction commit time.
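As an illustration of the log group described above, the following sketch builds one redo log per executed instruction, each with a unique log entry, a transaction id and a timestamp. The field names and the use of a simple counter as a logical timestamp are assumptions; the source does not specify the record layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoLog:
    entry: str           # unique log entry, e.g. "001"
    transaction_id: int  # unique id of the transaction that produced the log
    instruction: str     # parsed operation instruction, e.g. "SQL1"
    timestamp: int       # logical execution time (stand-in for t1..tn)


def record_group(instructions, first_transaction_id=1):
    """Build one group of redo logs, one log per executed instruction."""
    group = []
    for i, instruction in enumerate(instructions, start=1):
        group.append(
            RedoLog(
                entry=f"{i:03d}",
                transaction_id=first_transaction_id + i - 1,
                instruction=instruction,
                timestamp=i,
            )
        )
    return group


group = record_group(["SQL1", "SQL2", "SQL3"])
# The group behaves as a queue ordered by timestamp.
queue = sorted(group, key=lambda log: log.timestamp)
```

Because entries are issued in recording order, sorting by timestamp and sorting by log entry give the same queue.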

After receiving the first operation request of the client, the master node responds to it. First, the master node parses the first operation request with an SQL parser to obtain a parsing result, namely the first operation instruction, which is in a form that a computer can recognize and execute. The master node then performs the corresponding operation on the first data stored in the local database to obtain an execution result, where the first data may be cached data stored in the shared memory device or data stored on a disk of the master node. It should be noted that the shared memory device has the same performance characteristics as the memory in the master node: the master node reads and writes the shared memory device quickly, but data stored in the shared memory device is relatively easy to lose. Given these characteristics, when the first data is stored on a disk of the master node, the master node needs to read the first data from the disk and write it into the shared memory device before processing it. Specifically, the master node reads the first data stored in the shared memory device into its own processor (CPU) and invokes the first operation instruction to operate on the first data, thereby obtaining the execution result. When the master node generates the execution result of the first data in response to the first operation request, the master node may record in the redo log how it processed the first data based on the first operation request, that is, the specific instruction executed on each operation object in the first data and the specific time at which each instruction was executed. It should be noted that the redo log does not include the specific instruction in the first operation request.

In one embodiment, the redo log is generated by the master node through an MTR (mini-transaction).

In one embodiment, the first data includes a plurality of data, the first operation request includes a plurality of instructions, and each final execution result corresponds to one instruction. For example, the first data stored in the master node includes N data, Data01-Data0N, and the first operation request received by the master node is: delete Data0N from the first data, and modify Data01-Data0N-1 into Data11-Data1N-1. Executing each instruction yields a final execution result, that is, each final execution result corresponds to one instruction, and the master node executes the instructions in order based on the first operation request. Here the first operation request includes N instructions: instruction 1 deletes Data0N from the first data, and instructions 2 to N modify Data01-Data0N-1 into Data11-Data1N-1 in order. After the master node executes instructions 1 to N in sequence, it obtains execution results 1 to N, which form the new second data. It can be understood that the second data no longer includes Data0N and includes only Data11-Data1N-1.

In one implementation, the first data may be a data table in the primary node database, and the second data may also be a data table in the primary node database.

In one implementation, the plurality of data in the first data are data having interdependencies, and then the plurality of transactions also have certain dependencies therebetween.

When the master node receives the first operation request, it first parses the request to obtain a parsing result, which is the first operation instruction. For example, when the instruction included in the client's first operation request is an SQL statement, the master node parses it with an SQL parser, which performs lexical analysis, syntactic analysis and semantic analysis on the SQL statement to obtain the first operation instructions SQL1-SQLN that the master node can recognize and execute efficiently; that is, the first operation instructions SQL1-SQLN are the parsing result of the first operation request.

After executing each instruction, the master node makes a corresponding record and thereby generates a redo log. After the master node has executed the N operation instructions, the group of redo logs generated is as shown in table 1 below. The instructions recorded in table 1 are those obtained by the SQL parser from the instructions received by the master node. Each redo log in the group in table 1 has a log entry, and each log entry records the operation time and the operation instruction for the corresponding operation object in the first data. It can be understood that each log entry uniquely identifies one operation performed by the master node on an operation object, and that the logs are generated in the order in which the master node executes the operation instructions. The group of redo logs in table 1 forms a log queue of N logs corresponding to log entries 001 to 00N, and the operation time recorded in each log is the timestamp of that redo log. The operation times are ordered from oldest to newest, that is, the magnitude of the log entries 001 to 00N reflects how recent the operation is. The log entries 001-00N recorded in table 1 correspond to the operation instructions SQL1-SQLN, which in turn correspond to the N execution results 1-N, where execution result 1 is the deletion of Data0N from the first data, execution result 2 is the modification of Data01 into Data11, and execution result 3 is the modification of Data02 into Data12.

It should be noted that, after the master node executes the operation instructions SQL1-SQLN on the first data and obtains the execution results 1-N, the master node also assigns the identifiers 001-00N to the execution results. These identifiers correspond one to one to the log entries (identifiers) 001-00N of the redo log, which makes it easy to read the execution results later. It can be understood that the operation object, operation time, operation instruction and final execution result corresponding to each log thus share one unique identifier, namely the log entry of the redo log, through which these parameters can be associated with one another.

In an implementation manner, the operation instructions SQL1-N recorded in the redo log include execution instructions 1-N, which generate the execution results Data11-Data1N-1 from the first data Data01-Data0N, and query instructions 1-N, with which the execution results Data11-Data1N-1 are queried from the shared memory device; the execution instructions 1-N correspond one to one to the query instructions 1-N and to the execution results 1-N. The query instructions 1-N carry identifiers that correspond one to one to the execution results 1-N; for example, the query instructions 1-N include the identifiers "a1-an", and the execution results 1-N also carry the identifiers "a1-an".

TABLE 1

S1021, the master node records the execution result of the operation in the redo log;

In one implementation, the redo log also records the operation time, the operation instruction and the execution result for each operation object in the first data, as shown in table 2.

TABLE 2

In one embodiment, the first data includes only one piece of data, and the operation that the master node performs on the first data based on the first operation request comprises a plurality of sequentially dependent instructions. In this case the first operation request asks for a relatively complex operation on the first data that a single instruction cannot complete, so the operation is split into a plurality of sub-operations that are executed in sequence as separate instructions. Taking 2 instructions as an example, if the first data includes only Data1, the first operation request includes 2 instructions: instruction 1 performs operation 1 on Data1 to obtain Data2, and instruction 2 performs operation 2 on Data2 to obtain Data3. Both instructions must be executed before the final execution result is available, that is, one final execution result corresponds to 2 instructions. It can be understood that the second data is then the Data3 corresponding to execution result 2.
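The chained case can be sketched as follows: each instruction consumes the result of the previous one, and only the last execution result is the second data. The concrete operations (add one, then double) and the starting value are purely hypothetical, since the source does not say what operation 1 and operation 2 compute.

```python
def run_chain(data, operations):
    """Apply operations in order; return every intermediate execution result."""
    results = []
    current = data
    for operation in operations:
        current = operation(current)
        results.append(current)
    return results


# Hypothetical operations 1 and 2; the source does not define them.
def operation_1(value):
    return value + 1


def operation_2(value):
    return value * 2


data1 = 10
results = run_chain(data1, [operation_1, operation_2])
data2, data3 = results      # execution result 1 and execution result 2
second_data = results[-1]   # only the last execution result is final
```

Execution result 1 (Data2) is an intermediate value that a reader of the final state never needs, which is why only execution result 2 has to be fetched later.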

Similarly, after the master node executes each instruction, the master node makes a corresponding record to generate a redo log, and when the master node executes the 2 operation instructions, the record results of a group of redo logs generated are shown in the following table 3:

TABLE 3

The redo log in table 3 has log entries, and each log entry records an operation time and an operation instruction for each operation object in the first data.

In one implementation, the redo log also records the operation time, operation instruction, and execution result for each operand in the first data, as shown in table 4.

TABLE 4

The redo log in table 4 has log entries, and each log entry records the operation time and the execution result for each operation object in the first data. Here execution result 2 in table 4 is the final execution result, and the corresponding Data3 is the second data.

Similarly, at this time, after the main node executes the operation instructions SQL1-SQL2 on the data in the first data to obtain the execution results 1-2, the main node assigns identifiers 001-002 to each execution result, where the identifiers 001-002 correspond to the log entries 001-002 of the redo log one by one, so as to facilitate subsequent reading of the execution result 2.

In one implementation, the first data includes a plurality of data, and the operation that the master node performs on each of the first data based on the first operation request comprises a plurality of sequentially dependent instructions. In this case, too, the first operation request asks for a relatively complex operation on each of the first data that a single instruction cannot complete, so each operation is split into a plurality of sub-operations, that is, each operation is completed by a plurality of instructions. Take as an example first data comprising 3 pieces of data, where each operation of the master node on the first data comprises 2 sequentially dependent instructions. If the first data includes Data1, Data2 and Data3, the first operation request includes 6 instructions: instruction 1 performs operation 1 on Data1 to obtain Data4, and instruction 2 performs operation 2 on Data4 to obtain Data5; instruction 3 performs operation 3 on Data2 to obtain Data6, and instruction 4 performs operation 4 on Data6 to obtain Data7; instruction 5 performs operation 5 on Data3 to obtain Data8, and instruction 6 performs operation 6 on Data8 to obtain Data9. It can be understood that the second data then consists of the data corresponding to execution results 2, 4 and 6: Data5, Data7 and Data9.

Similarly, after the master node executes each instruction, the master node performs corresponding recording, so as to generate a redo log, and after the master node executes the above 6 operation instructions, the recording results of a group of redo logs generated are shown in table 5 below:

TABLE 5

In one implementation, the redo log also records the operation time, the operation instruction, and the execution result for each operation object in the first data, as shown in table 6.

TABLE 6

The redo log in table 6 has log entries, and each log entry correspondingly records the operation time and the execution result of each operation object in the first data or the operation object generated in the middle when the master node executes each instruction. At this time, the Data5, data7, and Data9 corresponding to the execution results 2,4,6 in table 6 are the second Data.

Similarly, at this time, after the main node executes the operation instructions SQL1-SQL6 on the data in the first data to obtain the execution results 1-6, the main node also assigns identifiers 001-006 to each execution result, and the identifiers 001-006 are in one-to-one correspondence with the log entries 001-006 of the redo log, so as to facilitate subsequent reading of the execution results 2,4,6.

S103, the master node sends (or stores) the execution result and the first operation instruction (that is, the parsing result of the first operation request, SQL1-N) to the shared memory device, where the execution result includes the second data;

When the master node process and the standby node process start, the master node and the standby node can connect to the shared memory device, and the master node has read-write permission on it. When the master node receives the first operation request from the client, the processor in the master node that executes the first operation request can access the shared memory device directly, so that the execution result (including the second data) generated when the master node executes the first operation request is sent (or stored) into a memory buffer of the shared memory device; at the same time, the master node also sends (or stores) the first operation instruction into the memory buffer of the shared memory device. It can be understood that the second data consists of the final execution results generated when the master node executes the operation instructions of the first operation request on the first data in sequence. These final execution results are the execution results 1-N corresponding to the log entries 001-00N of the redo log in table 2, or execution result 2 in table 4, or execution results 2, 4 and 6 in table 6. The parsing result of the first operation request is the operation instructions 1-N in table 1 and table 2, which may specifically be the SQL statements 1-N.

Optionally, the master node also stores the redo log in the shared memory device. Because the local disk of the master node is slow, the master node may first hold the redo log temporarily in the shared memory device. It can be understood that the group of redo logs stored in the shared memory device then serves as the redo log buffer, and the execution result stored in the shared memory device serves as the data buffer for the changed new data.
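A minimal sketch of step S103 on the master node side follows, assuming the shared memory device can be modelled as a dictionary with three regions. The region names, the `store_step` helper and the tuple encoding of execution results are hypothetical; only the idea of keying everything by log entry comes from the text.

```python
def store_step(shared, entry, instruction, result):
    """Store one parsed instruction and its execution result under the
    log entry, and append the entry to the redo log buffer."""
    shared["instructions"][entry] = instruction
    shared["results"][entry] = result
    shared["redo_log_buffer"].append(entry)


# Hypothetical layout of the shared memory device's buffer regions.
shared = {"instructions": {}, "results": {}, "redo_log_buffer": []}

# N = 3 version of the example: delete Data03, then modify Data01 and Data02.
steps = [
    ("001", "SQL1", ("delete", "Data03")),
    ("002", "SQL2", ("modify", "Data01", "Data11")),
    ("003", "SQL3", ("modify", "Data02", "Data12")),
]
for entry, instruction, result in steps:
    store_step(shared, entry, instruction, result)
```

Keying both the instruction and the result by the same log entry is what later lets a reader go from a redo log record straight to the matching execution result.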

S104, the master node commits a transaction, and the transaction instructs the master node to send the redo log to the standby node.

In one implementation, the master node commits N transactions in sequence during the time t1-tn, and the N transactions instruct the master node to send the contents recorded in log entries 001-00N of the redo log to the standby node in order. The master node 111 may also send a whole group of redo logs to the standby node 112 when a transaction is committed, for example the group of redo logs corresponding to log entries 001-00N in table 1.

S105, the standby node acquires the execution result from the shared memory device based on the redo log;

Specifically, in the first case, with reference to tables 1 and 2, the first data includes a plurality of data, the first operation request includes a plurality of instructions, and the first operation instruction comprises one instruction for each of the first data, so that the execution result of every instruction is a final execution result. In this case, based on the entry of each log in the group of redo logs, the standby node uses the SQL statement corresponding to each log entry to query and read, in order, the execution result corresponding to each log from the shared memory device; these execution results are the second data. After the standby node obtains the redo log, because the redo log records, in order, the master node's operation on each operation object in the first data, the standby node may, according to the redo log in table 1, query the shared memory device for the operation instructions 1-N (that is, the cached SQL statements) corresponding to the log entries of the redo log, use the operation instructions 1-N to query the shared memory device for the corresponding execution results 1-N, and read the execution results 1-N from the shared memory device in the order of the log entries 001-00N. It should be noted that the SQL statement corresponding to each log entry carries the same identifier as its execution result; for example, SQL1 and execution result 1 are both identified by log entry 001. The standby node can therefore use the SQL statement corresponding to each log entry to query the shared memory device in order and read the corresponding execution result, which is the second data.

It should be noted that, in an implementation manner, if multiple pieces of data in the first data are data having a mutual dependency relationship, multiple transactions also have a certain dependency relationship, and at this time, the execution result also needs to be sequentially read from the shared memory device in strict order of the time stamp of each log corresponding to the log entries 001 to 00N.

In an implementation manner, the standby node may query, according to the redo log in table 1, an operation instruction 1-N corresponding to each log entry of the redo log from the shared memory device, and because the operation instruction 1-N further includes a query instruction 1-N corresponding to the operation instruction, the standby node may query, according to each log entry, a query instruction corresponding to one of the log entries at this time, and sequentially query, through the query instruction 1-N, an execution result 1-N corresponding to the query instruction 1-N.

Specifically, each query instruction 1-N includes an identifier "a1-an" corresponding to the execution result, so that the standby node can sequentially query and read the execution results 1-N from the shared memory device when executing the query instruction 1-N.

In an implementation manner, because the master node assigns, when generating the execution result, an identifier corresponding to each redo log to each execution result, the standby node may further directly and sequentially read the execution result corresponding to each entry based on an entry (i.e., an identifier) of each log in a group of redo logs, where the execution result is the second data.
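Reading by identifier on the standby node side might look like the sketch below, under the assumption that execution results sit in the shared memory device keyed by log entry and that each received redo log carries its entry and a timestamp. The dictionary layout and the result strings are hypothetical.

```python
def read_second_data(redo_logs, shared_results):
    """Read execution results in log order, using the log entry as the key.

    No replay takes place: the standby node only looks up results that the
    master node already computed and stored.
    """
    ordered = sorted(redo_logs, key=lambda log: log["timestamp"])
    return [shared_results[log["entry"]] for log in ordered]


# Hypothetical contents written earlier by the master node.
shared_results = {"001": "Data03 deleted", "002": "Data11", "003": "Data12"}

# Redo logs as received by the standby node, deliberately out of order.
redo_logs = [
    {"entry": "002", "timestamp": 2},
    {"entry": "001", "timestamp": 1},
    {"entry": "003", "timestamp": 3},
]
second_data = read_second_data(redo_logs, shared_results)
```

Sorting by timestamp before reading keeps the order that the text requires when the data items depend on one another.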

In the second case, with reference to tables 3 and 4, the first data is a single piece of data, the first operation request includes a plurality of sequentially dependent instructions, and the first operation instruction comprises these sequentially dependent instructions for the first data. The execution result of the last of these instructions is then the final execution result. In this case, based on the log entries in the redo log, the standby node uses the last SQL statement in the redo log to query and read the execution result corresponding to the last log entry; this execution result is the second data.

In a third case, with reference to tables 5 to 6 above, when there are multiple pieces of first data, the first operation instruction corresponds to multiple sequentially correlated instructions for each piece of first data. It can be understood that the execution result corresponding to the last of the instructions for each piece of first data is then the final execution result for that piece. In this case, the standby node, based on the log entries in a group of redo logs, sequentially queries through the last log entry of each piece of first data and reads, from the shared memory device, the execution result corresponding to that last log entry, where these execution results are the second data.

For example, in the implementation manners recorded in tables 1 to 2, the final execution results are the execution results 1 to N, and the standby node reads the execution results 1 to N sequentially from the shared memory device; that is, the second data is composed of the execution results 1 to N. In the implementation manners recorded in tables 3 to 4, the final execution result is the execution result 2, and the standby node only needs to read the execution result 2 corresponding to the last statement, SQL2; that is, the second data consists of the execution result 2. In the implementation manners recorded in tables 5 to 6, the final execution results are the execution results 2, 4, and 6, so the standby node needs to sequentially read the execution results 2, 4, and 6 corresponding to SQL2, SQL4, and SQL6; that is, the second data is composed of the execution result 2, the execution result 4, and the execution result 6.
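The selection of final execution results in the three cases can be sketched as follows. This is a minimal illustration under stated assumptions: each log entry is modeled as a tuple (entry number, data key, result identifier), which is a hypothetical layout and not the layout of the tables in the application, and case 1 is shown with N = 3.

```python
def final_result_ids(redo_log):
    """Pick, for each data item, the result id of its last log entry.

    redo_log is a list of (entry_no, data_key, result_id) tuples that is
    already in log entry order.
    """
    last = {}
    for entry_no, data_key, result_id in redo_log:
        # A later entry for the same data item replaces the earlier one.
        last[data_key] = (entry_no, result_id)
    # Return the surviving ids ordered by the entry number of the last entry.
    return [rid for _, rid in sorted(last.values())]

# Case 1: one instruction per data item.
case1 = [("001", "d1", "r1"), ("002", "d2", "r2"), ("003", "d3", "r3")]
# Case 2: two sequentially correlated instructions on one data item.
case2 = [("001", "d1", "r1"), ("002", "d1", "r2")]
# Case 3: two sequentially correlated instructions on each of three items.
case3 = [("001", "d1", "r1"), ("002", "d1", "r2"),
         ("003", "d2", "r3"), ("004", "d2", "r4"),
         ("005", "d3", "r5"), ("006", "d3", "r6")]
```

With these inputs the function selects every result in case 1, only the last result in case 2, and the results 2, 4 and 6 in case 3, matching the examples in the text.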

It is understood that the execution result is an execution result of the operation on the first data by the master node based on the first operation request.

In an implementation manner, if the redo log further records the execution result of the master node on each operation object in the first data, then after acquiring the redo log, the standby node may query, according to the redo log in table 2, the execution results 1-N corresponding to the log entries of the redo log from the shared memory device, and sequentially read the final execution result from the shared memory device. It may be understood that the final execution result constitutes the new second data.

In one implementation, the data stored in the shared memory device is all cache data, and the shared memory device has limited capacity: once the shared memory device is full, previously stored data is overwritten by data newly written by the master node, so an execution result stored in the shared memory device may be lost. Based on this, after the standby node receives the redo log, the second data may be generated in the following manner. The standby node queries the execution result from the shared memory device based on the redo log. If the standby node cannot find the execution result in the shared memory device, the standby node sequentially reads, through the redo log, the operation instructions 1-N corresponding to the log entries of the redo log from the shared memory device. Because the data in the standby node is always synchronized with the data in the master node, the standby node also stores the first data. The standby node can therefore sequentially execute the operation instructions 1-N on the first data and write the final execution result into the corresponding storage location in the local disk according to the log entry sequence, thereby generating the execution result. This process may be understood as a replay of the redo log: the execution result is restored by replaying, item by item, the instructions that were executed in the master node and recorded in the redo log.
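Why an execution result can go missing may be easier to see with a small model of a capacity-limited cache. The sketch below assumes a first-in, first-out overwrite policy, which is an assumption made only for illustration; the application states that old data is refreshed by newly written data but does not specify the policy. The class name BoundedCache is hypothetical.

```python
from collections import OrderedDict

class BoundedCache:
    """Toy model of a capacity-limited shared memory device (FIFO overwrite assumed)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def put(self, key, value):
        if key in self._items:
            self._items.pop(key)
        elif len(self._items) >= self.capacity:
            self._items.popitem(last=False)  # drop the oldest stored entry
        self._items[key] = value

    def get(self, key):
        """Return the stored value, or None if it has been overwritten."""
        return self._items.get(key)

cache = BoundedCache(capacity=2)
for result_id, result in [("a1", "result 1"), ("a2", "result 2"), ("a3", "result 3")]:
    cache.put(result_id, result)

lost = cache.get("a1")   # overwritten once the cache was full
kept = cache.get("a3")   # still available
```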

In an implementation manner, the standby node may also directly read the operation instructions 1 to N corresponding to each log entry of the redo log from the shared memory device through the redo log without querying the execution result from the shared memory device, and sequentially execute the operation instructions on the first data, thereby obtaining the execution result.

In an implementation manner, if the standby node can obtain neither the execution result nor the operation instructions 1-N from the shared memory device, the standby node may further parse the redo log received from the master node by using a log analyzer, so as to obtain a parsing result for the operation instructions SQL1-N recorded in the log entries in table 1. The parsing result is the specific SQL1-N statement instructions, that is, the instructions previously used by the master node to process the first data. The standby node then sequentially operates on the first data stored in the standby node according to the SQL1-N statement instructions and the recorded content of the redo log, so as to obtain the execution result.
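The three recovery paths described above (read the cached execution result, replay the cached operation instruction, or re-parse the redo log) can be summarized in a minimal sketch. All names here are hypothetical, and the "SQL" is a toy `key = value` assignment used only so the example is runnable; it is not real SQL and not the log analyzer of the application.

```python
def parse_log_entry(raw_sql):
    """Toy stand-in for a log analyzer: turns 'key = value' into a callable."""
    key, _, value = raw_sql.partition("=")
    return lambda data: {**data, key.strip(): value.strip()}

def recover(entry, result_cache, instruction_cache, local_data):
    """Return (source, new_data) for one log entry using the three-level fallback."""
    rid = entry["id"]
    if rid in result_cache:
        # 1. The execution result is still in the shared memory device.
        return "result", result_cache[rid]
    if rid in instruction_cache:
        # 2. Only the parsed instruction is cached: replay it on local first data.
        return "instruction", instruction_cache[rid](local_data)
    # 3. Nothing is cached: parse the redo log entry, then replay it.
    return "parsed", parse_log_entry(entry["sql"])(local_data)

local = {"x": "0"}
entry = {"id": "a1", "sql": "x = 1"}

from_result = recover(entry, {"a1": {"x": "1"}}, {}, local)
from_instruction = recover(entry, {}, {"a1": parse_log_entry("x = 1")}, local)
from_parsing = recover(entry, {}, {}, local)
```

All three paths yield the same data; they differ only in how much work the standby node has to repeat.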

In an implementation manner, the system may also grant the standby node read-write permission for the shared memory device. Because the first data is then also stored in the shared memory device, the standby node may sequentially execute the operation instructions 1 to N, read in order from the shared memory device, on the first data in the shared memory device, and sequentially write the final execution result into the corresponding storage location in the local disk according to the log entry sequence 001 to 00N, thereby generating the execution result.

S106, the standby node writes the execution result into a local second disk to obtain a second data file;

After the standby node generates the execution result, in order to prevent the execution result (including the second data) from being lost, the standby node writes the execution result into a local second disk to obtain a second data file, so that the execution result is permanently stored. It can be understood that the second data file is stored in the database of the local disk of the standby node at this time.

Specifically, after the standby node sequentially reads the final execution results, a write error may occur while the execution results are being written to the local disk. So that the specific execution result affected by a write error can be identified promptly, the final execution results need to be written to the corresponding storage locations in the local disk strictly in the order of the log entries in the redo log, thereby generating the second data on the standby node.

In an implementation manner, a plurality of data in the first data are data having a mutual dependency relationship, and then a plurality of transactions also have a certain dependency relationship, at this time, it is also necessary to sequentially write the execution result into the local disk of the standby node in the order of the time stamps of each log corresponding to the log entries 001-00N.

For example, in the implementation manners recorded in tables 1 to 2, since the final execution result is the execution result 1 to N, at this time, the standby node writes the execution result 1 to N into the local second disk according to the log entry sequence in the redo log;

in the implementation manners recorded in tables 3 to 4, the final execution result is execution result 2, and at this time, the standby node needs to write the execution result 2 corresponding to the last log entry into the local second disk;

in the implementation manners recorded in tables 5 to 6, the final execution results are the execution result 2, the execution result 4, and the execution result 6, and at this time, the standby node needs to write the execution result 2, the execution result 4, and the execution result 6 in the local second disk in sequence according to the log entry order.
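Writing strictly in log entry order makes it straightforward to report which execution result was affected when a write fails. The following is a minimal sketch of that idea; the write callback, the simulated failure and all names are hypothetical and stand in for the standby node's local disk.

```python
def write_in_log_order(entry_numbers, results, write):
    """Write results in log entry order; return the entry number that failed, or None."""
    for entry_no in sorted(entry_numbers):
        try:
            write(entry_no, results[entry_no])
        except OSError:
            return entry_no  # first entry whose write failed
    return None

disk = {}  # stand-in for storage locations on the local second disk

def flaky_write(entry_no, value):
    """Simulated disk write that fails for entry 003."""
    if entry_no == "003":
        raise OSError("simulated write error")
    disk[entry_no] = value

results = {"001": "r1", "002": "r2", "003": "r3", "004": "r4"}
failed_at = write_in_log_order(results.keys(), results, flaky_write)
```

Because the writes are ordered, everything before the reported entry is known to be on disk and everything from that entry onward still has to be written.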

By this method, the standby node can directly acquire the operation instruction (namely, the SQL statement cache) from the shared memory device based on the redo log, and process the first data in the standby node based on the operation instruction so as to acquire the second data. This avoids the need, as in the prior art, to re-parse the redo log file in the standby node to obtain the corresponding analysis data (the corresponding operation instruction), and therefore reduces the log analysis time.

In one implementation, fig. 4 is a schematic diagram of another data backup flow between the master node 111 and the standby node 112.

After the step S104, steps S1041 to S1051 are further included:

S1041, the standby node performs local disk writing processing on the redo log to obtain a second log file;

S1042, the standby node sends a first message to the master node, wherein the first message is used for instructing the master node to perform disk writing processing on the redo log;

When the master node commits the transaction, the standby node can receive the redo log from the master node. Because data stored in the shared memory is easily lost, the standby node, on acquiring the redo log, performs local disk writing on it, that is, writes the redo log into a local disk of the standby node, so that a second log file is obtained. The standby node then sends a first message to the master node, where the first message is used for instructing the master node to perform disk writing processing on the redo log.

It should be noted that, in the embodiment of the present application, when the master node works normally, the master node usually undertakes the corresponding service initiated by the client. When the master node transmits the redo log to the standby node synchronously, the master node performs its own disk writing of the redo log only after receiving the standby node's disk-writing confirmation message for the redo log. That is, the standby node transmits the first message to the master node so that the master node can confirm that the synchronous transmission of the redo log is complete.

S1043, the master node receives the first message, and performs local disk writing processing on the redo log to obtain a first log file.

The purpose of the local disk writing processing of the redo log by the master node is consistent with that of the standby node, which is not described herein again.
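The ordering of steps S1041 to S1043 can be sketched as a small in-process simulation. This is only an illustration under stated assumptions: the Node class, the event strings and the payload of the first message are hypothetical, and real nodes would exchange these messages over a network rather than through function calls.

```python
class Node:
    """Toy node whose log_file list stands in for a log file on the local disk."""

    def __init__(self, name):
        self.name = name
        self.log_file = []

    def flush(self, redo_log):
        self.log_file = list(redo_log)

def synchronous_commit(master, standby, redo_log, events):
    """The standby node writes the log first and acknowledges; the master writes after."""
    standby.flush(redo_log)                      # S1041
    events.append("standby wrote second log file")
    first_message = {"type": "flush_ack"}        # S1042 (hypothetical payload)
    events.append("standby sent first message")
    if first_message["type"] == "flush_ack":
        master.flush(redo_log)                   # S1043
        events.append("master wrote first log file")
    return events

master, standby = Node("master"), Node("standby")
events = synchronous_commit(master, standby, ["001", "002"], [])
```

The point of this ordering is that, by the time the master node persists its own copy of the redo log, the standby node is already known to hold a persistent copy.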

S1044, the main node reads the execution result from the shared memory device according to the first log file;

The manner in which the master node reads the execution result from the shared memory device according to the first log file is the same as that of the standby node, and details are not repeated here.

S1045, the master node writes the execution result into a local first disk to obtain a first data file.

S1051, the standby node obtains the execution result from the shared memory device based on the second log file.

After the master node executes step S1043, the first log file generated based on the redo log has been written to disk, that is, the first log file is permanently stored by the master node. The master node then sequentially reads the execution results 1-N, that is, the second data, from the shared memory device according to the log entries of the first log file, and sequentially writes the execution results 1-N into the corresponding storage locations in the master node's disk according to the log entries of the redo log. The newly generated second data is thereby written into the local first disk of the master node to obtain the first data file, so that the second data is permanently stored.

It should be noted that, when the master node receives a data request from the client and performs data processing according to that request, the data processing is performed in the shared memory device because the shared memory device supports fast reading and writing, and the data cache generated by the data processing is also stored in the shared memory device. That is, new data generated by the master node during data processing is temporarily stored in the shared memory device. Because the new data cached in the shared memory device is easily lost, the master node writes the new data into a disk after performing the data processing operation, so that the new data is permanently stored. It can be appreciated that the first data file is now stored in the database on the local disk of the master node.

Through the above steps, when the standby node backs up the second data newly generated in the master node, the standby node can directly acquire the execution result from the shared memory device based on the redo log. This saves the time that the prior art spends re-parsing the redo log in the standby node and recompiling SQL statements from the parsing result to generate the second data. Therefore, the prior-art problem that, when the master node fails, the service interruption time is too long because the log replay speed of the standby node lags behind the speed of log transmission between the master node and the standby node can be solved, and the efficiency with which the standby node synchronizes new data from the master node is improved.

After step S106 is executed, the data in the standby node is consistent with the data in the master node. If the master node then fails, the authorization software or the HA software switches the service to the standby node for execution. The standby node then takes over the service of the master node and has read-write permission to the shared memory device.

An embodiment of the present application further provides a data backup system. As shown in fig. 2, the data backup system includes a master node 111, a standby node 112 and a shared memory device 113, where the shared memory device 113 is communicatively connected to the master node 111 and the standby node 112.

The master node 111 is configured to: receive a first operation request from a client, generate second data from first data based on the first operation request, and parse the operation on the first data in response to the first operation request to obtain an analysis result, where the operation that the first operation request performs on the first data has an execution result and an analysis result;

record, through the redo log, the operation performed by the master node on the first data based on the first operation request; and store the analysis result and the execution result corresponding to the first operation request in the shared memory device 113, and write the second data into the local first disk.

In one implementation, the master node is further configured to record, through the redo log, an execution result corresponding to the first operation request.

In one implementation, the master node is further configured to store the redo log in a shared memory device.

When the master node 111 works normally, a data request operation from a client may be received, where the data request operation includes, but is not limited to, a modification of service data in the master node, an addition of certain data, a deletion of certain data or a change in a logical relationship between service data, a modification of a storage location of certain data in the service data, and the like.

After receiving the first operation request from the client 101, the master node 111 performs a related operation on the locally stored first data in response to the request to obtain second data, where the first data may be cache data stored in the shared memory device or data stored in a disk of the master node. It should be noted that the shared memory device 113 here has the same performance as the memory of the master node 111: the master node 111 can read and write data in it at high speed, but the data stored in it is easily lost. Because of this, when the first data is data stored in the disk of the master node, the master node 111 needs to read the first data from the disk, write the first data into the shared memory device, and then perform the related processing on the first data. When the master node 111 processes the first data in response to the first operation request and obtains an execution result, the master node 111 may record, through a redo log, the processing that the master node performed on the first data based on the first operation request. That is, the master node is configured to perform the methods in steps S101, S102 and S1021.

The primary node 111 may also be configured to transmit the redo log to the standby node 112 when its transaction commits; and the master node is configured to perform local disk writing processing on the redo log when receiving the first message of the standby node 112, so as to obtain a first log file. Namely, the master node is configured to execute the methods in S1042 and S1043 in the above steps.

In one implementation mode, the master node submits N transactions in sequence within time t1-tn, where the N transactions are used to instruct the master node 111 to send the redo log to the standby node 112, that is, to send the content recorded in log entries 001-00N in the redo log in sequence; the primary node 111 may also send a set of redo logs to the standby node 112 when a transaction is committed, for example, send a corresponding set of redo logs in the log entries 001-00N in table 1.

The standby node 112 is configured to receive the redo log sent by the master node 111.

When the transaction of the master node 111 is submitted, the standby node 112 may receive the redo log from the master node 111, and perform disk writing processing on the redo log to obtain a second log file.

The standby node 112 is further configured to send a first message to the master node 111, where the first message is used to instruct the master node 111 to perform disk writing processing on the redo log.

The standby node 112 is further configured to sequentially read, from the shared memory device 113 based on the redo log, the execution result and/or the first operation instruction (i.e., the parsing result) corresponding to the first operation request. The execution result includes the second data.

The standby node 112 is further configured to write the execution result to a local second disk. That is, the standby node 112 is configured to execute the methods in steps S104, S1041, S1042, S105, and S106.

As shown in fig. 5, fig. 5 shows another schematic diagram of the data backup system, wherein the primary node 111 includes a first session management module 1114, a first control module 1115, a first log management module 1116 and a primary DB (DataBase, DB) 1117, and the standby node 112 includes a second session management module 1124, a second control module 1125, a second log management module 1126 and a standby DB1127.

The first session management module 1114 is configured to receive a data request from the client 101, such as the first operation request in the method S101.

The first control module 1115 is configured to process the first data according to the data operation request of the client 101 to obtain an execution result, store an analysis result (a first operation instruction) and the execution result corresponding to the first operation request in the shared memory device 113, and write the execution result in the local disk, for example, execute the methods in steps S102 and S1045.

The first log management module 1116 is configured to record, through the redo log, the processing procedure by which the master node operates on the first data based on the first operation instruction to obtain the execution result, as in the method in step S102 above. The first log management module 1116 is further configured to send the redo log to the standby node 112, to communicate with the second log management module 1126 in the standby node 112, for example by receiving the first message from the standby node 112, and to write the redo log to the local disk to obtain the first log file, as in the foregoing steps S1042 and S1043.

The first log management module 1116 is further configured to record an execution result corresponding to the first operation instruction through a redo log, as in the method in step S1021.

The master DB1117 stores first data, a first data file, and a first log file.

The second session management module 1124, the second log management module 1126, and the standby DB 1127 in the standby node 112 have functions similar to those of the corresponding modules in the master node, which are not described herein again. The second log management module 1126 is configured to receive the redo log sent by the master node 111, as in the method in step S104, and to write the redo log into the local disk of the standby node 112 to obtain a second log file, as in the method in step S1041. The second control module 1125 is configured to read an execution result from the shared memory device according to the redo log and write the execution result into the local disk, as in steps S105 to S106 or in step S1051.

It should be noted that the first session management module 1114, the first control module 1115 and the first log management module 1116 may be implemented by corresponding program code running in a CPU in the master node, and similarly, the second session management module 1124, the second control module 1125 and the second log management module 1126 may be implemented by corresponding program code running in a CPU in the standby node.

In one embodiment, the shared memory device 113 is an external memory device, and may be, for example, any one of a persistent memory (PMEM), a dynamic random access memory (DRAM), a static random access memory (SRAM), and a cache memory (cache).

In one implementation, the shared memory device may also be a memory device in another computing apparatus, computing device, or server other than the master node and the standby node. The shared memory device may further have a controller that can actively send the redo log to the standby node 112, in which case the standby node 112 can obtain the execution result from the shared memory device based on that redo log. In this way, even when the redo log is transmitted asynchronously, the standby node can still obtain from the shared memory device the complete redo log recorded by the master node and obtain the execution result according to it, which ensures that data is not lost. The specific method is the same as the method by which the standby node obtains the execution result from the shared memory device in step S105 above, and is not described again here.

In one embodiment, the shared memory device 113 is connected to the master node 111 and the standby node 112 by CXL (Compute Express Link).

In an implementation manner, data communication is performed between the master node 111 and the standby node 112 through the TCP/IP protocol, that is, the master node 111 sends the redo log to the standby node 112 by using TCP/IP.

Fig. 6 shows a schematic diagram of the internal hardware composition of the master node 111, where the master node 111 includes a processor 1111, a memory 1112, and an interface 1113, which may be connected by a bus 1114 or otherwise. In this embodiment, the processor 1111 is the compute core and control core of the master node 111. For example, the processor 1111 may implement the above method steps S102, S1021, S103, S104, S1043, S1044 and S1045. The memory 1112 may store the first data and the second data in the above steps. The interface 1113 is configured to receive and transmit data; for example, the interface 1113 may receive the first operation request from the client in step S101 and send the redo log to the standby node 112.

Fig. 7 is a schematic diagram illustrating the internal hardware composition of the standby node 112, where the standby node 112 includes a processor 1121, a memory 1122, and an interface 1123, which may be connected via a bus 1124 or otherwise. In this embodiment, the processor 1121 is the compute core and control core of the standby node 112. For example, the processor 1121 may implement the above method steps S105, S106, S1041 and S1042. The memory 1122 may store the first data and the second data in the above steps. The interface 1123 is used for transceiving data; for example, the interface 1123 may receive the redo log sent by the master node 111, and may receive a data request from a client when the master node 111 fails and the service is switched to the standby node 112.

The memories in the master node 111 and the standby node 112 may be volatile or nonvolatile memories, or may include both volatile and nonvolatile memories. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example but not limitation, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and DPRAM. The memory may also be a solid state disk (SSD), a mechanical hard disk (hard disk drive, HDD), or a solid state hybrid hard disk (SSHD).

Buses 1114 and 1124 in master node 111 and standby node 112 may include buses of one or more communications protocols. Also, buses 1114 and 1124 may include hardware, software, or both to couple the components of master node 111 and standby node 112 to each other. By way of example, and not limitation, buses 1114 and 1124 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable bus, or a combination of two or more of these.

Interfaces 1113 and 1123 in master node 111 and standby node 112 may be 3G communication interfaces, Long Term Evolution (LTE) (4G) communication interfaces, 5G communication interfaces, WLAN communication interfaces, WAN communication interfaces, and the like. The interfaces are not limited to wireless communication interfaces; the master node 111 and the standby node 112 may also be configured with wired communication interfaces to support wired communication.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application. In addition, in some possible implementation manners, each step in the above embodiments may be selectively executed according to an actual situation, may be partially executed, or may be completely executed, and is not limited herein. The master node and the standby node can be respectively interpreted as a first node and a second node or a master database node, a standby database node or a first database node and a second database node.

It is understood that the method steps in the embodiments of the present application may be implemented by hardware, or by software instructions executed by a processor. The software instructions may comprise corresponding software modules that may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on or transmitted over a computer-readable storage medium. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that incorporates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.

It is to be understood that the various numerical references referred to in the embodiments of the present application are merely for convenience of description and distinction and are not intended to limit the scope of the embodiments of the present application.

Claims

1. A method of data backup, the method being applied to a data backup system, the data backup system comprising: a first node, a second node and a shared memory device;

the first node operates first data based on a first operation instruction to obtain an execution result, and sends the execution result to the shared memory device;

the first node records, through a redo log: the processing procedure by which the first node obtains the execution result based on the first operation instruction;

the first node sends the redo log to the second node;

and the second node acquires the execution result from the shared memory device based on the redo log.

2. The method of claim 1, wherein: the redo log has an identifier corresponding to the execution result, and the second node obtains the execution result from the shared memory device based on the redo log, which specifically includes:

and the second node acquires the execution result from the shared memory device based on the identification of the redo log.

3. The method of claim 2, wherein: the first node operates the first data based on the first operation instruction to obtain an execution result, and the method includes:

and the first node sequentially executes each of the first operation instructions on each of the first data to obtain a plurality of execution results.

4. The method of claim 3, wherein: the redo log comprises a plurality of logs, each log having an identifier in one-to-one correspondence with one of the execution results;

the second node obtains the execution result from the shared memory device based on the redo log, and specifically includes:

the second node sequentially acquires the plurality of execution results from the shared memory device based on the identifier of each log in the redo log.
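By way of example only, sequential acquisition by identifier might look like the sketch below. It assumes that the shared memory device can be modelled as a dictionary keyed by identifier and that the order of the redo log determines the order of acquisition; the function name `fetch_results_in_log_order` is hypothetical.

```python
def fetch_results_in_log_order(redo_log_ids, shared_results):
    """Return execution results in the order their identifiers appear in the redo log."""
    return [shared_results[record_id] for record_id in redo_log_ids]


# The shared store holds results in arbitrary order; the redo log fixes the order.
shared_results = {103: "c", 101: "a", 102: "b"}
redo_log_ids = [101, 102, 103]
ordered = fetch_results_in_log_order(redo_log_ids, shared_results)
```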

5. The method of claim 1, wherein: the second node stores the first data; before the redo log is sent to the second node by the first node, the first node also sends the first operation instruction to the shared memory device; the second node obtains the execution result from the shared memory device based on the redo log, and specifically includes:

the second node queries the execution result from the shared memory device based on the redo log,

and in a case where the second node cannot find the execution result, the second node acquires the first operation instruction from the shared memory device based on the redo log, and operates the first data based on the first operation instruction to obtain the execution result.
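The query-then-fallback behaviour of claim 5 can be illustrated with the following sketch. It is only one possible reading: the function `recover_result`, the two dictionaries standing in for the shared memory device, and the representation of an operation instruction as a Python callable are assumptions made for illustration.

```python
def recover_result(record_id, shared_results, shared_instructions, local_data):
    """Return (result, how), where how is 'fetched' or 'recomputed'."""
    if record_id in shared_results:
        # The execution result is available in shared memory: use it directly.
        return shared_results[record_id], "fetched"
    # Result not found: fall back to the stored instruction and the local copy of the data.
    operation = shared_instructions[record_id]
    return operation(local_data), "recomputed"


shared_results = {1: 10}  # the result for identifier 2 is missing
shared_instructions = {1: lambda x: x + 5, 2: lambda x: x * 3}
local_data = 5            # the second node's own copy of the first data

hit = recover_result(1, shared_results, shared_instructions, local_data)
miss = recover_result(2, shared_results, shared_instructions, local_data)
```

The fallback path depends on the second node holding the first data locally, which is why claim 5 recites that the second node stores the first data.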

6. The method of claim 5, wherein: the redo log has an identifier corresponding to the first operation instruction; the second node acquires the first operation instruction from the shared memory device based on the redo log, and operates the first data based on the first operation instruction to obtain the execution result, including:

and the second node acquires the first operation instruction from the shared memory device based on the identifier of the redo log, and operates the first data based on the first operation instruction to obtain the execution result.

7. The method according to any one of claims 5-6, wherein: the first node operates the first data based on the first operation instruction to obtain an execution result, and the method includes:

8. The method of claim 7, wherein: the redo log comprises a plurality of logs, each log having an identifier in one-to-one correspondence with one of the operation instructions;

the second node obtains the first operation instruction from the shared memory device based on the redo log, and operates the first data based on the first operation instruction to obtain the execution result, and the method specifically includes:

the second node sequentially acquires, from the shared memory device and based on the identifier of each log in the redo log, the operation instruction corresponding to each identifier, and sequentially executes the operation instructions on the respective data in the first data in the order of the identifiers, to sequentially obtain a plurality of execution results.
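As a rough sketch of the ordered replay in claim 8, the example below assumes that identifiers are increasing sequence numbers and that each stored instruction names the data item it targets; neither detail is stated in the claim. The function `replay_in_identifier_order` and the `(key, operation)` tuple layout are hypothetical.

```python
def replay_in_identifier_order(redo_log_ids, shared_instructions, local_data):
    """Apply stored instructions to local data in ascending identifier order."""
    results = []
    for record_id in sorted(redo_log_ids):
        key, operation = shared_instructions[record_id]  # fetch the instruction by identifier
        local_data[key] = operation(local_data[key])     # execute it on the matching data item
        results.append(local_data[key])
    return results


shared_instructions = {
    2: ("balance", lambda v: v * 2),
    1: ("balance", lambda v: v + 10),
    3: ("count", lambda v: v + 1),
}
local_data = {"balance": 100, "count": 0}

# The log identifiers arrive unordered here to show that replay follows identifier order.
results = replay_in_identifier_order([3, 1, 2], shared_instructions, local_data)
```

Replaying in identifier order matters because the operations do not commute: adding 10 and then doubling yields 220, whereas doubling first would yield 210.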

9. A data backup system, characterized in that the data backup system comprises a first node, a second node and a shared memory device;

the second node is configured to: and acquiring the execution result from the shared memory device based on the redo log.

10. The system of claim 9, wherein: the redo log also has an identifier corresponding to the execution result;

the second node is further configured to: and acquiring the execution result from the shared memory device based on the identifier of the redo log.

11. The system of claim 10, wherein: there are a plurality of pieces of first data and a plurality of first operation instructions;

the first node is further configured to: and sequentially executing each of the first operation instructions on each of the first data to obtain a plurality of execution results in sequence.

12. The system of claim 11, wherein: the redo log comprises a plurality of logs, each log having an identifier in one-to-one correspondence with one of the execution results;

the second node is further configured to: sequentially acquire the plurality of execution results from the shared memory device based on the identifier of each log in the redo log.

13. The system of claim 9, wherein: the second node stores the first data therein, and the first node is further configured to: sending the first operation instruction to the shared memory device;

the second node is further configured to: and querying the execution result from the shared memory device based on the redo log, acquiring the first operation instruction from the shared memory device based on the redo log under the condition that the execution result is not queried, and operating the first data based on the first operation instruction to obtain the execution result.

14. The system of claim 9, wherein: the redo log has an identifier corresponding to the first operation instruction;

the second node is further configured to: and acquiring the first operation instruction from the shared memory device based on the identifier of the redo log, and operating the first data based on the first operation instruction to obtain the execution result.

15. The system according to any one of claims 13-14, wherein: there are a plurality of pieces of first data and a plurality of first operation instructions;

16. The system of claim 15, wherein: the redo log comprises a plurality of logs, each log having an identifier in one-to-one correspondence with one of the operation instructions;

the second node is further configured to: sequentially acquire, from the shared memory device and based on the identifier of each log in the redo log, the operation instruction corresponding to each identifier, and sequentially execute the operation instructions on the respective data in the first data in the order of the identifiers, to sequentially obtain a plurality of execution results.

17. A master database node, characterized in that the master database node comprises a first control module and a first log management module, and the master database node is connected with a shared memory device;

the first control module is configured to: operating the first data based on the first operation instruction to obtain an execution result, and sending the execution result to the shared memory device;

the first log management module is configured to: record, in a redo log, the processing procedure by which the first control module obtains the execution result based on the first operation instruction, and send the redo log to a backup database node.

18. The node of claim 17, wherein: there are a plurality of pieces of first data and a plurality of first operation instructions;

the first control module is further configured to: and sequentially executing each of the first operation instructions on each of the first data to obtain a plurality of execution results in sequence.

19. The node according to any of claims 17-18, characterized by: the first control module is further configured to: and sending the first operation instruction to the shared memory device.

20. A backup database node, characterized in that the backup database node comprises a second log management module and a second control module, and the backup database node is connected with a shared memory device;

the second log management module is configured to: receive a redo log sent by a master database node, wherein the redo log records the processing procedure by which the master database node operates the first data based on the first operation instruction to obtain an execution result;

the second control module is configured to: and acquiring the execution result from the shared memory device based on the redo log.

21. The node of claim 20, wherein: the redo log also has an identifier corresponding to the execution result;

the second control module is further configured to: and acquiring the execution result from the shared memory device based on the identifier of the redo log.

22. The node of claim 20, wherein: the first data is stored in the backup database node;

the second control module is further configured to: querying the execution result from the shared memory device based on the redo log, acquiring the first operation instruction from the shared memory device based on the redo log under the condition that the execution result is not queried, and operating the first data based on the first operation instruction to obtain the execution result; wherein the first operation instruction is sent from the master database node to the shared memory device.

23. The node of claim 22, wherein: the redo log has an identifier corresponding to the first operation instruction;

the second control module is further configured to: and acquiring the first operation instruction from the shared memory device based on the identifier of the redo log, and operating the first data based on the first operation instruction to obtain the execution result.