WO2019020081A1

WO2019020081A1 - Distributed system and fault recovery method and apparatus thereof, product, and storage medium

Info

Publication number: WO2019020081A1
Application number: PCT/CN2018/097262
Authority: WO
Inventors: 褚建辉; 卢申朋; 刘东辉; 王新栋
Original assignee: 广东神马搜索科技有限公司
Priority date: 2017-07-28
Filing date: 2018-07-26
Publication date: 2019-01-31
Also published as: CN107357688B; CN107357688A

Abstract

The present invention discloses a distributed system and a fault recovery method and an apparatus thereof, a product, and a storage medium. The method comprises: a slave node and/or a master node obtains and stores a metadata mirror having a record of scheduling information and a system status of the master node at a certain time; the master node obtains and stores a redo log having a record of all operations of the master node after the certain time; and when a fault occurs in the master node, the master node invokes the metadata mirror and its corresponding redo log to execute fault recovery. When a fault occurs, the method enables a master node to quickly return to a pre-failure state by means of a previously recorded metadata mirror and redo log.

Description

Distributed system and its fault recovery method, device, product and storage medium

Technical field

The present invention relates to the field of distributed technologies, and in particular to a distributed system and a fault recovery method, apparatus, product, and storage medium thereof.

Background art

A distributed system is a software system built on top of a network, in which multiple machines are combined to jointly perform tasks such as computing tasks and storage tasks. Most existing distributed systems use a master-slave architecture. FIG. 1 is a schematic diagram showing the structure of a distributed system employing a master-slave architecture. As shown in FIG. 1, such a system is typically composed of a master node and a plurality of slave nodes. As the central scheduling node of the distributed system, the master node usually provides functions such as metadata storage and query, cluster node state management, decision making, and task delivery. The metadata managed by the master node is among the most important data in the system, so losing the data on this node has a large impact on the system.

Therefore, a fault recovery mechanism is needed so that, when the master node encounters an unknown error and crashes, it can be restored to the state before the error occurred and loss of the master node's data is avoided.

Summary of the invention

The invention provides a distributed system and a fault recovery method, device, product and storage medium thereof, which acquire metadata mirrors of a master node at one or more moments and record the operations of the master node in a redo log, so that when the master node fails, it can be quickly restored to the pre-failure state based on the previously recorded metadata mirrors and redo logs.

According to a first aspect of the present invention, there is provided a distributed system comprising a master node for scheduling tasks and managing system states and a plurality of slave nodes for running scheduled tasks, wherein one or more slave nodes and/or the master node acquire and save a metadata mirror recording scheduling information and system status at a certain moment on the master node; the master node acquires and saves a redo log recording all operations of the master node after that moment; and the master node calls the metadata mirror and its corresponding redo log for fault recovery when recovering from a fault.

Thus, according to the previously recorded metadata mirroring and redo logs, the primary node can be quickly restored to the state before the failure, and the recovery efficiency can be improved compared with the manner of recording only the log files.

Preferably, one or more slave nodes and/or master nodes perform metadata mirroring acquisition and save operations triggered by the master node and/or external commands. Therefore, different trigger modes can be set according to the characteristics of the distributed system to trigger the acquisition and save operation of the metadata mirror.

Preferably, the master node responds to the slave node's request after each operation is recorded in the redo log and stored. This ensures that the redo log can fully record every operation of the primary node.

Preferably, the one or more slave nodes and/or the master node continuously acquire and save metadata mirrors of the master node at a plurality of different moments, and the master node continuously acquires and saves the redo logs respectively corresponding to the plurality of different moments. When recovering from a fault, the master node can call the latest metadata mirror and its corresponding redo log, and when the latest metadata mirror and/or its corresponding redo log is unavailable, it can call the metadata mirror and corresponding redo log of the most recent moment for which both are available. Thus, by storing a plurality of memory mirrors of different moments and their corresponding redo logs, fault tolerance during fault recovery can be improved.

Preferably, one or more slave nodes and/or the master node directly acquire and save the memory state of the master node at a certain moment as a metadata mirror. The metadata mirror can be stored by task group, so that the corresponding metadata mirrors can be organized efficiently by group during subsequent recovery.

According to a second aspect of the present invention, there is also provided a fault recovery apparatus for a distributed system, the distributed system including a master node for scheduling tasks and managing system states, and a plurality of slave nodes for running tasks. The apparatus is used for fault recovery when the master node fails, and includes: a mirror acquiring unit, configured to acquire and save a metadata mirror recording scheduling information and system status at a certain moment on the master node; a redo log obtaining unit, configured to acquire and save a redo log recording all operations of the master node after that moment; and a fault recovery unit, configured to call the metadata mirror and its corresponding redo log for fault recovery when recovering from a fault.

Preferably, the image acquisition unit performs the acquisition and save operation of the metadata mirroring under the trigger of the master node, the device, and/or the external command.

Preferably, the master node responds to the request of the slave node after each operation thereof is recorded in the redo log by the redo log obtaining unit and stored.

Preferably, the image obtaining unit continuously acquires and saves the metadata mirror of the master node at a plurality of different times, and the redo log obtaining unit continuously acquires and saves the redo logs respectively corresponding to the plurality of different moments.

Preferably, the fault recovery unit calls the latest metadata mirror and its corresponding redo log for fault recovery when the fault is recovered.

Preferably, when the latest metadata mirror and/or its corresponding redo log is unavailable, the fault recovery unit calls the metadata mirror and corresponding redo log of the most recent moment for which both are available to perform fault recovery.

Preferably, the image acquisition unit directly acquires and saves the memory state of the master node at a certain moment as a metadata image.

Preferably, the image acquisition unit stores the metadata image according to the task group.

According to a third aspect of the present invention, there is also provided a fault recovery method for a distributed system, the distributed system comprising a master node for scheduling tasks and managing system states, and a plurality of slave nodes for running tasks. The method is used for fault recovery when the master node fails, and includes: acquiring and saving a metadata mirror recording scheduling information and system status at a certain moment on the master node; acquiring and saving a redo log recording all operations of the master node after that moment; and calling the metadata mirror and its corresponding redo log for fault recovery when recovering from a fault.

Preferably, the metadata mirroring of the master node at a plurality of different moments is continuously acquired and saved, and the redo logs respectively corresponding to the plurality of different moments are continuously acquired and saved.

Preferably, calling the metadata mirror and its corresponding redo log for fault recovery may include: calling the latest metadata mirror and its corresponding redo log for fault recovery; and, when the latest metadata mirror and/or its corresponding redo log is unavailable, calling the metadata mirror and corresponding redo log of the most recent moment for which both are available.

Preferably, the memory state of the master node at a certain moment can be directly obtained and saved as a metadata mirror.

Preferably, the obtaining and saving operation of the metadata mirroring is performed under the trigger of the master node and/or an external command.

Preferably, the master node responds to the request of the slave node after each operation thereof is recorded in the redo log and stored.

Preferably, the metadata image is stored in accordance with a task grouping.

According to a fourth aspect of the present invention, there is provided a computer program product, comprising: a memory; a processor; and a computer program; wherein the computer program is stored in the memory and configured to be executed by the processor to perform the method of the third aspect of the invention and any of its preferred aspects.

A fifth aspect of the invention provides a computer readable storage medium comprising a program which, when executed on a computer, causes the computer to perform the method of the third aspect of the invention and any of the preferred aspects thereof.

The distributed system, the fault recovery method, the device, the product and the storage medium of the present invention obtain metadata mirrors of the master node at one or more moments and record the subsequent operations of the master node in the redo log, so that when the master node fails, it can be quickly restored to the pre-failure state based on the previously recorded metadata mirrors and redo logs.

Brief description of the drawings

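The above and other objects, features, and advantages of the present invention will become more apparent from the following description of embodiments of the invention taken in conjunction with the accompanying drawings, in which the same reference numerals generally denote the same parts.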

FIG. 1 is a schematic diagram showing the architecture of a distributed system of a master-slave architecture.

FIG. 2 is a schematic flow chart showing a fault recovery method according to an embodiment of the present invention.

FIG. 3 is a diagram showing the continuous storage of a plurality of metadata mirrors and redo logs.

FIG. 4 is a schematic block diagram showing the structure of a fault recovery device according to an embodiment of the present invention.

FIG. 5 is a structural diagram of a computer program product according to an exemplary embodiment of the present invention.

Detailed description

Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While preferred embodiments of the present invention are shown in the drawings, it should be understood that the invention may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

For the distributed system with the master-slave architecture shown in FIG. 1, the master node stores the data necessary for the normal operation and scheduling of the system, such as system state data and current scheduling data, so the loss of its data has a great impact on the system. Therefore, a fault recovery mechanism is needed so that, when the master node encounters an unknown error, it can be restored to a stable and reliable state. For this purpose, log files covering all operations of the master node can be recorded and stored persistently on disk. Then, once the master node fails, even if all the data in its memory is lost, the master node can still be restored to the state before the failure by replaying the recorded log files at the next startup.

The operation flow of the master node in this scheme is as follows: before the master node performs an operation, the operation is recorded in the log file, and the operation is performed only after the record succeeds, that is, the data in memory is updated based on the operation. The recovery process is as follows: the log file is read, and the data in memory is modified sequentially according to the operations of the master node recorded in the log file. Recovering solely from a log file of recorded write operations is simple, but the recovery process takes a very long time.
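The log-only scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the `LogOnlyMaster` class, the JSON-lines log format, and the key/value form of an operation are all hypothetical.

```python
import json
import os


class LogOnlyMaster:
    """Hypothetical master node that persists each operation before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}  # in-memory data of the master node

    def execute(self, key, value):
        # Step 1: record the operation and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # Step 2: only after the record succeeds, update the in-memory data.
        self.state[key] = value

    def recover(self):
        """Rebuild the in-memory data by replaying the whole log in order."""
        self.state = {}
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    op = json.loads(line)
                    self.state[op["key"]] = op["value"]
        return self.state
```

Because `recover` replays every operation ever recorded, its running time grows with the length of the log, which is the drawback noted above.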

Therefore, after in-depth research, the inventors found that, while the log file of the master node's operations is being recorded, mirror files of the master node's memory data at certain moments can be interspersed. Each mirror file represents the state data of the master node at the corresponding moment. When the master node fails, the latest mirror file can be called together with the operations recorded in the log file after the moment corresponding to that mirror file, and the master node can be recovered from the called data. Compared with recording only log files, this can significantly reduce the time required for recovery.

Based on the above concept, the present invention proposes a fault recovery scheme for a master node in a distributed system, and the fault recovery scheme of the present invention can be implemented by the distributed system shown in FIG. 1. As shown in FIG. 1, the distributed system of the present invention may include a master node for scheduling tasks and managing system states and a plurality of slave nodes for running scheduled tasks. Both the master node and the slave nodes can be deployed on servers, and the master node can be deployed on a separate server different from the slave nodes, or on the same server as one of the slave nodes. As a preferred embodiment, different nodes can be deployed on different servers. The distributed system shown in FIG. 1 is composed of a master node and a plurality of slave nodes. It should be understood that the distributed system of the present invention may further include a plurality of master nodes, and may also include nodes or devices other than the master node and the slave nodes, such as backup master nodes and failover databases.

The specific process of implementing the fault recovery scheme of the distributed system of the present invention is described in detail below. FIG. 2 is a schematic flow chart showing a fault recovery method according to an embodiment of the present invention. The method shown in FIG. 2 can be implemented by the distributed system shown in FIG. 1, and in particular, can be implemented by a master node in a distributed system.

Referring to FIG. 2, in step S210, the metadata image of the scheduling information and the system state recorded at a certain moment on the master node is acquired and saved.

In a distributed system with a master-slave architecture, the entire system becomes unavailable once the master node crashes. Given the importance of the master node, it usually does not run specific tasks directly, but is only responsible for maintaining the operation of the distributed system and for scheduling and assigning tasks, while specific tasks are performed by the slave nodes. That is to say, the master node is mainly responsible for parsing task requests, allocating resources, and locating the target data or nodes according to the metadata, and each specific task is performed by the slave node specified by the master node. Metadata is data that describes data, and the metadata in the present invention refers specifically to the data that the master node is responsible for saving and managing. Since the master node is used to schedule tasks and manage system state, the metadata may refer to data that records scheduling information and system status at a certain moment on the master node. For example, for a Hadoop distributed system, the metadata may be system related description data, system state data, current task scheduling and status data, and so on. As another example, for a distributed storage system, the metadata may be data describing information such as the state and storage location of user data.

The obtained metadata mirror of the master node at a certain moment may be a mapping of the memory state of the master node at that moment, so the memory state of the master node at a certain moment can be directly obtained and saved as a metadata mirror. In a specific implementation, the metadata mirror of the master node at a certain moment can be obtained by means of a snapshot or a dump (file system backup).

The operation of obtaining the metadata image may be performed by the master node, by one or more slave nodes, or by a backup master node in the distributed system. The obtained metadata image can be stored persistently on a local disk or a distributed file system, for example, can be stored persistently in the failover database.
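As a rough illustration of saving the memory state persistently on a local disk, the sketch below writes the state to a temporary file and renames it once the data is on disk. The function names, the JSON encoding, and the `mirror-<moment>.json` naming scheme are assumptions made for this example and are not specified in the text.

```python
import json
import os


def save_mirror(state, directory, moment):
    """Persist a copy of the master's in-memory state taken at integer `moment`."""
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, f"mirror-{moment:020d}.json")
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # Rename only after the data is on disk, so that a crash never leaves a
    # half-written file under the final name.
    os.replace(tmp_path, final_path)
    return final_path


def load_mirror(path):
    """Load a previously saved metadata mirror back into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

The zero-padded moment in the file name is a design choice of the sketch: it makes lexicographic order of the file names match chronological order, which simplifies finding the latest mirror later.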

As an optional embodiment of the present invention, the master node may schedule tasks concurrently by group, in which case the obtained metadata mirror may consist of metadata mirrors for multiple groups. The acquired metadata mirrors can therefore be stored by task group, with the metadata mirrors belonging to the same task group stored in the same directory, so that the corresponding metadata mirrors can be organized efficiently by group during subsequent recovery.
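One possible directory layout for storing mirrors by task group is sketched below. The layout (`<root>/<group>/mirror-<moment>.json`) and both function names are hypothetical; the text only states that mirrors of the same task group share a directory.

```python
import json
import os


def save_grouped_mirrors(state_by_group, root, moment):
    """Write one mirror file per task group, each into that group's directory."""
    paths = {}
    for group, state in state_by_group.items():
        group_dir = os.path.join(root, group)
        os.makedirs(group_dir, exist_ok=True)
        path = os.path.join(group_dir, f"mirror-{moment:020d}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(state, f)
        paths[group] = path
    return paths


def latest_mirror_per_group(root):
    """Return, for each group directory, the path of its newest mirror file."""
    latest = {}
    for group in sorted(os.listdir(root)):
        group_dir = os.path.join(root, group)
        if not os.path.isdir(group_dir):
            continue
        names = sorted(
            n for n in os.listdir(group_dir)
            if n.startswith("mirror-") and n.endswith(".json")
        )
        if names:
            # Zero-padded moments sort chronologically, so the last is newest.
            latest[group] = os.path.join(group_dir, names[-1])
    return latest
```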

In step S220, the redo log recording all operations of the master node after that moment may be acquired and saved by the master node. The operations described herein may refer to operations performed by the master node on metadata or operations performed by the master node on its in-memory data.

Each operation performed by the master node can be recorded in the redo log, and the operation information of the master node can be recorded in the redo log sequentially. For each operation that the master node is about to perform, the operation can be performed by the master node only after it has been recorded in the redo log and persisted. In this way, when the master node fails while executing the operation, the operation can be recovered according to the data recorded in the redo log. Otherwise, if an operation is performed first and recorded afterward, and the master node is interrupted during execution or before the operation has been recorded or saved, the operation cannot be recovered and can only be performed again.

For example, when a slave node requests a task (such as a computing task or a storage task) from the master node, the master node may first record in the redo log the operation of delivering the target data to the slave node. After the record has been successfully written and persisted, the master node sends the target data to the slave node in response to its request. In other words, the master node responds to a slave node's request only after the master node's operation for that request has been recorded in the redo log and stored persistently.
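The ordering described above (record, then apply, then respond) can be sketched as follows. The `SchedulingMaster` class and the `persist` callable are hypothetical; `persist` stands in for whatever durably stores one redo log record and is assumed to raise an exception when it fails.

```python
class SchedulingMaster:
    """Hypothetical master that answers a slave only after the redo log write."""

    def __init__(self, pending_tasks, persist):
        self.pending = list(pending_tasks)
        self.assignments = {}
        # Assumed interface: persist(record) durably stores one redo log
        # record and raises an exception if it cannot.
        self.persist = persist

    def handle_task_request(self, slave_id):
        if not self.pending:
            return None
        task = self.pending[0]
        # Step 1: record the operation. If this raises, nothing below runs,
        # so no unrecorded operation is ever applied or acknowledged.
        self.persist({"op": "assign", "task": task, "slave": slave_id})
        # Step 2: apply the operation to the in-memory data.
        self.pending.pop(0)
        self.assignments[task] = slave_id
        # Step 3: respond to the slave node.
        return task
```

If the redo log write fails, the request fails with it and the in-memory state is left unchanged, so the log never lags behind what slave nodes have been told.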

In step S230, the metadata mirror and its corresponding redo log are called for failure recovery when the fault is recovered.

As mentioned above, the metadata mirror can be regarded as a mapping of the memory state of the master node at a certain moment, while the redo log records all operations of the master node. Therefore, when the master node fails, fault recovery can be performed according to the metadata mirror acquired before the failure and the operations, recorded in the redo log, that the master node performed between the moment corresponding to that metadata mirror and the failure, restoring the master node to its state before the failure. Taking redo logs recorded in a file system as an example, recovery may proceed as follows: after the master node restarts, it first traverses the metadata mirror directory in the file system, finds the most recent metadata mirror, and loads it into memory; it then loads the redo log that follows the latest metadata mirror and starts replaying it. Once the replay is complete, the entire recovery process is complete.
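The file-system recovery procedure just described might look like the sketch below. It assumes, purely for illustration, that mirrors are stored as `mirror-<id>.json` with zero-padded ids, that the redo log following a mirror is named `redo-<id>.log` in the same directory, and that each log line is a JSON key/value operation. None of these names or formats come from the text.

```python
import json
import os


def apply_operation(state, op):
    """Apply one hypothetical key/value operation to the in-memory state."""
    state[op["key"]] = op["value"]


def recover(directory):
    """Load the newest metadata mirror, then replay the redo log that follows it."""
    mirrors = sorted(
        n for n in os.listdir(directory)
        if n.startswith("mirror-") and n.endswith(".json")
    )
    if not mirrors:
        raise FileNotFoundError("no metadata mirror found in " + directory)
    latest = mirrors[-1]  # zero-padded ids sort chronologically
    with open(os.path.join(directory, latest), encoding="utf-8") as f:
        state = json.load(f)
    mirror_id = latest[len("mirror-"):-len(".json")]
    redo_path = os.path.join(directory, "redo-" + mirror_id + ".log")
    if os.path.exists(redo_path):
        with open(redo_path, encoding="utf-8") as log:
            for line in log:
                if line.strip():
                    apply_operation(state, json.loads(line))
    return state
```

Only the operations recorded after the newest mirror are replayed, which is why this is faster than replaying a complete operation log.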

As an optional embodiment of the present invention, when saving the metadata mirror of the primary node, a plurality of metadata mirrors corresponding to different time instants may be saved. In the process of recording the redo log, the acquisition operation of the metadata mirror may be performed periodically or in response to satisfying the predetermined trigger condition. The above trigger condition may be, for example, a certain parameter satisfies a predetermined value, reaches a predetermined interval, or directly responds to an external trigger command. For example, the acquisition operation of the metadata mirror may be performed once every predetermined number of operations are recorded in the redo log, or the acquisition operation of the metadata mirror may be performed once every predetermined time.
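The trigger conditions listed above (a predetermined number of recorded operations, a predetermined interval, or an external command) could be combined as in the following sketch. The `MirrorTrigger` class, its parameter names, and the injectable `clock` are assumptions introduced for the example.

```python
import time


class MirrorTrigger:
    """Hypothetical policy deciding when to acquire the next metadata mirror."""

    def __init__(self, max_operations=None, max_seconds=None, clock=time.monotonic):
        self.max_operations = max_operations
        self.max_seconds = max_seconds
        self.clock = clock
        self._ops = 0
        self._last = clock()
        self._external = False

    def record_operation(self):
        """Call once for every operation written to the redo log."""
        self._ops += 1

    def request(self):
        """Call when an external trigger command arrives."""
        self._external = True

    def should_acquire(self):
        if self._external:
            return True
        if self.max_operations is not None and self._ops >= self.max_operations:
            return True
        if self.max_seconds is not None and self.clock() - self._last >= self.max_seconds:
            return True
        return False

    def mark_acquired(self):
        """Reset all counters after a mirror has been acquired and saved."""
        self._ops = 0
        self._last = self.clock()
        self._external = False
```

Passing the clock in as a parameter is a convenience of the sketch: it lets the time-based condition be exercised without waiting.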

Further, when the operation of the master node is recorded in the redo log, the redo logs respectively corresponding to a plurality of different time instants (ie, multiple metadata mirrors) may be continuously acquired. FIG. 3 is a schematic diagram showing the principle of continuously saving a plurality of metadata mirror files and their corresponding redo logs.

Referring to FIG. 3, metadata mirror 1 of the master node can first be obtained at time t1, and the operations of the master node between t1 and t2 can be recorded and stored in redo log 1. The metadata mirror of the master node can be acquired again at time t2, and the operations of the master node between t2 and t3 can be recorded and stored in redo log 2, and so on. In this way, the metadata mirrors corresponding to times t1, t2, and t3, and the redo logs corresponding to the metadata mirrors of those different moments, can be obtained.
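The pairing of each metadata mirror with the redo log that follows it, as in FIG. 3, can be modeled in memory as below. The `GenerationStore` class is hypothetical, and keeping everything in Python lists rather than in persistent files is a simplification for illustration only.

```python
import copy


class GenerationStore:
    """Hypothetical in-memory model of consecutive (mirror, redo log) pairs."""

    def __init__(self):
        # Each entry pairs one mirror with the operations recorded after it.
        self.generations = []

    def take_mirror(self, state):
        # Acquiring a new mirror closes the previous redo log and opens a new one.
        self.generations.append({"mirror": copy.deepcopy(state), "redo": []})

    def log_operation(self, op):
        if not self.generations:
            raise RuntimeError("take a metadata mirror before logging operations")
        self.generations[-1]["redo"].append(op)
```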

Therefore, assuming that the master node crashes at time t4, during fault recovery the master node can first call the latest metadata mirror (that is, the metadata mirror of time t3) and its corresponding redo log (the redo log for the interval t3-t4). If the latest metadata mirror and redo log are unavailable, the master node can call the next most recent metadata mirror (that is, the metadata mirror of time t2) and its redo log (that is, the redo log for the interval t2-t3) for recovery, and so on, stepping back until usable data files are found. Thus, by storing a plurality of memory mirrors of different moments and their corresponding redo logs, fault tolerance during fault recovery can be improved.
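The fallback order described above can be sketched as a search from the newest pair backward. The function name, the tuple layout, and the use of `None` to mark an unavailable file are assumptions of this example. Following the text, the sketch replays only the redo log paired with the chosen mirror; whether later redo logs are also replayed is not stated and is left out.

```python
def recover_with_fallback(generations):
    """Recover from the newest usable (mirror, redo log) pair.

    `generations` is a list of (moment, mirror, redo_ops) tuples ordered from
    oldest to newest. A mirror or redo_ops value of None marks that file as
    unavailable. Operations are hypothetical (key, value) assignments.
    """
    for moment, mirror, redo_ops in reversed(generations):
        if mirror is None or redo_ops is None:
            continue  # step back to the next most recent pair
        state = dict(mirror)
        for key, value in redo_ops:
            state[key] = value
        return moment, state
    raise RuntimeError("no usable metadata mirror and redo log pair")
```

For example, if the mirror of time t3 is unavailable, the function returns the state rebuilt from the mirror of time t2 and the redo log for t2-t3.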

In other words, the solution of the present application can trigger the acquisition and storage of a metadata mirror under certain conditions or commands (for example, saving the state at time t3), and then start continuously recording the redo log (that is, recording all operations after t3). After a failure occurs at time t4, the state at time t3 can be restored and all operations after t3 replayed, so that the master node quickly returns to its state at time t4.

When the metadata mirror of the master node at a certain moment is acquired, for example when metadata mirror 1 is acquired at time t1 as shown in FIG. 3, the service of the master node is not stopped, and acquiring metadata mirror 1 takes a certain amount of time. Metadata mirror 1 acquired at time t1 may therefore already contain some of the operations recorded in redo log 1 after time t1. As a result, when the master node fails at time t2 and is restored using metadata mirror 1 of time t1 and the corresponding redo log 1, the state of the restored master node is likely to be inconsistent with its state before the failure.

Therefore, as an optional embodiment of the present invention, while the metadata mirror for a certain moment is being acquired, the times of the operations recorded in the redo log during that period can be recorded in real time. After acquisition of the metadata mirror is complete, the corresponding operations can be removed from the redo log, so that the acquired metadata mirror does not also contain operations recorded in the redo log and the metadata mirror strictly corresponds to the redo log of that moment.
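The effect of removing already-mirrored operations from the redo log can be seen with an operation that is not idempotent, such as incrementing a counter. In the sketch below, operations are identified by sequence numbers rather than by the recorded times mentioned in the text; that substitution, the function names, and the assumption that every operation flagged during acquisition did reach the mirror are all simplifications for illustration.

```python
def replay(mirror, redo_ops):
    """Apply (sequence, key, delta) increment operations on top of a mirror."""
    state = dict(mirror)
    for _seq, key, delta in redo_ops:
        state[key] = state.get(key, 0) + delta
    return state


def prune_redo_log(redo_ops, seqs_in_mirror):
    """Drop operations that the mirror already reflects."""
    return [op for op in redo_ops if op[0] not in seqs_in_mirror]


# The counter was 5 when acquisition started; operation 1 ran while the
# mirror was being written, so the mirror already shows 6.
mirror = {"running_tasks": 6}
redo_log = [(1, "running_tasks", 1), (2, "running_tasks", 1)]

naive = replay(mirror, redo_log)                          # operation 1 applied twice
aligned = replay(mirror, prune_redo_log(redo_log, {1}))   # operation 1 applied once
```

Here the true pre-failure value is 7 (5 plus two increments). Replaying the unpruned log yields 8, while replaying the pruned log yields 7.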

The fault recovery method of the present invention has been described in detail so far with reference to FIGS. 2 and 3. In addition, the fault recovery scheme of the present invention can also be implemented by a fault recovery device. FIG. 4 is a block diagram showing the structure of a fault recovery apparatus according to an embodiment of the present invention. The functional modules of the fault recovery device 400 may be implemented by hardware, software, or a combination of hardware and software that implements the principles of the present invention. Those skilled in the art will appreciate that the functional blocks depicted in FIG. 4 can be combined or divided into sub-modules to implement the principles of the above described invention. Accordingly, the description herein may support any possible combination, or division, or further limitation of the functional modules described herein.

The fault recovery apparatus 400 shown in FIG. 4 can be used to implement the fault recovery method shown in FIG. 2. Only the functional modules that the fault recovery apparatus 400 can have and the operations that each functional module can perform are briefly described below. For details, please refer to the description above in conjunction with FIG. 2, which is not repeated here. It should be noted that the fault recovery apparatus 400 may be the master node itself or a backup master node.

As shown in FIG. 4, the fault recovery apparatus of the present invention may include a mirror acquisition unit 410, a redo log acquisition unit 420, and a fault recovery unit 430. The mirror acquisition unit 410 can acquire and save the metadata mirror recording scheduling information and system status at a certain moment on the master node, and the redo log acquisition unit 420 can acquire and save the redo log recording all operations of the master node after that moment. The fault recovery unit 430 can call the metadata mirror and its corresponding redo log for fault recovery when recovering from a fault.

Preferably, the image acquisition unit 410 can perform the acquisition and save operation of the metadata mirror under the trigger of the master node, the device, and/or the external command. The image obtaining unit 410 can directly acquire and save the memory state of the master node at a certain moment as a metadata mirror. Further, the image obtaining unit 410 may store the metadata image according to the task group.

Preferably, the master node responds to a new request of the slave node after each operation thereof has been recorded in the redo log and stored.

Preferably, the mirror acquisition unit 410 continuously acquires and saves metadata mirrors of the master node at a plurality of different moments, and the redo log acquisition unit 420 continuously acquires and saves the redo logs respectively corresponding to the plurality of different moments. In this case, the fault recovery unit 430 calls the latest metadata mirror and its corresponding redo log for fault recovery, and, when the latest metadata mirror and/or its corresponding redo log is unavailable, the fault recovery unit 430 can call the metadata mirror and corresponding redo log of the most recent moment for which both are available.

The distributed system and its failure recovery method, apparatus, product and storage medium according to the present invention have been described in detail above with reference to the accompanying drawings.

Furthermore, the method according to the invention may also be embodied as a computer program or computer program product comprising computer program code instructions for performing the various steps defined above in the above method of the invention.

Alternatively, the invention may be embodied as a computer program product comprising: a memory; a processor; and a computer program; wherein the computer program is stored in the memory and configured to perform the invention by the processor The above method.

FIG. 5 is a structural diagram of a computer program product according to an exemplary embodiment of the present invention.

As shown in FIG. 5, the embodiment provides a computer program product, including at least one processor 51 and a memory 52. In FIG. 5, one processor 51 is taken as an example. The processor 51 and the memory 52 are connected by a bus 50. The memory 52 stores instructions executable by the at least one processor 51, the instructions being executed by the at least one processor 51 to cause the at least one processor 51 to perform the above described method of the present invention.

The related descriptions can be understood by referring to the related descriptions and effects corresponding to the steps in FIG. 2, and no further description is made here.

Alternatively, the present invention may be embodied as a non-transitory machine readable storage medium (or computer readable storage medium, or machine readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above method in accordance with the present invention.

The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of systems and methods in accordance with various embodiments of the present invention. In this regard, each block of a flowchart or block diagram can represent a module, a program segment, or a portion of code that includes one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in a different order than that illustrated in the drawings. For example, two consecutive blocks may be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.

The embodiments of the present invention have been described above. The foregoing description is illustrative, not exhaustive, and is not limited to the disclosed embodiments. Numerous modifications and changes will be apparent to those skilled in the art without departing from the scope of the invention. The terms used herein were chosen to best explain the principles of the embodiments, their practical applications, or their improvements to the technology, or to enable those of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

A distributed system, comprising: a master node for scheduling tasks and managing system states; and a plurality of slave nodes for running scheduled tasks, wherein

one or more of the slave nodes and/or the master node acquire and save a metadata mirror recording the scheduling information and system state of the master node at a certain moment;

the master node acquires and saves a redo log recording all operations of the master node after that moment; and

the master node invokes the metadata mirror and its corresponding redo log to perform fault recovery when recovering from a fault.
The distributed system of claim 1, wherein said one or more slave nodes and/or said master node perform the operations of acquiring and saving said metadata mirror when triggered by said master node and/or an external command.
The distributed system of claim 1, wherein said master node responds to a request of said slave node after each operation thereof has been recorded in said redo log and stored.
The distributed system of claim 1, wherein one or more of said slave nodes and/or said master node continuously acquire and save metadata mirrors of said master node at a plurality of different moments, and

the master node continuously acquires and saves redo logs respectively corresponding to the plurality of different moments.
The distributed system according to claim 4, wherein the master node invokes the latest metadata mirror and its corresponding redo log to perform fault recovery when recovering from a fault.
The distributed system of claim 4, wherein, when the latest metadata mirror and/or its corresponding redo log is unavailable, the master node invokes the data of the most recent moment for which both the metadata mirror and its corresponding redo log are available to perform fault recovery.
The distributed system according to claim 1, wherein one or more of said slave nodes and/or said master node directly acquire and save the memory state of said master node at a certain moment as said metadata mirror.
The distributed system of claim 1, wherein said metadata mirror is stored according to task group.
A fault recovery apparatus for a distributed system, characterized in that the distributed system includes a master node for scheduling tasks and managing system states, and a plurality of slave nodes for running tasks, the apparatus being used for fault recovery when the master node fails, and comprising:

a mirror acquisition unit, configured to acquire and save a metadata mirror recording the scheduling information and system state of the master node at a certain moment;

a redo log acquisition unit, configured to acquire and save a redo log recording all operations of the master node after that moment; and

a fault recovery unit, configured to invoke the metadata mirror and its corresponding redo log to perform fault recovery during fault recovery.
The apparatus according to claim 9, wherein said mirror acquisition unit performs the operations of acquiring and saving said metadata mirror when triggered by said master node, said apparatus, and/or an external command.
The apparatus of claim 9, wherein the master node responds to a request of the slave node after each operation thereof has been recorded in the redo log and stored.
The apparatus according to claim 9, wherein said mirror acquisition unit continuously acquires and saves metadata mirrors of said master node at a plurality of different moments, and

the redo log acquisition unit continuously acquires and saves redo logs respectively corresponding to the plurality of different moments.
The apparatus according to claim 12, wherein said fault recovery unit invokes the latest metadata mirror and its corresponding redo log to perform fault recovery during fault recovery.
The apparatus according to claim 12, wherein, when the latest metadata mirror and/or its corresponding redo log is unavailable, said fault recovery unit invokes the data of the most recent moment for which both the metadata mirror and its corresponding redo log are available to perform fault recovery.
The apparatus according to claim 9, wherein the mirror acquisition unit directly acquires and saves the memory state of the master node at a certain moment as the metadata mirror.
The apparatus according to claim 9, wherein said mirror acquisition unit stores said metadata mirror according to task group.
A fault recovery method for a distributed system, characterized in that the distributed system includes a master node for scheduling tasks and managing system states, and a plurality of slave nodes for running tasks, the method being used for fault recovery when the master node fails, and comprising:

acquiring and saving a metadata mirror recording scheduling information and system state at a certain moment;

acquiring and saving a redo log recording all scheduled operations after that moment; and

invoking the metadata mirror and its corresponding redo log to perform fault recovery during fault recovery.
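As a non-limiting illustration, the three steps of the method above might be sketched as follows. This is a minimal in-memory sketch, not the claimed implementation: the `Master` class, the `(kind, task, slave)` operation format, and the task and slave names are all hypothetical and chosen only for the example.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Master:
    """Toy master node; state maps a task name to its assigned slave (hypothetical)."""
    state: dict = field(default_factory=dict)
    mirror: dict = field(default_factory=dict)    # metadata mirror taken at some moment
    redo_log: list = field(default_factory=list)  # operations recorded after that moment

    def apply(self, op):
        """Apply one operation, given as (kind, task, slave), to the in-memory state."""
        kind, task, slave = op
        if kind == "assign":
            self.state[task] = slave
        elif kind == "remove":
            self.state.pop(task, None)

    def execute(self, op):
        """Record the operation in the redo log, then apply it."""
        self.redo_log.append(op)
        self.apply(op)

    def take_mirror(self):
        """Save a copy of the current state and start a fresh redo log."""
        self.mirror = copy.deepcopy(self.state)
        self.redo_log = []

    def recover(self):
        """Rebuild the state: load the mirror, then replay the redo log in order."""
        self.state = copy.deepcopy(self.mirror)
        for op in self.redo_log:
            self.apply(op)


master = Master()
master.execute(("assign", "task-1", "slave-a"))
master.take_mirror()                        # mirror now holds task-1 only
master.execute(("assign", "task-2", "slave-b"))
master.execute(("remove", "task-1", None))

expected = dict(master.state)
master.state = {}                           # simulate losing the in-memory state
master.recover()
print(master.state == expected)
```

In this sketch the mirror bounds how much of the redo log must be replayed: only the operations recorded after the mirror was taken are applied during recovery.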
The method of claim 17, wherein:

metadata mirrors of the master node at a plurality of different moments are continuously acquired and saved, and

redo logs respectively corresponding to the plurality of different moments are continuously acquired and saved.
The method of claim 18, wherein invoking the metadata mirror and its corresponding redo log to perform fault recovery during fault recovery comprises:

invoking the latest metadata mirror and its corresponding redo log to perform fault recovery;

when the latest metadata mirror and/or its corresponding redo log is unavailable, invoking the data of the most recent moment for which both the metadata mirror and its corresponding redo log are available to perform fault recovery.
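As a non-limiting illustration, the selection rule above (prefer the latest pair, otherwise fall back to the most recent pair that is fully available) might be sketched as follows. The checkpoint layout, the use of `None` to mark an unavailable mirror or log, and the function names are assumptions made for the example; the sketch also does not address how operations recorded after the chosen moment would be handled.

```python
from typing import Optional

# Each checkpoint pairs a metadata mirror with the redo log that follows it.
# None stands for a mirror or log that is missing or unreadable (an assumption).
checkpoints = [
    {"time": 1, "mirror": {"task-1": "slave-a"}, "redo_log": [("task-2", "slave-b")]},
    {"time": 2, "mirror": {"task-1": "slave-a", "task-2": "slave-b"}, "redo_log": []},
    {"time": 3, "mirror": None, "redo_log": [("task-3", "slave-c")]},  # mirror damaged
]


def select_checkpoint(candidates: list) -> Optional[dict]:
    """Return the newest checkpoint whose mirror and redo log are both usable."""
    for cp in sorted(candidates, key=lambda c: c["time"], reverse=True):
        if cp["mirror"] is not None and cp["redo_log"] is not None:
            return cp
    return None


def recover_state(candidates: list) -> dict:
    """Load the selected mirror and replay its redo log of (task, slave) assignments."""
    cp = select_checkpoint(candidates)
    if cp is None:
        raise RuntimeError("no usable metadata mirror and redo log pair")
    state = dict(cp["mirror"])
    for task, slave in cp["redo_log"]:
        state[task] = slave
    return state


chosen = select_checkpoint(checkpoints)
recovered = recover_state(checkpoints)
print(chosen["time"], recovered)
```

Here the checkpoint at time 3 is skipped because its mirror is unavailable, so recovery proceeds from the pair saved at time 2.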
The method according to claim 17, wherein the memory state of the master node at a certain moment is directly acquired and saved as the metadata mirror.
The method of claim 17, wherein the operations of acquiring and saving the metadata mirror are performed when triggered by the master node and/or an external command.
The method of claim 17, wherein said master node responds to a request of said slave node after each operation thereof has been recorded in said redo log and stored.
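As a non-limiting illustration, the ordering described above (record and store the operation in the redo log first, reply to the slave node afterwards) might be sketched as follows. The `LoggingMaster` class, the JSON-lines log format, and the file name are hypothetical; applying the operation to memory only after the log write is a choice made for this sketch rather than something the text specifies.

```python
import json
import os
import tempfile


class LoggingMaster:
    """Toy master node that persists each operation before replying (hypothetical)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}

    def handle_request(self, task, slave):
        """Store the operation in the redo log, then apply it and build the reply."""
        record = {"op": "assign", "task": task, "slave": slave}
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())            # ask the OS to write the record to disk
        self.state[task] = slave              # sketch choice: apply after the log write
        return {"status": "ok", "task": task}  # the reply is produced last


log_path = os.path.join(tempfile.mkdtemp(), "redo.log")
logging_master = LoggingMaster(log_path)
reply = logging_master.handle_request("task-1", "slave-a")

with open(log_path, encoding="utf-8") as log:
    logged = [json.loads(line) for line in log]
print(reply["status"], len(logged))
```

Because the reply is produced only after the log write returns, any operation a slave node has seen acknowledged is also present in the redo log and can be replayed during recovery.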
The method of claim 17, wherein said metadata mirror is stored according to task group.
A computer program product, comprising:

a memory; a processor; and a computer program;

wherein the computer program is stored in the memory and configured to be executed by the processor to perform the method of any one of claims 17 to 23.
A computer-readable storage medium, comprising a program which, when executed on a computer, causes the computer to perform the method of any one of claims 17 to 23.