WO2019020081A1 - Distributed system and fault recovery method and apparatus thereof, product, and storage medium

Publication number
WO2019020081A1
WO2019020081A1 PCT/CN2018/097262 CN2018097262W WO2019020081A1 WO 2019020081 A1 WO2019020081 A1 WO 2019020081A1 CN 2018097262 W CN2018097262 W CN 2018097262W WO 2019020081 A1 WO2019020081 A1 WO 2019020081A1
Authority
WO
WIPO (PCT)
Prior art keywords
master node
metadata
redo log
node
distributed system
Prior art date
Application number
PCT/CN2018/097262
Other languages
French (fr)
Chinese (zh)
Inventor
褚建辉
卢申朋
刘东辉
王新栋
Original Assignee
广东神马搜索科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广东神马搜索科技有限公司
Publication of WO2019020081A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments

Definitions

  • the present invention relates to the field of distributed technologies, and in particular to a distributed system and a fault recovery method, apparatus, product, and storage medium thereof.
  • FIG. 1 is a schematic diagram showing the structure of a distributed system employing a master-slave architecture.
  • the distributed system of the master-slave architecture is mostly composed of a master node and a plurality of slave nodes.
  • the master node usually has functions such as metadata storage and query, cluster node state management, decision making and task delivery.
  • the metadata managed by the master node is relatively important data in the system, so the loss of data on the master node has a considerable impact on the system.
  • the invention provides a distributed system and a fault recovery method, apparatus, product, and storage medium thereof, which acquire a metadata mirror of the master node at one or more moments and record the operations of the master node in a redo log, so that when the master node fails, it can be quickly restored to its pre-failure state based on the previously recorded metadata mirror and redo log.
  • a distributed system comprising a master node for scheduling tasks and managing system states and a plurality of slave nodes for running scheduled tasks, wherein: one or more slave nodes and/or the master node acquire and save a metadata mirror recording the scheduling information and system status of the master node at a certain moment; the master node acquires and saves a redo log recording all operations of the master node after that moment; and, during fault recovery, the master node calls the metadata mirror and its corresponding redo log to perform fault recovery.
  • the master node can thus be quickly restored to its pre-failure state, and recovery is more efficient than with an approach that records only log files.
  • one or more slave nodes and/or the master node perform the acquisition and saving of the metadata mirror when triggered by the master node and/or an external command. Different trigger modes can therefore be set according to the characteristics of the distributed system to trigger the acquisition and saving of the metadata mirror.
  • the master node responds to a slave node's request only after each of its operations has been recorded in the redo log and stored. This ensures that the redo log completely records every operation of the master node.
  • the one or more slave nodes and/or the master node continuously acquire and save the metadata mirror of the master node at a plurality of different moments, and the master node continuously acquires and saves the redo logs respectively corresponding to the plurality of different moments.
  • during fault recovery, the master node can call the latest metadata mirror and its corresponding redo log; when the latest metadata mirror and/or its corresponding redo log is unavailable, it calls the data of the most recent moment for which both the metadata mirror and its corresponding redo log are available.
  • by saving multiple memory mirrors taken at different moments together with their corresponding redo logs, the fault tolerance of fault recovery can be improved.
  • one or more slave nodes and/or the master node directly acquire and save the memory state of the master node at a certain moment as a metadata mirror.
  • metadata mirrors can be stored by task group, so that the corresponding metadata mirrors can be efficiently organized by group during subsequent recovery.
  • a fault recovery apparatus for a distributed system, the distributed system including a master node for scheduling tasks and managing system states and a plurality of slave nodes for running tasks, the apparatus being used to perform fault recovery when the master node fails and including: a mirror acquisition unit, configured to acquire and save a metadata mirror recording the scheduling information and system status of the master node at a certain moment; a redo log acquisition unit, configured to acquire and save a redo log recording all operations of the master node after that moment; and a fault recovery unit, configured to call the metadata mirror and its corresponding redo log to perform fault recovery.
  • the mirror acquisition unit performs the acquisition and saving of the metadata mirror when triggered by the master node, the apparatus, and/or an external command.
  • the master node responds to a request of a slave node only after each of its operations has been recorded in the redo log by the redo log acquisition unit and stored.
  • the mirror acquisition unit continuously acquires and saves metadata mirrors of the master node at a plurality of different moments, and the redo log acquisition unit continuously acquires and saves the redo logs respectively corresponding to those moments.
  • during fault recovery, the fault recovery unit calls the latest metadata mirror and its corresponding redo log to perform fault recovery.
  • when the latest metadata mirror and/or its corresponding redo log is unavailable, the fault recovery unit calls the data of the most recent moment for which both the metadata mirror and its corresponding redo log are available.
  • the mirror acquisition unit directly acquires and saves the memory state of the master node at a certain moment as a metadata mirror.
  • the mirror acquisition unit stores the metadata mirror by task group.
  • a fault recovery method for a distributed system, the distributed system comprising a master node for scheduling tasks and managing system states and a plurality of slave nodes for running tasks, the method being used to perform fault recovery when the master node fails.
  • the method includes: acquiring and saving a metadata mirror recording the scheduling information and system status at a certain moment; acquiring and saving a redo log recording all scheduling operations after that moment; and calling the metadata mirror and its corresponding redo log to perform fault recovery.
  • metadata mirrors of the master node at a plurality of different moments are continuously acquired and saved, and the redo logs respectively corresponding to those moments are continuously acquired and saved.
  • calling the metadata mirror and its corresponding redo log to perform fault recovery may include: calling the latest metadata mirror and its corresponding redo log; and, when the latest metadata mirror and/or its corresponding redo log is unavailable, calling the data of the most recent moment for which both the metadata mirror and its corresponding redo log are available.
  • the memory state of the master node at a certain moment can be directly obtained and saved as a metadata mirror.
  • the obtaining and saving operation of the metadata mirroring is performed under the trigger of the master node and/or an external command.
  • the master node responds to the request of the slave node after each operation thereof is recorded in the redo log and stored.
  • the metadata image is stored in accordance with a task grouping.
  • a computer program product comprising: a memory; a processor; and a computer program; wherein the computer program is stored in the memory and configured to be executed by the processor to perform the method of the third aspect of the invention and any of its preferred aspects.
  • a fifth aspect of the invention provides a computer readable storage medium comprising a program which, when executed on a computer, causes the computer to perform the method of the third aspect of the invention and any of the preferred aspects thereof.
  • the distributed system, fault recovery method, apparatus, product, and storage medium of the present invention acquire a metadata mirror of the master node at one or more moments and record the subsequent operations of the master node in a redo log, so that when the master node fails, it can be quickly restored to its pre-failure state based on the previously recorded metadata mirror and redo log.
  • FIG. 1 is a schematic diagram showing the architecture of a distributed system of a master-slave architecture.
  • FIG. 2 is a schematic flow chart showing a fault recovery method according to an embodiment of the present invention.
  • FIG. 3 is a diagram showing the continuous storage of a plurality of metadata mirrors and redo logs.
  • FIG. 4 is a schematic block diagram showing the structure of a failure recovery device according to an embodiment of the present invention.
  • FIG. 5 is a structural diagram of a computer program product according to an exemplary embodiment of the present invention.
  • the operation flow of the master node in a scheme that records only a log file is as follows: before the master node performs an operation, the operation is recorded in the log file, and the operation is performed only after the recording succeeds, that is, the data in memory is then updated based on the operation.
  • the recovery process is as follows: the log file is read, and the data in memory is modified in sequence based on the operations of the master node recorded in the log file. Recovering only from a log file of recorded write operations is simple, but the recovery process takes a very long time.
  • the inventors found that, while the operations of the master node are being recorded in the log file, mirror files of the master node's in-memory data at certain moments can be interspersed. A mirror file represents the state data of the master node at the corresponding moment, so that when the master node fails, the latest mirror file and the operations recorded in the log file after the moment corresponding to that mirror file can be called, and the master node can be recovered from the called data. Compared with recording only log files, this can significantly reduce the time required for recovery.
  • the present invention proposes a failure recovery scheme for a primary node in a distributed system, and the failure recovery scheme of the present invention can be implemented by the distributed system shown in FIG. 1.
  • the distributed system of the present invention may include a master node for scheduling tasks and managing system states and a plurality of slave nodes for running scheduled tasks. Both the master node and the slave node can be deployed in the server, and the master node can be deployed in a separate server different from the slave node, or can be deployed in the same server as one of the slave nodes. As a preferred embodiment, different nodes can be deployed in different servers.
  • the distributed system shown in FIG. 1 is composed of a master node and a plurality of slave nodes. It should be understood that the distributed system of the present invention may further include a plurality of master nodes, and may also include nodes or devices other than the master node and the slave nodes, such as backup master nodes and failover databases.
  • FIG. 2 is a schematic flow chart showing a fault recovery method according to an embodiment of the present invention.
  • the method shown in FIG. 2 can be implemented by the distributed system shown in FIG. 1, and in particular, can be implemented by a master node in a distributed system.
  • in step S210, a metadata mirror recording the scheduling information and system status of the master node at a certain moment is acquired and saved.
  • for a distributed system with a master-slave architecture, the entire distributed system becomes unavailable after the master node crashes. Given the importance of the master node, it usually does not run specific tasks directly but is only responsible for maintaining the operation of the distributed system and for scheduling and assigning tasks, while specific tasks are performed by the slave nodes. That is to say, the master node is mainly responsible for parsing task requests, allocating resources, and locating target data or nodes according to the metadata, and the specific task is performed by the slave node designated by the master node.
  • metadata is data that describes data; the metadata in the present invention refers specifically to data that the master node is responsible for saving and managing.
  • the metadata may refer to data that records the scheduling information and system status of the master node at a certain moment.
  • the metadata may be system-related description data, system state data, current task scheduling and status data, and so on.
  • the metadata may also be data describing information such as the state and storage location of user data.
  • the obtained metadata mirror of the master node at a certain time may be a mapping of the memory state of the master node at that moment, so that the memory state of the master node at a certain moment can be directly obtained and saved as a metadata mirror.
  • the metadata mirror of the master node at a certain moment can be obtained by means of a snapshot or a dump (file system backup).
  • the operation of obtaining the metadata image may be performed by the master node, by one or more slave nodes, or by a backup master node in the distributed system.
  • the obtained metadata image can be stored persistently on a local disk or a distributed file system, for example, can be stored persistently in the failover database.
  • the master node may schedule tasks concurrently by group, so the acquired metadata mirror may comprise metadata mirrors for multiple groups. The acquired metadata mirrors can therefore be stored by task group, with the metadata mirrors belonging to the same task group stored in the same directory, so that the corresponding metadata mirrors can be efficiently organized by group during subsequent recovery.
  • in step S220, a redo log recording all operations of the master node after that moment may be acquired and saved by the master node.
  • the operations described herein may refer to operations performed by the primary node on metadata or operations performed by the primary node on its in-memory data.
  • each operation performed by the master node can be recorded in the redo log.
  • the operation information of the master node can be recorded in the redo log in sequence.
  • for each operation that the master node is about to perform, the master node can perform the operation after it has been recorded in the redo log and persisted. In this way, if the master node fails while performing the operation, the operation can be recovered from the data recorded in the redo log. Conversely, if the operation is performed first and recorded afterwards, and a failure occurs during execution or before the record has been saved, the operation cannot be recovered and can only be performed again.
  • the master node may first record in the redo log the operation of delivering the target data to the slave node and, once the record has been successfully written and persisted, send the target data to the slave node in response to the slave node's request.
  • that is, the master node can respond to a slave node's request after its operation for that request has been recorded in the redo log and stored (persisted).
  • in step S230, the metadata mirror and its corresponding redo log are called to perform fault recovery.
  • a metadata mirror can be regarded as a mapping of the memory state of the master node at a certain moment, while the redo log records all operations of the master node. Therefore, when the master node fails, fault recovery can be performed based on the metadata mirror acquired before the failure and on the operations, recorded in the redo log, that the master node performed between the moment corresponding to that metadata mirror and the failure, restoring the master node to its state before the failure occurred.
  • taking metadata mirrors and redo logs recorded in a file system as an example, recovery can proceed as follows: after the master node restarts, it first traverses the metadata mirror directory in the file system, finds the most recent metadata mirror, and loads it into memory; it then loads the redo log recorded after that metadata mirror and starts replaying it. Once loading and replay are complete, the entire recovery process is complete.
  • a plurality of metadata mirrors corresponding to different time instants may be saved.
  • the acquisition of the metadata mirror may be performed periodically or in response to a predetermined trigger condition being satisfied.
  • the trigger condition may be, for example, a certain parameter reaching a predetermined value, a predetermined interval elapsing, or a direct response to an external trigger command.
  • for example, the metadata mirror may be acquired once each time a predetermined number of operations has been recorded in the redo log, or once every predetermined period of time.
  • FIG. 3 is a schematic diagram showing the principle of continuously saving a plurality of metadata mirror files and their corresponding redo logs.
  • the metadata mirror 1 of the master node at time t1 can be acquired first, and the operations of the master node between t1 and t2 can be recorded and stored in redo log 1; the metadata mirror of the master node can be acquired again at time t2, and the operations of the master node between t2 and t3 can be recorded and stored in redo log 2; and so on. In this way, the metadata mirrors respectively corresponding to the times t1, t2, and t3, and the redo logs respectively corresponding to those times, can be obtained.
  • during fault recovery, the master node can first call the latest metadata mirror (i.e., the metadata mirror at time t3) and its corresponding redo log (the redo log for the t3-t4 segment) to perform fault recovery. If the latest metadata mirror and redo log are unavailable, it can fall back to the next most recent metadata mirror (i.e., the metadata mirror at time t2) and its redo log (i.e., the redo log for the t2-t3 segment), and so on, moving back until available data files are found.
  • the fault tolerance rate at the time of failure recovery can be improved.
  • the solution of the present application can trigger the acquisition and storage of a metadata mirror (for example, saving the state at time t3) under certain conditions or commands, and then start continuously recording the redo log (i.e., recording all operations after t3). After a failure occurs at time t4, the state at time t3 can be restored and all operations after t3 replayed, so that the master node quickly returns to its state at time t4.
  • the metadata mirror 1 acquired at time t1 may contain some of the operations recorded in redo log 1 after time t1. Therefore, when the master node fails at time t2 and is restored using the metadata mirror 1 at time t1 and the corresponding redo log 1, the state of the restored master node is likely to be inconsistent with its state before the failure.
  • the time of each operation recorded in the redo log can therefore be recorded in real time, and when the metadata mirror at a certain moment is acquired, the corresponding operations can be removed from the redo log. This avoids the situation in which the acquired metadata mirror includes operations that are also recorded in the redo log, so that the metadata mirror and the redo log correspond strictly in time.
  • FIG. 4 is a block diagram showing the structure of a fault recovery apparatus according to an embodiment of the present invention.
  • the functional modules of the fault recovery device 400 may be implemented by hardware, software, or a combination of hardware and software that implements the principles of the present invention.
  • the functional blocks depicted in FIG. 4 can be combined or divided into sub-modules to implement the principles of the above described invention. Accordingly, the description herein may support any possible combination, or division, or further limitation of the functional modules described herein.
  • the fault recovery apparatus 400 shown in FIG. 4 can be used to implement the fault recovery method shown in FIG. 2, and only the functional modules that the fault recovery apparatus 400 can have and the operations that can be performed by the functional modules are briefly described. For details, please refer to the description above in conjunction with FIG. 2, and details are not described herein again. It should be noted that the fault recovery apparatus 400 may be the primary node itself or a backup primary node.
  • the fault recovery apparatus of the present invention may include a mirror acquisition unit 410, a redo log acquisition unit 420, and a failure recovery unit 430.
  • the mirror acquisition unit 410 can acquire and save a metadata mirror recording the scheduling information and system status of the master node at a certain moment, and the redo log acquisition unit 420 can acquire and save a redo log recording all operations of the master node after that moment.
  • the fault recovery unit 430 can call the metadata mirror and its corresponding redo log to perform fault recovery.
  • the mirror acquisition unit 410 can perform the acquisition and saving of the metadata mirror when triggered by the master node, the apparatus, and/or an external command.
  • the mirror acquisition unit 410 can directly acquire and save the memory state of the master node at a certain moment as a metadata mirror. Further, the mirror acquisition unit 410 may store the metadata mirror by task group.
  • the master node responds to a new request of a slave node only after each of its operations has been recorded in the redo log and stored.
  • the mirror acquisition unit 410 continuously acquires and saves metadata mirrors of the master node at a plurality of different moments, and the redo log acquisition unit 420 continuously acquires and saves the redo logs respectively corresponding to those moments.
  • during fault recovery, the fault recovery unit 430 calls the latest metadata mirror and its corresponding redo log; when the latest metadata mirror and/or its corresponding redo log is unavailable, the fault recovery unit 430 can call the data of the most recent moment for which both the metadata mirror and its corresponding redo log are available.
  • the method according to the invention may also be embodied as a computer program or computer program product comprising computer program code instructions for performing the various steps defined above in the above method of the invention.
  • the invention may be embodied as a computer program product comprising: a memory; a processor; and a computer program; wherein the computer program is stored in the memory and configured to be executed by the processor to perform the above method of the invention.
  • FIG. 5 is a structural diagram of a computer program product according to an exemplary embodiment of the present invention.
  • the embodiment provides a computer program product, including: at least one processor 51 and a memory 52.
  • one processor 51 is taken as an example.
  • the processor 51 and the memory 52 are connected by a bus 50.
  • the memory 52 stores instructions executable by the at least one processor 51, the instructions being executed by the at least one processor 51 to cause the at least one processor 51 to perform the above described method of the present invention.
  • the present invention may be embodied as a non-transitory machine readable storage medium (or computer readable storage medium, or machine readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the above method of the invention.
  • each block of the flowchart or block diagram can represent a module, a program segment, or a portion of code that includes one or more executable instructions.
  • the functions noted in the blocks may also occur in a different order than that illustrated in the drawings. For example, two consecutive blocks may be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented in a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
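
The consistency measure described earlier, in which the time of each redo log operation is recorded and operations already reflected in a metadata mirror are removed from the redo log, can be sketched as follows. This is only one possible reading of that passage: the entry layout (a `time` field per entry), the function name `prune_redo_log`, and the choice to treat an entry stamped exactly at the mirror time as already included are all assumptions.

```python
def prune_redo_log(entries, mirror_time):
    """Drop redo-log entries that the mirror taken at mirror_time already reflects.

    Assumption: an entry whose timestamp is less than or equal to mirror_time
    is contained in the mirror, so replaying it would apply it twice.
    """
    return [entry for entry in entries if entry["time"] > mirror_time]


# Hypothetical redo log in which two entries predate or coincide with the mirror.
redo_entries = [
    {"time": 9, "op": "assign task-1"},
    {"time": 10, "op": "assign task-2"},
    {"time": 11, "op": "finish task-1"},
]
to_replay = prune_redo_log(redo_entries, mirror_time=10)
```

Only the entries that remain after pruning would then be replayed on top of the mirror.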

Landscapes

  • Engineering & Computer Science
  • Theoretical Computer Science
  • Quality & Reliability
  • Physics & Mathematics
  • General Engineering & Computer Science
  • General Physics & Mathematics
  • Information Retrieval, Db Structures And Fs Structures Therefor
  • Retry When Errors Occur

Abstract

The present invention discloses a distributed system and a fault recovery method and an apparatus thereof, a product, and a storage medium. The method comprises: a slave node and/or a master node obtains and stores a metadata mirror having a record of scheduling information and a system status of the master node at a certain time; the master node obtains and stores a redo log having a record of all operations of the master node after the certain time; and when a fault occurs in the master node, the master node invokes the metadata mirror and its corresponding redo log to execute fault recovery. When a fault occurs, the method enables a master node to quickly return to a pre-failure state by means of a previously recorded metadata mirror and redo log.

Description

Distributed system and fault recovery method, apparatus, product, and storage medium thereof
Technical Field
The present invention relates to the field of distributed technologies, and in particular to a distributed system and a fault recovery method, apparatus, product, and storage medium thereof.
Background
A distributed system organically combines and connects multiple machines so that they cooperate to complete a task, such as a computing task or a storage task. It is a software system built on top of a network. Most existing distributed systems use a master-slave architecture. FIG. 1 is a schematic diagram showing the structure of a distributed system employing a master-slave architecture. As shown in FIG. 1, a distributed system with a master-slave architecture mostly consists of a master node (master) and a plurality of slave nodes (slave). As the central scheduling node of the distributed system, the master node usually combines functions such as metadata storage and query, cluster node state management, decision making, and task delivery. Because the metadata managed by the master node is relatively important data in the system, the loss of data on the master node has a considerable impact on the system.
Therefore, a failover mechanism is needed so that when the master node encounters an unknown error and crashes, it can be restored to its state before the error occurred, avoiding the loss of master node data.
Summary of the Invention
The present invention provides a distributed system and a fault recovery method, apparatus, product, and storage medium thereof. By acquiring a metadata mirror of the master node at one or more moments and recording the operations of the master node in a redo log, the master node can, when it fails, be quickly restored to its pre-failure state based on the previously recorded metadata mirror and redo log.
According to a first aspect of the present invention, there is provided a distributed system comprising a master node for scheduling tasks and managing system states and a plurality of slave nodes for running scheduled tasks, wherein: one or more slave nodes and/or the master node acquire and save a metadata mirror recording the scheduling information and system status of the master node at a certain moment; the master node acquires and saves a redo log recording all operations of the master node after that moment; and, during fault recovery, the master node calls the metadata mirror and its corresponding redo log to perform fault recovery.
Thus, the master node can be quickly restored to its pre-failure state based on the previously recorded metadata mirror and redo log, and recovery is more efficient than with an approach that records only log files.
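As a rough illustration of why a mirror plus a redo log shortens recovery compared with a log alone, the following minimal sketch restores state by loading a mirror and replaying only the operations recorded after it. The operation format `(kind, key, value)` and the names `apply_operation` and `recover` are hypothetical and are not taken from the specification.

```python
import copy


def apply_operation(state, op):
    """Apply one redo-log entry to the in-memory state (hypothetical op format)."""
    kind, key, value = op
    if kind == "set":
        state[key] = value
    elif kind == "delete":
        state.pop(key, None)
    else:
        raise ValueError(f"unknown operation kind: {kind}")


def recover(mirror, redo_log):
    """Rebuild the master node's state from a metadata mirror and its redo log."""
    state = copy.deepcopy(mirror)  # start from the state captured at the mirror's moment
    for op in redo_log:            # replay only the operations recorded after that moment
        apply_operation(state, op)
    return state


# Hypothetical mirror taken at t1 and the operations logged after t1.
mirror_t1 = {"task-1": "running", "task-2": "queued"}
redo_after_t1 = [
    ("set", "task-2", "running"),
    ("delete", "task-1", None),
    ("set", "task-3", "queued"),
]
recovered = recover(mirror_t1, redo_after_t1)
```

With a log-only scheme the replay loop would have to start from an empty state and process every operation ever recorded; here it processes only the entries written since the mirror.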
Preferably, one or more slave nodes and/or the master node perform the acquisition and saving of the metadata mirror when triggered by the master node and/or an external command. Different trigger modes can therefore be set according to the characteristics of the distributed system to trigger the acquisition and saving of the metadata mirror.
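A trigger of this kind might be sketched as below. The specification elsewhere gives a number of recorded operations, an elapsed interval, and an external command as example conditions; the class name `MirrorTrigger`, its method names, and the default thresholds are assumptions made for illustration only.

```python
class MirrorTrigger:
    """Decides when a new metadata mirror should be taken (illustrative only)."""

    def __init__(self, max_ops=1000, max_seconds=60.0):
        self.max_ops = max_ops          # assumed threshold: operations since last mirror
        self.max_seconds = max_seconds  # assumed threshold: seconds since last mirror
        self.ops_since_mirror = 0
        self.last_mirror_time = 0.0

    def record_operation(self):
        """Call once for every operation written to the redo log."""
        self.ops_since_mirror += 1

    def should_take_mirror(self, now, external_command=False):
        """Return True if any of the three example conditions is met."""
        return (
            external_command
            or self.ops_since_mirror >= self.max_ops
            or now - self.last_mirror_time >= self.max_seconds
        )

    def mirror_taken(self, now):
        """Reset the counters after a mirror has been acquired and saved."""
        self.ops_since_mirror = 0
        self.last_mirror_time = now
```

Passing `now` explicitly keeps the sketch deterministic; a real deployment would more likely read a monotonic clock.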
优选地,主节点在其每一次操作被记录在重做日志内并被存储之后才响应从属节点的请求。由此确保重做日志能够完整记录主节点的每一次操作。Preferably, the master node responds to the slave node's request after each operation is recorded in the redo log and stored. This ensures that the redo log can fully record every operation of the primary node.
优选地,一个或多个从属节点和/或主节点持续获取并保存主节点在多个不同时刻的元数据镜像,并且主节点持续获取并保存分别对应于多个不同时刻的重做日志。主节点可以在故障恢复时调用最新的元数据镜像及其对应的重做日志进行故障恢复,而当最新的元数据镜像和/或其对应的重做日志不可用时,调用元数据镜像及其对应的重做日志都可用的最近时刻的数据进行故障恢复。由此,通过保存多份不同时刻的内存镜像和对应的重做日志,可以提高故障恢复时的容错率。Preferably, the one or more slave nodes and/or the master node continuously acquire and save the metadata mirror of the master node at a plurality of different moments, and the master node continuously acquires and saves the redo logs respectively corresponding to the plurality of different moments. The master node can call the latest metadata mirror and its corresponding redo log for fault recovery when the fault is recovered, and call the metadata mirror and its corresponding when the latest metadata mirror and/or its corresponding redo log are unavailable. The redo logs are available for recovery at the most recent time. Thus, by storing a plurality of memory images at different times and corresponding redo logs, the fault tolerance rate at the time of failure recovery can be improved.
优选地,一个或多个从属节点和/或主节点直接获取并保存主节点在某一时刻的内存状态作为元数据镜像。元数据镜像可以是按照任务分组进行存储的。由此,在后续恢复时可以根据分组高效地组织对应的元数据镜像。Preferably, one or more slave nodes and/or master nodes directly acquire and save the memory state of the master node at a certain moment as a metadata mirror. Metadata mirroring can be stored in groups of tasks. Thereby, the corresponding metadata mirror can be efficiently organized according to the grouping at the subsequent recovery.
According to a second aspect of the present invention, a fault recovery apparatus for a distributed system is also provided. The distributed system includes a master node for scheduling tasks and managing system state, and a plurality of slave nodes for running tasks. The apparatus performs fault recovery when the master node fails, and comprises: an image acquisition unit, configured to acquire and save a metadata image recording the scheduling information and system state on the master node at a certain moment; a redo log acquisition unit, configured to acquire and save a redo log recording all operations of the master node after that moment; and a fault recovery unit, configured to invoke the metadata image and its corresponding redo log during fault recovery.
Preferably, the image acquisition unit acquires and saves the metadata image when triggered by the master node, the apparatus, and/or an external command.
Preferably, the master node responds to a slave node's request only after each of its operations has been recorded in the redo log by the redo log acquisition unit and stored.
Preferably, the image acquisition unit continuously acquires and saves metadata images of the master node at a plurality of different moments, and the redo log acquisition unit continuously acquires and saves redo logs respectively corresponding to those moments.
Preferably, during fault recovery, the fault recovery unit invokes the latest metadata image and its corresponding redo log.
Preferably, when the latest metadata image and/or its corresponding redo log is unavailable, the fault recovery unit invokes the data of the most recent moment for which both the metadata image and its corresponding redo log are available.
Preferably, the image acquisition unit directly acquires and saves the memory state of the master node at a certain moment as the metadata image.
Preferably, the image acquisition unit stores the metadata image by task group.
According to a third aspect of the present invention, a fault recovery method for a distributed system is also provided. The distributed system includes a master node for scheduling tasks and managing system state, and a plurality of slave nodes for running tasks. The method performs fault recovery when the master node fails, and comprises: acquiring and saving a metadata image recording the scheduling information and system state at a certain moment; acquiring and saving a redo log recording all scheduling operations after that moment; and, during fault recovery, invoking the metadata image and its corresponding redo log to perform the recovery.
Preferably, metadata images of the master node at a plurality of different moments are continuously acquired and saved, and redo logs respectively corresponding to those moments are continuously acquired and saved.
Preferably, invoking the metadata image and its corresponding redo log during fault recovery may include: invoking the latest metadata image and its corresponding redo log; and, when the latest metadata image and/or its corresponding redo log is unavailable, invoking the data of the most recent moment for which both the metadata image and its corresponding redo log are available.
Preferably, the memory state of the master node at a certain moment may be directly acquired and saved as the metadata image.
Preferably, the metadata image is acquired and saved when triggered by the master node and/or an external command.
Preferably, the master node responds to a slave node's request only after each of its operations has been recorded in the redo log and stored.
Preferably, the metadata image is stored by task group.
According to a fourth aspect of the present invention, a computer program product is also provided, comprising: a memory; a processor; and a computer program, wherein the computer program is stored in the memory and configured to be executed by the processor to perform the method of the third aspect of the present invention or any of its preferred embodiments.
A fifth aspect of the present invention provides a computer-readable storage medium comprising a program which, when run on a computer, causes the computer to perform the method of the third aspect of the present invention or any of its preferred embodiments.
With the distributed system and the fault recovery method, apparatus, product, and storage medium of the present invention, metadata images of the master node are acquired at one or more moments and the master node's subsequent operations are recorded in a redo log, so that when the master node fails it can be quickly restored to its pre-failure state from the previously recorded metadata images and redo log.
Brief Description of the Drawings
The above and other objects, features, and advantages of the present disclosure will become more apparent from the following more detailed description of exemplary embodiments of the disclosure, taken in conjunction with the accompanying drawings, in which the same reference numerals generally denote the same components.
FIG. 1 is a schematic diagram showing the architecture of a distributed system with a master-slave architecture.
FIG. 2 is a schematic flowchart showing a fault recovery method according to an embodiment of the present invention.
FIG. 3 is a schematic diagram showing the continuous saving of a plurality of metadata images and redo logs.
FIG. 4 is a schematic block diagram showing the structure of a fault recovery apparatus according to an embodiment of the present invention.
FIG. 5 is a structural diagram of a computer program product according to an exemplary embodiment of the present invention.
Detailed Description
Preferred embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be thorough and complete, and will fully convey its scope to those skilled in the art.
In the master-slave distributed system shown in FIG. 1, the master node stores the data required for the normal operation and scheduling of the system, such as system state data and current scheduling data, so losing this data has a severe impact on the system. A fault recovery mechanism is therefore needed so that, when the master node encounters an unknown error, it can be restored to a stable and reliable state. To this end, a log file of all operations of the master node can be recorded and stored persistently on disk. Then, if the master node fails, even if all data in its memory is lost, replaying the recorded log file at the next startup still restores the master node to its state before the failure.
Under this scheme, the master node operates as follows: before executing each operation, it records the operation in the log file, and it executes the operation (that is, updates the data in memory based on the operation) only after the record has been written successfully. Recovery after a failure proceeds as follows: the log file is read, and the data in memory is modified in order according to the master node operations recorded in it. Recovering solely from a log file of write operations is simple to implement, but the recovery process takes a very long time.
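The log-only scheme described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the log is modeled as an in-memory list (durable storage is omitted), and the operation types `assign` and `finish` and all function names are hypothetical.

```python
def apply_operation(state, operation):
    """Apply one logged operation to the in-memory state (a plain dict)."""
    kind = operation["type"]
    if kind == "assign":          # hypothetical operation: give a task to a slave
        state[operation["task"]] = operation["slave"]
    elif kind == "finish":        # hypothetical operation: task completed
        state.pop(operation["task"], None)
    else:
        raise ValueError("unknown operation type: " + kind)


def execute(log, state, operation):
    """Record the operation first, then update the in-memory data."""
    log.append(dict(operation))   # a real system would write this to disk
    apply_operation(state, operation)


def recover(log):
    """Rebuild the state from scratch by replaying every logged operation."""
    state = {}
    for operation in log:
        apply_operation(state, operation)
    return state, len(log)        # the replay cost grows with the log length


if __name__ == "__main__":
    log, live_state = [], {}
    execute(log, live_state, {"type": "assign", "task": "t1", "slave": "s1"})
    execute(log, live_state, {"type": "assign", "task": "t2", "slave": "s2"})
    execute(log, live_state, {"type": "finish", "task": "t1"})
    recovered, replayed = recover(log)
    print(recovered == live_state, replayed)
```

Because `recover` always starts from an empty state, the number of replayed operations equals the full length of the log, which is the recovery-time drawback the text points out.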
After in-depth study, the inventors found that, while the log file of the master node's operations is being recorded, image files of the master node's memory data at certain moments can be taken at intervals. An image file represents the current state data of the master node at the corresponding moment. When the master node fails, the most recent image file can be invoked together with the operations recorded in the log file after the moment corresponding to that image file, and the master node can be restored from this data. Compared with recording log files alone, this greatly shortens the time required for recovery.
Based on this idea, the present invention proposes a fault recovery scheme for the master node in a distributed system, which can be implemented by the distributed system shown in FIG. 1. As shown in FIG. 1, the distributed system of the present invention may include a master node for scheduling tasks and managing system state, and a plurality of slave nodes for running the scheduled tasks. Both the master node and the slave nodes may be deployed on servers; the master node may be deployed on a separate server from the slave nodes, or on the same server as one of the slave nodes. In a preferred embodiment, different nodes are deployed on different servers. The distributed system shown in FIG. 1 consists of one master node and a plurality of slave nodes. It should be understood that the distributed system of the present invention may also include a plurality of master nodes, as well as devices other than master and slave nodes, such as a backup master node or a fault recovery database.
The specific flow by which the distributed system of the present invention implements the fault recovery scheme is described in detail below. FIG. 2 is a schematic flowchart showing a fault recovery method according to an embodiment of the present invention. The method shown in FIG. 2 can be implemented by the distributed system shown in FIG. 1, and specifically by the master node in the distributed system.
Referring to FIG. 2, in step S210, a metadata image recording the scheduling information and system state on the master node at a certain moment is acquired and saved.
In a master-slave distributed system, a crash of the master node makes the entire distributed system unavailable. Given the importance of the master node, it usually does not run specific tasks directly; it is only responsible for keeping the distributed system running and for scheduling and assigning tasks, while the specific tasks are executed by slave nodes. In other words, the master node is mainly responsible for parsing task requests, allocating resources, and locating target data or nodes according to metadata, and the specific tasks are executed by the slave nodes designated by the master node. Metadata is data that describes data; in the present invention, metadata refers specifically to the data that the master node is responsible for saving and managing. Since the master node schedules tasks and manages system state, metadata may be data recording the scheduling information and system state on the master node at a certain moment. For example, in a Hadoop distributed system, the metadata may be system description data, system state data, current task scheduling and status data, and so on; in a distributed storage system, the metadata may be data describing the state information (such as storage location) of user data.
The acquired metadata image of the master node at a certain moment may be a mapping of the master node's memory state at that moment, so the memory state of the master node at that moment can be directly acquired and saved as the metadata image. In a specific implementation, the metadata image of the master node at a certain moment can be acquired by means such as a snapshot (disk snapshot) or a dump (file system backup).
The acquisition of the metadata image may be performed by the master node, by one or more slave nodes, or by a backup master node in the distributed system. The acquired metadata image may be stored persistently on a local disk or in a distributed file system; for example, it may be stored persistently in a fault recovery database.
As an optional embodiment of the present invention, the master node may schedule tasks concurrently by group. In this case the acquired metadata image may consist of metadata images for a plurality of groups, so the acquired metadata images can be stored by task group, with the metadata images belonging to the same task group stored in the same directory. During subsequent recovery, the corresponding metadata images can then be organized efficiently by group.
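Storing images by task group, with one directory per group, might look like the sketch below. The JSON encoding, the `image-<number>.json` file naming, and the write-to-temporary-file-then-rename step are assumptions made for illustration; the text only requires that images of the same task group share a directory.

```python
import json
import os
import tempfile


def save_image_by_group(root, metadata_by_group, taken_at):
    """Write one image file per task group under <root>/<group>/.

    `taken_at` is an integer label for the capture moment (hypothetical).
    """
    paths = {}
    for group, metadata in metadata_by_group.items():
        group_dir = os.path.join(root, group)
        os.makedirs(group_dir, exist_ok=True)
        final_path = os.path.join(group_dir, f"image-{taken_at:010d}.json")
        tmp_path = final_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(metadata, f)
            f.flush()
            os.fsync(f.fileno())          # push the bytes to stable storage
        os.replace(tmp_path, final_path)  # publish the finished file atomically
        paths[group] = final_path
    return paths


def list_group_images(root, group):
    """Return the image file names of one task group, oldest first."""
    group_dir = os.path.join(root, group)
    return sorted(
        name for name in os.listdir(group_dir)
        if name.startswith("image-") and name.endswith(".json")
    )


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    save_image_by_group(root, {"group-a": {"t1": "running"},
                               "group-b": {"t9": "queued"}}, taken_at=1)
    save_image_by_group(root, {"group-a": {"t1": "done"}}, taken_at=2)
    print(list_group_images(root, "group-a"))
```

Zero-padding the capture label keeps lexical order equal to chronological order, so the last name returned by `list_group_images` is the newest image of that group.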
In step S220, the master node may acquire and save a redo log recording all operations of the master node after that moment. The operations referred to here may be operations the master node performs on metadata, or operations the master node performs on its in-memory data.
Each operation performed by the master node can be recorded in the redo log, which may record the master node's operation information sequentially. For each operation the master node is about to perform, the master node executes the operation only after it has been recorded in the redo log and saved persistently. In this way, if the master node fails while the operation is being executed, the operation can be recovered from the data recorded in the redo log. Otherwise, if an operation were executed first and recorded afterwards, and an error occurred during its execution or before it was recorded and saved, the operation could not be recovered and would have to be started over.
For example, when a slave node requests a task (such as a computing task or a storage task) from the master node, the master node may first record the operation of delivering the target data to the slave node in the redo log. Only after the record has been written and saved persistently does the master node respond to the slave node's request and send it the target data. In other words, a slave node's request is answered only after the master node's operation for that request has been recorded in the redo log and stored persistently.
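The rule of answering a slave node only after the operation is durably logged can be sketched as follows. The `RedoLog` class, the JSON-lines record layout, the sequence numbers, and `handle_task_request` are all hypothetical names chosen for this illustration, and the sketch assumes the log file is new when it is opened.

```python
import json
import os
import tempfile


class RedoLog:
    """Append-only redo log; one JSON record per line (assumed format)."""

    def __init__(self, path):
        self._file = open(path, "a", encoding="utf-8")
        self.next_seq = 0  # assumes a fresh log file

    def append(self, operation):
        """Write the operation and return only after it is flushed to disk."""
        record = {"seq": self.next_seq, "op": operation}
        self._file.write(json.dumps(record) + "\n")
        self._file.flush()
        os.fsync(self._file.fileno())
        self.next_seq += 1
        return record["seq"]

    def close(self):
        self._file.close()


def handle_task_request(redo_log, assignments, slave_id, task_id):
    """Log the assignment, apply it in memory, then build the response."""
    operation = {"type": "assign", "task": task_id, "slave": slave_id}
    seq = redo_log.append(operation)  # an I/O error here means no response
    assignments[task_id] = slave_id
    return {"task": task_id, "seq": seq}


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "redo.log")
    log, assignments = RedoLog(path), {}
    print(handle_task_request(log, assignments, "slave-1", "task-7"))
    log.close()
```

If `append` raises, the function never reaches the return statement, so the slave node cannot observe an assignment that the redo log does not contain.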
In step S230, during fault recovery, the metadata image and its corresponding redo log are invoked to perform the recovery.
As described above, the metadata image can be regarded as a mapping of the master node's memory state at a certain moment, while the redo log records all operations of the master node. Therefore, when the master node fails, recovery can be performed from the metadata image acquired before the failure and from the operations recorded in the redo log for the period between the moment corresponding to that metadata image and the failure, restoring the master node to its state before the failure. Taking a redo log recorded in a file system as an example, recovery may proceed as follows: after the master node restarts, it first traverses the metadata image directory in the file system, finds the most recent metadata image, and loads it into memory; it then loads the redo log that follows that image and replays it. Once loading is finished, the recovery process is complete.
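The restart procedure just described (scan the image directory, load the newest image, replay the redo log that follows it) can be sketched as below. The file layout is an assumption: images are named `image-<n>.json`, the redo log that follows image `n` is named `redo-<n>.log`, and each log record is a JSON object with `key` and `value` fields.

```python
import json
import os
import re
import tempfile

IMAGE_NAME = re.compile(r"^image-(\d+)\.json$")


def find_latest_image(directory):
    """Return (index, path) of the image with the highest index, or None."""
    found = []
    for name in os.listdir(directory):
        match = IMAGE_NAME.match(name)
        if match:
            found.append((int(match.group(1)), os.path.join(directory, name)))
    return max(found) if found else None


def recover(directory):
    """Load the newest image, then replay the redo log that follows it."""
    latest = find_latest_image(directory)
    if latest is None:
        raise FileNotFoundError("no metadata image found in " + directory)
    index, image_path = latest
    with open(image_path, encoding="utf-8") as f:
        state = json.load(f)
    replayed = 0
    log_path = os.path.join(directory, f"redo-{index}.log")
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                record = json.loads(line)
                state[record["key"]] = record["value"]
                replayed += 1
    return state, replayed


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    with open(os.path.join(d, "image-1.json"), "w", encoding="utf-8") as f:
        json.dump({"t1": "s1"}, f)
    with open(os.path.join(d, "redo-1.log"), "w", encoding="utf-8") as f:
        f.write(json.dumps({"key": "t2", "value": "s2"}) + "\n")
    print(recover(d))
```

Only the operations recorded after the chosen image are replayed, so the replay count depends on how recently an image was taken rather than on the total history of the master node.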
As an optional embodiment of the present invention, a plurality of metadata images corresponding to different moments may be saved when saving the master node's metadata image. While the redo log is being recorded, a metadata image may be acquired periodically or whenever a predetermined trigger condition is met. The trigger condition may be, for example, that a parameter reaches a predetermined value, that a predetermined interval has elapsed, or that an external trigger command has been received. For example, a metadata image may be acquired each time a predetermined number of operations has been recorded in the redo log, or once every predetermined period of time.
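The trigger conditions listed above (an operation count, an elapsed interval, or an external command) could be combined in a small policy object such as the following sketch. The class name, its parameters, and the choice to pass the current time in explicitly are assumptions for illustration.

```python
class ImageTrigger:
    """Decides when the next metadata image should be taken."""

    def __init__(self, max_operations, max_interval_seconds, start_time=0.0):
        self.max_operations = max_operations
        self.max_interval_seconds = max_interval_seconds
        self.operations_since_image = 0
        self.last_image_time = start_time
        self.external_request = False

    def record_operation(self):
        """Call once for every operation appended to the redo log."""
        self.operations_since_image += 1

    def request_image(self):
        """Call when an external trigger command arrives."""
        self.external_request = True

    def should_take_image(self, now):
        """True if any of the three trigger conditions is met."""
        return (
            self.external_request
            or self.operations_since_image >= self.max_operations
            or now - self.last_image_time >= self.max_interval_seconds
        )

    def image_taken(self, now):
        """Reset all counters after an image has been saved."""
        self.operations_since_image = 0
        self.last_image_time = now
        self.external_request = False


if __name__ == "__main__":
    trigger = ImageTrigger(max_operations=3, max_interval_seconds=60.0)
    for _ in range(3):
        trigger.record_operation()
    print(trigger.should_take_image(now=2.0))
```

Passing `now` as an argument instead of reading a clock inside the class keeps the policy deterministic and easy to test; a caller would typically supply a monotonic clock reading.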
Further, while the master node's operations are being recorded in the redo log, redo logs respectively corresponding to a plurality of different moments (that is, to a plurality of metadata images) can be acquired continuously. FIG. 3 is a schematic diagram showing the principle of continuously saving a plurality of metadata image files and their corresponding redo logs.
Referring to FIG. 3, metadata image 1 of the master node is first acquired at time t1, and the master node's operations between t1 and t2 are recorded and saved in redo log 1. At time t2, metadata image 2 of the master node is acquired, and the master node's operations between t2 and t3 are recorded and saved in redo log 2, and so on. This yields metadata images corresponding to times t1, t2, and t3, together with the redo logs corresponding to the metadata image of each moment.
Thus, suppose the master node crashes at time t4. During fault recovery, the master node may first invoke the latest metadata image (the image taken at t3) and its corresponding redo log (the redo log covering t3 to t4). If the latest metadata image and redo log are unavailable, it may instead invoke the second-latest metadata image (the image taken at t2) and its redo log (the redo log covering t2 to t3), and so on, stepping back until usable data files are found. Saving multiple memory images taken at different moments, together with their corresponding redo logs, thus improves fault tolerance during recovery.
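The step-back rule can be sketched as a search over saved generations, newest first. How "unavailable" is detected is not specified in the text; the sketch assumes, purely for illustration, that each file is stored with a SHA-256 checksum and counts as usable only if it is present and its checksum matches. The `Generation` type and all function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Generation:
    """One metadata image plus the redo log that follows it."""
    label: str
    image: Optional[bytes]
    image_sha256: str
    redo_log: Optional[bytes]
    redo_log_sha256: str


def make_generation(label, image, redo_log):
    """Build a generation and record the checksums of both files."""
    return Generation(
        label,
        image, hashlib.sha256(image).hexdigest(),
        redo_log, hashlib.sha256(redo_log).hexdigest(),
    )


def is_usable(payload, expected_sha256):
    """A file is usable if it exists and matches its recorded checksum."""
    return payload is not None and hashlib.sha256(payload).hexdigest() == expected_sha256


def choose_generation(generations: List[Generation]) -> Optional[Generation]:
    """Return the newest generation whose image and redo log are both usable.

    `generations` must be ordered from oldest to newest.
    """
    for generation in reversed(generations):
        if (is_usable(generation.image, generation.image_sha256)
                and is_usable(generation.redo_log, generation.redo_log_sha256)):
            return generation
    return None


if __name__ == "__main__":
    saved = [make_generation(f"t{i}", b"image", b"log") for i in (1, 2, 3)]
    saved[2].redo_log = b"corrupted"      # simulate a damaged newest log
    chosen = choose_generation(saved)
    print(chosen.label if chosen else "nothing usable")
```

Recovering from an older generation restores the master node to the state at the end of that generation's redo log, which may be earlier than the moment of the crash.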
In other words, the scheme of the present application can trigger the acquisition and storage of a metadata image under certain conditions or by command (for example, saving the state at time t3), and then immediately start continuous recording of the redo log (that is, recording all operations after t3). After a failure occurs at time t4, the master node can quickly return to its state at t4 by restoring the state at t3 and then replaying all operations after t3.
When the metadata image of the master node at a certain moment is acquired, for example when metadata image 1 is acquired at time t1 as shown in FIG. 3, the master node's service is usually not stopped, and acquiring metadata image 1 takes some time. The metadata image 1 acquired at t1 is therefore likely to contain some of the operations recorded in redo log 1 after t1. If the master node then fails at time t2 and is recovered from metadata image 1 and its corresponding redo log 1, the state finally restored is likely to be inconsistent with the state before the failure.
Therefore, as an optional embodiment of the present invention, while the metadata image of a certain moment is being acquired, the times of the operations recorded in the redo log during that period can be noted in real time. After the metadata image has been acquired, the corresponding operations can be removed from the redo log. This avoids the situation in which the acquired metadata image already includes some of the operations recorded in the subsequent redo log, so that the metadata image and its corresponding redo log are strictly aligned in time.
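The alignment step can be illustrated with a small sketch. It rests on an assumption that goes beyond the text: the image is taken to reflect every operation up to a single cut-off timestamp, so the operations to remove are exactly the logged ones at or before that cut-off. The record fields `ts`, `key`, and `delta` and the counter-style operations are hypothetical, chosen because non-idempotent operations make double application visible.

```python
def apply_record(state, record):
    """Apply a non-idempotent operation: add `delta` to a counter."""
    state[record["key"]] = state.get(record["key"], 0) + record["delta"]


def trim_redo_log(records, image_cutoff):
    """Drop records whose effect the image already contains.

    Assumes the image reflects all operations with ts <= image_cutoff.
    """
    return [record for record in records if record["ts"] > image_cutoff]


def replay(image_state, records):
    """Start from a copy of the image and apply the records in order."""
    state = dict(image_state)
    for record in records:
        apply_record(state, record)
    return state


if __name__ == "__main__":
    # Image capture starts at t1 = 10.0 while the master node keeps serving.
    redo_log_1 = [
        {"ts": 10.1, "key": "running_tasks", "delta": 1},
        {"ts": 10.3, "key": "running_tasks", "delta": 1},
        {"ts": 11.0, "key": "running_tasks", "delta": 1},
    ]
    # The finished image already reflects the first two operations.
    image_1 = {"running_tasks": 2}
    print(replay(image_1, redo_log_1))                       # double counts
    print(replay(image_1, trim_redo_log(redo_log_1, 10.3)))  # aligned
```

Without trimming, the two operations captured by the image are applied a second time during replay; after trimming, only the operation that the image does not contain is replayed.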
The fault recovery method of the present invention has now been described in detail with reference to FIGS. 2 and 3. The fault recovery scheme of the present invention can also be implemented by a fault recovery apparatus. FIG. 4 is a block diagram showing the structure of a fault recovery apparatus according to an embodiment of the present invention. The functional modules of the fault recovery apparatus 400 may be implemented in hardware, software, or a combination of hardware and software that carries out the principles of the present invention. Those skilled in the art will understand that the functional modules described in FIG. 4 may be combined or divided into sub-modules to implement the principles of the invention described above. The description here therefore supports any possible combination, division, or further definition of the functional modules described.
The fault recovery apparatus 400 shown in FIG. 4 can be used to implement the fault recovery method shown in FIG. 2. Only the functional modules that the fault recovery apparatus 400 may have, and the operations each module may perform, are briefly described below; for the details involved, refer to the description above in conjunction with FIG. 2, which is not repeated here. Note that the fault recovery apparatus 400 may be the master node itself or a backup master node.
As shown in FIG. 4, the fault recovery apparatus of the present invention may include an image acquisition unit 410, a redo log acquisition unit 420, and a fault recovery unit 430. The image acquisition unit 410 can acquire and save a metadata image recording the scheduling information and system state on the master node at a certain moment; the redo log acquisition unit 420 can acquire and save a redo log recording all operations of the master node after that moment; and the fault recovery unit 430 can invoke the metadata image and its corresponding redo log during fault recovery.
Preferably, the image acquisition unit 410 may acquire and save the metadata image when triggered by the master node, the apparatus, and/or an external command. The image acquisition unit 410 may directly acquire and save the memory state of the master node at a certain moment as the metadata image. Further, the image acquisition unit 410 may store the metadata image by task group.
Preferably, the master node responds to a new request from a slave node only after each of its operations has been recorded in the redo log by the redo log acquisition unit 420 and stored.
Preferably, the image acquisition unit 410 continuously acquires and saves metadata images of the master node at a plurality of different moments, and the redo log acquisition unit 420 continuously acquires and saves redo logs respectively corresponding to those moments. In this case, the fault recovery unit 430 invokes the latest metadata image and its corresponding redo log during fault recovery; when the latest metadata image and/or its corresponding redo log is unavailable, the fault recovery unit 430 may invoke the data of the most recent moment for which both the metadata image and its corresponding redo log are available.
The distributed system and the fault recovery method, apparatus, product, and storage medium according to the present invention have been described in detail above with reference to the accompanying drawings.
In addition, the method according to the present invention may be implemented as a computer program or computer program product that includes computer program code instructions for performing the steps defined in the above method of the present invention.
Alternatively, the present invention may be implemented as a computer program product comprising: a memory; a processor; and a computer program, wherein the computer program is stored in the memory and configured to be executed by the processor to perform the above method of the present invention.
FIG. 5 is a structural diagram of a computer program product according to an exemplary embodiment of the present invention.
如图5所示,本实施例提供一种计算机程序产品,包括:至少一个处理器51和存储器52,图5中以一个处理器51为例,处理器51和存储器52通过总线50连接,存储器52存储有可被至少一个处理器51执行的指令,指令被至少一个处理器51执行,以使至少一个处理器51执行本发明的上述方法。As shown in FIG. 5, the embodiment provides a computer program product, including: at least one processor 51 and a memory 52. In FIG. 5, a processor 51 is taken as an example. The processor 51 and the memory 52 are connected by a bus 50. 52 stores instructions executable by at least one processor 51, the instructions being executed by at least one processor 51 to cause at least one processor 51 to perform the above described method of the present invention.
相关说明可以对应参见图2的步骤所对应的相关描述和效果进行理解,此处不做过多赘述。The related descriptions can be understood by referring to the related descriptions and effects corresponding to the steps in FIG. 2, and no further description is made here.
Alternatively, the present invention may also be implemented as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) on which executable code (or a computer program, or computer instruction code) is stored; when the executable code (or computer program, or computer instruction code) is executed by a processor of an electronic device (or computing device, server, etc.), the processor is caused to perform the steps of the above method according to the present invention.
Those skilled in the art will also appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The embodiments of the present invention have been described above. The foregoing description is illustrative, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (25)

  1. A distributed system, characterized by comprising a master node for scheduling tasks and managing system state, and a plurality of slave nodes for running the scheduled tasks, wherein
    one or more of the slave nodes and/or the master node acquire and save a metadata image recording the scheduling information and system state on the master node at a certain moment;
    the master node acquires and saves a redo log recording all operations of the master node after that moment; and
    upon fault recovery, the master node invokes the metadata image and its corresponding redo log to perform the fault recovery.
  2. The distributed system according to claim 1, characterized in that one or more of the slave nodes and/or the master node perform the acquisition and saving of the metadata image when triggered by the master node and/or an external command.
  3. The distributed system according to claim 1, characterized in that the master node responds to a request from a slave node only after each of its operations has been recorded in the redo log and stored.
  4. The distributed system according to claim 1, characterized in that one or more of the slave nodes and/or the master node continuously acquire and save metadata images of the master node at a plurality of different moments, and
    the master node continuously acquires and saves redo logs respectively corresponding to the plurality of different moments.
  5. The distributed system according to claim 4, characterized in that, upon fault recovery, the master node invokes the latest metadata image and its corresponding redo log to perform the fault recovery.
  6. The distributed system according to claim 4, characterized in that, when the latest metadata image and/or its corresponding redo log is unavailable, the master node invokes, for fault recovery, the data of the most recent moment for which both the metadata image and its corresponding redo log are available.
  7. The distributed system according to claim 1, characterized in that one or more of the slave nodes and/or the master node directly acquire and save the memory state of the master node at a certain moment as the metadata image.
  8. The distributed system according to claim 1, characterized in that the metadata image is stored according to task grouping.
  9. A fault recovery apparatus for a distributed system, characterized in that the distributed system comprises a master node for scheduling tasks and managing system state and a plurality of slave nodes for running tasks, the apparatus being used to perform fault recovery when the master node fails and comprising:
    an image acquisition unit, configured to acquire and save a metadata image recording the scheduling information and system state on the master node at a certain moment;
    a redo log acquisition unit, configured to acquire and save a redo log recording all operations of the master node after that moment; and
    a fault recovery unit, configured to invoke the metadata image and its corresponding redo log to perform fault recovery upon fault recovery.
  10. The apparatus according to claim 9, characterized in that the image acquisition unit performs the acquisition and saving of the metadata image when triggered by the master node, the apparatus, and/or an external command.
  11. The apparatus according to claim 9, characterized in that the master node responds to a request from a slave node only after each of its operations has been recorded in the redo log by the redo log acquisition unit and stored.
  12. The apparatus according to claim 9, characterized in that the image acquisition unit continuously acquires and saves metadata images of the master node at a plurality of different moments, and
    the redo log acquisition unit continuously acquires and saves redo logs respectively corresponding to the plurality of different moments.
  13. The apparatus according to claim 12, characterized in that, upon fault recovery, the fault recovery unit invokes the latest metadata image and its corresponding redo log to perform the fault recovery.
  14. The apparatus according to claim 12, characterized in that, when the latest metadata image and/or its corresponding redo log is unavailable, the fault recovery unit invokes, for fault recovery, the data of the most recent moment for which both the metadata image and its corresponding redo log are available.
  15. The apparatus according to claim 9, characterized in that the image acquisition unit directly acquires and saves the memory state of the master node at a certain moment as the metadata image.
  16. The apparatus according to claim 9, characterized in that the image acquisition unit stores the metadata image according to task grouping.
  17. A fault recovery method for a distributed system, characterized in that the distributed system comprises a master node for scheduling tasks and managing system state and a plurality of slave nodes for running tasks, the method being used to perform fault recovery when the master node fails and comprising:
    acquiring and saving a metadata image recording the scheduling information and system state at a certain moment;
    acquiring and saving a redo log recording all scheduling operations after that moment; and
    upon fault recovery, invoking the metadata image and its corresponding redo log to perform the fault recovery.
  18. The method according to claim 17, characterized by:
    continuously acquiring and saving metadata images of the master node at a plurality of different moments, and
    continuously acquiring and saving redo logs respectively corresponding to the plurality of different moments.
  19. The method according to claim 18, characterized in that invoking the metadata image and its corresponding redo log to perform fault recovery upon fault recovery comprises:
    upon fault recovery, invoking the latest metadata image and its corresponding redo log to perform the fault recovery; and
    when the latest metadata image and/or its corresponding redo log is unavailable, invoking, for fault recovery, the data of the most recent moment for which both the metadata image and its corresponding redo log are available.
  20. The method according to claim 17, characterized in that the memory state of the master node at a certain moment is directly acquired and saved as the metadata image.
  21. The method according to claim 17, characterized in that the acquisition and saving of the metadata image are performed when triggered by the master node and/or an external command.
  22. The method according to claim 17, characterized in that the master node responds to a request from a slave node only after each of its operations has been recorded in the redo log and stored.
  23. The method according to claim 17, characterized in that the metadata image is stored according to task grouping.
  24. A computer program product, characterized by comprising:
    a memory; a processor; and a computer program;
    wherein the computer program is stored in the memory and is configured to be executed by the processor to perform the method according to any one of claims 17 to 23.
  25. A computer-readable storage medium, characterized by comprising a program which, when run on a computer, causes the computer to perform the method according to any one of claims 17 to 23.
PCT/CN2018/097262 2017-07-28 2018-07-26 Distributed system and fault recovery method and apparatus thereof, product, and storage medium WO2019020081A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710630823.6 2017-07-28
CN201710630823.6A CN107357688B (en) 2017-07-28 2017-07-28 Distributed system and fault recovery method and device thereof

Publications (1)

Publication Number Publication Date
WO2019020081A1 true WO2019020081A1 (en) 2019-01-31

Family

ID=60285161

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/097262 WO2019020081A1 (en) 2017-07-28 2018-07-26 Distributed system and fault recovery method and apparatus thereof, product, and storage medium

Country Status (2)

Country Link
CN (1) CN107357688B (en)
WO (1) WO2019020081A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107357688B (en) * 2017-07-28 2020-06-12 广东神马搜索科技有限公司 Distributed system and fault recovery method and device thereof
CN108390771B (en) * 2018-01-25 2021-04-16 中国银联股份有限公司 Network topology reconstruction method and device
CN108427728A (en) * 2018-02-13 2018-08-21 百度在线网络技术(北京)有限公司 Management method, equipment and the computer-readable medium of metadata
CN109189480B (en) * 2018-07-02 2021-11-09 新华三技术有限公司成都分公司 File system starting method and device
CN109144792A (en) * 2018-10-08 2019-01-04 郑州云海信息技术有限公司 Data reconstruction method, device and system and computer readable storage medium
CN109656911B (en) * 2018-12-11 2023-08-01 江苏瑞中数据股份有限公司 Distributed parallel processing database system and data processing method thereof
CN111104226B (en) * 2019-12-25 2024-01-26 东北大学 Intelligent management system and method for multi-tenant service resources
CN112379977A (en) * 2020-07-10 2021-02-19 中国航空工业集团公司西安飞行自动控制研究所 Task-level fault processing method based on time triggering
CN111880969B (en) * 2020-07-30 2024-06-04 上海达梦数据库有限公司 Storage node recovery method, device, equipment and storage medium
CN115563028B (en) * 2022-12-06 2023-03-14 苏州浪潮智能科技有限公司 Data caching method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103294701A (en) * 2012-02-24 2013-09-11 联想(北京)有限公司 Distributed file system and data processing method
CN104216802A (en) * 2014-09-25 2014-12-17 北京金山安全软件有限公司 Memory database recovery method and device
US9053123B2 (en) * 2010-09-02 2015-06-09 Microsoft Technology Licensing, Llc Mirroring file data
CN107357688A (en) * 2017-07-28 2017-11-17 广东神马搜索科技有限公司 Distributed system and its fault recovery method and device

Also Published As

Publication number Publication date
CN107357688B (en) 2020-06-12
CN107357688A (en) 2017-11-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18837616

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18837616

Country of ref document: EP

Kind code of ref document: A1