CN107357688A - Distributed system and its fault recovery method and device - Google Patents


Info

Publication number
CN107357688A
Authority
CN
China
Prior art keywords
host node
mirror image
fault recovery
metadata mirror
log
Legal status
Granted
Application number
CN201710630823.6A
Other languages
Chinese (zh)
Other versions
CN107357688B (en)
Inventor
褚建辉
卢申朋
刘东辉
王新栋
Current Assignee
Alibaba China Co Ltd
Original Assignee
Guangdong Shenma Search Technology Co Ltd
Application filed by Guangdong Shenma Search Technology Co Ltd
Priority to CN201710630823.6A (patent CN107357688B)
Publication of CN107357688A
Priority to PCT/CN2018/097262 (WO2019020081A1)
Application granted
Publication of CN107357688B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments

Abstract

The invention discloses a distributed system and a fault recovery method and device for it. One or more slave nodes and/or the host node obtain and save a metadata image that records the scheduling information and system state on the host node at a given moment; the host node obtains and saves a redo log that records every operation it performs after that moment; and during fault recovery the host node loads the metadata image and its corresponding redo log to recover. Thus, when the host node fails, it can be quickly restored to its pre-failure state from the previously recorded metadata image and redo log.

Description

Distributed system and its fault recovery method and device
Technical field
The present invention relates to the field of distributed technology, and in particular to a distributed system and a fault recovery method and device for it.
Background
A distributed system organically combines and connects multiple machines so that they cooperate to accomplish a task, such as a computing task or a storage task; it is a software system built on top of a network. Most existing distributed systems use a client/server architecture. Fig. 1 is a schematic structural diagram of a distributed system using a client/server architecture. As shown in Fig. 1, such a system typically consists of a host node (master) and multiple slave nodes (slaves). As the central scheduling node of the distributed system, the host node usually handles metadata storage and queries, cluster node state management, decision-making, and task dispatching. Because the metadata it manages is among the most important data in the system, losing data on the host node has a severe impact on the system.
A failover mechanism is therefore needed so that, when the host node crashes because of an unknown error, it can be restored to the state it was in before the error occurred, avoiding loss of host node data.
Summary of the invention
The invention provides a fault recovery scheme for the host node of a distributed system. It captures metadata images of the host node at one or more moments and records the host node's operations in a redo log, so that when the host node fails it can be quickly restored to its pre-failure state from the previously recorded metadata image and redo log.
According to one aspect of the invention, a distributed system is provided, comprising a host node for scheduling tasks and managing system state and multiple slave nodes for running scheduled tasks, wherein: one or more slave nodes and/or the host node obtain and save a metadata image recording the scheduling information and system state on the host node at a given moment; the host node obtains and saves a redo log recording all operations of the host node after that moment; and during fault recovery the host node loads the metadata image and its corresponding redo log to perform fault recovery.
Restoring the host node from a previously recorded metadata image plus its redo log brings it back to its pre-failure state quickly, and is more efficient than recovering from a log file alone.
Preferably, one or more slave nodes and/or the host node obtain and save the metadata image when triggered by the host node and/or an external command. Different triggering modes can thus be configured to suit the characteristics of the distributed system.
Preferably, the host node records each of its operations in the redo log and responds to a slave node's request only after the record has been stored. This ensures that the redo log completely records every operation of the host node.
Preferably, one or more slave nodes and/or the host node continuously obtain and save metadata images of the host node at multiple different moments, and the host node continuously obtains and saves redo logs corresponding to each of those moments. During fault recovery, the host node may load the newest metadata image and its corresponding redo log; when the newest metadata image and/or its redo log is unavailable, it uses the most recent moment for which both the metadata image and its redo log are available. Keeping several memory images and their redo logs from different moments thus improves fault tolerance during recovery.
Preferably, one or more slave nodes and/or the host node directly obtain and save the in-memory state of the host node at a given moment as the metadata image. Metadata images may be stored per task group, so that during later recovery the relevant images can be organized efficiently by group.
According to another aspect of the invention, a fault recovery device for a distributed system is also provided. The distributed system includes a host node for scheduling tasks and managing system state and multiple slave nodes for running tasks. The device performs fault recovery when the host node fails and includes: an image acquisition unit for obtaining and saving a metadata image recording the scheduling information and system state on the host node at a given moment; a redo log acquisition unit for obtaining and saving a redo log recording all operations of the host node after that moment; and a fault recovery unit for loading the metadata image and its corresponding redo log during fault recovery to perform fault recovery.
Preferably, the image acquisition unit obtains and saves the metadata image when triggered by the host node, the device, and/or an external command.
Preferably, each operation of the host node is recorded in the redo log by the redo log acquisition unit, and the host node responds to a slave node's request only after the record has been stored.
Preferably, the image acquisition unit continuously obtains and saves metadata images of the host node at multiple different moments, and the redo log acquisition unit continuously obtains and saves redo logs corresponding to each of those moments.
Preferably, during fault recovery the fault recovery unit loads the newest metadata image and its corresponding redo log.
Preferably, when the newest metadata image and/or its redo log is unavailable, the fault recovery unit uses the metadata image and redo log from the most recent moment for which both are available.
Preferably, the image acquisition unit directly obtains and saves the in-memory state of the host node at a given moment as the metadata image.
Preferably, the image acquisition unit stores metadata images per task group.
According to a further aspect of the invention, a fault recovery method for a distributed system is also provided, the distributed system including multiple slave nodes for running tasks. The method includes: obtaining and saving a metadata image recording the scheduling information and system state at a given moment; obtaining and saving a redo log recording all scheduling operations after that moment; and loading the metadata image and its corresponding redo log during fault recovery to perform fault recovery.
Preferably, metadata images of the host node at multiple different moments are continuously obtained and saved, together with redo logs corresponding to each of those moments.
Preferably, loading the metadata image and its corresponding redo log during fault recovery may include: loading the newest metadata image and its corresponding redo log; and, when the newest metadata image and/or its redo log is unavailable, using the metadata image and redo log from the most recent moment for which both are available.
Preferably, the in-memory state of the host node at a given moment may be directly obtained and saved as the metadata image.
By capturing metadata images of the host node at one or more moments and recording its subsequent operations in a redo log, the distributed system and the fault recovery method and device of the invention allow a failed host node to be quickly restored to its pre-failure state from the previously recorded metadata image and redo log.
Brief description of the drawings
The above and other objects, features, and advantages of the disclosure will become more apparent from the following more detailed description of exemplary embodiments in conjunction with the accompanying drawings, in which the same reference numerals generally denote the same components.
Fig. 1 is an architecture diagram of a distributed system with a client/server architecture.
Fig. 2 is a schematic flowchart of a fault recovery method according to an embodiment of the invention.
Fig. 3 is a schematic diagram of continuously saving multiple metadata images and redo logs.
Fig. 4 is a schematic block diagram of the structure of a fault recovery device according to an embodiment of the invention.
Detailed description of embodiments
Preferred embodiments of the disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments, the disclosure may be implemented in various forms and should not be limited to the embodiments set forth here. Rather, these embodiments are provided so that the disclosure is thorough and complete and fully conveys its scope to those skilled in the art.
In the client/server distributed system shown in Fig. 1, the host node stores data essential for normal system operation and scheduling, such as system state data and current scheduling data, so losing that data has a large impact on the system. A recovery mechanism is therefore needed that returns the host node to a stable, reliable state when it encounters an unknown error. One approach is to record every operation of the host node in a log file persisted to disk. If the host node then fails, even if all data in its memory is lost, replaying the recorded log file at the next startup still returns the host node to its pre-failure state.
Under this scheme, the host node operates as follows: before each operation, it writes the operation to the log file, and only after the write succeeds does it perform the operation, i.e., update its in-memory data accordingly. Recovery after a failure proceeds as follows: read the log file and apply the recorded host node operations to the in-memory data one by one. Replaying a log of write operations in this way is simple to implement, but recovery takes a very long time.
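To make the log-only scheme concrete, here is a minimal Python sketch, assuming a JSON-lines log file and a toy operation format (`set`/`delete` on a key). `LogOnlyMaster` and the operation fields are hypothetical names chosen for illustration. Each operation is written and fsynced before it is applied, and recovery replays the whole file from the beginning, which is why recovery time grows with the length of the log.

```python
import json
import os
import tempfile


class LogOnlyMaster:
    """Hypothetical host node whose only durable state is an operation log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}  # in-memory metadata, e.g. task -> slave

    def apply(self, op):
        # Mutate in-memory metadata according to one logged operation.
        if op["type"] == "set":
            self.state[op["key"]] = op["value"]
        elif op["type"] == "delete":
            self.state.pop(op["key"], None)

    def execute(self, op):
        # 1) Record the operation durably first...
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2) ...and only then change the in-memory data.
        self.apply(op)

    def recover(self):
        # Rebuild memory from scratch by replaying every record in order.
        self.state = {}
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    self.apply(json.loads(line))


log_dir = tempfile.mkdtemp()
master = LogOnlyMaster(os.path.join(log_dir, "ops.log"))
master.execute({"type": "set", "key": "task-1", "value": "slave-a"})
master.execute({"type": "set", "key": "task-2", "value": "slave-b"})
master.execute({"type": "delete", "key": "task-1"})

restarted = LogOnlyMaster(master.log_path)  # simulated crash: memory is empty
restarted.recover()
print(restarted.state)  # {'task-2': 'slave-b'}
```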
After extensive study, the inventors found that, while recording the host node's operations in a log file, an image file of the host node's in-memory data at a given moment can also be captured from time to time. An image file represents the host node's state at the corresponding moment, so when the host node fails, it can be restored by loading the most recent image file plus the operations recorded in the log file after that image's moment. Compared with recovering from the log file alone, this can significantly shorten recovery time.
Based on this idea, the invention proposes a fault recovery scheme for the host node of a distributed system, which can be implemented in the distributed system shown in Fig. 1. As shown in Fig. 1, the distributed system of the invention may include a host node for scheduling tasks and managing system state and multiple slave nodes for running scheduled tasks. The host node and slave nodes may be deployed on servers: the host node may run on a separate server from the slave nodes, or on the same server as one of them. In a preferred embodiment, different nodes are deployed on different servers. Although the system in Fig. 1 consists of one host node and multiple slave nodes, the distributed system of the invention may also include multiple host nodes, as well as devices other than host and slave nodes, such as a backup host node or a disaster recovery database.
The specific flow by which the distributed system of the invention implements the fault recovery scheme is described in detail below. Fig. 2 is a schematic flowchart of a fault recovery method according to an embodiment of the invention. The method shown in Fig. 2 may be implemented by the distributed system shown in Fig. 1, specifically by its host node.
Referring to Fig. 2, in step S210, a metadata image recording the scheduling information and system state on the host node at a given moment is obtained and saved.
In a client/server distributed system, a crash of the host node makes the whole distributed system unavailable. Given its importance, the host node generally does not run specific tasks itself; it is responsible only for keeping the distributed system running and for scheduling and distributing tasks, while specific tasks are executed by slave nodes. That is, the host node mainly parses task requests, allocates resources, and locates target data or nodes based on metadata, and the slave nodes it designates execute the specific tasks. Metadata is data that describes data; in the present invention it refers specifically to the data the host node is responsible for storing and managing. Because the host node schedules tasks and manages system state, metadata can refer to data recording the scheduling information and system state on the host node at a given moment. For a Hadoop distributed system, for example, metadata may include system description data, system state data, and current task scheduling and status data; for a distributed storage system, metadata may be data describing the state of user data, such as its storage location.
The metadata image captured at a given moment can be a mapping of the host node's in-memory state at that moment, so the host node's in-memory state can be directly obtained and saved as the metadata image. In a specific implementation, the metadata image of the host node at a given moment may be obtained by means such as a snapshot (disk snapshot) or a dump (file system backup).
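As an illustration of saving in-memory state as a metadata image, the following sketch assumes the metadata fits in a JSON-serializable dict and names images by a hypothetical checkpoint number (`seq`). It is not the snapshot or dump mechanism itself. It writes to a temporary file and then renames it atomically with `os.replace`, so a crash during the dump cannot leave a partial image under the final name; a production version would likely also fsync the containing directory, which is omitted here.

```python
import json
import os
import tempfile


def dump_image(state, image_dir, seq):
    """Write a point-in-time copy of the in-memory metadata to disk.

    `seq` is a hypothetical, monotonically increasing checkpoint number
    used to name and order image files.
    """
    os.makedirs(image_dir, exist_ok=True)
    final_path = os.path.join(image_dir, f"image-{seq:010d}.json")
    # Write to a temporary file first, then atomically rename it, so a crash
    # mid-write never leaves a half-written image under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=image_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, final_path)
    return final_path


def load_image(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


image_dir = os.path.join(tempfile.mkdtemp(), "images")
state = {"tasks": {"task-2": "slave-b"}, "nodes": {"slave-b": "alive"}}
path = dump_image(state, image_dir, seq=3)
print(os.path.basename(path))  # image-0000000003.json
```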
The metadata image may be obtained by the host node, by one or more slave nodes, or by a backup host node in the distributed system. The obtained image may be persisted to a local disk or a distributed file system, for example to a disaster recovery database.
In an alternative embodiment, the host node may schedule tasks concurrently by group, in which case the captured metadata may consist of images for multiple groups. Metadata images can then be stored per task group, with images belonging to the same group stored under the same directory, so that during later recovery the relevant images can be organized efficiently by group.
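A per-group directory layout could look like the sketch below. The `<root>/<group>/image-<seq>.json` naming and the helper names are assumptions for illustration. Zero-padding the checkpoint number makes lexical order match numeric order, so the newest image in each group directory is simply the last name after sorting.

```python
import os
import tempfile


def group_image_path(root, group, seq):
    """Hypothetical layout: <root>/<group>/image-<seq>.json."""
    return os.path.join(root, group, f"image-{seq:010d}.json")


def latest_image_per_group(root):
    """Return {group: newest image path} by scanning one directory per group."""
    result = {}
    for group in sorted(os.listdir(root)):
        group_dir = os.path.join(root, group)
        if not os.path.isdir(group_dir):
            continue  # ignore stray files next to the group directories
        images = sorted(n for n in os.listdir(group_dir) if n.startswith("image-"))
        if images:
            result[group] = os.path.join(group_dir, images[-1])
    return result


root = tempfile.mkdtemp()
for group, seq in [("group-a", 1), ("group-a", 2), ("group-b", 5)]:
    p = group_image_path(root, group, seq)
    os.makedirs(os.path.dirname(p), exist_ok=True)
    with open(p, "w", encoding="utf-8") as f:
        f.write("{}")  # placeholder image content

latest = latest_image_per_group(root)
print({g: os.path.basename(p) for g, p in latest.items()})
# {'group-a': 'image-0000000002.json', 'group-b': 'image-0000000005.json'}
```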
In step S220, the host node may obtain and save a redo log recording all of its operations after that moment. An operation here may refer to an operation the host node performs on metadata or on its in-memory data.
Every operation the host node performs can be recorded in the redo log, which records the host node's operation information sequentially. For each operation, the host node performs it only after it has been recorded in the redo log and persisted. Then, if the host node fails while performing the operation, the data can be recovered from the operation's record in the redo log. Conversely, if an operation were executed first and recorded afterwards, a failure during execution or before the record was saved would make the operation unrecoverable; it could only be redone.
For example, when a slave node requests a task (such as a computing or storage task) from the host node, the host node may first record in the redo log the operation of sending the target data to the slave node; only after the record has been persisted successfully does the host node respond to the request and send the target data to the slave node. In other words, for a slave node's request, the host node responds only after the corresponding operation has been recorded in the redo log and persistently stored.
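The log-before-respond rule can be sketched as follows, assuming a JSON-lines redo file and a hypothetical `assign` operation. `RedoLog`, `Master`, and the `target_data` placeholder are illustrative names, not part of the patent. The point is ordering: the record is flushed and fsynced before memory changes and before any reply is returned, so an exception during persistence leaves both the in-memory state and the slave node unaffected.

```python
import json
import os
import tempfile


class RedoLog:
    """Append-only redo log; each record is one JSON line."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # the record is durable once this returns


class Master:
    def __init__(self, redo_log):
        self.redo_log = redo_log
        self.assignments = {}  # in-memory metadata: task -> slave

    def handle_task_request(self, slave_id, task_id):
        op = {"type": "assign", "task": task_id, "slave": slave_id}
        # Log first: if append() raises, the request is never answered and
        # memory is untouched, so the log never lags behind what slaves saw.
        self.redo_log.append(op)
        self.assignments[task_id] = slave_id
        # Hypothetical reply carrying the target data for the slave node.
        return {"task": task_id, "target_data": f"data-for-{task_id}"}


log_path = os.path.join(tempfile.mkdtemp(), "redo.log")
master = Master(RedoLog(log_path))
reply = master.handle_task_request("slave-a", "task-7")
with open(log_path, encoding="utf-8") as f:
    print(f.read().strip())
# {"type": "assign", "task": "task-7", "slave": "slave-a"}
```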
In step S230, the metadata image and its corresponding redo log are loaded during fault recovery to perform fault recovery.
As described above, a metadata image can be regarded as a mapping of the host node's in-memory state at a given moment, while the redo log records all of the host node's operations. Therefore, when the host node fails, fault recovery can use the metadata image captured before the failure together with the operations recorded in the redo log between the image's moment and the failure, returning the host node to its pre-failure state. Taking a redo log stored in a file system as an example, recovery can proceed as follows: after the host node restarts, it first traverses the metadata image directory in the file system, finds the most recent metadata image, and loads it into memory; it then loads the redo log that follows that image and replays it. Once loading completes, the whole recovery process is finished.
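The restart procedure above (find the most recent image, load it, replay the redo log that follows it) can be sketched as below. The file layout is an assumption loosely mirroring Fig. 3: `image-<seq>.json` holds the state at a checkpoint and `redo-<seq>.log` holds the operations recorded after it, one JSON object per line. The `set`/`delete` operation format is likewise hypothetical.

```python
import json
import os
import tempfile


def apply(state, op):
    # Hypothetical operation format: {"type": "set"|"delete", "key", "value"}.
    if op["type"] == "set":
        state[op["key"]] = op["value"]
    elif op["type"] == "delete":
        state.pop(op["key"], None)


def recover(data_dir):
    """Load the newest image, then replay the redo log written after it."""
    images = sorted(n for n in os.listdir(data_dir)
                    if n.startswith("image-") and n.endswith(".json"))
    if not images:
        return {}
    newest = images[-1]                        # e.g. image-0000000002.json
    seq = newest[len("image-"):-len(".json")]  # e.g. 0000000002
    with open(os.path.join(data_dir, newest), encoding="utf-8") as f:
        state = json.load(f)
    redo_path = os.path.join(data_dir, f"redo-{seq}.log")
    if os.path.exists(redo_path):
        with open(redo_path, encoding="utf-8") as f:
            for line in f:
                apply(state, json.loads(line))
    return state


data_dir = tempfile.mkdtemp()


def write(name, text):
    with open(os.path.join(data_dir, name), "w", encoding="utf-8") as f:
        f.write(text)


write("image-0000000001.json", json.dumps({"a": 1}))
write("redo-0000000001.log", json.dumps({"type": "set", "key": "b", "value": 2}) + "\n")
write("image-0000000002.json", json.dumps({"a": 1, "b": 2}))
write("redo-0000000002.log",
      json.dumps({"type": "delete", "key": "a"}) + "\n"
      + json.dumps({"type": "set", "key": "c", "value": 3}) + "\n")
print(recover(data_dir))  # {'b': 2, 'c': 3}
```

Only the newest image and its own redo log are read; older images and logs are kept purely as fallbacks.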
In an alternative embodiment, multiple metadata images corresponding to different moments may be saved. While the redo log is being recorded, a metadata image may be captured periodically or in response to a predetermined trigger condition, for example when some parameter reaches a preset value, when a predetermined interval has elapsed, or directly in response to an external trigger command. For instance, a metadata image may be captured every time a predetermined number of operations has been recorded in the redo log, or at fixed time intervals.
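One way to express these trigger conditions is a small helper that the host node consults after each logged operation. The class name, the thresholds, the `force()` hook standing in for an external command, and the injectable clock are all assumptions made for illustration.

```python
import time


class CheckpointTrigger:
    """Decide when to take a new metadata image.

    Fires when `max_ops` operations have been logged since the last image,
    when `max_interval` seconds have passed, or after `force()` was called
    (standing in for an external command). Thresholds are hypothetical.
    """

    def __init__(self, max_ops=1000, max_interval=300.0, clock=time.monotonic):
        self.max_ops = max_ops
        self.max_interval = max_interval
        self.clock = clock
        self.ops_since_image = 0
        self.last_image_at = clock()
        self.forced = False

    def record_op(self):
        self.ops_since_image += 1

    def force(self):
        self.forced = True

    def should_checkpoint(self):
        return (self.forced
                or self.ops_since_image >= self.max_ops
                or self.clock() - self.last_image_at >= self.max_interval)

    def reset(self):
        # Call right after an image has been saved.
        self.ops_since_image = 0
        self.last_image_at = self.clock()
        self.forced = False


now = [0.0]  # fake clock so the example is deterministic
trigger = CheckpointTrigger(max_ops=3, max_interval=60.0, clock=lambda: now[0])
results = []
for _ in range(3):
    trigger.record_op()
    results.append(trigger.should_checkpoint())
print(results)  # [False, False, True]

trigger.reset()
now[0] = 61.0
time_based = trigger.should_checkpoint()
print(time_based)  # True (interval elapsed)
```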
Correspondingly, while recording the host node's operations in the redo log, redo logs corresponding to each of the multiple moments (i.e., to each of the multiple metadata images) may be obtained continuously. Fig. 3 is a schematic diagram of the principle of continuously saving multiple metadata image files and their corresponding redo logs.
Referring to Fig. 3, metadata image 1 of the host node is first obtained at time t1, and the host node's operations between t1 and t2 are recorded in redo log 1; metadata image 2 is obtained at t2, and operations between t2 and t3 are recorded in redo log 2; and so on. This yields metadata images corresponding to t1, t2, and t3, along with a redo log corresponding to each of them.
Suppose the host node crashes at time t4. During fault recovery, the host node can first load the newest metadata image (the image at t3) and its corresponding redo log (covering t3 to t4). If the newest metadata image or its redo log is unavailable, the next-newest metadata image (the image at t2) and its redo log (covering t2 to t3) can be used instead, and so on, stepping further back until usable data files are found. Saving memory images from several moments together with their redo logs thus improves fault tolerance during recovery.
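The step-back behaviour can be sketched like this, assuming a hypothetical layout of `image-<seq>.json` plus `redo-<seq>.log` per checkpoint with zero-padded sequence labels. A checkpoint counts as unavailable if either file is missing or fails to parse. As in the text, falling back to an older pair restores state only up to the end of that pair's redo log; chaining later redo logs that are still intact would be a possible extension not shown here.

```python
import json
import os
import tempfile


def apply(state, op):
    # Hypothetical op format: {"type": "set"|"delete", "key", "value"}.
    if op["type"] == "set":
        state[op["key"]] = op["value"]
    elif op["type"] == "delete":
        state.pop(op["key"], None)


def try_restore(data_dir, seq):
    """Restore from image <seq> plus redo log <seq>; raise if either is unusable."""
    with open(os.path.join(data_dir, f"image-{seq}.json"), encoding="utf-8") as f:
        state = json.load(f)
    with open(os.path.join(data_dir, f"redo-{seq}.log"), encoding="utf-8") as f:
        for line in f:
            apply(state, json.loads(line))
    return state


def recover_with_fallback(data_dir):
    """Try the newest checkpoint first, then step back to older ones."""
    # Sequence labels are assumed zero-padded, so string order = time order.
    seqs = sorted((n[len("image-"):-len(".json")]
                   for n in os.listdir(data_dir)
                   if n.startswith("image-") and n.endswith(".json")),
                  reverse=True)
    for seq in seqs:
        try:
            return seq, try_restore(data_dir, seq)
        except (OSError, ValueError):  # missing file or corrupt JSON
            continue                   # fall back one checkpoint
    raise RuntimeError("no usable image/redo-log pair found")


data_dir = tempfile.mkdtemp()


def write(name, text):
    with open(os.path.join(data_dir, name), "w", encoding="utf-8") as f:
        f.write(text)


write("image-0002.json", json.dumps({"a": 1}))
write("redo-0002.log", json.dumps({"type": "set", "key": "b", "value": 2}) + "\n")
write("image-0003.json", "{corrupted")  # the newest image is damaged
write("redo-0003.log", "")
seq, state = recover_with_fallback(data_dir)
print(seq, state)  # 0002 {'a': 1, 'b': 2}
```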
In other words, the scheme of the present application can use a given condition or command to trigger capture and storage of a metadata image (for example, saving the state at t3) while continuously recording the redo log (that is, all operations after t3). After a failure at t4, restoring the state at t3 and then replaying all operations after t3 quickly returns the host node to its state at t4.
When the metadata image of the host node is captured at a given moment, for example metadata image 1 at t1 in Fig. 3, the host node's service usually is not stopped, and capturing the image takes some time. Metadata image 1 obtained at t1 may therefore already contain some operations recorded in redo log 1 after t1. As a result, if the host node fails at t2 and is recovered using metadata image 1 and redo log 1, the recovered state of the host node may well be inconsistent with its state before the failure.
Therefore, in an alternative embodiment, while the metadata image for a given moment is being captured, the times of the operations recorded in the redo log during that period can be recorded in real time. After the image has been captured, the corresponding operations can be removed from the redo log, so that the image does not include operations that are also recorded in the subsequent redo log, and the metadata image and its redo log are strictly aligned in time.
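The alignment described above can be sketched as follows. Instead of timestamps, this sketch assumes each redo record carries a monotonically increasing sequence number and that the image stores the number of the last operation it already reflects (`last_seq`, a hypothetical field); records at or below that number are dropped before replay. `Master`, `trim_redo_log`, and the `incr` operation are illustrative names, not the patent's. An increment is used because it is not idempotent, which makes double application visible.

```python
import copy


def apply(state, op):
    # Hypothetical non-idempotent op: replaying it twice gives a wrong result.
    if op["type"] == "incr":
        state[op["key"]] = state.get(op["key"], 0) + op["by"]


class Master:
    def __init__(self):
        self.state = {}
        self.redo = []     # stands in for the persisted redo log
        self.next_seq = 1

    def execute(self, op):
        record = dict(op, seq=self.next_seq)
        self.next_seq += 1
        self.redo.append(record)   # log first...
        apply(self.state, record)  # ...then apply

    def take_image(self):
        # The image records the sequence number of the last operation it
        # already contains, so it can be aligned with the redo log later.
        return {"last_seq": self.next_seq - 1, "state": copy.deepcopy(self.state)}


def trim_redo_log(records, image):
    """Drop redo records that the image already reflects."""
    return [r for r in records if r["seq"] > image["last_seq"]]


def recover(image, records):
    state = copy.deepcopy(image["state"])
    for r in trim_redo_log(records, image):
        apply(state, r)
    return state


m = Master()
m.execute({"type": "incr", "key": "jobs", "by": 1})  # seq 1: redo log 1 starts at t1
m.execute({"type": "incr", "key": "jobs", "by": 1})  # seq 2: runs while image 1 is taken
image = m.take_image()                               # image already contains seq 1 and 2
m.execute({"type": "incr", "key": "jobs", "by": 1})  # seq 3
print(m.state["jobs"], recover(image, m.redo)["jobs"])  # 3 3

naive = copy.deepcopy(image["state"])
for r in m.redo:  # replaying without trimming double-counts seq 1 and 2
    apply(naive, r)
print(naive["jobs"])  # 5
```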
The fault recovery method of the invention has now been described in detail with reference to Figs. 2 and 3. The fault recovery scheme of the invention can also be implemented by a fault recovery device. Fig. 4 is a structural block diagram of a fault recovery device according to an embodiment of the invention. The functional modules of fault recovery device 400 may be implemented in hardware, software, or a combination of hardware and software that realizes the principles of the invention. Those skilled in the art will understand that the functional modules in Fig. 4 may be combined or divided into submodules to realize those principles. The description herein therefore supports any possible combination, division, or further definition of the functional modules described.
Fault recovery device 400 shown in Fig. 4 can be used to implement the fault recovery method shown in Fig. 2. Only the functional modules that fault recovery device 400 may have, and the operations each can perform, are briefly described below; for details, refer to the description above in connection with Fig. 2. Note that fault recovery device 400 may be the host node itself or a backup host node.
As shown in Fig. 4, the fault recovery device of the invention may include an image acquisition unit 410, a redo log acquisition unit 420, and a fault recovery unit 430. Image acquisition unit 410 can obtain and save a metadata image recording the scheduling information and system state on the host node at a given moment; redo log acquisition unit 420 can obtain and save a redo log recording all operations of the host node after that moment; and fault recovery unit 430 can load the metadata image and its corresponding redo log during fault recovery to perform fault recovery.
Preferably, image acquisition unit 410 can obtain and save the metadata image when triggered by the host node, the device, and/or an external command. Image acquisition unit 410 can directly obtain and save the host node's in-memory state at a given moment as the metadata image. Further, image acquisition unit 410 can store metadata images per task group.
Preferably, each operation of the host node is recorded in the redo log by redo log acquisition unit 420, and the host node responds to a new slave node request only after the record has been stored.
Preferably, image acquisition unit 410 continuously obtains and saves metadata images of the host node at multiple different moments, and redo log acquisition unit 420 continuously obtains and saves redo logs corresponding to each of those moments. In this case, fault recovery unit 430 loads the newest metadata image and its corresponding redo log during fault recovery, and, when the newest metadata image and/or its redo log is unavailable, uses the metadata image and redo log from the most recent moment for which both are available.
The distributed system and its fault recovery method and device according to the invention have been described in detail above with reference to the accompanying drawings.
In addition, the method according to the invention is also implemented as a kind of computer program or computer program product, the meter The calculating of the above steps limited in the above method that calculation machine program or computer program product include being used to perform the present invention Machine code instructions.
Or the present invention can also be embodied as a kind of (or the computer-readable storage of non-transitory machinable medium Medium or machinable medium), executable code (or computer program or computer instruction code) is stored thereon with, When the executable code (or computer program or computer instruction code) is by electronic equipment (or computing device, server Deng) computing device when, make the computing device according to the present invention the above method each step.
Those skilled in the art will also understand is that, the various illustrative logical blocks with reference to described by disclosure herein, mould Block, circuit and algorithm steps may be implemented as the combination of electronic hardware, computer software or both.
Flow chart and block diagram in accompanying drawing show that the possibility of the system and method for multiple embodiments according to the present invention is real Existing architectural framework, function and operation.At this point, each square frame in flow chart or block diagram can represent module, a journey A part for sequence section or code, a part for the module, program segment or code is comprising one or more defined for realizing The executable instruction of logic function.It should also be noted that at some as in the realization replaced, the function of being marked in square frame also may be used With with different from the order marked in accompanying drawing generation.For example, two continuous square frames can essentially perform substantially in parallel, They can also be performed in the opposite order sometimes, and this is depending on involved function.It is also noted that block diagram and/or stream The combination of each square frame and block diagram in journey figure and/or the square frame in flow chart, function or operation as defined in performing can be used Special hardware based system realize, or can be realized with the combination of specialized hardware and computer instruction.
Various embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (20)

1. A distributed system, comprising a host node for scheduling tasks and managing system state, and a plurality of slave nodes for running the scheduled tasks, wherein:
one or more of the slave nodes and/or the host node acquire and save a metadata image recording the scheduling information and system state on the host node at a certain moment;
the host node acquires and saves a redo log recording all operations of the host node after that moment; and
the host node, during fault recovery, invokes the metadata image and its corresponding redo log to perform fault recovery.
2. The distributed system of claim 1, wherein the one or more slave nodes and/or the host node perform the operation of acquiring and saving the metadata image when triggered by the host node and/or an external command.
3. The distributed system of claim 1, wherein the host node responds to a request from a slave node only after each of its operations has been recorded in the redo log and stored.
4. The distributed system of claim 1, wherein the one or more slave nodes and/or the host node continuously acquire and save a plurality of metadata images of the host node at different moments, and
the host node continuously acquires and saves a plurality of redo logs respectively corresponding to the different moments.
5. The distributed system of claim 4, wherein, during fault recovery, the host node invokes the latest metadata image and its corresponding redo log to perform fault recovery.
6. The distributed system of claim 4, wherein, when the latest metadata image and/or its corresponding redo log is unavailable, the host node invokes the metadata image and corresponding redo log of the most recent moment for which both are available to perform fault recovery.
7. The distributed system of claim 1, wherein the one or more slave nodes and/or the host node directly acquire and save the in-memory state of the host node at a certain moment as the metadata image.
8. The distributed system of claim 1, wherein the metadata image is stored grouped by task.
9. A fault recovery device for a distributed system, the distributed system comprising a host node for scheduling tasks and managing system state and a plurality of slave nodes for running tasks, the device being configured to perform fault recovery when the host node fails and comprising:
an image acquisition unit, configured to acquire and save a metadata image recording the scheduling information and system state on the host node at a certain moment;
a redo log acquisition unit, configured to acquire and save a redo log recording all operations of the host node after that moment; and
a fault recovery unit, configured to invoke, during fault recovery, the metadata image and its corresponding redo log to perform fault recovery.
10. The device of claim 9, wherein the image acquisition unit performs the operation of acquiring and saving the metadata image when triggered by the host node, the device, and/or an external command.
11. The device of claim 9, wherein the host node responds to a request from a slave node only after each of its operations has been recorded in the redo log by the redo log acquisition unit and stored.
12. The device of claim 9, wherein the image acquisition unit continuously acquires and saves a plurality of metadata images of the host node at different moments, and
the redo log acquisition unit continuously acquires and saves a plurality of redo logs respectively corresponding to the different moments.
13. The device of claim 12, wherein, during fault recovery, the fault recovery unit invokes the latest metadata image and its corresponding redo log to perform fault recovery.
14. The device of claim 12, wherein, when the latest metadata image and/or its corresponding redo log is unavailable, the fault recovery unit invokes the metadata image and corresponding redo log of the most recent moment for which both are available to perform fault recovery.
15. The device of claim 9, wherein the image acquisition unit directly acquires and saves the in-memory state of the host node at a certain moment as the metadata image.
16. The device of claim 9, wherein the image acquisition unit stores the metadata image grouped by task.
17. A fault recovery method for a distributed system, the distributed system comprising a plurality of slave nodes for running tasks, the method comprising:
acquiring and saving a metadata image recording the scheduling information and system state at a certain moment;
acquiring and saving a redo log recording all scheduling operations after that moment; and
invoking, during fault recovery, the metadata image and its corresponding redo log to perform fault recovery.
18. The method of claim 17, wherein:
a plurality of metadata images of a host node at different moments are continuously acquired and saved, and
a plurality of redo logs respectively corresponding to the different moments are continuously acquired and saved.
19. The method of claim 18, wherein invoking, during fault recovery, the metadata image and its corresponding redo log to perform fault recovery comprises:
invoking, during fault recovery, the latest metadata image and its corresponding redo log to perform fault recovery; and
when the latest metadata image and/or its corresponding redo log is unavailable, invoking the metadata image and corresponding redo log of the most recent moment for which both are available to perform fault recovery.
20. The method of claim 17, wherein the in-memory state of a host node at a certain moment is directly acquired and saved as the metadata image.
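The claims combine two mechanisms. A metadata image captures the host node's in-memory state at a moment, and a redo log records every later operation before the host replies to a slave. Several images are kept, so recovery can fall back to an older moment when the newest image or log is lost.

The Python sketch below illustrates that idea under stated assumptions. It is not the patented implementation. The following are all hypothetical:

- The scheduling state is modeled as a dict mapping a task name to a slave name.
- The operation tuple format is invented for illustration.
- The names `apply_op`, `Checkpoint`, `Master`, and `recover` are illustrative.
- Each checkpoint's redo log is assumed to accumulate every operation after its moment, so any single checkpoint can restore the full state on its own.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical operation format: (kind, task, slave); kind is "assign" or "finish".
Op = Tuple[str, str, str]


def apply_op(state: Dict[str, str], op: Op) -> None:
    """Apply one scheduling operation to the task -> slave assignment map."""
    kind, task, slave = op
    if kind == "assign":
        state[task] = slave
    elif kind == "finish":
        state.pop(task, None)
    else:
        raise ValueError(f"unknown op kind: {kind}")


@dataclass
class Checkpoint:
    moment: int
    image: Optional[Dict[str, str]]  # metadata image; None models a lost image
    redo_log: Optional[List[Op]] = field(default_factory=list)  # None models a lost log


class Master:
    """Toy host node whose in-memory state is a task -> slave map (assumption)."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.checkpoints: List[Checkpoint] = []
        self.clock = 0  # counts operations; used as the checkpoint "moment"
        self.snapshot()  # start with an empty base image

    def snapshot(self) -> None:
        """Copy the in-memory state as a metadata image and open its redo log."""
        self.checkpoints.append(Checkpoint(self.clock, copy.deepcopy(self.state), []))

    def handle(self, op: Op) -> str:
        """Record the op in the redo logs first, then apply it, then acknowledge."""
        for cp in self.checkpoints:
            if cp.redo_log is not None:  # a lost log stays lost
                cp.redo_log.append(op)  # a real system would also flush to disk here
        apply_op(self.state, op)
        self.clock += 1
        return "ack"


def recover(checkpoints: List[Checkpoint]) -> Dict[str, str]:
    """Replay the newest checkpoint whose image and redo log are both available."""
    for cp in reversed(checkpoints):
        if cp.image is None or cp.redo_log is None:
            continue  # fall back to an older moment
        state = copy.deepcopy(cp.image)
        for op in cp.redo_log:
            apply_op(state, op)
        return state
    raise RuntimeError("no checkpoint has both a usable image and a redo log")


if __name__ == "__main__":
    m = Master()
    m.handle(("assign", "t1", "s1"))
    m.handle(("assign", "t2", "s2"))
    m.snapshot()  # second metadata image, taken after two operations
    m.handle(("finish", "t1", ""))
    m.handle(("assign", "t3", "s1"))
    print(recover(m.checkpoints))  # {'t2': 's2', 't3': 's1'}
    m.checkpoints[-1].image = None  # simulate losing the newest image
    print(recover(m.checkpoints))  # same result, rebuilt from the older image
```

Appending every operation to every retained log keeps the fallback described in claims 6, 14, and 19 simple. The cost is that each operation is written several times. One possible alternative is a segmented log, where recovery replays from an older image through each newer log segment in turn. Storing images grouped by task, as in claims 8 and 16, could be modeled by keeping one image dict per task group instead of a single dict.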

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710630823.6A CN107357688B (en) 2017-07-28 2017-07-28 Distributed system and fault recovery method and device thereof
PCT/CN2018/097262 WO2019020081A1 (en) 2017-07-28 2018-07-26 Distributed system and fault recovery method and apparatus thereof, product, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710630823.6A CN107357688B (en) 2017-07-28 2017-07-28 Distributed system and fault recovery method and device thereof

Publications (2)

Publication Number Publication Date
CN107357688A 2017-11-17
CN107357688B 2020-06-12

Family

ID=60285161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710630823.6A Active CN107357688B (en) 2017-07-28 2017-07-28 Distributed system and fault recovery method and device thereof

Country Status (2)

Country Link
CN (1) CN107357688B (en)
WO (1) WO2019020081A1 (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8401998B2 (en) * 2010-09-02 2013-03-19 Microsoft Corporation Mirroring file data
CN103294701A (en) * 2012-02-24 2013-09-11 联想(北京)有限公司 Distributed file system and data processing method
CN104216802A (en) * 2014-09-25 2014-12-17 北京金山安全软件有限公司 Memory database recovery method and device



Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019020081A1 (en) * 2017-07-28 2019-01-31 广东神马搜索科技有限公司 Distributed system and fault recovery method and apparatus thereof, product, and storage medium
CN108390771A (en) * 2018-01-25 2018-08-10 中国银联股份有限公司 Network topology reconstruction method and device
CN108390771B (en) * 2018-01-25 2021-04-16 中国银联股份有限公司 Network topology reconstruction method and device
CN108427728A (en) * 2018-02-13 2018-08-21 百度在线网络技术(北京)有限公司 Metadata management method, device, and computer-readable medium
CN109189480A (en) * 2018-07-02 2019-01-11 新华三技术有限公司成都分公司 File system startup method and device
CN109189480B (en) * 2018-07-02 2021-11-09 新华三技术有限公司成都分公司 File system starting method and device
CN109144792A (en) * 2018-10-08 2019-01-04 郑州云海信息技术有限公司 Data reconstruction method, device and system and computer readable storage medium
CN109656911A (en) * 2018-12-11 2019-04-19 江苏瑞中数据股份有限公司 Distributed variable-frequencypump Database Systems and its data processing method
CN111104226A (en) * 2019-12-25 2020-05-05 东北大学 Intelligent management system and method for multi-tenant service resources
CN111104226B (en) * 2019-12-25 2024-01-26 东北大学 Intelligent management system and method for multi-tenant service resources
CN112379977A (en) * 2020-07-10 2021-02-19 中国航空工业集团公司西安飞行自动控制研究所 Task-level fault processing method based on time triggering
CN111880969A (en) * 2020-07-30 2020-11-03 上海达梦数据库有限公司 Storage node recovery method, device, equipment and storage medium

Also Published As

Publication number Publication date
WO2019020081A1 (en) 2019-01-31
CN107357688B (en) 2020-06-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200811

Address after: 310052 Room 508, 5th Floor, Building 4, No. 699 Wangshang Road, Changhe Street, Binjiang District, Hangzhou, Zhejiang Province

Patentee after: Alibaba (China) Co.,Ltd.

Address before: 510627 Unit 01 (self-numbered), 13th Floor, Tower B, Guangdian Pingyun Square, No. 163 Pingyun Road, Huangpu Avenue West, Tianhe District, Guangzhou, Guangdong Province

Patentee before: Guangdong Shenma Search Technology Co.,Ltd.