CN107357688A - Distributed system and its fault recovery method and device - Google Patents
- Publication number
- CN107357688A (application CN201710630823.6A)
- Authority
- CN
- China
- Prior art keywords
- host node
- mirror image
- fault recovery
- metadata mirror
- log
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1464—Management of the backup or restore process for networked environments
Abstract
The invention discloses a distributed system and a fault recovery method and device for it. One or more slave nodes and/or the host node obtain and save a metadata image. The image records the scheduling information and system state on the host node at a certain moment. The host node obtains and saves a redo log recording all of its operations after that moment. During fault recovery, the host node calls the metadata image and its corresponding redo log to recover. If the host node fails, it can therefore be quickly returned to its pre-failure state from the metadata image and redo log recorded earlier.
Description
Technical field
The present invention relates to the field of distributed technology, and more particularly to a distributed system and a fault recovery method and device for it.
Background art
A distributed system organically combines and connects multiple machines so that they cooperate to accomplish a task, such as a computing task or a storage task. It is a software system built on a network. Most existing distributed systems use a client/server architecture. Fig. 1 is a schematic structural diagram of a distributed system using a client/server architecture. As shown in Fig. 1, such a system usually consists of a host node (master) and multiple slave nodes (slaves). The host node is the central scheduling node of the distributed system. It typically combines several functions: metadata storage and query, cluster node state management, decision-making, and task dispatch. The metadata managed by the host node is among the most important data in the system, so losing the data on the host node has a large impact on the system.
A failover mechanism is therefore needed. When the host node crashes because of an unknown error, the mechanism should return the host node to its state before the error occurred, so that host node data is not lost.
Summary of the invention
The invention provides a fault recovery scheme for the host node in a distributed system. Metadata images of the host node are obtained at one or more moments, and the host node's operations are recorded in a redo log. When the host node fails, it can then be quickly restored to its pre-failure state from the previously recorded metadata image and redo log.
According to one aspect of the invention, a distributed system is provided. It includes a host node for scheduling tasks and managing system state, and multiple slave nodes for running scheduled tasks, wherein:
- one or more slave nodes and/or the host node obtain and save a metadata image recording the scheduling information and system state on the host node at a certain moment;
- the host node obtains and saves a redo log recording all operations of the host node after that moment; and
- during fault recovery, the host node calls the metadata image and its corresponding redo log to perform fault recovery.
The host node can thus be quickly returned to its pre-failure state from the previously recorded metadata image and redo log. This improves recovery efficiency compared with replaying a log file alone.
Preferably, the one or more slave nodes and/or the host node obtain and save the metadata image when triggered by the host node and/or an external command. Different triggering modes can thus be set for acquiring and saving metadata images, according to the characteristics of the distributed system.
Preferably, the host node responds to a request from a slave node only after each of its operations has been recorded in the redo log and stored. This ensures that the redo log records every operation of the host node completely.
Preferably, the one or more slave nodes and/or the host node continuously obtain and save metadata images of the host node at multiple different moments. The host node continuously obtains and saves redo logs corresponding to each of those moments. During fault recovery, the host node can call the newest metadata image and its corresponding redo log. If the newest metadata image and/or its redo log is unavailable, the host node uses the data of the most recent moment for which both the metadata image and its redo log are available. Saving multiple memory images from different moments, together with their redo logs, thus improves fault tolerance during recovery.
Preferably, the one or more slave nodes and/or the host node directly obtain and save the host node's in-memory state at a given moment as the metadata image. Metadata images can be stored by task group, so that during later recovery the corresponding images can be organized efficiently by group.
According to another aspect of the invention, a fault recovery device for a distributed system is further provided. The distributed system includes a host node for scheduling tasks and managing system state, and multiple slave nodes for running tasks. The device performs fault recovery when the host node fails. It includes:
- an image acquisition unit, which obtains and saves a metadata image recording the scheduling information and system state on the host node at a certain moment;
- a redo log acquisition unit, which obtains and saves a redo log recording all operations of the host node after that moment; and
- a fault recovery unit, which calls the metadata image and its corresponding redo log during fault recovery to perform fault recovery.
Preferably, the image acquisition unit obtains and saves the metadata image when triggered by the host node, the device, and/or an external command.
Preferably, the host node responds to a request from a slave node only after the redo log acquisition unit has recorded each of its operations in the redo log and stored it.
Preferably, the image acquisition unit continuously obtains and saves metadata images of the host node at multiple different moments. The redo log acquisition unit continuously obtains and saves redo logs corresponding to each of those moments.
Preferably, the fault recovery unit calls the newest metadata image and its corresponding redo log during fault recovery.
Preferably, the newest metadata image and/or its corresponding redo log may be unavailable. In that case the fault recovery unit performs fault recovery with the data of the most recent moment for which both the metadata image and its redo log are available.
Preferably, the image acquisition unit directly obtains and saves the host node's in-memory state at a given moment as the metadata image.
Preferably, the image acquisition unit stores metadata images by task group.
According to a further aspect of the invention, a fault recovery method for a distributed system is further provided. The distributed system includes multiple slave nodes for running tasks. The method includes:
- obtaining and saving a metadata image recording the scheduling information and system state at a certain moment;
- obtaining and saving a redo log recording all scheduling operations after that moment; and
- calling the metadata image and its corresponding redo log during fault recovery to perform fault recovery.
Preferably, metadata images of the host node at multiple different moments are continuously obtained and saved. Redo logs corresponding to each of those moments are also continuously obtained and saved.
Preferably, calling the metadata image and its corresponding redo log during fault recovery can include the following. The newest metadata image and its corresponding redo log are called first. If the newest metadata image and/or its redo log is unavailable, the data of the most recent moment for which both the metadata image and its redo log are available is called instead.
Preferably, the host node's in-memory state at a given moment can be directly obtained and saved as the metadata image.
In the distributed system and its fault recovery method and device of the invention, metadata images of the host node are obtained at one or more moments. The host node's subsequent operations are recorded in a redo log. When the host node fails, it can be quickly restored to its pre-failure state from the previously recorded metadata image and redo log.
Brief description of the drawings
The above and other objects, features and advantages of the disclosure will become more apparent from the following detailed description of its exemplary embodiments, read together with the accompanying drawings. In the exemplary embodiments, the same reference numerals generally denote the same parts.
Fig. 1 is an architecture diagram of a client/server distributed system.
Fig. 2 is a schematic flowchart of a fault recovery method according to an embodiment of the invention.
Fig. 3 is a schematic diagram of continuously saving multiple metadata images and redo logs.
Fig. 4 is a schematic block diagram of the structure of a fault recovery device according to an embodiment of the invention.
Detailed description of the embodiments
Preferred embodiments of the disclosure are described in more detail below with reference to the accompanying drawings. The drawings show preferred embodiments, but the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure is thorough and complete and fully conveys its scope to those skilled in the art.
In the client/server distributed system shown in Fig. 1, the host node stores the data needed for normal operation and scheduling, such as system state data and current scheduling data. Losing this data therefore has a very large impact on the system. A recovery mechanism is needed so that, when the host node encounters an unknown error, it can be returned to a stable, reliable state. One approach is to record a log file of all host node operations and persist it to disk. If the host node then fails and all the data in its memory is lost, it can still return to its pre-failure state on the next startup by replaying the recorded log file.
Under this scheme, the host node operates as follows. Before each operation, it records the operation in the log file. Only after the record succeeds does it execute the operation, that is, update its in-memory data accordingly. Recovery after a failure works as follows: the log file is read, and the in-memory data is modified in turn according to each recorded host node operation. Replaying a log of write operations is simple to implement, but the recovery process takes a very long time.
After extensive research, the inventors found a better approach. While the host node's operations are being logged, image files of its in-memory data can be taken at intervals. Each image file characterizes the host node's status data at the corresponding moment. When the host node fails, it loads the most recent image file and applies the operations recorded in the log file after that image's moment, which recovers the host node. Only the operations logged after the image need to be replayed, not the entire history. Recovery time is therefore significantly shorter than with log replay alone.
Based on this idea, the invention proposes a fault recovery scheme for the host node in a distributed system, which can be implemented in the distributed system shown in Fig. 1. As shown in Fig. 1, the distributed system can include a host node for scheduling tasks and managing system state, and multiple slave nodes for running scheduled tasks. The host node and slave nodes can be deployed on servers. The host node can run on a server separate from the slave nodes, or on the same server as one of them. In a preferred embodiment, different nodes are deployed on different servers. The distributed system in Fig. 1 consists of one host node and multiple slave nodes. However, the distributed system of the invention can also include multiple host nodes, as well as devices other than host and slave nodes, such as a backup host node or a disaster recovery database.
The specific flow by which the distributed system of the invention implements the fault recovery scheme is described in detail below. Fig. 2 is a schematic flowchart of a fault recovery method according to an embodiment of the invention. The method shown in Fig. 2 can be implemented by the distributed system shown in Fig. 1, specifically by its host node.
Referring to Fig. 2, in step S210, a metadata image recording the scheduling information and system state on the host node at a certain moment is obtained and saved.
In a client/server distributed system, a host node crash can make the whole system unavailable. Because of this, the host node generally does not run specific tasks directly. It is responsible only for keeping the distributed system running and for dispatching tasks; the specific tasks are executed by slave nodes. The host node mainly parses task requests, allocates resources, and locates target data or nodes from metadata. Each specific task is executed by a slave node that the host node designates. Metadata is data that describes data. In the invention, it refers specifically to the data that the host node is responsible for saving and managing. Because the host node schedules tasks and manages system state, metadata can mean data recording the scheduling information and system state on the host node at a given moment. In a Hadoop distributed system, for example, metadata can include system description data, system state data, and current task scheduling and status data. In a distributed storage system, metadata can be data describing the status information of user data, such as storage locations.
The metadata image obtained at a given moment can be a mapping of the host node's in-memory state at that moment. The in-memory state at that moment can therefore be directly obtained and saved as the metadata image. In a concrete implementation, the image can be obtained by means such as a snapshot (disk snapshot) or a dump (file system backup).
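A metadata image that is simply a copy of the in-memory state could be persisted roughly as below. This sketch assumes JSON-serializable metadata with string keys and a hypothetical `image-<seq>.json` naming scheme, where `seq` marks the moment of the image. Writing to a temporary file and then renaming it with `os.replace` is a common technique for avoiding half-written images. The description does not specify it.

```python
import json
import os
import tempfile


def save_metadata_image(state: dict, seq: int, directory: str) -> str:
    """Persist a copy of the host node's in-memory metadata as an image file.

    `seq` identifies the moment of the image (for example, the number of
    operations applied so far); the file naming scheme is hypothetical.
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"image-{seq:012d}.json")
    # Write to a temporary file first, then rename it atomically, so that a
    # crash mid-write never leaves a partial image under the final name.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"seq": seq, "state": state}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
    return path


def load_metadata_image(path: str) -> tuple[int, dict]:
    """Read an image back as (seq, state)."""
    with open(path) as f:
        image = json.load(f)
    return image["seq"], image["state"]
```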
The metadata image can be obtained by the host node, by one or more slave nodes, or by a backup host node in the distributed system. The obtained image can be persisted to a local disk or a distributed file system, for example to a disaster recovery database.
In an alternative embodiment, the host node schedules tasks concurrently by group. In that case the obtained metadata images can be images for multiple groups. The images can then be stored by task group, with images belonging to the same group stored under the same directory. During later recovery, the corresponding images can then be organized efficiently by group.
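Storing images by task group can be as simple as one directory per group. The layout and the functions `image_path` and `images_for_group` are assumptions for illustration, not taken from the description. Zero-padding the sequence number makes lexical order match image order.

```python
import os


def image_path(root: str, group: str, seq: int) -> str:
    """Images of one task group share a directory: <root>/<group>/image-<seq>.json."""
    return os.path.join(root, group, f"image-{seq:012d}.json")


def images_for_group(root: str, group: str) -> list[str]:
    """Return one group's image paths, oldest first (zero padding keeps sort order)."""
    group_dir = os.path.join(root, group)
    if not os.path.isdir(group_dir):
        return []
    names = sorted(
        n for n in os.listdir(group_dir)
        if n.startswith("image-") and n.endswith(".json")
    )
    return [os.path.join(group_dir, n) for n in names]
```

To recover a single group, a host node following this layout would only need to list that group's directory and take the last entry.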
In step S220, the host node can obtain and save a redo log recording all of its operations after that moment. An operation here can be one the host node performs on metadata or on its in-memory data.
Each operation performed by the host node can be recorded in a redo log, which stores the host node's operations sequentially. Each operation the host node is about to perform is first recorded in the redo log and persisted, and only then executed. If the host node errs while executing the operation, the data can be recovered from the operation recorded in the redo log. Suppose instead that an operation were executed first and recorded afterwards. An error during its execution, or before it was recorded and saved, would then make the operation unrecoverable; it could only be redone.
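The log-then-execute rule, and the rule of answering a slave node only after the record is durable, can be sketched as follows. `RedoLog`, `Master.handle_request`, the JSON-lines record format and the request/response shapes are all hypothetical. `os.fsync` pushes each record to stable storage before the host node proceeds.

```python
import json
import os


class RedoLog:
    """Append-only redo log: one JSON record per line (format is an assumption)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._f = open(path, "a", encoding="utf-8")

    def append(self, record: dict) -> None:
        self._f.write(json.dumps(record) + "\n")
        self._f.flush()
        os.fsync(self._f.fileno())  # durable before the caller continues

    def read_all(self) -> list[dict]:
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def close(self) -> None:
        self._f.close()


class Master:
    """Host node that logs, then applies, then replies."""

    def __init__(self, log: RedoLog) -> None:
        self.log = log
        self.state: dict = {}
        self.seq = 0

    def handle_request(self, key: str, value: object) -> dict:
        self.seq += 1
        record = {"seq": self.seq, "op": "set", "key": key, "value": value}
        self.log.append(record)              # 1. persist the operation
        self.state[key] = value              # 2. apply it in memory
        return {"ok": True, "seq": self.seq}  # 3. only now answer the slave
```

A crash between steps 1 and 3 leaves a record that recovery can replay. A crash before step 1 loses the operation, and it can only be redone, as the text notes.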
For example, a slave node may request a task (such as a computing or storage task) from the host node. The host node first records in the redo log the operation of issuing the target data to the slave node. Only after that record has been persisted does the host node respond by sending the target data to the slave node. In other words, the host node responds to a slave node's request only after the corresponding operation has been recorded in the redo log and persistently stored.
In step S230, during fault recovery, the metadata image and its corresponding redo log are called to perform fault recovery.
As described above, the metadata image can be regarded as a mapping of the host node's in-memory state at a given moment, and the redo log records all of the host node's operations. When the host node fails, recovery uses two things: the metadata image obtained before the failure, and the operations recorded in the redo log from the image's moment until the failure. Together they return the host node to its pre-failure state. Take a redo log stored in a file system as an example. After the host node restarts, it first traverses the metadata image directory in the file system, finds the most recent image, and loads it into memory. It then loads the redo log that follows that image and starts replaying it. When loading is complete, the whole recovery process is finished.
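The restart procedure just described (find the newest image, load it, replay the log that follows it) might look like the sketch below. It assumes two hypothetical formats. Each image is an `image-<seq>.json` file holding `{"seq", "state"}`, and the redo log is a JSON-lines file in which every record carries a sequence number. The sketch selects "the redo log after the image" by replaying only records whose sequence number exceeds the image's.

```python
import json
import os


def apply_record(state: dict, rec: dict) -> None:
    """Apply one redo-log record (hypothetical "set"/"delete" format)."""
    if rec["op"] == "set":
        state[rec["key"]] = rec["value"]
    elif rec["op"] == "delete":
        state.pop(rec["key"], None)
    else:
        raise ValueError(f"unknown op {rec['op']!r}")


def recover(image_dir: str, redo_log_path: str) -> dict:
    """Restart-time recovery: newest image plus replay of the log after it."""
    # 1. Traverse the image directory and pick the most recent image.
    names = sorted(
        n for n in os.listdir(image_dir)
        if n.startswith("image-") and n.endswith(".json")
    )
    if names:
        with open(os.path.join(image_dir, names[-1]), encoding="utf-8") as f:
            image = json.load(f)
        base_seq, state = image["seq"], image["state"]
    else:
        base_seq, state = 0, {}
    # 2. Replay, in order, only the operations logged after that image.
    if os.path.exists(redo_log_path):
        with open(redo_log_path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    rec = json.loads(line)
                    if rec["seq"] > base_seq:
                        apply_record(state, rec)
    return state
```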
In an alternative embodiment, several metadata images corresponding to different moments are saved. While the redo log is being recorded, a new image can be taken periodically or whenever a predetermined trigger condition is met. The condition can be, for example, a parameter reaching a predetermined value, a predetermined interval elapsing, or an external trigger command. For instance, an image can be taken each time a predetermined number of operations has been recorded in the redo log, or at fixed time intervals.
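The trigger conditions listed above (every N logged operations, every T seconds, or an external command) could be combined in a small policy object. The class name `ImageTrigger` and its thresholds are hypothetical. The clock can be injected, so the policy can be tested without waiting.

```python
import time
from typing import Callable


class ImageTrigger:
    """Decide when to take the next metadata image (thresholds are hypothetical)."""

    def __init__(self, every_ops: int, every_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.every_ops = every_ops
        self.every_seconds = every_seconds
        self.clock = clock
        self.ops_since_image = 0
        self.last_image_at = clock()
        self.external_request = False

    def record_op(self) -> None:
        """Call once per operation appended to the redo log."""
        self.ops_since_image += 1

    def request(self) -> None:
        """An external command asks for an image now."""
        self.external_request = True

    def should_take_image(self) -> bool:
        return (self.external_request
                or self.ops_since_image >= self.every_ops
                or self.clock() - self.last_image_at >= self.every_seconds)

    def image_taken(self) -> None:
        """Reset all conditions after an image has been saved."""
        self.ops_since_image = 0
        self.last_image_at = self.clock()
        self.external_request = False
```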
Further, while host node operations are being recorded, a separate redo log can be kept for each of several moments, that is, one per metadata image. Fig. 3 is a schematic diagram of the principle of continuously saving multiple metadata image files and their corresponding redo logs.
Referring to Fig. 3, metadata image 1 of the host node can first be obtained at time t1. Host node operations between t1 and t2 are recorded in redo log 1. Metadata image 2 can then be obtained at t2, and operations between t2 and t3 recorded in redo log 2, and so on. This yields metadata images for t1, t2 and t3, each with its own redo log.
Suppose the host node crashes at t4. During fault recovery, the host node can first call the newest metadata image (the image at t3) and its redo log (covering t3 to t4). If the newest image or redo log is unavailable, it can fall back to the next newest image (the image at t2) and its redo log (covering t2 to t3). It continues pushing back in this way until usable data files are found. Keeping several memory images from different moments, together with their redo logs, thus improves fault tolerance during recovery.
In other words, the scheme can use certain conditions or commands to trigger acquisition and storage of a metadata image (for example, saving the state at t3). Meanwhile, it keeps recording the redo log, that is, all operations after t3. If a failure occurs at t4, the host node first restores the state at t3 and then replays all operations after t3. This quickly returns it to its state at t4.
Consider obtaining metadata image 1 at t1 in Fig. 3. The host node's service is usually not stopped while an image is taken, and taking the image takes some time. Metadata image 1 may therefore already contain some operations recorded in redo log 1 after t1. If the host node then fails at t2 and is recovered from metadata image 1 and redo log 1, those operations are applied twice: once through the image and once through replay. The recovered state may then be inconsistent with the state before the failure.
In an alternative embodiment, the times of the operations being written to the redo log are therefore recorded in real time while a metadata image is being taken. After the image has been taken, the operations it already reflects can be removed from the redo log. The image then contains no operations that also appear in the following redo log, and the image and its redo log are strictly aligned in time.
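One way to realize this alignment is to give every logged operation a sequence number. With each image, the sequence number of the last operation it already reflects is captured, and those entries are then dropped from the log. Here sequence numbers stand in for the operation times the text mentions. `Master`, `take_image` and `trim_log` are hypothetical names. The lock only makes the copy and its sequence number consistent in this toy model. Without trimming, a non-idempotent operation such as a counter increment would be applied once through the image and again through replay.

```python
import threading


class Master:
    """Toy host node whose log entries carry sequence numbers."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.state: dict = {}
        self.applied_seq = 0  # seq of the last operation applied in memory
        self.redo_log: list[tuple[int, str, object]] = []  # (seq, key, value)

    def execute(self, key: str, value: object) -> None:
        with self.lock:
            seq = self.applied_seq + 1
            self.redo_log.append((seq, key, value))  # log first
            self.state[key] = value                   # then apply
            self.applied_seq = seq

    def take_image(self) -> tuple[int, dict]:
        """Copy the state together with the seq of the last op it contains."""
        with self.lock:
            return self.applied_seq, dict(self.state)

    def trim_log(self, image_seq: int) -> int:
        """Drop log entries already reflected in the image; return how many."""
        with self.lock:
            before = len(self.redo_log)
            self.redo_log = [e for e in self.redo_log if e[0] > image_seq]
            return before - len(self.redo_log)
```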
The fault recovery method of the invention has now been described in detail with reference to Figs. 2 and 3. The scheme can also be implemented by a fault recovery device. Fig. 4 is a structural block diagram of a fault recovery device according to an embodiment of the invention. The functional modules of fault recovery device 400 can be implemented in hardware, software, or a combination of both that realizes the principles of the invention. Those skilled in the art will understand that the functional modules described in Fig. 4 can be combined or divided into submodules to realize these principles. The description herein therefore supports any possible combination, division, or further definition of the functional modules described.
Fault recovery device 400 shown in Fig. 4 can be used to implement the fault recovery method shown in Fig. 2. The functional modules that device 400 can have, and the operations each can perform, are described only briefly below. For details, refer to the description of Fig. 2 above, which is not repeated here. Note that fault recovery device 400 can be the host node itself or a backup host node.
As shown in Fig. 4, the fault recovery device of the invention can include an image acquisition unit 410, a redo log acquisition unit 420, and a fault recovery unit 430:
- image acquisition unit 410 obtains and saves a metadata image recording the scheduling information and system state on the host node at a certain moment;
- redo log acquisition unit 420 obtains and saves a redo log recording all operations of the host node after that moment;
- fault recovery unit 430 calls the metadata image and its corresponding redo log during fault recovery to perform fault recovery.
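The three units of device 400 could be represented as three cooperating classes, as in the in-memory sketch below. The class names map to units 410, 420 and 430 only for illustration. The `(seq, key, value)` log entries and the rule of recovering from the most recently saved image are assumptions, and persistence is omitted.

```python
class ImageAcquisitionUnit:
    """Stands in for unit 410: keeps (seq, state copy) images."""

    def __init__(self) -> None:
        self.saved: list[tuple[int, dict]] = []

    def capture(self, seq: int, state: dict) -> None:
        self.saved.append((seq, dict(state)))  # copy, not a live reference


class RedoLogUnit:
    """Stands in for unit 420: records every operation with its seq."""

    def __init__(self) -> None:
        self.entries: list[tuple[int, str, object]] = []

    def record(self, seq: int, key: str, value: object) -> None:
        self.entries.append((seq, key, value))


class FaultRecoveryUnit:
    """Stands in for unit 430: newest image plus the log entries after it."""

    def __init__(self, images: ImageAcquisitionUnit, log: RedoLogUnit) -> None:
        self.images = images
        self.log = log

    def recover(self) -> dict:
        seq, image = self.images.saved[-1] if self.images.saved else (0, {})
        state = dict(image)
        for s, key, value in self.log.entries:
            if s > seq:
                state[key] = value
        return state
```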
Preferably, image acquisition unit 410 obtains and saves metadata images when triggered by the host node, the device, and/or an external command. It can directly obtain and save the host node's in-memory state at a given moment as the metadata image. It can also store metadata images by task group.
Preferably, the host node responds to a new request from a slave node only after redo log acquisition unit 420 has recorded each of its operations in the redo log and stored it.
Preferably, image acquisition unit 410 continuously obtains and saves metadata images of the host node at multiple different moments. Redo log acquisition unit 420 continuously obtains and saves a redo log for each of those moments. In this case, fault recovery unit 430 calls the newest metadata image and its redo log during fault recovery. If the newest image and/or its redo log is unavailable, it uses the data of the most recent moment for which both the image and its redo log are available.
The distributed system and its fault recovery method and device according to the invention have been described in detail above with reference to the drawings.
In addition, the method according to the invention can be implemented as a computer program or computer program product. It includes computer program code instructions for performing the steps defined in the above method of the invention.
Alternatively, the invention can be implemented as a non-transitory machine-readable storage medium (also called a computer-readable or machine-readable storage medium). The medium stores executable code (a computer program, or computer instruction code). When that code is executed by a processor of an electronic device (such as a computing device or server), it causes the processor to perform the steps of the above method according to the invention.
Those skilled in the art will also appreciate that the various illustrative logical blocks, modules, circuits and algorithm steps described in connection with the disclosure may be implemented as electronic hardware, computer software, or a combination of both.
The flowcharts and block diagrams in the drawings show possible architectures, functions and operations of systems and methods according to multiple embodiments of the invention. In this regard, each block in a flowchart or block diagram can represent a module, program segment or part of code. That module, segment or part contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in a different order from that shown in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in reverse order, depending on the functions involved. Each block in the block diagrams and/or flowcharts, and any combination of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations. It can also be implemented by a combination of dedicated hardware and computer instructions.
Embodiments of the invention have been described above. The description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and changes will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or their improvements over technologies in the market. They were also chosen to help other persons of ordinary skill in the art understand the embodiments disclosed herein.
Claims (20)
1. A distributed system, including a host node for scheduling tasks and managing system state and multiple slave nodes for running scheduled tasks, wherein:
one or more of the slave nodes and/or the host node obtain and save a metadata image recording the scheduling information and system state on the host node at a certain moment;
the host node obtains and saves a redo log recording all operations of the host node after that moment; and
the host node calls the metadata image and its corresponding redo log during fault recovery to perform fault recovery.
2. The distributed system of claim 1, wherein the one or more slave nodes and/or the host node perform the operation of acquiring and saving the metadata mirror image when triggered by the host node and/or an external command.
3. The distributed system of claim 1, wherein the host node responds to a request from a slave node only after each of its operations has been recorded in the redo log and stored.
4. The distributed system of claim 1, wherein the one or more slave nodes and/or the host node continuously acquire and save a plurality of metadata mirror images of the host node at different moments, and
the host node continuously acquires and saves a plurality of redo logs respectively corresponding to the different moments.
5. The distributed system of claim 4, wherein, during fault recovery, the host node invokes the newest metadata mirror image and its corresponding redo log to perform fault recovery.
6. The distributed system of claim 4, wherein, when the newest metadata mirror image and/or its corresponding redo log is unavailable, the host node performs fault recovery using the data of the most recent moment at which both the metadata mirror image and its corresponding redo log are available.
7. The distributed system of claim 1, wherein the one or more slave nodes and/or the host node directly acquire and save the in-memory state of the host node at a certain moment as the metadata mirror image.
8. The distributed system of claim 1, wherein the metadata mirror image is stored grouped by task.
9. A fault recovery device for a distributed system, the distributed system including a host node for scheduling tasks and managing system state and a plurality of slave nodes for running tasks, the device being used to perform fault recovery when the host node fails and comprising:
a mirror image acquiring unit, configured to acquire and save a metadata mirror image recording the scheduling information and system state on the host node at a certain moment;
a redo log acquiring unit, configured to acquire and save a redo log recording all operations of the host node after that moment; and
a fault recovery unit, configured to invoke, during fault recovery, the metadata mirror image and its corresponding redo log to perform fault recovery.
10. The device of claim 9, wherein the mirror image acquiring unit performs the operation of acquiring and saving the metadata mirror image when triggered by the host node, the device, and/or an external command.
11. The device of claim 9, wherein the host node responds to a request from the slave node only after each of its operations has been recorded in the redo log by the redo log acquiring unit and stored.
12. The device of claim 9, wherein the mirror image acquiring unit continuously acquires and saves a plurality of metadata mirror images of the host node at different moments, and
the redo log acquiring unit continuously acquires and saves a plurality of redo logs respectively corresponding to the different moments.
13. The device of claim 12, wherein, during fault recovery, the fault recovery unit invokes the newest metadata mirror image and its corresponding redo log to perform fault recovery.
14. The device of claim 12, wherein, when the newest metadata mirror image and/or its corresponding redo log is unavailable, the fault recovery unit performs fault recovery using the data of the most recent moment at which both the metadata mirror image and its corresponding redo log are available.
15. The device of claim 9, wherein the mirror image acquiring unit directly acquires and saves the in-memory state of the host node at a certain moment as the metadata mirror image.
16. The device of claim 9, wherein the mirror image acquiring unit stores the metadata mirror image grouped by task.
17. A fault recovery method for a distributed system, the distributed system including a plurality of slave nodes for running tasks, the method comprising:
acquiring and saving a metadata mirror image recording the scheduling information and system state at a certain moment;
acquiring and saving a redo log recording all scheduling operations after that moment; and
during fault recovery, invoking the metadata mirror image and its corresponding redo log to perform fault recovery.
18. The method of claim 17, wherein:
a plurality of metadata mirror images of a host node at different moments are continuously acquired and saved, and
a plurality of redo logs respectively corresponding to the different moments are continuously acquired and saved.
19. The method of claim 18, wherein invoking the metadata mirror image and its corresponding redo log during fault recovery comprises:
invoking, during fault recovery, the newest metadata mirror image and its corresponding redo log to perform fault recovery; and
when the newest metadata mirror image and/or its corresponding redo log is unavailable, performing fault recovery using the data of the most recent moment at which both the metadata mirror image and its corresponding redo log are available.
20. The method of claim 17, wherein the in-memory state of a host node at a certain moment is directly acquired and saved as the metadata mirror image.
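The recovery scheme in the claims (a metadata mirror image plus a redo log of every later operation, acknowledging a slave node only after logging as in claims 3 and 11, and falling back to the most recent usable image/log pair as in claims 6, 14 and 19) can be illustrated with a short Python sketch. It is one possible reading of the claims, not the patentee's implementation: the `{"task", "status"}` operation format, the JSON-serialized image, the integer `moment` counter, and the names `HostNode`, `Checkpoint`, `apply_op` and `recover` are all hypothetical, and redo logs are kept in memory rather than on stable storage. To follow the claim wording literally, each retained image keeps its own redo log of all later operations; a production system would more likely share one append-only log and record an offset per image.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical metadata model: task id -> task status.
State = Dict[str, str]


@dataclass
class Checkpoint:
    """A metadata mirror image taken at `moment` and the redo log of every later operation."""
    moment: int
    image: Optional[str]  # JSON snapshot; None simulates a lost or corrupt image
    redo_log: Optional[List[dict]] = field(default_factory=list)  # None simulates a lost log


def apply_op(state: State, op: dict) -> None:
    """Apply one (hypothetical) scheduling operation to the metadata."""
    state[op["task"]] = op["status"]


class HostNode:
    """Toy host node; redo logs are kept in memory instead of on stable storage."""

    def __init__(self) -> None:
        self.state: State = {}
        self.moment = 0
        self.checkpoints: List[Checkpoint] = []
        self.take_image()  # baseline image of the empty state

    def take_image(self) -> None:
        # Serialize the in-memory state directly as the metadata mirror image.
        image = json.dumps(self.state, sort_keys=True)
        self.checkpoints.append(Checkpoint(self.moment, image))

    def handle(self, op: dict) -> str:
        # 1. Record the operation in each retained image's redo log, so every
        #    log holds all operations after its own moment.
        for cp in self.checkpoints:
            if cp.redo_log is not None:
                cp.redo_log.append(op)
        # 2. Apply it to the live state.
        apply_op(self.state, op)
        self.moment += 1
        # 3. Only then acknowledge the slave node's request.
        return "ack"


def recover(checkpoints: List[Checkpoint]) -> State:
    """Replay the newest checkpoint whose image and redo log are both usable."""
    for cp in reversed(checkpoints):
        if cp.image is None or cp.redo_log is None:
            continue  # fall back to the next older moment
        state: State = json.loads(cp.image)
        for op in cp.redo_log:
            apply_op(state, op)
        return state
    raise RuntimeError("no usable metadata mirror image / redo log pair")


if __name__ == "__main__":
    host = HostNode()
    host.handle({"task": "t1", "status": "running"})
    host.take_image()
    host.handle({"task": "t1", "status": "done"})
    host.handle({"task": "t2", "status": "running"})
    host.checkpoints[-1].image = None  # simulate losing the newest image
    print(recover(host.checkpoints))  # {'t1': 'done', 't2': 'running'}
```

In the demo the newest image is discarded, so `recover` falls back to the baseline image and replays its longer redo log, arriving at the same state the live host node held.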
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710630823.6A CN107357688B (en) | 2017-07-28 | 2017-07-28 | Distributed system and fault recovery method and device thereof |
PCT/CN2018/097262 WO2019020081A1 (en) | 2017-07-28 | 2018-07-26 | Distributed system and fault recovery method and apparatus thereof, product, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710630823.6A CN107357688B (en) | 2017-07-28 | 2017-07-28 | Distributed system and fault recovery method and device thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107357688A | 2017-11-17 |
CN107357688B | 2020-06-12 |
Family ID: 60285161
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710630823.6A Active CN107357688B (en) | 2017-07-28 | 2017-07-28 | Distributed system and fault recovery method and device thereof |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107357688B (en) |
WO (1) | WO2019020081A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8401998B2 (en) * | 2010-09-02 | 2013-03-19 | Microsoft Corporation | Mirroring file data |
CN103294701A (en) * | 2012-02-24 | 2013-09-11 | 联想(北京)有限公司 | Distributed file system and data processing method |
CN104216802A (en) * | 2014-09-25 | 2014-12-17 | 北京金山安全软件有限公司 | Memory database recovery method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107357688B (en) * | 2017-07-28 | 2020-06-12 | 广东神马搜索科技有限公司 | Distributed system and fault recovery method and device thereof |
- 2017-07-28: CN application CN201710630823.6A, patent CN107357688B (Active)
- 2018-07-26: WO application PCT/CN2018/097262, publication WO2019020081A1 (Application Filing)
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019020081A1 (en) * | 2017-07-28 | 2019-01-31 | 广东神马搜索科技有限公司 | Distributed system and fault recovery method and apparatus thereof, product, and storage medium |
CN108390771A (en) * | 2018-01-25 | 2018-08-10 | 中国银联股份有限公司 | A kind of network topology method for reconstructing and device |
CN108390771B (en) * | 2018-01-25 | 2021-04-16 | 中国银联股份有限公司 | Network topology reconstruction method and device |
CN108427728A (en) * | 2018-02-13 | 2018-08-21 | 百度在线网络技术(北京)有限公司 | Management method, equipment and the computer-readable medium of metadata |
CN109189480A (en) * | 2018-07-02 | 2019-01-11 | 新华三技术有限公司成都分公司 | File system starts method and device |
CN109189480B (en) * | 2018-07-02 | 2021-11-09 | 新华三技术有限公司成都分公司 | File system starting method and device |
CN109144792A (en) * | 2018-10-08 | 2019-01-04 | 郑州云海信息技术有限公司 | Data reconstruction method, device and system and computer readable storage medium |
CN109656911A (en) * | 2018-12-11 | 2019-04-19 | 江苏瑞中数据股份有限公司 | Distributed variable-frequencypump Database Systems and its data processing method |
CN111104226A (en) * | 2019-12-25 | 2020-05-05 | 东北大学 | Intelligent management system and method for multi-tenant service resources |
CN111104226B (en) * | 2019-12-25 | 2024-01-26 | 东北大学 | Intelligent management system and method for multi-tenant service resources |
CN112379977A (en) * | 2020-07-10 | 2021-02-19 | 中国航空工业集团公司西安飞行自动控制研究所 | Task-level fault processing method based on time triggering |
CN111880969A (en) * | 2020-07-30 | 2020-11-03 | 上海达梦数据库有限公司 | Storage node recovery method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2019020081A1 (en) | 2019-01-31 |
CN107357688B (en) | 2020-06-12 |
Similar Documents
Publication | Title |
---|---|
CN107357688A (en) | Distributed system and its fault recovery method and device |
WO2019154394A1 (en) | Distributed database cluster system, data synchronization method and storage medium |
CN109814998A (en) | A kind of method and device of multi-process task schedule |
US9984140B1 (en) | Lease based leader election system |
CN106843750B (en) | Distributed storage system |
US9251233B2 (en) | Merging an out of synchronization indicator and a change recording indicator in response to a failure in consistency group formation |
CN107426265A (en) | The synchronous method and apparatus of data consistency |
CN110807064B (en) | Data recovery device in RAC distributed database cluster system |
CN110377395A (en) | A kind of Pod moving method in Kubernetes cluster |
CN102158540A (en) | System and method for realizing distributed database |
CN103207867A (en) | Method for processing data blocks, method for initiating recovery operation and nodes |
CN104035836A (en) | Automatic disaster tolerance recovery method and system in cluster retrieval platform |
CN107329859B (en) | Data protection method and storage device |
US9348841B2 (en) | Transaction processing method and system |
KR20170042298A (en) | Dynamic load-based merging |
WO2014080492A1 (en) | Computer system, cluster management method, and management computer |
CN107451172A (en) | Method of data synchronization and equipment for edition management system |
CN104793981B (en) | A kind of online snapshot management method and device of cluster virtual machine |
CN108762982B (en) | A kind of database restoring method, apparatus and system |
CN110597655A (en) | Fast predictive restoration method for coupling migration and erasure code-based reconstruction and implementation |
CN109842500B (en) | Scheduling method and system, working node and monitoring node |
CN109361777A (en) | Synchronous method, synchronization system and the relevant apparatus of distributed type assemblies node state |
CN110121694B (en) | Log management method, server and database system |
JP7215971B2 (en) | METHOD AND APPARATUS FOR PROCESSING DATA LOCATION IN STORAGE DEVICE, COMPUTER DEVICE AND COMPUTER-READABLE STORAGE MEDIUM |
US11533391B2 (en) | State replication, allocation and failover in stream processing |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20200811. Address after: Room 508, 5th Floor, Building 4, No. 699 Wangshang Road, Changhe Street, Binjiang District, Hangzhou, Zhejiang Province 310052. Patentee after: Alibaba (China) Co.,Ltd. Address before: Unit 01 (self-numbered), 13th Floor, Tower B, Guangdian Pingyun Plaza, No. 163 Pingyun Road, Huangpu Avenue West, Tianhe District, Guangzhou, Guangdong Province 510627. Patentee before: Guangdong Shenma Search Technology Co.,Ltd. |