CN102841840B - Message log recovery method based on message reordering and message number checking - Google Patents

Publication number
CN102841840B
Authority
CN
China
Prior art keywords
message
messages
lsn
recovery
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210239710.0A
Other languages
Chinese (zh)
Other versions
CN102841840A (en
Inventor
高胜法
蔡静
冯振
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN201210239710.0A priority Critical patent/CN102841840B/en
Publication of CN102841840A publication Critical patent/CN102841840A/en
Application granted granted Critical
Publication of CN102841840B publication Critical patent/CN102841840B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a message log recovery method based on message reordering and message number checking. The invention uses a message reordering method: when a sending process sends a message, it indirectly marks the reception order of that message with an improved logical clock and keeps this order in the local storage of the sending process. When a message-receiving process fails, the recovery process first obtains from the sending processes the messages that were saved and those that were not saved to the log file, together with their logical clocks, and then reorders the messages not saved to the log file according to those logical clocks. Finally, the ordered messages are resent to the failed process, which receives and processes them again, thereby replaying the messages. This both improves the performance of processes during failure-free operation and simplifies the recovery algorithm when a process fails.

Description

Message log recovery method based on message reordering and message number inspection
Technical Field
The invention relates to a distributed system, in particular to a message log recovery method based on message reordering and message number inspection.
Background
Message log based recovery protocols rely on the piecewise deterministic (PWD) assumption. Under this assumption, the execution of a process is divided into state intervals, each of which starts with the execution of a nondeterministic event and is followed by the execution of several deterministic events. Under the PWD assumption, a message receive event is nondeterministic, while message send events and process-internal events are deterministic; a state interval therefore typically starts with a message receive event followed by several internal events and message send events.
If a state interval of a process depends on a nondeterministic event (e.g., a message receive event) that cannot be regenerated during recovery, the process is called an orphan process. Under the PWD assumption, if process p receives message m_i and then sends message m_j to process q, the state interval of q after receiving m_j depends on the state interval of p after receiving m_i. If p, after sending m_j, fails before saving the necessary information of the received message m_i to a log file, the receive event of m_i cannot be recovered during recovery; process q, which depends on p's receipt of m_i, becomes an orphan process. All log recovery protocols require that the global state of the system contain no orphan processes after rollback recovery.
Traditional pessimistic and optimistic message log recovery protocols must balance two conflicting objectives: either a process saves the necessary information to the log file asynchronously with inter-process communication, which improves failure-free performance but may create orphan processes; or a process saves all necessary information during failure-free operation so that a failed process can be recovered quickly, in which case there are no orphan processes.
The necessary information of a message can be represented as a quadruple < m.source, m.ssn, m.dest, m.rsn >, wherein m.source represents the sending process id of the message m, m.ssn represents the sending order of the message m, m.dest represents the receiving process id of the message m, and m.rsn represents the receiving order of the message m.
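As a minimal sketch, the quadruple can be modelled as a small record type. The class name `Determinant`, the helper `is_recoverable`, and the use of `None` for a reception order that has not yet been logged are illustrative assumptions, not part of the patent text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Determinant:
    """Necessary information of a message m (hypothetical modelling)."""
    source: int          # m.source: id of the sending process
    ssn: int             # m.ssn: sending order of the message
    dest: int            # m.dest: id of the receiving process
    rsn: Optional[int]   # m.rsn: receiving order; None is an assumed marker for "not yet logged"


def is_recoverable(d: Determinant) -> bool:
    # The receive event can be replayed directly only if its order was logged.
    return d.rsn is not None
```

A record such as `Determinant(source=1, ssn=7, dest=2, rsn=None)` then stands for a message whose reception order would be lost if the receiver failed at that moment.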
Currently, two main message logging protocols exist in distributed systems, namely optimistic message logging protocol and pessimistic message logging protocol.
Optimistic message logging: a process saves the necessary information of messages to the log file asynchronously with inter-process communication, so orphan processes may exist in the system. In such protocols, any process p is allowed to send a message m_j to another process q before the necessary information (<m.source, m.ssn, m.dest, m.rsn>) of a received message m_i has been saved to the log file. When process p fails, the necessary information of m_i may not have been saved to the log file while p has already sent m_j to q, so process q may become an orphan process. Under an optimistic protocol, a process does not need to synchronize logging of the necessary information with inter-process communication, so it performs well during failure-free operation; however, a complex recovery scheme is required to eliminate orphan processes during failure recovery.
Pessimistic message logging: the saving and sending of messages are synchronized, and a process is allowed to send a message only after the necessary information (<m.source, m.ssn, m.dest, m.rsn>) of all messages it has received has been written to stable storage. During failure recovery, because pessimistic message logging cannot produce orphan processes, the failed process only needs to reprocess the previously processed and saved messages to restore its state. However, during failure-free operation the saving of necessary information must be kept synchronized with sending, which greatly reduces the failure-free performance of the system.
In an optimistic message log protocol, the necessary information <m.source, m.ssn, m.dest, m.rsn> of some messages may not have been saved to the log file when a process fails. Under conventional protocols the reception order m.rsn of those messages is then unrecoverable, so the processes of the system have to roll back in order to resend and receive the unsaved messages. In other words, because saving to the log file is asynchronous with process communication, an optimistic message log can lose the message reception order of the receiving process.
Comparison with existing message log recovery methods:
Different message log recovery protocols are evaluated with different performance indicators. The following five indicators can be used to evaluate the performance of a recovery protocol:
1. ckpt: the number of checkpoints required per process.
2. Add: the amount of extra information carried by an application message.
3. Num: the number of system messages that must be exchanged to recover each failed process.
4. Rol: the rollback distance of a process.
5. Roll: the number of processes that roll back during recovery.
For brevity, the message log recovery protocol proposed in this application, based on message number checking and message reordering, is denoted MNCMR. In the MNCMR protocol, each process only needs to save one checkpoint asynchronously, so ckpt equals 1. The data carried by each application message are j and LC_j, so Add is 2. For each failure, Num = 2n + 2w system messages must be exchanged, where n is the total number of processes in the system and w is the number of messages that could not be saved to the message log file because of the failure. When one or more processes fail, only the failed processes roll back while the failure-free processes continue to execute, so the MNCMR protocol has the smallest rollback distance. The number of processes that roll back during recovery under the MNCMR protocol equals the number of failed processes, which is the same as for a pessimistic protocol. In addition, the MNCMR protocol has the advantages of both pessimistic and optimistic protocols: during failure-free execution, each process saves messages to the log file asynchronously with inter-process communication, as in an optimistic protocol, so failure-free performance is good; in the failure recovery stage, the recovery algorithm is as simple as that of a pessimistic protocol. Furthermore, unlike existing protocols, failure-free processes under the MNCMR protocol neither roll back nor stop and wait when another process fails, but continue to execute; this resembles a forward recovery algorithm and gives the processes in the system higher operating efficiency.
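The Num indicator quoted above can be worked through with a short calculation. The function name and the sample values of n and w below are hypothetical; only the formula Num = 2n + 2w comes from the text.

```python
def recovery_system_messages(n: int, w: int) -> int:
    """Num = 2n + 2w: system messages exchanged to recover one failed process.

    n: total number of processes in the system
    w: messages that could not be saved to the log file because of the failure
    """
    return 2 * n + 2 * w


# Hypothetical system: 8 processes, 5 messages left unlogged by the failure.
example = recovery_system_messages(n=8, w=5)   # 2*8 + 2*5 = 26
```

The count grows linearly in both the number of processes and the number of unlogged messages.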
Since the 1980s, a large number of message log recovery protocols have been published in journals at home and abroad; several typical protocols are selected below for comparison with the MNCMR protocol. Sistla and Welch [1] proposed two optimistic message log recovery protocols: in one, each transmitted message carries a transitive dependency vector (denoted prasad.1); in the other, each transmitted application message carries only the current state interval value of the sending process (denoted prasad.2). In the prasad.1 protocol, each application message requires O(n) extra information and each failure requires the exchange of O(n^2) system messages. In the prasad.2 protocol, each application message requires O(1) extra information and each failure requires the exchange of O(n^3) system messages. In the optimistic message logging protocol of Strom and Yemini [2], each sent application message carries a transitive dependency vector with n components, where n is the number of processes in the system. During failure-free execution, each process periodically broadcasts this vector or appends it to the messages it sends.
Table 1 compares the MNCMR protocol with the above protocols; according to this comparison, the MNCMR protocol is superior to the other protocols on every indicator.
TABLE 1
Disclosure of Invention
The present invention aims to solve the above problems by providing a message log recovery method based on message reordering and message number checking, which gives a message log protocol the advantages of both optimistic and pessimistic message log protocols.
In order to achieve the purpose, the invention adopts the following technical scheme:
A message log recovery method based on message reordering and message number checking uses a message reordering method in which the reception order of each message is stored in the sending process. When a message-receiving process fails, the recovery process first obtains from the sending processes the messages that were saved and those that were not saved to the log file, together with the corresponding logical clocks of the sending processes, and then reorders the messages not saved to the log file according to their logical clocks. Finally, the ordered messages are resent to the failed process, which receives and processes them again, thereby replaying the messages. The working steps of an ordinary process p_i are as follows:
Step 1. For k = 1, 2, …, n, where k is an integer variable, initialize U_ik and T_ik to 0, meaning that the total number of messages sent by process p_i to process p_k is 0 and the total number of messages that process p_i has received from process p_k is 0; also set lsn to 0.
Step 2. If the checkpoint timer has expired, go to step 3; otherwise, go to step 4.
Step 3. Save the state of process p_i to the determinant log file mlog, save T_i, U_i and LC_k (k = 1, 2, …, n) to local storage, and delete the old checkpoint.
Step 4. If process p_i is about to send a message to process p_j (j = 1, 2, …, n), go to step 5; otherwise, go to step 9.
Step 5. If F_i is 1, process p_i has failed and has not yet recovered, so go to step 9; otherwise process p_i is running normally, so go to step 6.
Step 6. If F_j is 1, process p_j has failed and has not yet recovered, so wait until F_j becomes 0; otherwise process p_j is running normally, so go to step 7.
Step 7. Process p_i sends a message to process p_j: increase LC_i by 1, increase U_ij by 1, and append <LC_i, j, m> to the message log file dfile.
Step 8. Save U_ij to local storage and send the message AM(i, LC_i, m) to process p_j.
Step 9. If process p_i receives a message sent by process p_j, go to step 10; otherwise, go to step 14.
Step 10. Process p_i receives the message AM(j, LC_j, m) from process p_j and must determine whether it was sent by process p_j itself or by the recovery process of process p_j.
Step 11. If the clock value AM.LC_j carried by the message is greater than the locally stored LC_j, the message was sent by process p_j during failure-free operation, so go to step 12; otherwise the message was sent by the recovery process of process p_j, so go to step 14.
Step 12. Update LC_j with the value of AM.LC_j. Hand the received message to the application for processing, and record <j, LC_j, i, lsn> in memory. Because a message has been received, increase LC_i by 1 and then update LC_i with the maximum of LC_i and LC_j + 1, as in the definition of the logical clock.
Step 13. Execute other deterministic events.
Step 14. Determine whether process p_i is idle. If it is idle, go to step 15; otherwise, go to step 16.
Step 15. Use the idle time to save the in-memory record <j, LC_j, i, lsn> to mlog. Because of the received message, increase T_ij by 1. Save T_i and LC_k (k = 1, 2, …, n) to the hard disk, and increase lsn by 1.
Step 16. If the error-flag-clear message sys_clearF(k) of a process p_k (k = 1, 2, …, n) is received, go to step 17; otherwise, go to step 18.
Step 17. Set F_k to 0, indicating that process p_k is now running normally.
Step 18. If the message sys_setF(k), which sets the error flag of a process p_k (k = 1, 2, …, n) to 1, is received, go to step 19; otherwise, go to step 20.
Step 19. Set F_k to 1, indicating that process p_k has failed and has not yet recovered.
Step 20. Determine whether process p_i is restarting after a failure. If so, go to step 21; otherwise, go to step 1.
Step 21. Restore T_i, U_i and LC_k (k = 1, 2, …, n) from the previously saved values, set lsn to 0 and F_i to 1, and go to step 1.
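The send path (steps 7 and 8), the receive path (steps 10 to 12) and the idle-time logging (step 15) can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: the class and method names are hypothetical, dfile and mlog are plain in-memory lists, the failure flags F and checkpointing are omitted, and lsn is assigned when a record is flushed to mlog.

```python
class Process:
    """Hypothetical model of an ordinary process p_i (flags F and checkpoints omitted)."""

    def __init__(self, pid: int, n: int):
        self.pid = pid
        self.lc = [0] * n    # lc[pid] is LC_i; lc[j] is the last clock value seen from p_j
        self.U = [0] * n     # U[j] = U_ij, messages sent to p_j
        self.T = [0] * n     # T[j] = T_ij, messages from p_j already logged in mlog
        self.lsn = 0
        self.dfile = []      # sender-side message log: <LC_i, j, m>
        self.pending = []    # determinants kept in memory until the process is idle
        self.mlog = []       # determinant log: <j, LC_j, i, lsn>
        self.delivered = []  # messages handed to the application

    def send(self, j: int, m: str):
        # Steps 7-8: advance the clock, count the message, log it, then send AM(i, LC_i, m).
        self.lc[self.pid] += 1
        self.U[j] += 1
        self.dfile.append((self.lc[self.pid], j, m))
        return (self.pid, self.lc[self.pid], m)

    def receive(self, am) -> bool:
        j, lc_j, m = am
        # Step 11: a clock value that is not newer marks a message resent during p_j's recovery.
        if lc_j <= self.lc[j]:
            return False
        # Step 12: accept the message and advance the improved logical clock.
        self.lc[j] = lc_j
        self.delivered.append(m)
        self.pending.append((j, lc_j))
        self.lc[self.pid] = max(self.lc[self.pid] + 1, lc_j + 1)
        return True

    def flush_when_idle(self):
        # Step 15: move in-memory determinants to mlog and update T_ij and lsn.
        for j, lc_j in self.pending:
            self.mlog.append((j, lc_j, self.pid, self.lsn))
            self.T[j] += 1
            self.lsn += 1
        self.pending.clear()
```

For example, if `p0 = Process(0, 2)` sends `"a"` to `p1 = Process(1, 2)`, the first `p1.receive(...)` accepts the message and a second delivery of the same message is rejected by the clock check.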
When an ordinary process fails, the recovery process comprises three stages:
Stage one:
The recovery process retrieves the saved messages from the message log file dfile in the order in which they were sent and stores them in memory. The recovery(i) process first retrieves the triple <j, LC_j, lsn> through the procedure call RequestDeterminant(i, lsn), and then obtains <j, LC_j, m> through the procedure call GetM(LC_j), where m is the content of the message. Finally, recovery(i) stores the triple <j, LC_j, lsn> in the local in-memory array ARRAY.
Stage two:
The recovery process retrieves from the message log file dfile the messages that could not be saved to mlog because of the process failure and stores them in memory.
The recovery(i) process first obtains T_ij and U_ji through remote procedure calls. T_ij records the number of messages that process p_i received from process p_j and had already saved to mlog; U_ji records the number of messages that process p_j sent to process p_i. By these definitions, the difference U_ji - T_ij is the number of messages that p_j sent to p_i but that could not be saved to mlog because p_i failed. Then p_i invokes the remote procedure GetUnLogM(i, U_ji - T_ij), which is local to p_j, and obtains the earliest-received such message according to the difference. By calling GetUnLogM(i, U_ji - T_ij) repeatedly, recovery(i) obtains all messages that could not be saved to mlog because of the process failure and stores them in the local in-memory array ARRAY.
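The message number check of stage two amounts to a per-sender subtraction. The sketch below assumes the counters have already been fetched into dictionaries keyed by sender id; the function name and the sample numbers are hypothetical.

```python
def unlogged_counts(U_to_i: dict, T_i: dict) -> dict:
    """Return {j: U_ji - T_ij} for every sender p_j with unlogged messages.

    U_to_i[j]: number of messages p_j sent to p_i (U_ji)
    T_i[j]:    number of messages from p_j that p_i saved to mlog (T_ij)
    """
    return {
        j: sent - T_i.get(j, 0)
        for j, sent in U_to_i.items()
        if sent > T_i.get(j, 0)
    }


# Hypothetical counters: p_1 sent 5 messages of which 3 were logged; p_2 is fully logged.
missing = unlogged_counts({1: 5, 2: 3}, {1: 3, 2: 3})   # {1: 2}
```

Only senders with a positive difference need to be asked for messages.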
Stage three: recovery(i) reorders, by logical clock, all messages that could not be saved to mlog because of the process failure. Finally, all messages, both those saved and those not saved to mlog, are resent to process p_i, which receives and processes them again until it reaches the point in time just before the failure.
The logical clock used in stage three is defined as follows:
The logical clock LC_p of process p is an integer variable that counts the message send and receive events of the process. LC_p satisfies the following conditions:
1. Its initial value is zero.
2. Whenever process p sends a message, LC_p is incremented by one.
3. Whenever process p receives a message from a process q and stores the necessary information of the message in the log file, LC_p is incremented by one and then LC_p ← max(LC_p, LC_q + 1), where LC_q is the logical clock of process q and max takes the larger of LC_p and LC_q + 1.
The specific working steps of the recovery process are as follows:
Step 1. Restart p_i from the latest checkpoint, send the message sys_setF(i), which marks process p_i as failed, to every process, and set lsn and NUM to 0.
Step 2. Determine whether the determinant file mlog is empty. If mlog is not empty, go to step 3; otherwise go to step 6.
Step 3. Obtain a determinant record <LC_j, j, lsn> from the determinant file mlog by calling RequestDeterminant(i, lsn). Assign the value of LC_j to LC and increase lsn by 1. Obtain the message <LC_j, j, m> through the remote call GetM(LC); in the message log file dfile this message is uniquely identified by LC_j.
Step 4. Determine whether the record <LC_j, j, lsn> is empty. If it is empty, go to step 2; otherwise go to step 5.
Step 5. Store the obtained record <LC_j, j, lsn> in the in-memory array ARRAY, that is, ARRAY[NUM].LC_j = LC_j, ARRAY[NUM].j = j, ARRAY[NUM].lsn = lsn. Increase NUM by 1 and go to step 2 to check again whether mlog is empty.
Step 6. Save the value of NUM, the total number of messages of process p_i that have been saved in the mlog file, to NUM'.
Step 7. For each process p_j, obtain the total number of messages that process p_j sent to process p_i through the remote call GetU(i, j).
Step 8. Through the remote call GetT(i), obtain the number of messages that process p_i received from each of the other processes.
Step 9. Determine whether there exists a process p_j with U_ji > T_ij. If U_ji > T_ij, process p_j sent more messages to process p_i than process p_i has received from process p_j, so go to step 10; otherwise go to step 13.
Step 10. Because U_ji > T_ij, there are messages that were sent but not logged. Obtain such a message through the remote call GetUnLogM(j, U_ji - T_ij) and record it in <LC_j, j, m>.
Step 11. Determine whether the obtained message <LC_j, j, m> is empty. If it is empty, go to step 9; otherwise go to step 12.
Step 12. Store the obtained message in the array ARRAY. Because a message has been received, increase T_ij by 1 and increase the message count NUM by 1. Then go to step 9 and continue checking whether any message not saved to the log file remains.
Step 13. Sort the entries ARRAY[k], for k from NUM' to NUM-1, in ascending order of ARRAY[k].LC_j. That is, the messages that other processes sent to process p_i are sorted in ascending order of their LC_j values.
Step 14. Send the messages AM(ARRAY[k].j, ARRAY[k].LC_j, ARRAY[k].m), for k = 0, 1, …, NUM-1, to process p_i.
Step 15. Send to every other process p_k (k = 1, 2, …, n, k ≠ i) the message sys_clearF(i), which marks that process p_i has resumed normal operation. The recovery process ends.
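Steps 6, 13 and 14 can be condensed into a short sketch: the logged prefix of ARRAY keeps its order, only the unlogged tail (indices NUM' to NUM-1) is sorted by sender clock, and the whole array is then replayed. The function name and the tuple layout are assumptions made for illustration; how ties between equal clock values from different senders are broken is not stated in the text, so the sketch simply relies on Python's stable sort.

```python
def build_replay_sequence(logged, unlogged):
    """Return the messages AM(j, LC_j, m) in the order they are resent to p_i.

    logged:   records (LC_j, j, m) recovered via mlog, already in logged order
    unlogged: records (LC_j, j, m) fetched from the senders' dfile, in any order
    """
    array = list(logged)
    num_prime = len(array)                      # step 6: NUM' marks the end of the logged prefix
    array.extend(unlogged)                      # steps 10-12: append the unlogged messages
    array[num_prime:] = sorted(array[num_prime:], key=lambda rec: rec[0])   # step 13
    return [(j, lc_j, m) for (lc_j, j, m) in array]                         # step 14


# Hypothetical input: m1 was logged; m3 (clock 5) and m4 (clock 2) were not.
replay = build_replay_sequence(
    logged=[(1, 0, "m1")],
    unlogged=[(5, 2, "m3"), (2, 0, "m4")],
)
```

With this input the replay order is m1, m4, m3.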
GetUnLogM(i, U_ji - T_ij), used in step 10 of the recovery process, proceeds as follows:
1. Open the dfile file in read-only mode.
2. Move the pointer p to the first stored record (dfile is a sequential file whose last record is at the end of the file).
3. Determine whether the pointer p has reached the end of the file. If the end of the file has not been reached, go to 5; otherwise go to 10.
4. Move the pointer p to the next record. Go to 3.
5. Read the record pointed to by p into the triple <LC_j, l, m>.
6. Determine whether the process identifier l equals i. If l equals i, go to 7; otherwise go to 4.
7. Decrease the variable difference by one (initially difference = U_ji - T_ij).
8. Determine whether difference is 0. If difference is 0, the record read in step 5 is the message being sought, the first one that p_j sent to p_i; send the record to the recovery(i) process and go to 11. If difference is not 0, go to 4.
10. Close the file, return the invalid triple <NULL, NULL, NULL>, and go to 11.
11. End.
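The lookup can be sketched as a scan over an in-memory list standing in for dfile. The translated flow leaves the scan direction ambiguous, so this sketch makes an explicit assumption: dfile holds every record p_j has sent, oldest first, and the earliest unlogged message for p_i is therefore the difference-th matching record counted from the newest one. The function name and the use of `None` for NULL are also illustrative.

```python
NULL_TRIPLE = (None, None, None)


def get_unlog_m(dfile, i, difference):
    """Return the earliest record <LC_j, l, m> addressed to p_i that is still unlogged.

    dfile:      list of (LC_j, receiver, m) records, oldest first (assumed layout)
    difference: U_ji - T_ij, the number of messages to p_i not yet saved to mlog
    """
    if difference <= 0:
        return NULL_TRIPLE
    for record in reversed(dfile):          # assumption: count back from the newest record
        _, receiver, _ = record
        if receiver == i:
            difference -= 1
            if difference == 0:
                return record
    return NULL_TRIPLE                      # fewer matching records than requested


# Hypothetical dfile of p_j: three messages to p_1, one to p_2.
dfile = [(1, 1, "a"), (2, 2, "x"), (3, 1, "b"), (4, 1, "c")]
first_missing = get_unlog_m(dfile, i=1, difference=2)    # U_ji = 3, T_ij = 1
second_missing = get_unlog_m(dfile, i=1, difference=1)   # after T_ij is increased to 2
```

Because the recovery process increases T_ij after each returned message, repeated calls walk through the unlogged messages in the order they were sent.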
The invention has the following beneficial effects:
1. It shows that the exact message replay order required by previous optimistic message logging protocols sometimes cannot be achieved, because message delivery depends on parameters such as channel delay and CPU speed.
2. It provides a theory and a method for message reordering, addressing the problem of out-of-order unsaved messages that has troubled message log fault-tolerance techniques under optimistic message log protocols for decades.
3. The MNCMR message log recovery protocol proposed in this patent combines pessimistic and optimistic message logging. During failure-free operation the protocol behaves like an optimistic protocol: saving messages to the log file is asynchronous with inter-process communication, which ensures good running performance. When a process fails, the recovery algorithm of the failed process is simple and easy to implement, which gives the protocol the advantage of a pessimistic protocol.
4. Message reordering and message number checking allow failure-free processes under the MNCMR protocol to continue executing while other processes fail, so that process performance in the failure recovery phase is better than under all existing message log recovery protocols.
Drawings
FIG. 1 shows a change in channel delay causing a change in the message reception order;
FIG. 2 shows an example of the always-happens-before relation;
FIG. 3 shows the operation of the improved logical clock;
FIG. 4 shows S(m_j) indirectly occurring before R(m_i);
FIG. 5 shows messages being sent, received and saved to a log file;
FIG. 6 shows message content being obtained from the message log dfile;
FIG. 7 is a flow chart of GetUnLogM(i, difference);
FIG. 8 is a flow chart of an ordinary process;
FIG. 9 is a flow chart of the recovery process.
Detailed Description
The invention is further described with reference to the following figures and examples.
Basic principle of message reordering
Under the PWD assumption, the message receive events of a process are somewhat random, that is, the time and order in which messages are received are uncertain. As shown in FIG. 1, assume that the distributed system consists of processes p, q and r. The symbols with subscripts p,0, q,0 and r,0 denote the initial state intervals of p, q and r, respectively; those with subscripts q,1 and q,2 denote the state intervals of process q after it receives messages m_1 and m_2, respectively; t_pq and t_rq denote the communication channel delays between p and q and between r and q, respectively. Under an optimistic message log protocol, suppose process q fails at the point marked "x", before it has saved the necessary information of messages m_1 and m_2 to the log file. After the failure of process q, processes p, q and r have to restart in order to resend and receive m_1 and m_2. Obviously, the order in which process q replays the messages should be m_1, m_2; however, because the channel delays t_pq and t_rq are not fixed constants, if t_pq > t_rq the order in which process q receives the messages may become m_2, m_1. The example of FIG. 1 illustrates that, although optimistic message logging protocols require a failed process to resend and receive the unsaved messages in exactly the original order during replay, in some cases (e.g., changes in channel delay or uneven restart times of the processes in the system) the actual order in which processes send and receive messages may differ from the order before the failure. However, the final result of the system should be the same for every repeated execution, which means that the execution result of the system may be independent of the reception order of some messages.
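The effect of channel delay on reception order can be shown with a few lines of arithmetic. The send times and delays below are hypothetical; the sketch only assumes that a message arrives at its send time plus the delay of its channel.

```python
def arrival_order(sends):
    """sends: list of (message, send_time, channel_delay); returns messages by arrival time."""
    return [m for m, send_time, delay in sorted(sends, key=lambda s: s[1] + s[2])]


# Original run: p sends m1 at time 0, r sends m2 at time 1, both channels have delay 1.
before_failure = arrival_order([("m1", 0, 1), ("m2", 1, 1)])

# Replay: t_pq has grown to 3 while t_rq stays at 1, so m2 overtakes m1.
during_replay = arrival_order([("m1", 0, 3), ("m2", 1, 1)])
```

The same two send events therefore yield the order m1, m2 in one run and m2, m1 in the other.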
Always-happens-before relation:
Assuming that the channels of the processes are reliable FIFO channels, let e_i and e_j denote send or receive events of messages m_i and m_j, respectively. If, in every execution of the system, e_i occurs before e_j regardless of channel delay, CPU speed and similar factors, e_i is said to always happen before e_j.
In FIG. 2, R(m_1), R(m_2), R(m_3) and R(m_4) denote the receive events of messages m_1, m_2, m_3 and m_4, and S(m_1), S(m_2), S(m_3) and S(m_4) denote their send events. Under the PWD assumption, q necessarily sends m_2 after receiving m_1 and r necessarily sends m_3 after receiving m_2, that is, R(m_1) always happens before S(m_2) and R(m_2) always happens before S(m_3). Thus R(m_1) always happens before R(m_3), which indicates that R(m_3) logically depends on R(m_1); the relation between R(m_3) and R(m_1) is a logical dependency that is independent of other factors of the system. Since the reception event of a message is a deterministic event, it is possible to determine the time of arrival of the message under the assumption of a FIFO channel.
As shown in FIG. 2, since m_3 and m_4 reach process q with different channel delays, R(m_3) does not always happen before R(m_4). If event e_i does not necessarily occur before event e_j, but the order depends on channel delay, CPU speed and similar factors, e_i is said not to always happen before e_j. In FIG. 2, the actual message reception sequence of process q is therefore either m_1, m_3, m_4 or m_1, m_4, m_3.
Equivalent message sequence:
Equivalent message sequence theorem
Suppose S is a message sequence of process p and S' is a new sequence formed by rearranging the messages in S, such that: 1. every message in S is still present in S'; 2. if the receive events of some messages have an always-happens-before relation in S, this relation is preserved in S'. Under the assumption that process channels are reliable FIFO channels, S and S' are equivalent sequences for the replay of process p.
Proof: Under the assumptions of the theorem, although some messages of S are reordered in S', the always-happens-before relations between these messages remain unchanged in S'; the reception order in S' is therefore an order that could actually occur during replay of the process. If S and S' were not equivalent during the replay of process p, executing the receive events of S in p would not be equivalent to executing those of S' in p, that is, repeated executions of the same process would be inconsistent, which contradicts the consistency of process execution.
As shown in FIG. 2, m_1, m_3, m_4 and m_1, m_4, m_3 are equivalent sequences: the computation performed by process q after replaying m_1, m_3, m_4 is equivalent to that after replaying m_1, m_4, m_3.
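The two conditions of the theorem can be checked mechanically. In the sketch below the always-happens-before relation is supplied as a set of ordered pairs; the function name is hypothetical, messages are assumed to be distinct, and the pair (m1, m4) is an assumption based on p sending both messages to q over one FIFO channel.

```python
def is_equivalent_reordering(s, s_prime, always_before):
    """Check the two conditions of the equivalent message sequence theorem.

    s, s_prime:    lists of distinct message names
    always_before: pairs (a, b) meaning R(a) always happens before R(b)
    """
    # Condition 1: S' contains exactly the messages of S.
    if sorted(s) != sorted(s_prime):
        return False
    # Condition 2: every always-happens-before pair keeps its order in S'.
    position = {m: k for k, m in enumerate(s_prime)}
    return all(position[a] < position[b] for a, b in always_before)


relation = {("m1", "m3"), ("m1", "m4")}
ok = is_equivalent_reordering(["m1", "m3", "m4"], ["m1", "m4", "m3"], relation)
bad = is_equivalent_reordering(["m1", "m3", "m4"], ["m3", "m1", "m4"], relation)
```

Swapping m3 and m4 passes the check, while moving m3 ahead of m1 violates the relation R(m1) before R(m3).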
Improved logical clock:
The improved logical clock LC_p of process p is an integer variable that counts the message send and receive events of the process. LC_p satisfies the following conditions:
1. Its initial value is zero.
2. Whenever process p sends a message, LC_p is incremented by one.
3. Whenever process p receives a message from a process q and stores the necessary information of the message in the log file, LC_p is incremented by one and then LC_p ← max(LC_p, LC_q + 1), where LC_q is the logical clock of process q and max takes the larger of LC_p and LC_q + 1.
As shown in FIG. 3, LC_p = 1 after p sends m_1 and LC_p = 2 after p sends m_4. For process q, LC_q = 2 after receiving and storing m_1, LC_q = 3 after sending m_2, LC_q = 6 after receiving and storing m_3, and LC_q = 7 after receiving and storing m_4.
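The three clock rules can be replayed on the FIG. 3 scenario. The class name is hypothetical, and the message topology (m_1 and m_4 from p to q, m_2 from q to r, m_3 from r to q) is taken from the description of FIG. 2; the intermediate values in the comments are computed from the rules rather than read from the figure.

```python
class ImprovedClock:
    """Improved logical clock of one process (rules 1 to 3)."""

    def __init__(self):
        self.value = 0                      # rule 1: initial value is zero

    def on_send(self) -> int:
        self.value += 1                     # rule 2: add one on every send
        return self.value                   # the clock value travels with the message

    def on_receive(self, sender_clock: int) -> int:
        self.value += 1                     # rule 3: add one, then take the maximum
        self.value = max(self.value, sender_clock + 1)
        return self.value


p, q, r = ImprovedClock(), ImprovedClock(), ImprovedClock()
m1 = p.on_send()         # LC_p = 1
q.on_receive(m1)         # LC_q = 2
m2 = q.on_send()         # LC_q = 3
r.on_receive(m2)         # LC_r = 4
m3 = r.on_send()         # LC_r = 5
m4 = p.on_send()         # LC_p = 2
q.on_receive(m3)         # LC_q = 6
q.on_receive(m4)         # LC_q = 7
```

The sender-side clock values 1, 5 and 2 attached to m_1, m_3 and m_4 are the ones later used to reorder unlogged messages.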
Basic theorem of reordering according to transmit logic clock
If the segmentation determination assumes PWD is true andLCp (S (m)i))<LCq(S(mj))。
Wherein R (m)i) And R (m)j) Messages m each representing a process kiAnd mjLCp (S (m)i) Means that process p sends message miThe latter logic clock, LCq (S (m)j) Means that process q sends message mjThe latter logic clock.
And (3) proving that: due to the fact thatR(mj) Logic depends on R (m)i),S(mj) Is determining an event, therefore Otherwise, assume R (m)i) Does not always occur first in S (m)j) This means that R (m)i) And S (m)j) Can occur in any order, or R (m)i) Occurs first in S (m)j) Or S (m)j) Occurs first in R (m)i). If S (m)j) Occurs first in R (m)i) Due to S (m)j) Always occurs first in R (m)j) Thus S (m)j) Can only indirectly occur first in R (m)i). As shown in fig. 4, there must be at least one message mkIs located at mjAnd miSuch that S (m)j)→S(mk),S(mk)→R(mk),R(mk)→S(mi),S(mi)→R(mi) Where "→" indicates a preceding occurrence. In this case, mjAnd miMust arrive at process k, m via different transmission channelsjAnd miMay have different channel delays, so R (m)i) May not always occur in the first place at R (m)j) This contradicts the theorem assumption, and thusAccording to the definition of improving the logic clock, LCp (R (m)i) Must be less than LCq (S (m))j) LCp (R (m))i)<LCq(S(mj)). And because of Therefore LCp (S (m)i))<LCp(R(mi)). LCp (S (m) can be obtainedi))<LCp(R(mi))<LCq(S(mj)),LCp(S(mi))<LCq(S(mj))。
As shown in figure 4 of the drawings,LCp(S(m1))=1,LCr(S(m3))=5,LCp(S(m1))<LCr(S(m3))。
The above theorem indicates that the order in which any process receives its messages can be determined from the logical clocks of the sending processes. If, when sending a message, the sending process stores the logical clock associated with the message together with the message content in stable storage, then after the receiving process fails, the order of the messages that could not be saved to the log file because of the failure can be determined from the clocks stored at the sending processes, and the messages are reordered accordingly. For example, in Fig. 4, process p stores <m1, LC_p = 1> after sending m1 and <m4, LC_p = 2> after sending m4, and process r stores <m3, LC_r = 5> after sending m3. If the receiving process fails at "X", then according to the logical clocks in the pairs <m1, LC_p = 1>, <m3, LC_r = 5> and <m4, LC_p = 2>, the sequence of messages to be replayed is reordered into <m1, LC_p = 1>, <m4, LC_p = 2>, <m3, LC_r = 5>.
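The reordering in this example amounts to an ascending sort on the sender clock. A minimal Python sketch follows; the function name and the tuple layout are hypothetical, and the three pairs are the ones quoted for Fig. 4.

```python
from operator import itemgetter

# (message, sender clock) pairs as retrieved from the senders, in arbitrary order
saved_pairs = [("m1", 1), ("m3", 5), ("m4", 2)]


def reorder_by_sender_clock(pairs):
    """Return the pairs in ascending order of the stored sender clock."""
    return sorted(pairs, key=itemgetter(1))


replay_order = [name for name, _ in reorder_by_sender_clock(saved_pairs)]
```

Python's `sorted` is stable, so pairs with equal clock values keep their retrieval order; how such ties should be resolved is not specified in the text.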
The distributed system consists of processes p_i (i = 1, 2, …, n) and the recovery processes recovery(i).
The message log system consists of a message log dfile and a determinant log mlog.
The message log dfile is a sequential file stored in the local storage of the message sending process. Depending on the fault tolerance required of the system, the local storage medium may be main memory or a hard disk: if the system is designed to tolerate the failure of only one process, dfile is kept in main memory; if several processes may fail, dfile must be stored on the hard disk. A dfile file consists of a number of records, each a triple <LC_j, i, m>, where LC_j is the improved logical clock after process p_j sends message m, i is the identifier of the receiving process, and m is the message content. As shown in Fig. 5, when process p_j sends a message <j, LC_j, m>, it first stores <LC_j, i, m> in dfile.
The determinant log mlog holds the determinants of all messages, <j, LC_j, i, lsn>, where j is the identifier of the process that sent message m, LC_j is the logical clock of the sending process after it sent m, i is the identifier of the receiving process p_i, and lsn is the sequence number under which the receiving process saved the message. The initial value of lsn is zero, and process p_i increases lsn by one for each message determinant it stores. As shown in Fig. 5, when process p_i receives a message <j, LC_j, m>, it first delivers the message to the application and then saves the determinant to mlog.
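The two record layouts can be sketched in Python as follows. The class names, field names and the in-memory lists standing in for the dfile and mlog files are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DfileRecord:
    """Sender-side triple <LC_j, i, m>."""
    lc: int        # LC_j after the send
    receiver: int  # identifier i of the receiving process
    content: str   # message content m


@dataclass(frozen=True)
class Determinant:
    """Receiver-side determinant <j, LC_j, i, lsn>."""
    sender: int
    lc: int
    receiver: int
    lsn: int


@dataclass
class Receiver:
    pid: int
    lsn: int = 0  # the initial value of lsn is zero
    mlog: List[Determinant] = field(default_factory=list)

    def log_determinant(self, sender: int, lc: int) -> None:
        # Save the determinant, then add one to lsn.
        self.mlog.append(Determinant(sender, lc, self.pid, self.lsn))
        self.lsn += 1


dfile: List[DfileRecord] = []
dfile.append(DfileRecord(lc=1, receiver=1, content="hello"))  # p_2 logs its send
p1 = Receiver(pid=1)
p1.log_determinant(sender=2, lc=1)
p1.log_determinant(sender=2, lc=2)
```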
As shown in Fig. 6, the saved messages are obtained from the message log file dfile in the order in which they were sent and are stored in main memory. recovery(i) first obtains the triple <j, LC_j, lsn> through the procedure call RequestDeterminant(i, lsn), and then obtains <j, LC_j, m>, where m is the message content, through the procedure call GetM(LC_j). Finally, recovery(i) stores the triple <j, LC_j, lsn> in the local in-memory array ARRAY.
As shown in Fig. 7, the processing flow of GetUnLogM(i, U_ji - T_ij) is as follows:
1. Open the dfile file in read-only mode.
2. Move the pointer p to the first record stored (dfile is a sequential file, and the last record stored is at the end of the file).
3. Determine whether the pointer p has reached the end of the file. If it has not, go to 5; otherwise go to 10.
4. Move the pointer p to the next record. Go to 3.
5. Read the record pointed to by p into the triple <LC_j, l, m>.
6. Determine whether the process identifier l equals i. If l equals i, go to 7; otherwise go to 4.
7. Decrease the variable difference by one (difference is initially U_ji - T_ij).
8. Determine whether difference is 0. If difference is 0, the record read in step 5 is the message that p_j sent to p_i first; send the record to the recovery(i) process and go to 11. If difference is not 0, go to 4.
10. Close the file, return the invalid triple <NULL, NULL, NULL>, and go to 11.
11. End.
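The counting logic of GetUnLogM can be sketched in Python as below. This is only one reading of the flow: the sketch assumes that the scan starts at the most recently stored record and walks toward older ones (claim 5 moves the pointer to the "previous record"), because under that reading the record at which difference reaches 0 is the earliest message not yet saved to mlog. The function name, the list standing in for dfile and the sample records are hypothetical.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, int, str]  # <LC_j, receiver identifier, message content>


def get_unlog_m(dfile: List[Record], i: int, difference: int) -> Optional[Record]:
    """Return the record at which the countdown reaches 0, or None.

    dfile is given in send order; it is scanned from the newest record
    backward (assumed direction), counting only records addressed to p_i.
    """
    if difference <= 0:
        return None
    for record in reversed(dfile):
        _, receiver, _ = record
        if receiver != i:
            continue  # record for another receiver: move on
        difference -= 1
        if difference == 0:
            return record
    return None  # ran out of records: corresponds to <NULL, NULL, NULL>


# p_j sent three messages to p_1 (U_j1 = 3) and one to p_2.
sample_dfile = [(1, 1, "a"), (2, 2, "x"), (3, 1, "b"), (4, 1, "c")]
first_unlogged = get_unlog_m(sample_dfile, i=1, difference=2)  # T_1j = 1
next_unlogged = get_unlog_m(sample_dfile, i=1, difference=1)   # T_1j = 2
missing = get_unlog_m(sample_dfile, i=1, difference=4)
```

Calling the function repeatedly while T_ij grows by one per returned message walks through the unlogged messages from oldest to newest, which matches the repeated calls made by recovery(i).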
As shown in Fig. 8, the operation flow of a normal process p_i is as follows:
1. For all k (k = 1, 2, …, n), where k is an integer variable, initialize U_ik to 0 and T_ik to 0, meaning respectively that the total number of messages process p_i has sent to process p_k is 0 and that the total number of messages process p_i has received from process p_k is 0; at the same time set lsn to 0.
2. If the timer has expired, go to 3; otherwise go to 4.
3. Save the state of process p_i into the determinant log file mlog, save T_i, U_i and LC_k (k = 1, 2, …, n) to local storage, and delete the old checkpoint.
4. If process p_i sends a message to process p_j (j = 1, 2, …, n), go to 5; otherwise go to 9.
5. If F_i is 1, process p_i has failed and has not yet recovered, so go to 9; otherwise process p_i is running normally, so go to 6.
6. If F_j is 1, process p_j has failed and has not yet recovered, so wait until F_j becomes 0; otherwise process p_j is running normally, so go to 7.
7. Because process p_i sends a message to process p_j, increase LC_i by 1 and U_ij by 1, and append <LC_i, j, m> to the message log file dfile.
8. Save U_ij to local storage and send the message AM(i, LC_i, m) to process p_j.
9. If process p_i receives a message sent by process p_j, go to 10; otherwise go to 14.
10. Process p_i receives the message AM(j, LC_j, m) from process p_j and must determine whether the message was sent by process p_j itself or by the recovery process of p_j.
11. If AM_j > LC_j, where AM_j is the clock value carried in the message, the message was sent by process p_j during failure-free operation, so go to 12; otherwise the message was sent by the recovery process of p_j, so go to 14.
12. Update LC_j with the value of AM_j. Hand the received message to the application process for processing and save the record <j, LC_j, i, lsn> in memory. Because a message has been received, increase LC_i by 1 and then update LC_i with the maximum of LC_i and LC_j.
13. Execute other deterministic events.
14. Determine whether process p_i is idle. If process p_i is idle, go to 15; otherwise go to 16.
15. Use the idle time to store the in-memory record <j, LC_j, i, lsn> to mlog. Because a message has been received, increase T_ij by 1. Save the values of T_i and LC_k (k = 1, 2, …, n) to the hard disk, and increase lsn by 1.
16. If an error-flag clear message sys_ClearF(k) is received from process p_k (k = 1, 2, …, n), go to 17; otherwise go to 18.
17. Assign 0 to F_k, indicating that process p_k is now running normally.
18. If a message sys_SetF(k), which sets the error flag to 1, is received from process p_k (k = 1, 2, …, n), go to 19; otherwise go to 20.
19. Assign 1 to F_k, indicating that process p_k has failed and has not yet recovered.
20. Determine whether process p_i is restarting after a failure. If it is, go to 21; otherwise go to 1.
21. Update T_i, U_i and LC_k (k = 1, 2, …, n) with their previously saved values, set lsn = 0 and F_i = 1, and go to 1.
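The send path (steps 5 to 8), the receive path (steps 10 to 12) and the idle-time logging (step 15) can be sketched in Python as follows. Everything here is a simplified, hypothetical in-memory model: checkpoints and the sys_SetF/sys_ClearF messages are omitted, a blocked send returns None instead of waiting, lsn is attached when the determinant is written to mlog, and the clock update uses LC_j + 1 as in the clock definition (step 12 itself speaks of the maximum of LC_i and LC_j).

```python
class Process:
    """Sketch of one normal process p_i (illustrative names only)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.lc = 0                # LC_i, own logical clock
        self.seen = [0] * (n + 1)  # last clock value accepted from each p_j
        self.U = [0] * (n + 1)     # U[j]: messages sent to p_j
        self.T = [0] * (n + 1)     # T[j]: messages from p_j saved in mlog
        self.F = [0] * (n + 1)     # F[k] = 1 while p_k is marked as failed
        self.lsn = 0
        self.dfile, self.pending, self.mlog, self.delivered = [], [], [], []

    def send(self, j, m):
        if self.F[self.pid] or self.F[j]:
            return None  # simplification: the caller retries later
        self.lc += 1
        self.U[j] += 1
        self.dfile.append((self.lc, j, m))  # log before sending
        return (self.pid, self.lc, m)       # the message AM(i, LC_i, m)

    def receive(self, am):
        j, am_lc, m = am
        if am_lc <= self.seen[j]:
            return False  # step 11: not a new failure-free send, skip logging
        self.seen[j] = am_lc
        self.delivered.append(m)                   # hand to the application
        self.pending.append((j, am_lc, self.pid))  # determinant kept in memory
        self.lc = max(self.lc + 1, am_lc + 1)      # assumed clock update
        return True

    def flush_when_idle(self):
        for j, lc, i in self.pending:
            self.mlog.append((j, lc, i, self.lsn))
            self.T[j] += 1
            self.lsn += 1
        self.pending.clear()


p1, p2 = Process(1, 2), Process(2, 2)
am = p1.send(2, "hello")
first = p2.receive(am)
duplicate = p2.receive(am)  # same clock value again: rejected by step 11
p2.flush_when_idle()
p1.F[2] = 1                 # p_2 marked as failed
blocked = p1.send(2, "later")
```

After the flush, U_12 at the sender equals T_21 at the receiver, which is the condition the recovery process later checks to decide whether unlogged messages exist.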
As shown in Fig. 9, the operation flow of the recovery process recovery(i) is as follows:
1. Restart p_i from the latest checkpoint, send the message sys_SetF(i), which marks process p_i as failed, to every process, and set lsn and NUM to 0.
2. Determine whether the determinant file mlog is empty. If mlog is not empty, go to 3; otherwise go to 6.
3. Obtain a determinant record <LC_j, j, lsn> from the determinant file mlog by calling RequestDeterminant(i, lsn). Assign the value of LC_j to LC and increase lsn by 1. Obtain the message <LC_j, j, m> with the remote call GetM(LC); in the message log file dfile this message is uniquely identified by LC_j.
4. Determine whether the message <LC_j, j, lsn> is empty. If it is empty, go to 2; otherwise go to 5.
5. Store the received message <LC_j, j, lsn> in the in-memory array ARRAY, i.e. ARRAY[NUM].LC_j = LC_j, ARRAY[NUM].j = j, ARRAY[NUM].lsn = lsn. Increase NUM by 1 and go to 2 to check whether mlog is empty.
6. Save NUM, the total number of messages that were sent to process p_i and saved in the mlog file, to NUM'.
7. For all j (j = 1, 2, …, n), obtain the total number of messages that process p_j has sent to process p_i by remotely calling GetU(i, j).
8. Remotely call GetT(i) to obtain the number of messages that process p_i has received from each of the other processes.
9. For all j (j = 1, 2, …, n), determine whether U_ji > T_ij. If U_ji > T_ij, the number of messages that process p_j sent to process p_i is greater than the number of messages that process p_i received from process p_j, so go to 10; otherwise go to 13.
10. Because U_ji > T_ij, there are in-transit messages. Obtain a message by remotely calling GetUnLogM(j, U_ji - T_ij) and record the obtained message in <LC_j, j, m>.
11. Determine whether the obtained message <LC_j, j, m> is empty. If it is empty, go to 9; otherwise go to 12.
12. Store the obtained message in the array ARRAY. Because a message has been received, increase T_ij by 1 and the message count NUM by 1. Then go to 9 and continue checking whether any message remains that was not saved to the log file.
13. Sort the elements ARRAY[k], for k from NUM' to NUM-1, in ascending order of ARRAY[k].LC_j. That is, the messages that process p_i received from other processes are sorted in ascending order of their LC_j values.
14. Send the messages AM(ARRAY[k].j, ARRAY[k].LC_j, ARRAY[k].m) to process p_i, where k = 0, 1, …, NUM-1.
15. Send to every other process p_k (k = 1, 2, …, n, k ≠ i) the message sys_ClearF(i), which marks that process p_i has resumed normal operation. The recovery process ends.
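Steps 2 to 14 can be condensed into the following Python sketch. The function name, the dictionaries standing in for the remote calls GetU, GetT and GetUnLogM, and the sample data are hypothetical; the sketch also assumes that the U_ji - T_ij unlogged messages of a sender are the most recent ones it sent to p_i.

```python
def recover(logged, senders_dfile, U, T, i):
    """Build the replay sequence for the failed process p_i.

    logged:        messages whose determinants are in mlog, as (LC_j, j, m)
                   tuples in lsn order (steps 2 to 5)
    senders_dfile: {j: [(LC_j, receiver, m), ...]} in send order
    U, T:          {j: U_ji} and {j: T_ij}
    """
    array = list(logged)
    num_logged = len(array)  # step 6: NUM'
    T = dict(T)              # work on a copy of the counters
    for j, dfile in senders_dfile.items():  # steps 7 to 12
        to_i = [rec for rec in dfile if rec[1] == i]
        while U[j] > T[j]:
            difference = U[j] - T[j]
            if difference > len(to_i):
                break  # nothing to return: the <NULL, NULL, NULL> case
            lc, _, m = to_i[-difference]  # earliest message still unlogged
            array.append((lc, j, m))
            T[j] += 1
    # Step 13: only the part from NUM' onward is sorted by sender clock.
    array[num_logged:] = sorted(array[num_logged:], key=lambda rec: rec[0])
    return array  # step 14 resends the messages in this order


# Fig. 4 style data: p (id 1) sent m1 and m4, r (id 2) sent m3, receiver id 3.
senders = {2: [(5, 3, "m3")], 1: [(1, 3, "m1"), (2, 3, "m4")]}
nothing_logged = recover([], senders, {1: 2, 2: 1}, {1: 0, 2: 0}, i=3)
m1_logged = recover([(1, 1, "m1")], senders, {1: 2, 2: 1}, {1: 1, 2: 0}, i=3)
```

In both calls the replay sequence is m1, m4, m3, although r's record is retrieved before p's; the ascending sort on the sender clock restores the order.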
Correctness proof of the recovery(i) recovery algorithm:
First, the recoverability principle of a process state interval is stated: a process state interval is recoverable if, whenever the process fails in the future, the process can re-execute that interval.
Theorem 1. If one or more processes fail, each failed process can be restored to its state before the failure under the action of the recovery(i) process.
Proof: For the messages of a process that could not be saved to mlog, the receiving order and the message content are saved in the local stable storage of the sending processes, so by the recoverability principle above the process is recoverable. Theorem 1 is proved in detail below in two cases.
1. Assume that only one process fails. After the failure is detected, the failed process restarts from its latest checkpoint. The recovery process first obtains the information about the messages stored in the determinant log mlog, and then uses this information to obtain the logical clocks and contents of those messages from the sequential file dfile and store them in the memory space of the recovery process. Because a record in mlog is uniquely determined by the value of the variable lsn and a record in dfile is uniquely determined by its logical clock, the logical clocks and contents of all messages already saved in mlog are eventually transferred to the memory space ARRAY of the recovery process. For the messages that could not be saved to mlog because of the process failure, their number can be determined from the saved counts of messages sent and messages received, so the remote procedure call GetUnLogM(i, difference) also transfers the logical clocks and contents of all such messages into ARRAY. After the messages that could not be saved to mlog have been reordered by logical clock, the recovery process resends the stored messages to the failed process, which receives and processes them again and finally reaches the state interval it was in before the failure. That is, a single failed process is recoverable.
2. If several processes fail, each failed process is recovered independently by its own recovery process, so all failed processes can be recovered.
Theorem 2. Under the action of the recovery processes, the global state of the system after all failed processes have been recovered is a consistent global state.
Proof: As described by the algorithm of the recovery process, if one or more processes have failed, a non-failed process that reaches an event of sending a message to a failed process stops and waits for the recovery to finish; otherwise it continues executing.
Case 1: assume that the non-failed process has not sent a message to the failed process. After the failed process has been recovered, the number of messages sent by the non-failed process to the failed process is unchanged, so no orphan message can exist between the processes.
Case 2: assume that the non-failed process stops at an event of sending a message to the failed process. After the failed process has been recovered, the number of messages sent by the non-failed process to the failed process is unchanged, so no orphan message can exist between the processes.
Combining the two cases, by the definition of global state consistency, the global state of the system after the failed process has been recovered is a consistent global state.
However, after all failed processes have been recovered, several non-failed processes may send messages to one recovered process. Since the sending processes send their messages to the recovered process over different channels, no receive event of these messages always happens before another at the receiving process, and these events can be executed in any order.
Although the embodiments of the present invention have been described above with reference to the accompanying drawings, they are not intended to limit the scope of the present invention. Those skilled in the art should understand that various modifications and variations can be made, without inventive effort, on the basis of the technical solution of the present invention.

Claims (5)

1. A message log recovery method based on message reordering and message number checking, characterized in that a message reordering method is adopted and the receiving order of messages is stored at the sending process; when a message receiving process fails, the messages that have been saved and those that have not been saved to the log file, together with their corresponding logical clocks, are first obtained from the sending processes under the control of a recovery process, and the messages not saved to the log file are then reordered according to their logical clocks; finally, the ordered messages are resent to the failed process, which receives and processes them again, thereby realizing the replay of the messages; the working steps of a normal process in the method are as follows:
step 1, for all k, where k is an integer variable, initialize U_ik to 0 and T_ik to 0, meaning respectively that the total number of messages process p_i has sent to process p_k is 0 and that the total number of messages process p_i has received from process p_k is 0, and set lsn to 0; lsn denotes the sequence number under which the receiving process saves a message; i = 1, 2, …, n; k = 1, 2, …, n;
step 2, if the timer has expired, go to 3; otherwise go to 4;
step 3, save the state of process p_i into the determinant file mlog, save T_i, U_i and LC_k, k = 1, 2, …, n, to local storage, and delete the old checkpoint; i = 1, 2, …, n;
step 4, if process p_i sends a message to process p_j, j = 1, 2, …, n, go to 5; otherwise go to 9;
step 5, if F_i is 1, process p_i has failed and has not yet recovered, so go to 9; otherwise process p_i is running normally, so go to 6;
step 6, if F_j is 1, process p_j has failed and has not yet recovered, so wait until F_j becomes 0; otherwise process p_j is running normally, so go to 7;
step 7, process p_i sends a message to process p_j: increase LC_i by 1 and U_ij by 1, and append <LC_i, j, m> to the message log file dfile;
step 8, save U_ij to local storage and send the message AM(i, LC_i, m) to process p_j, where m denotes the message content;
step 9, if process p_i receives a message sent by process p_j, go to 10; otherwise go to 14;
step 10, process p_i receives the message AM(j, LC_j, m) from process p_j and must determine whether the message was sent by process p_j itself or by the recovery process of p_j;
step 11, if AM_j > LC_j, the message was sent by process p_j during failure-free operation, so go to 12; otherwise the message was sent by the recovery process of p_j, so go to 14;
step 12, update LC_j with the value of AM_j; hand the received message to the application process for processing and save the record <j, LC_j, i, lsn> in memory; because a message has been received, increase LC_i by 1 and then update LC_i with the maximum of LC_i and LC_j; the determinant file mlog holds the determinants of all messages, <j, LC_j, i, lsn>, where j is the identifier of the process that sent message m, LC_j is the logical clock of the sending process after it sent m, i is the identifier of the receiving process p_i, and lsn is the sequence number under which the receiving process saves the message; the initial value of lsn is zero, and process p_i increases lsn by one for each message determinant it saves;
step 13, execute other deterministic events;
step 14, determine whether process p_i is idle; if process p_i is idle, go to 15; otherwise go to 16;
step 15, use the idle time to store the in-memory record <j, LC_j, i, lsn> to mlog; because a message has been received, increase T_ij by 1; save the values of T_i and LC_k, k = 1, 2, …, n, to the hard disk, and increase lsn by 1;
step 16, if an error-flag clear message sys_ClearF(k) is received from process p_k, k = 1, 2, …, n, go to 17; otherwise go to 18;
step 17, assign 0 to F_k, indicating that process p_k is now running normally;
step 18, if a message sys_SetF(k), which sets the error flag to 1, is received from process p_k, k = 1, 2, …, n, go to 19; otherwise go to 20;
step 19, assign 1 to F_k, indicating that process p_k has failed and has not yet recovered;
step 20, determine whether process p_i is restarting after a failure; if it is, go to 21; otherwise go to 1;
step 21, update T_i, U_i and LC_k, k = 1, 2, …, n, with their previously saved values, set lsn = 0 and F_i = 1, and go to 1.
2. The message log recovery method based on message reordering and message number checking as claimed in claim 1, characterized in that the recovery process is entered when a normal process fails, the recovery process comprising three stages:
stage one:
the saved messages are obtained from the message log file dfile in the order in which they were sent and are stored in main memory; recovery(i) first obtains the triple <j, LC_j, lsn> through the procedure call RequestDeterminant(i, lsn), and then obtains <j, LC_j, m>, where m denotes the message content, through the procedure call GetM(LC_j); finally, recovery(i) stores the triple <j, LC_j, lsn> in the local in-memory array ARRAY;
stage two:
the messages that could not be saved to mlog because of the process failure are obtained from the message log file dfile and stored in main memory;
recovery(i) first obtains T_ij and U_ji through remote procedure calls; T_ij records the number of messages that process p_i has received from process p_j and that have already been saved in mlog, and U_ji records the number of messages that process p_j has sent to process p_i; by the meaning of U_ji and T_ij, their difference U_ji - T_ij is the number of messages that p_j has sent to p_i but that could not be saved to mlog because of the failure of p_i; then p_i calls the remote procedure GetUnLogM(j, U_ji - T_ij), which resides locally at p_j, to obtain, according to the difference, the message received first; by calling GetUnLogM(j, U_ji - T_ij) repeatedly, recovery(i) obtains all messages that could not be saved to mlog because of the process failure and stores them in the local in-memory array ARRAY;
stage three: recovery(i) reorders by logical clock all messages that could not be saved to mlog because of the process failure; finally, all messages, both those saved and those not saved to mlog, are resent to process p_i, which receives and processes them again until it has run to the point in time before the failure.
3. The message log recovery method based on message reordering and message number checking as claimed in claim 2, characterized in that the logical clock used in stage three of the recovery process is described as follows:
the logical clock LC_p of process p is an integer variable used to count the sending and receiving events of messages; LC_p satisfies the following conditions:
1) its initial value is zero;
2) each time a message is sent, LC_p is increased by one;
3) each time the process receives a message from a process q and stores the necessary information of the message in a log file, LC_p is increased by one and then LC_p ← max(LC_p, LC_q + 1), where LC_q denotes the logical clock of process q and max takes the maximum of LC_p and LC_q + 1.
4. The message log recovery method based on message reordering and message number checking as claimed in claim 2, characterized in that the recovery process comprises the following steps:
step 1, restart p_i from the latest checkpoint, send the message sys_SetF(i), which marks process p_i as failed, to every process, and set lsn and the message count NUM to 0;
step 2, determine whether the determinant file mlog is empty; if mlog is not empty, go to 3; otherwise go to 6;
step 3, obtain a determinant record <LC_j, j, lsn> from the determinant file mlog by calling RequestDeterminant(i, lsn); assign the value of LC_j to LC and increase lsn by 1; obtain the message <LC_j, j, m> with the remote call GetM(LC); in the message log file dfile this message is uniquely identified by LC_j;
step 4, determine whether the message <LC_j, j, lsn> is empty; if it is empty, go to 2; otherwise go to 5;
step 5, store the received message <LC_j, j, lsn> in the in-memory array ARRAY, i.e. ARRAY[NUM].LC_j = LC_j, ARRAY[NUM].j = j, ARRAY[NUM].lsn = lsn; increase NUM by 1 and go to 2 to check whether mlog is empty;
step 6, save NUM, the total number of messages that were sent to process p_i and saved in the mlog file, to NUM';
step 7, for all j, j = 1, 2, …, n, obtain the total number of messages that process p_j has sent to process p_i by remotely calling GetU(i, j);
step 8, remotely call GetT(i) to obtain the number of messages that process p_i has received from each of the other processes;
step 9, for all j, j = 1, 2, …, n, determine whether U_ji > T_ij; if U_ji > T_ij, the number of messages that process p_j sent to process p_i is greater than the number of messages that process p_i received from process p_j, so go to 10; otherwise go to 13;
step 10, because U_ji > T_ij, there are in-transit messages; obtain a message by remotely calling GetUnLogM(j, U_ji - T_ij) and record the obtained message in <LC_j, j, m>;
step 11, determine whether the obtained message <LC_j, j, m> is empty; if it is empty, go to 9; otherwise go to 12;
step 12, store the obtained message in the array ARRAY; because a message has been received, increase T_ij by 1 and the message count NUM by 1; then go to 9 and continue checking whether any message remains that was not saved to the log file;
step 13, sort the elements ARRAY[k], for k from NUM' to NUM-1, in ascending order of ARRAY[k].LC_j; that is, the messages that process p_i received from other processes are sorted in ascending order of their LC_j values;
step 14, send the messages AM(ARRAY[k].j, ARRAY[k].LC_j, ARRAY[k].m) to process p_i, where k = 0, 1, …, NUM-1;
step 15, send to every other process p_k, k = 1, 2, …, n, k ≠ i, the message sys_ClearF(i), which marks that process p_i has resumed normal operation; the recovery process ends.
5. The message log recovery method based on message reordering and message number checking as claimed in claim 4, characterized in that step 10 of the recovery process is:
the processing flow of GetUnLogM(j, U_ji - T_ij) is as follows:
(1) open the dfile file in read-only mode;
(2) move the pointer p to the first record stored, where dfile is a sequential file and the last record stored is at the end of the file;
(3) determine whether the pointer p has reached the end of the file; if it has not, go to (5); otherwise go to (10);
(4) move the pointer p to the previous record;
(5) read the record pointed to by p into the triple <LC_j, l, m>;
(6) determine whether the process identifier l equals i; if l equals i, go to (7); otherwise go to (4);
(7) decrease the variable difference by one, where difference is initially U_ji - T_ij;
(8) determine whether difference is 0; if difference is 0, the record read in step (5) is the message that p_j sent to p_i first; send the record to the recovery(i) process and go to (9); if difference is not 0, go to (4);
(9) close the file, return <j, LC_j, m>, and go to (11);
(10) close the file, return the invalid triple <NULL, NULL, NULL>, and go to (11);
(11) end.
CN201210239710.0A 2012-07-11 2012-07-11 The message logging restoration methods that Effect-based operation reorders and message number is checked Expired - Fee Related CN102841840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210239710.0A CN102841840B (en) 2012-07-11 2012-07-11 The message logging restoration methods that Effect-based operation reorders and message number is checked

Publications (2)

Publication Number Publication Date
CN102841840A CN102841840A (en) 2012-12-26
CN102841840B true CN102841840B (en) 2015-09-09

Family

ID=47369231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210239710.0A Expired - Fee Related CN102841840B (en) 2012-07-11 2012-07-11 The message logging restoration methods that Effect-based operation reorders and message number is checked

Country Status (1)

Country Link
CN (1) CN102841840B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105242979B (en) * 2015-09-09 2017-12-12 高胜法 It is a kind of that there is the preceding backward recovery fault-tolerance approach to recovery feature
CN107181805B (en) * 2017-05-26 2019-11-12 上交所技术有限责任公司 A method of realizing that global orderly is recurred under micro services framework
CN108984101B (en) * 2017-06-01 2020-05-08 华为技术有限公司 Method and device for determining relationship between events in distributed storage system
CN110275782B (en) * 2018-03-13 2023-08-29 阿里巴巴集团控股有限公司 Data processing method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1122543A (en) * 1994-09-09 1996-05-15 Abb.专利有限公司 Method of consistent message transmission
CN101390071A (en) * 2003-07-07 2009-03-18 科尔德斯帕克有限责任公司 Messaging system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7421501B2 (en) * 2005-02-04 2008-09-02 Microsoft Corporation Queued sessions for communicating correlated messages over a network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
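Reducing Message Logging Overhead for Log-Based Recovery; Yi-Min Wang; 1993 IEEE International Symposium on Circuits and Systems; 1993-12-31; pp. 1925-1928 *
An Efficient Coordinated Checkpoint Algorithm (一种高效的协调式检查点算法); Liu Cuiying et al.; Computer Engineering (《计算机工程》); 2011-12-31; Vol. 37, No. 23; pp. 49-51 *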

Also Published As

Publication number Publication date
CN102841840A (en) 2012-12-26


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150909

Termination date: 20200711