Embodiment
Embodiments of the present invention are described below with reference to the accompanying drawings.
Figure 1A is a block diagram of a fault-tolerant system for distributed programs according to the present invention. As shown in Figure 1A, the fault-tolerant system comprises a fault-tolerant server 10, a fault-tolerant client 20 and a fault-tolerant client 30. All fault-tolerant servers and fault-tolerant clients are connected to one another through a network, including but not limited to a local area network or a wide area network. It should be understood that the configuration described here is for illustrative purposes only; the system may comprise any number of fault-tolerant servers and fault-tolerant clients.
The fault-tolerant server comprises a communication module 102, a policy designation module 104, a policy execution module 106, a policy database 108 and a process dependency database 110. The communication module 102 communicates with the fault-tolerant clients and receives the process-state exception information they send. The policy designation module 104 is used to specify in advance whether the automatic restart policy or the manual restart policy is applied during restart fault-tolerant processing; preferably, the user makes this specification manually through a computer input device such as a mouse or keyboard. The policy execution module 106 performs the corresponding restart fault-tolerant processing according to the policy pre-specified through the policy designation module 104. The policy database 108 stores the automatic restart policies and the manual restart policies, and the process dependency database 110 stores a process dependency table expressing the dependencies between the processes of the distributed program.
The fault-tolerant client comprises a process state monitoring module 202, a process-state exception information generation module 204 and a communication module 206. The process state monitoring module 202 monitors the state of the processes of the distributed program running on the client itself. The process-state exception information generation module 204 generates process-state exception information when an abnormal process state is detected. The communication module 206 communicates with the fault-tolerant server and sends the process-state exception information to it.
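The interplay of modules 202, 204 and 206 can be illustrated with a minimal sketch. It assumes a hypothetical probe function that reports a state string per process and a hypothetical send callback standing in for the communication module; the class and field names are illustrative, not taken from the original system.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProcessStateException:
    """Hypothetical payload of the process-state exception information."""
    client_name: str
    process_name: str
    state: str
    timestamp: float


class ProcessStateMonitor:
    """Sketch of monitoring module 202 combined with generation module 204."""

    def __init__(self, client_name: str, processes: List[str],
                 probe: Callable[[str], str],
                 send: Callable[[ProcessStateException], None]) -> None:
        self.client_name = client_name
        self.processes = processes  # processes this client is configured to watch
        self.probe = probe          # hypothetical: returns "running", "hung", "exited", ...
        self.send = send            # stands in for communication module 206

    def check_once(self) -> List[ProcessStateException]:
        """Probe every watched process once and report each abnormal one."""
        reports: List[ProcessStateException] = []
        for name in self.processes:
            state = self.probe(name)
            if state != "running":
                report = ProcessStateException(self.client_name, name, state, time.time())
                self.send(report)
                reports.append(report)
        return reports
```

A real client would call `check_once` periodically; here only a single pass is shown.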
The above structure is the logical configuration of the present application. That is, the fault-tolerant client and the fault-tolerant server are distinct logical entities, and in practice they may be deployed on the same or on different physical nodes. Because the present application concerns fault tolerance of distributed programs, it is only necessary to deploy fault-tolerant clients on the different physical nodes. For example, as shown in Figure 1B, in a system with one fault-tolerant server and two fault-tolerant clients, the fault-tolerant server and one fault-tolerant client may be physically deployed on node 1, and the other fault-tolerant client on node 2. It should be understood, however, that this configuration is for illustrative purposes only: the system may comprise any number of fault-tolerant servers and fault-tolerant clients, and other configurations may be adopted, for example deploying the fault-tolerant server and the two fault-tolerant clients on three different nodes.
The policies are described in detail below. The term "policy" is borrowed from the Strategy pattern in design patterns: the pattern defines a family of algorithms, encapsulates each one, and makes them interchangeable, so that the algorithm can vary without affecting the clients that use it. In the present invention, the policy database stores the various policies for handling abnormal process states. These comprise the automatic restart policy, the manual restart policy and other auxiliary policies. When an exception occurs in the distributed program, the automatic restart policy or the manual restart policy is executed, and auxiliary policies may optionally be executed at the same time.
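The Strategy pattern as applied here can be sketched as follows. The class names are hypothetical; the point is that the executor (standing in for policy execution module 106) depends only on a common interface, so the policy chosen in advance can be swapped without changing the executor.

```python
from abc import ABC, abstractmethod
from typing import Callable, List


class RestartPolicy(ABC):
    """Common interface: every policy handles one abnormal process."""

    @abstractmethod
    def handle(self, process_name: str) -> List[str]:
        """Carry out the policy and return a description of the actions taken."""


class AutomaticRestartPolicy(RestartPolicy):
    def handle(self, process_name: str) -> List[str]:
        return [f"restart {process_name}"]


class ManualRestartPolicy(RestartPolicy):
    def __init__(self, notify: Callable[[str], None]) -> None:
        self.notify = notify  # hypothetical channel to the O&M personnel

    def handle(self, process_name: str) -> List[str]:
        self.notify(f"{process_name} is abnormal; please confirm restart")
        return [f"notify O&M about {process_name}"]


class PolicyExecutor:
    """Stands in for policy execution module 106; it only sees the interface."""

    def __init__(self, policy: RestartPolicy) -> None:
        self.policy = policy  # chosen in advance, cf. policy designation module 104

    def on_exception(self, process_name: str) -> List[str]:
        return self.policy.handle(process_name)
```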
The automatic restart policy is a policy under which, when the fault-tolerant server receives process-state exception information from a fault-tolerant client, it sends that client an instruction to restart the process automatically. The automatic restart policy can be further divided into three variants: restarting only the failed process; restarting the failed process and all processes after it in sequence number; and restarting all processes from the beginning in sequence-number order.
The manual restart policy is a policy under which, when the fault-tolerant server receives process-state exception information from a fault-tolerant client, it notifies the operation and maintenance (O&M) personnel so that they can manually confirm whether the process needs to be restarted. Examples include: sending an SMS to the O&M personnel so that they confirm the restart of the process; calling the O&M personnel and playing a recorded message stating that a given service is abnormal, so that they confirm the restart; and sending an email notification to the O&M personnel so that they confirm the restart. It should be understood that these manual restart policies are described for illustrative purposes only. During fault-tolerant processing, any one of them, or any combination of several of them, may be used; for example, sending an email alert together with an SMS alert is a combination of two policies. In addition, any other suitable manual restart policy not explicitly shown here may be adopted in the present invention.
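Combining several manual restart policies, such as email plus SMS, amounts to fanning one alert out to several channels. A minimal sketch, in which `send_sms` and `send_email` are hypothetical stand-ins that only record the message instead of contacting a real gateway:

```python
from typing import Callable, List

Notifier = Callable[[str], None]


def combine(*channels: Notifier) -> Notifier:
    """Return a notifier that forwards the same alert to every channel, in order."""
    def notify_all(message: str) -> None:
        for channel in channels:
            channel(message)
    return notify_all


# Hypothetical stand-ins for real SMS / e-mail / voice-call gateways.
sent: List[str] = []


def send_sms(message: str) -> None:
    sent.append(f"SMS: {message}")


def send_email(message: str) -> None:
    sent.append(f"EMAIL: {message}")


alert = combine(send_email, send_sms)
alert("Gateway is abnormal; confirm restart?")
print(sent)
```

Because `combine` returns an ordinary notifier, a combination can itself be combined with further channels.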
An auxiliary policy is a policy that performs a supporting function while the automatic restart policy or the manual restart policy is being executed, for example recording an error log. Auxiliary policies are not a necessary part of the fault-tolerance mechanism, however, and may be omitted as required. In addition, any other suitable auxiliary policy not explicitly shown here may be adopted in the present invention.
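Because an auxiliary policy is optional and purely additive, one way to model it is as a wrapper around a restart handler. The sketch below assumes a handler returns a list of action descriptions and that `log` is any callable accepting a string (for example `logging.getLogger("ft").error` or `list.append`); both conventions are assumptions for illustration.

```python
from typing import Callable, List

Handler = Callable[[str], List[str]]


def with_error_log(handler: Handler, log: Callable[[str], None]) -> Handler:
    """Wrap a restart handler so that an error-log entry is written as well.

    The wrapped handler returns exactly what the original returns, so the
    auxiliary policy can be removed without touching the restart logic.
    """
    def wrapped(process_name: str) -> List[str]:
        actions = handler(process_name)
        log(f"process {process_name} abnormal; actions: {actions}")
        return actions
    return wrapped
```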
Fig. 2 is a flowchart of the fault-tolerance method for distributed programs according to one embodiment of the present invention. As shown in Fig. 2, the method starts at step S201. At step S201, the fault-tolerant client starts its processing main thread, and the fault-tolerant server starts its processing main thread and checks whether any fault-tolerant client has timed out. At step S202, the fault-tolerant client sends a login request to the fault-tolerant server, supplying its login account and password. At step S203, the fault-tolerant server receives the login request and checks whether the login account and password match the stored account and password; if they match, the fault-tolerant server updates its information about the fault-tolerant client. At step S204, the fault-tolerant server sends a login reply to the fault-tolerant client, confirming that the client has logged in successfully. At step S205, the fault-tolerant client sends a heartbeat message to the fault-tolerant server. At step S206, the fault-tolerant server receives the heartbeat message and updates the client's heartbeat information; it also judges whether the physical link between the two is normal according to whether the fault-tolerant client keeps sending heartbeats. At step S207, the fault-tolerant server sends a heartbeat reply to the fault-tolerant client, confirming that the heartbeat message was received successfully. At step S208, the fault-tolerant client uses its process state monitoring module to monitor the state of the processes of the distributed program running on it. At step S209, when an abnormal process state is detected, the fault-tolerant client uses its process-state exception information generation module to generate process-state exception information and sends it to the fault-tolerant server through its communication module. At step S210, the fault-tolerant server receives the process-state exception information and, following the automatic or manual restart policy and the inter-process dependencies defined in the process dependency database, uses its policy execution module to perform the predetermined restart fault-tolerant processing; the automatic or manual restart policy is specified in advance by the O&M personnel through the policy designation module. At step S211, the fault-tolerant client reports the process start result to the fault-tolerant server. Processing then ends.
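The server-side bookkeeping in steps S201 and S203 to S207 (credential check at login, heartbeat timestamps, detection of timed-out clients) can be sketched as below. The class name, the in-memory credential dictionary and the 30-second default timeout are assumptions; the text gives no concrete values or data structures.

```python
import time
from typing import Dict, List, Optional


class HeartbeatRegistry:
    """Sketch of login handling and heartbeat tracking on the fault-tolerant server."""

    def __init__(self, credentials: Dict[str, str], timeout: float = 30.0) -> None:
        self.credentials = credentials         # login account -> password (assumed store)
        self.timeout = timeout                 # assumed value, in seconds
        self.last_seen: Dict[str, float] = {}  # logged-in client -> time of last heartbeat

    @staticmethod
    def _now(now: Optional[float]) -> float:
        return time.monotonic() if now is None else now

    def login(self, account: str, password: str, now: Optional[float] = None) -> bool:
        """Step S203: accept the client only if account and password match."""
        if self.credentials.get(account) != password:
            return False
        self.last_seen[account] = self._now(now)
        return True

    def heartbeat(self, account: str, now: Optional[float] = None) -> bool:
        """Step S206: record a heartbeat; unknown clients must log in first."""
        if account not in self.last_seen:
            return False
        self.last_seen[account] = self._now(now)
        return True

    def timed_out(self, now: Optional[float] = None) -> List[str]:
        """Step S201: clients whose last heartbeat is older than the timeout."""
        t = self._now(now)
        return [c for c, seen in self.last_seen.items() if t - seen > self.timeout]
```

The optional `now` argument only exists so the logic can be exercised without waiting in real time.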
Fig. 3 is a flowchart of the fault-tolerance method based on the automatic restart policy. As shown in Fig. 3, the method starts at step S301. At step S301, the process state monitoring module of the fault-tolerant client detects that process A, which it monitors, has become abnormal; the client generates process-state exception information and sends it to the fault-tolerant server through its communication module. At step S302, the fault-tolerant server receives the exception information for process A, looks up the process dependency table, and finds that process B depends on process A. At step S303, according to the pre-specified automatic restart policy, the fault-tolerant server sends the fault-tolerant client an instruction to end process B. At step S304, on receiving the instruction, the fault-tolerant client ends process B and replies to the fault-tolerant server that process B was ended successfully. At step S305, on receiving this reply, the fault-tolerant server sends the fault-tolerant client an instruction to restart process A. At step S306, the fault-tolerant client executes the instruction to restart process A and replies to the fault-tolerant server that process A was restarted successfully. At step S307, on receiving this reply, the fault-tolerant server sends an instruction to start process B. At step S308, the fault-tolerant client receives the instruction, starts process B, and sends the fault-tolerant server a message that process B was started successfully. At step S309, the fault-tolerant server receives this message. The fault-tolerance method then ends.
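The instruction sequence of Fig. 3 (end the dependent, restart the failed process, start the dependent again, each step waiting for a success reply) can be sketched as a plan plus a simple executor. Generalising to several dependents, and stopping them in reverse start order, are assumptions beyond the single-dependent case in the figure; `send` is a hypothetical callback that delivers one instruction and returns whether the client reported success.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str]  # (action, process name)


def restart_plan(failed: str, dependents: List[str]) -> List[Step]:
    """Order of instructions in Fig. 3, generalised to several dependents.

    `dependents` is assumed to be listed in start order (lowest sequence number
    first), so they are ended in reverse order and started in forward order.
    """
    plan: List[Step] = [("end", p) for p in reversed(dependents)]
    plan.append(("restart", failed))
    plan += [("start", p) for p in dependents]
    return plan


def execute(plan: List[Step], send: Callable[[str, str], bool]) -> bool:
    """Send each instruction and wait for its success reply before the next one."""
    for action, process in plan:
        if not send(action, process):
            return False  # a step failed; further handling is not specified in the text
    return True
```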
Fig. 4 is a flowchart of the fault-tolerance method based on the manual restart policy, taking as an example the policy of calling the O&M personnel and playing an alarm recording. As shown in Fig. 4, the method starts at step S401. At step S401, the process state monitoring module of the fault-tolerant client detects that process A, which it monitors, has become abnormal; the client generates process-state exception information and sends it to the fault-tolerant server through its communication module. At step S402, the fault-tolerant server receives the exception information for process A, looks up the process dependency table, and finds that process B depends on process A. At step S403, according to the pre-specified manual restart policy of calling the O&M personnel and playing an alarm recording, the fault-tolerant server calls the O&M personnel and plays the alarm recording. At step S404, the O&M personnel receive the call and confirm manually; the fault-tolerant client then ends process B and replies to the fault-tolerant server that process B was ended successfully. At step S405, on receiving this reply, the fault-tolerant server sends the fault-tolerant client an instruction to restart process A. At step S406, the fault-tolerant client executes the instruction to restart process A and replies to the fault-tolerant server that process A was restarted successfully. At step S407, on receiving this reply, the fault-tolerant server sends an instruction to start process B. At step S408, the fault-tolerant client receives the instruction, starts process B, and sends the fault-tolerant server a message that process B was started successfully. At step S409, the fault-tolerant server receives this message. The fault-tolerance method then ends.
Fig. 5 is a flowchart of the fault-tolerance method based on the automatic restart policy together with an auxiliary policy. As shown in Fig. 5, the only difference from the method shown in Fig. 3 is that, after the fault-tolerant server receives the message that process B has been started at step S309, the method further comprises step S501, in which the fault-tolerant server records an error log for later review. However, recording an error log is only one embodiment of an auxiliary policy, and auxiliary policies are not limited to this. For example, a fault-tolerance method based on the manual restart policy together with an auxiliary policy may also be adopted, for example by adding error logging and any other suitable steps to the method shown in …
The process dependencies in the present application are described below by way of example.
Table 1 below lists the names of three exemplary processes and the paths where they are stored. The three processes are the gateway, the control center and the recording server. It should be understood that these processes are only examples; any number of other processes may also be used.
Table 1
Process name | Full path of process
Gateway | C:\infobird\ibserver.exe
Control center | d:\infobird\ibCtlServer.exe
Recording server | d:\infobird\ibMonitor.exe
The process dependency database stores a process dependency table that defines the dependencies between the processes of the distributed program. The table contains the following items: start sequence number, process name, name of the client on which the process resides, delay in seconds before the next instruction is sent, timed restart setting, and special restart flag. Each item is explained below with reference to Table 2, taking the three processes control center, gateway and recording server as examples. The start sequence number expresses the dependencies between processes.
Table 2 (timed restarts are what-you-see-is-what-you-get)
Start sequence number | Process name | Client hosting the process | Delay before next instruction (s) | Timed restart setting | Special restart flag
1 | Control center | Fault-tolerant client 1 | 1 | time="17:5:5" WhichWeek="7" everydayRestart="TRUE" | 0
2 | Gateway | Fault-tolerant client 2 | 1 | time="15:5:5" WhichWeek="7" everydayRestart="TRUE" | 1
3 | Recording server | Fault-tolerant client 2 | 1 | time="12:5:5" WhichWeek="7" everydayRestart="TRUE" | 0
Taking Table 2 as an example: the gateway process (start sequence number 2) depends on the control center process (start sequence number 1), and the recording server process (start sequence number 3) depends on the gateway process (start sequence number 2) and therefore depends indirectly on the control center process (start sequence number 1). The client name indicates on which fault-tolerant client each process of the distributed program resides: the control center is on fault-tolerant client 1, while the gateway and recording server processes are on fault-tolerant client 2. For example, when the control center process becomes abnormal, fault-tolerant client 1, on which the control center resides, sends process-state exception information to the fault-tolerant server. On receiving this information, the fault-tolerant server looks up the process dependency table and finds that the gateway and recording server depend on the control center; it then performs automatic or manual fault-tolerant processing according to the pre-specified policy. Taking automatic fault-tolerant processing as an example, the fault-tolerant server sends fault-tolerant client 2 instructions to end the gateway and recording server processes. After both processes have been closed successfully, the fault-tolerant server instructs fault-tolerant client 1 to start the control center, and then instructs fault-tolerant client 2 to start the gateway and the recording server in turn.
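Finding both direct and indirect dependents, as in the control center example above, is a reachability search over the dependency relation. In the sketch below the explicit `DEPENDS_ON` map is an assumed representation (Table 2 itself encodes the dependency through the start sequence numbers); the names and clients are taken from Table 2.

```python
from typing import Dict, List, Tuple

# Table 2 rows: process name -> (start sequence number, client hosting the process)
TABLE: Dict[str, Tuple[int, str]] = {
    "control center": (1, "fault-tolerant client 1"),
    "gateway": (2, "fault-tolerant client 2"),
    "recording server": (3, "fault-tolerant client 2"),
}

# Assumed representation of the direct dependencies described in the text:
# dependent process -> process it depends on.
DEPENDS_ON: Dict[str, str] = {
    "gateway": "control center",
    "recording server": "gateway",
}


def dependents_of(failed: str) -> List[str]:
    """All processes depending on `failed` directly or indirectly, in start order."""
    found: List[str] = []
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for proc, parent in DEPENDS_ON.items():
            # The membership checks also keep a dependency cycle from looping forever.
            if parent == current and proc != failed and proc not in found:
                found.append(proc)
                frontier.append(proc)
    return sorted(found, key=lambda p: TABLE[p][0])
```

Looking up `TABLE[p][1]` for each result then tells the server which client each instruction must be sent to.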
The delay in seconds, the timed restart setting and the special restart flag in the dependency table are optional. The delay in seconds is the time between receiving a trigger message and sending the next instruction; setting this item prevents instructions from arriving in the wrong order because of individual routing failures in the network. The timed restart setting ensures that processes can be restarted periodically. Start sequence numbers begin at 1 and increase consecutively, so that whether a program has malfunctioned (for example, it skips frames or hangs) or has crashed (the process has disappeared and has not been restarted), it can be recovered in start order. Moreover, timed restarts are what-you-see-is-what-you-get: the restart no longer depends on the start sequence, the delay in seconds or the special restart flag, but only on the configured time and date. That is, if a timed restart is configured, the restart is performed solely according to the time given in the timed restart setting. For example, as shown in Table 2, the control center process is restarted at 17:5:5 every day, the gateway process at 15:5:5 and the recording server process at 12:5:5, regardless of their dependencies. To disable the timed restart service, set WhichWeek to empty and everydayRestart to false, that is, WhichWeek="" everydayRestart="FALSE". The parameters time (hour:minute:second) and WhichWeek (day of the week, 1 to 7) specify at what time, and on which day of each week, the application is restarted; if everydayRestart="TRUE", WhichWeek is ignored. As for the special restart flag: a value of 0 means restarting only the process itself, regardless of the start sequence number; a value of 1 means restarting the process itself and all processes after it in sequence number; and a value of 2 means restarting all processes from the beginning in sequence-number order. Normally the special restart flag is 1. The values 0 and 2 bypass the constraint of the start sequence numbers and break the dependencies between processes; they are generally reserved for extensions or used only for unconventional start-up modes. This is comparable to a program that most people use in the normal way, but for which the developer has left a back door to use in emergencies. When a hardware fault occurs, such as overheating from which the system cannot recover, the special restart flag can be set to 2, and all distributed programs are then restarted in order. A special restart flag of 0 means that the process has no dependency on other processes and is independent; it is placed in this table only so that start-up can be configured in one place, and the flag value 0 breaks its dependencies on other processes.
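The three special restart flag values can be expressed as a selection over the processes ordered by start sequence number. A minimal sketch, assuming the processes are held in a plain list in start order (a hypothetical representation) and that the failed process appears in that list:

```python
from typing import List


def processes_to_restart(order: List[str], failed: str, flag: int) -> List[str]:
    """Select the processes to restart for special restart flag 0, 1 or 2.

    `order` lists process names by start sequence number (1, 2, 3, ...).
    """
    if flag == 0:
        return [failed]                     # only the process itself, ignoring dependencies
    if flag == 1:
        return order[order.index(failed):]  # itself and every later process
    if flag == 2:
        return list(order)                  # everything, from the beginning
    raise ValueError(f"unknown special restart flag: {flag}")
```

With the order from Table 2, a gateway failure under flag 1 restarts the gateway and the recording server, while flag 2 restarts all three processes starting from the control center.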
The client configuration file and the server configuration file are described below.
The client configuration file is as follows:
<?xml version="1.0" encoding="GB2312"?>
<p AutoStart="true" name="fault-tolerant client" localPort="10011" ADServerIP="127.0.0.1" ADServerPort="10012">
  <apps>
    <app name="gateway" fullPathName="ibServer.exe"/>
    <app name="Scankeyword" fullPathName="D: program tt"/>
  </apps>
</p>
Notes on the client configuration file:
1. The name attribute of the <p> element is the name of the client. It is mandatory and must be unique.
2. The name attribute of each <app> element is the name of a process. It is mandatory and must be unique on the same client.
3. ADServerIP="127.0.0.1" and ADServerPort="10012" are the IP address and port of the fault-tolerant server. They are also mandatory.
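Reading the client configuration and enforcing the "mandatory" and "unique on the same client" rules in notes 1 to 3 can be sketched with the standard-library ElementTree parser. The function name and return shape are assumptions; the XML declaration is omitted from the embedded string, and only the gateway entry is reproduced. Uniqueness of client names across different clients cannot be checked from one file and would have to be enforced by the server.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

CLIENT_XML = """
<p AutoStart="true" name="fault-tolerant client" localPort="10011"
   ADServerIP="127.0.0.1" ADServerPort="10012">
  <apps>
    <app name="gateway" fullPathName="ibServer.exe"/>
  </apps>
</p>
"""


def load_client_config(text: str) -> Tuple[str, Tuple[str, int], Dict[str, str]]:
    """Return (client name, (server IP, server port), {process name: full path})."""
    root = ET.fromstring(text.strip())
    name = root.get("name")
    if not name:
        raise ValueError("<p> must have a name")
    # Missing ADServerIP / ADServerPort raise KeyError: both are mandatory.
    server = (root.attrib["ADServerIP"], int(root.attrib["ADServerPort"]))
    apps: Dict[str, str] = {}
    for app in root.iter("app"):
        app_name = app.get("name")
        if not app_name:
            raise ValueError("every <app> must have a name")
        if app_name in apps:
            raise ValueError(f"duplicate process name on this client: {app_name}")
        apps[app_name] = app.get("fullPathName", "")
    return name, server, apps
```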
The server configuration file is as follows:
<?xml version="1.0" encoding="GB2312"?>
<p AutoStart="true" localPort="10012">
  <appsByOrder>
    <app order="1" name="Scankeyword" whichClient="fault-tolerant client" afterSencsSendNextIns="1"
      time="15:5:5" WhichWeek="7" everydayRestart="TRUE" specialRestartFlag="1"/>
    <app order="2" name="gateway" whichClient="fault-tolerant client" afterSencsSendNextIns="1"
      time="15:5:5" WhichWeek="7" everydayRestart="TRUE" specialRestartFlag="0"/>
  </appsByOrder>
</p>
Notes on the server configuration file:
1. The start sequence (order) begins at 1 and increases consecutively. This guarantees that timed restarts work and that, whether a program has malfunctioned (for example, it skips frames or hangs) or has crashed (the process has disappeared and has not been restarted), it can be recovered in start order.
2. To disable the timed restart service, set WhichWeek to empty and everydayRestart to false, for example: WhichWeek="" everydayRestart="FALSE".
3. The parameters time (hour:minute:second) and WhichWeek (day of the week, 1 to 7) specify at what time, on which day of each week, the application is restarted. If everydayRestart="TRUE", WhichWeek is ignored.
4. whichClient is the client on which the program runs. specialRestartFlag values: 0 restarts only the process itself, regardless of the start sequence number; 1 restarts the process itself and all processes after it in sequence number; 2 restarts all processes from the beginning in sequence-number order.
5. Note that timed restart is what-you-see-is-what-you-get: it is independent of the start sequence, the delay in seconds and the special restart flag.
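Notes 1 to 3 and 5 can be checked with a small loader plus a timed-restart test. This is a sketch under stated assumptions: the `AppEntry` field names are hypothetical, a missing specialRestartFlag defaults to 1 (the text only says 1 is the usual value), and WhichWeek is assumed to use 1 = Monday ... 7 = Sunday, matching Python's `isoweekday()`; the text does not say which day 1 denotes. The XML declaration is omitted from the embedded string.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, time as dtime
from typing import List, Optional

SERVER_XML = """
<p AutoStart="true" localPort="10012">
  <appsByOrder>
    <app order="1" name="Scankeyword" whichClient="fault-tolerant client" afterSencsSendNextIns="1"
         time="15:5:5" WhichWeek="7" everydayRestart="TRUE" specialRestartFlag="1"/>
    <app order="2" name="gateway" whichClient="fault-tolerant client" afterSencsSendNextIns="1"
         time="15:5:5" WhichWeek="7" everydayRestart="TRUE" specialRestartFlag="0"/>
  </appsByOrder>
</p>
"""


@dataclass
class AppEntry:
    order: int
    name: str
    client: str
    delay_s: int
    restart_at: Optional[dtime]
    which_week: Optional[int]  # assumed 1 = Monday ... 7 = Sunday
    every_day: bool
    special_flag: int


def load_server_config(text: str) -> List[AppEntry]:
    entries: List[AppEntry] = []
    for app in ET.fromstring(text.strip()).iter("app"):
        t = app.get("time", "")
        week = app.get("WhichWeek", "")
        entries.append(AppEntry(
            order=int(app.attrib["order"]),
            name=app.attrib["name"],
            client=app.attrib["whichClient"],
            delay_s=int(app.get("afterSencsSendNextIns", "0")),
            restart_at=dtime(*map(int, t.split(":"))) if t else None,
            which_week=int(week) if week else None,
            every_day=app.get("everydayRestart", "FALSE").upper() == "TRUE",
            special_flag=int(app.get("specialRestartFlag", "1")),  # assumed default
        ))
    entries.sort(key=lambda e: e.order)
    # Note 1: the start sequence must run 1, 2, 3, ... without gaps.
    if [e.order for e in entries] != list(range(1, len(entries) + 1)):
        raise ValueError("order must start at 1 and increase consecutively")
    return entries


def timed_restart_due(entry: AppEntry, now: datetime) -> bool:
    """True if `now` matches the configured restart time, to the second (note 5)."""
    if entry.restart_at is None or (not entry.every_day and entry.which_week is None):
        return False  # timed restart disabled (note 2)
    if now.time().replace(microsecond=0) != entry.restart_at:
        return False
    return entry.every_day or now.isoweekday() == entry.which_week  # note 3
```

The check deliberately ignores the order, delay and special flag fields, reflecting note 5.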
With the client and server configuration files set up as described above, the distributed program (in this application, applications such as the control center, gateway and recording server) can be deployed in any order on arbitrary physical machines. The client configuration file tells each client which processes it must monitor. The server configuration file is only one form of representation; the same information may also be held in a database, for example as policies stored in the database. Because of the distributed process dependency table, the dependencies between the distributed programs remain fixed however they are deployed. Moreover, with the above embodiments of the present invention, the state of the distributed program processes can be monitored automatically, and handling can be performed automatically when a process fails, saving considerable manpower and material resources.
The embodiments of the present invention have been described above. However, the present invention is not limited to the scope of these embodiments. Various modifications and improvements can be made to the above embodiments without departing from the spirit and scope of the present invention. The scope of the present invention is defined by the appended claims.