CN100388218C

CN100388218C - Method for realizing backup between servers

Info

Publication number: CN100388218C
Application number: CNB021123209A
Authority: CN
Inventors: 丁震; 陈世忠
Original assignee: ZTE Corp
Current assignee: Global Innovation Polymerization LLC
Priority date: 2002-06-27
Filing date: 2002-06-27
Publication date: 2008-05-14
Anticipated expiration: 2022-06-27
Also published as: CN1464396A

Abstract

The present invention provides a method for realizing backup between servers. The method comprises the following steps: the service processing programs on N master machines and one standby machine are started; the N master machines work in cooperation while the standby machine sends link detection IP messages to the master machines and waits for the corresponding responses in order to monitor their states; if the standby machine confirms that a master machine has failed, it revises the arbitration data, notifies the other service servers of the new arbitration data, and then restarts itself to become a new master machine and take over the work of the failed machine; once its fault has been eliminated, the failed machine starts its service program again, becomes the new standby machine, and monitoring continues. The standby machine can thus actively detect and take over a failed master machine, and the failed machine can rejoin the system as the new standby machine after repair. The invention therefore preserves the performance and stability of short message equipment while lowering the cost of service processing servers and improving the market competitiveness of the equipment.

Description

A method for realizing backup between servers

Technical field

The present invention relates to a method for realizing backup between server programs, and in particular to N:1 redundant backup between service processing programs in a short message system in the field of mobile communication.

Background technology

With the development of mobile communication, short messaging has become a service that users favor and an important source of profit for telecom operators, with annual short message volume reaching tens of billions. This rapidly expanding service places higher demands on the performance and stability of short message equipment.

In a short message system, the service server is the core of service processing and often requires dual-machine redundant backup: while one machine works, another is in the standby state and continuously checks the state of the master machine, and as soon as it finds that the master machine has failed, it takes over the master machine's work at once. Such solutions generally depend on the operating system or on third-party cluster software.

At the same time, to increase the processing capacity of a short message service center, several service servers need to work in cooperation. If the former dual-machine backup scheme is retained in this case, a standby machine must be provided for every working service processing server, and the equipment cost multiplies.

Therefore, how to keep performance and stability undiminished while reducing equipment cost as far as possible has become a contradiction in urgent need of resolution.

Chinese patent application No. 01106482 describes a scheme for cooperative backup between service servers. In that scheme, one server is responsible for allocating tasks and several other servers are responsible for processing them. The server that allocates tasks becomes a weak point of the system, and the technical solution of that application is to back up this server. The scheme thus introduces a new weak point into the system, and backing up that weak point still follows the 1:1 dual-machine backup approach.

Summary of the invention

The technical problem to be solved by the present invention is to overcome the shortcoming of existing short message equipment that every service processing server requires dual-machine backup, which results in high cost. To this end, an N:1 backup method is proposed, in which multiple master machines share one standby machine.

The technical solution of the present invention is as follows:

The first step

The service processing programs on the N master machines and the one standby machine are started, the master machines first and the standby machine afterwards. During startup, every service processing program reads the same arbitration data from the system; this data records, for each of the N+1 service servers, whether its role is master or standby. In other words, every server knows its own role and the roles of the other machines;

The second step

The N master machines work in cooperation. Meanwhile, the standby machine sends link detection IP messages to the master machines at a given frequency and waits for the corresponding responses, thereby monitoring their states;

The third step

If the standby machine does not receive a response message from a master machine, it increments the failure counter corresponding to that master machine by one. If the failure counter has not reached the set threshold, the method continues with the second step; otherwise, it proceeds to the fourth step;

The fourth step

The standby machine confirms the failed machine, modifies the arbitration data, notifies the other service servers of the new arbitration data, and then restarts itself so that it becomes a new master machine and takes over the work of the failed machine;

The fifth step

After its fault has been eliminated, the failed machine starts its service program again and becomes the new standby machine; the method then returns to the second step and continues.
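The five steps above can be illustrated with a minimal sketch, given here only as an aid to reading and not as the implementation of the invention. It assumes the arbitration data is a plain in-memory mapping from server name to role, and that `probe` is a caller-supplied function returning whether a master answered the link detection message. The names `monitoring_round`, `probe`, `MASTER` and `STANDBY` are hypothetical and do not appear in the original text.

```python
MASTER, STANDBY = "master", "standby"


def monitoring_round(arbitration, counters, probe, threshold):
    """Run one monitoring round of the standby machine (steps 2 to 4).

    arbitration: dict mapping server name -> role (assumed representation).
    counters:    dict mapping master name -> failure count.
    probe:       hypothetical callable; True if the master responded.
    Returns the name of the machine that was taken over, or None.
    """
    standby = next(n for n, role in arbitration.items() if role == STANDBY)
    for node, role in list(arbitration.items()):
        if role != MASTER:
            continue
        if probe(node):
            continue  # step 2: a response arrived, nothing to count
        # Step 3: a missing response increments this master's counter.
        counters[node] = counters.get(node, 0) + 1
        if counters[node] >= threshold:
            # Step 4: revise the arbitration data; the standby becomes a
            # master. Notifying the other servers and restarting are
            # outside the scope of this sketch.
            arbitration[standby] = MASTER
            # Step 5: the failed machine will find this role when it
            # restarts after repair.
            arbitration[node] = STANDBY
            counters.clear()
            return node
    return None
```

For example, with servers `s1` to `s3` as masters, `s4` as standby, a threshold of 3 and a `probe` that never gets an answer from `s2`, the third call to `monitoring_round` returns `"s2"` and leaves `s4` recorded as a master and `s2` as the standby.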

Compared with the prior art, the method of the present invention adopts N:1 redundant backup: the standby machine can actively detect and take over a failed master machine, and the failed machine can rejoin the system as the new standby machine once its fault has been repaired. This guarantees the performance and stability of short message equipment, saves the equipment cost of N-1 service processing servers, and improves the market competitiveness of short message equipment.

Description of drawings

Fig. 1 is a schematic diagram of the position of the service servers in the system.

Fig. 2 is the main flowchart of standby machine operation in the system.

Fig. 3 shows the main process of service server startup in the method of the invention.

Fig. 4 shows the main process of link detection performed by the standby machine after startup in the method of the invention.

Fig. 5 shows the main process by which the standby machine detects whether a failed machine exists in the system and takes over the failed machine in the method of the invention.

Embodiment

Fig. 1 illustrates a concrete implementation of the N:1 backup scheme in the engineering environment of a short message service center of a communication system. The whole system is built on a TCP/IP network, and the nodes can reach one another by exchanging messages. The participants in N:1 backup in Fig. 1 are four service server nodes and one arbitration data node. The arbitration data node stores the current role of each service server node, that is, which nodes are masters and which is the standby. The arbitration data may be kept in a database or stored in some other way. Each service server program reads the arbitration data from the arbitration data node at startup and thereby learns its own role and the roles of the other service servers in the system. At any given moment, only one of the four service server nodes in Fig. 1 is the standby machine (service server 4 in Fig. 1), and the other three are master machines. The "other servers" in the figure stand for all remaining nodes, which may include gateways, operation and maintenance servers, and so on, as required.
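Since the arbitration data may be kept in a database, the following sketch shows one possible layout using an in-memory SQLite database. The table name `arbitration`, its columns, the server names and the helper functions are all assumptions made for illustration; the original text does not specify a schema or a database product.

```python
import sqlite3


def create_arbitration_store(roles):
    """Create a hypothetical arbitration table and fill it with roles.

    roles: dict mapping server name -> "master" or "standby".
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE arbitration (node TEXT PRIMARY KEY, role TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO arbitration (node, role) VALUES (?, ?)", roles.items()
    )
    conn.commit()
    return conn


def read_roles(conn, self_node):
    """Read the arbitration data at startup.

    Returns (own_role, roles_of_the_other_servers).
    """
    rows = dict(conn.execute("SELECT node, role FROM arbitration"))
    own_role = rows.pop(self_node)
    return own_role, rows


# The configuration of Fig. 1: three masters and one standby.
store = create_arbitration_store({
    "server1": "master",
    "server2": "master",
    "server3": "master",
    "server4": "standby",
})
own_role, others = read_roles(store, "server4")
```

Here `own_role` is `"standby"` for `server4`, and `others` holds the roles of the three masters, which is the information every service server program needs at startup.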

Fig. 2 shows the main workflow of the standby machine. As Fig. 2 shows, the work of the standby machine consists of three main phases: startup, monitoring, and takeover. First, the server starts and confirms that it is the standby machine (see Fig. 3 for the startup process). The standby machine then monitors the master machines by sending a message to each of them and waiting for the response (see Fig. 4 for the monitoring process). Based on the monitoring results, the standby machine decides whether any service server node in the system has failed and, if so, takes over the work of the failed machine (see Fig. 5 for the takeover process). A master machine, by contrast, begins processing services directly after startup.

Fig. 3 shows the main flow of service server startup. At startup, a server reads the current arbitration data from the arbitration data node; in this example, service server 4 is configured as the standby machine and the rest as master machines. Service servers 1, 2, and 3 therefore become master machines after startup and process services cooperatively, while service server 4 becomes the standby machine. The main work of the standby machine after startup is to initialize the state table and the counters and to start two timers. The state table records the state of every working service server; a state is either "normal" or "disconnected" and is updated dynamically according to the link detection results (see Fig. 4). All nodes are initialized to "normal". The counters are the failure counters of the working service servers, one per master machine. Each time the standby machine finds a working service server disconnected, it increments the corresponding counter by one (see Fig. 5). All counters are initialized to zero. The timers are the link detection timer and the failure counter processing timer, denoted timer 1 and timer 2 respectively. Timer 1 may be set to 3 seconds, for example, so that the processing flow of Fig. 4 is entered each time it expires; timer 2 is likewise set to 3 seconds, so that the processing flow of Fig. 5 is entered every 3 seconds.
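The initialization performed by the standby machine can be sketched as follows. This is a minimal illustration under stated assumptions: the class `StandbyContext`, the function `init_standby` and the field names are hypothetical, and the timers are represented only by their periods rather than by running timer objects.

```python
from dataclasses import dataclass, field

NORMAL, DISCONNECTED = "normal", "disconnected"


@dataclass
class StandbyContext:
    """Hypothetical container for the standby machine's working data."""
    state: dict = field(default_factory=dict)     # master name -> state
    counters: dict = field(default_factory=dict)  # master name -> failures
    timer1_period_s: float = 3.0  # link detection timer (Fig. 4 flow)
    timer2_period_s: float = 3.0  # failure counter timer (Fig. 5 flow)


def init_standby(masters):
    """Initialize the state table and counters for the given masters."""
    ctx = StandbyContext()
    for node in masters:
        ctx.state[node] = NORMAL  # every node starts as "normal"
        ctx.counters[node] = 0    # every failure counter starts at zero
    return ctx


ctx = init_standby(["server1", "server2", "server3"])
```

Keeping the state table and the counters as two separate mappings mirrors the description, in which timer 1 only writes states and timer 2 only reads states and updates counters.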

Fig. 4 describes the processing flow after timer 1, the link detection timer, expires. Using the arbitration data table it read at startup, the standby machine sends a link detection message to each master machine node in the table, having first provisionally set the state of each of these nodes in the service server state table to "disconnected". After all messages have been sent, it starts timer 1 again and then waits for the responses to the link detection messages. When a response from a node is received, the state of that node is set to "normal".
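The timer 1 flow can be sketched as two small handlers. The transport is deliberately left abstract: `send_probe` is a hypothetical callable supplied by the caller, because the original text does not say how the link detection message is encoded or sent. Re-arming timer 1 is also left to the caller.

```python
NORMAL, DISCONNECTED = "normal", "disconnected"


def on_timer1_expired(state, masters, send_probe):
    """Handle expiry of the link detection timer.

    state:      the service server state table (master name -> state).
    masters:    master nodes taken from the arbitration data.
    send_probe: hypothetical callable that sends one detection message.
    """
    for node in masters:
        # Provisionally assume the link is down until a response arrives.
        state[node] = DISCONNECTED
        send_probe(node)
    # The caller is expected to start timer 1 again at this point.


def on_probe_response(state, node):
    """Handle a link detection response from a master node."""
    if node in state:
        state[node] = NORMAL
```

Marking every node "disconnected" before waiting means a master that never answers simply keeps that state, so no separate timeout per message is needed in this sketch.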

Fig. 5 describes the processing flow after timer 2, the failure counter processing timer, expires. When timer 2 expires, the standby machine scans the continuously refreshed service server state table introduced with Fig. 3 and updates the failure counters according to the scan results. If the state of a node is "normal", its failure counter is cleared; if the state of a node is "disconnected", its failure counter is incremented by one. If the failure counter of a node exceeds the set threshold, for example 20, the node is determined to have failed. The standby machine then modifies the arbitration data on the arbitration node, setting itself as a master machine and the failed node as the new standby machine. Next, it sends the new arbitration data to every other service server and restarts itself, becoming an actual master machine and beginning to process services. If no failed node is found this time, the standby machine sets timer 2 again and waits for the next check. In addition, once the failed machine has been repaired and rejoins the system, it becomes the new standby machine, monitors the other servers, and stands ready to take over any of them at any time.
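The timer 2 flow and the role swap can be sketched as below. The function names `on_timer2_expired` and `take_over` are hypothetical, the arbitration data is assumed to be a plain mapping, and notifying the other servers and restarting the standby are omitted. The comparison uses "greater than", following the wording of this paragraph.

```python
NORMAL, DISCONNECTED = "normal", "disconnected"


def on_timer2_expired(state, counters, threshold=20):
    """Scan the state table and update the failure counters.

    Returns the first node whose counter exceeds the threshold, or None.
    """
    failed = None
    for node, status in state.items():
        if status == NORMAL:
            counters[node] = 0  # a healthy node clears its counter
        else:
            counters[node] = counters.get(node, 0) + 1
            if failed is None and counters[node] > threshold:
                failed = node
    return failed


def take_over(arbitration, standby, failed_node):
    """Return revised arbitration data with the two roles swapped."""
    revised = dict(arbitration)
    revised[standby] = "master"
    revised[failed_node] = "standby"
    return revised
```

With the example values of a 3-second timer 2 and a threshold of 20, a master must appear "disconnected" on 21 consecutive scans before it is declared failed, so a single lost response does not trigger a takeover.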

Although the present invention has been described with an embodiment directed at the short message service server of a communication system, a person of ordinary skill in the art can, following the idea of the invention, realize N:1 backup in any other system that requires backup among multiple servers.

Claims

1. A method for realizing backup between servers, characterized in that:

In the first step, the service processing programs on N master machines and one standby machine are started, the master machines first and the standby machine afterwards; during startup, every service processing program reads the same arbitration data from the system, the arbitration data comprising information on the master or standby role of each of the N+1 service servers;

In the second step, the N master machines work in cooperation; meanwhile, the standby machine sends link detection IP messages to the master machines at a given frequency and waits for the corresponding responses, thereby monitoring their states;

In the third step, if the standby machine does not receive a response message from a master machine, the failure counter corresponding to that master machine is incremented by one; if the failure counter has not reached the set threshold, the method returns to the second step; otherwise, it proceeds to the fourth step;

In the fourth step, the standby machine confirms the failed machine, modifies the arbitration data, that is, the information on the master or standby roles of the N+1 service servers, notifies the other service servers of the new arbitration data, and then restarts itself so that it becomes a new master machine and takes over the work of the failed machine;

In the fifth step, after its fault has been eliminated, the failed machine starts its service program again and becomes the new standby machine, and the method returns to the second step and continues.

2. The method for realizing backup according to claim 1, characterized in that the first step further comprises the main work of the standby machine after startup: initializing the state table and the counters, and starting timer 1, used for link detection, and timer 2, used for the failure counters.

3. The method for realizing backup according to claim 2, characterized in that, in the second step, the standby machine sending link detection IP messages to the master machines at a given frequency comprises: after link detection timer 1 expires, the standby machine, using the arbitration data it read at startup, sends a link detection message to each master machine node in the arbitration data, having first provisionally set the state of each such node in the service server state table to "disconnected"; after all messages have been sent, timer 1 is started again and the responses to the link detection messages are awaited; when a response from a node is received, the state of that node is set to "normal".

4. The method for realizing backup according to claim 2, characterized in that, after timer 2 expires, the service server state table is scanned and the failure counters are updated according to the scan results; if the state of a node is "normal", its failure counter is cleared; if the state of a node is "disconnected", its failure counter is incremented by one; if the failure counter of a node exceeds the set threshold, the node is determined to have failed.

5. The method for realizing backup according to claim 3 or 4, characterized in that, if no failed machine is found this time, the standby machine sets timer 2 again and waits for the next check.