CN100388218C - Method for realizing backup between servers - Google Patents
- Publication number
- CN100388218C (application CNB021123209A / CN02112320A)
- Authority
- CN
- China
- Prior art keywords
- machine
- main computer
- guest machine
- standby
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Landscapes
- Hardware Redundancy (AREA)
Abstract
The present invention provides a method for realizing backup between servers. The method comprises the following steps: the service processing programs on N master machines and one standby machine are started; the N master machines work cooperatively while the standby machine sends link detection IP messages to the master machines and waits for the corresponding responses so as to monitor their states; if the standby machine confirms that a machine has failed, it revises the arbitration data and notifies the other service servers of the new arbitration data; the standby machine then restarts itself to become a new master machine and takes over the work of the failed machine; once its fault has been eliminated, the failed machine starts its service program again and becomes the new standby machine. With the present invention, the standby machine can actively detect and take over from a failed master machine, and the failed machine can rejoin the system as the new standby machine after repair. The invention therefore preserves the performance and stability of short message equipment, lowers the cost of service processing server equipment, and improves the market competitiveness of the equipment.
Description
Technical field
The present invention relates to a method for realizing backup between server programs, and in particular to N:1 redundant backup between service processing programs in short message systems in the field of mobile communication.
Background art
With the development of mobile communication, short messaging has become a service that users enjoy and an important source of profit for telecom operators; annual short message volume has reached tens of billions. This rapidly expanding service places higher demands on the performance and stability of short message equipment.
In a short message system, the service server is the core of service processing, so it often needs dual-machine redundant backup: while one machine works, another is in the standby state and continually checks the state of the master machine; as soon as it finds that the master machine has failed, it takes over its work at once. Such solutions generally depend on the operating system or on third-party clustering software.
At the same time, to increase the processing capacity of a short message service center, several service servers need to work cooperatively. If the dual-machine backup scheme were retained in this case, a standby machine would have to be provided for every working service server, and the equipment cost would multiply.
Preserving performance and stability while reducing equipment cost as far as possible has therefore become a contradiction that urgently needs resolving.
Chinese patent application No. 01106482 describes a scheme for cooperative backup between service servers. In that scheme, one server is responsible for allocating tasks and several other servers are responsible for processing them. The task-allocating server becomes a single point of failure of the system, and the technical solution of that application is to back this server up. The scheme thus introduces a new point of failure into the system, and its backup of that point remains a 1:1 dual-machine backup approach.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the shortcoming of existing short message equipment that every service processing server needs dual-machine backup, which leads to high cost. To this end, the invention proposes an N:1 backup method, in which multiple master machines share one standby machine.
The technical solution of the present invention is as follows:
The first step
The service processing programs on the N master machines and the 1 standby machine are started, master machines first and the standby machine afterwards. During startup, every service processing program reads the same arbitration data from the system; the arbitration data records whether each of the N+1 service servers has the master or the standby role. In other words, every server knows its own role and the roles of the other machines.
The second step
The N master machines work cooperatively. Meanwhile, the standby machine sends link detection IP messages to the master machines at a given frequency and waits for the corresponding responses, thereby monitoring their states.
The third step
If the standby machine does not receive a response message from a master machine, it increments the failure counter for that master machine by one. If the failure counter has not reached the set threshold, the method continues with the second step; otherwise it proceeds to the fourth step.
The fourth step
The standby machine confirms the failed machine, modifies the arbitration data, and notifies the other service servers of the new arbitration data. It then restarts itself to become a new master machine and takes over the work of the failed machine.
The fifth step
After its fault has been eliminated, the failed machine starts its service program again and becomes the new standby machine; the method then returns to the second step.
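The five steps can be sketched as a small simulation. This is only an illustration under stated assumptions: the names `monitor_until_failure`, `take_over` and `probe`, the server names, and the threshold of 3 are hypothetical, and the link detection message is replaced by a callback rather than real network traffic.

```python
from typing import Callable, Dict, Optional

MASTER, STANDBY = "master", "standby"


def monitor_until_failure(
    roles: Dict[str, str],
    probe: Callable[[str], bool],
    threshold: int,
    max_rounds: int,
) -> Optional[str]:
    """Steps 2-3: probe every master each round; return the first one judged failed."""
    failures = {name: 0 for name, role in roles.items() if role == MASTER}
    for _ in range(max_rounds):
        for name in failures:
            if probe(name):
                continue  # a response arrived, so nothing is counted
            failures[name] += 1
            if failures[name] >= threshold:  # counter has reached the threshold
                return name
    return None


def take_over(roles: Dict[str, str], failed: str) -> Dict[str, str]:
    """Steps 4-5: the standby becomes a master; the failed machine becomes the standby."""
    new_roles = dict(roles)
    for name, role in roles.items():
        if role == STANDBY:
            new_roles[name] = MASTER
    new_roles[failed] = STANDBY
    return new_roles


# Hypothetical arbitration data: three masters and one standby.
roles = {"server1": MASTER, "server2": MASTER, "server3": MASTER, "server4": STANDBY}

# Assumed fault: server2 never answers the link detection message.
failed = monitor_until_failure(
    roles, probe=lambda name: name != "server2", threshold=3, max_rounds=10
)
new_roles = take_over(roles, failed) if failed else roles
print(failed, new_roles)
```

The sketch leaves out the notification of the other servers and the restart of the standby machine, which depend on the deployment.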
Adopt the method for the invention, compared with prior art, owing to taked the technical measures of N:1 redundancy backup, the main computer that breaks down can initiatively be found and take over to guest machine, and the fault machine can be used as new guest machine and adds system again after repairing fault.Can guarantee the performance and the stability of short message equipment like this, save the equipment cost of N-1 platform Service Process Server again, improve the market competitiveness of short message equipment.
Description of drawings
Fig. 1 is a schematic diagram of the position of the service servers in the system.
Fig. 2 is the main flowchart of standby machine operation in the system.
Fig. 3 shows the main process of service server startup in the method of the invention.
Fig. 4 shows the main process of link detection performed by the standby machine after startup in the method of the invention.
Fig. 5 shows the main process by which the standby machine detects whether a failed machine exists in the system and takes over from it in the method of the invention.
Detailed description of the embodiment
Fig. 1 shows a concrete implementation of the N:1 backup scheme in the engineering environment of a communication system's short message service center. The whole system is built on a TCP/IP network, and the nodes can reach one another by messages. The participants in N:1 backup in Fig. 1 are chiefly 4 service server nodes and 1 arbitration data node. The arbitration data node stores the current role of each service server node, that is, which is master and which is standby. The arbitration data may be kept in a database or stored in some other way. Each service server program reads the arbitration data from the arbitration data node at startup, and thereby learns its own role and the roles of the other service servers in the system. At any given moment, only 1 of the 4 service server nodes in Fig. 1 is the standby machine (service server 4 in Fig. 1); the other 3 are master machines. The "other servers" in the figure are a summary representation of all other nodes, which may include gateways, operation and maintenance servers, and so on, as required.
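As a rough illustration of what the arbitration data might look like in memory, the sketch below models the Fig. 1 configuration. The class `ArbitrationData`, its methods, and the server names are assumptions made for this example; the patent does not prescribe a storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MASTER, STANDBY = "master", "standby"


@dataclass
class ArbitrationData:
    """Role of every service server, as held by the arbitration data node."""
    roles: Dict[str, str] = field(default_factory=dict)

    def role_of(self, server: str) -> str:
        """Role that a given server reads for itself at startup."""
        return self.roles[server]

    def masters(self) -> List[str]:
        """Names of all servers currently holding the master role."""
        return sorted(s for s, r in self.roles.items() if r == MASTER)

    def standby(self) -> str:
        """Name of the single standby machine."""
        standbys = [s for s, r in self.roles.items() if r == STANDBY]
        if len(standbys) != 1:
            raise ValueError("N:1 backup expects exactly one standby machine")
        return standbys[0]


# Fig. 1 configuration: servers 1-3 are masters, server 4 is the standby.
arbitration = ArbitrationData({
    "server1": MASTER, "server2": MASTER, "server3": MASTER, "server4": STANDBY,
})
my_name = "server4"  # each program looks up its own entry at startup
print(arbitration.role_of(my_name), arbitration.masters())
```

Because every program reads the same table, each server learns both its own role and the roles of the others from a single lookup source.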
Fig. 2 shows the main workflow of the standby machine. As Fig. 2 shows, the standby machine's work has three main stages: startup, monitoring, and takeover. First the server starts and confirms that it is the standby machine (see Fig. 3 for the startup process). The standby machine then monitors the master machines by sending each of them a message and waiting for the response (see Fig. 4 for the monitoring process). Based on the monitoring results, the standby machine judges whether any service server node in the system has failed and, if so, takes over the work of the failed machine (see Fig. 5 for the takeover process). A master machine begins processing services directly after startup.
Fig. 3 shows the main flow of service server startup. At startup each server reads the current arbitration data from the arbitration data node; in this example service server 4 is configured as the standby machine and the rest as master machines. Thus service servers 1, 2 and 3 become master machines after startup and process services cooperatively, while service server 4 becomes the standby machine. The main work of the standby machine after startup comprises initializing the state table, initializing the counters, and starting two timers. The state table records the state of every working service server; a state is either "normal" or "disconnected" and is updated dynamically according to the link detection results (see Fig. 4); all nodes are initialized to "normal". The counters are the failure counters of the working service servers, one per master machine. Each time the standby machine detects a working service server as disconnected, it increments the corresponding counter by one (see Fig. 5); all counters are initialized to zero. The timers are the link detection timer and the failure counter processing timer, denoted timer 1 and timer 2 respectively. Timer 1 can be set to 3 seconds as required, so that the processing flow of Fig. 4 is entered every 3 seconds when the timer expires; timer 2 is likewise set to 3 seconds, so that the processing flow of Fig. 5 is entered every 3 seconds.
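The initialization performed by the standby machine could look roughly like the following. `StandbyState` and `init_standby` are hypothetical names, and the two timers are represented only by their intervals; how they are actually scheduled is left open.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

NORMAL, DISCONNECTED = "normal", "disconnected"


@dataclass
class StandbyState:
    state_table: Dict[str, str]           # master name -> NORMAL or DISCONNECTED
    failure_counters: Dict[str, int]      # master name -> failed checks counted so far
    link_check_interval_s: float = 3.0    # timer 1: link detection
    counter_scan_interval_s: float = 3.0  # timer 2: failure counter processing


def init_standby(masters: Iterable[str]) -> StandbyState:
    """Every node starts as NORMAL and every counter starts at zero."""
    names = list(masters)
    return StandbyState(
        state_table={name: NORMAL for name in names},
        failure_counters={name: 0 for name in names},
    )


standby = init_standby(["server1", "server2", "server3"])
print(standby)
```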
Fig. 4 describes the processing flow after timer 1, the link detection timer, expires. Using the arbitration data table read at startup, the standby machine sends a link detection message to every master machine node in the table, first provisionally setting the state of each node in the service server state table to "disconnected". Once all messages have been sent, it starts timer 1 again and then waits for responses to the link detection messages. When a response is received from a node, the state of that node is set to "normal".
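A minimal sketch of this flow follows, assuming the message transport is supplied as a callback. `on_link_timer`, `on_probe_response` and `send_probe` are illustrative names, and re-arming timer 1 is left to whatever scheduler the caller uses.

```python
from typing import Callable, Dict, Iterable

NORMAL, DISCONNECTED = "normal", "disconnected"


def on_link_timer(
    state_table: Dict[str, str],
    masters: Iterable[str],
    send_probe: Callable[[str], None],
) -> None:
    """Timer 1 expired: mark each master as disconnected, then probe it."""
    for name in masters:
        state_table[name] = DISCONNECTED  # provisional until a response arrives
        send_probe(name)
    # The caller would restart timer 1 here and then wait for responses.


def on_probe_response(state_table: Dict[str, str], name: str) -> None:
    """A response arrived from a master: restore its state to normal."""
    if name in state_table:
        state_table[name] = NORMAL


# Demo: probes are only recorded, and server2 is assumed never to answer.
sent = []
table = {"server1": NORMAL, "server2": NORMAL, "server3": NORMAL}
on_link_timer(table, list(table), sent.append)
on_probe_response(table, "server1")
on_probe_response(table, "server3")
print(table)
```

Marking a node as disconnected before the probe is sent means that silence is treated as a failure by default; only an explicit response clears it.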
Fig. 5 describes the processing flow after timer 2, the failure counter processing timer, expires. When timer 2 expires, the standby machine scans the continually refreshed service server state table of Fig. 3 and sets the failure counter table according to the scan results. If the state of a node is "normal", the corresponding failure counter is cleared to zero; if the state of a node is "disconnected", the corresponding failure counter is incremented by one. If the count of a node's failure counter is greater than the set threshold, for example 20, the node is determined to have failed. The standby machine then modifies the arbitration data on the arbitration node, setting itself as a master machine and the failed node as the new standby machine. Next, it sends the new arbitration data to each of the other service servers and restarts itself, becoming a real master machine and beginning to process services. If no failed node is found in the current check, the standby machine sets timer 2 again and waits for the next check. In addition, once the failed machine has been repaired and rejoins the system, it becomes the new standby machine, monitoring the other servers and ready to take over from any of them at any time.
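The counter handling and role swap can be sketched as below. The function names and server names are assumptions for illustration, and in the demo the state table is held fixed between scans, whereas in the described system timer 1 would keep refreshing it.

```python
from typing import Dict, List

NORMAL, DISCONNECTED = "normal", "disconnected"
MASTER, STANDBY = "master", "standby"


def scan_state_table(
    state_table: Dict[str, str], counters: Dict[str, int], threshold: int
) -> List[str]:
    """Timer 2 expired: update the failure counters and return the failed nodes."""
    failed = []
    for name, state in state_table.items():
        if state == NORMAL:
            counters[name] = 0            # a healthy node clears its counter
        else:
            counters[name] += 1           # one more disconnected check
            if counters[name] > threshold:
                failed.append(name)
    return failed


def take_over(roles: Dict[str, str], standby: str, failed: str) -> Dict[str, str]:
    """New arbitration data: the standby becomes master, the failed node standby."""
    new_roles = dict(roles)
    new_roles[standby] = MASTER
    new_roles[failed] = STANDBY
    return new_roles


# Demo with the example threshold of 20; server2 is assumed to stay disconnected.
table = {"server1": NORMAL, "server2": DISCONNECTED, "server3": NORMAL}
counters = {name: 0 for name in table}
roles = {"server1": MASTER, "server2": MASTER, "server3": MASTER, "server4": STANDBY}

failed: List[str] = []
scans = 0
while not failed:
    failed = scan_state_table(table, counters, threshold=20)
    scans += 1
new_roles = take_over(roles, "server4", failed[0])
print(scans, new_roles)
```

With timer 2 set to 3 seconds and a threshold of 20, a node that stays disconnected is declared failed on the 21st scan, roughly a minute after it stops responding.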
Although the invention has been described with an embodiment for a communication system's short message service servers, a person of ordinary skill in the art can, following the idea of the invention, implement N:1 backup in any other system that requires backup of multiple servers.
Claims (5)
1. A method for realizing backup between servers, characterized in that:
In the first step, the service processing programs on N master machines and 1 standby machine are started, master machines first and the standby machine afterwards; during startup, every service processing program reads the same arbitration data from the system, the arbitration data containing the master or standby role information of the N+1 service servers;
In the second step, the N master machines work cooperatively; meanwhile, the standby machine sends link detection IP messages to the master machines at a given frequency and waits for the corresponding responses, thereby monitoring their states;
In the third step, if the standby machine does not receive a response message from a master machine, it increments the failure counter for that master machine by one; if the failure counter has not reached the set threshold, the method returns to the second step; otherwise it proceeds to the fourth step;
In the fourth step, the standby machine confirms the failed machine, modifies the arbitration data, that is, the master or standby role information of the N+1 service servers, notifies the other service servers of the new arbitration data, and then restarts itself to become a new master machine and take over the work of the failed machine;
In the fifth step, after its fault has been eliminated, the failed machine starts its service program again and becomes the new standby machine, and the method returns to the second step.
2. The method for realizing backup according to claim 1, characterized in that the first step further comprises the main work of the standby machine after startup: initializing a state table and counters, and starting timer 1 for link detection and timer 2 for the failure counters.
3. The method for realizing backup according to claim 2, characterized in that, in the second step, the standby machine sending link detection IP messages to the master machines at a given frequency comprises: when link detection timer 1 expires, the standby machine, using the arbitration data read at startup, sends a link detection message to every master machine node in the arbitration data and first provisionally sets the state of each node in the service server state table to "disconnected"; after all messages have been sent, timer 1 is started again and the standby machine waits for responses to the link detection messages; when a response is received from a node, the state of that node is set to "normal".
4. The method for realizing backup according to claim 2, characterized in that, when timer 2 expires, the service server state table is scanned and the failure counters are set according to the scan results; if the state of a node is "normal", the corresponding failure counter is cleared to zero; if the state of a node is "disconnected", the corresponding failure counter is incremented by one; if the count of a node's failure counter is greater than the set threshold, the node is determined to have failed.
5. The method for realizing backup according to claim 3 or 4, characterized in that, if no failed machine is found in the current check, the standby machine sets timer 2 again and waits for the next check.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB021123209A CN100388218C (en) | 2002-06-27 | 2002-06-27 | Method for realizing backup between servers |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB021123209A CN100388218C (en) | 2002-06-27 | 2002-06-27 | Method for realizing backup between servers |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1464396A (en) | 2003-12-31 |
CN100388218C (en) | 2008-05-14 |
Family
ID=29742140
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB021123209A Expired - Lifetime CN100388218C (en) | 2002-06-27 | 2002-06-27 | Method for realizing backup between servers |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100388218C (en) |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100353321C (en) * | 2004-02-21 | 2007-12-05 | 华为技术有限公司 | System with primary application and spare program and starting method |
US7904612B2 (en) * | 2004-07-08 | 2011-03-08 | International Business Machines Corporation | Ticket mechanism for sharing computer resources |
US7953703B2 (en) * | 2005-02-17 | 2011-05-31 | International Business Machines Corporation | Creation of highly available pseudo-clone standby servers for rapid failover provisioning |
JP2006260325A (en) * | 2005-03-18 | 2006-09-28 | Fujitsu Ltd | Failure transmission method |
CN100391162C (en) * | 2005-04-13 | 2008-05-28 | 华为技术有限公司 | Control method for switching server |
US8195976B2 (en) * | 2005-06-29 | 2012-06-05 | International Business Machines Corporation | Fault-tolerance and fault-containment models for zoning clustered application silos into continuous availability and high availability zones in clustered systems during recovery and maintenance |
CN100354835C (en) * | 2005-11-11 | 2007-12-12 | 哈尔滨工业大学 | Fault-tolerant server based on arbitration |
CN1859423B (en) * | 2006-02-27 | 2010-12-08 | 华为技术有限公司 | Synchronous switching method for host and repeat device |
CN100461697C (en) * | 2006-04-18 | 2009-02-11 | 华为技术有限公司 | Service take-over method based on device disaster tolerance, service switching device and backup machine |
CN100461106C (en) * | 2007-02-09 | 2009-02-11 | 无敌科技(西安)有限公司 | Multiple protection method of start-up program |
CN101453312B (en) * | 2007-11-30 | 2012-06-27 | 中国移动通信集团公司 | Method and apparatus for device backup |
CN101453366B (en) * | 2007-11-30 | 2011-03-23 | 英业达股份有限公司 | Method and system for on-line repair in real-time |
CN101631204B (en) * | 2008-07-15 | 2012-10-31 | 北大方正集团有限公司 | Method and device for following broadcast in broadcast controlling system |
CN101888610A (en) * | 2010-07-06 | 2010-11-17 | 中兴通讯股份有限公司 | Method, system and device for realizing short message service |
CN102075380B (en) * | 2010-12-16 | 2014-12-10 | 中兴通讯股份有限公司 | Method and device for detecting server state |
CN102630046B (en) * | 2012-03-13 | 2015-07-15 | 深圳市九洲电器有限公司 | Data acquisition system, method, set-top box, network server |
CN103902665A (en) * | 2014-03-11 | 2014-07-02 | 浪潮电子信息产业股份有限公司 | Storage virtualization system implementation method |
CN104980693A (en) * | 2014-04-11 | 2015-10-14 | 深圳中兴力维技术有限公司 | Media service backup method and system |
JP6409812B2 (en) * | 2016-04-01 | 2018-10-24 | 横河電機株式会社 | Redundancy apparatus, redundancy system, and redundancy method |
CN105897508A (en) * | 2016-04-01 | 2016-08-24 | 锐捷网络股份有限公司 | Method and core switch for service processing of distributed data center |
CN109257220B (en) * | 2018-09-25 | 2021-10-29 | 中电科微波通信(上海)股份有限公司 | Standby acquisition terminal and rail vehicle equipment data acquisition system |
CN111669280B (en) * | 2019-03-06 | 2023-05-16 | 中兴通讯股份有限公司 | Message transmission method, device and storage medium |
CN112682955A (en) * | 2020-12-18 | 2021-04-20 | 广东芬尼克兹节能设备有限公司 | Unit replacement control method and system of heat pump system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH10161895A (en) * | 1996-11-28 | 1998-06-19 | Hitachi Ltd | Server backup method |
JP2001045023A (en) * | 1999-08-02 | 2001-02-16 | Matsushita Electric Ind Co Ltd | Video server system and video data distribution method |
CN1300393A (en) * | 1998-05-14 | 2001-06-20 | 摩托罗拉公司 | Method for switching between multiple system hosts |
CN1340928A (en) * | 2000-09-02 | 2002-03-20 | 深圳市中兴通讯股份有限公司 | Stand-by method and device of communication system |
Non-Patent Citations (1)
Title |
---|
P特开平10-161895A 1998.06.19 |
Also Published As
Publication number | Publication date |
---|---|
CN1464396A (en) | 2003-12-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN100388218C (en) | Method for realizing backup between servers | |
CN106254100B (en) | A kind of data disaster tolerance methods, devices and systems | |
CN103607297A (en) | Fault processing method of computer cluster system | |
US7093013B1 (en) | High availability system for network elements | |
CN105302661A (en) | System and method for implementing virtualization management platform high availability | |
CN112506702B (en) | Disaster recovery method, device, equipment and storage medium for data center | |
CN103309790A (en) | Method and device for monitoring mobile terminal | |
US20080082630A1 (en) | System and method of fault tolerant reconciliation for control card redundancy | |
CN110618864A (en) | Interrupt task recovery method and device | |
CN113825164A (en) | Network fault repairing method and device, storage medium and electronic equipment | |
CN112422684A (en) | Target message processing method and device, storage medium and electronic device | |
CN113794597A (en) | Alarm information processing method, system, electronic device and storage medium | |
CN114327967A (en) | Equipment repairing method and device, storage medium and electronic device | |
CN102143011A (en) | Device and method for realizing network protection | |
CN113765705A (en) | Traffic switching method and traffic management server for cross-public-cloud dual-active structure | |
CN101958925A (en) | Method and device for controlling remote equipment | |
CN112486713B (en) | Frozen screen processing method and electronic equipment | |
CN113900855A (en) | Active hot start method, system and device for abnormal state of switch | |
CN111858193A (en) | Method and system for realizing server pool service | |
JP2009211279A (en) | Handling data management server system | |
CN112437146A (en) | Equipment state synchronization method, device and system | |
JPH07319836A (en) | Fault monitoring system | |
Corsava et al. | Self-healing intelligent infrastructure for computational clusters | |
KR970072822A (en) | How to Manage Performance on Distributed Access Node Systems | |
CN105634975B (en) | A kind of load share method of short wave communication network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20180426 Address after: California, USA Patentee after: Global innovation polymerization LLC Address before: 518057 Department of law, Zhongxing building, South Science and technology road, Nanshan District hi tech Industrial Park, Shenzhen Patentee before: ZTE Corp. |
CX01 | Expiry of patent term | Granted publication date: 20080514 |