CN100388218C - Method for realizing backup between servers - Google Patents

Publication number
CN100388218C
Authority
CN
China
Prior art keywords
machine
main computer
guest machine
standby
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CNB021123209A
Other languages
Chinese (zh)
Other versions
CN1464396A (en)
Inventor
丁震
陈世忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Global Innovation Polymerization LLC
Original Assignee
ZTE Corp

Landscapes

  • Hardware Redundancy (AREA)

Abstract

The present invention provides a method for realizing backup between servers. The method comprises the following steps: the service processing programs on N master machines and one standby machine are started; the N master machines work cooperatively while the standby machine sends link-detection IP messages to the master machines and waits for their responses, thereby monitoring their state; if the standby machine confirms that a master machine has failed, it revises the arbitration data, notifies the other service servers of the new arbitration data, and then restarts itself to become a new master machine and take over the work of the failed machine; once its fault has been eliminated, the failed machine starts its service program again and becomes the new standby machine. The standby machine can thus actively detect and take over a failed master machine, and the failed machine can rejoin the system as the new standby machine after repair. The invention therefore preserves the performance and stability of short message equipment while lowering the cost of service processing servers and improving the market competitiveness of the equipment.

Description

A method for realizing backup between servers
Technical field
The present invention relates to a method for realizing backup between server programs, and in particular to N:1 redundant backup between service processing programs in a short message system in the field of mobile communication.
Background
With the development of mobile communication, short messaging has become a service that users favor and an important source of profit for telecom operators; annual short message traffic has reached tens of billions of messages. This rapidly expanding service places higher demands on the performance and stability of short message equipment.
In a short message system, the service server is the core of service processing and often requires dual-machine redundant backup: while one machine works, the other is in standby, continuously checking the state of the master machine, and takes over its work as soon as it finds that the master machine has failed. Such solutions generally depend on the operating system or on third-party cluster software.
At the same time, improving the processing capacity of a short message service center requires several service servers to work cooperatively. If the existing dual-machine backup scheme is retained in this case, a standby machine must be prepared for every working service server, and the equipment cost multiplies.
Maintaining performance and stability while minimizing equipment cost has therefore become a contradiction in urgent need of resolution.
Chinese patent application No. 01106482 describes a scheme for cooperative backup between service servers. In that scheme, one server is responsible for allocating tasks and several other servers are responsible for processing them. The task-allocating server becomes a single point of failure in the system, and the technical solution of that application is to back up this server. The scheme thus introduces a new point of failure into the system, and its backup of that point is still a 1:1 dual-machine backup approach.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the shortcoming of existing short message equipment in which every service processing server requires dual-machine backup, resulting in high cost. To this end, the invention proposes an N:1 backup method, in which multiple master machines share one standby machine.
The technical solution of the present invention is as follows:
Step 1
The service processing programs on the N master machines and the 1 standby machine are started, master machines first and the standby machine afterwards. During startup, every service processing program reads the same arbitration data from the system; this data records the master or standby role of each of the N+1 service servers. In other words, every server knows its own role and the roles of the other machines.
Step 2
The N master machines work cooperatively. Meanwhile, the standby machine sends link-detection IP messages to the master machines at a given frequency and waits for the corresponding responses, thereby monitoring their state.
Step 3
If the standby machine does not receive a response message from a master machine, it increments the failure counter for that master machine. If the failure counter has not reached the set threshold, the method continues with Step 2; otherwise it proceeds to Step 4.
Step 4
The standby machine confirms the failed machine, modifies the arbitration data, notifies the other service servers of the new arbitration data, and then restarts itself so that it becomes a new master machine and takes over the work of the failed machine.
Step 5
After its fault has been eliminated, the failed machine starts its service program again and becomes the new standby machine; the method then returns to Step 2.
Adopt the method for the invention, compared with prior art, owing to taked the technical measures of N:1 redundancy backup, the main computer that breaks down can initiatively be found and take over to guest machine, and the fault machine can be used as new guest machine and adds system again after repairing fault.Can guarantee the performance and the stability of short message equipment like this, save the equipment cost of N-1 platform Service Process Server again, improve the market competitiveness of short message equipment.
Brief description of the drawings
Fig. 1 is a schematic diagram of the position of the service servers in the system.
Fig. 2 is the main flowchart of standby machine operation in the system.
Fig. 3 shows the main flow of service server startup in the method of the invention.
Fig. 4 shows the main flow of link detection performed by the standby machine after startup in the method of the invention.
Fig. 5 shows the main flow in which the standby machine detects whether a failed machine exists in the system and takes over the failed machine in the method of the invention.
Detailed description of the embodiments
Fig. 1 shows a concrete implementation of the N:1 backup scheme in the engineering environment of a communication system short message service center. The whole system is built on a TCP/IP network, and the nodes can reach one another by exchanging messages. The N:1 backup in Fig. 1 mainly involves 4 service server nodes and 1 arbitration data node. The arbitration data node stores the current role of each service server node, that is, which nodes are master machines and which is the standby machine. The arbitration data can be kept in a database or stored in some other way. Each service server program reads the arbitration data from the arbitration data node at startup and thereby learns its own role and the roles of the other service servers in the system. At any given moment, exactly 1 of the 4 service server nodes in Fig. 1 is the standby machine (service server 4 in Fig. 1) and the other 3 are master machines. The "other servers" in the figure summarize all remaining nodes, which may include gateways, operation and maintenance servers, and so on, as required.
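As an illustration of the arbitration data described above, the following minimal sketch models it as a mapping from server name to role and shows how a server could learn its own role and the roles of the others at startup. The patent does not specify a storage format, so the dictionary layout, the server names, and the function `read_roles` are assumptions made for illustration only.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Role(Enum):
    MASTER = "master"
    STANDBY = "standby"


# Hypothetical arbitration data matching the Fig. 1 example:
# service servers 1-3 are master machines, service server 4 is the standby machine.
ARBITRATION_DATA: Dict[str, Role] = {
    "server1": Role.MASTER,
    "server2": Role.MASTER,
    "server3": Role.MASTER,
    "server4": Role.STANDBY,
}


def read_roles(data: Dict[str, Role], me: str) -> Tuple[Role, List[str], str]:
    """Return this server's role, the names of all masters, and the standby's name."""
    standbys = [name for name, role in data.items() if role is Role.STANDBY]
    if len(standbys) != 1:
        # The N:1 scheme assumes exactly one standby machine at any moment.
        raise ValueError("arbitration data must name exactly one standby machine")
    masters = [name for name, role in data.items() if role is Role.MASTER]
    return data[me], masters, standbys[0]
```

Calling `read_roles(ARBITRATION_DATA, "server4")` tells service server 4 that it is the standby machine and which three servers it has to monitor.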
Fig. 2 shows the main workflow of the standby machine. As Fig. 2 shows, the standby machine's work has three main stages: startup, monitoring, and takeover. First, the server starts and confirms that it is the standby machine (see Fig. 3 for the startup process). The standby machine then monitors the master machines by sending a message to each of them and waiting for the response (see Fig. 4 for the monitoring process). Based on the monitoring results, the standby machine judges whether any service server node in the system has failed and, if so, takes over the work of the failed machine (see Fig. 5 for the takeover process). A master machine begins processing services directly after startup.
Fig. 3 shows the main flow of service server startup. At startup, each server reads the current arbitration data from the arbitration data node; in this example, service server 4 is configured as the standby machine and the rest as master machines. Service servers 1, 2, and 3 therefore become master machines after startup and process services cooperatively, while service server 4 becomes the standby machine. The main work of the standby machine after startup is to initialize the state table and the counters and to start two timers. The state table records the state of every working service server, either "normal" or "disconnected"; it is updated dynamically from the link-detection results (see Fig. 4), and every node is initialized to "normal". The counters are the failure counters of the working service servers, one per master machine. Each time the standby machine finds a working service server disconnected, it increments the corresponding counter (see Fig. 5); all counters are initialized to zero. The timers are the link-detection timer and the failure-counter processing timer, denoted timer 1 and timer 2 respectively. Timer 1 can, for example, be set to 3 seconds, so that the flow of Fig. 4 is entered each time it expires; timer 2 is likewise set to 3 seconds, so that the flow of Fig. 5 is entered every 3 seconds.
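The standby machine's initialization can be sketched as a small data structure. This is only an illustrative sketch: the class name `StandbyState` and its field names are hypothetical, the 3-second periods are the example values given in the text, and actually scheduling the two timers is left to the surrounding program.

```python
from dataclasses import dataclass, field
from typing import Dict, List

NORMAL = "normal"
DISCONNECTED = "disconnected"


@dataclass
class StandbyState:
    """State held by the standby machine after startup (hypothetical layout)."""

    masters: List[str]
    link_check_period_s: float = 3.0  # timer 1: link-detection timer
    counter_period_s: float = 3.0     # timer 2: failure-counter processing timer
    state_table: Dict[str, str] = field(default_factory=dict, init=False)
    failure_counters: Dict[str, int] = field(default_factory=dict, init=False)

    def __post_init__(self) -> None:
        # Every monitored master starts as "normal" with a zero failure counter.
        self.state_table = {name: NORMAL for name in self.masters}
        self.failure_counters = {name: 0 for name in self.masters}
```

Keeping the state table and the counters as two separate mappings mirrors the text, where timer 1 writes only the state table and timer 2 reads it to update the counters.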
Fig. 4 describes the processing flow after timer 1, the link-detection timer, expires. The standby machine sends a link-detection message to each master node listed in the arbitration data table that it read at startup, and provisionally sets the state of each of these nodes in the service server state table to "disconnected". After all messages have been sent, it restarts timer 1 and then waits for responses to the link-detection messages. When a response is received from a node, the state of that node is set to "normal".
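The link-detection flow can be sketched as two handlers, one for the expiry of timer 1 and one for an incoming response. The function names and the `send_probe` callback are hypothetical; the patent does not specify the message format or transport beyond "link-detection IP message". Marking a node before sending to it is a choice made in this sketch so that a very fast response cannot be overwritten; the text does not fix the order.

```python
from typing import Callable, Dict, Iterable

NORMAL = "normal"
DISCONNECTED = "disconnected"


def on_link_check_timer(
    state_table: Dict[str, str],
    masters: Iterable[str],
    send_probe: Callable[[str], None],
) -> None:
    """Timer 1 expiry: provisionally mark each master disconnected and probe it."""
    for name in masters:
        state_table[name] = DISCONNECTED  # stays so unless a response arrives
        send_probe(name)                  # hypothetical transport callback
    # The caller re-arms timer 1 here and then waits for responses.


def on_probe_response(state_table: Dict[str, str], name: str) -> None:
    """A response from a monitored master marks that node as normal again."""
    if name in state_table:
        state_table[name] = NORMAL
```

A master that answers before timer 2 next scans the table is therefore seen as "normal", while one that stays silent remains "disconnected".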
Fig. 5 describes the processing flow after timer 2, the failure-counter processing timer, expires. When timer 2 expires, the standby machine scans the continuously refreshed service server state table (see Fig. 3) and updates the failure counter table according to the scan. If a node's state is "normal", its failure counter is cleared to zero; if a node's state is "disconnected", its failure counter is incremented. If a node's failure counter exceeds the set threshold, for example 20, the node is judged to have failed. The standby machine then modifies the arbitration data on the arbitration node, setting itself as a master machine and the failed node as the new standby machine. Next, it sends the new arbitration data to every other service server and restarts itself, becoming a real master machine and starting to process services. If no failed node is found in this round, the standby machine sets timer 2 again and waits for the next check. After repair, the failed machine rejoins the system as the new standby machine, monitoring the other servers and ready to take over at any time.
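The counter update and the role swap can be sketched as follows. The function names, the string role values, and the dictionary layout are assumptions for illustration; the threshold of 20 is the example value from the text, and the comparison is "greater than" as in this paragraph. Notifying the other servers and restarting the standby machine are outside the scope of the sketch.

```python
from typing import Dict, Optional

NORMAL = "normal"
DISCONNECTED = "disconnected"
THRESHOLD = 20  # example value given in the text


def on_counter_timer(
    state_table: Dict[str, str],
    failure_counters: Dict[str, int],
    threshold: int = THRESHOLD,
) -> Optional[str]:
    """Timer 2 expiry: update every counter, then return a failed node or None."""
    for name, status in state_table.items():
        if status == NORMAL:
            failure_counters[name] = 0   # a single response clears the count
        else:
            failure_counters[name] += 1  # one more consecutive missed check
    for name, count in failure_counters.items():
        if count > threshold:
            return name
    return None  # no failure this round; the caller re-arms timer 2


def take_over(arbitration: Dict[str, str], standby: str, failed: str) -> Dict[str, str]:
    """Return new arbitration data: the standby becomes a master, the failed node the standby."""
    updated = dict(arbitration)
    updated[standby] = "master"
    updated[failed] = "standby"
    return updated
```

With a 3-second timer and a threshold of 20, a master machine is declared failed only after it has been found "disconnected" on 21 consecutive scans, which takes roughly one minute; any single response in between resets its counter.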
Although the present invention gives an embodiment for the short message service server of a communication system, a person of ordinary skill in the art can, following the idea of the invention, implement N:1 backup in any other system that requires backup across multiple servers.

Claims (5)

1. A method for realizing backup between servers, characterized in that:
Step 1: the service processing programs on N master machines and 1 standby machine are started, master machines first and the standby machine afterwards; during startup, every service processing program reads the same arbitration data from the system, the arbitration data comprising the master or standby role information of the N+1 service servers;
Step 2: the N master machines work cooperatively; meanwhile, the standby machine sends link-detection IP messages to the master machines at a given frequency and waits for the corresponding responses, thereby monitoring their state;
Step 3: if the standby machine does not receive a response message from a master machine, it increments the failure counter for that master machine; if the failure counter has not reached the set threshold, the method returns to Step 2; otherwise it proceeds to Step 4;
Step 4: the standby machine confirms the failed machine and modifies the arbitration data, that is, the master or standby role information of the N+1 service servers, notifies the other service servers of the new arbitration data, and then restarts itself so that it becomes a new master machine and takes over the work of the failed machine;
Step 5: after its fault has been eliminated, the failed machine starts its service program again and becomes the new standby machine, and the method returns to Step 2.
2. The method for realizing backup according to claim 1, characterized in that Step 1 further comprises the main work of the standby machine after startup: initializing the state table and the counters, and starting timer 1, used for link detection, and timer 2, used for the failure counters.
3. The method for realizing backup according to claim 2, characterized in that, in Step 2, the standby machine sending link-detection IP messages to the master machines at a given frequency comprises: when link-detection timer 1 expires, the standby machine sends a link-detection message to each master node in the arbitration data that it read at startup, and provisionally sets the state of each of these nodes in the service server state table to "disconnected"; after all messages have been sent, it starts timer 1 again and then waits for responses to the link-detection messages; when a response is received from a node, the state of that node is set to "normal".
4. The method for realizing backup according to claim 2, characterized in that, when timer 2 expires, the service server state table is scanned and the failure counters are updated according to the scan; if a node's state is "normal", its failure counter is cleared to zero; if a node's state is "disconnected", its failure counter is incremented; if a node's failure counter exceeds the set threshold, the node is determined to have failed.
5. The method for realizing backup according to claim 3 or 4, characterized in that, if no failed machine is found in the current round, the standby machine sets timer 2 again and waits for the next check.
CNB021123209A 2002-06-27 2002-06-27 Method for realizing backup between servers Expired - Lifetime CN100388218C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB021123209A CN100388218C (en) 2002-06-27 2002-06-27 Method for realizing backup between servers

Publications (2)

Publication Number Publication Date
CN1464396A CN1464396A (en) 2003-12-31
CN100388218C true CN100388218C (en) 2008-05-14

Family

ID=29742140

Country Status (1)

Country Link
CN (1) CN100388218C (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20180426

Address after: California, USA

Patentee after: Global innovation polymerization LLC

Address before: 518057 Department of law, Zhongxing building, South Science and technology road, Nanshan District hi tech Industrial Park, Shenzhen

Patentee before: ZTE Corp.

CX01 Expiry of patent term

Granted publication date: 20080514
