CN101309167B - Disaster tolerant system and method based on cluster backup - Google Patents


Info

Publication number
CN101309167B
CN101309167B (application number CN200810048216XA)
Authority
CN
China
Prior art keywords
primary server
load
node
server node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN200810048216XA
Other languages
Chinese (zh)
Other versions
CN101309167A (en)
Inventor
王芙蓉
史军
莫益军
黄辰
卢正新
李晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN200810048216XA
Publication of CN101309167A
Application granted
Publication of CN101309167B

Abstract

The invention relates to a disaster tolerance system and method based on cluster backup. The disaster tolerance system includes a load agent unit and a load service unit; the load service unit includes at least two server nodes, of which one is the primary server node that performs service interaction with the user terminal, while the others are non-primary server nodes. The service interaction includes: when the primary server node writes data to its local database, the primary server node backs up the data in the local databases of the non-primary server nodes in the load service unit. The load agent unit includes a load dispatcher connected to each of the server nodes; when the primary server node is detected to have broken down, a failover operation is executed on it, and one of the non-primary server nodes is selected as the new primary server node to perform service interaction with the user terminal. The disaster tolerance system and method improve both the utilization of a single server node and the overall utilization of the multiple peer server nodes.

Description

Disaster tolerance system and method based on cluster backup
Technical field
The invention belongs to the field of network systems, and in particular relates to a disaster tolerance system and method based on cluster backup in a network system.
Background technology
With the rapid development of modern network technology and the continuous growth in the number of users, networks keep growing in scale. For the servers in a network, both single-machine capacity and the number of servers must therefore be increased to keep pace with user demand. At the same time, the requirements on server reliability and disaster tolerance capability are also becoming ever higher.
To have a reliable disaster tolerance mechanism under existing conditions — so that service to users can be restored as quickly as possible after a disaster brings a server down — server data must be redundantly backed up. Since a single machine is far from sufficient, important data must adopt an active-standby backup mode with a primary machine and a standby machine, i.e. the system and data files of the primary are synchronized to the standby. This method of setting up an independent standby machine to back up the primary is referred to here as the independent redundancy backup mechanism. Existing disaster tolerance schemes, such as dual-machine redundancy backup and multi-machine redundancy backup, mainly perform disaster recovery based on this independent redundancy backup mechanism.
Dual-machine redundancy backup means that two machines keep the system and data synchronized: while the system runs, the standby machine continuously monitors, over a communication cable, changes to the image files and system of the currently working primary machine and backs up the changed data. The primary and the standby adopt a one-to-one redundancy backup strategy. Invention patent No. 200410002153.6, "An implementation method of network-management dual-machine disaster-tolerant backup", records an existing dual-machine backup method: under normal conditions the system runs on a runtime server, and data in the system is copied in real time to a disaster-tolerance backup server; at least a first monitoring program runs on the backup server, which connects to the runtime server to monitor its operating state, and automatically starts the network management system on the backup server for disaster recovery when it detects that the runtime server has been paralyzed by a disaster.
Multi-machine redundancy backup refers to a one-to-many or many-to-many redundancy backup strategy obtained by rationally planning multiple primary machines and standby machines. Invention patent 200510034607.2, "Method of multi-machine backup", records such a method: any primary machine connects to one or more standby machines, and any standby machine connects to one or more primary machines; the configuration file of each standby machine records the IP address and backup cycle of every connected primary machine, and the configuration file of each primary machine records the IP address or machine name of every connected standby machine. This multi-machine backup improves backup flexibility: one primary can respond to backup requests from multiple standbys, and one standby can issue backup requests to multiple primaries. Because the standby requests backups periodically, the primary does not need to monitor image-file changes in real time, which effectively reduces the performance impact of the mirroring software on the primary.
However, whether dual-machine or multi-machine, any method based on the independent redundancy backup mechanism needs independent standby machines to back up the primaries. A standby machine is idle most of the time while the primary works normally, performing detection and backup operations only during input and data updates. Whether the backup mode is one-to-one, one-to-many, or many-to-many, the machine redundancy is large and the single-machine utilization is very low. For example, in a five-to-one backup mode, 10 primary machines in a network require 2 standby machines; because of their low utilization, these 2 standby machines waste considerable resources and increase hardware cost. Moreover, during disaster recovery, the independent server resources of multiple normally working peers are not rationally planned and allocated as a whole, so in some cases the overall utilization is also very low.
Summary of the invention
The object of the invention is to overcome the above defects of disaster tolerance techniques based on independent redundancy backup by providing a disaster tolerance system and method based on cluster backup.
To achieve the above object, the invention provides a disaster tolerance system based on cluster backup, comprising a load agent unit and a load service unit. The load service unit comprises at least two server nodes; each server node comprises a local database, and the server nodes are interconnected. The load service unit comprises one primary server node that performs service interaction with a user terminal, while the remaining server nodes are non-primary server nodes. The service interaction comprises: the primary server node reading data from and/or writing data to its local database; and, when the primary server node writes data to its local database, the primary server node also backing up that data in the local databases of the non-primary server nodes in the load service unit.
The load agent unit comprises a load dispatcher connected to each server node in the load service unit. When the load dispatcher detects that the heartbeat of the primary server node has stopped, it performs a failover operation on the primary server node, selecting one of the non-primary server nodes as the new primary server node to perform service interaction with the user terminal.
To achieve the above object, the invention also provides a disaster recovery method based on cluster backup, comprising:
The primary server node performs service interaction with the user terminal. The service interaction comprises: the primary server node reading data from and/or writing data to its local database; and, when the primary server node writes data to its local database, the primary server node also backing up that data in the local databases of the non-primary server nodes in the load service unit.
When the load agent unit detects that the heartbeat of the current primary server node has stopped, it selects one of the non-primary server nodes as the new primary server node to perform service interaction with the user terminal.
By introducing a cluster backup mechanism, the disaster tolerance system and method based on cluster backup transform the traditional standby server node into a server node that is a peer of the primary server node, so that any two server nodes in the cluster back each other up. When one server node fails, every other server node in the cluster holds a backup of its data, which increases the backup redundancy. At the same time, since the invention does not need to provide an independent backup server node for each server node, both the single-machine utilization and the overall utilization of the multiple peer server nodes are improved.
Description of drawings
Fig. 1 is a structural diagram of embodiment one of a disaster tolerance system based on cluster backup according to the invention;
Fig. 2 is a structural diagram of embodiment two of a disaster tolerance system based on cluster backup according to the invention;
Fig. 3 is a structural diagram of embodiment three of a disaster tolerance system based on cluster backup according to the invention;
Fig. 4 is a structural diagram of embodiment four of a disaster tolerance system based on cluster backup according to the invention;
Fig. 5 is a flowchart of an embodiment of a disaster recovery method based on cluster backup according to the invention;
Fig. 6 is a flowchart of the initialization election process in the disaster recovery method based on cluster backup according to the invention;
Fig. 7 is a flowchart of the load sharing process in the disaster recovery method based on cluster backup according to the invention;
Fig. 8 is a flowchart of the cluster backup process in the disaster recovery method based on cluster backup according to the invention;
Fig. 9 is a flowchart of the load failover process in the disaster recovery method based on cluster backup according to the invention.
Embodiment
The technical scheme of the invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a structural diagram of embodiment one of a disaster tolerance system based on cluster backup according to the invention. As shown in Fig. 1, the disaster tolerance system based on cluster backup in this embodiment comprises a load agent unit LoadProxy 100 and a load service unit LoadServer 200.
The load service unit LoadServer 200 comprises server nodes ServerNode 210; this embodiment takes five server nodes S-Node1, S-Node2, S-Node3, S-Node4 and S-Node5 as an example, each containing a local database Database 220. Server nodes S-Node1 through S-Node5 contain local databases Data1, Data2, Data3, Data4 and Data5 respectively, and the server nodes are interconnected. This embodiment includes user terminals User_1, User_2, User_3, ..., User_n. The load service unit comprises one primary server node that performs service interaction with user terminal User_n; in this embodiment S-Node5 is the primary server node for User_n, and the other server nodes S-Node1, S-Node2, S-Node3 and S-Node4 are non-primary server nodes. The service interaction between User_n and primary server node S-Node5 comprises: S-Node5 reading data from and/or writing data to its local database Data5; and, when S-Node5 writes data to Data5, S-Node5 also backing up that data in the local databases Data1, Data2, Data3 and Data4 of the non-primary server nodes in load service unit 200.
Load agent unit LoadProxy 100 comprises a load dispatcher LoadDispatcher 110, which is connected to each of the server nodes S-Node1 through S-Node5 in load service unit LoadServer 200. When it detects that the heartbeat of primary server node S-Node5 has stopped, it performs a failover operation on S-Node5, selecting one of the non-primary server nodes as the new primary server node to perform service interaction with the user terminal.
In this embodiment, load agent unit LoadProxy performs cluster load bridging when a user terminal first accesses the system, assigning the user terminal to a suitable server node; thereafter, as long as that server node works normally, all requests of the user terminal are directed to it and served by it. The load agent pools and optimizes the server node resources of the back-end load service unit so that they provide service to user terminals efficiently, effectively controls the traffic of the load service unit, and exploits the cluster backup advantage of the load service unit for disaster tolerance when a disaster occurs.
In this embodiment, load service unit LoadServer is a cluster of peer server nodes that fully redundantly backs up the user data of user terminals, and is also the entity that actually serves users. Multiple back-end peer server nodes providing the same service are formed, using clustering technology, into a cluster with a domain concept. In each cluster, an election algorithm selects one cluster head as the primary server node and several non-primary server nodes; among the non-primary nodes, several secondary cluster heads serve as candidate primary server nodes and the remaining members serve as slave server nodes. The primary server guarantees the consistency of the user data of user terminals within the cluster. Among all server nodes in the cluster, any two nodes are mutual redundant backups of the user data. Master and slave servers are peers, completely equivalent in function and configuration; all are general server nodes ServerNode. The whole cluster provides a unified access address AccessIP as the initial service entrance for user terminals outside the cluster via the load dispatcher LoadDispatcher, which works in IP-tunnel load-balancing cluster mode. The most common initial service here is user terminal registration. When the load dispatcher LoadDispatcher detects that the heartbeat of any one or several server nodes in the cluster has stopped, i.e. a fault such as a crash has occurred, it can dispatch a normally working server node to take over its service and so achieve disaster tolerance.
Fig. 2 is a structural diagram of embodiment two of a disaster tolerance system based on cluster backup according to the invention. As shown in Fig. 2, the load dispatcher LoadDispatcher 110 in this embodiment comprises:
A heartbeat detection module HBDetecter 111, which keeps a heartbeat connection with each server node in load service unit LoadServer 200 and detects the heartbeat messages of each server node. A heartbeat message may carry the node's Capability and Load Number (CLN), a score obtained by applying a weighted scoring method to indexes of the server node's performance and load, which serves as the reference standard for judging the serving capability of a node. The CLN is generally reported periodically to load dispatcher LoadDispatcher 110 in heartbeat messages. HBDetecter 111 periodically checks the heartbeat messages of the server nodes; when the heartbeat message of a server node times out, that node is considered to have failed, and load dispatcher LoadDispatcher 110 starts the failover operation.
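As a concrete illustration of the heartbeat and CLN mechanism described above, the following Python sketch computes a weighted CLN score and flags nodes whose heartbeat messages have timed out. The patent does not specify the actual indexes, weights, or timeout values, so `CLN_WEIGHTS`, the index names, and the 3-second timeout are all illustrative assumptions; in this sketch a lower CLN means a less loaded node.

```python
import time

# Hypothetical load indexes and weights; the patent states only that the
# CLN is a weighted score over performance and load indexes.
CLN_WEIGHTS = {"cpu_util": 0.4, "mem_util": 0.3, "active_sessions_norm": 0.3}

def compute_cln(indexes, weights=CLN_WEIGHTS):
    """Weighted performance/load number; lower means a less loaded node."""
    return round(sum(indexes[k] * w for k, w in weights.items()) * 100, 1)

class HBDetecter:
    """Remembers the last heartbeat per node and reports timed-out nodes."""
    def __init__(self, timeout=3.0):
        self.timeout = timeout
        self.last_seen = {}  # node id -> (timestamp, cln)

    def on_heartbeat(self, node_id, cln, now=None):
        self.last_seen[node_id] = (time.time() if now is None else now, cln)

    def failed_nodes(self, now=None):
        now = time.time() if now is None else now
        return [n for n, (t, _) in self.last_seen.items()
                if now - t > self.timeout]

detecter = HBDetecter(timeout=3.0)
detecter.on_heartbeat("S-Node5",
                      compute_cln({"cpu_util": 0.2, "mem_util": 0.3,
                                   "active_sessions_norm": 0.1}), now=100.0)
detecter.on_heartbeat("S-Node4", 55.0, now=100.0)
detecter.on_heartbeat("S-Node4", 52.0, now=104.5)  # S-Node5 falls silent
print(detecter.failed_nodes(now=105.0))  # ['S-Node5']
```

In a real dispatcher the timestamps would come from the periodic heartbeat messages themselves; explicit `now` arguments are used here only to make the sketch deterministic.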
An election module ElectionBox 112, connected to heartbeat detection module HBDetecter 111, which periodically receives the heartbeat messages of each server node detected by HBDetecter 111 and periodically maintains the heartbeat message list of the server nodes. ElectionBox 112 is the data maintenance module of load dispatcher LoadDispatcher 110. When heartbeat messages carry CLNs, ElectionBox 112 periodically receives the score tickets CLNTicket of the server nodes, periodically maintains the score list of the server nodes, and elects the master server, the candidate master servers and the slave servers. A CLNTicket field is generally constructed as follows: the master/slave identifier W/M/S of the server node, the hardware address identifier LSID of the node, and the node's Capability and Load Number CLN.
A scheduling strategy module DispatchStrategy 113, connected to election module ElectionBox 112, which decides from the heartbeat message list either the IP address of the primary server node to which the agent forwards requests, or the IP address of a primary server node detected to require failover. DispatchStrategy 113 mainly applies the corresponding algorithmic strategies to the information collected from the nodes to decide load distribution and failover; both use the minimum statistical weighted load method. After processing the data in ElectionBox 112, DispatchStrategy 113 outputs either the IP of the primary server node LoadServer for agent forwarding, or the IP of a crashed primary server node requiring failover, and hands it to the redirect forwarding module Redirector 114 for processing.
A redirect forwarding module Redirector 114, connected to scheduling strategy module DispatchStrategy 113, which forwards the registration service request of user terminal User_n according to the agent-forwarding primary server node IP address obtained from DispatchStrategy 113, or performs redirection to the IP address of the primary server node indicated as requiring failover.
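The election over CLNTicket scores might be sketched as follows. The concrete ranking rule and the number of candidate masters are assumptions — the patent states only that the election module maintains the score list and elects master, candidate masters and slaves; this sketch treats a lower CLN (less weighted load) as preferable, in line with the minimum statistical weighted load method named in the text, and abbreviates the tickets to `(lsid, cln)` pairs.

```python
def elect_roles(tickets, n_candidates=2):
    """tickets: list of (lsid, cln) score tickets. The node with the
    smallest CLN (least weighted load) becomes master 'M', the next
    n_candidates become candidate masters 'C', the rest slaves 'S'."""
    ranked = sorted(tickets, key=lambda t: t[1])
    roles = {}
    for i, (lsid, _) in enumerate(ranked):
        roles[lsid] = "M" if i == 0 else ("C" if i <= n_candidates else "S")
    return roles

tickets = [("S-Node1", 62.0), ("S-Node2", 48.5), ("S-Node3", 71.0),
           ("S-Node4", 55.0), ("S-Node5", 20.0)]
print(elect_roles(tickets))
```

With these illustrative scores, S-Node5 is elected master — matching the embodiment above, where S-Node5 serves as the primary server node.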
Fig. 3 is a structural diagram of embodiment three of a disaster tolerance system based on cluster backup according to the invention. As shown in Fig. 3, load agent unit LoadProxy 100 in this embodiment further comprises a redundancy backup device Baker 120, and load dispatcher LoadDispatcher 110 further comprises an advertisement module Ads 115 connected to Baker 120.
Advertisement module Ads 115 periodically sends the advertisement messages of the load dispatcher to the redundancy backup device; the advertisement messages contain heartbeat messages. Redundancy backup device Baker 120 receives the advertisement messages of the load dispatcher and, from the heartbeat messages they contain, synchronously updates the heartbeat message list inside Baker 120.
Redundancy backup device Baker 120 is an IP-service backup machine that performs dual-machine redundant hot standby for load dispatcher LoadDispatcher 110; it can be realized with the common address redundancy protocol under LINUX. Since LoadDispatcher 110 plays an important role in the whole disaster tolerance system, Baker 120 works as follows: when reception of the advertisement messages of LoadDispatcher 110 times out, LoadDispatcher 110 is considered to have failed; Baker 120 then starts the virtual IP address service and converts its operating state from redundancy backup device to load dispatcher, so that Baker 120 continues the work of the former LoadDispatcher 110 without interruption. A subsequent usability test of the common address redundancy protocol proves that the IP service of LoadDispatcher 110 continues serving user requests uninterruptedly. Baker 120 is a functional replica of LoadDispatcher 110 that periodically receives its advertisement messages in order to keep its heartbeat messages synchronized with those in election module ElectionBox 112.
In this embodiment, by providing redundancy backup device Baker in load agent unit LoadProxy, the load dispatcher LoadDispatcher is given IP redundant hot standby, which ensures its robustness, reduces the single-point-of-failure risk of the load agent unit, and further improves the disaster tolerance capability of the disaster tolerance system.
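A minimal sketch of the advertisement-timeout takeover performed by the redundancy backup device: when the dispatcher's advertisements stop arriving, Baker promotes itself and already holds the synchronized heartbeat list. The state names and the 3-second timeout are assumptions; a real deployment would rely on a common-address-redundancy-protocol implementation rather than hand-rolled timers.

```python
class Baker:
    """Standby for the dispatcher: claims the virtual IP service when the
    dispatcher's periodic advertisement messages stop arriving."""
    def __init__(self, adv_timeout=3.0):
        self.adv_timeout = adv_timeout
        self.last_adv = None
        self.state = "BACKUP"
        self.heartbeat_list = {}

    def on_advertisement(self, heartbeat_list, now):
        self.last_adv = now
        # Keep the heartbeat list in sync with the dispatcher's ElectionBox.
        self.heartbeat_list = dict(heartbeat_list)

    def tick(self, now):
        if self.state == "BACKUP" and self.last_adv is not None \
                and now - self.last_adv > self.adv_timeout:
            self.state = "MASTER"  # start the virtual IP, continue dispatching
        return self.state

baker = Baker(adv_timeout=3.0)
baker.on_advertisement({"S-Node5": 20.0}, now=10.0)
print(baker.tick(now=12.0))  # BACKUP: advertisement still fresh
print(baker.tick(now=14.0))  # MASTER: advertisements overdue, take over
```

Because Baker already carries the synchronized heartbeat list at the moment of takeover, it can continue dispatching without re-collecting node state.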
Fig. 4 is a structural diagram of embodiment four of a disaster tolerance system based on cluster backup according to the invention. As shown in Fig. 4, the server node ServerNode 210 in this embodiment comprises:
An IP configurator module IPConfiger 211: when the server node ServerNode 210 to which this module belongs is the newly determined primary server node, it responds to the redundant IP address configuration command sent by load dispatcher LoadDispatcher 110 and configures the IP address of its server node to the IP address of the former primary server node, so that this node continues the work of the former primary server node.
A score module TicketMarker 212, which periodically collects the performance and/or load indexes of this server node and computes the node's Capability and Load Number CLN.
A heartbeat module HeartBeat 213, which keeps a heartbeat connection with load dispatcher LoadDispatcher 110 and periodically sends the CLN to LoadDispatcher 110 carried in heartbeat messages.
An event notification module Informer 214. Event notifications are divided into uplink notifications (Uplink Notice) and downlink notifications (Downlink Notice). An uplink notification is sent by a slave server to the master server as a request to modify related data; a downlink notification is issued by the master server as a command to the slave servers to synchronously update data. By default the master server may perform direct read and direct write operations on local data; a slave server by default may only perform direct read operations on local data, and may perform a direct write operation only upon receiving a downlink notification from the master server.
When the server node ServerNode 210 to which this module belongs is the primary server node, the node issues downlink notifications via Informer 214, notifying the non-primary server nodes in load service unit LoadServer 200 and sending them the data synchronization update commands; when the node is a non-primary server node, it accepts downlink notifications Downlink Notice via Informer 214, receiving the data synchronization update commands sent by the primary server node in LoadServer 200.
A data read/write operation module DataWriter/Reader, used to read data from and/or write data to the local database: read operations fetch data, while write operations write and update data. Read operations are always direct reads, whereas write operations are divided into direct writes and indirect writes. The module comprises a data read operation module DataReader 2151 and a data write operation module DataWriter 2152. A direct read operation Direct Read reads the local database Database directly, and a direct write operation Direct Write writes to the local database directly. An indirect write operation Indirect Write means that a slave server does not write to its local database directly; instead, via event notification, the primary server node first performs a direct write Direct Write to its own local database, after which the primary server node sends a data synchronization update instruction that makes each non-primary server node start a direct write Direct Write to its own local database. This indirect write is also called the backup operation. In this embodiment, data write operation module DataWriter 2152 is connected to event notification module Informer 214; when writing data to local database Database, it uses Informer 214 to back up the data in the local databases of the non-primary server nodes in load service unit LoadServer 200.
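The direct/indirect write split can be sketched as follows, with a plain dict standing in for each node's local database. The write path shows the backup operation: the primary writes its own database directly, then fans the data synchronization update out to every non-primary node; in this sketch a slave's indirect write would simply be forwarded to `PrimaryNode.write`. Class and method names are illustrative, not the patent's.

```python
class ServerNode:
    def __init__(self, name):
        self.name = name
        self.db = {}     # stands in for the local Database
        self.peers = []  # non-primary nodes, populated on the primary

    def direct_write(self, key, value):
        self.db[key] = value

    def direct_read(self, key):
        return self.db.get(key)

class PrimaryNode(ServerNode):
    def write(self, key, value):
        # Direct write to the primary's own local database first ...
        self.direct_write(key, value)
        # ... then the data synchronization update makes every non-primary
        # node start its own direct write (the backup operation).
        for peer in self.peers:
            peer.direct_write(key, value)

primary = PrimaryNode("S-Node5")
primary.peers = [ServerNode(f"S-Node{i}") for i in range(1, 5)]
primary.write("User_n", {"profile": "..."})
print(all(p.direct_read("User_n") == {"profile": "..."} for p in primary.peers))
```

After the write returns, every node in the cluster holds the same copy of the user data, which is what allows any surviving node to take over during failover.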
Fig. 5 is a flowchart of a disaster recovery method based on cluster backup according to the invention. As shown in Fig. 5, the disaster recovery method in this embodiment comprises:
Step 10: the primary server node performs service interaction with the user terminal. The service interaction comprises: the primary server node reading data from and/or writing data to its local database; and, when the primary server node writes data to its local database, the primary server node also backing up that data in the local databases of the non-primary server nodes in the load service unit.
Step 20: when the load agent unit detects that the heartbeat of the current primary server node has stopped, it selects one of the non-primary server nodes as the new primary server node to perform service interaction with the user terminal. The primary server node writing the data to the local databases of the non-primary server nodes in the load service unit comprises: the primary server node sending a data synchronization update instruction to the non-primary server nodes in the load service unit, whereupon the local databases in those non-primary server nodes back up the data.
The specific flow of the disaster recovery method based on cluster backup of the invention is explained below with reference to the disaster tolerance system based on cluster backup. The disaster recovery method based on cluster backup of the invention can comprise four phases:
Phase I, the election process. An election is conducted after the cluster-based disaster tolerance system starts. Each server node LoadServer periodically computes its Capability and Load Number CLN score and sends a CLNTicket to the election module in load agent unit LoadProxy. The scheduling strategy of the load dispatcher can derive from the election module the two roles ROLE to be assumed: primary server node (Master Node, M-Node for short) and non-primary server node. According to design requirements or custom, the non-primary server nodes can be further divided into candidate primary server nodes (Candidate Node, C-Node for short) and slave server nodes (Slave Node, S-Node for short). Each server node configures its database according to its own master/slave role. Candidate primary server nodes are distinguished only by load agent unit LoadProxy; physically, a candidate primary server node is configured as a slave server node. The primary server node has direct read and write permission on the user data of user terminals; candidate primary server nodes and slave server nodes have direct read permission and indirect write permission on the user data. Load agent unit LoadProxy can decide to select a new primary server node from among the candidate primary server nodes.
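The role/permission split described in this phase can be written down as a small table. The role labels follow the patent's M-Node/C-Node/S-Node abbreviations, while the operation names are illustrative.

```python
ROLE_PERMISSIONS = {
    "M-Node": {"direct_read", "direct_write"},    # primary server node
    "C-Node": {"direct_read", "indirect_write"},  # candidate primary node
    "S-Node": {"direct_read", "indirect_write"},  # slave server node
}

def can(role, operation):
    """Check whether a cluster role is allowed to perform an operation."""
    return operation in ROLE_PERMISSIONS[role]

print(can("M-Node", "direct_write"))  # True
print(can("S-Node", "direct_write"))  # False: slaves write only indirectly
```

Note that C-Node and S-Node carry identical database permissions, which matches the text's point that a candidate primary is physically configured as a slave and distinguished only inside the load agent unit.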
Phase II, the load sharing process. When a service registration request of a user terminal arrives at load agent unit LoadProxy, LoadProxy selects, according to the server performance-load number voting scheme in the election module, the server node LoadServer with the minimum statistical weighted load number, and forwards the user's request to it. This process guarantees an optimal distribution of load in the cluster and avoids disasters caused by overloading certain server nodes due to uneven traffic.
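A sketch of the minimum statistical weighted load decision: the patent does not define the statistic, so this example assumes a short moving window of recent CLN samples per node, averaged before comparison, with lower meaning less loaded.

```python
from collections import deque
from statistics import mean

class DispatchStrategy:
    """Minimum statistical weighted load: keep a short window of CLN
    samples per node and forward to the node with the smallest average."""
    def __init__(self, window=3):
        self.window = window
        self.samples = {}  # node name -> deque of recent CLN values

    def report(self, node, cln):
        self.samples.setdefault(node, deque(maxlen=self.window)).append(cln)

    def pick(self):
        # The node with the minimum mean CLN over its recent samples wins.
        return min(self.samples, key=lambda n: mean(self.samples[n]))

ds = DispatchStrategy(window=3)
for node, cln in [("S-Node1", 60.0), ("S-Node5", 18.0),
                  ("S-Node1", 64.0), ("S-Node5", 22.0)]:
    ds.report(node, cln)
print(ds.pick())  # S-Node5: mean 20.0 vs S-Node1's 62.0
```

Averaging over a window rather than using the latest sample smooths out transient spikes, which is one plausible reading of "statistical" in the method's name.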
Phase III, the cluster backup process. When the service registration request of the user terminal arrives at a server node LoadServer, the node starts the cluster backup process for the user profile data of the user terminal, so that any two server nodes LoadServer in the cluster back each other up. The response to the user's registration request is returned only after the cluster backup process completes; at that point the user terminal's registration is finished, and it can then initiate business service requests to the server node. Every write operation in business services must likewise go through the cluster backup process.
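The ack-after-backup rule of this phase might look like the following, with a dict standing in for each node's local database; function and field names are illustrative. The registration response is produced only once the user's data is present on every node in the cluster.

```python
def register_user(cluster_dbs, user_id, profile):
    """cluster_dbs maps node name -> local database (a dict here).
    The registration response is returned only after the user's data
    has been backed up on every node in the cluster."""
    for db in cluster_dbs.values():  # primary write plus the backup fan-out
        db[user_id] = dict(profile)
    backed_up = all(db.get(user_id) == profile for db in cluster_dbs.values())
    return "registered" if backed_up else "backup failed"

dbs = {f"S-Node{i}": {} for i in range(1, 6)}
print(register_user(dbs, "User_n", {"sip": "user_n@example.org"}))  # registered
```

Delaying the acknowledgement until the backup completes is what guarantees that a failover immediately after registration still finds the user's data on every surviving node.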
Phase IV, the load failover process. When load agent unit LoadProxy detects that the heartbeat of some server node LoadServer1 has stopped, it initiates the load failover process. A server node LoadServer2 with a relatively small load number that still maintains its heartbeat is configured with the IP of server node LoadServer1, and uses the backed-up user identification data to continue providing service to the users. Failover of the primary server node additionally requires a new election.
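A minimal failover sketch, assuming the live-node list maps node names to weighted load numbers where lower means less loaded: pick the live node with the smallest load number and reassign the failed node's IP to it, as the IPConfiger module does on the chosen node. The IP values and data structures are illustrative.

```python
def failover(failed_node, live_clns, ip_table):
    """Choose the live node with the smallest load number to take over
    the failed node's IP; the chosen node's IPConfiger applies the IP."""
    if not live_clns:
        raise RuntimeError("no live server nodes to fail over to")
    takeover = min(live_clns, key=live_clns.get)
    ip_table[takeover] = ip_table[failed_node]
    return takeover

ips = {"S-Node2": "10.0.0.2", "S-Node4": "10.0.0.4", "S-Node5": "10.0.0.5"}
live = {"S-Node2": 48.5, "S-Node4": 55.0}  # S-Node5's heartbeat has stopped
print(failover("S-Node5", live, ips))      # S-Node2
print(ips["S-Node2"])                      # 10.0.0.5
```

Because every node already holds the backed-up user data, the takeover node can serve the failed node's users as soon as it answers on the reassigned IP; if the failed node was the primary, a new election follows.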
The four phases of the cluster-backup-based disaster recovery method proposed by the present invention are described in detail below.
Phase I: election process.
Election is performed as soon as the cluster-backup-based disaster tolerance system starts. The load agent unit LoadProxy has two network interface configurations: the access IP address (AccessIP) serves as the cluster service entry and receives registration service requests from user terminals, while the agent IP address (ProxyIP) is used for cluster communication with the server nodes of the back-end load service unit. The election process assumes that the load agent unit LoadProxy initializes before the server nodes LoadServer. The configuration file of every server node LoadServer contains the IP address of LoadProxy, and a newly started LoadServer must send a heartbeat to LoadProxy to join the cluster. LoadProxy initiates an election in two situations: first, at system initialization; second, when the primary server node's heartbeat stops, i.e. it fails, for example by crashing. The two elections proceed similarly and differ only in their entry conditions: the former is initiated by LoadProxy at system startup, the latter is initiated by LoadProxy when the election unit finds, in the heartbeat information list, that the primary server node's heartbeat has timed out. Taking the initialization election as an example (Fig. 6), it comprises the following steps:
Step A1: LoadProxy initializes and starts its modules. After a common address redundancy protocol is used to configure the LoadDispatcher and the redundancy backup device Baker, the LoadProxy process on Baker is started first; its ElectionBox thread is given higher priority while the three module threads HBDetecter, DispatchStrategy and Redirector run at low priority, and these three threads remain silent for as long as Baker acts as the redundant standby. Baker's LoadProxy process does not start the Ads module thread. The LoadProxy process on LoadDispatcher is then started, launching the five module threads ElectionBox, HBDetecter, DispatchStrategy, Redirector and Ads in turn.
Step A2: each LoadServer initializes and starts its modules. The LoadServer process runs on each ServerNode and starts six thread modules in turn: TicketMarker, HeartBeat, DataWriter, DataReader, IPConfiger and Informer.
Step A3: LoadServer obtains the ProxyIP of LoadProxy. LoadServer reads the local configuration file ls.cfg and obtains from it the ProxyIP of LoadProxy, the MAC addresses of LoadDispatcher and Baker, and related information.
Step A4: LoadServer sends heartbeats to LoadProxy with period T. After obtaining the ProxyIP, LoadServer sends heartbeat messages to LoadProxy periodically. A heartbeat message is a UDP datagram composed mainly of three parts: the message identifier MID, the CLNTicket field flag TFlag, and the CLNTicket field. If TFlag is FALSE, LoadProxy treats the message as an ordinary heartbeat and ignores the trailing field; if TRUE, it parses the CLNTicket field that follows. The CLNTicket field is structured as follows:
the local master/slave identifier W/M/S, the local hardware address identifier LSID, and the local performance-load number CLN.
M means master and S means slave (C means candidate master); W marks the case where a newly joined LoadServer node has not yet been assigned a role, and while the master node is operating normally such a node will be designated S;
LSID can be the MAC address of the LoadServer;
CLN is a weighted load number, a weighted composite metric of the machine's performance and load indices. The performance-load number and the minimum weighted-load algorithm are defined as follows:
Suppose a cluster contains a group of servers S = {S0, S1, ..., Sn-1}. Let U(Si) denote the CPU utilization of server Si, M(Si) its current memory utilization, D(Si) its current hard disk utilization, and C(Si) its current number of connections. The performance-load number of Si is then:
CLN(Si) = C(Si) * [0.45*U(Si) + 0.45*M(Si) + 0.1*D(Si)]
The larger the performance-load number, the worse the server's service capability.
A new connection request is dispatched to server Sm if and only if Sm satisfies:
CLN(Sm) = min{CLN(Si)}, 0 ≤ i ≤ n-1.
If CLN is a statistical value over a period of time, the algorithm is called the minimum statistical weighted-load algorithm.
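The weighted-load formula and the minimum-CLN rule above can be sketched as follows. The weights 0.45/0.45/0.1 come from the formula in the text; the server figures are made up for illustration.

```python
# Illustrative sketch of the patent's weighted load number (CLN) and the
# minimum-weighted-load selection rule; the server data below is invented.

def cln(connections, cpu_util, mem_util, disk_util):
    """Performance-load number: larger means a more heavily loaded server."""
    return connections * (0.45 * cpu_util + 0.45 * mem_util + 0.1 * disk_util)

def pick_server(stats):
    """Return the server id satisfying CLN(Sm) = min CLN(Si)."""
    return min(stats, key=lambda sid: cln(*stats[sid]))

servers = {
    "S0": (120, 0.80, 0.70, 0.30),  # (connections, CPU, memory, disk)
    "S1": (40, 0.30, 0.40, 0.20),
    "S2": (90, 0.50, 0.60, 0.10),
}
print(pick_server(servers))  # the least-loaded node
```

If the CLN values fed in were window averages rather than instantaneous samples, this same selection would be the minimum statistical weighted-load algorithm described in the text.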
Step A5: LoadProxy creates the heartbeat information list.
Upon detecting heartbeat messages, LoadProxy's heartbeat detection module extracts each server's IP address as the key and builds the heartbeat information list. Each entry holds the IP address (denoted IP), server state (State), MAC address (MAC), role identifier (ROLE), role confirmation flag (Confirm), the CLN statistics (StatCLN), the CLN arithmetic mean (AverageCLN), and the historical distributed-traffic load number (HistoryLN); the table is then initialized. A possible example of the heartbeat information list is as follows:
Table 1: heartbeat information list
[Table 1 is reproduced as an image in the original publication; its columns are IP, State, MAC, ROLE, Confirm, StatCLN, AverageCLN and HistoryLN.]
Note: in the State field, ALIVE means the heartbeat is present and DEAD means it has stopped. In the ROLE field, C denotes a candidate primary server node, S a slave server node, M the primary server node, and W a server node whose role is not yet assigned. In the Confirm field, T means role confirmation is complete and F means it is not.
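One entry of the heartbeat information list from Step A5 might be modeled as below. The field names follow the text; the default values and sample figures are assumptions for illustration.

```python
# Minimal sketch of a heartbeat-list entry (Step A5). Fields mirror the text:
# IP, State, MAC, ROLE, Confirm, StatCLN, AverageCLN, HistoryLN.
from dataclasses import dataclass, field
from typing import List

@dataclass
class HeartbeatEntry:
    ip: str
    mac: str
    state: str = "ALIVE"      # ALIVE / DEAD
    role: str = "W"           # W (unassigned), M, C, S
    confirm: bool = False     # role confirmation flag
    stat_cln: List[float] = field(default_factory=list)  # collected CLN samples
    average_cln: float = 0.0  # arithmetic mean of StatCLN
    history_ln: int = 0       # historical count of dispatched requests

entry = HeartbeatEntry(ip="192.168.0.11", mac="00:1A:2B:3C:4D:5E")
entry.stat_cln.extend([84.6, 45.4, 13.4])          # CLN samples from tickets
entry.average_cln = sum(entry.stat_cln) / len(entry.stat_cln)
print(round(entry.average_cln, 1))
```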
Step A6: TicketMarker computes the CLN using the weighted-load algorithm of step A4.
Step A7: LoadServer sends the CLNTicket in its heartbeats. After LoadServer has collected the contents of the CLNTicket field, it attaches the ticket to a heartbeat with period 10T (T being the heartbeat period; the coefficient may be any suitable empirical value, with 10 used here as an example): every 10T the TFlag field of one heartbeat message is set to TRUE, while ordinary heartbeat messages carry FALSE;
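The heartbeat datagram of Steps A4/A7 could be serialized as shown below. The patent names only the fields (MID, TFlag, and the CLNTicket's role, LSID and CLN); the concrete binary layout here is an assumption.

```python
# Hedged sketch of a heartbeat UDP payload: message id MID, TFlag, then a
# CLNTicket (role char, 6-byte MAC-derived LSID, float CLN). Layout assumed.
import struct

HB_FMT = "!HBc6sf"  # MID (u16), TFlag (u8), role, LSID (MAC), CLN (float32)

def pack_heartbeat(mid, tflag, role=b"W", lsid=b"\x00" * 6, cln=0.0):
    return struct.pack(HB_FMT, mid, 1 if tflag else 0, role, lsid, cln)

def unpack_heartbeat(data):
    mid, tflag, role, lsid, cln = struct.unpack(HB_FMT, data)
    return mid, bool(tflag), role, lsid, cln

pkt = pack_heartbeat(7, True, b"S", bytes.fromhex("001a2b3c4d5e"), 13.4)
mid, tflag, role, lsid, cln = unpack_heartbeat(pkt)
print(mid, tflag, role.decode(), round(cln, 1))
```

With TFlag FALSE, a receiver following Step A4 would stop after the flag and ignore the CLNTicket portion.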
Step A8: LoadProxy receives a CLNTicket and puts it into the ballot box. After receiving a CLNTicket, LoadProxy's ballot box inserts the CLN into the CLN statistics of the heartbeat information list.
Step A9: ElectionBox aggregates the performance-load numbers in the heartbeat information list. Once the number of CLN samples within the statistics window reaches a threshold, their arithmetic mean AverageCLN is computed as the measure of that LoadServer's performance-load index; until an AverageCLN has been produced, a default value is used.
Step A10: DispatchStrategy holds the master/slave election based on the statistical load numbers in the ballot box. DispatchStrategy sorts the statistical load numbers according to the minimum statistical weighted-load algorithm of step A4: the minimum becomes the master node, the next smallest 3-5 (depending on the total node count) become candidate master nodes, and the rest become slave nodes; the results are marked in the heartbeat information list.
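The Step A10 role assignment can be sketched as a sort over the average CLN values. The candidate count of 3 used here is one point in the 3-5 range the text allows; the CLN figures are invented.

```python
# Sketch of Step A10: sort nodes by AverageCLN ascending; the smallest becomes
# master (M), the next few candidates (C), the remainder slaves (S).

def elect_roles(avg_cln, n_candidates=3):
    """avg_cln: {node_id: AverageCLN}. Returns {node_id: role}."""
    ranked = sorted(avg_cln, key=avg_cln.get)  # least loaded first
    roles = {}
    for i, node in enumerate(ranked):
        if i == 0:
            roles[node] = "M"            # minimum load: master
        elif i <= n_candidates:
            roles[node] = "C"            # next smallest: candidate masters
        else:
            roles[node] = "S"            # the rest: slaves
    return roles

roles = elect_roles({"S0": 84.6, "S1": 13.4, "S2": 45.4, "S3": 60.0, "S4": 70.1})
print(roles)
```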
Step A11: ElectionBox sends role directives to the LoadServers according to the heartbeat information list.
Step A12: each LoadServer configures itself according to its role and confirms the role in subsequent CLNTickets. The master node is configured as a MySQL master and the slave nodes as MySQL slaves, so that the master/slave data replication of the later cluster backup can be realized with MySQL's built-in replication. After finishing the configuration, the node changes its role identifier W in the CLNTicket to the corresponding value M or S.
Step A13: LoadProxy checks the master/slave identifier field in subsequent CLNTickets. If the identifier is correct, LoadProxy updates the role confirmation flag in the heartbeat information list to TRUE; if it is incorrect, LoadProxy retransmits the role directive until the identifier is correct.
Step A14: LoadProxy checks all role confirmation flags in the heartbeat information list. When all flags are TRUE, role assignment has succeeded, the election process ends, and the load distribution process can start.
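The confirmation handshake of Steps A11-A14 amounts to a retransmit-until-echoed loop, sketched below. The `send_directive` and `latest_ticket_role` callables stand in for the real messaging; `max_rounds` is an assumed safety bound.

```python
# Sketch of Steps A11-A14: LoadProxy re-sends a node's role directive until the
# node echoes the correct role in its CLNTicket, then marks Confirm TRUE; the
# election finishes once every node is confirmed.

def confirm_roles(assigned, latest_ticket_role, send_directive, max_rounds=5):
    confirm = {node: False for node in assigned}
    for _ in range(max_rounds):
        for node, role in assigned.items():
            if confirm[node]:
                continue
            if latest_ticket_role(node) == role:
                confirm[node] = True          # Step A13: identifier correct
            else:
                send_directive(node, role)    # Step A13: retransmit directive
        if all(confirm.values()):
            return True                       # Step A14: election finished
    return False

acked = {}
assigned = {"S0": "M", "S1": "C", "S2": "S"}
ok = confirm_roles(
    assigned,
    latest_ticket_role=lambda n: acked.get(n, "W"),      # nodes start as W
    send_directive=lambda n, r: acked.__setitem__(n, r), # node applies role
)
print(ok)
```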
Phase II: load distribution process.
When a user's registration service request reaches LoadProxy's AccessIP, LoadProxy performs load distribution over the heartbeat server resources using the minimum statistical weighted-load algorithm. As shown in Fig. 7, the load distribution process comprises the following steps:
Step B1: the user terminal sends a registration service request to the LoadProxy holding the AccessIP.
Step B2: DispatchStrategy decides the serving LoadServer's IP according to the dispatch algorithm.
The heartbeat server list holds a dynamically maintained AverageCLN value for each LoadServer, and a LoadServer is chosen to serve according to the minimum-CLN distribution principle. If the chosen LoadServer is the master node, a load threshold is applied to ensure the master keeps enough resources for the overhead of cluster backup: if the master exceeds this threshold, the request is assigned to the next-least-loaded LoadServer instead.
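The Step B2 decision, including the master's reserve threshold, can be sketched as below. The threshold value 50.0 and the CLN figures are arbitrary illustrations.

```python
# Sketch of the Step B2 dispatch rule: pick the minimum-CLN node, but skip the
# master when it is above a reserve threshold so it keeps capacity for cluster
# backup overhead.

def dispatch(avg_cln, master, master_threshold=50.0):
    ranked = sorted(avg_cln, key=avg_cln.get)   # least loaded first
    best = ranked[0]
    if best == master and avg_cln[master] > master_threshold and len(ranked) > 1:
        return ranked[1]                        # next-least-loaded node
    return best

nodes = {"S1": 55.0, "S2": 61.0, "S3": 70.0}
print(dispatch(nodes, master="S1"))  # master over threshold -> S2
```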
Step B3: update the HistoryLN of the corresponding ElectionBox entry.
HistoryLN is LoadProxy's coarse statistic of each back-end LoadServer's load, showing the user traffic historically placed on that node. To some extent it reflects each node's load under the worst case of fully loaded traffic and is an important reference for load distribution, but the statistical load numbers in ElectionBox better reflect real-time node load. The two parameters can be considered together when distributing load.
Step B4: Redirector forwards the user's registration service request to the chosen LoadServer.
In the load distribution process, Redirector merely forwards the user's request to the chosen node; in effect it redirects the user's request to the actual serving node.
Step B5: this LoadServer starts cluster backup; for details see the cluster backup process below.
Step B6: after cluster backup completes, the service registration ACK is returned directly to the user.
After finishing cluster backup, the LoadServer sends the ACK directly to the user's IP without going through LoadProxy, indicating that registration is complete and service requests may follow.
Phase III: cluster backup process.
The cluster backup process occurs at registration or when user information is updated. During service, if the user only reads stored data, then regardless of whether the serving node is the master or a slave, the DataReader module is invoked on that node to read the local data directly. As shown in Fig. 8, the cluster backup process comprises the following steps:
Step C1: write operation request.
A write operation request means that an administrator or user needs to modify or update information data, such as a user's account or permission information.
Step C2: DataWriter responds.
Every write operation is handled by the DataWriter module and every read operation by the DataReader module. DataWriter is responsible for guaranteeing that cluster backup is carried out.
Step C3: the corresponding user information is set to WriteMode.
While user information is in WriteMode, no read operation on that user is allowed. As soon as the current write completes, the user information is switched back to read mode.
Step C4: determine this LoadServer's role.
If the role is W, go to step C5. This means the node has not yet received its master/slave assignment, so the service must temporarily be deferred. This can happen when a node joined the cluster after system initialization: it has just joined and its heartbeats have been accepted by LoadProxy, but its role has not yet been assigned.
If the role is S, go to step C7;
If the role is M, go to step C10.
Step C5: cache the user request.
Because this server has not been assigned a role, and so as not to disturb the cluster backup process, the server caches the user's request locally pending further processing.
Step C6: wait for LoadProxy's role directive and process the request after it arrives.
Since the LoadServer keeps sending heartbeats and CLNTickets to LoadProxy, LoadProxy will notice that this server has no assigned role and send a role directive; when the directive reaches the LoadServer, it takes the user request out of the cache and performs the service response. Go to step C2.
Step C7: start the indirect write process.
The indirect write process does not modify local data directly; instead, local data is updated only after the master node has applied the modification, so as to guarantee the consistency of cluster backup data.
Step C8: Informer sends an uplink notice to the master node.
An uplink notice is a slave node's notification to the master node that certain user information needs updating; in the uplink notice, Informer indicates the user and the data elements to be updated.
Step C9: after being notified, the master node sets the user information to WriteMode and then performs a direct write.
On receiving the uplink notice, the master node switches the local user data to WriteMode and updates it directly. The master node's data always carries the newest timestamps in the cluster and is the source of every backup copy in the cluster.
Step C10: start the direct write operation.
Step C11: after the master node's write completes, Informer sends a downlink notice to each slave node in the cluster, and the user information is set back to read mode.
After updating its local data, the master node sends a downlink notice to every node in the cluster through the Informer module; the downlink notice likewise contains the user and the updated data elements. Since the local data is now up to date, the user's information can be set back to read mode.
Step C12: after being notified, each slave node performs a direct write.
After obtaining the update information from the downlink notice, each node in the cluster modifies its local data directly, becoming a backup of the master's latest data. If a slave node's copy of the user data is in read mode when the direct write is to be performed, the user data must first be set to WriteMode.
Step C13: after the direct write completes, the user information is set back to read mode.
After a slave node modifies its local data, the user information must likewise be restored to read mode. Once all backups are back in read mode, every slave node holds an up-to-date backup of the master, and the cluster backup is formed.
Step C14: the serving node can now issue the user's service response.
Only after the user's write operation has completed does the serving node respond to the user, indicating that the write succeeded; once the user's data or the administrator's configuration data has been written and cluster-backed-up, the business service can use the updated data.
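The write path of Steps C1-C14 can be compressed into the sketch below: slaves route writes through the master via an uplink notice, and the master fans out downlink notices so every node converges on the same data. The in-memory dict stands in for each node's local MySQL database; all data structures are assumptions.

```python
# Compressed sketch of the Phase III write path. A write entering at a slave is
# an "indirect write" (uplink notice to the master); the master applies it and
# fans out downlink notices so all nodes hold the latest copy.

class Node:
    def __init__(self, name):
        self.name = name
        self.role = "S"
        self.db = {}            # local database: user -> value

def cluster_write(nodes, entry_node, user, value):
    master = next(n for n in nodes if n.role == "M")
    if nodes[entry_node].role == "S":
        pass                    # indirect write: uplink notice (Step C8)
    master.db[user] = value     # master writes directly (Steps C9/C10)
    for n in nodes:             # downlink notices: slaves sync (Steps C11-C13)
        if n is not master:
            n.db[user] = value

nodes = [Node("S0"), Node("S1"), Node("S2")]
nodes[0].role = "M"
cluster_write(nodes, entry_node=2, user="alice", value="pw-hash-1")
print([n.db["alice"] for n in nodes])
```

The real mechanism delegates the fan-out to MySQL master/slave replication per Step A12; this sketch only shows the resulting convergence.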
Phase IV: load failover process.
The load failover process differs according to the role of the failed server node. As shown in Fig. 9, the load failover process comprises the following steps:
Step D1: HBDetecter detects that some server's heartbeat has timed out.
When HBDetecter finds that no heartbeat message from a server has arrived within the agreed interval, it starts a crash-judgment timer; when the timer expires, the server is judged to have stopped its heartbeat, and LoadProxy concludes that load failover must be performed for the crashed machine.
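The Step D1 check can be sketched as below: a node is declared DEAD only after no heartbeat has arrived for a full crash-judgment window. The window length of 3 heartbeat periods is an assumed value; the text does not fix the timer duration.

```python
# Sketch of the Step D1 heartbeat-timeout check over last-seen timestamps.
import time

HEARTBEAT_PERIOD = 1.0
DEAD_AFTER = 3 * HEARTBEAT_PERIOD   # assumed crash-judgment window

def check_states(last_seen, now):
    """last_seen: {node: timestamp of last heartbeat}. Returns {node: state}."""
    return {node: ("DEAD" if now - ts > DEAD_AFTER else "ALIVE")
            for node, ts in last_seen.items()}

now = time.monotonic()
last_seen = {"S0": now - 0.5, "S1": now - 10.0}   # S1 has been silent too long
print(check_states(last_seen, now))
```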
Step D2: the crashed server is flagged in ElectionBox, and its IP is recorded as the TakeoverIP.
Step D3: determine the crashed LoadServer's role. If it is W, go to step D4; if S, go to step D5; if M, go to step D8;
Step D4: clear the TakeoverIP; no load failover is performed and the load transfer process ends.
Step D5: the DispatchStrategy module decides a new LoadServer's IP according to the minimum statistical weighted-load algorithm.
Since ElectionBox dynamically stores the statistical weighted load numbers of the back-end LoadServers, DispatchStrategy can decide on a new LoadServer to take over the crashed server's service.
Step D6: Redirector sends the TakeoverIP to the new LoadServer.
The new LoadServer must use the TakeoverIP to direct the crashed server's users to itself so that their service can continue. This guarantees that after a disaster every user still has an available server serving it.
Step D7: the new LoadServer configures the TakeoverIP using IPConfiger.
IPConfiger configures the TakeoverIP on the local network card; users' service requests are transparently transferred to the new LoadServer, which serves them from then on. The failover of a slave node then ends.
Step D8: LoadProxy instructs all nodes, by setting a flag parameter to 1, to force the servers into read mode.
Because the master node has crashed, data backup across the whole server cluster cannot proceed normally; during this period user write operations cannot be serviced, though user read operations are unaffected. Full cluster service is restored automatically by the subsequent steps.
Step D9: DispatchStrategy decides a new master node from among the candidate master servers according to the minimum statistical load algorithm, together with a load-takeover node.
To ensure a new master node is produced quickly, DispatchStrategy chooses the new master only from the candidate master servers, which improves response speed. Because the crashed master also needs its load taken over, a new load-takeover node is produced in this process as well. The new master node and the load-takeover node should, as far as possible, not be the same server.
Step D10: send the role directive to the new master node, and send the TakeoverIP to the load-takeover node.
Once LoadProxy has decided on the new master node, it sends a role directive to that node, which promptly reconfigures itself for the new role; LoadProxy also sends the TakeoverIP to the load-takeover node so that the former master's user service can continue.
Step D11: after the new master node responds, Redirector indicates the new master's IP to the slave nodes.
As soon as the new master finishes configuring itself as master, it responds to LoadProxy; Redirector then indicates the new master's IP to each slave node, and the slave nodes modify their configuration accordingly.
Step D12: clear the flag parameter and restore cluster service.
Since the master and slave nodes have all completed their configuration, the flag parameter forcing read mode can now be cleared and the whole cluster resumes normal service. The load failover process ends.
In summary, the cluster-backup-based disaster tolerance system and method of the present invention have the following beneficial effects:
(1) Hardware cost is greatly reduced. Each node in the cluster is both a serving host and a backup machine for the other servers, so no new backup machines need to be added to achieve backup; the cost saving is obtained at the price of increased cluster backup complexity. For the same service throughput, the cluster-backup disaster recovery method reaches the goal with fewer servers.
(2) Single-machine utilization and the overall utilization of the peer multi-server setup are improved. The cluster-backup disaster recovery method in effect turns the traditional standby machine into a server that is a peer of the serving host: the capacity the former standby spent mostly idle is fully used for service and cluster backup, raising its single-machine utilization. With the same number of servers, the cluster delivers higher service throughput, and the whole cluster attains a higher overall utilization rate.
(3) Any two servers in the cluster back each other up, improving backup redundancy. When one server fails, every other server in the cluster holds a backup of its data, so backup redundancy is greatly increased. Even when a single node, or several nodes, fail, many backup copies of the data remain available in the cluster.
(4) Failover can be coordinated across the server conditions of the whole cluster, greatly improving the disaster tolerance capability.
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents, without such modifications or replacements causing the essence of the corresponding technical solution to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A disaster tolerance system based on cluster backup, characterized in that it comprises: a load agent unit and a load service unit;
the load service unit comprises at least two server nodes, one of which is a primary server node that performs service interaction with user terminals, the remaining server nodes being non-primary server nodes; each server node comprises a local database, and the server nodes are interconnected; said service interaction comprises: the primary server node reading data from and/or writing data to the primary server node's local database; and, when the primary server node writes data to its local database, further comprises: the primary server node backing up said data in the local databases of the non-primary server nodes in the load service unit;
the load agent unit comprises a load dispatcher, said load dispatcher being connected to each server node in the load service unit; when it detects that the primary server node's heartbeat has stopped, it performs a failover operation on that primary server node and selects, from among the non-primary server nodes, a server node to act as the primary server node and perform service interaction with user terminals;
said load dispatcher comprises:
a heartbeat detection module, which maintains heartbeat connections with each server node in the load service unit and is used to detect the heartbeat messages of each server node;
an election module, connected to said heartbeat detection module, which periodically receives the heartbeat messages of each server node detected by the heartbeat detection module and periodically maintains the heartbeat information list of the server nodes;
a scheduling strategy module, connected to said election module, which decides, according to the heartbeat information list, the IP address of the primary server node to which the agent forwards requests, or detects the IP address of a failed primary server node requiring failover;
a redirection forwarder module, connected to said scheduling strategy module, which forwards user terminals' registration service requests to the primary server node IP address obtained from said scheduling strategy module, or redirects to indicate the IP address of the failed primary server node requiring failover;
said load agent unit further comprises a redundancy backup device; said load dispatcher further comprises an advertisement module, connected to said redundancy backup device, which periodically sends the load dispatcher's advertisement message to the redundancy backup device, said advertisement message comprising heartbeat information; said redundancy backup device starts the virtual IP address service when reception of the load dispatcher's advertisement message times out, converting the operating state of the redundancy backup device into that of the load dispatcher.
2. The disaster tolerance system according to claim 1, characterized in that said server node comprises:
an IP configurator module which, when the server node to which it belongs is a newly determined primary server node, responds to the redundant IP address configuration command sent by the load dispatcher and configures the IP address of the server node to which it belongs as the IP address of the former primary server node;
a marking module, which periodically collects the performance and/or load indices of the server node and calculates the server node's performance-load number, the performance-load number being a score obtained, according to a weighted-load algorithm, from indices of aspects such as the server node's performance and load, usable as a reference standard for judging the server node's serving capability;
a heartbeat module, which maintains a heartbeat connection with the load dispatcher and periodically sends said performance-load number to the load dispatcher carried in heartbeat messages;
an event notification module; when the server node to which it belongs is the primary server node, the primary server node uses the event notification module to notify the non-primary server nodes in the load service unit of data synchronization update commands; when the server node to which it belongs is a non-primary server node, the non-primary server node uses the event notification module to receive the data synchronization update commands sent by the primary server node in the load service unit;
a data read/write operation module, for reading data from and/or writing data to the local database; it is connected to the event notification module and, when data is written to the local database, backs up said data, via the event notification module, in the local databases of the non-primary server nodes of the load service unit.
3. The disaster tolerance system according to claim 2, characterized in that said primary server node backing up said data in the local databases of the non-primary server nodes in the load service unit comprises:
the primary server node sending a data synchronization update instruction to the non-primary server nodes in the load service unit, and the local databases in the non-primary server nodes backing up the data.
4. A disaster recovery method based on cluster backup, characterized in that it comprises:
a primary server node performing service interaction with user terminals; said service interaction comprises: the primary server node reading data from and/or writing data to the primary server node's local database; and, when the primary server node writes data to its local database, further comprises: the primary server node backing up said data in the local databases of the non-primary server nodes in the load service unit;
when a load agent unit detects that the current primary server node's heartbeat has stopped, the load agent unit selecting, from among the non-primary server nodes, a server node to act as the primary server node and perform service interaction with user terminals;
said primary server node backing up said data in the local databases of the non-primary server nodes in the load service unit comprises: the primary server node sending a data synchronization update instruction to the non-primary server nodes in the load service unit, and the local databases in the non-primary server nodes of the load service unit backing up said data;
before the primary server node performs service interaction with a user terminal, the method further comprises:
the user terminal first sending a registration service request to the load agent unit, said registration service request containing the user terminal's user information;
the load agent unit, according to the collected heartbeat information, scheduling a server node as the primary server node to serve this user terminal, and forwarding said registration service request to this primary server node;
this primary server node storing the user terminal's user information and, after backing up said registration service request on the non-primary server nodes, feeding back a registration service response to the user terminal, said registration service response containing the IP address of the primary server node.
5. The disaster recovery method according to claim 4, characterized in that it further comprises:
the load agent unit periodically receiving the heartbeat messages of each server node detected by the heartbeat detection module and periodically maintaining the heartbeat information list of the server nodes.
6. The disaster recovery method according to claim 5, characterized in that said selecting, from among the non-primary server nodes, a server node to act as the primary server node comprises:
the load agent unit redetermining the primary server node according to the heartbeat information list in the load agent unit, sending a redundant network address configuration command to the redetermined primary server node, and configuring the IP address of this redetermined primary server node as the IP address of the former primary server node.
7. The disaster recovery method according to claim 5 or 6, characterized in that it further comprises: the load agent unit backing up the heartbeat information list and periodically updating said heartbeat information list.
8. The disaster recovery method according to claim 4, characterized in that said heartbeat information contains the performance-load numbers of the server nodes, a performance-load number being a score obtained, according to a weighted-load algorithm, from indices of aspects such as a server node's performance and load, usable as a reference standard for judging the server node's serving capability.
CN200810048216XA 2008-06-27 2008-06-27 Disaster allowable system and method based on cluster backup Expired - Fee Related CN101309167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810048216XA CN101309167B (en) 2008-06-27 2008-06-27 Disaster allowable system and method based on cluster backup


Publications (2)

Publication Number Publication Date
CN101309167A CN101309167A (en) 2008-11-19
CN101309167B true CN101309167B (en) 2011-04-20

Family

ID=40125399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810048216XA Expired - Fee Related CN101309167B (en) 2008-06-27 2008-06-27 Disaster allowable system and method based on cluster backup

Country Status (1)

Country Link
CN (1) CN101309167B (en)

Families Citing this family (85)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5480291B2 (en) * 2008-12-30 2014-04-23 トムソン ライセンシング Synchronizing display system settings
CN101605301B (en) * 2009-07-08 2012-09-26 中兴通讯股份有限公司 Cluster system for multi-node transaction processing and a request message distributing method
CN101729290A (en) * 2009-11-04 2010-06-09 中兴通讯股份有限公司 Method and device for realizing business system protection
CN102082680B (en) * 2009-11-27 2013-09-11 中国移动通信集团北京有限公司 Method for controlling network element connection by acquisition machines, acquisition machines and system
CN102130759A (en) * 2010-01-13 2011-07-20 中国移动通信集团公司 Data collection method, data collection device cluster and data collection devices
CN102281257B (en) * 2010-06-12 2016-08-03 陈银彬 Entertainment information platform
CN102299904B (en) * 2010-06-23 2014-03-19 阿里巴巴集团控股有限公司 System and method for realizing service data backup
CN101924650B (en) * 2010-08-04 2012-03-28 浙江省电力公司 Method for implementing services and intelligent server autonomy of failure information system
CN102148850B (en) * 2010-08-09 2014-08-06 华为软件技术有限公司 Cluster system and service processing method thereof
CN102143011B (en) * 2010-08-23 2013-11-06 华为技术有限公司 Device and method for realizing network protection
CN102831038B (en) * 2011-06-17 2019-03-01 中兴通讯股份有限公司 The disaster recovery method and ENUM-DNS of ENUM-DNS
CN102523234B (en) * 2011-12-29 2015-12-02 山东中创软件工程股份有限公司 A kind of application server cluster implementation method and system
CN102523127A (en) * 2011-12-30 2012-06-27 网宿科技股份有限公司 Master server and slave server switching method and system utilizing same
CN102663017A (en) * 2012-03-21 2012-09-12 互动在线(北京)科技有限公司 Implementation system and implementation method for enhancing availability of MySQL database
CN103209091B (en) * 2013-01-18 2016-06-29 中兴通讯股份有限公司 The heat backup method of group system and system
CN103944746B (en) * 2013-01-23 2018-10-09 新华三技术有限公司 A kind of method and device of two-node cluster hot backup
CN104239164A (en) * 2013-06-19 2014-12-24 国家电网公司 Cloud storage based disaster recovery backup switching system
CN103384211B (en) * 2013-06-28 2017-02-08 百度在线网络技术(北京)有限公司 Data manipulation method with fault tolerance and distributed type data storage system
CN104468163B (en) * 2013-09-18 2018-11-09 腾讯科技(北京)有限公司 The method, apparatus and disaster tolerance network of disaster tolerance network organizing
CN104618127B (en) * 2013-11-01 2019-01-29 深圳市腾讯计算机系统有限公司 Active and standby memory node switching method and system
TWI501092B (en) * 2013-11-19 2015-09-21 Synology Inc Method for controlling operations of server cluster
CN104734896B (en) * 2013-12-18 2019-04-23 青岛海尔空调器有限总公司 The acquisition methods and system of service sub-system operating condition
CN104954157B (en) * 2014-03-27 2018-12-04 中国移动通信集团湖北有限公司 A kind of fault self-recovery method and system
CN103945016B (en) * 2014-04-11 2018-07-06 江苏中科羿链通信技术有限公司 A kind of method and system of Dynamic Host Configuration Protocol server master-slave redundancy
CN105763524A (en) * 2014-12-19 2016-07-13 华为技术有限公司 Registration method in IP multimedia subsystem, device and system
CN104579765B (en) * 2014-12-27 2019-02-26 北京奇虎科技有限公司 A kind of disaster recovery method and device of group system
CN104539462B (en) * 2015-01-09 2017-12-19 北京京东尚科信息技术有限公司 It is a kind of to switch to method and device of the calamity for application example
TWI584654B (en) * 2015-03-27 2017-05-21 林勝雄 Method and system for optimization service
CN104965770B (en) * 2015-06-15 2018-02-02 北京邮电大学 A kind of central server disaster-tolerant backup method
CN104980307A (en) * 2015-06-29 2015-10-14 小米科技有限责任公司 Processing method of data access requests, processing device of data access requests and database server
CN106341366A (en) * 2015-07-06 2017-01-18 中兴通讯股份有限公司 Method and device for backuping multiple key servers and key server
CN105095486A (en) * 2015-08-17 2015-11-25 浪潮(北京)电子信息产业有限公司 Cluster database disaster recovery method and device
CN105592139B (en) * 2015-10-28 2019-03-15 新华三技术有限公司 A kind of the HA implementation method and device of distributed file system management platform
CN106649414B (en) * 2015-11-04 2020-01-31 阿里巴巴集团控股有限公司 Method and equipment for pre-detecting data anomalies of data warehouses
CN105354113B (en) * 2015-11-27 2019-01-25 上海爱数信息技术股份有限公司 A kind of system and method for server, management server
CN105429799B (en) * 2015-11-30 2019-06-11 浙江宇视科技有限公司 Server backup method and device
CN105634832B (en) * 2016-03-16 2019-07-16 浙江宇视科技有限公司 A kind of backup method and device of server
CN107273241B (en) * 2016-04-06 2021-02-26 北京航天发射技术研究所 Redundancy backup and automatic recovery method for important parameters
CN105763386A (en) * 2016-05-13 2016-07-13 中国工商银行股份有限公司 Service processing system and method
CN106020963A (en) * 2016-06-07 2016-10-12 中国建设银行股份有限公司 Cross-system internal service calling method and device
CN106301895A (en) * 2016-08-03 2017-01-04 浪潮(北京)电子信息产业有限公司 A kind of disaster recovery method obtaining cluster monitoring data and device
CN106385334B (en) * 2016-09-20 2019-06-18 携程旅游信息技术(上海)有限公司 Call center system and its abnormality detection and self-recovery method
CN106789197A (en) * 2016-12-07 2017-05-31 高新兴科技集团股份有限公司 A kind of cluster election method and system
CN108241551A (en) * 2016-12-23 2018-07-03 航天星图科技(北京)有限公司 A kind of redundant database system
CN108243209A (en) * 2016-12-23 2018-07-03 深圳市优朋普乐传媒发展有限公司 A kind of method of data synchronization and device
CN107018010A (en) * 2017-03-07 2017-08-04 杭州承联通信技术有限公司 A kind of PDT clusters core network system and its disaster tolerance switching method
CN110447209A (en) * 2017-03-16 2019-11-12 英特尔公司 System, method and apparatus for user plane traffic forwarding
CN106921746A (en) * 2017-03-22 2017-07-04 重庆允升科技有限公司 A kind of data synchronous system and method for data synchronization
CN106953761B (en) * 2017-03-29 2020-03-10 恒生电子股份有限公司 Server disaster recovery system and message processing method based on disaster recovery system
CN106982259A (en) * 2017-04-19 2017-07-25 聚好看科技股份有限公司 The failure solution of server cluster
CN107239505B (en) * 2017-05-10 2020-09-15 广州杰赛科技股份有限公司 Cluster mirror synchronization method and system
CN107329853A (en) * 2017-06-13 2017-11-07 上海微烛信息技术有限公司 Backup method, standby system and the electronic equipment of data-base cluster
CN109428740B (en) * 2017-08-21 2020-09-08 华为技术有限公司 Method and device for recovering equipment failure
CN107819872A (en) * 2017-11-22 2018-03-20 聚好看科技股份有限公司 Ask the method and device of network data
CN108023772B (en) * 2017-12-07 2021-02-26 海能达通信股份有限公司 Abnormal node repairing method, device and related equipment
CN110417842B (en) 2018-04-28 2022-04-12 北京京东尚科信息技术有限公司 Fault processing method and device for gateway server
CN109039747B (en) * 2018-08-09 2021-06-11 北京搜狐新媒体信息技术有限公司 Dual-computer hot standby control method and device for DPDK service
CN109254876A (en) * 2018-09-11 2019-01-22 郑州云海信息技术有限公司 The management method and device of database in cloud computing system
CN109561151B (en) * 2018-12-12 2021-09-17 北京达佳互联信息技术有限公司 Data storage method, device, server and storage medium
CN109669410B (en) * 2018-12-17 2020-06-09 积成电子股份有限公司 Communication master supervisor election method based on multi-source information
US10887382B2 (en) 2018-12-18 2021-01-05 Storage Engine, Inc. Methods, apparatuses and systems for cloud-based disaster recovery
US10958720B2 (en) 2018-12-18 2021-03-23 Storage Engine, Inc. Methods, apparatuses and systems for cloud based disaster recovery
US11176002B2 (en) 2018-12-18 2021-11-16 Storage Engine, Inc. Methods, apparatuses and systems for cloud-based disaster recovery
US11252019B2 (en) 2018-12-18 2022-02-15 Storage Engine, Inc. Methods, apparatuses and systems for cloud-based disaster recovery
US11489730B2 (en) 2018-12-18 2022-11-01 Storage Engine, Inc. Methods, apparatuses and systems for configuring a network environment for a server
US11178221B2 (en) 2018-12-18 2021-11-16 Storage Engine, Inc. Methods, apparatuses and systems for cloud-based disaster recovery
US10983886B2 (en) 2018-12-18 2021-04-20 Storage Engine, Inc. Methods, apparatuses and systems for cloud-based disaster recovery
CN109451063B (en) * 2018-12-24 2021-08-17 北京东土科技股份有限公司 Server hot standby method and system
CN109756573B (en) * 2019-01-15 2022-02-08 苏州链读文化传媒有限公司 File system based on block chain
CN109560903B (en) * 2019-02-14 2024-01-19 湖南智领通信科技有限公司 Vehicle-mounted command communication system for complete disaster recovery
CN110120889B (en) * 2019-05-06 2022-05-20 网易(杭州)网络有限公司 Data processing method, device and computer storage medium
CN110505269A (en) * 2019-06-21 2019-11-26 广州虎牙科技有限公司 Transaction processing system, method for processing business and server
CN110445664B (en) * 2019-09-03 2022-08-09 湖南中车时代通信信号有限公司 Multi-center server dual-network main selection system of automatic train monitoring system
CN112866314B (en) * 2019-11-27 2023-04-07 上海哔哩哔哩科技有限公司 Method for switching slave nodes in distributed master-slave system, master node device and storage medium
CN111651291B (en) * 2020-04-23 2023-02-03 国网河南省电力公司电力科学研究院 Method, system and computer storage medium for preventing split brain of shared storage cluster
CN111565233A (en) * 2020-05-28 2020-08-21 吉林亿联银行股份有限公司 Data transmission method and device
CN111641716B (en) * 2020-06-01 2023-05-02 第四范式(北京)技术有限公司 Self-healing method of parameter server, parameter server and parameter service system
CN111988387B (en) * 2020-08-11 2023-05-30 北京达佳互联信息技术有限公司 Interface request processing method, device, equipment and storage medium
CN112579362A (en) * 2020-12-29 2021-03-30 广州鼎甲计算机科技有限公司 Backup method, system, device and storage medium for Shentong database cluster
CN114285832A (en) * 2021-05-11 2022-04-05 鸬鹚科技(深圳)有限公司 Disaster recovery system, method, computer device and medium for multiple data centers
CN114124928B (en) * 2021-09-27 2023-07-14 苏州浪潮智能科技有限公司 Method, device and system for quickly synchronizing files between devices
CN115277379B (en) * 2022-07-08 2023-08-01 北京城市网邻信息技术有限公司 Distributed lock disaster recovery processing method and device, electronic equipment and storage medium
CN115658368B (en) * 2022-11-11 2023-03-28 北京奥星贝斯科技有限公司 Fault processing method and device, storage medium and electronic equipment
CN115914418B (en) * 2023-03-09 2023-06-30 北京全路通信信号研究设计院集团有限公司 Railway interface gateway equipment
CN116436768B (en) * 2023-06-14 2023-08-15 北京理想信息科技有限公司 Automatic backup method, system, equipment and medium based on cross heartbeat monitoring

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1719831A (en) * 2005-07-15 2006-01-11 清华大学 High-available distributed boundary gateway protocol system based on cluster router structure
CN101060391A (en) * 2007-05-16 2007-10-24 华为技术有限公司 Master and spare server switching method and system and master server and spare server
CN101179432A (en) * 2007-12-13 2008-05-14 浪潮电子信息产业股份有限公司 Method of implementing high availability of system in multi-machine surroundings

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106162637A (en) * 2015-04-10 2016-11-23 成都鼎桥通信技术有限公司 The implementation method of LTE broadband cluster multinode mirror image networking and device
CN106162637B (en) * 2015-04-10 2019-10-25 成都鼎桥通信技术有限公司 The implementation method and device of the broadband LTE cluster multinode mirror image networking

Also Published As

Publication number Publication date
CN101309167A (en) 2008-11-19

Similar Documents

Publication Publication Date Title
CN101309167B (en) Disaster allowable system and method based on cluster backup
CN102346460B (en) Transaction-based service control system and method
US9208029B2 (en) Computer system to switch logical group of virtual computers
CN113014634B (en) Cluster election processing method, device, equipment and storage medium
WO2018103318A1 (en) Distributed transaction handling method and system
US7937437B2 (en) Method and apparatus for processing a request using proxy servers
US20100333094A1 (en) Job-processing nodes synchronizing job databases
CN100570607C (en) The method and system that is used for the data aggregate of multiprocessing environment
US7225356B2 (en) System for managing operational failure occurrences in processing devices
CN103414712B (en) A kind of distributed virtual desktop management system and method
US20090113034A1 (en) Method And System For Clustering
CN106850260A (en) A kind of dispositions method and device of virtual resources management platform
CN104081354A (en) Managing partitions in a scalable environment
CN102761528A (en) System and method for data management
CN102938705A (en) Method for managing and switching high availability multi-machine backup routing table
CN105393519A (en) Failover system and method
CN106874143A (en) Server backup method and backup system thereof
CN110377664B (en) Data synchronization method, device, server and storage medium
CN115080436A (en) Test index determination method and device, electronic equipment and storage medium
US8201017B2 (en) Method for queuing message and program recording medium thereof
KR19990043986A (en) Business take over system
CN113608836A (en) Cluster-based virtual machine high availability method and system
CN112243030A (en) Data synchronization method, device, equipment and medium of distributed storage system
CN114039978B (en) Decentralized PoW computing power cluster deployment method
CN115168042A (en) Management method and device of monitoring cluster, computer storage medium and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110420

Termination date: 20110627