CN101309167B - Disaster tolerant system and method based on cluster backup - Google Patents


Info

Publication number
CN101309167B
CN101309167B (application number CN200810048216XA)
Authority
CN
China
Prior art keywords
primary server
load
node
server node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN200810048216XA
Other languages
Chinese (zh)
Other versions
CN101309167A (en)
Inventor
王芙蓉
史军
莫益军
黄辰
卢正新
李晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN200810048216XA
Publication of CN101309167A
Application granted
Publication of CN101309167B

Abstract

The invention relates to a disaster tolerance system and method based on cluster backup. The disaster tolerance system includes a load agent unit and a load service unit; the load service unit includes at least two server nodes, of which one is the primary server node that performs service interaction with the user terminal, while the others are non-primary server nodes. The service interaction includes: when the primary server node writes data to its local database, the primary server node backs up the data in the local databases of the non-primary server nodes in the load service unit. The load agent unit includes a load dispatcher connected to each of the server nodes; when the primary server node is detected to have broken down, a failover operation is executed on it, and one of the non-primary server nodes is selected as the new primary server node to perform service interaction with the user terminal. The disaster tolerance system and method improve both the utilization of a single server node and the overall utilization of the multiple peer server nodes.

Description

Disaster tolerance system and method based on cluster backup
Technical field
The invention belongs to the field of network systems, and in particular relates to a disaster tolerance system and method based on cluster backup in a network system.
Background technology
With the rapid development of modern network technology and the continuous growth in the number of users, networks keep growing in scale. For the servers in a network, both single-machine capacity and the number of servers must therefore be increased to keep pace with user demand. At the same time, the requirements on server reliability and disaster tolerance capability are also becoming ever higher.
To have a reliable disaster tolerance mechanism under existing conditions — so that service to users can be restored as quickly as possible after a disaster brings a server down — server data must be redundantly backed up. Since a single machine is far from sufficient, important data must adopt an active-standby backup mode with a primary machine and a standby machine, i.e. the system and data files of the primary are synchronized to the standby. This method of setting up an independent standby machine to back up the primary is referred to here as the independent redundancy backup mechanism. Existing disaster tolerance schemes, such as dual-machine redundancy backup and multi-machine redundancy backup, mainly perform disaster recovery based on this independent redundancy backup mechanism.
Dual-machine redundancy backup means that two machines keep the system and data synchronized: while the system runs, the standby machine continuously monitors, over a communication cable, changes to the image files and system of the currently working primary machine and backs up the changed data. The primary and the standby adopt a one-to-one redundancy backup strategy. Invention patent No. 200410002153.6, "An implementation method of network-management dual-machine disaster-tolerant backup", records an existing dual-machine backup method: under normal conditions the system runs on a runtime server, and data in the system is copied in real time to a disaster-tolerance backup server; at least a first monitoring program runs on the backup server, which connects to the runtime server to monitor its operating state, and automatically starts the network management system on the backup server for disaster recovery when it detects that the runtime server has been paralyzed by a disaster.
Multi-machine redundancy backup refers to a one-to-many or many-to-many redundancy backup strategy obtained by rationally planning multiple primary machines and standby machines. Invention patent 200510034607.2, "Method of multi-machine backup", records such a method: any primary machine connects to one or more standby machines, and any standby machine connects to one or more primary machines; the configuration file of each standby machine records the IP address and backup cycle of every connected primary machine, and the configuration file of each primary machine records the IP address or machine name of every connected standby machine. This multi-machine backup improves backup flexibility: one primary can respond to backup requests from multiple standbys, and one standby can issue backup requests to multiple primaries. Because the standby requests backups periodically, the primary does not need to monitor image-file changes in real time, which effectively reduces the performance impact of the mirroring software on the primary.
However, whether dual-machine or multi-machine, any method based on the independent redundancy backup mechanism needs independent standby machines to back up the primaries. A standby machine is idle most of the time while the primary works normally, performing detection and backup operations only during input and data updates. Whether the backup mode is one-to-one, one-to-many, or many-to-many, the machine redundancy is large and the single-machine utilization is very low. For example, in a five-to-one backup mode, 10 primary machines in a network require 2 standby machines; because of their low utilization, these 2 standby machines waste considerable resources and increase hardware cost. Moreover, during disaster recovery, the independent server resources of multiple normally working peers are not rationally planned and allocated as a whole, so in some cases the overall utilization is also very low.
Summary of the invention
The object of the invention is to overcome the above defects of disaster tolerance techniques based on independent redundancy backup by providing a disaster tolerance system and method based on cluster backup.
To achieve the above object, the invention provides a disaster tolerance system based on cluster backup, comprising a load agent unit and a load service unit. The load service unit comprises at least two server nodes; each server node comprises a local database, and the server nodes are interconnected. The load service unit comprises one primary server node that performs service interaction with a user terminal, while the remaining server nodes are non-primary server nodes. The service interaction comprises: the primary server node reading data from and/or writing data to its local database; and, when the primary server node writes data to its local database, the primary server node also backing up that data in the local databases of the non-primary server nodes in the load service unit.
The load agent unit comprises a load dispatcher connected to each server node in the load service unit. When the load dispatcher detects that the heartbeat of the primary server node has stopped, it performs a failover operation on the primary server node, selecting one of the non-primary server nodes as the new primary server node to perform service interaction with the user terminal.
To achieve the above object, the invention also provides a disaster recovery method based on cluster backup, comprising:
The primary server node performs service interaction with the user terminal. The service interaction comprises: the primary server node reading data from and/or writing data to its local database; and, when the primary server node writes data to its local database, the primary server node also backing up that data in the local databases of the non-primary server nodes in the load service unit.
When the load agent unit detects that the heartbeat of the current primary server node has stopped, it selects one of the non-primary server nodes as the new primary server node to perform service interaction with the user terminal.
By introducing a cluster backup mechanism, the disaster tolerance system and method based on cluster backup transform the traditional standby server node into a server node that is a peer of the primary server node, so that any two server nodes in the cluster back each other up. When one server node fails, every other server node in the cluster holds a backup of its data, which increases the backup redundancy. At the same time, since the invention does not need to provide an independent backup server node for each server node, both the single-machine utilization and the overall utilization of the multiple peer server nodes are improved.
Description of drawings
Fig. 1 is a structural diagram of embodiment one of a disaster tolerance system based on cluster backup according to the invention;
Fig. 2 is a structural diagram of embodiment two of a disaster tolerance system based on cluster backup according to the invention;
Fig. 3 is a structural diagram of embodiment three of a disaster tolerance system based on cluster backup according to the invention;
Fig. 4 is a structural diagram of embodiment four of a disaster tolerance system based on cluster backup according to the invention;
Fig. 5 is a flowchart of an embodiment of a disaster recovery method based on cluster backup according to the invention;
Fig. 6 is a flowchart of the initialization election process in the disaster recovery method based on cluster backup according to the invention;
Fig. 7 is a flowchart of the load sharing process in the disaster recovery method based on cluster backup according to the invention;
Fig. 8 is a flowchart of the cluster backup process in the disaster recovery method based on cluster backup according to the invention;
Fig. 9 is a flowchart of the load failover process in the disaster recovery method based on cluster backup according to the invention.
Embodiment
The technical scheme of the invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a structural diagram of embodiment one of a disaster tolerance system based on cluster backup according to the invention. As shown in Fig. 1, the disaster tolerance system based on cluster backup in this embodiment comprises a load agent unit LoadProxy 100 and a load service unit LoadServer 200.
The load service unit LoadServer 200 comprises server nodes ServerNode 210; this embodiment takes five server nodes S-Node1, S-Node2, S-Node3, S-Node4 and S-Node5 as an example, each containing a local database Database 220. Server nodes S-Node1 through S-Node5 contain local databases Data1, Data2, Data3, Data4 and Data5 respectively, and the server nodes are interconnected. This embodiment includes user terminals User_1, User_2, User_3, ..., User_n. The load service unit comprises one primary server node that performs service interaction with user terminal User_n; in this embodiment S-Node5 is the primary server node for User_n, and the other server nodes S-Node1, S-Node2, S-Node3 and S-Node4 are non-primary server nodes. The service interaction between User_n and primary server node S-Node5 comprises: S-Node5 reading data from and/or writing data to its local database Data5; and, when S-Node5 writes data to Data5, S-Node5 also backing up that data in the local databases Data1, Data2, Data3 and Data4 of the non-primary server nodes in load service unit 200.
Load agent unit LoadProxy 100 comprises a load dispatcher LoadDispatcher 110, which is connected to each of the server nodes S-Node1 through S-Node5 in load service unit LoadServer 200. When it detects that the heartbeat of primary server node S-Node5 has stopped, it performs a failover operation on S-Node5, selecting one of the non-primary server nodes as the new primary server node to perform service interaction with the user terminal.
In this embodiment, load agent unit LoadProxy performs cluster load bridging when a user terminal first accesses the system, assigning the user terminal to a suitable server node; thereafter, as long as that server node works normally, all requests of the user terminal are directed to it and served by it. The load agent pools and optimizes the server node resources of the back-end load service unit so that they provide service to user terminals efficiently, effectively controls the traffic of the load service unit, and exploits the cluster backup advantage of the load service unit for disaster tolerance when a disaster occurs.
In this embodiment, load service unit LoadServer is a cluster of peer server nodes that fully redundantly backs up the user data of user terminals, and is also the entity that actually serves users. Multiple back-end peer server nodes providing the same service are formed, using clustering technology, into a cluster with a domain concept. In each cluster, an election algorithm selects one cluster head as the primary server node and several non-primary server nodes; among the non-primary nodes, several secondary cluster heads serve as candidate primary server nodes and the remaining members serve as slave server nodes. The primary server guarantees the consistency of the user data of user terminals within the cluster. Among all server nodes in the cluster, any two nodes are mutual redundant backups of the user data. Master and slave servers are peers, completely equivalent in function and configuration; all are general server nodes ServerNode. The whole cluster provides a unified access address AccessIP as the initial service entrance for user terminals outside the cluster via the load dispatcher LoadDispatcher, which works in IP-tunnel load-balancing cluster mode. The most common initial service here is user terminal registration. When the load dispatcher LoadDispatcher detects that the heartbeat of any one or several server nodes in the cluster has stopped, i.e. a fault such as a crash has occurred, it can dispatch a normally working server node to take over its service and so achieve disaster tolerance.
Fig. 2 is a structural diagram of embodiment two of a disaster tolerance system based on cluster backup according to the invention. As shown in Fig. 2, the load dispatcher LoadDispatcher 110 in this embodiment comprises:
A heartbeat detection module HBDetecter 111, which keeps a heartbeat connection with each server node in load service unit LoadServer 200 and detects the heartbeat messages of each server node. A heartbeat message may carry the node's Capability and Load Number (CLN), a score obtained by applying a weighted scoring method to indexes of the server node's performance and load, which serves as the reference standard for judging the serving capability of a node. The CLN is generally reported periodically to load dispatcher LoadDispatcher 110 in heartbeat messages. HBDetecter 111 periodically checks the heartbeat messages of the server nodes; when the heartbeat message of a server node times out, that node is considered to have failed, and load dispatcher LoadDispatcher 110 starts the failover operation.
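As a concrete illustration of the heartbeat and CLN mechanism described above, the following Python sketch computes a weighted CLN score and flags nodes whose heartbeat messages have timed out. The patent does not specify the actual indexes, weights, or timeout values, so `CLN_WEIGHTS`, the index names, and the 3-second timeout are all illustrative assumptions; in this sketch a lower CLN means a less loaded node.

```python
import time

# Hypothetical load indexes and weights; the patent states only that the
# CLN is a weighted score over performance and load indexes.
CLN_WEIGHTS = {"cpu_util": 0.4, "mem_util": 0.3, "active_sessions_norm": 0.3}

def compute_cln(indexes, weights=CLN_WEIGHTS):
    """Weighted performance/load number; lower means a less loaded node."""
    return round(sum(indexes[k] * w for k, w in weights.items()) * 100, 1)

class HBDetecter:
    """Remembers the last heartbeat per node and reports timed-out nodes."""
    def __init__(self, timeout=3.0):
        self.timeout = timeout
        self.last_seen = {}  # node id -> (timestamp, cln)

    def on_heartbeat(self, node_id, cln, now=None):
        self.last_seen[node_id] = (time.time() if now is None else now, cln)

    def failed_nodes(self, now=None):
        now = time.time() if now is None else now
        return [n for n, (t, _) in self.last_seen.items()
                if now - t > self.timeout]

detecter = HBDetecter(timeout=3.0)
detecter.on_heartbeat("S-Node5",
                      compute_cln({"cpu_util": 0.2, "mem_util": 0.3,
                                   "active_sessions_norm": 0.1}), now=100.0)
detecter.on_heartbeat("S-Node4", 55.0, now=100.0)
detecter.on_heartbeat("S-Node4", 52.0, now=104.5)  # S-Node5 falls silent
print(detecter.failed_nodes(now=105.0))  # ['S-Node5']
```

In a real dispatcher the timestamps would come from the periodic heartbeat messages themselves; explicit `now` arguments are used here only to make the sketch deterministic.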
An election module ElectionBox 112, connected to heartbeat detection module HBDetecter 111, which periodically receives the heartbeat messages of each server node detected by HBDetecter 111 and periodically maintains the heartbeat message list of the server nodes. ElectionBox 112 is the data maintenance module of load dispatcher LoadDispatcher 110. When heartbeat messages carry CLNs, ElectionBox 112 periodically receives the score tickets CLNTicket of the server nodes, periodically maintains the score list of the server nodes, and elects the master server, the candidate master servers and the slave servers. A CLNTicket field is generally constructed as follows: the master/slave identifier W/M/S of the server node, the hardware address identifier LSID of the node, and the node's Capability and Load Number CLN.
A scheduling strategy module DispatchStrategy 113, connected to election module ElectionBox 112, which decides from the heartbeat message list either the IP address of the primary server node to which the agent forwards requests, or the IP address of a primary server node detected to require failover. DispatchStrategy 113 mainly applies the corresponding algorithmic strategies to the information collected from the nodes to decide load distribution and failover; both use the minimum statistical weighted load method. After processing the data in ElectionBox 112, DispatchStrategy 113 outputs either the IP of the primary server node LoadServer for agent forwarding, or the IP of a crashed primary server node requiring failover, and hands it to the redirect forwarding module Redirector 114 for processing.
A redirect forwarding module Redirector 114, connected to scheduling strategy module DispatchStrategy 113, which forwards the registration service request of user terminal User_n according to the agent-forwarding primary server node IP address obtained from DispatchStrategy 113, or performs redirection to the IP address of the primary server node indicated as requiring failover.
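The election over CLNTicket scores might be sketched as follows. The concrete ranking rule and the number of candidate masters are assumptions — the patent states only that the election module maintains the score list and elects master, candidate masters and slaves; this sketch treats a lower CLN (less weighted load) as preferable, in line with the minimum statistical weighted load method named in the text, and abbreviates the tickets to `(lsid, cln)` pairs.

```python
def elect_roles(tickets, n_candidates=2):
    """tickets: list of (lsid, cln) score tickets. The node with the
    smallest CLN (least weighted load) becomes master 'M', the next
    n_candidates become candidate masters 'C', the rest slaves 'S'."""
    ranked = sorted(tickets, key=lambda t: t[1])
    roles = {}
    for i, (lsid, _) in enumerate(ranked):
        roles[lsid] = "M" if i == 0 else ("C" if i <= n_candidates else "S")
    return roles

tickets = [("S-Node1", 62.0), ("S-Node2", 48.5), ("S-Node3", 71.0),
           ("S-Node4", 55.0), ("S-Node5", 20.0)]
print(elect_roles(tickets))
```

With these illustrative scores, S-Node5 is elected master — matching the embodiment above, where S-Node5 serves as the primary server node.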
Fig. 3 is a structural diagram of embodiment three of a disaster tolerance system based on cluster backup according to the invention. As shown in Fig. 3, load agent unit LoadProxy 100 in this embodiment further comprises a redundancy backup device Baker 120, and load dispatcher LoadDispatcher 110 further comprises an advertisement module Ads 115 connected to Baker 120.
Advertisement module Ads 115 periodically sends the advertisement messages of the load dispatcher to the redundancy backup device; the advertisement messages contain heartbeat messages. Redundancy backup device Baker 120 receives the advertisement messages of the load dispatcher and, from the heartbeat messages they contain, synchronously updates the heartbeat message list inside Baker 120.
Redundancy backup device Baker 120 is an IP-service backup machine that performs dual-machine redundant hot standby for load dispatcher LoadDispatcher 110; it can be realized with the common address redundancy protocol under LINUX. Since LoadDispatcher 110 plays an important role in the whole disaster tolerance system, Baker 120 works as follows: when reception of the advertisement messages of LoadDispatcher 110 times out, LoadDispatcher 110 is considered to have failed; Baker 120 then starts the virtual IP address service and converts its operating state from redundancy backup device to load dispatcher, so that Baker 120 continues the work of the former LoadDispatcher 110 without interruption. A subsequent usability test of the common address redundancy protocol proves that the IP service of LoadDispatcher 110 continues serving user requests uninterruptedly. Baker 120 is a functional replica of LoadDispatcher 110 that periodically receives its advertisement messages in order to keep its heartbeat messages synchronized with those in election module ElectionBox 112.
In this embodiment, by providing redundancy backup device Baker in load agent unit LoadProxy, the load dispatcher LoadDispatcher is given IP redundant hot standby, which ensures its robustness, reduces the single-point-of-failure risk of the load agent unit, and further improves the disaster tolerance capability of the disaster tolerance system.
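A minimal sketch of the advertisement-timeout takeover performed by the redundancy backup device: when the dispatcher's advertisements stop arriving, Baker promotes itself and already holds the synchronized heartbeat list. The state names and the 3-second timeout are assumptions; a real deployment would rely on a common-address-redundancy-protocol implementation rather than hand-rolled timers.

```python
class Baker:
    """Standby for the dispatcher: claims the virtual IP service when the
    dispatcher's periodic advertisement messages stop arriving."""
    def __init__(self, adv_timeout=3.0):
        self.adv_timeout = adv_timeout
        self.last_adv = None
        self.state = "BACKUP"
        self.heartbeat_list = {}

    def on_advertisement(self, heartbeat_list, now):
        self.last_adv = now
        # Keep the heartbeat list in sync with the dispatcher's ElectionBox.
        self.heartbeat_list = dict(heartbeat_list)

    def tick(self, now):
        if self.state == "BACKUP" and self.last_adv is not None \
                and now - self.last_adv > self.adv_timeout:
            self.state = "MASTER"  # start the virtual IP, continue dispatching
        return self.state

baker = Baker(adv_timeout=3.0)
baker.on_advertisement({"S-Node5": 20.0}, now=10.0)
print(baker.tick(now=12.0))  # BACKUP: advertisement still fresh
print(baker.tick(now=14.0))  # MASTER: advertisements overdue, take over
```

Because Baker already carries the synchronized heartbeat list at the moment of takeover, it can continue dispatching without re-collecting node state.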
Fig. 4 is a structural diagram of embodiment four of a disaster tolerance system based on cluster backup according to the invention. As shown in Fig. 4, the server node ServerNode 210 in this embodiment comprises:
An IP configurator module IPConfiger 211: when the server node ServerNode 210 to which this module belongs is the newly determined primary server node, it responds to the redundant IP address configuration command sent by load dispatcher LoadDispatcher 110 and configures the IP address of its server node to the IP address of the former primary server node, so that this node continues the work of the former primary server node.
A score module TicketMarker 212, which periodically collects the performance and/or load indexes of this server node and computes the node's Capability and Load Number CLN.
A heartbeat module HeartBeat 213, which keeps a heartbeat connection with load dispatcher LoadDispatcher 110 and periodically sends the CLN to LoadDispatcher 110 carried in heartbeat messages.
An event notification module Informer 214. Event notifications are divided into uplink notifications (Uplink Notice) and downlink notifications (Downlink Notice). An uplink notification is sent by a slave server to the master server as a request to modify related data; a downlink notification is issued by the master server as a command to the slave servers to synchronously update data. By default the master server may perform direct read and direct write operations on local data; a slave server by default may only perform direct read operations on local data, and may perform a direct write operation only upon receiving a downlink notification from the master server.
When the server node ServerNode 210 to which this module belongs is the primary server node, the node issues downlink notifications via Informer 214, notifying the non-primary server nodes in load service unit LoadServer 200 and sending them the data synchronization update commands; when the node is a non-primary server node, it accepts downlink notifications Downlink Notice via Informer 214, receiving the data synchronization update commands sent by the primary server node in LoadServer 200.
A data read/write operation module DataWriter/Reader, used to read data from and/or write data to the local database: read operations fetch data, while write operations write and update data. Read operations are always direct reads, whereas write operations are divided into direct writes and indirect writes. The module comprises a data read operation module DataReader 2151 and a data write operation module DataWriter 2152. A direct read operation Direct Read reads the local database Database directly, and a direct write operation Direct Write writes to the local database directly. An indirect write operation Indirect Write means that a slave server does not write to its local database directly; instead, via event notification, the primary server node first performs a direct write Direct Write to its own local database, after which the primary server node sends a data synchronization update instruction that makes each non-primary server node start a direct write Direct Write to its own local database. This indirect write is also called the backup operation. In this embodiment, data write operation module DataWriter 2152 is connected to event notification module Informer 214; when writing data to local database Database, it uses Informer 214 to back up the data in the local databases of the non-primary server nodes in load service unit LoadServer 200.
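The direct/indirect write split can be sketched as follows, with a plain dict standing in for each node's local database. The write path shows the backup operation: the primary writes its own database directly, then fans the data synchronization update out to every non-primary node; in this sketch a slave's indirect write would simply be forwarded to `PrimaryNode.write`. Class and method names are illustrative, not the patent's.

```python
class ServerNode:
    def __init__(self, name):
        self.name = name
        self.db = {}     # stands in for the local Database
        self.peers = []  # non-primary nodes, populated on the primary

    def direct_write(self, key, value):
        self.db[key] = value

    def direct_read(self, key):
        return self.db.get(key)

class PrimaryNode(ServerNode):
    def write(self, key, value):
        # Direct write to the primary's own local database first ...
        self.direct_write(key, value)
        # ... then the data synchronization update makes every non-primary
        # node start its own direct write (the backup operation).
        for peer in self.peers:
            peer.direct_write(key, value)

primary = PrimaryNode("S-Node5")
primary.peers = [ServerNode(f"S-Node{i}") for i in range(1, 5)]
primary.write("User_n", {"profile": "..."})
print(all(p.direct_read("User_n") == {"profile": "..."} for p in primary.peers))
```

After the write returns, every node in the cluster holds the same copy of the user data, which is what allows any surviving node to take over during failover.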
Fig. 5 is a flowchart of a disaster recovery method based on cluster backup according to the invention. As shown in Fig. 5, the disaster recovery method in this embodiment comprises:
Step 10: the primary server node performs service interaction with the user terminal. The service interaction comprises: the primary server node reading data from and/or writing data to its local database; and, when the primary server node writes data to its local database, the primary server node also backing up that data in the local databases of the non-primary server nodes in the load service unit.
Step 20: when the load agent unit detects that the heartbeat of the current primary server node has stopped, it selects one of the non-primary server nodes as the new primary server node to perform service interaction with the user terminal. The primary server node writing the data to the local databases of the non-primary server nodes in the load service unit comprises: the primary server node sending a data synchronization update instruction to the non-primary server nodes in the load service unit, whereupon the local databases in those non-primary server nodes back up the data.
The specific flow of the disaster recovery method based on cluster backup of the invention is explained below with reference to the disaster tolerance system based on cluster backup. The disaster recovery method based on cluster backup of the invention can comprise four phases:
Phase I, the election process. An election is conducted after the cluster-based disaster tolerance system starts. Each server node LoadServer periodically computes its Capability and Load Number CLN score and sends a CLNTicket to the election module in load agent unit LoadProxy. The scheduling strategy of the load dispatcher can derive from the election module the two roles ROLE to be assumed: primary server node (Master Node, M-Node for short) and non-primary server node. According to design requirements or custom, the non-primary server nodes can be further divided into candidate primary server nodes (Candidate Node, C-Node for short) and slave server nodes (Slave Node, S-Node for short). Each server node configures its database according to its own master/slave role. Candidate primary server nodes are distinguished only by load agent unit LoadProxy; physically, a candidate primary server node is configured as a slave server node. The primary server node has direct read and write permission on the user data of user terminals; candidate primary server nodes and slave server nodes have direct read permission and indirect write permission on the user data. Load agent unit LoadProxy can decide to select a new primary server node from among the candidate primary server nodes.
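The role/permission split described in this phase can be written down as a small table. The role labels follow the patent's M-Node/C-Node/S-Node abbreviations, while the operation names are illustrative.

```python
ROLE_PERMISSIONS = {
    "M-Node": {"direct_read", "direct_write"},    # primary server node
    "C-Node": {"direct_read", "indirect_write"},  # candidate primary node
    "S-Node": {"direct_read", "indirect_write"},  # slave server node
}

def can(role, operation):
    """Check whether a cluster role is allowed to perform an operation."""
    return operation in ROLE_PERMISSIONS[role]

print(can("M-Node", "direct_write"))  # True
print(can("S-Node", "direct_write"))  # False: slaves write only indirectly
```

Note that C-Node and S-Node carry identical database permissions, which matches the text's point that a candidate primary is physically configured as a slave and distinguished only inside the load agent unit.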
Phase II, the load sharing process. When a service registration request of a user terminal arrives at load agent unit LoadProxy, LoadProxy selects, according to the server performance-load number voting scheme in the election module, the server node LoadServer with the minimum statistical weighted load number, and forwards the user's request to it. This process guarantees an optimal distribution of load in the cluster and avoids disasters caused by overloading certain server nodes due to uneven traffic.
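A sketch of the minimum statistical weighted load decision: the patent does not define the statistic, so this example assumes a short moving window of recent CLN samples per node, averaged before comparison, with lower meaning less loaded.

```python
from collections import deque
from statistics import mean

class DispatchStrategy:
    """Minimum statistical weighted load: keep a short window of CLN
    samples per node and forward to the node with the smallest average."""
    def __init__(self, window=3):
        self.window = window
        self.samples = {}  # node name -> deque of recent CLN values

    def report(self, node, cln):
        self.samples.setdefault(node, deque(maxlen=self.window)).append(cln)

    def pick(self):
        # The node with the minimum mean CLN over its recent samples wins.
        return min(self.samples, key=lambda n: mean(self.samples[n]))

ds = DispatchStrategy(window=3)
for node, cln in [("S-Node1", 60.0), ("S-Node5", 18.0),
                  ("S-Node1", 64.0), ("S-Node5", 22.0)]:
    ds.report(node, cln)
print(ds.pick())  # S-Node5: mean 20.0 vs S-Node1's 62.0
```

Averaging over a window rather than using the latest sample smooths out transient spikes, which is one plausible reading of "statistical" in the method's name.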
Phase III, the cluster backup process. When the service registration request of the user terminal arrives at a server node LoadServer, the node starts the cluster backup process for the user profile data of the user terminal, so that any two server nodes LoadServer in the cluster back each other up. The response to the user's registration request is returned only after the cluster backup process completes; at that point the user terminal's registration is finished, and it can then initiate business service requests to the server node. Every write operation in business services must likewise go through the cluster backup process.
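The ack-after-backup rule of this phase might look like the following, with a dict standing in for each node's local database; function and field names are illustrative. The registration response is produced only once the user's data is present on every node in the cluster.

```python
def register_user(cluster_dbs, user_id, profile):
    """cluster_dbs maps node name -> local database (a dict here).
    The registration response is returned only after the user's data
    has been backed up on every node in the cluster."""
    for db in cluster_dbs.values():  # primary write plus the backup fan-out
        db[user_id] = dict(profile)
    backed_up = all(db.get(user_id) == profile for db in cluster_dbs.values())
    return "registered" if backed_up else "backup failed"

dbs = {f"S-Node{i}": {} for i in range(1, 6)}
print(register_user(dbs, "User_n", {"sip": "user_n@example.org"}))  # registered
```

Delaying the acknowledgement until the backup completes is what guarantees that a failover immediately after registration still finds the user's data on every surviving node.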
Phase IV, the load failover process. When load agent unit LoadProxy detects that the heartbeat of some server node LoadServer1 has stopped, it initiates the load failover process. A server node LoadServer2 with a relatively small load number that still maintains its heartbeat is configured with the IP of server node LoadServer1, and uses the backed-up user identification data to continue providing service to the users. Failover of the primary server node additionally requires a new election.
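A minimal failover sketch, assuming the live-node list maps node names to weighted load numbers where lower means less loaded: pick the live node with the smallest load number and reassign the failed node's IP to it, as the IPConfiger module does on the chosen node. The IP values and data structures are illustrative.

```python
def failover(failed_node, live_clns, ip_table):
    """Choose the live node with the smallest load number to take over
    the failed node's IP; the chosen node's IPConfiger applies the IP."""
    if not live_clns:
        raise RuntimeError("no live server nodes to fail over to")
    takeover = min(live_clns, key=live_clns.get)
    ip_table[takeover] = ip_table[failed_node]
    return takeover

ips = {"S-Node2": "10.0.0.2", "S-Node4": "10.0.0.4", "S-Node5": "10.0.0.5"}
live = {"S-Node2": 48.5, "S-Node4": 55.0}  # S-Node5's heartbeat has stopped
print(failover("S-Node5", live, ips))      # S-Node2
print(ips["S-Node2"])                      # 10.0.0.5
```

Because every node already holds the backed-up user data, the takeover node can serve the failed node's users as soon as it answers on the reassigned IP; if the failed node was the primary, a new election follows.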
The four phases of the cluster-backup-based disaster recovery method proposed by the present invention are described in detail below.
Phase I: election process.
Election is performed as soon as the cluster-backup-based disaster tolerance system starts. The load agent unit LoadProxy has two network interface configurations: the access IP address (AccessIP) serves as the cluster service entry and receives registration service requests from user terminals, while the agent IP address (ProxyIP) is used for cluster communication with the server nodes of the back-end load service unit. The election process assumes that the load agent unit LoadProxy initializes before the server nodes LoadServer. The configuration file of every server node LoadServer contains the IP address of LoadProxy, and a newly started LoadServer must send a heartbeat to LoadProxy to join the cluster. LoadProxy initiates an election in two situations: first, at system initialization; second, when the primary server node's heartbeat stops, i.e. it fails, for example by crashing. The two elections proceed similarly and differ only in their entry conditions: the former is initiated by LoadProxy at system startup, the latter is initiated by LoadProxy when the election unit finds, in the heartbeat information list, that the primary server node's heartbeat has timed out. Taking the initialization election as an example (Fig. 6), it comprises the following steps:
Step A1: LoadProxy initializes and starts its modules. After a common address redundancy protocol is used to configure the LoadDispatcher and the redundancy backup device Baker, the LoadProxy process on Baker is started first; its ElectionBox thread is given higher priority while the three module threads HBDetecter, DispatchStrategy and Redirector run at low priority, and these three threads remain silent for as long as Baker acts as the redundant standby. Baker's LoadProxy process does not start the Ads module thread. The LoadProxy process on LoadDispatcher is then started, launching the five module threads ElectionBox, HBDetecter, DispatchStrategy, Redirector and Ads in turn.
Step A2: each LoadServer initializes and starts its modules. The LoadServer process runs on each ServerNode and starts six thread modules in turn: TicketMarker, HeartBeat, DataWriter, DataReader, IPConfiger and Informer.
Step A3: LoadServer obtains the ProxyIP of LoadProxy. LoadServer reads the local configuration file ls.cfg and obtains from it the ProxyIP of LoadProxy, the MAC addresses of LoadDispatcher and Baker, and related information.
Step A4: LoadServer sends heartbeats to LoadProxy with period T. After obtaining the ProxyIP, LoadServer sends heartbeat messages to LoadProxy periodically. A heartbeat message is a UDP datagram composed mainly of three parts: the message identifier MID, the CLNTicket field flag TFlag, and the CLNTicket field. If TFlag is FALSE, LoadProxy treats the message as an ordinary heartbeat and ignores the trailing field; if TRUE, it parses the CLNTicket field that follows. The CLNTicket field is structured as follows:
the local master/slave identifier W/M/S, the local hardware address identifier LSID, and the local performance-load number CLN.
M means master and S means slave (C means candidate master); W marks the case where a newly joined LoadServer node has not yet been assigned a role, and while the master node is operating normally such a node will be designated S;
LSID can be the MAC address of the LoadServer;
CLN is a weighted load number, a weighted composite metric of the machine's performance and load indices. The performance-load number and the minimum weighted-load algorithm are defined as follows:
Suppose a cluster contains a group of servers S = {S0, S1, ..., Sn-1}. Let U(Si) denote the CPU utilization of server Si, M(Si) its current memory utilization, D(Si) its current hard disk utilization, and C(Si) its current number of connections. The performance-load number of Si is then:
CLN(Si) = C(Si) * [0.45*U(Si) + 0.45*M(Si) + 0.1*D(Si)]
The larger the performance-load number, the worse the server's service capability.
A new connection request is dispatched to server Sm if and only if Sm satisfies:
CLN(Sm) = min{CLN(Si)}, 0 ≤ i ≤ n-1.
If CLN is a statistical value over a period of time, the algorithm is called the minimum statistical weighted-load algorithm.
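The weighted-load formula and the minimum-CLN rule above can be sketched as follows. The weights 0.45/0.45/0.1 come from the formula in the text; the server figures are made up for illustration.

```python
# Illustrative sketch of the patent's weighted load number (CLN) and the
# minimum-weighted-load selection rule; the server data below is invented.

def cln(connections, cpu_util, mem_util, disk_util):
    """Performance-load number: larger means a more heavily loaded server."""
    return connections * (0.45 * cpu_util + 0.45 * mem_util + 0.1 * disk_util)

def pick_server(stats):
    """Return the server id satisfying CLN(Sm) = min CLN(Si)."""
    return min(stats, key=lambda sid: cln(*stats[sid]))

servers = {
    "S0": (120, 0.80, 0.70, 0.30),  # (connections, CPU, memory, disk)
    "S1": (40, 0.30, 0.40, 0.20),
    "S2": (90, 0.50, 0.60, 0.10),
}
print(pick_server(servers))  # the least-loaded node
```

If the CLN values fed in were window averages rather than instantaneous samples, this same selection would be the minimum statistical weighted-load algorithm described in the text.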
Step A5: LoadProxy creates the heartbeat information list.
Upon detecting heartbeat messages, LoadProxy's heartbeat detection module extracts each server's IP address as the key and builds the heartbeat information list. Each entry holds the IP address (denoted IP), server state (State), MAC address (MAC), role identifier (ROLE), role confirmation flag (Confirm), the CLN statistics (StatCLN), the CLN arithmetic mean (AverageCLN), and the historical distributed-traffic load number (HistoryLN); the table is then initialized. A possible example of the heartbeat information list is as follows:
Table 1: heartbeat information list
[Table 1 is reproduced as an image in the original publication; its columns are IP, State, MAC, ROLE, Confirm, StatCLN, AverageCLN and HistoryLN.]
Note: in the State field, ALIVE means the heartbeat is present and DEAD means it has stopped. In the ROLE field, C denotes a candidate primary server node, S a slave server node, M the primary server node, and W a server node whose role is not yet assigned. In the Confirm field, T means role confirmation is complete and F means it is not.
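One entry of the heartbeat information list from Step A5 might be modeled as below. The field names follow the text; the default values and sample figures are assumptions for illustration.

```python
# Minimal sketch of a heartbeat-list entry (Step A5). Fields mirror the text:
# IP, State, MAC, ROLE, Confirm, StatCLN, AverageCLN, HistoryLN.
from dataclasses import dataclass, field
from typing import List

@dataclass
class HeartbeatEntry:
    ip: str
    mac: str
    state: str = "ALIVE"      # ALIVE / DEAD
    role: str = "W"           # W (unassigned), M, C, S
    confirm: bool = False     # role confirmation flag
    stat_cln: List[float] = field(default_factory=list)  # collected CLN samples
    average_cln: float = 0.0  # arithmetic mean of StatCLN
    history_ln: int = 0       # historical count of dispatched requests

entry = HeartbeatEntry(ip="192.168.0.11", mac="00:1A:2B:3C:4D:5E")
entry.stat_cln.extend([84.6, 45.4, 13.4])          # CLN samples from tickets
entry.average_cln = sum(entry.stat_cln) / len(entry.stat_cln)
print(round(entry.average_cln, 1))
```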
Step A6: TicketMarker computes the CLN using the weighted-load algorithm of step A4.
Step A7: LoadServer sends the CLNTicket in its heartbeats. After LoadServer has collected the contents of the CLNTicket field, it attaches the ticket to a heartbeat with period 10T (T being the heartbeat period; the coefficient may be any suitable empirical value, with 10 used here as an example): every 10T the TFlag field of one heartbeat message is set to TRUE, while ordinary heartbeat messages carry FALSE;
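The heartbeat datagram of Steps A4/A7 could be serialized as shown below. The patent names only the fields (MID, TFlag, and the CLNTicket's role, LSID and CLN); the concrete binary layout here is an assumption.

```python
# Hedged sketch of a heartbeat UDP payload: message id MID, TFlag, then a
# CLNTicket (role char, 6-byte MAC-derived LSID, float CLN). Layout assumed.
import struct

HB_FMT = "!HBc6sf"  # MID (u16), TFlag (u8), role, LSID (MAC), CLN (float32)

def pack_heartbeat(mid, tflag, role=b"W", lsid=b"\x00" * 6, cln=0.0):
    return struct.pack(HB_FMT, mid, 1 if tflag else 0, role, lsid, cln)

def unpack_heartbeat(data):
    mid, tflag, role, lsid, cln = struct.unpack(HB_FMT, data)
    return mid, bool(tflag), role, lsid, cln

pkt = pack_heartbeat(7, True, b"S", bytes.fromhex("001a2b3c4d5e"), 13.4)
mid, tflag, role, lsid, cln = unpack_heartbeat(pkt)
print(mid, tflag, role.decode(), round(cln, 1))
```

With TFlag FALSE, a receiver following Step A4 would stop after the flag and ignore the CLNTicket portion.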
Step A8: LoadProxy receives a CLNTicket and puts it into the ballot box. After receiving a CLNTicket, LoadProxy's ballot box inserts the CLN into the CLN statistics of the heartbeat information list.
Step A9: ElectionBox aggregates the performance-load numbers in the heartbeat information list. Once the number of CLN samples within the statistics window reaches a threshold, their arithmetic mean AverageCLN is computed as the measure of that LoadServer's performance-load index; until an AverageCLN has been produced, a default value is used.
Step A10: DispatchStrategy holds the master/slave election based on the statistical load numbers in the ballot box. DispatchStrategy sorts the statistical load numbers according to the minimum statistical weighted-load algorithm of step A4: the minimum becomes the master node, the next smallest 3-5 (depending on the total node count) become candidate master nodes, and the rest become slave nodes; the results are marked in the heartbeat information list.
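The Step A10 role assignment can be sketched as a sort over the average CLN values. The candidate count of 3 used here is one point in the 3-5 range the text allows; the CLN figures are invented.

```python
# Sketch of Step A10: sort nodes by AverageCLN ascending; the smallest becomes
# master (M), the next few candidates (C), the remainder slaves (S).

def elect_roles(avg_cln, n_candidates=3):
    """avg_cln: {node_id: AverageCLN}. Returns {node_id: role}."""
    ranked = sorted(avg_cln, key=avg_cln.get)  # least loaded first
    roles = {}
    for i, node in enumerate(ranked):
        if i == 0:
            roles[node] = "M"            # minimum load: master
        elif i <= n_candidates:
            roles[node] = "C"            # next smallest: candidate masters
        else:
            roles[node] = "S"            # the rest: slaves
    return roles

roles = elect_roles({"S0": 84.6, "S1": 13.4, "S2": 45.4, "S3": 60.0, "S4": 70.1})
print(roles)
```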
Step A11: ElectionBox sends role directives to the LoadServers according to the heartbeat information list.
Step A12: each LoadServer configures itself according to its role and confirms the role in subsequent CLNTickets. The master node is configured as a MySQL master and the slave nodes as MySQL slaves, so that the master/slave data replication of the later cluster backup can be realized with MySQL's built-in replication. After finishing the configuration, the node changes its role identifier W in the CLNTicket to the corresponding value M or S.
Step A13: LoadProxy checks the master/slave identifier field in subsequent CLNTickets. If the identifier is correct, LoadProxy updates the role confirmation flag in the heartbeat information list to TRUE; if it is incorrect, LoadProxy retransmits the role directive until the identifier is correct.
Step A14: LoadProxy checks all role confirmation flags in the heartbeat information list. When all flags are TRUE, role assignment has succeeded, the election process ends, and the load distribution process can start.
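The confirmation handshake of Steps A11-A14 amounts to a retransmit-until-echoed loop, sketched below. The `send_directive` and `latest_ticket_role` callables stand in for the real messaging; `max_rounds` is an assumed safety bound.

```python
# Sketch of Steps A11-A14: LoadProxy re-sends a node's role directive until the
# node echoes the correct role in its CLNTicket, then marks Confirm TRUE; the
# election finishes once every node is confirmed.

def confirm_roles(assigned, latest_ticket_role, send_directive, max_rounds=5):
    confirm = {node: False for node in assigned}
    for _ in range(max_rounds):
        for node, role in assigned.items():
            if confirm[node]:
                continue
            if latest_ticket_role(node) == role:
                confirm[node] = True          # Step A13: identifier correct
            else:
                send_directive(node, role)    # Step A13: retransmit directive
        if all(confirm.values()):
            return True                       # Step A14: election finished
    return False

acked = {}
assigned = {"S0": "M", "S1": "C", "S2": "S"}
ok = confirm_roles(
    assigned,
    latest_ticket_role=lambda n: acked.get(n, "W"),      # nodes start as W
    send_directive=lambda n, r: acked.__setitem__(n, r), # node applies role
)
print(ok)
```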
Phase II: load distribution process.
When a user's registration service request reaches LoadProxy's AccessIP, LoadProxy performs load distribution over the heartbeat server resources using the minimum statistical weighted-load algorithm. As shown in Fig. 7, the load distribution process comprises the following steps:
Step B1: the user terminal sends a registration service request to the LoadProxy holding the AccessIP.
Step B2: DispatchStrategy decides the serving LoadServer's IP according to the dispatch algorithm.
The heartbeat server list holds a dynamically maintained AverageCLN value for each LoadServer, and a LoadServer is chosen to serve according to the minimum-CLN distribution principle. If the chosen LoadServer is the master node, a load threshold is applied to ensure the master keeps enough resources for the overhead of cluster backup: if the master exceeds this threshold, the request is assigned to the next-least-loaded LoadServer instead.
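The Step B2 decision, including the master's reserve threshold, can be sketched as below. The threshold value 50.0 and the CLN figures are arbitrary illustrations.

```python
# Sketch of the Step B2 dispatch rule: pick the minimum-CLN node, but skip the
# master when it is above a reserve threshold so it keeps capacity for cluster
# backup overhead.

def dispatch(avg_cln, master, master_threshold=50.0):
    ranked = sorted(avg_cln, key=avg_cln.get)   # least loaded first
    best = ranked[0]
    if best == master and avg_cln[master] > master_threshold and len(ranked) > 1:
        return ranked[1]                        # next-least-loaded node
    return best

nodes = {"S1": 55.0, "S2": 61.0, "S3": 70.0}
print(dispatch(nodes, master="S1"))  # master over threshold -> S2
```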
Step B3: update the HistoryLN of the corresponding ElectionBox entry.
HistoryLN is LoadProxy's coarse statistic of each back-end LoadServer's load, showing the user traffic historically placed on that node. To some extent it reflects each node's load under the worst case of fully loaded traffic and is an important reference for load distribution, but the statistical load numbers in ElectionBox better reflect real-time node load. The two parameters can be considered together when distributing load.
Step B4: Redirector forwards the user's registration service request to the chosen LoadServer.
In the load distribution process, Redirector merely forwards the user's request to the chosen node; in effect it redirects the user's request to the actual serving node.
Step B5: this LoadServer starts cluster backup; for details see the cluster backup process below.
Step B6: after cluster backup completes, the service registration ACK is returned directly to the user.
After finishing cluster backup, the LoadServer sends the ACK directly to the user's IP without going through LoadProxy, indicating that registration is complete and service requests may follow.
Phase III: cluster backup process.
The cluster backup process occurs at registration or when user information is updated. During service, if the user only reads stored data, then regardless of whether the serving node is the master or a slave, the DataReader module is invoked on that node to read the local data directly. As shown in Fig. 8, the cluster backup process comprises the following steps:
Step C1: write operation request.
A write operation request means that an administrator or user needs to modify or update information data, such as a user's account or permission information.
Step C2: DataWriter responds.
Every write operation is handled by the DataWriter module and every read operation by the DataReader module. DataWriter is responsible for guaranteeing that cluster backup is carried out.
Step C3: the corresponding user information is set to WriteMode.
While user information is in WriteMode, no read operation on that user is allowed. As soon as the current write completes, the user information is switched back to read mode.
Step C4: determine this LoadServer's role.
If the role is W, go to step C5. This means the node has not yet received its master/slave assignment, so the service must temporarily be deferred. This can happen when a node joined the cluster after system initialization: it has just joined and its heartbeats have been accepted by LoadProxy, but its role has not yet been assigned.
If the role is S, go to step C7;
If the role is M, go to step C10.
Step C5: cache the user request.
Because this server has not been assigned a role, and so as not to disturb the cluster backup process, the server caches the user's request locally pending further processing.
Step C6: wait for LoadProxy's role directive and process the request after it arrives.
Since the LoadServer keeps sending heartbeats and CLNTickets to LoadProxy, LoadProxy will notice that this server has no assigned role and send a role directive; when the directive reaches the LoadServer, it takes the user request out of the cache and performs the service response. Go to step C2.
Step C7: start the indirect write process.
The indirect write process does not modify local data directly; instead, local data is updated only after the master node has applied the modification, so as to guarantee the consistency of cluster backup data.
Step C8: Informer sends an uplink notice to the master node.
An uplink notice is a slave node's notification to the master node that certain user information needs updating; in the uplink notice, Informer indicates the user and the data elements to be updated.
Step C9: after being notified, the master node sets the user information to WriteMode and then performs a direct write.
On receiving the uplink notice, the master node switches the local user data to WriteMode and updates it directly. The master node's data always carries the newest timestamps in the cluster and is the source of every backup copy in the cluster.
Step C10: start the direct write operation.
Step C11: after the master node's write completes, Informer sends a downlink notice to each slave node in the cluster, and the user information is set back to read mode.
After updating its local data, the master node sends a downlink notice to every node in the cluster through the Informer module; the downlink notice likewise contains the user and the updated data elements. Since the local data is now up to date, the user's information can be set back to read mode.
Step C12: after being notified, each slave node performs a direct write.
After obtaining the update information from the downlink notice, each node in the cluster modifies its local data directly, becoming a backup of the master's latest data. If a slave node's copy of the user data is in read mode when the direct write is to be performed, the user data must first be set to WriteMode.
Step C13: after the direct write completes, the user information is set back to read mode.
After a slave node modifies its local data, the user information must likewise be restored to read mode. Once all backups are back in read mode, every slave node holds an up-to-date backup of the master, and the cluster backup is formed.
Step C14: the serving node can now issue the user's service response.
Only after the user's write operation has completed does the serving node respond to the user, indicating that the write succeeded; once the user's data or the administrator's configuration data has been written and cluster-backed-up, the business service can use the updated data.
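The write path of Steps C1-C14 can be compressed into the sketch below: slaves route writes through the master via an uplink notice, and the master fans out downlink notices so every node converges on the same data. The in-memory dict stands in for each node's local MySQL database; all data structures are assumptions.

```python
# Compressed sketch of the Phase III write path. A write entering at a slave is
# an "indirect write" (uplink notice to the master); the master applies it and
# fans out downlink notices so all nodes hold the latest copy.

class Node:
    def __init__(self, name):
        self.name = name
        self.role = "S"
        self.db = {}            # local database: user -> value

def cluster_write(nodes, entry_node, user, value):
    master = next(n for n in nodes if n.role == "M")
    if nodes[entry_node].role == "S":
        pass                    # indirect write: uplink notice (Step C8)
    master.db[user] = value     # master writes directly (Steps C9/C10)
    for n in nodes:             # downlink notices: slaves sync (Steps C11-C13)
        if n is not master:
            n.db[user] = value

nodes = [Node("S0"), Node("S1"), Node("S2")]
nodes[0].role = "M"
cluster_write(nodes, entry_node=2, user="alice", value="pw-hash-1")
print([n.db["alice"] for n in nodes])
```

The real mechanism delegates the fan-out to MySQL master/slave replication per Step A12; this sketch only shows the resulting convergence.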
Phase IV: load failover process.
The load failover process differs according to the role of the failed server node. As shown in Fig. 9, the load failover process comprises the following steps:
Step D1: HBDetecter detects that some server's heartbeat has timed out.
When HBDetecter finds that no heartbeat message from a server has arrived within the agreed interval, it starts a crash-judgment timer; when the timer expires, the server is judged to have stopped its heartbeat, and LoadProxy concludes that load failover must be performed for the crashed machine.
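The Step D1 check can be sketched as below: a node is declared DEAD only after no heartbeat has arrived for a full crash-judgment window. The window length of 3 heartbeat periods is an assumed value; the text does not fix the timer duration.

```python
# Sketch of the Step D1 heartbeat-timeout check over last-seen timestamps.
import time

HEARTBEAT_PERIOD = 1.0
DEAD_AFTER = 3 * HEARTBEAT_PERIOD   # assumed crash-judgment window

def check_states(last_seen, now):
    """last_seen: {node: timestamp of last heartbeat}. Returns {node: state}."""
    return {node: ("DEAD" if now - ts > DEAD_AFTER else "ALIVE")
            for node, ts in last_seen.items()}

now = time.monotonic()
last_seen = {"S0": now - 0.5, "S1": now - 10.0}   # S1 has been silent too long
print(check_states(last_seen, now))
```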
Step D2: the crashed server is flagged in ElectionBox, and its IP is recorded as the TakeoverIP.
Step D3: determine the crashed LoadServer's role. If it is W, go to step D4; if S, go to step D5; if M, go to step D8;
Step D4: clear the TakeoverIP; no load failover is performed and the load transfer process ends.
Step D5: the DispatchStrategy module decides a new LoadServer's IP according to the minimum statistical weighted-load algorithm.
Since ElectionBox dynamically stores the statistical weighted load numbers of the back-end LoadServers, DispatchStrategy can decide on a new LoadServer to take over the crashed server's service.
Step D6: Redirector sends the TakeoverIP to the new LoadServer.
The new LoadServer must use the TakeoverIP to direct the crashed server's users to itself so that their service can continue. This guarantees that after a disaster every user still has an available server serving it.
Step D7: the new LoadServer configures the TakeoverIP using IPConfiger.
IPConfiger configures the TakeoverIP on the local network card; users' service requests are transparently transferred to the new LoadServer, which serves them from then on. The failover of a slave node then ends.
Step D8: LoadProxy instructs all nodes, by setting a flag parameter to 1, to force the servers into read mode.
Because the master node has crashed, data backup across the whole server cluster cannot proceed normally; during this period user write operations cannot be serviced, though user read operations are unaffected. Full cluster service is restored automatically by the subsequent steps.
Step D9: DispatchStrategy decides a new master node from among the candidate master servers according to the minimum statistical load algorithm, together with a load-takeover node.
To ensure a new master node is produced quickly, DispatchStrategy chooses the new master only from the candidate master servers, which improves response speed. Because the crashed master also needs its load taken over, a new load-takeover node is produced in this process as well. The new master node and the load-takeover node should, as far as possible, not be the same server.
Step D10: send the role directive to the new master node, and send the TakeoverIP to the load-takeover node.
Once LoadProxy has decided on the new master node, it sends a role directive to that node, which promptly reconfigures itself for the new role; LoadProxy also sends the TakeoverIP to the load-takeover node so that the former master's user service can continue.
Step D11: after the new master node responds, Redirector indicates the new master's IP to the slave nodes.
As soon as the new master finishes configuring itself as master, it responds to LoadProxy; Redirector then indicates the new master's IP to each slave node, and the slave nodes modify their configuration accordingly.
Step D12: clear the flag parameter and restore cluster service.
Since the master and slave nodes have all completed their configuration, the flag parameter forcing read mode can now be cleared and the whole cluster resumes normal service. The load failover process ends.
In summary, the cluster-backup-based disaster tolerance system and method of the present invention have the following beneficial effects:
(1) Hardware cost is greatly reduced. Each node in the cluster is both a serving host and a backup machine for the other servers, so no new backup machines need to be added to achieve backup; the cost saving is obtained at the price of increased cluster backup complexity. For the same service throughput, the cluster-backup disaster recovery method reaches the goal with fewer servers.
(2) Single-machine utilization and the overall utilization of the peer multi-server setup are improved. The cluster-backup disaster recovery method in effect turns the traditional standby machine into a server that is a peer of the serving host: the capacity the former standby spent mostly idle is fully used for service and cluster backup, raising its single-machine utilization. With the same number of servers, the cluster delivers higher service throughput, and the whole cluster attains a higher overall utilization rate.
(3) Any two servers in the cluster back each other up, improving backup redundancy. When one server fails, every other server in the cluster holds a backup of its data, so backup redundancy is greatly increased. Even when a single node, or several nodes, fail, many backup copies of the data remain available in the cluster.
(4) Failover can be coordinated across the server conditions of the whole cluster, greatly improving the disaster tolerance capability.
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents, without such modifications or replacements causing the essence of the corresponding technical solution to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A disaster tolerance system based on cluster backup, characterized in that it comprises: a load agent unit and a load service unit;
the load service unit comprises at least two server nodes, one of which is a primary server node that performs service interaction with user terminals, the remaining server nodes being non-primary server nodes; each server node comprises a local database, and the server nodes are interconnected; said service interaction comprises: the primary server node reading data from and/or writing data to the primary server node's local database; and, when the primary server node writes data to its local database, further comprises: the primary server node backing up said data in the local databases of the non-primary server nodes in the load service unit;
the load agent unit comprises a load dispatcher, said load dispatcher being connected to each server node in the load service unit; when it detects that the primary server node's heartbeat has stopped, it performs a failover operation on that primary server node and selects, from among the non-primary server nodes, a server node to act as the primary server node and perform service interaction with user terminals;
said load dispatcher comprises:
a heartbeat detection module, which maintains heartbeat connections with each server node in the load service unit and is used to detect the heartbeat messages of each server node;
an election module, connected to said heartbeat detection module, which periodically receives the heartbeat messages of each server node detected by the heartbeat detection module and periodically maintains the heartbeat information list of the server nodes;
a scheduling strategy module, connected to said election module, which decides, according to the heartbeat information list, the IP address of the primary server node to which the agent forwards requests, or detects the IP address of a failed primary server node requiring failover;
a redirection forwarder module, connected to said scheduling strategy module, which forwards user terminals' registration service requests to the primary server node IP address obtained from said scheduling strategy module, or redirects to indicate the IP address of the failed primary server node requiring failover;
said load agent unit further comprises a redundancy backup device; said load dispatcher further comprises an advertisement module, connected to said redundancy backup device, which periodically sends the load dispatcher's advertisement message to the redundancy backup device, said advertisement message comprising heartbeat information; said redundancy backup device starts the virtual IP address service when reception of the load dispatcher's advertisement message times out, converting the operating state of the redundancy backup device into that of the load dispatcher.
2. The disaster tolerance system according to claim 1, characterized in that said server node comprises:
an IP configurator module which, when the server node to which it belongs is a newly determined primary server node, responds to the redundant IP address configuration command sent by the load dispatcher and configures the IP address of the server node to which it belongs as the IP address of the former primary server node;
a marking module, which periodically collects the performance and/or load indices of the server node and calculates the server node's performance-load number, the performance-load number being a score obtained, according to a weighted-load algorithm, from indices of aspects such as the server node's performance and load, usable as a reference standard for judging the server node's serving capability;
a heartbeat module, which maintains a heartbeat connection with the load dispatcher and periodically sends said performance-load number to the load dispatcher carried in heartbeat messages;
an event notification module; when the server node to which it belongs is the primary server node, the primary server node uses the event notification module to notify the non-primary server nodes in the load service unit of data synchronization update commands; when the server node to which it belongs is a non-primary server node, the non-primary server node uses the event notification module to receive the data synchronization update commands sent by the primary server node in the load service unit;
a data read/write operation module, for reading data from and/or writing data to the local database; it is connected to the event notification module and, when data is written to the local database, backs up said data, via the event notification module, in the local databases of the non-primary server nodes of the load service unit.
3. The disaster tolerance system according to claim 2, characterized in that said primary server node backing up said data in the local databases of the non-primary server nodes in the load service unit comprises:
the primary server node sending a data synchronization update instruction to the non-primary server nodes in the load service unit, and the local databases in the non-primary server nodes backing up the data.
4. A disaster recovery method based on cluster backup, characterized in that it comprises:
a primary server node performing service interaction with user terminals; said service interaction comprises: the primary server node reading data from and/or writing data to the primary server node's local database; and, when the primary server node writes data to its local database, further comprises: the primary server node backing up said data in the local databases of the non-primary server nodes in the load service unit;
when a load agent unit detects that the current primary server node's heartbeat has stopped, the load agent unit selecting, from among the non-primary server nodes, a server node to act as the primary server node and perform service interaction with user terminals;
said primary server node backing up said data in the local databases of the non-primary server nodes in the load service unit comprises: the primary server node sending a data synchronization update instruction to the non-primary server nodes in the load service unit, and the local databases in the non-primary server nodes of the load service unit backing up said data;
before the primary server node performs service interaction with a user terminal, the method further comprises:
the user terminal first sending a registration service request to the load agent unit, said registration service request containing the user terminal's user information;
the load agent unit, according to the collected heartbeat information, scheduling a server node as the primary server node to serve this user terminal, and forwarding said registration service request to this primary server node;
this primary server node storing the user terminal's user information and, after backing up said registration service request on the non-primary server nodes, feeding back a registration service response to the user terminal, said registration service response containing the IP address of the primary server node.
5. The disaster recovery method according to claim 4, characterized in that it further comprises:
the load agent unit periodically receiving the heartbeat messages of each server node detected by the heartbeat detection module and periodically maintaining the heartbeat information list of the server nodes.
6. The disaster recovery method according to claim 5, characterized in that said selecting, from among the non-primary server nodes, a server node to act as the primary server node comprises:
the load agent unit redetermining the primary server node according to the heartbeat information list in the load agent unit, sending a redundant network address configuration command to the redetermined primary server node, and configuring the IP address of this redetermined primary server node as the IP address of the former primary server node.
7. The disaster recovery method according to claim 5 or 6, characterized in that it further comprises: the load agent unit backing up the heartbeat information list and periodically updating said heartbeat information list.
8. The disaster recovery method according to claim 4, characterized in that said heartbeat information contains the performance-load numbers of the server nodes, a performance-load number being a score obtained, according to a weighted-load algorithm, from indices of aspects such as a server node's performance and load, usable as a reference standard for judging the server node's serving capability.
CN200810048216XA 2008-06-27 2008-06-27 Disaster allowable system and method based on cluster backup Expired - Fee Related CN101309167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810048216XA CN101309167B (en) 2008-06-27 2008-06-27 Disaster allowable system and method based on cluster backup


Publications (2)

Publication Number Publication Date
CN101309167A CN101309167A (en) 2008-11-19
CN101309167B true CN101309167B (en) 2011-04-20

Family

ID=40125399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810048216XA Expired - Fee Related CN101309167B (en) 2008-06-27 2008-06-27 Disaster allowable system and method based on cluster backup

Country Status (1)

Country Link
CN (1) CN101309167B (en)

Families Citing this family (85)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5480291B2 (en) * 2008-12-30 2014-04-23 トムソン ライセンシング Synchronizing display system settings
CN101605301B (en) * 2009-07-08 2012-09-26 中兴通讯股份有限公司 Cluster system for multi-node transaction processing and a request message distributing method
CN101729290A (en) * 2009-11-04 2010-06-09 中兴通讯股份有限公司 Method and device for realizing business system protection
CN102082680B (en) * 2009-11-27 2013-09-11 中国移动通信集团北京有限公司 Method for controlling network element connection by acquisition machines, acquisition machines and system
CN102130759A (en) * 2010-01-13 2011-07-20 中国移动通信集团公司 Data collection method, data collection device cluster and data collection devices
CN102281257B (en) * 2010-06-12 2016-08-03 陈银彬 Entertainment information platform
CN102299904B (en) * 2010-06-23 2014-03-19 阿里巴巴集团控股有限公司 System and method for realizing service data backup
CN101924650B (en) * 2010-08-04 2012-03-28 浙江省电力公司 Method for implementing services and intelligent server autonomy of failure information system
CN102148850B (en) * 2010-08-09 2014-08-06 华为软件技术有限公司 Cluster system and service processing method thereof
CN102143011B (en) * 2010-08-23 2013-11-06 华为技术有限公司 Device and method for realizing network protection
CN102831038B (en) * 2011-06-17 2019-03-01 中兴通讯股份有限公司 The disaster recovery method and ENUM-DNS of ENUM-DNS
CN102523234B (en) * 2011-12-29 2015-12-02 山东中创软件工程股份有限公司 A kind of application server cluster implementation method and system
CN102523127A (en) * 2011-12-30 2012-06-27 网宿科技股份有限公司 Master server and slave server switching method and system utilizing same
CN102663017A (en) * 2012-03-21 2012-09-12 互动在线(北京)科技有限公司 Implementation system and implementation method for enhancing availability of MySQL database
CN103209091B (en) * 2013-01-18 2016-06-29 中兴通讯股份有限公司 The heat backup method of group system and system
CN103944746B (en) * 2013-01-23 2018-10-09 新华三技术有限公司 A kind of method and device of two-node cluster hot backup
CN104239164A (en) * 2013-06-19 2014-12-24 国家电网公司 Cloud storage based disaster recovery backup switching system
CN103384211B (en) * 2013-06-28 2017-02-08 百度在线网络技术(北京)有限公司 Data manipulation method with fault tolerance and distributed type data storage system
CN104468163B (en) * 2013-09-18 2018-11-09 腾讯科技(北京)有限公司 The method, apparatus and disaster tolerance network of disaster tolerance network organizing
CN104618127B (en) * 2013-11-01 2019-01-29 深圳市腾讯计算机系统有限公司 Active and standby memory node switching method and system
TWI501092B (en) * 2013-11-19 2015-09-21 Synology Inc Method for controlling operations of server cluster
CN104734896B (en) * 2013-12-18 2019-04-23 青岛海尔空调器有限总公司 The acquisition methods and system of service sub-system operating condition
CN104954157B (en) * 2014-03-27 2018-12-04 中国移动通信集团湖北有限公司 A kind of fault self-recovery method and system
CN103945016B (en) * 2014-04-11 2018-07-06 江苏中科羿链通信技术有限公司 A kind of method and system of Dynamic Host Configuration Protocol server master-slave redundancy
CN105763524A (en) * 2014-12-19 2016-07-13 华为技术有限公司 Registration method in IP multimedia subsystem, device and system
CN104579765B (en) * 2014-12-27 2019-02-26 北京奇虎科技有限公司 A kind of disaster recovery method and device of group system
CN104539462B (en) * 2015-01-09 2017-12-19 北京京东尚科信息技术有限公司 It is a kind of to switch to method and device of the calamity for application example
TWI584654B (en) * 2015-03-27 2017-05-21 林勝雄 Method and system for optimization service
CN104965770B (en) * 2015-06-15 2018-02-02 北京邮电大学 A kind of central server disaster-tolerant backup method
CN104980307A (en) * 2015-06-29 2015-10-14 小米科技有限责任公司 Processing method of data access requests, processing device of data access requests and database server
CN106341366A (en) * 2015-07-06 2017-01-18 中兴通讯股份有限公司 Method and device for backuping multiple key servers and key server
CN105095486A (en) * 2015-08-17 2015-11-25 浪潮(北京)电子信息产业有限公司 Cluster database disaster recovery method and device
CN105592139B (en) * 2015-10-28 2019-03-15 新华三技术有限公司 A kind of the HA implementation method and device of distributed file system management platform
CN106649414B (en) * 2015-11-04 2020-01-31 阿里巴巴集团控股有限公司 Method and equipment for pre-detecting data anomalies of data warehouses
CN105354113B (en) * 2015-11-27 2019-01-25 上海爱数信息技术股份有限公司 A kind of system and method for server, management server
CN105429799B (en) * 2015-11-30 2019-06-11 浙江宇视科技有限公司 Server backup method and device
CN105634832B (en) * 2016-03-16 2019-07-16 浙江宇视科技有限公司 A kind of backup method and device of server
CN107273241B (en) * 2016-04-06 2021-02-26 北京航天发射技术研究所 Redundancy backup and automatic recovery method for important parameters
CN105763386A (en) * 2016-05-13 2016-07-13 中国工商银行股份有限公司 Service processing system and method
CN106020963A (en) * 2016-06-07 2016-10-12 中国建设银行股份有限公司 Cross-system internal service calling method and device
CN106301895A (en) * 2016-08-03 2017-01-04 浪潮(北京)电子信息产业有限公司 A kind of disaster recovery method obtaining cluster monitoring data and device
CN106385334B (en) * 2016-09-20 2019-06-18 携程旅游信息技术(上海)有限公司 Call center system and its abnormality detection and self-recovery method
CN106789197A (en) * 2016-12-07 2017-05-31 高新兴科技集团股份有限公司 A kind of cluster election method and system
CN108241551A (en) * 2016-12-23 2018-07-03 航天星图科技(北京)有限公司 A kind of redundant database system
CN108243209A (en) * 2016-12-23 2018-07-03 深圳市优朋普乐传媒发展有限公司 A kind of method of data synchronization and device
CN107018010A (en) * 2017-03-07 2017-08-04 杭州承联通信技术有限公司 A kind of PDT clusters core network system and its disaster tolerance switching method
CN110447209A (en) * 2017-03-16 2019-11-12 英特尔公司 System, method and apparatus for user plane traffic forwarding
CN106921746A (en) * 2017-03-22 2017-07-04 重庆允升科技有限公司 A kind of data synchronous system and method for data synchronization
CN106953761B (en) * 2017-03-29 2020-03-10 恒生电子股份有限公司 Server disaster recovery system and message processing method based on disaster recovery system
CN106982259A (en) * 2017-04-19 2017-07-25 聚好看科技股份有限公司 The failure solution of server cluster
CN107239505B (en) * 2017-05-10 2020-09-15 广州杰赛科技股份有限公司 Cluster mirror synchronization method and system
CN107329853A (en) * 2017-06-13 2017-11-07 上海微烛信息技术有限公司 Backup method, standby system and the electronic equipment of data-base cluster
CN109428740B (en) * 2017-08-21 2020-09-08 华为技术有限公司 Method and device for recovering equipment failure
CN107819872A (en) * 2017-11-22 2018-03-20 聚好看科技股份有限公司 Ask the method and device of network data
CN108023772B (en) * 2017-12-07 2021-02-26 海能达通信股份有限公司 Abnormal node repairing method, device and related equipment
CN110417842B (en) 2018-04-28 2022-04-12 北京京东尚科信息技术有限公司 Fault processing method and device for gateway server
CN109039747B (en) * 2018-08-09 2021-06-11 北京搜狐新媒体信息技术有限公司 Dual-computer hot standby control method and device for DPDK service
CN109254876A (en) * 2018-09-11 2019-01-22 郑州云海信息技术有限公司 The management method and device of database in cloud computing system
CN109561151B (en) * 2018-12-12 2021-09-17 北京达佳互联信息技术有限公司 Data storage method, device, server and storage medium
CN109669410B (en) * 2018-12-17 2020-06-09 积成电子股份有限公司 Communication master supervisor election method based on multi-source information
US10887382B2 (en) 2018-12-18 2021-01-05 Storage Engine, Inc. Methods, apparatuses and systems for cloud-based disaster recovery
US10958720B2 (en) 2018-12-18 2021-03-23 Storage Engine, Inc. Methods, apparatuses and systems for cloud based disaster recovery
US11176002B2 (en) 2018-12-18 2021-11-16 Storage Engine, Inc. Methods, apparatuses and systems for cloud-based disaster recovery
US11252019B2 (en) 2018-12-18 2022-02-15 Storage Engine, Inc. Methods, apparatuses and systems for cloud-based disaster recovery
US11489730B2 (en) 2018-12-18 2022-11-01 Storage Engine, Inc. Methods, apparatuses and systems for configuring a network environment for a server
US11178221B2 (en) 2018-12-18 2021-11-16 Storage Engine, Inc. Methods, apparatuses and systems for cloud-based disaster recovery
US10983886B2 (en) 2018-12-18 2021-04-20 Storage Engine, Inc. Methods, apparatuses and systems for cloud-based disaster recovery
CN109451063B (en) * 2018-12-24 2021-08-17 北京东土科技股份有限公司 Server hot standby method and system
CN109756573B (en) * 2019-01-15 2022-02-08 苏州链读文化传媒有限公司 File system based on block chain
CN109560903B (en) * 2019-02-14 2024-01-19 湖南智领通信科技有限公司 Vehicle-mounted command communication system for complete disaster recovery
CN110120889B (en) * 2019-05-06 2022-05-20 网易(杭州)网络有限公司 Data processing method, device and computer storage medium
CN110505269A (en) * 2019-06-21 2019-11-26 广州虎牙科技有限公司 Transaction processing system, method for processing business and server
CN110445664B (en) * 2019-09-03 2022-08-09 湖南中车时代通信信号有限公司 Multi-center server dual-network main selection system of automatic train monitoring system
CN112866314B (en) * 2019-11-27 2023-04-07 上海哔哩哔哩科技有限公司 Method for switching slave nodes in distributed master-slave system, master node device and storage medium
CN111651291B (en) * 2020-04-23 2023-02-03 国网河南省电力公司电力科学研究院 Method, system and computer storage medium for preventing split brain of shared storage cluster
CN111565233A (en) * 2020-05-28 2020-08-21 吉林亿联银行股份有限公司 Data transmission method and device
CN111641716B (en) * 2020-06-01 2023-05-02 第四范式(北京)技术有限公司 Self-healing method of parameter server, parameter server and parameter service system
CN111988387B (en) * 2020-08-11 2023-05-30 北京达佳互联信息技术有限公司 Interface request processing method, device, equipment and storage medium
CN112579362A (en) * 2020-12-29 2021-03-30 广州鼎甲计算机科技有限公司 Backup method, system, device and storage medium for Shentong database cluster
CN114285832A (en) * 2021-05-11 2022-04-05 鸬鹚科技(深圳)有限公司 Disaster recovery system, method, computer device and medium for multiple data centers
CN114124928B (en) * 2021-09-27 2023-07-14 苏州浪潮智能科技有限公司 Method, device and system for quickly synchronizing files between devices
CN115277379B (en) * 2022-07-08 2023-08-01 北京城市网邻信息技术有限公司 Distributed lock disaster recovery processing method and device, electronic equipment and storage medium
CN115658368B (en) * 2022-11-11 2023-03-28 北京奥星贝斯科技有限公司 Fault processing method and device, storage medium and electronic equipment
CN115914418B (en) * 2023-03-09 2023-06-30 北京全路通信信号研究设计院集团有限公司 Railway interface gateway equipment
CN116436768B (en) * 2023-06-14 2023-08-15 北京理想信息科技有限公司 Automatic backup method, system, equipment and medium based on cross heartbeat monitoring

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1719831A (en) * 2005-07-15 2006-01-11 清华大学 High-available distributed boundary gateway protocol system based on cluster router structure
CN101060391A (en) * 2007-05-16 2007-10-24 华为技术有限公司 Master and spare server switching method and system and master server and spare server
CN101179432A (en) * 2007-12-13 2008-05-14 浪潮电子信息产业股份有限公司 Method of implementing high availability of system in multi-machine surroundings

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106162637A (en) * 2015-04-10 2016-11-23 成都鼎桥通信技术有限公司 The implementation method of LTE broadband cluster multinode mirror image networking and device
CN106162637B (en) * 2015-04-10 2019-10-25 成都鼎桥通信技术有限公司 The implementation method and device of the broadband LTE cluster multinode mirror image networking

Also Published As

Publication number Publication date
CN101309167A (en) 2008-11-19

Similar Documents

Publication Publication Date Title
CN101309167B (en) Disaster allowable system and method based on cluster backup
CN102346460B (en) Transaction-based service control system and method
US9208029B2 (en) Computer system to switch logical group of virtual computers
CN113014634B (en) Cluster election processing method, device, equipment and storage medium
WO2018103318A1 (en) Distributed transaction handling method and system
US7937437B2 (en) Method and apparatus for processing a request using proxy servers
US20100333094A1 (en) Job-processing nodes synchronizing job databases
CN100570607C (en) The method and system that is used for the data aggregate of multiprocessing environment
US7225356B2 (en) System for managing operational failure occurrences in processing devices
CN103414712B (en) A kind of distributed virtual desktop management system and method
US20090113034A1 (en) Method And System For Clustering
CN106850260A (en) A kind of dispositions method and device of virtual resources management platform
CN104081354A (en) Managing partitions in a scalable environment
CN102761528A (en) System and method for data management
CN102938705A (en) Method for managing and switching high availability multi-machine backup routing table
CN105393519A (en) Failover system and method
CN106874143A (en) Server backup method and backup system thereof
CN110377664B (en) Data synchronization method, device, server and storage medium
CN115080436A (en) Test index determination method and device, electronic equipment and storage medium
US8201017B2 (en) Method for queuing message and program recording medium thereof
KR19990043986A (en) Business take over system
CN113608836A (en) Cluster-based virtual machine high availability method and system
CN112243030A (en) Data synchronization method, device, equipment and medium of distributed storage system
CN114039978B (en) Decentralized PoW computing power cluster deployment method
CN115168042A (en) Management method and device of monitoring cluster, computer storage medium and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110420

Termination date: 20110627