CN104144064A - Server fault detection and switchover method

Info

Publication number: CN104144064A
Application number: CN201310166822.2A
Authority: CN
Inventors: 张焰
Original assignee: Individual
Current assignee: Individual
Priority date: 2013-05-09
Filing date: 2013-05-09
Publication date: 2014-11-12

Abstract

The invention provides a server fault detection and switchover method. A network server pool is first built; the server pool comprises at least two pooling devices, which combine multiple servers into one virtual server pool and monitor and collect the operating state of the servers in real time. After startup, each server registers by sending a registration message to the pooling devices. The pooling devices respond to the registration message and then perform periodic health checks on the server. If a server fails during data exchange, the client receives an error message from the transport layer and thereby learns that the server has failed. The client then enters failover mode: it performs pool name resolution at the pooling device again, obtains the IP address of the next normally operating server, and connects directly to the new server, which completes the failover process.

Description

Server fault detection and switchover method

Technical field

The invention relates to the field of network server computers, and in particular to a server fault detection and switchover method.

Background art

In existing, traditional server fault-tolerant systems, a heartbeat mechanism is usually used to detect server faults. It is implemented as follows:

A dedicated network cable, the so-called "heartbeat line", connects the monitored server to the monitoring server. The heartbeat line carries only fault-detection messages and never application data, so each server has two network interface cards: one for the heartbeat line and one for the application data line. At fixed intervals, the monitoring server sends an ICMP message (a ping) over the heartbeat line to check the health of the monitored server.

If the monitoring server receives a reply from the monitored server after each ping, the monitored server is considered normal; otherwise it is considered to have failed, and fault-tolerant processing can then be decided on.

Detecting a fault of the monitored server quickly and correctly is the most important link in the whole fault-tolerant system; a misjudgment causes heavy losses for the user.

The traditional fault detection mechanism (the heartbeat mechanism) has significant limitations in identifying server faults:

(1) It cannot detect faults on the network interface used for application data. Because the heartbeat uses a dedicated network card and cable that are independent of the application data cable, a fault in the data network (network card, cable, and so on) goes undetected.

(2) It cannot detect whether the application service itself is abnormal. The heartbeat mechanism checks the health of the peer server by sending ICMP messages (ping), and in practice a ping can only show whether the operating system of the monitored server is running normally.

(3) If the heartbeat line itself fails, the fault-tolerant system cannot work normally.

(4) It cannot predict hardware performance bottlenecks in advance.

In summary, a fault-tolerant system based on heartbeat detection cannot detect application faults, cannot detect faults in the network that carries application data, and cannot perform fault detection at all when the heartbeat line itself fails.

In the typical client-server model, a server is located by its DNS domain name. Before accessing the server, the client application must resolve the domain name through the DNS service to obtain the server's IP address. Once the client has connected to the server, information can be exchanged. If the server fails, the client application has two options: (1) break off communication; (2) select another server and continue.

In this model, the client application must detect a server interruption in one of the following ways:

(1) the server does not respond (timeout);

(2) the server responds with an error message;

(3) a transport-layer error message is received.

So that the client application can select another server after detecting a failure, the application must explicitly provide a server list that names the first server, the second server, the third server, and so on. When the first server is interrupted, the client tries to connect to the second server, then the third, and so on.
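The static-list approach described above can be sketched roughly as follows. This is a minimal illustration only: the function name `connect_with_static_list`, the injectable `connect` callable, and the 3-second timeout are assumptions made for the example and do not come from the original text.

```python
import socket
from typing import Callable, Sequence, Tuple

Address = Tuple[str, int]


def _tcp_connect(addr: Address) -> socket.socket:
    # Assumed default transport: a plain TCP connection with a 3-second timeout.
    return socket.create_connection(addr, timeout=3.0)


def connect_with_static_list(
    servers: Sequence[Address],
    connect: Callable[[Address], object] = _tcp_connect,
) -> Tuple[Address, object]:
    """Try each hard-coded server in order and return the first that connects."""
    last_error = None
    for addr in servers:
        try:
            return addr, connect(addr)
        except OSError as exc:  # covers timeouts and transport-layer errors
            last_error = exc    # remember the failure and try the next entry
    raise ConnectionError(f"no server in the static list is reachable: {last_error}")
```

The list is fixed in the application, and the client learns whether an entry is usable only by trying it, which is the blindness the limitations below describe.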

In other words, this failover process depends on intervention by the user program and has considerable limitations:

1. The server list is static and must be specified by the user;

2. The choice of alternative server is largely blind: there is no guarantee that the selected server is running normally and is usable, and the choice cannot adapt to server load;

3. Takeover is implemented by the application and therefore lacks transparency;

4. Interruptions are recognized only passively;

5. Guaranteeing reliability requires a large amount of application development work.

It is therefore necessary to improve on the prior art.

Summary of the invention

The object of the present invention is to provide a server fault detection and switchover method that achieves a fully transparent failover mode under a server pool architecture. No user intervention is needed: after a server failure, the client only has to perform one pool name resolution to obtain a new, normally operating server and reconnect, so failover is completed quickly and conveniently.

To achieve the foregoing object, the server fault detection and switchover method of the present invention comprises the following steps:

(1) A network server pool is built. The server pool comprises at least two pooling devices, each of which is responsible for combining multiple servers into one virtual server pool and for monitoring and collecting the running status of the servers in real time;

(2) After a server starts, it registers by sending a registration message to the pooling device;

(3) On receiving the registration message, the pooling device immediately replies with a registration response message;

(4) The pooling device sends a keep-alive message to the server at a fixed time interval, performing a periodic health check on the server;

(5) On receiving a keep-alive message, the server immediately replies to the pooling device with a keep-alive acknowledgment message;

(6) If the pooling device does not receive a keep-alive acknowledgment within a set time after sending a keep-alive message, it quickly sends several keep-alive messages in succession; if it still receives no keep-alive acknowledgment, it determines that the server has failed;

(7) The client accesses the server pool by pool name and first performs pool name resolution at the pooling device. On receiving the client's pool name resolution request, the pooling device searches the server list it maintains, selects a valid, normally running server according to a predetermined policy, and sends that server's IP address to the requesting client;

(8) After pool name resolution completes, the client accesses the server directly at the resolved IP address and exchanges data with it;

(9) If the server fails during data exchange, the client receives an error message from the transport layer and thereby learns that the server has failed. The client then enters failover mode: it performs pool name resolution at the pooling device again, obtains the IP address of the next normally running server, and connects directly to the new server, completing the failover process.
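The registration and keep-alive logic of steps (2) to (6) might look like the sketch below. The text does not specify message formats, timeouts, or how many rapid retries "several" means, so the class name `PoolingDevice`, the `"REGISTER_ACK"` reply, the `send_keepalive` callable, and all numeric defaults are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PoolingDevice:
    # send_keepalive(ip, timeout) returns True if an acknowledgment arrived in time.
    send_keepalive: Callable[[str, float], bool]
    ack_timeout: float = 1.0    # assumed "set time" for the first acknowledgment
    retry_count: int = 3        # assumed meaning of "several" rapid retries
    retry_timeout: float = 0.2  # assumed shorter wait for each rapid retry
    healthy: Dict[str, bool] = field(default_factory=dict)

    def register(self, ip: str) -> str:
        """Steps (2)-(3): record the server and reply with a registration response."""
        self.healthy[ip] = True
        return "REGISTER_ACK"

    def check(self, ip: str) -> bool:
        """Steps (4)-(6): one periodic health check, with rapid retries on silence."""
        if self.send_keepalive(ip, self.ack_timeout):
            self.healthy[ip] = True
            return True
        for _ in range(self.retry_count):
            if self.send_keepalive(ip, self.retry_timeout):
                self.healthy[ip] = True
                return True
        self.healthy[ip] = False  # no acknowledgment at all: server judged failed
        return False
```

In a real deployment `check` would be called for every registered server on a fixed interval; the scheduler is omitted here to keep the sketch short.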

According to one embodiment of the present invention, the server pool system consists of the following three parts:

Server pool: a group of servers that have the same function and are managed as a unit. Each server pool is identified by a unique pool name;

Pooling device: the management device of the server pool. It is responsible for combining multiple servers into one virtual server pool and for monitoring and collecting the running status of each server in real time. It also provides pool name resolution so that users can access the servers conveniently;

Client: a client computer that accesses the server pool.

According to one embodiment of the present invention, the server pool uses the pool name resolution mechanism of the pooling device: to access the server pool, the client first performs pool name resolution at the pooling device. The pooling device looks up the client's resolution request in its own server list, in which each pool name usually corresponds to several servers. According to a selection policy determined in advance, the pooling device selects the best server IP address for this user and returns the result to the client as a resolution response message.
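As an illustration of this resolution mechanism, the sketch below maps a pool name to a list of server entries and applies a pluggable selection policy. The text does not say which policy is used, so the least-loaded policy, the `load` field, and the names `ServerEntry` and `resolve_pool_name` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ServerEntry:
    ip: str
    healthy: bool = True
    load: float = 0.0  # hypothetical load metric collected by the pooling device


def least_loaded(candidates: List[ServerEntry]) -> ServerEntry:
    # One possible "predetermined policy": pick the server with the lowest load.
    return min(candidates, key=lambda s: s.load)


def resolve_pool_name(
    pools: Dict[str, List[ServerEntry]],
    pool_name: str,
    policy: Callable[[List[ServerEntry]], ServerEntry] = least_loaded,
) -> Optional[str]:
    """Return the IP of one healthy server in the pool, or None if there is none."""
    candidates = [s for s in pools.get(pool_name, []) if s.healthy]
    if not candidates:
        return None
    return policy(candidates).ip
```

Because only healthy entries are candidates, a server marked as failed by the health checks is never handed out, whichever policy is plugged in.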

Beneficial effects of the present invention: the fault identification method of the present invention can identify the following failure modes: server downtime, server program deadlock, server program crash, exhaustion of system resources, and network faults (network communication faults caused by network card faults, switch faults, and so on). Fault identification is more accurate and faster. Compared with the traditional fault identification mechanism (the heartbeat mechanism), it provides deeper fault detection. The advantage of the failover method of the present invention is that it is simple and transparent: the client does not need to maintain a server list and only needs to perform pool name resolution at the pooling device whenever required. Servers can be added, removed, or changed without notifying the client or changing it in any way, so the process is completely transparent to the user.

Brief description of the drawings

Fig. 1 is a structural schematic diagram of the server pool system of the present invention;

Fig. 2 is a flow chart of the server fault detection and switchover method of the present invention.

Detailed description of the embodiments

Reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the present invention. The appearances of "in one embodiment" in various places in this specification do not all refer to the same embodiment, nor are they separate or alternative embodiments that are mutually exclusive of other embodiments.

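Refer to Fig. 1, which is a structural schematic diagram of the server pool system of the present invention. As shown in Fig. 1, the server pool system consists of the following three parts:

Server pool: a group of servers that have the same function and are managed as a unit. Each server pool is identified by a unique pool name.

Pooling device: the management device of the server pool. It combines multiple servers into one virtual server pool, monitors and collects the running status of each server in real time, and provides pool name resolution.

Client: a client computer that accesses the server pool.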

Refer to Fig. 2, which is a flow chart of the server fault detection and switchover method of the present invention.

As shown in Fig. 2, the method comprises the following steps:

Step S1: A network server pool is built. The server pool comprises at least two pooling devices, each of which is responsible for combining multiple servers into one virtual server pool and for monitoring and collecting the running status of the servers in real time;

Step S2: After a server starts, it registers by sending a registration message to the pooling device;

Step S3: On receiving the registration message, the pooling device immediately replies with a registration response message;

Step S4: The pooling device sends a keep-alive message to the server at a fixed time interval, performing a periodic health check on the server;

Step S5: On receiving a keep-alive message, the server immediately replies to the pooling device with a keep-alive acknowledgment message;

Step S6: If the pooling device does not receive a keep-alive acknowledgment within a set time after sending a keep-alive message, it quickly sends several keep-alive messages in succession; if it still receives no keep-alive acknowledgment, it determines that the server has failed;

Step S7: The client accesses the server pool by pool name and first performs pool name resolution at the pooling device. On receiving the client's pool name resolution request, the pooling device searches the server list it maintains, selects a valid, normally running server according to a predetermined policy, and sends that server's IP address to the requesting client;

Step S8: After pool name resolution completes, the client accesses the server directly at the resolved IP address and exchanges data with it;

Step S9: If the server fails during data exchange, the client receives an error message from the transport layer and thereby learns that the server has failed. The client then enters failover mode: it performs pool name resolution at the pooling device again, obtains the IP address of the next normally running server, and connects directly to the new server, completing the failover process.
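The client side of steps S7 to S9 can be sketched as a resolve, exchange, and re-resolve loop. The `resolve` and `exchange` callables stand in for the pool name resolution request and the data exchange, whose wire formats the text does not define; the function name and the `max_failovers` bound are likewise assumptions added so that the loop terminates.

```python
from typing import Callable, Optional


def request_with_failover(
    pool_name: str,
    resolve: Callable[[str], Optional[str]],
    exchange: Callable[[str], str],
    max_failovers: int = 3,
) -> str:
    """Steps S7-S9: resolve, exchange data, and re-resolve on transport errors."""
    for _ in range(max_failovers + 1):
        ip = resolve(pool_name)  # pool name resolution at the pooling device
        if ip is None:
            break                # the pool has no normally running server left
        try:
            return exchange(ip)  # direct data exchange with the selected server
        except OSError:
            continue             # transport-layer error: enter failover mode
    raise ConnectionError(f"no usable server in pool {pool_name!r}")
```

The client holds no server list of its own; every retry asks the pooling device again, so servers can be added or removed without changing client code.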

When a server registers, it provides information about its own service, such as the service protocol, IP address, port number, and whether service detection is to be performed. If the server requires service detection, the pooling device uses this information to perform periodic service availability checks on the server: it sends a probe message to the specified IP address and port according to the service protocol, and then judges from the server's response whether the service is working normally.
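A service availability check of this kind could be approximated as below. The text does not define the probe message for any protocol, so this sketch substitutes a plain TCP connection attempt as the probe; the `Registration` fields mirror the registration information listed above, while the names `Registration` and `probe_service` and the 2-second timeout are assumptions.

```python
import socket
from dataclasses import dataclass
from typing import Optional


@dataclass
class Registration:
    protocol: str        # service protocol; only "tcp" is handled in this sketch
    ip: str
    port: int
    service_check: bool  # whether the server asked for service detection


def probe_service(reg: Registration, timeout: float = 2.0) -> Optional[bool]:
    """Return True/False for a probed service, or None if no probe was requested."""
    if not reg.service_check:
        return None
    if reg.protocol != "tcp":
        raise ValueError(f"no probe implemented for protocol {reg.protocol!r}")
    try:
        # Stand-in probe: the service counts as available if the port accepts
        # a TCP connection within the timeout.
        with socket.create_connection((reg.ip, reg.port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful connection only shows that the port is accepting connections; a probe that speaks the actual service protocol and checks the reply would catch more application-level faults.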

The fault identification method of the present invention can identify the following failure modes: server downtime, server program deadlock, server program crash, exhaustion of system resources, and network faults (network communication faults caused by network card faults, switch faults, and so on). Fault identification is more accurate and faster. Compared with the traditional fault identification mechanism (the heartbeat mechanism), it provides deeper fault detection. The advantage of the failover method of the present invention is that it is simple and transparent: the client does not need to maintain a server list and only needs to perform pool name resolution at the pooling device whenever required. Servers can be added, removed, or changed without notifying the client or changing it in any way, so the process is completely transparent to the user.

The above description fully discloses specific embodiments of the present invention. It should be noted that any change made to the specific embodiments by a person skilled in the art does not depart from the scope of the claims of the present invention. Accordingly, the scope of the claims of the present invention is not limited to the foregoing embodiments.

Claims

1. A server fault detection and switchover method, comprising the following steps:

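2. The server fault detection and switchover method according to claim 1, characterized in that the server pool system consists of the following three parts:

Server pool: a group of servers that have the same function and are managed as a unit, each server pool being identified by a unique pool name;

Pooling device: the management device of the server pool, which combines multiple servers into one virtual server pool, monitors and collects the running status of each server in real time, and provides pool name resolution;

Client: a client computer that accesses the server pool.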

3. The server fault detection and switchover method according to claim 1, characterized in that the server pool uses the pool name resolution mechanism of the pooling device: to access the server pool, the client first performs pool name resolution at the pooling device; the pooling device looks up the client's resolution request in its own server list, in which each pool name usually corresponds to several servers; and, according to a selection policy determined in advance, the pooling device selects the best server IP address for this user and returns the result to the client as a resolution response message.