CN104144064A - Server fault detection and switchover method - Google Patents

Publication number
CN104144064A
Authority
CN
China
Prior art keywords
server
pond
client
pools
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310166822.2A
Other languages
Chinese (zh)
Inventor
张焰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-05-09
Filing date
2013-05-09
Publication date
2014-11-12
Application filed by Individual
Priority to CN201310166822.2A
Publication of CN104144064A
Current legal status: Pending

Landscapes

  • Computer And Data Communications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a server fault detection and switchover method. A network server pool is first established; the pool includes at least two pooling devices, which combine multiple servers into one virtual server pool and monitor and collect the running status of the servers in real time. After starting, each server sends a registration message to the pooling devices to register. The pooling devices respond to the registration message and perform periodic health checks on the servers. If a server fails during data exchange, the client receives an error message from the transport layer and thereby learns that the server has failed. The client then enters failover mode: it performs pool name resolution at the pooling devices again, obtains the IP address of the next normally running server, and connects directly to the new server, completing the failover process.

Description

A server fault detection and switchover method
Technical field
The invention relates to the field of network servers, and in particular to a server fault detection and switchover method.
Background
Existing conventional server fault-tolerant systems usually detect server faults with a heartbeat mechanism, which works as follows:
A dedicated network cable, the so-called "heartbeat line", connects the monitored server and the monitoring server. The heartbeat line carries only detection messages for fault detection and is not used to transmit application data, so each server has two network cards: one for the heartbeat line and one for the application data line. At a fixed interval, the monitoring server sends an ICMP message (a ping) to the monitored server over the heartbeat line to check its health.
If the monitoring server receives a response from the monitored server after each ping, the monitored server is considered normal; otherwise it is considered to have failed, and further fault-tolerance handling can be decided.
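The heartbeat check described above can be sketched as a small loop. This is only an illustration: `ping` is a hypothetical callable standing in for one ICMP echo over the heartbeat line, and `rounds` is an assumed parameter; a real monitor would also wait for the fixed interval between rounds.

```python
from typing import Callable, Optional


def monitor_heartbeat(ping: Callable[[], bool], rounds: int) -> Optional[int]:
    """Ping the monitored server once per round.

    Returns the index of the first round in which no reply arrived (the
    monitored server is then treated as failed), or None if every ping
    was answered.
    """
    for round_index in range(rounds):
        # `ping` is a placeholder for an ICMP echo request over the
        # dedicated heartbeat line; it returns True when a reply arrives.
        if not ping():
            return round_index
    return None
```

Note that a check of this kind only reports whether the host answers on the heartbeat line; it says nothing about the application service or the data network.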
Detecting a fault of the monitored server quickly and correctly is the most important link in the whole fault-tolerant system; a misjudgment can cause heavy losses for the user.
For identifying server faults, the conventional fault detection mechanism (the heartbeat mechanism) has significant limitations:
(1) It cannot detect a fault of the network interface used by application data. Because the heartbeat uses a dedicated network card and cable that are independent of the application data cable, a fault in the data network (network card, cable, and so on) goes undetected.
(2) It cannot detect whether the application service itself has become abnormal. The heartbeat mechanism checks the peer server's health by sending ICMP messages (ping), and in practice a ping can only show whether the monitored server's operating system is running normally.
(3) If the heartbeat line itself fails, the fault-tolerant system cannot work normally.
(4) It cannot determine in advance whether a hardware performance bottleneck exists.
In short, a fault-tolerant system based on heartbeat detection cannot detect application faults, cannot detect faults in the network used by application data, and cannot perform fault detection at all when the heartbeat line itself fails.
In the typical client-server model, a server is located by its DNS domain name. Before accessing the server, the client application must resolve the domain name through the DNS service to obtain the server's IP address. Once the client and server are connected, they can exchange information. If the server fails, the client application has two possible choices: (1) break off communication; (2) select another server and continue.
In this model, the client application must detect that the server has been interrupted in one of the following ways:
(1) the server does not respond (timeout);
(2) the server responds with an error message;
(3) a transport-layer error message is received.
For the client application to be able to select another server after detecting a server fault, a server list must be explicitly provided in the application, naming the first server, the second server, the third, and so on. When the first server is interrupted, the client tries to connect to the second server, then the third, and so on.
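The static-list approach of the prior art might look like the following sketch. The names `connect_with_static_list`, `AllServersFailed`, and the `connect` callable are hypothetical; failures are modeled with Python's built-in `ConnectionError` and `TimeoutError`.

```python
from typing import Callable, Sequence, Tuple


class AllServersFailed(Exception):
    """Raised when no server in the static list accepts a connection."""


def connect_with_static_list(
    servers: Sequence[str], connect: Callable[[str], object]
) -> Tuple[str, object]:
    """Try each hard-coded server in order and return the first that works."""
    for address in servers:
        try:
            # `connect` is a placeholder for the application's own
            # connection routine; it raises on timeout or transport error.
            return address, connect(address)
        except (ConnectionError, TimeoutError):
            continue  # blindly move on to the next entry in the list
    raise AllServersFailed(f"tried {len(servers)} servers")
```

The list is fixed in the application, and the client has no way to know in advance whether the next entry is healthy or lightly loaded.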
In other words, this failover process relies on intervention by the user program and has considerable limitations:
1. The server list is static and must be specified by the user;
2. The choice of alternative server is largely blind: there is no guarantee that the selected server is normal and valid, and the choice cannot adapt to server load;
3. The takeover is implemented in the application and is not transparent;
4. Interruptions are recognized only passively;
5. Meeting reliability requirements takes a large amount of application development work.
It is therefore necessary to improve on the prior art.
Summary of the invention
The object of the present invention is to provide a server fault detection and switchover method that achieves fully transparent failover under a server pool architecture. No user intervention is needed: after a server fault occurs, the client only has to perform one pool name resolution to obtain a new, normally running server and reconnect, so failover completes quickly and conveniently.
To achieve this object, a server fault detection and switchover method according to the present invention comprises the following steps:
(1) Establish a network server pool. The server pool comprises at least two pooling devices; a pooling device is responsible for combining multiple servers into one virtual server pool and for monitoring and collecting the running status of the servers in real time;
(2) After a server starts, it sends a registration message to the pooling device to register;
(3) On receiving the registration message, the pooling device immediately replies with a registration response message;
(4) At a fixed interval, the pooling device sends a keep-alive message to the server, performing a periodic health check on it;
(5) On receiving a keep-alive message, the server immediately replies to the pooling device with a keep-alive acknowledgement message;
(6) If the pooling device does not receive a keep-alive acknowledgement within a set time after sending a keep-alive message, it quickly sends several more keep-alive messages in succession; if it still receives no acknowledgement, it determines that the server has failed;
(7) The client accesses the server pool by pool name. It first performs pool name resolution at the pooling device. On receiving the client's pool name resolution request, the pooling device searches the server list it maintains, selects a valid, normally running server according to a predetermined policy, and sends that server's IP address to the requesting client;
(8) Having completed pool name resolution, the client accesses the server directly at the resolved IP address and exchanges data with it;
(9) If the server fails during data exchange, the client receives an error message from the transport layer and thereby learns that the server has failed. The client then enters failover mode: it performs pool name resolution at the pooling device again, obtains the IP address of the next normally running server, and connects directly to the new server, completing the failover process.
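Steps (2) to (6) can be illustrated with a minimal sketch of the pooling device's side. The patent does not specify message formats or how many retries "several" means, so the class name `KeepAliveMonitor`, the `send_keepalive` hook, the `"REGISTER_ACK"` reply, and the default `burst` of 3 are all assumptions made for illustration.

```python
from typing import Callable, Dict


class KeepAliveMonitor:
    """Sketch of the pooling device's registration and keep-alive logic."""

    def __init__(self, send_keepalive: Callable[[str], bool], burst: int = 3) -> None:
        # send_keepalive(address) is a hypothetical transport hook: it sends
        # one keep-alive message and returns True if the acknowledgement
        # arrives within the set time.
        self._send_keepalive = send_keepalive
        self._burst = burst  # assumed number of rapid follow-up messages
        self.healthy: Dict[str, bool] = {}

    def register(self, address: str) -> str:
        """Steps (2)-(3): record the server and reply immediately."""
        self.healthy[address] = True
        return "REGISTER_ACK"

    def check(self, address: str) -> bool:
        """Steps (4)-(6): one periodic check, with a retry burst on silence."""
        attempts = 1 + self._burst  # the regular message plus the burst
        # any() stops at the first acknowledged message.
        acknowledged = any(self._send_keepalive(address) for _ in range(attempts))
        self.healthy[address] = acknowledged
        return acknowledged
```

In a real deployment `check` would be called for every registered server at the fixed interval; the scheduling is left out here to keep the sketch short.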
According to one embodiment of the present invention, the server pool system consists of the following three parts:
Server pool: a group of servers that have the same function and are managed together. Each server pool is identified by a unique pool name;
Pooling device: the management device of the server pool. It combines multiple servers into one virtual server pool and monitors and collects the running status of each server in real time. It also provides the pool name resolution function, so that users can access the servers conveniently;
Client: a client computer that accesses the server pool.
According to one embodiment of the present invention, the server pool uses a pool name resolution mechanism provided by the pooling device. To access the server pool, the client first performs pool name resolution at the pooling device. The pooling device looks up the client's resolution request in its own server list. Each pool name usually corresponds to several servers; the pooling device selects the best server IP address for the user according to a predetermined selection policy and returns the result to the client in a resolution response message.
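Pool name resolution can be sketched as a lookup followed by a selection policy. The patent leaves the policy open ("predetermined"), so the least-load rule below is only one assumed choice, and `ServerEntry`, its `load` field, and `resolve_pool_name` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ServerEntry:
    ip: str
    load: float           # assumed load metric, lower is better
    healthy: bool = True  # maintained by the pooling device's health checks


def resolve_pool_name(pools: Dict[str, List[ServerEntry]], pool_name: str) -> Optional[str]:
    """Return the IP of the chosen server, or None if none is available."""
    # Only servers currently considered healthy are candidates.
    candidates = [s for s in pools.get(pool_name, []) if s.healthy]
    if not candidates:
        return None
    # Assumed policy: pick the least-loaded healthy server.
    return min(candidates, key=lambda s: s.load).ip
```

Because the selection happens at the pooling device, servers can be added to or removed from `pools` without any change on the client.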
Beneficial effects of the present invention: the fault identification method of the present invention can identify the following fault types: server crash (downtime), server program deadlock, server program crash, system resource exhaustion, and network faults (network service faults caused by network card failure, switch failure, and so on). Fault identification is more accurate and faster. Compared with the conventional fault identification mechanism (the heartbeat mechanism), it provides deeper fault detection. The failover method of the present invention is simple and transparent: the client does not need to maintain a server list and only needs to perform pool name resolution at the pooling device when required. Servers can be added, removed, or changed without notifying the client or changing it in any way. Failover is thus completely transparent to the user.
Brief description of the drawings
Fig. 1 is a structural diagram of the server pool system of the present invention;
Fig. 2 is a flow chart of the server fault detection and switchover method of the present invention.
Detailed description
"One embodiment" or "an embodiment" as used herein refers to a particular feature, structure, or characteristic that can be included in at least one implementation of the present invention. The phrase "in one embodiment" appearing in different places in this specification does not necessarily refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments.
Refer to Fig. 1, a structural diagram of the server pool system of the present invention. As shown in Fig. 1, the server pool system consists of the following three parts:
Server pool: a group of servers that have the same function and are managed together. Each server pool is identified by a unique pool name;
Pooling device: the management device of the server pool. It combines multiple servers into one virtual server pool and monitors and collects the running status of each server in real time. It also provides the pool name resolution function, so that users can access the servers conveniently;
Client: a client computer that accesses the server pool.
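The relationship between the parts can be sketched as a small data model. All class and field names here are hypothetical; the only property taken from the text is that each pool is identified by a unique pool name and is managed by the pooling device.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Server:
    ip: str
    port: int


@dataclass
class ServerPool:
    name: str  # unique pool name used as the identifier
    servers: List[Server] = field(default_factory=list)


@dataclass
class PoolingDevice:
    """Management device that groups servers into named virtual pools."""

    pools: Dict[str, ServerPool] = field(default_factory=dict)

    def add_pool(self, pool: ServerPool) -> None:
        # Enforce that pool names are unique on this pooling device.
        if pool.name in self.pools:
            raise ValueError(f"pool name already in use: {pool.name}")
        self.pools[pool.name] = pool
```

A client needs to know only the pool name and the address of a pooling device, never the individual servers.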
Refer to Fig. 2, a flow chart of the server fault detection and switchover method of the present invention.
As shown in Fig. 2, the method comprises the following steps:
Step S1: Establish a network server pool. The server pool comprises at least two pooling devices; a pooling device is responsible for combining multiple servers into one virtual server pool and for monitoring and collecting the running status of the servers in real time;
Step S2: After a server starts, it sends a registration message to the pooling device to register;
Step S3: On receiving the registration message, the pooling device immediately replies with a registration response message;
Step S4: At a fixed interval, the pooling device sends a keep-alive message to the server, performing a periodic health check on it;
Step S5: On receiving a keep-alive message, the server immediately replies to the pooling device with a keep-alive acknowledgement message;
Step S6: If the pooling device does not receive a keep-alive acknowledgement within a set time after sending a keep-alive message, it quickly sends several more keep-alive messages in succession; if it still receives no acknowledgement, it determines that the server has failed;
Step S7: The client accesses the server pool by pool name. It first performs pool name resolution at the pooling device. On receiving the client's pool name resolution request, the pooling device searches the server list it maintains, selects a valid, normally running server according to a predetermined policy, and sends that server's IP address to the requesting client;
Step S8: Having completed pool name resolution, the client accesses the server directly at the resolved IP address and exchanges data with it;
Step S9: If the server fails during data exchange, the client receives an error message from the transport layer and thereby learns that the server has failed. The client then enters failover mode: it performs pool name resolution at the pooling device again, obtains the IP address of the next normally running server, and connects directly to the new server, completing the failover process.
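The client side of steps S7 to S9 can be sketched as follows. The `resolve` and `exchange` callables are hypothetical stand-ins for the pool name resolution request and the data exchange, a transport-layer error is modeled as Python's `ConnectionError`, and the `max_attempts` limit is an assumption the patent does not mention.

```python
from typing import Callable, Optional


def request_with_failover(
    pool_name: str,
    resolve: Callable[[str], Optional[str]],
    exchange: Callable[[str], str],
    max_attempts: int = 3,
) -> str:
    """Resolve the pool name, talk to the server, re-resolve on failure."""
    for _ in range(max_attempts):
        # S7: ask the pooling device for a normally running server.
        address = resolve(pool_name)
        if address is None:
            break  # the pooling device has no healthy server to offer
        try:
            # S8: exchange data directly with the resolved server.
            return exchange(address)
        except ConnectionError:
            # S9: transport-layer error, so enter failover mode and
            # resolve the pool name again on the next iteration.
            continue
    raise ConnectionError(f"no working server found for pool {pool_name!r}")
```

The application calling `request_with_failover` never sees a server list; it supplies only the pool name.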
When a server registers, it provides information about its own service, such as the service protocol, IP address, port number, and whether service detection should be performed. If the server requires service detection, the pooling device uses this information to perform periodic service availability checks on the server: it sends a probe message to the specified IP address and port according to the service protocol, and judges from the server's response whether the service is working normally.
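A service availability probe might be sketched as below. The patent describes a protocol-specific probe message; this example simplifies that to a plain TCP connection attempt, which is an assumption, and the function name `tcp_service_available` and the default timeout are hypothetical.

```python
import socket


def tcp_service_available(ip: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (ip, port) can be opened."""
    try:
        # A successful connect shows that something is listening on the
        # registered port; a protocol-aware probe would go on to send a
        # request and validate the reply.
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False
```

Unlike a ping over a heartbeat line, this check travels over the same network path and reaches the same port that clients use, so it also exposes data-network and application-level failures.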
The fault identification method of the present invention can identify the following fault types: server crash (downtime), server program deadlock, server program crash, system resource exhaustion, and network faults (network service faults caused by network card failure, switch failure, and so on). Fault identification is more accurate and faster. Compared with the conventional fault identification mechanism (the heartbeat mechanism), it provides deeper fault detection. The failover method of the present invention is simple and transparent: the client does not need to maintain a server list and only needs to perform pool name resolution at the pooling device when required. Servers can be added, removed, or changed without notifying the client or changing it in any way. Failover is thus completely transparent to the user.
The above description fully discloses specific embodiments of the present invention. It should be noted that any change made to these specific embodiments by a person skilled in the art does not depart from the scope of the claims of the present invention. Accordingly, the scope of the claims is not limited to the foregoing embodiments.

Claims (3)

1. A server fault detection and switchover method, comprising the following steps:
(1) Establish a network server pool. The server pool comprises at least two pooling devices; a pooling device is responsible for combining multiple servers into one virtual server pool and for monitoring and collecting the running status of the servers in real time;
(2) After a server starts, it sends a registration message to the pooling device to register;
(3) On receiving the registration message, the pooling device immediately replies with a registration response message;
(4) At a fixed interval, the pooling device sends a keep-alive message to the server, performing a periodic health check on it;
(5) On receiving a keep-alive message, the server immediately replies to the pooling device with a keep-alive acknowledgement message;
(6) If the pooling device does not receive a keep-alive acknowledgement within a set time after sending a keep-alive message, it quickly sends several more keep-alive messages in succession; if it still receives no acknowledgement, it determines that the server has failed;
(7) The client accesses the server pool by pool name. It first performs pool name resolution at the pooling device. On receiving the client's pool name resolution request, the pooling device searches the server list it maintains, selects a valid, normally running server according to a predetermined policy, and sends that server's IP address to the requesting client;
(8) Having completed pool name resolution, the client accesses the server directly at the resolved IP address and exchanges data with it;
(9) If the server fails during data exchange, the client receives an error message from the transport layer and thereby learns that the server has failed. The client then enters failover mode: it performs pool name resolution at the pooling device again, obtains the IP address of the next normally running server, and connects directly to the new server, completing the failover process.
2. The server fault detection and switchover method according to claim 1, characterized in that the server pool system consists of the following three parts:
Server pool: a group of servers that have the same function and are managed together. Each server pool is identified by a unique pool name;
Pooling device: the management device of the server pool. It combines multiple servers into one virtual server pool and monitors and collects the running status of each server in real time. It also provides the pool name resolution function, so that users can access the servers conveniently;
Client: a client computer that accesses the server pool.
3. The server fault detection and switchover method according to claim 1, characterized in that the server pool uses a pool name resolution mechanism provided by the pooling device: to access the server pool, the client first performs pool name resolution at the pooling device; the pooling device looks up the client's resolution request in its own server list; each pool name usually corresponds to several servers; and the pooling device selects the best server IP address for the user according to a predetermined selection policy and returns the result to the client in a resolution response message.
CN201310166822.2A 2013-05-09 2013-05-09 Server fault detection and switchover method Pending CN104144064A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310166822.2A CN104144064A (en) 2013-05-09 2013-05-09 Server fault detection and switchover method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310166822.2A CN104144064A (en) 2013-05-09 2013-05-09 Server fault detection and switchover method

Publications (1)

Publication Number Publication Date
CN104144064A (en) 2014-11-12

Family

ID=51853136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310166822.2A Pending CN104144064A (en) 2013-05-09 2013-05-09 Server fault detection and switchover method

Country Status (1)

Country Link
CN (1) CN104144064A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111107172A (en) * 2018-10-28 2020-05-05 无锡雅座在线科技股份有限公司 Automatic switching method for terminal access entrance
CN110971482A (en) * 2019-11-05 2020-04-07 北京字节跳动网络技术有限公司 Back-end server detection method and device based on ebpf and electronic equipment
CN110971482B (en) * 2019-11-05 2021-07-23 北京字节跳动网络技术有限公司 Back-end server detection method and device based on ebpf and electronic equipment
CN111224959A (en) * 2019-12-29 2020-06-02 西安天互通信有限公司 Server port automatic detection and forwarding defense system and defense method

Similar Documents

Publication Publication Date Title
CN103795553B (en) Active and standby server switching based on monitoring
CN103973728B (en) The method and device of load balancing under a kind of multiple data centers environment
CN103731290A (en) Server failure switching method
CN103414916B (en) Fault diagnosis system and method
CN109344014A (en) A kind of main/standby switching method, device and communication equipment
CN106330475A (en) Method and device for managing main and standby nodes in communication system and high availability cluster
CN106789306A (en) Restoration methods and system are collected in communication equipment software fault detect
CN113300917B (en) Traffic monitoring method and device for Open Stack tenant network
CN105323121B (en) A kind of Network status detection method and device
CN106603261A (en) Hot backup method, first master device, backup device and communication system
CN103731287A (en) Method for selecting server to take over fault server
CN114710798B (en) Fault positioning method and device
CN104144064A (en) Server fault detection and switchover method
JP2004171370A (en) Address control system and method between client/server in redundant constitution
CN103731315A (en) Server failure detecting method
CN101262479B (en) A network file share method, server and network file share system
CN104202199A (en) Method and system for detecting interface status and processing interface fault according to interface status
CN102571438B (en) Remote monitoring system and its automatic network diagnostic method
CN110474821B (en) Node fault detection method and device
CN103731289A (en) Method for automatic expansion of network server
CN105490847B (en) A kind of private cloud storage system interior joint failure real-time detection and processing method
CN103731291A (en) Data transmission structure and program development method of network server pool system
CN108933714A (en) It is a kind of to detect the method, apparatus and storage medium that IP address whether there is
CN105224426A (en) Physical host fault detection method, device and empty machine management method, system
CN103001832B (en) The detection method of distributed file system interior joint and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20141112