CN101202658A - System and method for service take-over of multi-host system - Google Patents


Publication number
CN101202658A
Authority
CN
China
Prior art keywords
service
over
host
take
main frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2006101688067A
Other languages
Chinese (zh)
Inventor
刘宏亮
陈玄同
刘文涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inventec Corp
Original Assignee
Inventec Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Corp
Priority to CNA2006101688067A
Publication of CN101202658A
Legal status: Pending


Abstract

The invention relates to a service take-over system and method for a multi-host system. The multi-host system comprises a service host and at least one standby host, which monitor each other's operating state through a heartbeat mechanism. When the service host that provides the external service fails, the invention first transfers the external network protocol (IP) address, through which the service host provides the external service, to one of the standby hosts. It then prepares the service environment that the standby host needs in order to take over the service, and monitors the preparation state of that environment. Until preparation is complete, access request packets sent to the service through the external IP address are discarded; once preparation is complete, the standby host takes over the service and accepts access request packets so as to provide the external service. The system and method thereby achieve uninterrupted and transparent take-over of both the IP address and the services of the multi-host system.

Description

Service take-over system and method for a multi-host system
Technical field
The present invention relates to service take-over technology for multi-host systems or cluster systems, and in particular to a service take-over system and method for a high-availability cluster system.
Background technology
To ensure that computer systems running critical services provide uninterrupted service, the most common approach at present is to deploy a high-availability (HA) cluster or multi-host system. A high-availability cluster usually consists of at least two hosts. While the service is being provided externally, one host provides the service normally and the other hosts remain in a "standby" state. The hosts monitor each other's operating state through a "heartbeat" mechanism.
For example, Fig. 1 is a structural diagram of a typical high-availability cluster. In this cluster the whole system, host system 10, consists of two hosts, host 12 and host 14, which have the private Internet Protocol (private IP) addresses 192.168.0.1 and 192.168.0.2 respectively. Host system 10, however, provides its service externally through a single accessible public Internet Protocol (public IP) address, 10.10.1.10. Clients access host system 10 through the public IP address; from the client's point of view, the whole system is simply one host system at the public IP address 10.10.1.10, so the internal structure is hidden from the client. Hosts 12 and 14 detect each other's state through the "heartbeat" mechanism. When the standby host detects that the host currently providing the service has failed and can no longer provide it, or that its operating state is unstable, the standby host takes over the public IP address and the workload and provides the service externally. The failed host then performs error recovery and, once restored to a normal state, enters the standby state itself, ready at any time to take over the service should the other host fail.
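The heartbeat-based failure detection described here can be illustrated with a minimal sketch. The patent does not specify how the heartbeat is implemented; the class name, the timeout rule and the injectable clock below are assumptions made only for illustration.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from a peer host (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s   # assumed rule: silence longer than this means failure
        self.clock = clock           # injectable so the logic can be exercised without waiting
        self.last_seen = clock()

    def beat(self):
        # Called whenever a heartbeat message arrives from the peer.
        self.last_seen = self.clock()

    def peer_failed(self):
        # The peer is presumed failed once no heartbeat arrived within the timeout.
        return self.clock() - self.last_seen > self.timeout_s
```

A standby host would call `beat()` on every heartbeat received and start the take-over once `peer_failed()` returns true.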
Almost all services provided by clusters today can be delivered over a network, and it is by providing the service over the network and switching it among the hosts of the cluster that the service can be carried on without interruption. However, the services provided externally through the public IP address differ in nature, so whether a service is available immediately after the IP address has been taken over also differs. Some services can be provided as soon as the IP address has been taken over, for example pure network services such as the Dynamic Host Configuration Protocol (DHCP), the Domain Name Service (DNS), the terminal emulation protocol (Telnet) and Hypertext Transfer Protocol (HTTP) service for browsing static web sites. These services can be started as long as a very small configuration file identical to that of the failed host is available, so they can continue to be provided externally without interruption.
By contrast, file services such as the File Transfer Protocol (FTP) and the Hypertext Transfer Protocol (HTTP) may not be available immediately. Besides a network connection, these services also provide file storage space. The file storage space takes time to prepare, and the storage space of the host now providing the service must be at the same location as that of the original service host. The situation is more complicated still when access to block devices is provided over the network, for example through the Internet Small Computer System Interface (iSCSI). The host must not only provide the external connection service but also ensure that the disk is the same before and after failover; the physical disk being accessed must not change during the switch. In this case the service cannot be taken over immediately and must wait until the disk system is fully ready.
Therefore, even if a running host takes over the IP address promptly, problems arise if the hardware environment has not been adequately prepared before the take-over, especially when the hardware needs a long time to prepare before the network service can be started safely. For example, before the IP address and the iSCSI network service itself are taken over, the hard disks and the corresponding redundant array of inexpensive disks (RAID) and logical volume (LV) settings must be ready. Preparation that involves hardware usually takes longer, typically at least about 30 seconds. If the service is accessed through the IP address before preparation is complete, the result is a "denial of service", and an access-denied error occurs. The system then reports an error for the service it provides, so the known technology cannot achieve uninterrupted and transparent take-over of services.
Summary of the invention
To solve the problems and shortcomings of the known technology described above, the object of the present invention is to provide a service take-over system and method for a multi-host system, so that when the host providing the service in the multi-host system fails, another running host can take over its IP address and services safely, without interruption and transparently, thereby ensuring normal operation of the services and normal use of their functions.
To achieve the above object, the invention provides a service take-over system applied to a multi-host system comprising a service host and at least one standby host. The service host provides a service externally through an external IP address, the standby host stands ready, and the service host and the at least one standby host monitor each other's operating state through a heartbeat mechanism. The service take-over system comprises an IP address take-over module, a service take-over module and a request processing module. The IP address take-over module determines the operating state of the service host through the heartbeat mechanism and, when the service host fails, sends a resource release request notifying the service host to release the external IP address and the services it occupies, so that the external IP address of the service host is taken over by one of the standby hosts. The service take-over module prepares the service environment that the standby host needs in order to take over the service of the service host, and takes over the service. The request processing module monitors the preparation state of the service environment handled by the service take-over module and, until the service environment is fully prepared, discards request packets that access the service through the external IP address.
According to the technical concept of the present invention, the request processing module further comprises a resource preparation module for creating the service environment, tied to hardware resources, that the service take-over requires.
According to the technical concept of the present invention, the resource preparation module provides the network connection for the service being taken over and provides the same addressing space as before the service host failed.
According to the technical concept of the present invention, when the service being taken over is a file service, the resource preparation module provides file storage space at the same location as before the service host failed.
According to the technical concept of the present invention, when the service being taken over is a block device access service, the resource preparation module prepares the same block device that served access requests before the service host failed.
According to the technical concept of the present invention, the service take-over module determines whether the service being taken over needs environment preparation. If no preparation is needed, it takes over the service immediately; otherwise, it performs the service environment preparation required to take over the service.
In addition, to achieve the above object, the invention also discloses a service take-over method applied to a multi-host system comprising a service host and at least one standby host, in which the service host and the at least one standby host monitor each other's operating state through a heartbeat mechanism. The method comprises the following steps: determining the operating state of the service host through the heartbeat mechanism and, when the service host fails, sending a resource release request notifying the service host to release the external IP address and the services it occupies; taking over, on one of the standby hosts, the external IP address through which the service host provides the service externally; preparing the service environment that the standby host needs in order to take over the service of the service host; monitoring the preparation state of the service environment and, until the service environment is fully prepared, discarding access request packets sent to the service through the external IP address; and, after the service environment is fully prepared, taking over the service and accepting access request packets for the service so as to provide the service externally.
According to the technical concept of the present invention, the method further comprises the step of creating the service environment, tied to hardware resources, that the service take-over requires.
According to the technical concept of the present invention, the step of preparing the service environment comprises the following steps: providing the network connection for the service being taken over; and providing the same addressing space as before the service host failed.
According to the technical concept of the present invention, when the service being taken over is a file service, file storage space is provided at the same location as before the service host failed.
According to the technical concept of the present invention, when the service being taken over is a block device access service, the same block device that served access requests before the service host failed is prepared.
According to the technical concept of the present invention, before the step of preparing the service environment, the method further comprises the step of determining whether the service being taken over needs environment preparation. If no preparation is needed, the service is taken over immediately; otherwise, the service environment required to take over the service is prepared.
When providing highly available service through IP address and service take-over in a multi-host system or similar environment, the invention handles services whose take-over needs preparation time as follows. To ensure that the services of the failed host are provided externally safely and in good time after the IP address has been taken over, the service environment required by the service is prepared before the service is taken over, and request packets addressed to the service are discarded until preparation is complete. The preparation state of the service environment is monitored continuously, and once preparation is complete the service is taken over and provided externally.
The invention therefore has the following advantages: services that can be taken over quickly are provided quickly, and while they are being provided the client's connection to the service host side is not interrupted, so that the IP address and the services of the multi-host system are taken over transparently and without interruption.
Description of drawings
Fig. 1 is a structural diagram of a typical high-availability two-host cluster system;
Fig. 2 is a block diagram of the multi-host service take-over system of the present invention;
Fig. 3 is a flow chart of the steps of the multi-host service take-over method of the present invention;
Fig. 4 is a flow chart of access request processing for a service in the "protection" state; and
Fig. 5 is a flow chart of access request processing for a service in the "ready" state.
The reference numerals are as follows:
10 host system
12 host
14 host
20 IP address take-over module
22 service take-over module
24 resource preparation module
26 request processing module
Step 102: set the system to the protection state
Step 104: set all services to the protection state
Step 106: take over the public IP address
Step 108: does the service take-over environment need to be prepared?
Step 110: take over the service immediately
Step 112: prepare the resource environment for the service take-over
Step 114: take over the service
Step 116: set the service to the ready state
Step 118: are there still services in the protection state?
Step 120: provide the external service
Step 122: set the system to the ready state
Step 202: receive a service access request
Step 204: is the service in the ready state?
Step 206: discard the access request packet for the service
Step 208: forward to the corresponding service for processing
Step 302: receive a service access request
Step 304: forward to the corresponding service for processing
Embodiment
The features and practical benefits of the present invention are described in detail below with reference to the drawings and a preferred embodiment.
Please refer to Fig. 2, which is a block diagram of the multi-host service take-over system of the present invention. The multi-host system comprises a service host and at least one standby host. For example, as shown in Fig. 1, multi-host system 10 comprises host 12 and host 14; here host 12 is assumed to be the service host and host 14 the standby host, and service host 12 and standby host 14 monitor each other's operating state through a heartbeat mechanism. To address the above problems of the known technology, the multi-host service take-over system of the present invention comprises an IP address take-over module 20, a service take-over module 22 and a request processing module 26. Each of these modules is described in detail below.
The IP address take-over module 20 of the present invention is used, after the service host 12 currently providing the service fails, to have one of the hosts in the standby state quickly take over the external IP address 10.10.1.10 of service host 12. When there are several standby hosts, the one that takes over the service may be chosen at random. Any one or more of the standby hosts may detect the failure of the failed host, so all of these standby hosts may attempt to take over its IP address and services. To prevent the conflict that would arise if several standby hosts took over the IP address and services at the same time, two techniques are widely used at present: a token ring and an arbitration mechanism. With a token ring, a token circulates among all standby hosts, and whichever standby host holds the token has the duty to take over the IP address and services. With an arbitration mechanism, any standby host that is about to take over the IP address and services must first do two things: it checks whether a "lock" exists; if there is no lock, it sets the lock and then takes over the IP address and services; if a lock already exists, it stops and does not take over. Both techniques prevent the conflict caused by several standby hosts taking over the IP address and services simultaneously. It should be noted, however, that the technique by which the standby host of the present invention takes over the service is not limited to these two.
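The arbitration ("lock") mechanism can be sketched as follows. The patent does not say where the lock lives; this sketch assumes a lock file on storage visible to all standby hosts, and the function name and path are hypothetical. It also assumes that exclusive file creation is atomic on that storage, which may not hold on every network file system.

```python
import os


def try_acquire_takeover_lock(lock_path):
    """Return True if this host obtained the lock and may take over."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so "check for
        # the lock" and "set the lock" happen as a single step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another standby host already holds the lock: do not take over
    os.close(fd)
    return True
```

Combining the check and the locking in one call avoids the window in which two standby hosts could both see "no lock" and both proceed.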
The standby host that takes over then sends a command requesting the failed service host 12 to release the IP address through which it provides the service externally. Client computers or applications that previously accessed the external IP address 10.10.1.10 continue to use that address, but the host that actually holds the IP address and provides the service is now a different host.
After standby host 14 has taken over the IP address of the former service host 12, services such as pure network services and static web site browsing can be taken over immediately by service take-over module 22 and provided externally. Services that need a take-over environment, however, such as network block device services and file services including the Internet Small Computer System Interface (iSCSI), the File Transfer Protocol (FTP), Server Message Block/Common Internet File System (SMB/CIFS) and the Network File System (NFS), require a certain amount of time for software preparation (in a few cases) and hardware preparation (in most cases). Only after this take-over preparation has been carried out can the service be provided promptly and safely through the IP address that has been taken over. Service take-over module 22 therefore needs to prepare, before the service take-over, the hardware environment that standby host 14 requires in order to take over the services of failed host 12.
The take-over environment preparation performed by service take-over module 22 differs by service type. Some services need the hardware environment required for take-over to be prepared in advance, which can be very time-consuming; others need no take-over environment and can be taken over directly and quickly. Service take-over module 22 therefore has to determine whether the service being taken over needs environment preparation. If no preparation is needed, it takes over the service immediately; otherwise, it carries out the service environment preparation required to take over the service. Service take-over module 22 can make this determination from the type of service being taken over. If the service is related to storage space or file content, for example iSCSI, FTP, HTTP, NFS or SMB/CIFS, time is needed to prepare the service environment. Conversely, for pure network services such as DHCP and DNS, service take-over module 22 does not need to prepare a service environment.
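The type-based decision can be expressed as a simple lookup. The service names come from the examples in the text; the sets, the function name and the choice to reject unknown types are illustrative assumptions rather than part of the patent.

```python
# Service names follow the examples in the text; the sets are illustrative.
STORAGE_BACKED = {"iscsi", "ftp", "http", "nfs", "smb/cifs"}
PURE_NETWORK = {"dhcp", "dns"}


def needs_environment_preparation(service_type):
    """Decide from the service type whether take-over needs preparation."""
    name = service_type.lower()
    if name in STORAGE_BACKED:
        return True    # tied to storage space or file content
    if name in PURE_NETWORK:
        return False   # can be taken over immediately
    raise ValueError(f"unknown service type: {service_type}")
```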
Time-consuming software and hardware preparation is mainly related to hardware or to waiting time. Preparing disks, tapes and the like, for example, is very time-consuming (waiting for a disk to be released by another device, waiting for a tape to rewind to its start position, creating a RAID array, a logical volume (LV) or a snapshot, and so on), and some environment preparation has to wait for a timeout to elapse. For other services, preparation may only require changing a configuration file or a path. Taking over such a service is comparatively easy: the service program only needs to be started or restarted on the host.
The external services of the multi-host system include not only block device access functions such as iSCSI but also file access functions such as FTP, SMB/CIFS and NFS. The system also provides management functions such as the remote login protocol (SSH), the terminal emulation protocol (Telnet) and a web user interface (WebUI), as well as network functions such as DHCP and DNS. These services fall roughly into two classes. The first class comprises services such as iSCSI, FTP, SMB/CIFS and NFS. Services of this class are tied to hardware resources: iSCSI, for example, must operate on a specific disk, and shares provided through FTP, SMB/CIFS or NFS must be based on a specific directory on a specific disk. The second class comprises management functions such as the remote login protocol and the terminal emulation protocol and network functions such as DHCP and DNS. Services of this class are essentially independent of hardware resources: as long as the computer works normally and the IP address is available, the service can be provided externally. The two classes therefore have to be handled separately during the dual-controller failover process described above.
For first-class services, besides ensuring connectivity after the failure, it must also be ensured that the space accessed is the same as the space accessed before the failure. Otherwise the space the user accesses would change and the service could not be provided normally. First-class services therefore need the hardware to be made ready first, before the failover is carried out; only then can the actual service be provided.
For second-class services, fast connectivity after a failure must be ensured, with no noticeable delay once the failure has occurred. These services, especially management services such as the remote login protocol, the terminal emulation protocol and the user interface, are closely tied to user experience, and a noticeable delay would degrade it.
The failover environment for first-class services is prepared by resource preparation module 24 of service take-over module 22. Resource preparation module 24 provides the network connection for the service being taken over and provides the same addressing space as before the service host failed. When the service being taken over is a file service, the resource preparation module provides file storage space at the same location as before the service host failed; when it is a block device access service, the resource preparation module prepares the same block device that served access requests before the service host failed.
For example, to prepare the storage space of a disk array, resource preparation module 24 may carry out the following steps. It sends a command requesting the failed host to release the shared disk devices; if the failed host is still active it releases these disk devices, and otherwise, since it is down, no release is needed. It re-initializes the shared disk space of these disks and at the same time reads the assembly data of the RAID arrays and logical volumes. It assembles the disks into RAID arrays according to the RAID assembly data, at which point the RAID arrays are restored. It divides or activates the RAID arrays into the different logical volumes according to the logical volume assembly data, at which point the logical volumes are restored. For iSCSI service, the devices must be exported to the corresponding initiators; for FTP, SMB/CIFS and NFS services, the devices are mounted on the designated directories. Once all the RAID arrays and logical volumes have been assembled according to the assembly data and all devices are ready, all hardware resources are ready.
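The ordering of these preparation steps can be summarized in a short sketch that only builds a plan as a list of descriptions. It performs no real disk operations; the function name, the step wording and the `failed_host_alive` parameter are assumptions for illustration, and a real implementation would invoke the platform's own RAID, volume and export tools at each step.

```python
FILE_SERVICES = {"ftp", "smb/cifs", "nfs"}


def preparation_steps(service_type, failed_host_alive):
    """Return the ordered preparation steps as plain descriptions."""
    service = service_type.lower()
    if service != "iscsi" and service not in FILE_SERVICES:
        raise ValueError(f"not a storage-backed service: {service_type}")
    steps = ["request failed host to release shared disks"]
    if failed_host_alive:
        # Only a host that is still running can actually release the disks.
        steps.append("failed host releases shared disks")
    steps += [
        "initialize shared disk space",
        "read RAID and logical volume assembly data",
        "assemble disks into RAID arrays",
        "activate logical volumes on the RAID arrays",
    ]
    if service == "iscsi":
        steps.append("export devices to initiators")
    else:
        steps.append("mount devices on designated directories")
    return steps
```

The last step is the only one that depends on the service type: block device services export the volumes, file services mount them.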
The above describes how IP address take-over module 20 and service take-over module 22 take over the address of the failed host and prepare the environment before the service take-over. During the take-over, service take-over module 22 determines all the services to be taken over from the services that the failed host offered to clients through the IP address, and performs either a fast take-over or take-over environment preparation depending on the nature of each service. While a service take-over is being prepared, however, the port of the corresponding service is closed. If the service port is accessed through the IP address at this time, a service-refused error results, the client's access fails, and the client then abandons the service access request. To achieve uninterrupted and transparent service take-over while the take-over environment is being prepared, the multi-host service take-over system of the present invention therefore comprises a request processing module 26, which promptly detects whether the environment preparation for a service awaiting take-over has been completed. Request processing module 26 can determine that the service environment preparation or the service take-over is complete through a command call or a function call that provides a return value on completion indicating whether the operation succeeded. Alternatively, a file or flag can be written to a disk when these commands finish, and request processing module 26 then checks whether the flag exists: if the flag or file exists, the environment the service needs is ready; otherwise the environment is not yet ready. The present invention is not limited to these methods of determining that environment preparation is complete; any method that achieves this purpose can be used.
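Both detection approaches reduce to polling a check until it succeeds. The sketch below treats the return-value approach as any callable that returns a boolean and the flag approach as a test for a file's existence. The function names, the polling interval and the timeout are assumptions, not values taken from the patent.

```python
import os
import time


def flag_file_check(flag_path):
    """Build a check that succeeds once the flag file exists."""
    return lambda: os.path.exists(flag_path)


def wait_until_prepared(check, poll_interval_s=1.0, timeout_s=60.0):
    """Poll `check` until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True          # environment reported as ready
        if time.monotonic() >= deadline:
            return False         # still not ready when the timeout expired
        time.sleep(poll_interval_s)
```

A preparation command that writes a flag file on completion could be observed with `wait_until_prepared(flag_file_check(path))`, where `path` is wherever that command writes its flag.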
While the environment for a service awaiting take-over is being prepared, request processing module 26 continuously monitors the state of the service environment preparation, determines whether the service has been taken over normally, and handles requests that access the corresponding service port through the external IP address of the multi-host system that has been taken over. Until the service environment is fully prepared, request processing module 26 discards access request packets for the service. Because the access request is discarded before it reaches the corresponding service port, the system does not send a service-refused response to the client, and the client, having received no response, keeps resending the request.
Once service take-over module 22 has fully prepared the take-over environment and taken over the service, the corresponding service port is opened. At the same time request processing module 26 stops discarding access request packets for that service port and begins accepting access request packets sent to the port, so that the service is again provided externally as normal. Access to any other service whose take-over needs time is handled by the same scheme, ensuring normal operation of the services and normal use of their functions.
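The drop-then-forward behaviour of the request processing module can be modelled in a few lines. This is an in-memory sketch only: the class and method names are hypothetical, and a real system would more likely discard packets at the network layer (for example with a packet filter rule) than in application code.

```python
class RequestProcessor:
    """Discards requests for services that are not ready; forwards the rest."""

    def __init__(self, services):
        # Every service starts out not ready, so its requests are discarded.
        self.ready = {name: False for name in services}
        self.delivered = []

    def mark_ready(self, service):
        # Called once the service environment is prepared and the service taken over.
        self.ready[service] = True

    def handle(self, service, packet):
        if not self.ready.get(service, False):
            # Silently discarded: no refusal is sent, so the client simply retries.
            return False
        self.delivered.append((service, packet))
        return True
```

Because a discarded request produces no response at all, the client sees a delay rather than an error, which is what makes the take-over appear transparent.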
From the point of view of a client accessing the service, the service is therefore taken over transparently and without interruption. Access may be delayed temporarily, but the service is ultimately not interrupted and no data is lost, which ensures security and reliability.
The service take-over method for a multi-host system according to the present invention is explained below with reference to Fig. 3, which is a flow chart of the steps of the method. The invention is applied to a multi-host system comprising a service host and at least one standby host, in which the service host and the at least one standby host monitor each other's operating state through a heartbeat mechanism. When the service host currently providing the service fails, the other standby hosts detect the state of the failed host through the heartbeat, and one of the standby hosts then takes over the IP address and the services of the failed host. Because services of some types need the required service environment to be provided on the take-over host, and preparing the take-over environment takes a certain amount of time, the services of the multi-host system have not entered, or have not fully entered, the normal working state during service take-over or switch-over.
Here, the state in which all services of the system have fully entered normal operation is defined as the "ready" state: every type of service has been taken over on the host that took over the IP address of the failed host and can provide complete, normal service externally, and the whole multi-host system is then said to have entered the "ready" state. Conversely, if the system is in the "protection" state, the whole system is in the middle of IP address take-over, service take-over or some other failover process and has not, or not fully, entered the "ready" state. In addition, the "protection" state of a service is defined as the protective state adopted for a service that can be taken over only after its take-over hardware environment has been prepared: until preparation of the service environment is complete, the service cannot be taken over and cannot be provided externally. Because the service cannot be provided normally, access request packets from clients are discarded before they reach the corresponding service port. This continues until the service is in the "ready" state, that is, the state in which the take-over environment has been prepared, the service has been taken over, and the service can be provided externally in the normal way. At that point the discarding of access request packets for the service port stops, packets sent to the port are received, and the service is again provided externally.
Referring now to Fig. 3, the state of the standby host system is first set to the "protection" state (step 102) and a flag is recorded; at the same time, all services of the standby host system are set to the "protection" state (step 104). With all services in the "protection" state, access requests can be handled simply by discarding all service requests by default. This state-setting step is an important part of the invention; the request processing steps for a system and services in the "protection" state are described in detail below with reference to Fig. 4.
Fig. 4 is a schematic flow chart of access request handling for services in the "protection" state; it shows how a client's service access request is processed when the client accesses the standby host system. The system in the "protection" state receives an access request sent by a client to a given service (step 202) and determines whether that service is in the "ready" state (step 204). If the service is not in the "ready" state, that is, it is currently in the "protection" state, the access request packet for the service is discarded (step 206); otherwise the request is passed to the corresponding service for processing (step 208).
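The decision of Fig. 4 can be illustrated by a short sketch. It is only a model of steps 204 to 208 in application code; the names `ServiceState` and `handle_request`, and the choice to treat unknown services as protected, are assumptions made for illustration and are not taken from the patent.

```python
from enum import Enum


class ServiceState(Enum):
    PROTECTION = "protection"
    READY = "ready"


def handle_request(service_states, handlers, service, packet):
    """Apply the Fig. 4 decision to one access request packet.

    Returns None when the packet is discarded, otherwise the handler's result.
    """
    # Step 204: is the target service in the "ready" state?
    # Unknown services are treated as protected (an assumption of this sketch).
    state = service_states.get(service, ServiceState.PROTECTION)
    if state is not ServiceState.READY:
        # Step 206: discard silently, so the client sees no refusal and retries.
        return None
    # Step 208: pass the request to the corresponding service.
    return handlers[service](packet)
```

Discarding rather than rejecting is the point of the design: the client receives no error, so its own retry logic carries it across the take-over window.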
There are many ways to discard service access request packets in the "protection" state. On Unix/Linux platforms the simplest is to use iptables/netfilter; for example, the following command discards all requests sent to the iSCSI (Internet Small Computer System Interface) service.
# iptables -A INPUT -p tcp --dport 3260 -j DROP
where 3260 is the service port of iSCSI.
For a service that is not in the "protection" state, that is, a service in the "ready" state, the discard operation on its access requests is cancelled: protection of the service is removed and the service is required to process the access requests. For example, the commands that cancel the discarding of access request packets are:
# iptables -D INPUT -p tcp --dport 3260 -j DROP
# iptables -A INPUT -p tcp --dport 3260 -j ACCEPT
The two commands above remove the "protection" state of the service, so that the system can receive and process the iSCSI service requests that the previous step discarded.
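For services on ports other than 3260, the same rules can be generated programmatically. The sketch below only builds the argument lists and does not execute anything; the function names `protect_rule` and `unprotect_rules` are hypothetical, and applying the rules on a real host requires root privileges.

```python
def protect_rule(port, protocol="tcp"):
    """Build the command that discards requests to `port` ("protection" state)."""
    return ["iptables", "-A", "INPUT", "-p", protocol,
            "--dport", str(port), "-j", "DROP"]


def unprotect_rules(port, protocol="tcp"):
    """Build the two commands that lift protection once the service is "ready"."""
    match = ["-p", protocol, "--dport", str(port)]
    return [
        # Delete the DROP rule added while the service was protected.
        ["iptables", "-D", "INPUT", *match, "-j", "DROP"],
        # Accept requests to the service port from now on.
        ["iptables", "-A", "INPUT", *match, "-j", "ACCEPT"],
    ]
```

Each list can be passed as it stands to `subprocess.run(rule, check=True)`. Building the command as a list of arguments rather than a single string avoids shell quoting problems.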
It should be noted that this is only a brief example of how the above operations can be implemented and is not intended to limit the scope of protection of the invention; any known technique capable of implementing these operations may be applied in the invention.
After the system and its services have been set to the "protection" state, the external IP address through which the failed host provided its service is taken over (step 106). IP address take-over is a known technique; see, for example, the IP take-over code in the open-source project LVS (Linux Virtual Server). Each service can then be taken over. The system provides several external services; those that need no hardware or software preparation, or whose preparation time is extremely short, can be provided at once. It is therefore determined whether the service to be taken over requires take-over environment preparation (step 108); if not, the service is taken over immediately (step 110). Examples are services that provide management functions or network functions: these are essentially independent of hardware resources and can be provided externally as soon as the IP address is available.
Whether a take-over environment must be prepared in step 108 can be determined from the type of service being taken over. If the service is related to storage space or file content, for example iSCSI, File Transfer Protocol (FTP), Hypertext Transfer Protocol (HTTP), Network File System (NFS) or Server Message Block/Common Internet File System (SMB/CIFS), time is needed to prepare the service environment. Conversely, pure network services such as Dynamic Host Configuration Protocol (DHCP) and Domain Name Service (DNS) need no time to prepare a service environment.
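The step 108 decision can be sketched as a lookup on the service type. The two sets below contain only the examples named in the text; the function name and the conservative default for unlisted services are assumptions of this sketch.

```python
# Services tied to storage space or file content (examples from the text).
STORAGE_BACKED_SERVICES = frozenset({"iscsi", "ftp", "http", "nfs", "smb", "cifs"})
# Pure network services (examples from the text).
PURE_NETWORK_SERVICES = frozenset({"dhcp", "dns"})


def needs_environment_preparation(service_type):
    """Return True if the service must wait for its take-over environment."""
    name = service_type.strip().lower()
    if name in STORAGE_BACKED_SERVICES:
        return True
    if name in PURE_NETWORK_SERVICES:
        return False
    # Unlisted service types: assume preparation is needed, so the service
    # stays protected until it is explicitly marked ready (an assumption).
    return True
```

Defaulting to "needs preparation" errs on the side of keeping an unknown service in the "protection" state rather than exposing it before its resources are available.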
As described above, services associated with hardware resources, such as iSCSI and FTP, cannot be provided immediately because they require take-over environment preparation; the flow therefore proceeds to step 112, in which the resource environment for the service take-over is prepared. The specific preparation steps are described in detail below.
The take-over environment preparation differs according to the type of service. It mainly involves waiting for hardware, or for time-consuming hardware and software preparation, and some environment preparation has to wait for a timeout period. Some services that need preparation may only require a configuration file or a path to be changed. Taking over such a service is comparatively easy: it is enough to start or restart the service program on the host.
For services associated with hardware resources, it is necessary to guarantee not only the network connectivity of the service taken over after the failure but also that the space accessed is the same as the space accessed on the failed host before the failure. Otherwise the space the user accesses would change and the service could not be provided normally. Such services therefore need the hardware to be made ready before failover, and only then can the actual service be provided. When the service taken over is a file service, file storage space at the same location as on the service host before the failure must be provided; when it is a block device access service, the block device prepared must be the same block device that the service accessed on the service host before the failure.
Once the resource environment for the service being taken over is ready, the service is taken over (step 114); it then enters the "ready" state (step 116) and provides its external service normally (step 120). Although resource preparation for such services takes a long time, the system and the service remain under the dual "protection" state throughout, so every request that reaches the service through the IP address, that is, every IP packet, is discarded. No service-refusal message is therefore produced, and the client keeps retrying the service. See the "protection"-state service request handling flow of Fig. 4. In this way every service is taken over properly, whether or not it needs preparation time.
After any service enters the "ready" state (step 116), it is determined whether any service is still in the "protection" state (step 118). If none is, the whole system is set to the "ready" state (step 122); otherwise the flow returns to step 108, and steps 108 to 122 are carried out for the other services still in the "protection" state. These steps are repeated until every service has been taken over and is in the "ready" state, that is, until the whole system is in the "ready" state; access requests to services are then handled according to the "ready"-state service request handling flow of Fig. 5.
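The overall flow of Fig. 3 can be summarised in a short sketch. The IP address take-over of step 106 is left out, and the environment preparation and service start-up are passed in as callbacks. The function name `take_over`, the callback parameters and the decision to handle the services that need no preparation first are assumptions made for illustration; the patent does not prescribe an ordering.

```python
def take_over(services, needs_preparation, prepare_environment, start_service):
    """Run the Fig. 3 flow over `services`.

    Returns (names in the order they were taken over, final system state).
    """
    system_state = "protection"                                # step 102
    service_state = {name: "protection" for name in services}  # step 104
    # Step 106 (taking over the external IP address) is outside this sketch.
    taken_over = []
    # sorted() is stable and False sorts before True, so services that need
    # no preparation are taken over first, in their original order.
    for name in sorted(services, key=needs_preparation):
        if needs_preparation(name):       # step 108
            prepare_environment(name)     # step 112
        start_service(name)               # step 110 or 114
        service_state[name] = "ready"     # step 116
        taken_over.append(name)
    # Steps 118 and 122: the system is ready once no service is protected.
    if all(state == "ready" for state in service_state.values()):
        system_state = "ready"
    return taken_over, system_state
```

Taking over the fast services first keeps their unavailability as short as possible while the slower, storage-backed services are still being prepared.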
As shown in Fig. 5, the host system receives an access request addressed to a service port (step 302) and passes it directly to the corresponding service for processing (step 304). This is the handling flow of the normal state; the system is in this state most of the time, and no request packets are discarded. Once the whole system has been set to the "ready" state, the IP address take-over, service take-over and packet discarding steps described above are no longer used, and requests are handled automatically by the take-over host, until a failure occurs again and those steps once more interact to complete a safe failover.
As the above description shows, the invention ensures that services capable of fast switchover are provided quickly, and that the connection between client and server is not interrupted for such services. It also ensures that services that cannot be provided quickly are provided promptly once they are ready, and it ensures the reliability of multiple services when services of several types are concentrated on one high-availability host system. Service failover carried out under these conditions gives users a multi-service system with truly uninterrupted, transparent switchover and a good user experience.
Although the invention has been disclosed above by way of the preferred embodiment, the embodiment is not intended to limit the invention. Those of ordinary skill in the art will recognize that changes and refinements made without departing from the scope and spirit of the invention as disclosed in the appended claims all fall within the scope of patent protection of the invention. For the scope of protection defined by the invention, refer to the appended claims.

Claims (12)

1. A service take-over system for a multi-host system, applied to a multi-host system comprising a service host and at least one standby host, wherein the service host provides a service externally through an external network protocol address, the standby host is in a standby state, and the service host and the at least one standby host monitor one another's operating state through a heartbeat mechanism, the service take-over system comprising:
a network protocol address take-over module, which determines the operating state of the service host through the heartbeat mechanism and, when the service host fails, sends a resource release request notifying the service host to release the external network protocol address and the service that it occupies, so that the external network protocol address is taken over by one of the standby hosts;
a service take-over module, which prepares the service environment that the standby host requires in order to take over the service of the service host, and takes over the service; and
a request processing module, which detects the service environment preparation state of the service take-over module and, before the service environment is fully prepared, discards access request packets addressed to the service through the external network protocol address.
2. The service take-over system for a multi-host system as claimed in claim 1, wherein the request processing module further comprises a resource preparation module for generating the service environment required to take over a service associated with hardware resources.
3. The service take-over system for a multi-host system as claimed in claim 2, wherein the resource preparation module provides the network connection for the service taken over and provides the same access space as before the service host failed.
4. The service take-over system for a multi-host system as claimed in claim 3, wherein, when the service taken over is a file service, the resource preparation module provides file storage space at the same location as before the service host failed.
5. The service take-over system for a multi-host system as claimed in claim 3, wherein, when the service taken over is a block device access service, the resource preparation module prepares the same block device that the service accessed before the service host failed.
6. The service take-over system for a multi-host system as claimed in claim 1, wherein the service take-over module determines whether the service to be taken over requires environment preparation; if no preparation is required, it takes over the service immediately; otherwise, it performs the service environment preparation required to take over the service.
7. A service take-over method for a multi-host system, applied to a multi-host system comprising a service host and at least one standby host, wherein the service host and the at least one standby host monitor one another's operating state through a heartbeat mechanism, the method comprising the following steps:
determining the failure state of the service host through the heartbeat mechanism and, when the service host fails, sending a resource release request notifying the service host to release the external network protocol address and the service that it occupies;
taking over, on one of the standby hosts, the external network protocol address released by the service host;
preparing the service environment that the standby host requires in order to take over the service of the service host;
detecting the preparation state of the service environment and, before the service environment is fully prepared, discarding access request packets addressed to the service through the external network protocol address; and
after the service environment is fully prepared, taking over the service and receiving access request packets addressed to the service, so as to provide the service externally.
8. The service take-over method for a multi-host system as claimed in claim 7, further comprising a step of generating the service environment required to take over a service associated with hardware resources.
9. The service take-over method for a multi-host system as claimed in claim 8, wherein the step of preparing the service environment comprises the following steps:
providing the network connection for the service taken over; and
providing the same access space as before the service host failed.
10. The service take-over method for a multi-host system as claimed in claim 9, wherein, when the service taken over is a file service, file storage space at the same location as before the service host failed is provided.
11. The service take-over method for a multi-host system as claimed in claim 9, wherein, when the service taken over is a block device access service, the block device prepared is the same block device that the service accessed before the service host failed.
12. The service take-over method for a multi-host system as claimed in claim 7, further comprising, before the step of preparing the service environment, a step of determining whether the service to be taken over requires environment preparation; if no preparation is required, the service is taken over immediately; otherwise, the service environment required to take over the service is prepared.
CNA2006101688067A 2006-12-14 2006-12-14 System and method for service take-over of multi-host system Pending CN101202658A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2006101688067A CN101202658A (en) 2006-12-14 2006-12-14 System and method for service take-over of multi-host system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2006101688067A CN101202658A (en) 2006-12-14 2006-12-14 System and method for service take-over of multi-host system

Publications (1)

Publication Number Publication Date
CN101202658A true CN101202658A (en) 2008-06-18

Family

ID=39517640

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2006101688067A Pending CN101202658A (en) 2006-12-14 2006-12-14 System and method for service take-over of multi-host system

Country Status (1)

Country Link
CN (1) CN101202658A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104718536A (en) * 2012-06-25 2015-06-17 Netapp股份有限公司 Non-disruptive controller replacement in network storage systems
CN104718536B (en) * 2012-06-25 2018-04-13 Netapp股份有限公司 Non-destructive controller in network store system is replaced
CN103324896A (en) * 2013-06-25 2013-09-25 浪潮电子信息产业股份有限公司 NFS (Network File System) network storage security strengthening method based on IPtables
CN106682486A (en) * 2016-12-19 2017-05-17 交控科技股份有限公司 Safe computer platform and information processing method
CN113127069A (en) * 2019-12-31 2021-07-16 成都鼎桥通信技术有限公司 Position service management method and device based on dual systems and terminal equipment
CN113127069B (en) * 2019-12-31 2023-08-22 成都鼎桥通信技术有限公司 Dual-system-based location service management method and device and terminal equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication