CN111488247A - High-availability method and device for multiple fault tolerance of management and control nodes

Info

Publication number
CN111488247A
Authority
CN
China
Prior art keywords: control node, management, control, node, failed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010277503.9A
Other languages
Chinese (zh)
Other versions
CN111488247B (en)
Inventor
赵胜龑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zstack Information Technology Co ltd
Original Assignee
Shanghai Zstack Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zstack Information Technology Co ltd
Priority to CN202010277503.9A
Publication of CN111488247A
Application granted
Publication of CN111488247B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The method comprises: establishing a management and control service system according to a master management and control node and a slave management and control node on an application layer, wherein the master management and control node and the slave management and control node each comprise a pair of management and control nodes protected by FT (Fault Tolerance), and each pair of FT management and control nodes comprises a primary FT management and control node and a secondary FT management and control node; determining the failed FT management and control node in the management and control service system and the FT management and control node where the virtual access address is located; and performing fault tolerance processing on the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located. With the method, the recovery time of the externally provided service can be kept at the second level even under multiple fault tolerance, so that the requirements of a guaranteed recovery time and multiple fault tolerance are satisfied at the same time.

Description

High-availability method and device for multiple fault tolerance of management and control nodes
Technical Field
The application relates to the field of computers, and in particular to a high-availability method and device for multiple fault tolerance of management and control nodes.
Background
The management and control node of a cloud management platform is the central node that distributes and manages all kinds of cloud resources, so its availability is extremely important. A traditional management and control node usually runs on a single server and therefore suffers from a single point of failure: when that server fails (for example, through a power or network failure), there is a risk that the management and control node becomes inaccessible.
In a production environment, the larger the cluster, the higher the requirement on the high availability of the cloud management nodes; in some special fields such as finance, where high-frequency operation is required, even higher requirements are placed on the high availability of the management and control nodes. The solutions currently adopted in the industry address the high-availability requirement to some extent but still have drawbacks. One class of schemes spends extra time on heartbeat detection, virtual machine operating system startup and management node startup, and these delays accumulate; during this period the management node cannot provide external access, and it usually takes several minutes to restore access. Another class of schemes cannot, owing to the logic of the database synchronization mechanism, allow more than 2 nodes to act as master nodes at the same time; only 2 nodes can therefore be online simultaneously, so such a scheme can tolerate only 1 failure.
Current solutions therefore either guarantee multiple fault tolerance at the cost of recovery time, or guarantee recovery time at the cost of multiple fault tolerance; it is difficult to satisfy both requirements, a guaranteed recovery time and multiple fault tolerance, at the same time.
Disclosure of Invention
An object of the present application is to provide a high-availability method and apparatus for multiple fault tolerance of management and control nodes, which solve the prior-art problem that a management and control node can hardly guarantee the recovery time and provide multiple fault tolerance at the same time.
According to one aspect of the application, a high-availability method for multiple fault tolerance of management and control nodes is provided, the method comprising the following steps:
establishing a management and control service system according to a master management and control node and a slave management and control node on an application layer, wherein the master management and control node and the slave management and control node each comprise a pair of management and control nodes protected by FT (Fault Tolerance), and each pair of FT management and control nodes comprises a primary FT management and control node and a secondary FT management and control node;
determining failed FT control nodes in the control service system and FT control nodes where virtual access addresses are located;
and carrying out fault tolerance processing on the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located.
Further, the primary FT management and control node and the secondary FT management and control node contain the same data content and the corresponding databases are encapsulated in the respective virtual machines.
Further, determining the failed FT management and control node in the management and control service system and the FT management and control node where the virtual access address is located includes:
positioning a failed physical host in the management and control service system, and determining that a virtual machine on the failed physical host is a failed FT management and control node;
determining the position of a main control node on an application layer, and determining an FT control node where a virtual access address in the control service system is located according to the position of the main control node.
Further, performing fault tolerance processing on the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located includes:
if the primary FT control node in the master control node on the application layer is a failed FT control node, switching the secondary FT control node protected by the same FT as the primary FT control node to be the primary FT control node, and meanwhile taking the failed FT control node offline;
automatically docking through a network card protecting the FT outer layer of the failed FT control node, and forwarding a data packet to the virtual access address through the network card;
and searching a physical machine meeting the condition in the cluster where the main control node is located by protecting the FT of the failed FT control node, so as to create a new secondary FT control node on the physical machine meeting the condition.
Further, performing fault tolerance processing on the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located includes:
if a failed FT control node exists in each pair of FT control nodes protected by FT, judging whether the failed FT control node is a primary FT control node, if so, switching a secondary FT control node protected by the same FT as the primary FT control node into a primary FT control node, and meanwhile, taking the failed FT control node off line;
automatically docking a network card on the outer layer of the FT corresponding to the FT control node where the virtual access address is located, and forwarding a data packet to the virtual access address through the network card;
and searching a physical machine meeting the condition in the cluster where the main control node is located by protecting the FT of the failed FT control node, so as to create a new secondary FT control node on the physical machine meeting the condition.
Further, performing fault tolerance processing on the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located includes:
if the primary FT control node and the secondary FT control node in the slave control node on the application layer are both failed FT control nodes, continuing to complete the fault-tolerant processing of the control service system through the primary FT control node and the secondary FT control node in the master control node on the application layer.
Further, performing fault tolerance processing on the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located includes:
if the primary FT control node and the secondary FT control node on the master control node are both failed FT control nodes, switching the virtual access address to the primary FT control node in the slave control node;
and continuing to complete the fault tolerance processing of the management and control service system through the primary FT management and control node and the secondary FT management and control node in the slave management and control node.
Further, performing fault tolerance processing on the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located includes:
if the primary FT control node and the secondary FT control node on the master control node have both failed and one FT control node in the slave control node has failed, switching the virtual access address to the non-failed FT control node in the slave control node, and continuing the fault tolerance processing of the control service system through the non-failed FT control node where the virtual access address is now located.
Further, performing fault tolerance processing on the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located includes:
and if the primary FT control node and the secondary FT control node on the slave control node are both failed and one FT control node in the master control node is failed, continuing the fault tolerance processing of the control service system through the rest non-failed FT control nodes.
According to another aspect of the present application, there is also provided a high-availability apparatus for multiple fault tolerance of management and control nodes, the apparatus including:
the system comprises a building device and a service management and control device, wherein the building device is used for building a management and control service system according to a master management and control node and a slave management and control node on an application layer, the master management and control node and the slave management and control node respectively comprise a pair of FT management and control nodes protected by FT, and each pair of FT management and control nodes comprises a master FT management and control node and a secondary FT management and control node;
the determining device is used for determining the failed FT control node in the control service system and the FT control node where the virtual access address is located;
and the fault-tolerant processing device is used for carrying out fault-tolerant processing on the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located.
According to another aspect of the present application, there is also provided a high-availability apparatus for multiple fault tolerance of management and control nodes, the apparatus including:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method as previously described.
According to yet another aspect of the present application, there is also provided a computer readable medium having computer readable instructions stored thereon, the computer readable instructions being executable by a processor to implement the method as described above.
Compared with the prior art, the present application establishes a management and control service system according to a master management and control node and a slave management and control node on an application layer, wherein the master management and control node and the slave management and control node each comprise a pair of management and control nodes protected by FT, and each pair of FT management and control nodes comprises a primary FT management and control node and a secondary FT management and control node; determines the failed FT management and control node in the management and control service system and the FT management and control node where the virtual access address is located; and performs fault tolerance processing on the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located. In this way, the recovery time of the externally provided service can be kept at the second level even under multiple fault tolerance, so that the requirements of a guaranteed recovery time and multiple fault tolerance are satisfied at the same time.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates a flow diagram of a high-availability method for multiple fault tolerance of management and control nodes provided in accordance with an aspect of the present application;
fig. 2 is a schematic diagram illustrating an architecture of a management node service system according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a failure condition of 1 physical node in one embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a first case where 2 physical nodes fail in one embodiment of the present application;
FIG. 5 is a diagram illustrating a second scenario in which 2 physical nodes fail in one embodiment of the present application;
FIG. 6 is a diagram illustrating a third scenario in which 2 physical nodes fail in one embodiment of the present application;
FIG. 7 is a diagram illustrating a first scenario in which 3 physical nodes fail in one embodiment of the present application;
FIG. 8 is a diagram illustrating a second scenario in which 3 physical nodes fail in one embodiment of the present application;
fig. 9 is a schematic structural diagram of a high-availability device for multiple fault tolerance of management and control nodes according to another aspect of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (e.g., Central Processing Units (CPUs)), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change RAM (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory media, such as modulated data signals and carrier waves.
Fig. 1 is a flow chart illustrating a high-availability method for multiple fault tolerance of management and control nodes according to an aspect of the present application, the method comprising step S11, step S12 and step S13.
in step S11, a management and control service system is established according to a master management and control node and a slave management and control node on an application layer, where the master management and control node and the slave management and control node each include a pair of FT management and control nodes protected by FT, and each pair of FT management and control nodes includes a master FT management and control node and a slave FT management and control node; here, the management and control node service (mn services) is composed of a plurality of internal sub-services, the service is encapsulated into a mirror image of a virtual machine, and then a pair of virtual machines protected by FT is created respectively by using the mirror image, thereby forming a management and control node service commonly supported by 4 virtual machines online at the same time; meanwhile, in order to disperse risks, 4 virtual machines are deployed on 4 physical nodes, and FT is Fault Tolerance (Fault Tolerance). The primary FT control node and the secondary FT control node contain the same data content, and corresponding databases are packaged in corresponding virtual machines. Specifically, 4 virtual machines are deployed, one pair of each virtual machine is a master pair and a slave pair, as shown in fig. 2, mn1, mn2, mn3 and mn4 all serve as management and control node services, and are encapsulated into the respective virtual machines together with respective database services, and each pair of mn and a corresponding database constitute a management and control node service for external support access. The PVM and the SVM respectively represent a primary virtual machine (a primary VM) and a secondary virtual machine (a secondary VM) protected by FT, and the contents of a pair of the primary VM and the secondary VM protected by FT always keep the same, that is, the contents of each pair of FT management nodes including the primary FT management node and the secondary FT management node always keep the same. It should be noted that, there are 2 layers of synchronization mechanisms in the management and control service system to ensure 4 nodes to be synchronized, one layer is an FT mechanism (bottom layer virtualization) and the other layer is a master-slave mechanism of a database (application layer), as shown in fig. 2, mn1 and mn2 are a pair of FT virtual machines, and mn3 and mn4 are another pair of FT virtual machines; for the application layer, the current 2 nodes are mn2 and mn4, and mn2 also has a master-slave relationship mn1 in the virtualization layer, and mn3 also has a master-slave relationship mn4 in the virtualization layer.
In step S12, the failed FT management and control node in the management and control service system and the FT management and control node where the virtual access address is located are determined. Here, the virtual access address (vip) is the entry IP through which the management and control node is accessed externally. At the application layer the system sees only the nodes formed by the 2 PVMs; a service inside the system provides the vip, calculates which PVM node is the master node, and configures the vip on that node. The FT management and control node where the vip is located is therefore determined by calculating which management and control node at the application layer is the master management and control node. The failed FT management and control node is a node that has failed, and may be any one, or any combination, of the two FT-protected pairs of primary and secondary FT management and control nodes, that is, any one or any combination of mn1, mn2, mn3 and mn4.
In step S13, fault tolerance processing of the management and control service system is performed according to the failed FT management and control node and the FT management and control node where the virtual access address is located. Here, subsequent fault-tolerant processing can be performed according to the failed FT management and control node that was identified and the management and control node where the vip was determined to be. The fault-tolerant processing depends on the number and location of the failed FT management and control nodes and on the location of the vip. For example, if exactly 1 FT management and control node fails and it is the PVM on the master management and control node where the vip resides, the system can recover by itself through FT and restore its original fault-tolerance capacity, or, with 3 nodes still working simultaneously, it can still tolerate 2 further failures. In other words, after a failure there is at most one vip switch in the worst case; all services are already prepared after the switch and no restart of the system is needed, and even when a vip switch does occur it takes only about 1 s, which is almost imperceptible to the user.
With this method, nodes can be checked and recovered automatically through FT: each failed node automatically finds a node that satisfies the revival conditions, restarts there and recovers, and the whole process is imperceptible to the management and control node. Using this method, at least 3 failures can be tolerated, and if only 1 node fails, or only 1 node of an active/standby pair fails, FT can find an opportunity to automatically restore the full complement of 4 "lives".
In an embodiment of the present application, in step S12, the failed physical host in the management and control service system is located, and the virtual machine on the failed physical host is determined to be a failed FT management and control node; the position of the master management and control node on the application layer is determined, and the FT management and control node where the virtual access address in the management and control service system is located is determined according to the position of the master management and control node. Here, with reference to Fig. 2, the number and positions of the failed physical nodes in the system are determined, for example 1, 2 or 3 failed physical nodes; for instance, with 1 failed physical node it is determined whether the failure is the PVM on the master management and control node. It is calculated whether the application-layer management and control node formed by mn1 and mn2, or the one formed by mn3 and mn4, is the master management and control node, and the FT management and control node where the vip is located is determined accordingly; for example, if the node formed by mn1 and mn2 is calculated to be the master management and control node, the FT management and control node where the vip is located is determined to be mn2. Dbsync denotes the data-synchronization process between virtual machines: the management and control nodes on the 2 PVMs act as master and standby nodes of each other's databases, but at any given time only one master node provides external access, namely the node where the virtual access address (vip) is located.
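As a hedged illustration of step S12 (reusing the hypothetical data model sketched above, not the patent's actual implementation), failure detection and vip localization could be expressed roughly as follows; failed_hosts stands in for whatever host-health monitoring the platform actually provides.

```python
def detect_failed_nodes(system: ControlServiceSystem, failed_hosts: set) -> list:
    """Locate failed physical hosts and mark the VMs deployed on them as failed FT nodes."""
    failed = []
    for pair in (system.master_pair, system.slave_pair):
        for node in (pair.primary, pair.secondary):
            if node.host in failed_hosts and not node.failed:
                node.failed = True
                failed.append(node)
    return failed

def locate_vip(system: ControlServiceSystem) -> FTNode:
    """The vip is configured on the PVM of whichever application-layer node is master."""
    system.vip_node = system.master_pair.primary
    return system.vip_node

# Example: host2 (running mn2, the PVM holding the vip) goes down.
failed = detect_failed_nodes(system, {"host2"})
print([n.name for n in failed], locate_vip(system).name)   # ['mn2'] mn2
```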
In an embodiment of the application, in step S13, if the primary FT management and control node in the master management and control node on the application layer is a failed FT management and control node, the secondary FT management and control node protected by the same FT as that primary FT management and control node is switched to be the primary FT management and control node, and the failed FT management and control node is taken offline; the network card at the outer FT layer of the failed FT management and control node re-attaches automatically, and data packets are forwarded to the virtual access address through that network card; and the FT protecting the failed FT management and control node searches the cluster where the master management and control node is located for a physical machine that satisfies the conditions, so as to create a new secondary FT management and control node on that physical machine. Here, when 1 physical node in the system fails and it hosts the primary FT management and control node in the master management and control node on the application layer, that is, the PVM (mn2) node where the vip is located fails, as shown in Fig. 3, the node where mn2 is located fails; mn1, which was originally the slave node, is then switched from SVM to PVM, and the original PVM goes offline at the same time. Both the vip and the IP are configured inside the virtual machine; of a pair of FT-protected virtual machines, only the PVM provides access at any given time, so when the PVM is switched, the network card at the outer FT layer re-attaches automatically, the adaptation happens at the virtualization layer, and the application layer does not notice it. The new PVM (i.e. the original mn1) keeps the internal network configuration of the original PVM (mn2), so network packets are still forwarded directly to the vip through the FT network card; an external user accesses the management and control node through the vip, and the switch at this moment is imperceptible to the user. In the scenario where 1 node fails, the FT back end searches in the background for a physical node that satisfies the FT conditions; if a healthy physical node satisfying the conditions exists in the system, a new FT slave node is created again, so the FT pair recovers by itself. If not, the environment still has 3 nodes working simultaneously and can tolerate 2 further failures. The recovery from a single node failure needs only the FT service switching time; FT switching does not require any restart, and the service is prepared from the start, so the recovery time is at the second level. The search for a node satisfying the conditions and the rebuilding of the FT pair are entirely background operations, unrelated to user access to the management and control node service, so the user perceives nothing at the application layer. The FT virtual machines are created, deleted and so on within the same cluster, and physical machines with the same configuration can be added to the same cluster. A virtual machine needs a physical machine as its host; if 1 node of an FT-protected virtual machine pair fails, FT tries to automatically re-create a candidate SVM on another physical machine that satisfies the conditions for creating an SVM. These conditions include, but are not limited to, being another physical machine in the cluster with sufficient computing resources (such as CPU and memory) on which a new SVM can be created automatically.
For example, if the same cluster contains another physical machine with the same configuration as the failed one, its CPU and memory resources are sufficient, and it has already been added to the management and control node, the management and control node automatically arranges for the SVM to be created on that physical machine.
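Purely as an illustration under the same hypothetical data model (not the patent's code), the single-failure path described above (promote the surviving SVM, take the failed PVM offline, keep the vip reachable, and let FT rebuild an SVM on an eligible host in the same cluster) might be sketched as follows; host_has_capacity is a hypothetical predicate standing in for the FT back end's resource check.

```python
def handle_single_failure(system: ControlServiceSystem, failed: FTNode,
                          cluster_hosts: list, host_has_capacity) -> None:
    """Recover from one failed FT management and control node (the scenario of Fig. 3)."""
    pair = (system.master_pair
            if failed in (system.master_pair.primary, system.master_pair.secondary)
            else system.slave_pair)
    survivor = pair.surviving_node()
    if survivor is None:
        return  # both members of the pair are gone; handled by the multi-failure logic

    # 1. If the failed VM was the PVM, switch its FT partner to PVM and take the
    #    failed node offline; the FT-layer network card re-attaches automatically,
    #    so packets sent to the vip reach the new PVM without restarting anything.
    if failed.is_primary:
        survivor.is_primary, failed.is_primary = True, False
        pair.primary, pair.secondary = survivor, failed
        if system.vip_node is failed:
            system.vip_node = survivor

    # 2. In the background, FT searches the same cluster for a physical machine with
    #    sufficient CPU/memory and rebuilds a fresh SVM there, restoring the FT pair.
    for host in cluster_hosts:
        if host != pair.primary.host and host_has_capacity(host):
            pair.secondary = FTNode(name=failed.name, host=host, is_primary=False)
            break
    # If no eligible host exists, 3 nodes keep working and 2 more failures can
    # still be tolerated; this background search is invisible to users of the vip.
```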
In an embodiment of the application, in step S13, if each pair of FT-protected management and control nodes contains one failed FT management and control node, it is determined whether the failed FT management and control node is a primary FT management and control node; if so, the secondary FT management and control node protected by the same FT is switched to be the primary FT management and control node and the failed FT management and control node is taken offline; the network card at the outer FT layer corresponding to the FT management and control node where the virtual access address is located re-attaches automatically, and data packets are forwarded to the virtual access address through that network card; and the FT protecting the failed FT management and control node searches the cluster where the master management and control node is located for a physical machine that satisfies the conditions, so as to create a new secondary FT management and control node on it. Here, when 2 physical nodes fail (the second fault-tolerance scenario), there are 3 cases. In the first case, each FT-protected pair contains one failed FT management and control node, that is, each of the 2 FT-protected pairs loses 1 virtual machine; the second fault tolerance occurs when the node that failed at the first fault tolerance was the primary FT management and control node where the vip is located and another primary FT management and control node now fails. As shown in Fig. 4, mn2 and mn4 fail; mn1 is switched to PVM and the original PVM goes offline, and FT automatically recovers to 3 or 4 nodes by searching the same cluster for physical machines that satisfy the conditions, or the situation reduces to the single-failure case described above, with the services originally deployed on the failed physical machines redeployed into the virtual machines. The recovery needs only the FT service switching time, and FT switching does not require any restart; this scenario can still tolerate 1 further failure, and the recovery time, which involves only the FT switch, stays at the second level. (A consolidated sketch of these multi-failure decisions is given after the three-node-failure discussion below.)
Continuing with the above embodiment, in the second case of 2 failed physical nodes, if the primary FT management and control node and the secondary FT management and control node in the slave management and control node on the application layer are both failed FT management and control nodes, the fault tolerance processing of the management and control service system continues to be completed through the primary FT management and control node and the secondary FT management and control node in the master management and control node on the application layer. Here, the slave management and control node on the application layer is the node where the vip is not located. When the physical machines corresponding to the pair of virtual machines not holding the vip fail, as shown in Fig. 5, mn3 and mn4 fail, and the original vip is not switched. Because the failed nodes are a complete FT-protected pair whose 2 members have both failed, FT does not recover by itself in this case; the vip does not need to be switched, the node where the vip is located is still protected by FT, and one further failure can be tolerated. The recovery involves neither an FT switch nor a vip switch, and the original network connections are not interrupted.
In an embodiment of the application, in the third case of 2 failed physical nodes, if the primary FT management and control node and the secondary FT management and control node on the master management and control node are both failed FT management and control nodes, the virtual access address is switched to the primary FT management and control node in the slave management and control node, and the fault tolerance processing of the management and control service system continues to be completed through the primary FT management and control node and the secondary FT management and control node in the slave management and control node. Here, the master management and control node is the node where the vip is located. When the physical machines corresponding to the pair of virtual machines holding the vip fail, the primary FT management and control node and the secondary FT management and control node on the master management and control node have both failed; as shown in Fig. 6, mn1 and mn2 fail, and the vip must be switched. mn3 is the PVM of the other FT-protected pair, so when mn1 and mn2 have both failed, the vip is switched to the PVM in the slave management and control node, that is, to mn3. Because the failed nodes are a complete FT-protected pair whose 2 members have both failed, FT does not recover by itself in this case, but the other pair of virtual machines is still protected by FT and can still tolerate 1 further failure; the recovery time involves only the vip switch and is at the second level. In summary, across all second-fault-tolerance cases at most one vip switch occurs, and a vip switch itself completes within seconds, so the recovery time for the second fault tolerance remains at the second level.
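A hedged sketch, under the same hypothetical model as above, of the vip switch when the entire FT pair holding the vip has failed (the case of Fig. 6):

```python
def handle_master_pair_failure(system: ControlServiceSystem) -> None:
    """Both VMs of the pair holding the vip have failed (the scenario of Fig. 6).

    FT cannot self-recover a pair that has lost both members, so the vip is moved
    to the surviving pair; only this single vip switch (about one second) is
    visible externally, and one further failure can still be tolerated.
    """
    new_home = system.slave_pair.primary
    if new_home.failed:
        new_home = system.slave_pair.surviving_node()  # fall back to the pair's SVM
    if new_home is None:
        return  # the other pair is gone too; no switch target remains
    system.vip_node = new_home
    # Bookkeeping only: the surviving pair now plays the master role at the application layer.
    system.master_pair, system.slave_pair = system.slave_pair, system.master_pair
```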
In an embodiment of the application, in step S13, if the primary FT management and control node and the secondary FT management and control node on the master management and control node have both failed and one FT management and control node in the slave management and control node has also failed, the virtual access address is switched to the non-failed FT management and control node in the slave management and control node, and the fault tolerance processing of the management and control service system continues through the non-failed FT management and control node where the virtual access address is now located. Here, once an entire FT pair has failed, only database (application-layer) synchronization still provides protection. With 3 failed physical nodes there are two cases. In the first, the primary and secondary FT management and control nodes on the master management and control node both fail and one FT management and control node in the slave management and control node also fails, that is, the failed nodes include the node where the vip is located: as shown in Fig. 7, mn1, mn2 and mn4 fail and the vip must be switched; since the recovery involves only the vip switch, the recovery time is of the order of seconds.
Continuing with the above embodiment, in the second case of 3 failed physical nodes, the primary FT management and control node and the secondary FT management and control node on the slave management and control node have both failed and one FT management and control node in the master management and control node has also failed; the fault tolerance processing of the management and control service system then continues through the remaining non-failed FT management and control node. Here the failed nodes do not include the node where the vip is located: as shown in Fig. 8, with mn2 having failed first, mn3 and mn4 then fail, so no vip switch is required; although no further fault tolerance remains, the service can still be accessed externally.
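The individual scenarios of Figs. 3 to 8 all reduce to one decision: does the vip have to move, and if so, where to. The following consolidated sketch of that decision is purely illustrative, reuses the hypothetical data model from above, and is not the patent's implementation.

```python
def plan_failover(system: ControlServiceSystem) -> str:
    """Summarize where the vip should live after any combination of failures.

    At most one vip switch is ever needed, and the service stays reachable as
    long as at least one of the four FT management and control nodes survives,
    i.e. up to 3 failures can be tolerated.
    """
    if not system.vip_node.failed:
        return "no vip switch needed; vip stays on " + system.vip_node.name

    vip_pair = (system.master_pair
                if system.vip_node in (system.master_pair.primary, system.master_pair.secondary)
                else system.slave_pair)
    other_pair = system.slave_pair if vip_pair is system.master_pair else system.master_pair

    # Prefer the FT partner of the failed node, then the other pair (its PVM first).
    for candidate in (vip_pair.surviving_node(),
                      None if other_pair.primary.failed else other_pair.primary,
                      other_pair.surviving_node()):
        if candidate is not None:
            system.vip_node = candidate
            return "switch vip to " + candidate.name + " (single switch, about 1 s)"
    return "all four FT management and control nodes have failed; service unavailable"
```

The ordering of the candidates encodes the preference described in the scenarios above: the FT partner of the failed node is tried first (that switch is handled by the FT layer itself), and only when an entire pair has been lost does the vip move to the other pair.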
It should be noted that, in all the failure situations above, the offline of any node triggers a corresponding warning to the user layer; the user can configure a receiving end and will receive the warning notification sent by the system whenever any node goes offline. Through the design of this application, FT-protected virtual machines carry multiple database-synchronized management and control nodes, so that the cloud management and control node can tolerate at least 3 failures and can recover by itself when the conditions are satisfied; meanwhile, throughout the fault tolerance process, the recovery time of the externally provided service can be kept at the second level.
In addition, an embodiment of the present application further provides a computer-readable medium, on which computer-readable instructions are stored, and the computer-readable instructions can be executed by a processor to implement the aforementioned high-availability method for multiple fault tolerance of management and control nodes.
In correspondence with the method described above, the present application also provides a terminal, which includes modules or units capable of executing the steps of the method described in Fig. 1 or in any of the embodiments; these modules or units can be implemented by hardware, software or a combination of hardware and software, and the present application is not limited in this respect. For example, in an embodiment of the present application, there is also provided an apparatus for the high-availability method for multiple fault tolerance of management and control nodes, the apparatus including:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method as previously described.
For example, the computer readable instructions, when executed, cause the one or more processors to:
establishing a management and control service system according to a master management and control node and a slave management and control node on an application layer, wherein the master management and control node and the slave management and control node each comprise a pair of management and control nodes protected by FT (Fault Tolerance), and each pair of FT management and control nodes comprises a primary FT management and control node and a secondary FT management and control node;
determining failed FT control nodes in the control service system and FT control nodes where virtual access addresses are located;
and carrying out fault tolerance processing on the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located.
Fig. 9 is a schematic structural diagram of a high-availability device for multiple fault tolerance of management and control nodes according to another aspect of the present application, where the device includes: a construction device 11, a determination device 12 and a fault-tolerant processing device 13. The construction device 11 is configured to construct a management and control service system according to a master management and control node and a slave management and control node on an application layer, wherein the master management and control node and the slave management and control node each comprise a pair of FT-protected management and control nodes, and each pair of FT management and control nodes comprises a primary FT management and control node and a secondary FT management and control node; the determination device 12 is configured to determine the failed FT management and control node in the management and control service system and the FT management and control node where the virtual access address is located; and the fault-tolerant processing device 13 is configured to perform fault-tolerant processing on the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located.
It should be noted that the content executed by the building device 11, the determining device 12 and the fault-tolerant processing device 13 is the same as or corresponding to the content in the above steps S11, S12 and S13, and for brevity, the description is omitted here.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Program instructions which invoke the methods of the present application may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (12)

1. A high-availability method for multiple fault tolerance of management and control nodes, the method comprising:
establishing a management and control service system according to a master management and control node and a slave management and control node on an application layer, wherein the master management and control node and the slave management and control node each comprise a pair of management and control nodes protected by FT (Fault Tolerance), and each pair of FT management and control nodes comprises a primary FT management and control node and a secondary FT management and control node;
determining failed FT control nodes in the control service system and FT control nodes where virtual access addresses are located;
and carrying out fault tolerance processing on the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located.
2. The method of claim 1, wherein the primary FT management and control node and the secondary FT management and control node contain the same data content and the corresponding databases are encapsulated in the respective virtual machines.
3. The method according to claim 1 or 2, wherein determining the failed FT management and control node in the management and control service system and the FT management and control node where the virtual access address is located comprises:
positioning a failed physical host in the management and control service system, and determining that a virtual machine on the failed physical host is a failed FT management and control node;
determining the position of a main control node on an application layer, and determining an FT control node where a virtual access address in the control service system is located according to the position of the main control node.
4. The method according to claim 3, wherein performing fault tolerance processing of the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located includes:
if the primary FT control node in the master control node on the application layer is a failed FT control node, switching the secondary FT control node protected by the same FT as the primary FT control node to be the primary FT control node, and meanwhile taking the failed FT control node offline;
automatically docking through a network card protecting the FT outer layer of the failed FT control node, and forwarding a data packet to the virtual access address through the network card;
and searching a physical machine meeting the condition in the cluster where the main control node is located by protecting the FT of the failed FT control node, so as to create a new secondary FT control node on the physical machine meeting the condition.
5. The method according to claim 3, wherein performing fault tolerance processing of the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located includes:
if a failed FT control node exists in each pair of FT control nodes protected by FT, judging whether the failed FT control node is a primary FT control node, if so, switching a secondary FT control node protected by the same FT as the primary FT control node into a primary FT control node, and meanwhile, taking the failed FT control node off line;
automatically docking a network card on the outer layer of the FT corresponding to the FT control node where the virtual access address is located, and forwarding a data packet to the virtual access address through the network card;
and searching a physical machine meeting the condition in the cluster where the main control node is located by protecting the FT of the failed FT control node, so as to create a new secondary FT control node on the physical machine meeting the condition.
6. The method according to claim 3, wherein performing fault tolerance processing of the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located includes:
if the primary FT control node and the secondary FT control node in the slave control node on the application layer are both failed FT control nodes, continuing to complete the fault-tolerant processing of the control service system through the primary FT control node and the secondary FT control node in the master control node on the application layer.
7. The method according to claim 3, wherein performing fault tolerance processing of the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located includes:
if the primary FT control node and the secondary FT control node on the master control node are both failed FT control nodes, switching the virtual access address to the primary FT control node in the slave control node;
and continuing to complete the fault tolerance processing of the management and control service system through the primary FT management and control node and the secondary FT management and control node in the slave management and control node.
8. The method according to claim 3, wherein performing fault tolerance processing of the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located includes:
if the primary FT control node and the secondary FT control node on the master control node have both failed and one FT control node in the slave control node has failed, switching the virtual access address to the non-failed FT control node in the slave control node, and continuing the fault tolerance processing of the control service system through the non-failed FT control node where the virtual access address is now located.
9. The method according to claim 3, wherein performing fault tolerance processing of the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located includes:
and if the primary FT control node and the secondary FT control node on the slave control node are both failed and one FT control node in the master control node is failed, continuing the fault tolerance processing of the control service system through the rest non-failed FT control nodes.
10. A high-availability apparatus for multiple fault tolerance of management and control nodes, the apparatus comprising:
the system comprises a building device and a service management and control device, wherein the building device is used for building a management and control service system according to a master management and control node and a slave management and control node on an application layer, the master management and control node and the slave management and control node respectively comprise a pair of FT management and control nodes protected by FT, and each pair of FT management and control nodes comprises a master FT management and control node and a secondary FT management and control node;
the determining device is used for determining the failed FT control node in the control service system and the FT control node where the virtual access address is located;
and the fault-tolerant processing device is used for carrying out fault-tolerant processing on the management and control service system according to the failed FT management and control node and the FT management and control node where the virtual access address is located.
11. A high-availability apparatus for multiple fault tolerance of management and control nodes, the apparatus comprising:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method of any of claims 1 to 9.
12. A computer readable medium having computer readable instructions stored thereon which are executable by a processor to implement the method of any one of claims 1 to 9.
CN202010277503.9A 2020-04-08 2020-04-08 High availability method and equipment for managing and controlling multiple fault tolerance of nodes Active CN111488247B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010277503.9A CN111488247B (en) 2020-04-08 2020-04-08 High availability method and equipment for managing and controlling multiple fault tolerance of nodes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010277503.9A CN111488247B (en) 2020-04-08 2020-04-08 High availability method and equipment for managing and controlling multiple fault tolerance of nodes

Publications (2)

Publication Number Publication Date
CN111488247A true CN111488247A (en) 2020-08-04
CN111488247B CN111488247B (en) 2023-07-25

Family

ID=71797869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010277503.9A Active CN111488247B (en) 2020-04-08 2020-04-08 High availability method and equipment for managing and controlling multiple fault tolerance of nodes

Country Status (1)

Country Link
CN (1) CN111488247B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101859263A (en) * 2010-06-12 2010-10-13 中国人民解放军国防科学技术大学 Quick communication method between virtual machines supporting online migration
CN103778031A (en) * 2014-01-15 2014-05-07 华中科技大学 Distributed system multilevel fault tolerance method under cloud environment
CN104360943A (en) * 2014-11-11 2015-02-18 浪潮电子信息产业股份有限公司 Resource guarantee model of service-oriented architecture
CN104536842A (en) * 2014-12-17 2015-04-22 中电科华云信息技术有限公司 Virtual machine fault-tolerant method based on KVM virtualization
CN105743995A (en) * 2016-04-05 2016-07-06 北京轻元科技有限公司 Transplantable high-available container cluster deploying and managing system and method
CN107992351A (en) * 2016-10-26 2018-05-04 阿里巴巴集团控股有限公司 A kind of hardware resource distribution method and device, electronic equipment
US20190102265A1 (en) * 2017-03-23 2019-04-04 Dh2I Company Highly available stateful containers in a cluster environment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
周利敏; 傅妍芳; 高武奇; 高祥; 程兵: "Research on high availability technology based on a cloud simulation platform" (基于云仿真平台的高可用技术研究), no. 04 *
王伟成; 罗宇: "Fault-tolerant technology for spaceborne parallel computers based on a distributed architecture" (基于分布式架构的星载并行计算机容错技术), 计算机工程与科学 (Computer Engineering & Science), no. 03 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113157393A (en) * 2021-04-09 2021-07-23 上海云轴信息科技有限公司 Method and device for managing high availability of nodes
CN113595899A (en) * 2021-06-30 2021-11-02 上海云轴信息科技有限公司 Method and system for realizing multi-node point cloud routing

Also Published As

Publication number Publication date
CN111488247B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
US11163653B2 (en) Storage cluster failure detection
RU2751551C1 (en) Method and apparatus for restoring disrupted operating ability of a unit, electronic apparatus and data storage medium
WO2017219857A1 (en) Data processing method and device
TW523656B (en) Method and apparatus for building and managing multi-clustered computer systems
CN109005045B (en) Main/standby service system and main node fault recovery method
CN107241430A (en) A kind of enterprise-level disaster tolerance system and disaster tolerant control method based on distributed storage
US8943082B2 (en) Self-assignment of node identifier in a cluster system
CN109446169B (en) Double-control disk array shared file system
US20110219263A1 (en) Fast cluster failure detection
CN109918360A (en) Database platform system, creation method, management method, equipment and storage medium
RU2643642C2 (en) Use of cache memory and another type of memory in distributed memory system
CN111488247B (en) High availability method and equipment for managing and controlling multiple fault tolerance of nodes
US8015432B1 (en) Method and apparatus for providing computer failover to a virtualized environment
CN110557413A (en) Business service system and method for providing business service
CN112307045A (en) Data synchronization method and system
US20120143829A1 (en) Notification of configuration updates in a cluster system
CN112256477A (en) Virtualization fault-tolerant method and device
CN114443768A (en) Main/standby switching method and device of distributed database and readable storage medium
CN107528703B (en) Method and equipment for managing node equipment in distributed system
CN116389233B (en) Container cloud management platform active-standby switching system, method and device and computer equipment
CN113596195B (en) Public IP address management method, device, main node and storage medium
US20190124145A1 (en) Method and apparatus for availability management
CN113157392B (en) High-availability method and equipment for mirror image warehouse
WO2012072644A1 (en) Validation of access to a shared data record subject to read and write access by multiple requesters
CN112202601B (en) Application method of two physical node mongo clusters operated in duplicate set mode

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant