CN114610545A

CN114610545A - Method, system, device and medium for reducing single point of failure of private cloud computing

Info

Publication number: CN114610545A
Application number: CN202210284261.5A
Authority: CN
Inventors: 王则陆; 刘毅枫; 马晓光
Original assignee: Xian Chaoyue Shentai Information Technology Co Ltd
Current assignee: Xian Chaoyue Shentai Information Technology Co Ltd
Priority date: 2022-03-22
Filing date: 2022-03-22
Publication date: 2022-06-10

Abstract

The invention relates to the technical field of cloud computing and discloses a method, system, device and medium for reducing single points of failure in private cloud computing. The method comprises the following steps: deploying management software on each computing node in a cluster, and synchronizing the management information of every computing node in the cluster through the cluster file system of the management software; establishing an external voting service process, and electing a voting subsystem from the cluster through the external voting service; electing, by an external arbitrator, a virtual master node from the voting subsystem to execute management and distribution commands in the cluster; and, in response to a single point of failure of the virtual master node, electing, by the external arbitrator, one of the remaining computing nodes in the voting subsystem as a new virtual master node to perform the tasks. The method effectively reduces single points of failure in private cloud computing.

Description

Method, system, device and medium for reducing single point of failure of private cloud computing

Technical Field

The invention relates to the technical field of private cloud computing, and in particular to a method, system, device and medium for reducing single points of failure in private cloud computing.

Background

In recent years, with the rapid development of cloud computing technology, private clouds have become increasingly widely used. A private cloud is built for the exclusive use of a single customer and therefore provides the most effective control over data, security and quality of service. A private cloud is a dedicated set of infrastructure in which virtual and physical resources, including virtual machines, physical machines, storage and networks, are managed by a unified management platform; the stability of this management platform directly determines the stability of the cloud platform.

For private clouds, there are currently two high-availability modes for the management platform: hot standby and cold standby. In hot standby mode, several management platforms are started in one private cloud platform; one of them is the primary and the others are standbys. When the primary management platform crashes unexpectedly, or the physical machine hosting it goes down, one of the standby management platforms is promoted to primary, thereby achieving high availability. In cold standby mode, a single management platform is started in the private cloud platform and its state is continuously monitored through the underlying virtualization layer; when the management platform crashes unexpectedly, the virtualization platform restarts the virtual machine hosting the management platform, thereby achieving high availability. Both modes provide high availability, but in both the management platform only becomes usable again after 3 minutes, during which users cannot log in to the management interface to operate; this affects the scheduling of the cloud platform as well as users' management and maintenance work.

Disclosure of Invention

In view of this, the present invention provides a method, system, device and medium for reducing single points of failure in private cloud computing. The method deploys management software on all computing nodes in the private cloud, so that each computing node acts as both a management node and a computing node. There is no separate management node in the cluster, and all computing nodes have equal status, with no primary/secondary distinction. Because every computing node can serve as a management node and all of them expose the same management interface, single points of failure of the management node are eliminated and continuous high availability of management is guaranteed.

Based on the above objectives, one aspect of the embodiments of the present invention provides a method for reducing single points of failure in private cloud computing, comprising the following steps: deploying management software on each computing node in a cluster, and synchronizing the management information of every computing node in the cluster through the cluster file system of the management software; establishing an external voting service process, and electing a voting subsystem from the cluster through the external voting service; electing, by an external arbitrator, a virtual master node from the voting subsystem to execute management and distribution commands in the cluster; and, in response to a single point of failure of the virtual master node, electing, by the external arbitrator, one of the remaining computing nodes in the voting subsystem as a new virtual master node to perform the tasks.

In some embodiments, deploying management software on each computing node in the cluster and synchronizing the management information of each computing node through the cluster file system of the management software comprises: in response to a user logging in to a computing node and performing an operation, that computing node allocates and sends tasks to the other computing nodes in the cluster and updates the management information to obtain new management information, and the new management information is synchronized to the other computing nodes in the cluster through the cluster file system.

In some embodiments, deploying management software on each computing node in the cluster and synchronizing the management information of each computing node through the cluster file system of the management software comprises: each computing node in the cluster stores its management information in the embedded database sqlite.

In some embodiments, deploying management software on each computing node in the cluster and synchronizing the management information of each computing node through the cluster file system of the management software further comprises: a FUSE file system maps the management information of the computing node to a configuration file in memory, so as to store that node's management information; and corosync synchronizes the configuration files of the computing node to the other computing nodes in the cluster in real time, thereby synchronizing the management information of every computing node in the cluster.
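The real-time synchronization step can be illustrated with a small in-process simulation. This is a sketch under stated assumptions, not the patented implementation: the `OrderedBus` class is a hypothetical stand-in for corosync's group messaging, modelled only as delivering every write to every node in one global order, and the path `vm/101.conf` is illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """One computing node holding an in-memory view of the configuration files."""
    name: str
    files: Dict[str, str] = field(default_factory=dict)

    def apply(self, path: str, content: str) -> None:
        # Apply a delivered write to this node's in-memory copy.
        self.files[path] = content


class OrderedBus:
    """Hypothetical stand-in for corosync: every message is delivered to every
    member in one global order, so all nodes apply the same sequence of writes."""

    def __init__(self, members: List[Node]) -> None:
        self.members = members
        self.log: List[Tuple[str, str]] = []  # global delivery order

    def broadcast(self, path: str, content: str) -> None:
        self.log.append((path, content))
        for node in self.members:
            node.apply(path, content)


def write_config(bus: OrderedBus, path: str, content: str) -> None:
    # The originating node does not write locally first; it broadcasts the
    # write and applies it on delivery, exactly like every other member.
    bus.broadcast(path, content)
```

Because no node applies a write before it is delivered, all nodes apply the same sequence of writes and end up with identical configuration files.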

In some embodiments, establishing the external voting service process and electing the voting subsystem from the cluster through the external voting service comprises: the external voting service process grants each computing node in the cluster two votes; whether a computing node belongs to the voting subsystem is determined from its running state and the votes it has received, and a computing node that is running normally and has received votes from computing nodes other than itself is determined to be in the voting subsystem.
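The membership rule for the voting subsystem can be expressed as a short function. This sketch assumes that the "two votes" are two ballots each node may cast for other nodes, that ballots from failed nodes are ignored, and that ballots arrive as a mapping from voter to candidates; none of these details is specified in the description, and `voting_subsystem` is a hypothetical name.

```python
from typing import Dict, List, Set

VOTES_PER_NODE = 2  # each computing node holds two votes, per the description


def voting_subsystem(running: Set[str], ballots: Dict[str, List[str]]) -> Set[str]:
    """Return the nodes that qualify for the voting subsystem.

    running: nodes observed by the voting service as running normally.
    ballots: voter -> candidates it voted for (hypothetical input format).
    """
    received: Dict[str, Set[str]] = {}
    for voter, candidates in ballots.items():
        if voter not in running:
            continue  # assumption: a failed node cannot cast votes
        if len(candidates) > VOTES_PER_NODE:
            raise ValueError(f"{voter} cast more than {VOTES_PER_NODE} votes")
        for candidate in candidates:
            if candidate != voter:  # a vote for oneself does not count
                received.setdefault(candidate, set()).add(voter)
    # A node qualifies if it is running and received a vote from another node.
    return {node for node in running if received.get(node)}
```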

In some embodiments, electing, by the external arbitrator, a virtual master node from the voting subsystem to execute management and distribution commands in the cluster comprises: the external arbitrator counts the votes obtained by each computing node in the voting subsystem and sorts the nodes by vote count from high to low; the computing node ranked first is selected as the virtual master node, which executes the cluster's automatically executed commands, including management and distribution commands.
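The election step (count votes, sort from high to low, take the first-ranked node) might be sketched as follows. The tie-break by node name and the choice to count only ballots cast by members of the voting subsystem are assumptions, as are the function names.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def rank_candidates(subsystem: Set[str], ballots: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Count the votes received by each member of the voting subsystem and sort
    from high to low. Ties are broken by node name (an assumption; the
    description does not specify a tie-break rule)."""
    tally: Counter = Counter()
    for voter, candidates in ballots.items():
        if voter not in subsystem:
            continue  # assumption: only subsystem members' ballots are counted
        for candidate in set(candidates):
            if candidate in subsystem and candidate != voter:
                tally[candidate] += 1
    ranking = [(node, tally[node]) for node in subsystem]
    return sorted(ranking, key=lambda item: (-item[1], item[0]))


def elect_virtual_master(subsystem: Set[str], ballots: Dict[str, List[str]]) -> str:
    ranking = rank_candidates(subsystem, ballots)
    if not ranking:
        raise RuntimeError("voting subsystem is empty; no master can be elected")
    return ranking[0][0]  # the first-ranked node becomes the virtual master
```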

In some embodiments, electing, by the external arbitrator, one of the remaining computing nodes in the voting subsystem as a new virtual master node to perform the tasks in response to a single point of failure of the virtual master node comprises: in response to the single point of failure, removing the failed virtual master node from the voting subsystem to obtain a new voting subsystem; initiating, by the external voting service process, a new round of voting in the new voting subsystem; counting, by the external arbitrator, the votes obtained by each computing node in the new voting subsystem and sorting them from high to low; and selecting the computing node ranked first as the new virtual master node.
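Failover repeats the same election on the reduced voting subsystem. In this sketch, `new_round` is a hypothetical callback representing the new round of voting initiated by the external voting service; the tally rule (only members' ballots, no self-votes) and the name-based tie-break are assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

Ballots = Dict[str, List[str]]


def elect(subsystem: Set[str], ballots: Ballots) -> str:
    # Most votes wins; ties are broken by node name (assumed rule).
    tally = Counter(c for v, cs in ballots.items() if v in subsystem
                    for c in set(cs) if c in subsystem and c != v)
    return min(subsystem, key=lambda node: (-tally[node], node))


def fail_over(subsystem: Set[str], failed_master: str,
              new_round: Callable[[Set[str]], Ballots]) -> Tuple[Set[str], str]:
    """Remove the failed master, run a new voting round among the remaining
    members, and elect the new virtual master from them."""
    remaining = subsystem - {failed_master}
    if not remaining:
        raise RuntimeError("no remaining node can take over")
    ballots = new_round(remaining)
    return remaining, elect(remaining, ballots)
```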

In another aspect of the embodiments of the present invention, a system for reducing single points of failure in private cloud computing is further provided, comprising the following modules: a first module configured to deploy management software on each computing node in a cluster and to synchronize the management information of each computing node in the cluster through a cluster file system of the management software; a second module configured to establish an external voting service process and to elect a voting subsystem from the cluster through the external voting service; a third module configured to elect, by an external arbitrator, a virtual master node from the voting subsystem to execute management and distribution commands in the cluster; and a fourth module configured to, in response to a single point of failure of the virtual master node, elect, by the external arbitrator, one of the remaining computing nodes in the voting subsystem as a new virtual master node to perform the tasks.

In another aspect of the embodiments of the present invention, a computer device is further provided, comprising at least one processor and a memory storing computer instructions executable on the processor, the instructions, when executed by the processor, implementing the steps of any of the methods described above.

In another aspect of the embodiments of the present invention, a computer-readable storage medium is further provided, which stores a computer program that, when executed by a processor, implements the steps of any one of the methods described above.

The invention has at least the following beneficial effects: by deploying management software on each computing node in the cluster, the method synchronizes the management information of every computing node in the cluster; and by establishing an external voting service process, it elects a virtual master node in the cluster that executes the automatically executed commands, which avoids cluster split-brain. Because every computing node in the cluster performs both computation and management, single points of failure in private cloud computing are effectively reduced.

Drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can derive other embodiments from them without creative effort.

Fig. 1 is a schematic diagram of an embodiment of a method for reducing single points of failure in private cloud computing according to the present invention;

Fig. 2 is a schematic diagram of a cluster file system according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the cluster voting service in an embodiment of the method for reducing single points of failure in private cloud computing according to the present invention;

Fig. 4 is a schematic diagram of cluster management in an embodiment of the method for reducing single points of failure in private cloud computing according to the present invention;

Fig. 5 is a schematic diagram of an embodiment of a system for reducing single points of failure in private cloud computing according to the present invention;

Fig. 6 is a schematic diagram of one embodiment of a computer device;

Fig. 7 is a schematic diagram of an embodiment of a computer-readable storage medium provided by the present invention.

Detailed Description

Embodiments of the present invention are described below. However, it is to be understood that the disclosed embodiments are merely examples and that other embodiments may take various and alternative forms.

In addition, it should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two non-identical entities with the same name, or non-identical parameters; "first" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention, and this is not repeated in the following embodiments. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

One or more embodiments of the present application will be described below with reference to the accompanying drawings.

Based on the above objectives, a first aspect of the embodiments of the present invention provides an embodiment of a method for reducing single points of failure in private cloud computing. Fig. 1 is a schematic diagram of an embodiment of this method according to the present invention. As shown in Fig. 1, the method according to an embodiment of the present invention comprises the following steps:

S1, deploying management software on each computing node in the cluster, and synchronizing the management information of every computing node in the cluster through the cluster file system of the management software;

S2, establishing an external voting service process, and electing a voting subsystem from the cluster through the external voting service;

S3, electing, by an external arbitrator, a virtual master node from the voting subsystem to execute management and distribution commands in the cluster;

S4, in response to a single point of failure of the virtual master node, electing, by the external arbitrator, one of the remaining computing nodes in the voting subsystem as a new virtual master node to perform the tasks.

In another embodiment of the method for reducing single points of failure in private cloud computing provided by the present invention, the method is implemented on the basis of the open-source KVM virtualization technology. It can be understood that, in practical applications, the method is not limited to KVM and may also be implemented with other virtualization technologies.

As shown in Fig. 2, management software is deployed on each computing node in the cluster, and the management information of each computing node is stored through a database-backed file system. Each computing node in the cluster therefore performs both computation and management, all computing nodes have equal status with no primary/secondary distinction, and all computing nodes expose the same management interface. To ensure performance, the database is the embedded database sqlite, and the management information stored in sqlite is mapped by a FUSE file system to configuration files in memory, which hold the configuration information of virtual machines, the cluster, storage and so on. On the one hand, the FUSE file system stores all management information in a database file on the server's disk to avoid data loss; on the other hand, it keeps a copy in memory to improve performance. The configuration files are synchronized to the other computing nodes of the cluster in real time through corosync, so that the management information of all computing nodes in the cluster is synchronized; this eliminates the single point of failure of the management node and ensures continuous high availability of the nodes in the cluster.
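The dual storage described above, a database file on disk for durability plus an in-memory copy for speed, can be sketched with Python's built-in `sqlite3` module. The FUSE mapping itself is not shown; the class `ConfigStore`, the table `config_files` and the write-through ordering (disk first, then memory) are assumptions made for illustration.

```python
import sqlite3
from typing import Dict


class ConfigStore:
    """Configuration files persisted in an embedded sqlite database on disk
    (to avoid data loss) and mirrored in an in-memory dict (for fast reads).
    Table and column names are hypothetical."""

    def __init__(self, db_path: str) -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS config_files ("
            " path TEXT PRIMARY KEY, content TEXT NOT NULL)"
        )
        self.conn.commit()
        # Rebuild the in-memory copy from the database file on start-up.
        self.cache: Dict[str, str] = dict(
            self.conn.execute("SELECT path, content FROM config_files")
        )

    def read(self, path: str) -> str:
        return self.cache[path]  # served from memory, no disk access

    def write(self, path: str, content: str) -> None:
        # Write-through: commit to disk first, then update the memory copy.
        with self.conn:  # commits on success, rolls back on an exception
            self.conn.execute(
                "INSERT OR REPLACE INTO config_files (path, content) VALUES (?, ?)",
                (path, content),
            )
        self.cache[path] = content

    def close(self) -> None:
        self.conn.close()
```

Reopening the same database file simulates a node restart: the in-memory copy is rebuilt from disk, so no committed configuration is lost.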

As shown in Fig. 3, an external voting service process is established in the cluster. Through this process, a voting subsystem consisting of the nodes that can participate in voting is elected from the cluster, and a virtual master node of the cluster is then elected from the voting subsystem to execute the cluster's automatically executed commands, including management and distribution commands. The external voting service process elects the voting subsystem by observing the state of each computing node in the cluster. Each computing node in the cluster has two votes; a node that has suffered a single point of failure is excluded, and a node can join the voting subsystem only if it receives votes from computing nodes other than itself. The service is connected to the cluster members over the network and provides votes to them; at any time it can vote for only one part of the cluster, and it supports clusters with either an even or an odd number of nodes. The external arbitrator counts the votes obtained by each computing node in the voting subsystem and sorts them from high to low; the computing node ranked first is the virtual master node elected by the external arbitrator. The external voting service delivers pre-configured votes, including the votes provided by the external arbitrator, to the voting subsystem of the cluster; its main function is to increase the number of failed nodes the cluster can tolerate. Throughout cluster operation, only one virtual master node executes the automatically executed commands.
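The statement that the external voting service increases the number of failed nodes the cluster can tolerate can be checked with simple arithmetic. This sketch assumes a simple-majority quorum over all configured votes, one vote per node by default, and an external service that gives its votes to the surviving partition; the description does not specify the quorum rule or how many votes the external service contributes, and the function names are hypothetical.

```python
def quorum(total_votes: int) -> int:
    # Simple majority of all configured votes (assumed quorum rule).
    return total_votes // 2 + 1


def tolerated_failures(nodes: int, votes_per_node: int = 1, external_votes: int = 0) -> int:
    """Largest number of failed nodes after which the surviving partition still
    holds quorum, assuming the external service votes for that partition."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    needed = quorum(nodes * votes_per_node + external_votes)
    best = 0
    for failed in range(nodes):  # at least one node must survive
        surviving_votes = (nodes - failed) * votes_per_node + external_votes
        if surviving_votes >= needed:
            best = failed
    return best
```

Under these assumptions, `tolerated_failures(4)` is 1 while `tolerated_failures(4, external_votes=1)` is 2, and a two-node cluster goes from tolerating no failure to tolerating one, which illustrates why an external vote helps most in clusters with an even number of nodes.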
If the virtual master node crashes unexpectedly and a node failure occurs, another node is elected through the voting service process to act as the virtual master node and perform the tasks, which effectively avoids cluster split-brain. The new virtual master node is elected by voting within a new voting subsystem from which the failed node has been removed, in the same way as the first election; therefore, a failure of a running node in the voting subsystem does not affect the cluster's voting.

As shown in Fig. 4, the management information of all computing nodes in the cluster is stored in the database, and there is only one virtual master node throughout cluster operation. If the computing node the user has logged in to is not the virtual master node, all of the user's operations are issued by that node and sent to the corresponding computing nodes in the cluster for execution. The related configuration files for virtual machines, storage, the cluster and so on are updated and written to the database of the node the user is logged in to, and the cluster file system synchronizes the updated configuration files to the other computing nodes of the cluster. This keeps the management information consistent across all computing nodes of the cluster and reduces the probability of a single point of failure.
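The request path described for Fig. 4 can be modelled in a few lines: the node the user logged in to issues the operation, the target node executes it, the updated configuration is written to the login node's database and then replicated. `ComputeNode`, `Cluster` and `handle_user_operation` are hypothetical names, and `_synchronize` is a simplified stand-in for the cluster file system, not its actual mechanism.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ComputeNode:
    name: str
    db: Dict[str, str] = field(default_factory=dict)   # local management database
    executed: List[str] = field(default_factory=list)  # operations run on this node

    def execute(self, operation: str) -> None:
        self.executed.append(operation)


class Cluster:
    def __init__(self, names: List[str]) -> None:
        self.nodes = {n: ComputeNode(n) for n in names}

    def handle_user_operation(self, login_node: str, target_node: str,
                              operation: str, config_path: str, new_config: str) -> None:
        """An operation submitted on any node (master or not) is issued by that
        node, executed on the target node, written to the login node's database,
        and then synchronized to every other node."""
        self.nodes[target_node].execute(operation)
        self.nodes[login_node].db[config_path] = new_config
        self._synchronize(login_node, config_path)

    def _synchronize(self, source: str, config_path: str) -> None:
        # Simplified stand-in for cluster file system replication.
        value = self.nodes[source].db[config_path]
        for node in self.nodes.values():
            node.db[config_path] = value
```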

In view of the above objectives, a second aspect of the embodiments of the present invention provides a system for reducing single points of failure in private cloud computing; Fig. 5 is a schematic diagram of an embodiment of this system according to the present invention. As shown in Fig. 5, the system comprises the following modules: a first module 011 configured to deploy management software on each computing node in a cluster and to synchronize the management information of each computing node in the cluster through a cluster file system of the management software; a second module 012 configured to establish an external voting service process and to elect a voting subsystem from the cluster through the external voting service; a third module 013 configured to elect, by an external arbitrator, a virtual master node from the voting subsystem to execute management and distribution commands in the cluster; and a fourth module 014 configured to, in response to a single point of failure of the virtual master node, elect, by the external arbitrator, one of the remaining computing nodes in the voting subsystem as a new virtual master node to perform the tasks.

In view of the above objectives, a third aspect of the embodiments of the present invention provides a computer device; Fig. 6 is a schematic diagram of an embodiment of the computer device provided by the present invention. As shown in Fig. 6, the embodiment comprises: at least one processor 021; and a memory 022 storing computer instructions 023 executable on the processor 021, the computer instructions 023, when executed by the processor 021, implementing the steps of any of the methods described above.

The invention also provides a computer readable storage medium. FIG. 7 is a schematic diagram illustrating an embodiment of a computer-readable storage medium provided by the present invention. As shown in fig. 7, the computer readable storage medium 031 stores a computer program 032 which, when executed by a processor, performs the method as described above.

Finally, it should be noted that, as those of ordinary skill in the art will appreciate, all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium of the program may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. The embodiments of the computer program can achieve the same or similar effects as any of the corresponding method embodiments described above.

Furthermore, the methods disclosed according to embodiments of the present invention may also be implemented as a computer program executed by a processor, and the computer program may be stored in a computer-readable storage medium. When executed by the processor, the computer program performs the above functions defined in the methods disclosed in the embodiments of the invention.

Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.

In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.

It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.

The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples; within the idea of an embodiment of the invention, also technical features in the above embodiment or in different embodiments may be combined and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims

1. A method of reducing single points of failure for private cloud computing, comprising:

deploying management software on each computing node in a cluster, and synchronizing the management information of each computing node in the cluster through a cluster file system of the management software;

establishing an external voting service process, and electing a voting subsystem from the cluster through the external voting service;

electing, by an external arbitrator, a virtual master node from the voting subsystem to execute management and distribution commands in the cluster; and

in response to a single point of failure of the virtual master node, electing, by the external arbitrator, one of the remaining computing nodes in the voting subsystem as a new virtual master node to perform the tasks.

2. The method of reducing single points of failure for private cloud computing according to claim 1, wherein deploying management software on each computing node in a cluster and synchronizing the management information of each computing node in the cluster through a cluster file system of the management software comprises:

in response to a user logging in to a computing node and performing an operation, the computing node allocating and sending tasks to the other computing nodes in the cluster and updating the management information to obtain new management information, and the new management information being synchronized to the other computing nodes in the cluster through the cluster file system.

3. The method of reducing single points of failure for private cloud computing according to claim 1, wherein deploying management software on each computing node in a cluster and synchronizing the management information of each computing node in the cluster through a cluster file system of the management software comprises:

each computing node in the cluster storing its management information in the embedded database sqlite.

4. The method for reducing single points of failure in private cloud computing according to claim 3, wherein deploying management software on each computing node in the cluster and synchronizing the management information of each computing node in the cluster through the cluster file system of the management software further comprises:

mapping, by a FUSE (Filesystem in Userspace) file system, the management information of the computing node into a configuration file held in memory, so as to store the management information of the computing node; and

synchronizing, in real time by using Corosync, the configuration file of each computing node to the remaining computing nodes in the cluster, so as to synchronize the management information of each computing node in the cluster.
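The replication idea in claim 4 can be simulated without a real FUSE mount or Corosync daemon. In this sketch, which is an assumption-laden model rather than either real API, `MemConfigFS` plays the role of a node's in-memory file tree, and `OrderedBus` stands in for Corosync's group messaging by delivering every write to all members in one agreed sequence. Because all nodes apply the same writes in the same order, their file trees stay identical.

```python
import json


class OrderedBus:
    """Hypothetical stand-in for Corosync group messaging: each message is
    delivered to every member in a single agreed global order."""

    def __init__(self):
        self.members = []
        self.seq = 0

    def join(self, member):
        self.members.append(member)

    def broadcast(self, path, content):
        self.seq += 1
        for member in self.members:
            member.deliver(self.seq, path, content)


class MemConfigFS:
    """Per-node in-memory file tree, playing the role of the FUSE mount."""

    def __init__(self, node, bus):
        self.node = node
        self.bus = bus
        self.files = {}    # path -> file content (str)
        self.applied = 0   # sequence number of the last applied write
        bus.join(self)

    def write_mgmt_info(self, info):
        # Map this node's management information to one config file
        # (the path layout is hypothetical) and replicate the write.
        path = f"nodes/{self.node}/mgmt.json"
        self.bus.broadcast(path, json.dumps(info, sort_keys=True))

    def deliver(self, seq, path, content):
        # Writes must be applied strictly in the agreed order.
        if seq != self.applied + 1:
            raise RuntimeError(f"{self.node}: out-of-order write {seq}")
        self.files[path] = content
        self.applied = seq
```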

5. The method for reducing single points of failure in private cloud computing according to claim 1, wherein establishing a process of an external voting service and electing a voting subsystem from the cluster through the external voting service comprises:

granting, by the process of the external voting service, two votes to each computing node in the cluster;

determining, according to the operating state of each computing node and the votes it has received, whether the computing node belongs to the voting subsystem; and

in response to the computing node operating normally and receiving votes from computing nodes other than itself, determining that the computing node belongs to the voting subsystem.
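The membership rule of claim 5 can be sketched as a pure function. The ballot format (voter mapped to a list of candidates) is hypothetical, and discarding ballots from failed nodes is an assumption; the claim specifies only two votes per node and the "running normally and voted for by another node" condition.

```python
VOTES_PER_NODE = 2  # each computing node holds two votes (claim 5)


def elect_voting_subsystem(alive, ballots):
    """Return the set of nodes that form the voting subsystem.

    alive:   dict node -> bool, whether the node is running normally
    ballots: dict voter -> list of candidates (hypothetical format)
    """
    received_from_others = {node: 0 for node in alive}
    for voter, candidates in ballots.items():
        if not alive.get(voter, False):
            continue  # assumption: ballots from failed nodes are discarded
        if len(candidates) > VOTES_PER_NODE:
            raise ValueError(f"{voter} cast more than {VOTES_PER_NODE} votes")
        for cand in candidates:
            # Self-votes do not qualify a node for membership.
            if cand != voter and cand in received_from_others:
                received_from_others[cand] += 1
    return {
        node for node, ok in alive.items()
        if ok and received_from_others[node] > 0
    }
```

A healthy node that only votes for itself, or receives no votes from peers, stays outside the subsystem and is never a master candidate.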

6. The method for reducing single points of failure in private cloud computing according to claim 1, wherein electing, by an external arbiter, a virtual master node from the voting subsystem to execute commands, including management distribution, in the cluster comprises:

counting, by the external arbiter, the number of votes received by each computing node in the voting subsystem, and ranking the computing nodes in descending order of votes; and

selecting the computing node ranked first as the virtual master node, the virtual master node executing the automatically executed commands, including management distribution, in the cluster.
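The arbiter's tally and ranking in claim 6 can be sketched as follows. The function names are hypothetical, and breaking ties by node name is an assumption: the claim ranks only by vote count and does not say how equal counts are resolved.

```python
from collections import Counter


def rank_candidates(voting_subsystem, ballots):
    """Count votes received by each member of the voting subsystem and rank
    members from most to fewest votes; ties are broken by node name
    (an assumption, since the claim specifies no tie-break rule)."""
    tally = Counter({node: 0 for node in voting_subsystem})
    for candidates in ballots.values():
        for cand in candidates:
            if cand in voting_subsystem:
                tally[cand] += 1
    return sorted(tally.items(), key=lambda kv: (-kv[1], kv[0]))


def elect_virtual_master(voting_subsystem, ballots):
    """Return the top-ranked member as the virtual master node."""
    ranking = rank_candidates(voting_subsystem, ballots)
    if not ranking:
        raise RuntimeError("voting subsystem is empty; no master can be elected")
    return ranking[0][0]
```

A deterministic tie-break matters in practice: if two arbiters evaluated the same tally and broke ties differently, they could name two different masters.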

7. The method for reducing single points of failure in private cloud computing according to claim 1, wherein, in response to a single point of failure of the virtual master node, selecting, by the external arbiter, one of the remaining computing nodes in the voting subsystem as a new virtual master node to execute tasks comprises:

in response to a single point of failure of the virtual master node, removing the failed virtual master node from the voting subsystem to obtain a new voting subsystem; initiating, by the process of the external voting service, a new round of voting in the new voting subsystem; counting, by the external arbiter, the number of votes received by each computing node in the new voting subsystem and ranking the computing nodes in descending order of votes; and selecting the computing node ranked first as the new virtual master node.
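The failover sequence of claim 7 can be sketched as one function. `collect_ballots` is a hypothetical hook standing in for the new round of voting run by the external voting service, and the name-based tie-break is an assumption, as the claim does not specify one.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

Ballots = Dict[str, List[str]]


def fail_over(subsystem: Set[str], failed_master: str,
              collect_ballots: Callable[[Set[str]], Ballots]) -> str:
    """Elect a new virtual master after the current one fails."""
    # 1. Remove the failed master to obtain the new voting subsystem.
    new_subsystem = subsystem - {failed_master}
    if not new_subsystem:
        raise RuntimeError("no surviving node can take over as master")
    # 2. New round of voting among the survivors (hypothetical hook).
    ballots = collect_ballots(new_subsystem)
    # 3. Arbiter counts votes for survivors only and ranks them high to low;
    #    ties are broken by node name (assumption).
    tally = Counter({node: 0 for node in new_subsystem})
    for voter, candidates in ballots.items():
        if voter not in new_subsystem:
            continue  # ignore ballots from nodes outside the new subsystem
        for cand in candidates:
            if cand in new_subsystem:
                tally[cand] += 1
    ranking = sorted(tally.items(), key=lambda kv: (-kv[1], kv[0]))
    # 4. The top-ranked survivor becomes the new virtual master node.
    return ranking[0][0]
```

Because the failed node is removed before voting, any stale ballot naming it or cast by it has no effect on the result.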

8. A system for reducing single points of failure in private cloud computing, comprising:

a first module configured to deploy management software on each computing node in a cluster and to synchronize the management information of each computing node in the cluster through a cluster file system of the management software;

a second module configured to establish a process of an external voting service and to elect a voting subsystem from the cluster through the external voting service;

a third module configured to select, by an external arbiter, a virtual master node from the voting subsystem to execute commands, including management distribution, in the cluster; and

a fourth module configured to, in response to a single point of failure of the virtual master node, cause the external arbiter to select one of the remaining computing nodes in the voting subsystem as a new virtual master node to execute tasks.

9. A computer device, comprising:

at least one processor; and

a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method of any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.