CN114168261A

CN114168261A - OpenStack-based high availability method and device for managing bare metal instances

Info

Publication number: CN114168261A
Application number: CN202111359826.3A
Authority: CN
Inventors: 李博; 谢涛涛; 宋伟; 申嘉童
Original assignee: Inspur Cloud Information Technology Co Ltd
Current assignee: Inspur Cloud Information Technology Co Ltd
Priority date: 2021-11-17
Filing date: 2021-11-17
Publication date: 2022-03-11

Abstract

The invention discloses an OpenStack-based high-availability method and device for managing bare metal instances, belonging to the technical field of cloud computing. The method and device prevent a bare metal instance from becoming unmanaged when the management-side nova-compute service running the IronicDriver is unstable or abnormal, and improve the cloud platform's ability to manage bare metal.

Description

OpenStack-based high availability method and device for managing bare metal instances

Technical Field

The invention relates to the technical field of cloud computing, in particular to a high-availability method and device for managing bare metal instances based on OpenStack.

Background

Managing a bare metal instance requires nova and ironic to work together. Although the current nova-compute service supports multi-replica deployment, an instance can be managed by only one nova-compute service; when that service becomes abnormal, the instance is no longer managed, and operations such as powering it on or off and synchronizing its power state cannot be performed.

Disclosure of Invention

The technical task of the invention is to provide an OpenStack-based high-availability method and device for managing bare metal instances, which prevent bare metal instances from becoming unmanaged by nova when the management-side nova-compute service running the IronicDriver is unstable or abnormal, and which effectively improve the cloud platform's ability to manage bare metal.

The technical scheme adopted by the invention for solving the technical problems is as follows:

A high-availability method for managing bare metal instances based on OpenStack is provided for an OpenStack cloud platform environment. In the periodic task update_available_resource of the nova-compute service, the method adds a check on bare metal nodes that have deployed instances: if the nova-compute service corresponding to an instance is not in the up state, a service is reselected from the services in the up state to take over the instance, and the host of the instance is updated to the host of the new service;

when the resource tracker updates and checks resources, the host information is updated in the hypervisor record, completing the re-management of the instance.

That is, after the original nova-compute service of a bare metal instance becomes abnormal, nova continues to manage the instance stably through the high availability of nova-compute.

Bare metal server: a bare metal server is a physical server deployed in the cloud center that provides excellent computing performance and data security for workloads such as core databases, key application systems, high-performance computing and big data. It offers the same performance and physical isolation as a physical machine but is more flexible and convenient to use, and its provisioning, operation and maintenance are handled by the cloud center.

Bare metal instance: after a user purchases a bare metal server with the required specification, image and network configuration, the cloud computing management platform installs the system on the bare metal server, configures it according to the user's requirements, and delivers it to the user. On the OpenStack side, this instance appears as a nova instance.

Highly available distributed systems: high availability (HA) is one of the factors that must be considered in the architecture design of a distributed system; it generally means reducing, by design, the time during which the system cannot provide service. In practice, high availability is usually achieved through service redundancy in a cluster: when one service becomes abnormal, a backup takes over so that the service keeps running continuously and stably.

Periodic tasks: periodic tasks are used in many scenarios and are implemented in different ways across systems. This description is mainly concerned with the periodic tasks of the nova-compute service in the OpenStack platform, which repeatedly execute the same service logic at a fixed time interval so that the system always obtains up-to-date information and data. The periodic tasks of the nova-compute service running the IronicDriver focus mainly on the usage of computing resources and the running state of the nova-compute service itself.

At present, open-source cloud platform management systems are mainly built on OpenStack, which controls the computing, storage and network resources of the whole data center. Physical bare metal machines (also known as cloud physical hosts) are both the infrastructure of a data center and the primary computing resource for high-performance computing. As a physical computing resource (bare metal server), a bare metal machine can be brought under the management of the OpenStack compute service nova through the Ironic component. With nova and Ironic, bare metal machines can be managed conveniently, including but not limited to deploying instances, powering on and off, and attaching and detaching resources, and cloud physical hosts and elastic cloud servers use uniform interfaces and similar workflows. Nova manages bare metal machines through the nova-compute service running the IronicDriver.
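The fixed-interval behaviour of such a periodic (timing) task can be illustrated with a minimal loop. This is only a sketch under stated assumptions: `run_periodic`, its parameters and the 60-second default (taken from the embodiment described later) are hypothetical names chosen for illustration, and the loop is not how nova-compute actually schedules its periodic tasks.

```python
import time


def run_periodic(task, interval=60.0, rounds=None,
                 sleep=time.sleep, clock=time.monotonic):
    """Run `task` repeatedly, aiming at one run per `interval` seconds.

    `rounds=None` loops forever; `sleep` and `clock` are injectable so the
    loop can be exercised without waiting in real time (hypothetical API).
    """
    done = 0
    while rounds is None or done < rounds:
        started = clock()
        try:
            task()
        except Exception as exc:
            # A failing round must not stop later rounds from running.
            print(f"periodic task failed: {exc!r}")
        done += 1
        # Sleep for whatever is left of the interval after the task ran.
        sleep(max(0.0, interval - (clock() - started)))
    return done
```

Calling `run_periodic(check, interval=60, rounds=3)` with some function `check` would run it three times; catching exceptions per round reflects the idea that the same logic keeps executing so that the system continues to obtain fresh state.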

Further, the method is realized in the following specific steps:

1) deploying an OpenStack platform comprising the keystone, nova, neutron, cinder, glance and ironic components, wherein 3 replicas of the nova-compute service running the IronicDriver are deployed on 3 different hosts;

2) adding a bare metal server to the environment, registering it as an ironic baremetal node, and moving it to the available state;

3) deploying the bare metal server according to the bare metal specification, the user image and the user network;

4) using nova to manage bare metal instances, for example powering on and off and attaching/detaching volumes;

5) at a fixed time interval, the periodic task in nova-compute checks whether each IronicDriver compute service is in the up state, checks whether each ironic baremetal node has an instance deployed, and caches each node in the node cache of the corresponding compute service;

6) the nova-compute service calls the resource tracker in turn for each node it manages to update the resource information of the baremetal node;

7) when an IronicDriver nova-compute service becomes abnormal and is no longer in the up state, the other nova-compute services detect the abnormality and remove that service from the hash ring;

8) when the nova-compute service of the hypervisor corresponding to an instance is removed from the hash ring, nova loses management of the instance; a normal nova-compute service then detects that the instance's nova-compute service is down, reselects a service from the hash ring to manage the instance, and updates the host of the instance to the host of the new service in the instance record of the nova database;

9) after the host of the instance is updated, the new nova-compute service finds, in its periodic task, one additional compute node that the database does not yet record under its host; when the resource tracker updates and checks resources, the host information of that compute node is updated in the hypervisor record, so that the instance is managed again.

Preferably, to ensure high availability, the 3 deployed replicas of the IronicDriver nova-compute service are located on 3 different hosts, so that a host failure does not make the service unavailable.

Further, for baremetal nodes that have an instance, nova-compute checks the compute service of each instance in turn; if the service state is up, the hypervisor corresponding to the instance is normal and the instance can be managed by nova, so the next instance is checked;

if the state of the corresponding service is not up, a new service is reselected through the hash ring and checked for being both up and enabled, and if these conditions are met the host of the instance is updated to the newly selected host.

Furthermore, a normal service checks whether the service of an instance is normal, where the instances to be checked are bare metal instances not managed by the current service; the fact that the current service can execute the periodic check task indicates that the current service itself is normal;

the service of an instance is judged normal whenever it is in the up state, which includes a service that is disabled but up; a service that has been forced down is in the down state.
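The two judgments described above (whether an instance's current service still counts as normal, and whether a candidate service may take over) can be sketched as follows. The `Service` record, the `is_up` and `can_take_over` helpers, and the heartbeat timeout of 60 seconds are all assumptions introduced for illustration; they are not nova's actual objects or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Service:
    """Hypothetical, simplified view of a compute service record."""
    host: str
    last_seen: datetime          # time of the last heartbeat
    disabled: bool = False       # administratively disabled
    forced_down: bool = False    # explicitly forced down by an operator


def is_up(service, now, service_down_time=60):
    """A service is up if it is not forced down and its heartbeat is recent.

    A disabled service that still reports heartbeats counts as up, as the
    text states; a forced-down service is always treated as down.
    """
    if service.forced_down:
        return False
    return (now - service.last_seen) <= timedelta(seconds=service_down_time)


def can_take_over(service, now, service_down_time=60):
    """A takeover candidate must be both up and enabled."""
    return is_up(service, now, service_down_time) and not service.disabled
```

The asymmetry is deliberate: a disabled-but-up service keeps the instances it already manages, while only an up and enabled service is accepted as a new host.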

Preferably, selecting the new service means computing the md5 of the uuid of the baremetal node where the instance is located and using that value as the key to look up the hash ring, which yields the new service.

The new host for the instance is selected with the same algorithm by which ironic compute nodes are assigned a service: the md5 value of the node uuid is used as the hash value and mapped to a node in the hash ring; the service at that ring node is the new host of the instance.
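The md5-based selection can be illustrated with a small consistent-hash ring. This is a simplified sketch, not the hash ring implementation actually used by nova or ironic: the `HashRing` class, the number of virtual points per host (`replicas`) and the host names are assumptions made for the example.

```python
import bisect
import hashlib


def _md5_int(key: str) -> int:
    """Return the md5 digest of `key` as an integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Simplified consistent-hash ring over service hosts (illustrative)."""

    def __init__(self, hosts, replicas=64):
        # Each host is placed at `replicas` pseudo-random points on the ring.
        self._points = sorted(
            (_md5_int(f"{host}-{i}"), host)
            for host in hosts
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._points]

    def get_host(self, node_uuid: str) -> str:
        """Map the md5 of a node uuid to the next host point on the ring."""
        if not self._points:
            raise LookupError("no up service available on the ring")
        idx = bisect.bisect(self._keys, _md5_int(node_uuid)) % len(self._points)
        return self._points[idx][1]
```

Because the result depends only on the node uuid and the set of hosts on the ring, every healthy service that builds the ring from the same up services arrives at the same new host for a given instance, and removing one host only remaps the nodes that host previously owned.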

Preferably, updating the host of the instance means updating the host and launched_on fields recorded in the instances table of the nova database.
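As a rough illustration of this database update, the sketch below uses an in-memory SQLite table with only three columns. The column name `launched_on` is our reading of the second field named in the text; the reduced schema and the `reassign_instance` helper are assumptions for the example, and a real deployment would go through nova's own database layer rather than raw SQL.

```python
import sqlite3


def reassign_instance(conn, instance_uuid, new_host):
    """Point an instance record at its new managing host.

    Both `host` and `launched_on` are set to the new host. Returns the
    number of rows matched (0 if the instance does not exist).
    """
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE instances SET host = ?, launched_on = ? WHERE uuid = ?",
            (new_host, new_host, instance_uuid),
        )
    return cur.rowcount
```

Writing the same values twice leaves the record unchanged, which matches the observation that several services may perform the update concurrently and still reach a consistent result.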

Preferably, the current service only updates the host of the instance; after the next round of the periodic task starts, the instance can be managed by nova-compute, and the service that now owns the instance completes the update of the hypervisor;

during the periodic task, the new service compares the nodes cached in its node cache with the nodes recorded in the database as belonging to it, finds that the host of the original hypervisor is inconsistent with the actual host of the instance, and, during the resource tracker's resource update, writes the new host information to the hypervisor corresponding to the baremetal node.

The resource tracker's update of the hypervisor host relies on an implicit condition: the updated instance is not added to the node cache of the current nova-compute service. On one hand, the newly mapped host of the instance is not necessarily the current host; on the other hand, several services may detect the abnormality of the instance's original service at the same time, but their final update results are consistent. Therefore the current service only updates the host of the instance; after the next round of the periodic task starts, the instance can be managed by nova-compute, and the service that owns the instance completes the update of the hypervisor.
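The reconciliation performed by the new service in its next round can be sketched as a comparison between its node cache and the hypervisor (compute node) records. The function name, the plain dictionary standing in for the database, and the decision to skip nodes that have no record at all are assumptions made for illustration; the real resource tracker does considerably more.

```python
def reconcile_hypervisor_hosts(current_host, node_cache, compute_nodes):
    """Align hypervisor records with the nodes the current service manages.

    `node_cache` lists the node uuids the current service manages this round.
    `compute_nodes` maps node uuid -> host recorded for its hypervisor.
    Records still pointing at another host are moved to `current_host`;
    nodes without any record are left alone in this sketch.
    Returns the uuids whose record was changed.
    """
    moved = []
    for node_uuid in node_cache:
        recorded = compute_nodes.get(node_uuid)
        if recorded is not None and recorded != current_host:
            compute_nodes[node_uuid] = current_host
            moved.append(node_uuid)
    return moved
```

A second call with the same inputs changes nothing, so the step is safe to repeat in every round of the periodic task.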

The invention also claims an OpenStack-based high-availability device for managing bare metal instances, comprising: at least one memory and at least one processor;

the at least one memory is used to store a machine readable program;

the at least one processor is used to call the machine readable program and execute the above OpenStack-based high-availability method for managing bare metal instances.

The present invention also claims a computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the above-described OpenStack-based high availability method of managing bare metal instances.

Compared with the prior art, the highly available method and device for managing the bare metal instance based on the OpenStack have the following beneficial effects:

Managing a bare metal instance requires nova and ironic to work together. Although the current nova-compute service supports multi-replica deployment, an instance can be managed by only one nova-compute service; when that service becomes abnormal, the instance is no longer managed, and operations such as powering it on or off and synchronizing its power state cannot be performed. To solve this problem, after the compute service corresponding to a bare metal instance becomes abnormal, the method uses the detection mechanism of the periodic task to update the host of the instance promptly so that the instance is managed by nova again. This effectively improves the high availability of nova-compute, ensures the stability of the management-side service, and improves the overall reliability and trustworthiness of the cloud platform.

Drawings

Fig. 1 is a flowchart of a high availability method for managing bare metal instances based on OpenStack according to an embodiment of the present invention.

Detailed Description

The invention is further described with reference to the following figures and specific examples.

Under the OpenStack platform, the creation and management of bare metal instances is mainly carried out by the Nova and Ironic components. In OpenStack, the host of an instance is normally the compute node on which the nova-compute service that created the instance runs. For bare metal instances, however, the compute driver is the IronicDriver and the corresponding nova-compute service is actually deployed on a management node; the physical bare metal machine is the carrier of the instance, each bare metal node is registered as a hypervisor, that is, as a compute node, and the host of each hypervisor is assigned by a hash ring. Currently, the IronicDriver nova-compute service itself supports multi-replica high availability; likewise, for a bare metal node in the available state, when its nova-compute service becomes abnormal the IronicDriver reselects a service for the node from the hash ring, so that nova recovers management of the node. However, once a bare metal node has been deployed and has become a bare metal instance, an abnormality of the corresponding service leaves nova unable to manage the instance. In this situation only the management-side service has a problem: the bare metal instance itself is normal and unaffected, and other nova-compute services are available in the environment, yet OpenStack loses the ability to manage the instance and regains it only after the abnormal service is restored. This clearly increases production and operation and maintenance costs and does not meet the reliability requirements of a cloud service.

Based on the above problems, an embodiment of the present invention provides a high-availability method for managing bare metal instances based on OpenStack. In an OpenStack cloud platform environment, in the periodic task update_available_resource of the nova-compute service, the method adds a check on bare metal nodes that have deployed instances: if the nova-compute service corresponding to an instance is not in the up state, a service is reselected from the services in the up state to take over the instance, and the host of the instance is updated to the host of the new service.

The method prevents bare metal instances from becoming unmanaged by nova when the management-side nova-compute service running the IronicDriver is unstable or abnormal, and effectively improves the cloud platform's ability to manage bare metal.

The key point of the method is to check the host and compute service of each bare metal instance periodically and to switch over proactively when the service is abnormal. The specific implementation process is as follows:

1) The OpenStack platform is deployed, mainly comprising the keystone, nova, neutron, cinder, glance and ironic components, wherein 3 replicas of the nova-compute service running the IronicDriver are deployed on 3 different hosts;

To ensure high availability, the 3 replicas of the IronicDriver nova-compute service must be located on 3 different hosts, so that a host failure does not make the service unavailable.

2) A bare metal server is added to the environment, registered as an ironic baremetal node, and moved to the available state.

3) The bare metal server is deployed according to the bare metal specification, the user image and the user network.

4) Bare metal instances are managed by using nova, for example powering on and off and attaching/detaching volumes.

5) Every 60 s, the periodic task in nova-compute checks whether each IronicDriver service is in the up state, checks whether each ironic baremetal node has an instance deployed, and caches each node in the node cache of the corresponding compute service.

6) The nova-compute service calls the resource tracker in turn for each node it manages to update the resource information of the baremetal node.

7) When an IronicDriver nova-compute service becomes abnormal and is no longer in the up state, the other nova-compute services detect the abnormality and remove that service from the hash ring.

For baremetal nodes that have an instance, nova-compute checks the compute service of each instance in turn; if the service state is up, the hypervisor corresponding to the instance is normal and the instance can be managed by nova, so the next instance is checked;

A normal service checks whether the service of an instance is normal, where the instances to be checked are bare metal instances not managed by the current service; the fact that the current service can execute the periodic check task indicates that the current service itself is normal;

Selecting a new service means computing the md5 of the uuid of the baremetal node where the instance is located and using that value as the key to look up the hash ring, which yields the new service.

The new host of the instance is written to the database; updating the host of the instance means updating the host and launched_on fields recorded in the instances table of the nova database.

After the instance is updated, the updated node information is not added directly to the node cache of the current service, for two reasons: on one hand, the host of the service newly selected for the instance is not necessarily the current host; on the other hand, several services may detect the abnormality of the instance's original service at the same time, but their final update results are consistent. Therefore the current service only updates the host of the instance; after the next round of the periodic task starts, the instance can be managed by nova-compute, and the service that owns the instance completes the update of the hypervisor.

FIG. 1 shows in more detail how nova-compute provides high-availability management of bare metal instances through periodic tasks. The figure shows the periodic task update_available_resource, which updates the available resources after the IronicDriver nova-compute service has started:

first, 3 kinds of resources are loaded: nova-compute services, instances, and bare metal nodes;

all nova-compute services whose hypervisor type is ironic are obtained, each is checked in turn for being in the up state, and all up services are mapped to nodes on the hash ring, yielding the hash ring used to manage bare metal instances;

each nova-compute service obtains from the database all instances whose host is the hostname of that nova-compute service, that is, the bare metal instances managed by the current nova-compute service, and records them in instances;

all bare metal nodes recorded in ironic are obtained, and each bare metal node is checked in turn:

when the node is managed by the current service, the node is added to node_cache;

if the node is not managed by the current service but already has an instance on it, the instance is checked further: it is determined whether the compute service corresponding to the instance is in the up state. If it is not, nova has lost the ability to manage the instance, and because the service is not up it has also been removed from the hash ring. A service is then reselected from the hash ring as the new service of the instance, the new service is confirmed again to be in the up state, and its host is set as the new host of the instance, so that the instance is managed by nova again;

when the check is finished, the node cache is updated;

an instance whose host has been modified is updated in the next round of the periodic task of the new nova-compute service: when the new service detects a new node in its node cache that the database does not record under its own host, the resource tracker writes the actual host to the host field of the original hypervisor during its resource update, making the database record consistent with the cached data.
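The per-node check walked through above can be condensed into a small simulation. It is a sketch under several assumptions: `Node`, `Instance`, `run_check` and the `lookup` callable (standing in for the hash ring built from the up services) are hypothetical names, and the in-memory dictionaries replace the nova and ironic databases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    uuid: str
    instance_uuid: Optional[str] = None  # set once an instance is deployed


@dataclass
class Instance:
    uuid: str
    host: str  # host of the compute service that manages the instance


def run_check(current_host, up_hosts, enabled_hosts, nodes, instances, lookup):
    """One pass of the extended periodic check, run by `current_host`.

    Returns (node_cache, reassigned), where `reassigned` maps instance uuid
    to its newly selected host.
    """
    node_cache = []
    reassigned = {}
    for node in nodes:
        inst = instances.get(node.instance_uuid) if node.instance_uuid else None
        # Deployed nodes follow their instance's host; others follow the ring.
        owner = inst.host if inst else lookup(node.uuid)
        if owner == current_host:
            node_cache.append(node.uuid)
            continue
        if inst is None or owner in up_hosts:
            continue  # not deployed, or still managed by a healthy service
        new_host = lookup(node.uuid)
        if new_host in up_hosts and new_host in enabled_hosts:
            # Only the instance host changes now; the node is deliberately
            # not cached here, the new service picks it up next round.
            inst.host = new_host
            reassigned[inst.uuid] = new_host
    return node_cache, reassigned
```

For example, with services `c1` (down), `c2` and `c3` (up), an instance on a node previously managed by `c1` is reassigned during a pass run by `c3`, and the node then appears in the node cache of `c2` on that service's next pass, provided the ring maps the node to `c2`.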

An embodiment of the invention also provides an OpenStack-based high-availability device for managing bare metal instances, comprising: at least one memory and at least one processor;

the at least one memory is used to store a machine readable program;

the at least one processor is configured to invoke the machine readable program to execute the OpenStack-based high-availability method for managing bare metal instances according to the above embodiment of the present invention.

An embodiment of the present invention further provides a computer-readable medium, where a computer instruction is stored on the computer-readable medium, and when the computer instruction is executed by a processor, the processor is caused to execute the high availability method for managing bare metal instances based on OpenStack in the above embodiments of the present invention. Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.

In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.

Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD + RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.

Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.

Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.

While the invention has been shown and described in detail in the drawings and in the preferred embodiments, it is not intended to limit the invention to the embodiments disclosed, and it will be apparent to those skilled in the art that various combinations of the code auditing means in the various embodiments described above may be used to obtain further embodiments of the invention, which are also within the scope of the invention.

Claims

1. A high-availability method for managing bare metal instances based on OpenStack, characterized in that, in the periodic task update_available_resource of the nova-compute service, the method adds a check on bare metal nodes that have deployed instances; if the nova-compute service corresponding to an instance is not in the up state, a service is reselected from the services in the up state to take over the instance, and the host of the instance is updated to the host of the new service;

2. The OpenStack-based high-availability method for managing bare metal instances according to claim 1, wherein the method is implemented as follows:

4) managing bare metal instances by using nova;

3. The OpenStack-based high-availability method for managing bare metal instances according to claim 2, wherein 3 replicas of the IronicDriver nova-compute service are deployed and the 3 services are respectively located on 3 different hosts.

4. The OpenStack-based high-availability method for managing bare metal instances according to claim 1 or 2, wherein for bare metal nodes that have an instance, nova-compute checks the compute service of each instance in turn; if the service state is up, the hypervisor corresponding to the instance is normal and the instance can be managed by nova, so the next instance is checked;

5. The OpenStack-based high-availability method for managing bare metal instances according to claim 4, wherein a normal service checks whether the service of an instance is normal, where the instances to be checked are bare metal instances not managed by the current service;

6. The OpenStack-based high-availability method for managing bare metal instances according to claim 1 or 2, wherein selecting a new service means computing the md5 of the uuid of the bare metal node where the instance is located and using that value as the key to look up the hash ring to obtain the new service.

7. The OpenStack-based high-availability method for managing bare metal instances according to claim 1 or 2, wherein updating the host of the instance means updating the host and launched_on fields recorded in the instances table of the nova database.

8. The OpenStack-based high-availability method for managing bare metal instances according to claim 2, wherein the current service only updates the host of the instance; after the next round of the periodic task starts, the instance can be managed by nova-compute, and the service that owns the instance completes the update of the hypervisor;

9. A highly available device for managing bare metal instances based on OpenStack, comprising: at least one memory and at least one processor;

the at least one memory to store a machine readable program;

the at least one processor, configured to invoke the machine readable program to perform the method of any of claims 1 to 8.

10. Computer readable medium, characterized in that it has stored thereon computer instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 8.