WO2020113670A1 - Anti-split-brain high-availability system for OpenStack virtual machines - Google Patents
Anti-split-brain high-availability system for OpenStack virtual machines
- Publication number
- WO2020113670A1 (PCT/CN2018/121655)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- management
- computing node
- virtual machine
- module
- node device
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45579—I/O management, e.g. providing access to device drivers or storage
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45595—Network integration; Enabling network access in virtual machine instances
Definitions
- the invention relates to the field of cloud computing, and in particular to an anti-split-brain high-availability system for OpenStack virtual machines; it belongs to the field of computers.
- HA High Availability
- the Nova module, which is responsible for compute management, only provides the Evacuate interface for evacuating virtual machines to other nodes when a host fails; the module itself lacks HA scheduling and management functions;
- Masakari, an open-source sub-project dedicated to HA, has only recently graduated from OpenStack incubation to official project status. Its maturity is still very low: it can complete HA recovery in only a few scenarios and cannot support commercial use.
- the invention provides a high-availability system for preventing split brain OpenStack virtual machines, which is characterized by comprising a management terminal device, a management network, a computing node device and a shared storage device,
- At least two management-end devices communicate through the management network to form a management cluster
- the management terminal device and the computing node device are connected by communication through the management network,
- the computing node device is connected to the shared storage device,
- Each management device includes:
- Nova control module including Nova's native virtual machine VM management process, used to manage the life cycle of the virtual machine VM;
- Cluster management module used to collect cluster operating status information
- High-availability module for high-availability management of all computing node devices
- the high availability module runs a high availability management method, which includes the following operations:
- Operation A-1: check whether the cluster status is normal using the running status information collected by the cluster management module; if abnormal, trigger a cluster-abnormal alarm and end; if normal, go to operation A-2;
- Operation A-2: check the status reported by each computing node device through the management network; if normal, this round of inspection ends, otherwise go to operation A-3;
- Operation A-3: according to the abnormal status reported by each computing node device through the management network, determine one by one whether processing is needed; if not, abnormal processing of that computing node device ends and the flow returns to operation A-2; otherwise go to operation A-4;
- Operation A-4: for a computing node device whose abnormal state needs processing, check the status of the shared storage device connected to it; when the shared storage device is abnormal, the cloud computing virtual machine VM program on the computing node device is not restarted through the Nova control module and processing ends; otherwise, go to operation A-5;
- Operation A-5: issue a Fencing (isolation) request to the computing node device whose connected shared storage device is in a normal state; Fencing kills (shuts down) the node's cloud computing virtual machine VM program;
- Operation A-6: issue a command to the Nova control module to restart the cloud computing virtual machine VM program that was running on the computing node device.
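As a hedged illustration only (all names and structure are ours, not the patent's), the management-side check loop of operations A-1 to A-6 could be sketched in Python as:

```python
def ha_check_round(cluster_ok, node_status, needs_processing,
                   storage_ok, fence, evacuate):
    """One round of the high-availability check (operations A-1..A-6).

    cluster_ok:           bool, cluster state from the cluster manager (A-1)
    node_status:          {node: "normal" | "abnormal"} reported over the
                          management network (A-2)
    needs_processing(n):  policy decision for an abnormal node (A-3)
    storage_ok(n):        shared-storage health for the node (A-4)
    fence(n):             issue a Fencing request to the node (A-5)
    evacuate(n):          ask Nova to restart the node's VMs (A-6)
    """
    if not cluster_ok:                      # A-1: abnormal cluster -> alarm
        return "cluster-alarm"
    abnormal = [n for n, s in node_status.items() if s != "normal"]
    if not abnormal:                        # A-2: all nodes healthy
        return "all-normal"
    actions = []
    for node in abnormal:                   # A-3: handle nodes one by one
        if not needs_processing(node):
            continue
        if not storage_ok(node):            # A-4: bad storage -> no HA
            actions.append((node, "no-ha"))
            continue
        fence(node)                         # A-5: isolate the node's VMs
        evacuate(node)                      # A-6: restart VMs elsewhere
        actions.append((node, "fenced-and-evacuated"))
    return actions
```

The ordering matters: fencing (A-5) strictly precedes evacuation (A-6), which is what prevents two copies of the same VM writing to shared storage.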
- In addition to the cloud computing virtual machine VM program, each computing node device also has:
- Nova-compute module, used to respond directly to the management processes of the management-end device, control the running state of the virtual machine VM, and communicate with the Hypervisor API;
- Libvirt management module, used to provide a management process with a standard Hypervisor API interface on top of KVM
- the Lock management module in conjunction with the Libvirt management module, is used to update and monitor the lock heartbeat of the shared storage device;
- the high-availability computing node module is at least used to report the heartbeat to the management device,
- the method of running the high-availability computing node module includes the following operations:
- Operation C-1: while the virtual machine VM continuously updates and stores the lock heartbeat, no processing is required if the write is normal; otherwise, once the lock heartbeat write becomes abnormal, go to operation C-2;
- Operation C-3: if the management-end device returns a processing result within the specified time, go to operation C-5, otherwise go to operation C-4;
- the Lock management module performs the Fencing isolation operation, that is, it kills or isolates the cloud computing virtual machine VM program of the computing node device;
- the Lock management module determines whether Fencing is required according to the processing result returned by the management-end device.
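The compute-node side of this (one C-1..C-5 step of the Lock manager) can be sketched as follows; this is an illustrative simplification with hypothetical names, not the patent's implementation:

```python
def lock_heartbeat_step(write_ok, report_to_manager, default_fence):
    """One C-1..C-5 step of the Lock manager on a compute node.

    write_ok:          whether the lock heartbeat wrote to shared storage (C-1)
    report_to_manager: callable -> management-side verdict ("fence" /
                       "no-fence"), or None if no answer arrives in time
                       (C-2 / C-3)
    default_fence:     callable run when the manager times out (C-4)
    """
    if write_ok:                     # C-1: heartbeat healthy, nothing to do
        return "ok"
    verdict = report_to_manager()    # C-2: report the storage fault, wait
    if verdict is None:              # C-3 timed out -> C-4: fence by default
        default_fence()
        return "fenced-by-default"
    # C-5: follow the management side's decision
    return "fenced" if verdict == "fence" else "kept-running"
```

Fencing by default on timeout (C-4) is the split-brain safeguard: a node that cannot reach either the shared storage or the manager must assume it is the isolated side.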
- After the management-end device sends a Fencing request to a computing node device whose connected shared storage device is in a normal state, the high availability module also runs the following operations:
- Operation B-1: continuously monitor Fencing events reported by the computing node devices; once a message is received, go to operation B-2;
- Operation B-2: check whether the cluster status is normal using the running status information collected by the cluster management module; if abnormal, trigger a cluster-abnormal alarm and end; if normal, go to operation B-3;
- Operation B-3: check the network status reported by each computing node device through the management network; if normal, this round of inspection ends, otherwise go to operation B-4;
- Operation B-4: according to the abnormal status reported by each computing node device through the management network, determine whether processing is required; if not, proceed to operation B-6, otherwise go to operation B-5;
- Operation B-5: for a computing node device whose abnormal state needs processing, check the status of the shared storage device connected to it; when the shared storage device is abnormal, go to operation B-6 (no Fencing) and end, otherwise go to operation B-7;
- Operation B-6: for scenarios that do not require Fencing, issue a stop-Fencing request to the corresponding computing node device;
- Operation B-7: for scenarios that require Fencing, issue a Fencing request to the corresponding computing node device.
- the recovery process after the Lock management module's process restarts includes the following operations:
- Operation D-2: once lock heartbeat registration fails, kill (shut down) the cloud computing virtual machine VM program of the computing node device;
- the Libvirt management module records all cloud computing virtual machine VM programs of the computing node device that were killed (shut down), writing them to an isolation log file;
- Operation D-4: periodically check the isolation log file; if there is an update, go to operation D-5;
- Operation D-5: report the isolation log files of all computing node devices to the management-end device; if reporting fails, this round ends and reporting is retried next time; otherwise, after the report reaches the management-end device, the management-end device issues instructions to proceed with recovery.
- After the report reaches the management-end device, the management-end device performs the following operations:
- the management-end device receives the isolation log file reported by the computing node device's agent and determines whether to perform automatic processing; if automatic processing is required, go to operation D-8, otherwise go to operation D-7;
- the management-end device automatically processes the fenced cloud computing virtual machine VM programs and calls the Nova interface to resume their operation.
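The D-series recovery path, from failed lock registration through isolation-log reporting to Nova-driven resume, can be condensed into one sketch. This is our illustrative reading of operations D-2 to D-8 with hypothetical names, not the patent's code:

```python
def recovery_after_lock_restart(register_ok, fenced_vms, report,
                                auto_process, nova_resume):
    """Recover VMs fenced while the Lock process was down (D-2..D-8).

    register_ok:  whether lock heartbeat re-registration succeeded (D-2)
    fenced_vms:   VMs killed on registration failure, to be logged (D-3)
    report(log):  send the isolation log to the manager; False on failure (D-5)
    auto_process: management-side decision to recover automatically (D-6)
    nova_resume:  callable asking Nova to resume one VM (D-8)
    """
    log = []
    if not register_ok:                    # D-2: registration failed
        log.extend(fenced_vms)             # D-3: record the killed VMs
    if not log:                            # D-4: nothing new in the log
        return "idle"
    if not report(log):                    # D-5: report; retry next round
        return "report-failed"
    if auto_process:                       # D-6 -> D-8: automatic recovery
        for vm in log:
            nova_resume(vm)
        return "auto-recovered"
    return "manual"                        # D-6 -> D-7: operator handles it
```

The retry-next-round behavior on a failed report matters: the isolation log is the only durable record of which VMs were fenced, so it must eventually reach the manager before any resume is attempted.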
- the shared storage device is managed and operated by CephFS or NFS file management program,
- the VM management process of the virtual machine includes Nova-api, Nova-conductor or Nova-scheduler,
- the cluster management module includes Etcd or Consul.
- the management network includes:
- the management network plane is used to connect the management terminal device and provide management services
- Storage network plane used to connect to the back-end shared storage device, used to provide storage services
- the service network plane is used to connect computing node devices and provide access services for cloud computing virtual machine VMs.
- when the management network plane, storage network plane, and service network plane of the management network are all normal, the network status reported by the computing node device in operation A-2 through the management network is judged to be normal; otherwise, processing is performed according to which one or more of the management, storage, and service network planes of the abnormal computing node device are interrupted.
- the management network includes:
- the management network plane is used to connect the management terminal device and provide management services
- Storage network plane used to connect to the back-end shared storage device, used to provide storage services
- Service network plane used to connect computing node devices, used to provide virtual machine VM access services
- when the management network plane, storage network plane, and service network plane are all normal, the network status reported by the computing node device in operation B-3 through the management network is judged to be normal; otherwise, corresponding Fencing processing is performed according to which one or more of the management, storage, and service network planes of the abnormal computing node device are interrupted.
- the cloud computing virtual machine VM program has a guest operating system (VM GuestOS), which performs the following recovery operations after Fencing:
- Operation E-1: the QGA in the VM GuestOS and the high-availability computing node module of the computing node device continuously maintain a lock heartbeat; when the cloud computing virtual machine VM program fails, go to operation E-2;
- Operation E-2: when the high-availability computing node module receives the report of the abnormal event, it reports it to the management-end device;
- after receiving the report of the abnormal event, the management-end device directly calls the Nova interface to resume operation of the cloud computing virtual machine VM program.
- the failure includes a blue screen, hang, or crash of the computing node device on which the cloud computing virtual machine VM program runs.
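The E-1/E-2 guest-level watchdog is a timeout check on the QGA heartbeat. A hedged sketch of that check (illustrative names; real deployments would read the heartbeat over the QGA serial channel):

```python
def guest_heartbeat_watch(last_beat, now, timeout, report, nova_reset):
    """Detect a dead guest OS via its QGA heartbeat (operations E-1..E-2).

    last_beat:  timestamp of the last heartbeat seen from QGA
    now:        current time
    timeout:    maximum tolerated heartbeat gap
    report:     callable(event) -> agent reports upward to the manager (E-2)
    nova_reset: callable run by the management side to restart the VM
    """
    if now - last_beat <= timeout:   # E-1: guest is still beating
        return "alive"
    report("guest-dead")             # E-2: node agent reports the failure
    nova_reset()                     # manager calls Nova to restart the VM
    return "restarted"
```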
- After the report reaches the management-end device, the management-end device performs the following operations:
- Operation D-6: the management-end device receives the isolation log file reported by the computing node device's agent and determines whether to perform automatic processing; if automatic processing is required, go to operation D-8, otherwise go to operation D-7;
- the management-end device automatically processes the fenced cloud computing virtual machine VM programs and calls the Nova interface to resume their operation.
- the anti-split-brain OpenStack virtual machine high-availability system has a high availability module that runs a high-availability management method: through the series of operations A-1 to A-6 it detects in real time the status of the connected computing node devices and the shared storage device, determines from the type of abnormality learned (an abnormal computing node device, an abnormal shared storage device, or which of the management, storage, and service network planes of the management network is abnormal) whether to perform a Fencing operation to shut down the cloud computing virtual machine VM program of the abnormal computing node device, and thereby ensures high availability of the cloud computing virtual machine VM programs of the computing node devices in the system.
- the computing node device can run the series of operations C-1 to C-5, updating and storing the lock heartbeat of the Lock distributed read-write lock in real time and reporting any write failures as they occur,
- and operates according to the processing result returned by the management-end device: whether to Fence (shut down) the cloud computing virtual machine VM program of the computing node device. The Lock distributed read-write lock is thereby refined from the host level of the computing node device down to the level of individual virtual machine VMs, enabling concurrent read-write protection for a single virtual machine.
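The host-to-VM refinement can be pictured as keeping one lock and one heartbeat timestamp per virtual machine rather than per host. The sketch below is only a local stand-in for the idea (local `threading` locks substitute for the real shared-storage locks; the class and its methods are our invention, not the patent's):

```python
import threading

class PerVMLockManager:
    """Per-VM refinement of the distributed read-write lock.

    One lock and one heartbeat timestamp per virtual machine, so a single
    VM's disk can be protected against concurrent writers without fencing
    the whole host.
    """
    def __init__(self):
        self._locks = {}   # vm_id -> lock guarding that VM's disk
        self._beats = {}   # vm_id -> last heartbeat timestamp

    def acquire(self, vm_id):
        """Try to become the VM's single writer; fail fast if already held."""
        lock = self._locks.setdefault(vm_id, threading.Lock())
        return lock.acquire(blocking=False)

    def heartbeat(self, vm_id, timestamp):
        """Record the lock heartbeat updated while the VM runs."""
        self._beats[vm_id] = timestamp

    def release(self, vm_id):
        """Release the VM's lock, e.g. after the VM is fenced."""
        self._locks[vm_id].release()
```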
- FIG. 1 is a schematic structural diagram of a high-availability system of an OpenStack virtual machine for preventing split brain in an embodiment of the present invention
- FIG. 2 is a schematic flowchart of a high-availability management method for a high-availability management terminal device of an OpenStack virtual machine for preventing split brain in an embodiment of the present invention
- FIG. 3 is a schematic flow chart of Fencing for a high-availability module of a high-availability management terminal device of an OpenStack virtual machine for preventing split brain in an embodiment of the present invention
- FIG. 4 is a schematic flowchart of a high-availability management method for a highly available computing node device of an OpenStack virtual machine for preventing split brain in an embodiment of the present invention
- FIG. 5 is a schematic diagram of a recovery process after restarting the process of the Lock management module of the highly available computing node device of the OpenStack virtual machine for preventing split brain in the embodiment of the present invention.
- FIG. 6 is a schematic diagram of steps for performing a recovery operation of a cloud computing virtual machine VM program of a high-availability computing node device of an OpenStack virtual machine for preventing split brain in an embodiment of the present invention.
- Virtual Machine (VM): a complete computer system, simulated in software with full hardware functionality, that runs in a completely isolated environment.
- OpenStack OpenStack is an open source cloud computing management platform project. It is a free software and open source project authorized by the Apache license, developed and initiated by NASA (National Aeronautics and Space Administration) and Rackspace.
- the computing resource management component in the OpenStack project includes nova-api, nova-scheduler, nova-conductor, nova-compute and other processes.
- As the core compute controller of the entire OpenStack project, it implements life-cycle management of user virtual machine instances to provide virtual services, such as virtual machine creation, power-on, shutdown, suspend, pause, resize, migration, restart, and destruction.
- Nova-api: the interface Nova exposes externally and the entry point for message processing. Administrators can manage the internal infrastructure through this interface, and services can also be provided to users through it. After receiving a request and performing basic validation, it forwards the request to the next module through the message queue.
- Nova-scheduler: mainly performs scheduling of virtual machine instances in Nova. Based on conditions such as CPU architecture, host memory, load, and specific hardware requirements, each instance is scheduled and assigned to an appropriate node.
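The filter-then-weigh scheduling described above can be sketched in a few lines. This is an illustrative simplification, not the real Nova FilterScheduler API; the dictionary keys and function name are our assumptions:

```python
def schedule(instance, hosts):
    """Pick a host for an instance, Nova-scheduler style.

    instance: {"ram_mb": ..., "vcpus": ...} resource requirements
    hosts:    list of {"name", "free_ram_mb", "free_vcpus", "load"}

    Filter out hosts lacking enough free RAM or vCPUs, then weigh the
    survivors by load and pick the least loaded. Returns the host name,
    or None when no host qualifies.
    """
    candidates = [h for h in hosts
                  if h["free_ram_mb"] >= instance["ram_mb"]
                  and h["free_vcpus"] >= instance["vcpus"]]
    if not candidates:
        return None
    return min(candidates, key=lambda h: h["load"])["name"]
```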
- Nova-conductor: Nova's internal handler for long-running tasks. It mainly tracks and manages time-consuming tasks such as the creation and migration of virtual machine instances. It is also responsible for database access control, preventing Nova-compute from accessing the database directly.
- Nova-compute: located on the computing node, it is the real executor of virtual machine life-cycle management operations. It receives requests through the message queue, responds to the management processes of the control node, and is directly responsible for all communication with the Hypervisor.
- Nova controller: a role definition or title.
- Nova processes: including Nova-api, Nova-conductor, Nova-scheduler, etc., mainly responsible for handling virtual machine management operations; they are generally deployed on independent nodes called management nodes, not co-deployed with the compute nodes that run nova-compute.
- HaStack: one of the two self-developed components that provide the HA function in a client-server (C-S) structure, located on the server side. As the brain of HA management, it governs overall HA behavior; its functions are performed by the high availability module.
- HaStack-agent: the other of the two self-developed components that provide the HA function in a C-S structure, located on the agent side. It is mainly responsible for mounting the shared directory and reporting the node's heartbeat status and VM Fencing events, and it cooperates with HaStack to complete the management of some HA actions; its functions are run by the high-availability computing node module.
- API (Application Programming Interface): the interface through which a component exposes its kernel for external access and calls.
- Hypervisor is an intermediate software layer that runs between a physical server and an operating system. It allows multiple operating systems and applications to share a set of basic physical hardware, so it can also be regarded as a "meta" operating system in a virtual environment. As an abstraction of platform hardware and operating system, it can coordinate access to all physical devices and virtual machines on the server, also called virtual machine monitor (Virtual Machine Monitor). Hypervisor is the core of all virtualization technologies. The ability to support non-disruptive migration of multiple workloads is a basic function of Hypervisor. When the server starts and executes the hypervisor, it will allocate the appropriate amount of memory, CPU, network and disk to each virtual machine, and load the guest operating system of all virtual machines.
- KVM (Kernel-based Virtual Machine): a complete hardware-based virtualization solution that mainly provides kernel-based virtual machines.
- Libvirt a management process that provides a standard Hypervisor API interface on top of KVM.
- Lock: run by the Lock management module 304 and set in the computing node device 300; it cooperates with the libvirt component and sits above the shared storage device 400 in the architecture to update and monitor the various lock heartbeats. It provides distributed read-write locks to control and manage concurrent writes to the same storage.
- the Lock module in this embodiment is a newly designed distributed read-write lock manager modeled on the native Lock function; the native Lock module can also be used as needed, or adapted through secondary development.
- Etcd a highly available distributed key-value database, is implemented in GO language and guarantees strong consistency through a consistency algorithm.
- Cluster software: mainly provides two functions: forming a three-plane cluster to sense global health status for HA decisions, and serving as an information bridge between HaStack and HaStack-agent.
- Ceph a unified distributed storage software designed for excellent performance, reliability, and scalability.
- CephFS a distributed file system based on Ceph storage. In this solution, it is mainly used to store lock files of various Lock modules.
- NFS Network File System
- the NFS server can allow the NFS client to mount the shared directory on the remote NFS server to the local NFS client.
- the client application of the local NFS can transparently read and write files located on the remote NFS server, just like accessing local disk partitions and directories.
- GuestOS: in virtualization, "Guest" refers to the virtualized system, i.e. an instance of software (such as an operating system) running in a virtual machine; GuestOS is the virtual machine's operating system.
- QGA: short for QEMU Guest Agent, a common application running inside a virtual machine. A serial port is added to the virtual machine for communication with the host, providing a way for the host to interact with the virtual machine VM.
- the anti-split-brain OpenStack virtual machine high-availability system includes a management-end device 100, a management network 200, a computing node device 300, and a shared storage device 400.
- At least two management-end devices communicate through the management network to form a management cluster 110.
- the management terminal device and the computing node device are communicatively connected through the management network.
- the computing node device is connected to the shared storage device.
- In FIG. 1, three management-end devices 100 (control nodes A, B, and C in the figure), three computing node devices 300 (computing nodes A, B, and C in the figure), and one shared storage device 400 are taken as an example for description.
- all three computing node devices 300 are connected to one shared storage device 400, that is, three computing node devices 300 share one shared storage device 400.
- Each management device 100 includes a Nova control module 101, a cluster management module 102, and a high availability module 103.
- Nova control module 101 namely Nova controller in the figure, includes Nova's native virtual machine VM management process, which is used to manage the life cycle of the virtual machine VM.
- the cluster management module 102 namely Etcd in the figure, is used to collect the running status information of the cluster.
- the high availability module 103 that is, FitOS HaStack in the figure, is used for high availability management of all computing node devices.
- the management network 200 is divided into three major network planes, namely a management network plane 201, a storage network plane 202, and a service network plane 203.
- the management network plane 201 is used to connect to the management terminal device and is used to provide management services.
- the storage network plane 202 is used to connect to the back-end shared storage device and is used to provide storage services.
- the service network plane 203 is used to connect computing node devices, and is used to provide access services for cloud computing virtual machine VMs.
- All nodes are connected to the three planes, and the cluster management module 102, that is, Etcd in the figure corresponds to each plane to form a corresponding cluster.
- each computing node device 300 also has a Nova-compute module 302, a Libvirt management module 303, a Lock management module 304, and a high-availability computing node module 305.
- Nova-compute module 302, namely Nova-compute in the figure, is used to directly control the running state of the cloud computing virtual machine VM in response to the management processes of the management-end device, and to communicate with the Hypervisor API.
- the Libvirt management module 303 namely Libvirt in the figure, is used to provide a management process of a standard Hypervisor API interface on the KVM.
- the Lock management module 304 namely Lock in the figure, cooperates with the Libvirt management module to update and monitor the lock heartbeat of the shared storage device.
- the highly available computing node module 305 that is, the HaStack-agent in the figure, is at least used to report the lock heartbeat to the management device.
- Nova-controller: run by the Nova control module 101 and including virtual machine management processes such as Nova-api, Nova-conductor, and Nova-scheduler, it is set in the management-end device 100 and is mainly used to manage life-cycle operations of the virtual machine VM.
- HaStack which is run by the high-availability module 103, is set in the management device 100 and is used to manage global HA behavior.
- the cluster software is run by the cluster management module 102, and the software used includes Etcd, Consul, etc. In this embodiment, Etcd is used. Used in combination with the HaStack component, it is set in the management device 100 and is used to sense the health status of the entire cluster for HA decision-making, and serves as an information bridge between the highly available module 103 and the highly available computing node module 305.
- Nova-compute: a native Nova process, run by the Nova-compute module 302 and set in the computing node device 300 to respond to the management processes of the control node. It is the real executor of virtual machine life-cycle management operations and is directly responsible for all communication with the Hypervisor.
- HaStack-agent used in conjunction with the nova-compute process, is run by the high-availability compute node module 305, set in the compute node device 300, and is mainly responsible for mounting shared directories, reporting the node's lock heartbeat status, and cooperating with HaStack components to complete part of the HA Action management functions.
- Libvirt: set in the computing node device 300 and run by the Libvirt management module 303, it provides a management process with a standard Hypervisor API on top of KVM.
- Lock: run by the Lock management module 304 and set in the computing node device 300; it cooperates with the libvirt component and sits above the shared storage device 400 in the architecture to update and monitor the various lock heartbeats.
- the Lock module in this embodiment is a newly designed distributed read-write lock manager modeled on the native Lock function; the native Lock module can also be used as needed, or adapted through secondary development.
- the shared storage system is run by the shared storage device 400.
- the software programs used include CephFS and NFS, which provide shared file system storage.
- the high availability module 103 runs a method of high availability management.
- the method includes the following operations:
- operation A-1 check whether the cluster status is normal through the running status information collected by the cluster management module. If it is abnormal, trigger a cluster abnormal alarm and end. If it is normal, go to operation A-2.
- HaStack checks whether the cluster status is normal. If it is abnormal, it triggers a cluster abnormal alarm and ends this round of inspection; if it is normal, it proceeds to operation A-2.
- Operation A-2 check the status reported by each computing node device through the management network. If it is normal, this round of inspection is terminated, otherwise go to the next operation A-3.
- HaStack checks the status of the three-plane management network reported by each node through the HaStack-agent. If all are normal, the round of inspection is terminated; otherwise, go to operation A-3.
- Operation A-3: according to the abnormal status reported by each computing node device through the management network, determine one by one whether processing is needed; if not, abnormal processing of that computing node device ends and the flow returns to operation A-2; otherwise go to operation A-4.
- HaStack processes the abnormal nodes one by one, determining the subsequent handling strategy from which network plane each node has lost, checked against the HA strategy matrix; if no processing is required, abnormal handling for that node ends and the flow returns to operation A-3 for the next node; otherwise go to operation A-4.
- Operation A-4: for a computing node device whose abnormal state needs processing, check the status of the shared storage device connected to it; when the shared storage device is abnormal, the cloud computing virtual machine VM program on the computing node device is not restarted through the Nova control module and processing ends; otherwise, go to operation A-5.
- HaStack checks the working status of the shared storage device 400. If the shared storage device 400 is abnormal at this time, it cannot trigger HA, that is, the cloud computing virtual machine VM does not run. This round of processing ends; otherwise, if the storage is normal, go to operation A-5.
- Operation A-5: a Fencing request is issued to the computing node device whose connected shared storage device is in a normal state; Fencing kills (shuts down) the node's cloud computing virtual machine VM program.
- Operation A-6 issuing a command to the Nova control module to trigger the cloud computing virtual machine VM program running on the computing node device to run.
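The A-1 through A-6 flow above can be sketched as a polling loop. This is a minimal illustration only; the class, method, and field names (`Node`, `reported_status_ok`, the `needs_action` flag standing in for the HA strategy matrix) are assumptions, not the patented implementation.

```python
# Illustrative sketch of the management-side inspection round (operations A-1 to A-6).
# All names are hypothetical; the real HA strategy matrix is reduced to a flag here.

class Node:
    def __init__(self, name, status_ok, storage_ok, needs_action=True):
        self.name = name
        self.status_ok = status_ok        # status reported over the management network
        self.storage_ok = storage_ok      # health of the attached shared storage
        self.needs_action = needs_action  # stand-in for the HA strategy matrix verdict
        self.fenced = False

    def send_fencing_request(self):
        # A-5: fencing kills the VM processes on this node.
        self.fenced = True


def inspection_round(cluster_healthy, nodes, recovered):
    """Run one round of A-1..A-6; `recovered` collects nodes handed to Nova for HA."""
    # A-1: check overall cluster health from collected status information.
    if not cluster_healthy:
        return "cluster-alarm"
    # A-2: gather the nodes that reported an abnormal status.
    abnormal = [n for n in nodes if not n.status_ok]
    if not abnormal:
        return "all-normal"
    for node in abnormal:
        # A-3: decide per node whether the interrupt requires handling.
        if not node.needs_action:
            continue
        # A-4: HA may only proceed while the shared storage behind the node is healthy.
        if not node.storage_ok:
            continue
        # A-5: fence the node.
        node.send_fencing_request()
        # A-6: ask Nova to restart the fenced VMs (HA recovery).
        recovered.append(node.name)
    return "processed"
```

The key ordering shown here matches the text: storage health is checked before fencing, so a storage fault never triggers a restart that could race the original VM.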
- The high-availability computing node module runs a method that includes the following operations:
- Operation C-1: while the cloud computing virtual machine VM continuously updates and stores the lock heartbeat, no processing is needed if the write succeeds; otherwise, once a lock-heartbeat write fails, go to operation C-2.
- Specifically, the virtual machine VM continuously updates and stores Lock's lock heartbeat; if the storage write succeeds, no processing is needed; otherwise, once the lock-heartbeat write has failed for longer than a predetermined time, go to operation C-2.
- Operation C-2: the Lock management module reports the storage-abnormality event to the management device and waits for the management device to return a processing result.
- Specifically, Lock notifies the HaStack-agent, which reports the underlying storage-abnormality event to HaStack and waits for HaStack to provide the processing result.
- Operation C-3: if the management device returns a processing result within the specified time, go to operation C-5; otherwise, go to operation C-4.
- Specifically, if HaStack returns its processing opinion within the predetermined time, go to operation C-5; otherwise, go to operation C-4.
- Operation C-4: the Lock management module performs a Fencing operation, that is, it kills the cloud computing virtual machine VM programs of the computing node device.
- Specifically, Lock performs the Fencing isolation operation according to its default settings, that is, it kills or shuts down all virtual machine VMs running on the computing node.
- Operation C-5: the Lock management module determines, according to the processing result returned by the management device, whether Fencing is needed.
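The node-side decision path C-2 through C-5 reduces to "ask the management device, fence by default on timeout". A minimal sketch, with hypothetical names and an assumed timeout value:

```python
# Illustrative sketch of the lock-heartbeat failure path (operations C-2 to C-5).
# `report_to_mgmt` stands in for the HaStack-agent report; names are assumptions.

def handle_heartbeat_failure(report_to_mgmt, timeout_s=5.0):
    """Called once a lock-heartbeat write has failed (C-2 onward).

    `report_to_mgmt(timeout_s)` reports the storage fault and returns the
    management device's decision ('fence' or 'ignore'), or None on timeout.
    """
    # C-2: report the storage abnormality and wait for a verdict.
    decision = report_to_mgmt(timeout_s)
    if decision is None:
        # C-4: no answer within the allowed time -> fence by default,
        # i.e. kill/shut down every VM on this compute node.
        return "fence"
    # C-3/C-5: the management device answered in time; follow its verdict.
    return "fence" if decision == "fence" else "no-fence"
```

The fence-by-default branch is what prevents split-brain: a node that has lost both its storage heartbeat and its management link cannot keep running VMs that another node may be asked to restart.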
- On the basis of Embodiment 1, as shown in FIG. 3, when the management device 100 issues a Fencing request to a computing node device whose connected shared storage device is in a normal state, HaStack must determine, according to the current status of the environment, how to respond to the storage interrupt event reported by the underlying HaStack-agent. For this purpose, the high availability module also runs the following operations:
- Operation B-1: continuously listen for Fencing events reported by the computing node devices; once a message is received, go to operation B-2.
- Specifically, HaStack continuously listens for Fencing events reported by the HaStack-agent and, once a message is received, proceeds to operation B-2.
- Operation B-2: check whether the cluster status is normal based on the running-status information collected by the cluster management module. If abnormal, trigger a cluster-abnormality alarm and end; if normal, go to operation B-3.
- Specifically, HaStack checks whether the cluster status is normal; if abnormal, it triggers a cluster-abnormality alarm and ends this round of inspection; if normal, go to operation B-3.
- Operation B-3: check the network status reported by each computing node device through the management network. If normal, this round of inspection ends; otherwise, go to operation B-4.
- Specifically, HaStack checks the three-plane management-network status reported by each node through the HaStack-agent.
- Operation B-4: according to the abnormal status reported by each computing node device through the management network, determine whether processing is needed; if not, go to operation B-6; otherwise, go to operation B-5.
- Specifically, HaStack processes the abnormal nodes one by one, consulting the HA strategy matrix according to each node's specific interrupt type to determine the follow-up Fencing strategy; if no processing is needed, go to operation B-6; otherwise, if follow-up processing is needed, go to operation B-5.
- Operation B-5: for a computing node device in an abnormal state that requires processing, check the status of the shared storage device connected to it. When the shared storage device is abnormal, go to operation B-6 without Fencing and end; otherwise, go to operation B-7.
- Specifically, HaStack checks the storage status. If the storage is abnormal, Fencing is not needed: go to operation B-6; otherwise, go to operation B-7.
- Operation B-6: for scenarios where Fencing is not needed, issue a stop-Fencing request to the corresponding computing node device.
- Specifically, HaStack issues a stop-Fencing request to the HaStack-agent.
- Operation B-7: for scenarios that require Fencing, issue a Fencing request to the corresponding computing node device.
- Specifically, HaStack issues a Fencing request to the HaStack-agent.
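The arbitration B-1 through B-7 is a short decision chain over the event HaStack receives. A minimal sketch, where the report fields (`planes_ok`, `action_needed`, `storage_ok`) are illustrative assumptions standing in for the three-plane status and the HA strategy matrix:

```python
# Illustrative sketch of how HaStack arbitrates one reported Fencing event
# (operations B-1 to B-7). Field names are hypothetical.

def arbitrate_fencing(cluster_healthy, node_report):
    """Decide the reply to one HaStack-agent Fencing event.

    `node_report` is a dict such as:
      {"planes_ok": bool, "action_needed": bool, "storage_ok": bool}
    """
    # B-2: an unhealthy cluster only raises an alarm; no per-node decision.
    if not cluster_healthy:
        return "cluster-alarm"
    # B-3: all three management-network planes normal -> nothing to do.
    if node_report["planes_ok"]:
        return "no-action"
    # B-4: the HA strategy matrix may decide the interrupt needs no handling.
    if not node_report["action_needed"]:
        return "stop-fencing"      # B-6
    # B-5: abnormal shared storage forbids fencing.
    if not node_report["storage_ok"]:
        return "stop-fencing"      # B-6
    return "do-fencing"            # B-7
```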
- The recovery process after the Lock management module's process restarts includes the following operations:
- Operation D-1: when the Libvirt management module starts, it registers and acquires the lock heartbeat through the Lock management module; if registration fails, go to operation D-2.
- Specifically, Libvirt registers with Lock and acquires the lock heartbeat at startup; if this fails, go to operation D-2.
- Operation D-2: once lock-heartbeat registration fails, kill (shut down) the cloud computing virtual machine VM programs of the computing node device.
- Operation D-3: the Libvirt management module records all computing node devices whose cloud computing virtual machine VM programs were killed and shut down, writing them to the Fencing log file.
- Operation D-4: periodically check the Fencing log file; if there is an update, go to operation D-5.
- Specifically, the HaStack-agent periodically checks the Fencing log on the node and, once it finds an update, moves to operation D-5.
- Operation D-5: report the Fencing log files of all computing node devices to the management device. If the report fails, this round of processing ends and the report is left for the next attempt; otherwise, after the report reaches the management device, the management device issues an instruction to perform recovery.
- Specifically, the HaStack-agent reports all Fencing logs to HaStack; if the report fails, this round of processing ends and the report is left for the next attempt.
- After the report reaches the management device, the management device performs the following operations:
- Operation D-6: the management device receives the Fencing log file reported by the computing node device's agent and determines whether automatic processing should be performed; if so, go to operation D-8; if not, go to operation D-7.
- Specifically, HaStack receives the Fencing log reported by the agent and decides, according to a pre-configured processing switch, whether to process it automatically: if so, go to operation D-8; if not, go to operation D-7.
- Operation D-7: HaStack does not automatically restore the fenced virtual machines; it only raises an alarm, and an administrator restores them manually later.
- Operation D-8: the management device automatically processes the fenced cloud computing virtual machine VM programs, calling the Nova interface to control them to resume running.
- Specifically, when HaStack must handle the fenced virtual machines automatically, it calls the Nova interface one by one to trigger the HA recovery process.
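Operations D-6 through D-8 split on a single pre-configured switch. A minimal sketch of that branch; the log format (a list of fenced VM identifiers) and the callback names are assumptions:

```python
# Illustrative sketch of handling a reported Fencing log (operations D-6 to D-8).
# `nova_recover` and `alarm` are hypothetical callbacks; the log format is assumed.

def process_fencing_log(entries, auto_recover, nova_recover, alarm):
    """D-6..D-8: handle Fencing-log entries reported by an agent.

    `entries` lists fenced VM identifiers; `auto_recover` is the
    pre-configured automatic-processing switch.
    """
    if not auto_recover:
        # D-7: no automatic recovery -- alarm only; admins restore manually.
        alarm(entries)
        return []
    # D-8: call the Nova interface for each fenced VM to trigger HA recovery.
    recovered = []
    for vm in entries:
        nova_recover(vm)
        recovered.append(vm)
    return recovered
```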
- The cloud computing virtual machine VM program has a VM GuestOS operating system, which performs the following recovery operations after Fencing:
- Operation E-1: the Qga in the VM GuestOS continuously maintains a heartbeat with the HaStack-agent of the computing node. Once the virtual machine blue-screens, hangs, or crashes, go to operation E-2.
- Operation E-2: when the high-availability computing node module receives the report of the abnormal event, it reports it to the management device.
- Specifically, when the HaStack-agent receives an abnormal event, it immediately reports it to HaStack.
- Operation E-3: after receiving the report of the abnormal event, the management device directly calls the Nova interface to control the cloud computing virtual machine VM program to resume running.
- Specifically, after receiving an abnormal event from inside the virtual machine VM, HaStack directly issues an HA command to Nova to trigger HA recovery.
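The in-guest path E-1 through E-3 is essentially a heartbeat-age watchdog that escalates upward. A minimal sketch under assumed names (the real Qga/HaStack-agent protocol is not specified here):

```python
# Illustrative sketch of the guest-heartbeat watchdog (operations E-1 to E-3).
# `escalate` stands in for the HaStack-agent -> HaStack -> Nova chain.

def guest_watchdog(last_beat, now, timeout, escalate):
    """E-1: detect a stalled guest (blue screen / hang / crash) via heartbeat age."""
    if now - last_beat <= timeout:
        return "alive"
    # E-2/E-3: report upward; HaStack then issues an HA command to Nova
    # to restart the affected VM.
    escalate("guest heartbeat lost")
    return "recovering"
```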
- This embodiment provides a management method for the high-availability management device of an anti-split-brain OpenStack virtual machine system, which includes the following operations:
- Operation A-1: check whether the cluster status is normal based on the collected running-status information. If abnormal, trigger a cluster-abnormality alarm and end; if normal, go to operation A-2;
- Operation A-2: check the status reported by each computing node device through the management network. If normal, this round of inspection ends; otherwise, go to operation A-3;
- Operation A-3: according to the abnormal status reported by each computing node device through the management network, determine one by one whether processing is needed. If no processing is needed, abnormality processing for that computing node device ends and the flow returns to operation A-2; otherwise, go to operation A-4;
- Operation A-4: for a computing node device in an abnormal state that requires processing, check the status of the shared storage device connected to it. When the shared storage device is abnormal, control, through the Nova control module, the cloud computing virtual machine VM program on that node so that it does not run, and end; otherwise, go to operation A-5;
- Operation A-5: issue a Fencing request to the computing node device whose connected shared storage device is in a normal state;
- Operation A-6: issue a command to the Nova control module to trigger the cloud computing virtual machine VM program on that computing node device to run.
- Operation B-1: continuously listen for Fencing events reported by the computing node devices; once a message is received, go to operation B-2;
- Operation B-2: check whether the cluster status is normal based on the collected running-status information. If abnormal, trigger a cluster-abnormality alarm and end; if normal, go to operation B-3;
- Operation B-3: check the network status reported by each computing node device through the management network. If normal, this round of inspection ends; otherwise, go to operation B-4;
- Operation B-4: according to the abnormal status reported by each computing node device through the management network, determine whether processing is needed; if not, go to operation B-6; otherwise, go to operation B-5;
- Operation B-5: for a computing node device in an abnormal state that requires processing, check the status of the shared storage device connected to it. When the shared storage device is abnormal, go to operation B-6 without Fencing and end; otherwise, go to operation B-7;
- Operation B-6: for scenarios where Fencing is not needed, issue a stop-Fencing request to the corresponding computing node device;
- Operation B-7: for scenarios that require Fencing, issue a Fencing request to the corresponding computing node device.
- This embodiment provides a management method for the high-availability computing node device of an anti-split-brain OpenStack virtual machine system, which includes the following operations:
- Operation C-1: while the virtual machine VM continuously updates and stores the lock heartbeat, no processing is needed if the write succeeds; otherwise, if a lock-heartbeat write fails, go to operation C-2;
- Operation C-2: the Lock management module reports the storage-abnormality event to the management device and waits for the management device to return a processing result;
- Operation C-3: if the management device returns a processing result within the specified time, go to operation C-5; otherwise, go to operation C-4;
- Operation C-4: if the management device does not return a processing result within the specified time, the Lock management module performs the Fencing operation, that is, killing or isolating the cloud computing virtual machine VM programs of the computing node device;
- Operation C-5: the Lock management module determines, according to the processing result returned by the management device, whether Fencing is needed.
- The recovery process after the Lock management module's process restarts includes the following operations:
- Operation D-1: when the Libvirt management module starts, register and acquire the lock heartbeat through the Lock management module; if registration fails, go to operation D-2;
- Operation D-2: once lock-heartbeat registration fails, kill (shut down) the cloud computing virtual machine VM programs of the computing node device;
- Operation D-3: the Libvirt management module records all computing node devices whose cloud computing virtual machine VM programs were killed and shut down, writing them to the Fencing log file;
- Operation D-4: periodically check the Fencing log files; if there is an update, go to operation D-5;
- Operation D-5: report the Fencing log files of all computing node devices to the management device. If the report fails, this round of processing ends and the report is left for the next attempt; otherwise, after the report reaches the management device, the management device issues an instruction to perform recovery.
- Operation E-1: the Qga in the VM GuestOS continuously maintains a lock heartbeat with the high-availability computing node module of the computing node device; when the cloud computing virtual machine VM program fails, go to operation E-2;
- Operation E-2: when the high-availability computing node module receives the report of the abnormal event, it reports it to the management device;
- Operation E-3: after receiving the report of the abnormal event, the management device directly calls the Nova interface to control the cloud computing virtual machine VM program to resume running.
- The failure includes a blue screen, hang, or crash of the computing node device on which the cloud computing virtual machine VM program runs.
- The present invention is a secondary development based on the original OpenStack release.
- An independent, anti-split-brain high-availability system for OpenStack virtual machines is developed around the periphery of OpenStack. It removes the dependence on IPMI-plane detection and hardware watchdogs found in traditional HA solutions, and realizes a complete virtual machine high availability (HA) method of carrier-grade reliability. To this end, the present invention provides an improved anti-split-brain high-availability system for OpenStack virtual machines.
- Split-brain refers to the situation in a high-availability (HA) system where two connected control nodes or computing nodes lose their connection: what was originally a single system splits into two independent nodes, which then begin to compete for shared resources, throwing the system into chaos and corrupting data.
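A common way to avoid the resource contention just described is quorum arbitration: a node may hold shared resources only while it can see a strict majority of the cluster, so a partitioned minority releases them instead of competing. This is a generic illustration of the principle, not the patent's specific mechanism:

```python
# Illustrative quorum check for split-brain avoidance: the minority side of a
# network partition must release shared resources rather than compete for them.

def may_hold_resources(visible_members, cluster_size):
    """True only when this node sees a strict majority of the cluster
    (counting itself among the visible members)."""
    return visible_members > cluster_size // 2
```

In a 3-node cluster, a node that can still see one peer (2 of 3 visible) keeps its resources, while an isolated node (1 of 3 visible) gives them up, so the two sides of a partition can never both act as owners.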
- The improved anti-split-brain OpenStack virtual machine high-availability system provided by the present invention solves this problem.
- Since the anti-split-brain OpenStack virtual machine high-availability system has a high-availability module, it can run the high-availability management method and, through the series of operations A-1 to A-6, detect in real time the status of the connected computing node devices and shared storage device.
- Depending on the type of abnormal state learned, namely an abnormality of a computing node device or of the shared storage device, and specifically which of the management network plane, storage network plane, and business network plane of the management network is abnormal, the system determines whether to perform a Fencing operation to shut down the cloud computing virtual machine VM program corresponding to the abnormal computing node device, thereby ensuring the high availability of the cloud computing virtual machine VM programs of the computing node devices in the system.
- The computing node device can run the series of operations C-1 to C-5, updating and storing the lock heartbeat of the Lock distributed read-write lock in real time, reporting any write failures that occur during the update to the management device in real time, and acting according to the management device's processing result: whether to Fence, that is, shut down or isolate, the cloud computing virtual machine VM programs of the computing node device.
- In this way, the lock protection granularity of the Lock distributed read-write lock is refined from the host level of the computing node device to the virtual machine VM level, and concurrent read-write protection can be performed for a single virtual machine.
- The self-developed full-process VM Fencing protection mechanism prevents virtual machines from being abnormally terminated when a shared storage device failure or another fault affects the underlying lock heartbeat.
- An asynchronous notification mechanism solves the problem of HA losing contact with managed VMs when Lock restarts, and automatic recovery is realized.
- By integrating Etcd and Qga, HaStack achieves awareness of the health of the three management-network planes (management network plane, business network plane, storage network plane) and precise perception of the internal operating state of the virtual machine VM.
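The per-node health report that drives the HA strategy matrix can be pictured as a simple aggregation of the three plane checks. The report shape and key names below are assumptions for illustration, not HaStack's actual wire format:

```python
# Illustrative aggregation of the three management-network planes into one
# node-health report, as a HaStack-agent might assemble it. Keys are hypothetical.

def plane_report(mgmt_ok, business_ok, storage_ok):
    planes = {"management": mgmt_ok, "business": business_ok, "storage": storage_ok}
    return {
        "planes": planes,
        "all_ok": all(planes.values()),
        # the specific interrupt type is what the HA strategy matrix keys on
        "interrupted": sorted(name for name, ok in planes.items() if not ok),
    }
```

A node is judged normal only when all three planes are normal; otherwise the `interrupted` list identifies which plane(s) failed, which selects the handling strategy.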
Claims (10)
- 1. An anti-split-brain OpenStack virtual machine high-availability system, characterized by comprising a management device, a management network, computing node devices, and a shared storage device, wherein at least two management devices communicate with each other through the management network to form a management cluster, the management device is communicatively connected to the computing node devices through the management network, and the computing node devices are connected to the shared storage device; each management device comprises: a Nova control module, comprising Nova's native virtual machine VM management processes, for managing the life cycle of the virtual machines VM; a cluster management module, for collecting running-status information of the cluster; and a high-availability module, for performing high-availability management of all the computing node devices, the high-availability module running a high-availability management method comprising the following operations: operation A-1, checking whether the cluster status is normal based on the running-status information collected by the cluster management module; if abnormal, triggering a cluster-abnormality alarm and ending; if normal, going to operation A-2; operation A-2, checking the status reported by each computing node device through the management network; if normal, terminating this round of inspection; otherwise, going to operation A-3; operation A-3, according to the abnormal status reported by each computing node device through the management network, determining one by one whether processing is needed; if no processing is needed, ending abnormality processing for that computing node device and returning to operation A-2; otherwise, going to operation A-4; operation A-4, for a computing node device in an abnormal state that requires processing, checking the status of the shared storage device connected to it; when the shared storage device is abnormal, controlling, through the Nova control module, the cloud computing virtual machine VM program running on that computing node device not to run, and ending; otherwise, going to operation A-5; operation A-5, issuing an isolation request to the computing node device whose connected shared storage device is in a normal state; operation A-6, issuing a command to the Nova control module to trigger the cloud computing virtual machine VM program running on that computing node device to run; wherein, in addition to the cloud computing virtual machine VM program installed on it, each computing node device further comprises: a Nova-computer module, for directly responding to the management processes of the management device to control the running state of the virtual machine VM, and for communicating with the Hypervisor API; a Libvirt management module, a management process for providing a standard Hypervisor API interface on KVM; a Lock management module, cooperating with the Libvirt management module, for updating and monitoring the lock heartbeat of the shared storage device; and a high-availability computing node module, at least for reporting the lock heartbeat to the management device, wherein the high-availability computing node module runs a method comprising the following operations: operation C-1, while the virtual machine VM continuously updates and stores the lock heartbeat, if the write succeeds, no processing is needed; otherwise, once the lock-heartbeat write fails, going to operation C-2; operation C-2, the Lock management module reporting the storage-abnormality event to the management device and waiting for the management device to return a processing result; operation C-3, if the management device returns a processing result within a specified time, going to operation C-5; otherwise, going to operation C-4; operation C-4, if the management device does not return a processing result within the specified time, the Lock management module performing an isolation operation; operation C-5, the Lock management module determining, according to the processing result returned by the management device, whether isolation is needed.
- 2. The anti-split-brain OpenStack virtual machine high-availability system according to claim 1, characterized in that, after the management device issues an isolation request to the computing node device whose connected shared storage device is in a normal state, the high-availability module further runs the following operations: operation B-1, continuously listening for isolation events reported by the computing node devices; once a message is received, going to operation B-2; operation B-2, checking whether the cluster status is normal based on the running-status information collected by the cluster management module; if abnormal, triggering a cluster-abnormality alarm and ending; if normal, going to operation B-3; operation B-3, checking the network status reported by each computing node device through the management network; if normal, terminating this round of inspection; otherwise, going to operation B-4; operation B-4, according to the abnormal status reported by each computing node device through the management network, determining whether processing is needed; if not, going to operation B-6; otherwise, going to operation B-5; operation B-5, for a computing node device in an abnormal state that requires processing, checking the status of the shared storage device connected to it; when the shared storage device is abnormal, going to operation B-6 without isolation and ending; otherwise, going to operation B-7; operation B-6, for scenarios where no isolation is needed, issuing a stop-isolation request to the corresponding computing node device; operation B-7, for scenarios requiring isolation, issuing an isolation request to the corresponding computing node device; wherein the recovery process after the Lock management module's process restarts comprises the following operations: operation D-1, when the Libvirt management module starts, registering and acquiring the lock heartbeat through the Lock management module; if registration fails, going to operation D-2; operation D-2, once lock-heartbeat registration fails, shutting down or isolating the cloud computing virtual machine VM programs of that computing node device; operation D-3, the Libvirt management module recording all computing node devices whose cloud computing virtual machine VM programs were shut down or isolated in an isolation log file; operation D-4, periodically checking the isolation log file and, upon finding an update, going to operation D-5; operation D-5, reporting the isolation log files of all computing node devices to the management device; if the report fails, ending this round of processing and leaving the report for the next attempt; otherwise, after the report reaches the management device, the management device issues an instruction to perform recovery.
- 3. The anti-split-brain OpenStack virtual machine high-availability system according to claim 1, characterized in that, after the report reaches the management device, the management device performs the following operations: operation D-6, the management device receives the isolation log file reported by a computing node device and determines whether automatic processing is to be performed; if so, going to operation D-8; if not, going to operation D-7; operation D-7, the management device raises an alarm and leaves the matter for manual processing; operation D-8, the management device automatically processes the isolated cloud computing virtual machine VM programs, calling the Nova interface to control the cloud computing virtual machine VM programs to resume running.
- 4. The anti-split-brain OpenStack virtual machine high-availability system according to claim 1, characterized in that the shared storage device is managed and run by a CephFS or NFS file-management program, the virtual machine VM management processes include Nova-api, Nova-conductor, or Nova-scheduler, and the cluster management module includes Etcd or Consul.
- 5. The anti-split-brain OpenStack virtual machine high-availability system according to claim 1, characterized in that the management network comprises: a management network plane, connecting to the management device to provide management services; a storage network plane, connecting to the back-end shared storage device to provide storage services; and a business network plane, connecting to the computing node devices to provide access services for the cloud computing virtual machines VM.
- 6. The anti-split-brain OpenStack virtual machine high-availability system according to claim 5, characterized in that the network status reported by a computing node device through the management network in operation A-2 is judged normal only when the management network plane, storage network plane, and business network plane of the management network are all normal; otherwise, corresponding processing is performed according to which one or more of the management network plane, storage network plane, and business network plane make up the specific interrupt type of the abnormal computing node device.
- 7. The anti-split-brain OpenStack virtual machine high-availability system according to claim 2, characterized in that the management network comprises: a management network plane, connecting to the management device to provide management services; a storage network plane, connecting to the back-end shared storage device to provide storage services; and a business network plane, connecting to the computing node devices to provide access services for the virtual machines VM; correspondingly, the network status reported by a computing node device through the management network in operation B-3 is judged normal only when the management network plane, storage network plane, and business network plane of the management network are all normal; otherwise, corresponding isolation processing is performed according to which one or more of the management network plane, storage network plane, and business network plane make up the specific interrupt type of the abnormal computing node device.
- 8. The anti-split-brain OpenStack virtual machine high-availability system according to claim 1, characterized in that the cloud computing virtual machine VM program has a VM GuestOS operating system, which performs the following recovery operations after isolation: operation E-1, the Qga in the VM GuestOS continuously maintains a lock heartbeat with the high-availability computing node module of the computing node device; when the cloud computing virtual machine VM program fails, going to operation E-2; operation E-2, when the high-availability computing node module receives a report of the abnormal event, reporting it to the management device; operation E-3, after receiving the report of the abnormal event, the management device directly calls the Nova interface to control the cloud computing virtual machine VM program to resume running.
- 9. The anti-split-brain OpenStack virtual machine high-availability system according to claim 8, characterized in that the failure includes a blue screen, hang, or crash of the computing node device on which the cloud computing virtual machine VM program runs.
- 10. The anti-split-brain OpenStack virtual machine high-availability system according to claim 2, characterized in that, after the report reaches the management device, the management device performs the following operations: operation D-6, the management device receives the isolation log file reported by a computing node device and determines whether automatic processing is to be performed; if so, going to operation D-8; if not, going to operation D-7; operation D-7, the management device raises an alarm and leaves the matter for manual processing; operation D-8, the management device automatically processes the isolated cloud computing virtual machine VM programs, calling the Nova interface to control the cloud computing virtual machine VM programs to resume running.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
BR112020004407-5A BR112020004407A2 (pt) | 2018-12-04 | 2018-12-18 | sistema de alta disponibilidade de uma máquina virtual openstack para impedir split-brain. |
PH12020550045A PH12020550045A1 (en) | 2018-12-04 | 2020-02-05 | High-availability System of OpenStack Virtual Machine for Preventing Split-brain |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811474780.8A CN109614201B (zh) | 2018-12-04 | 2018-12-04 | 防脑裂的OpenStack虚拟机高可用系统 |
CN201811474780.8 | 2018-12-04 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020113670A1 true WO2020113670A1 (zh) | 2020-06-11 |
Family
ID=66005497
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/121655 WO2020113670A1 (zh) | 2018-12-04 | 2018-12-18 | 防脑裂的OpenStack虚拟机高可用系统 |
Country Status (4)
Country | Link |
---|---|
CN (1) | CN109614201B (zh) |
BR (1) | BR112020004407A2 (zh) |
PH (1) | PH12020550045A1 (zh) |
WO (1) | WO2020113670A1 (zh) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112214466B (zh) * | 2019-07-12 | 2024-05-14 | 海能达通信股份有限公司 | 分布式集群系统及数据写入方法、电子设备、存储装置 |
CN111212127A (zh) * | 2019-12-29 | 2020-05-29 | 浪潮电子信息产业股份有限公司 | 一种存储集群及业务数据的维护方法、装置和存储介质 |
CN113765709B (zh) * | 2021-08-23 | 2022-09-20 | 中国人寿保险股份有限公司上海数据中心 | 基于Openstack云平台多维监控的虚拟机高可用实现系统及方法 |
CN113965459A (zh) * | 2021-10-08 | 2022-01-21 | 浪潮云信息技术股份公司 | 基于consul进行主机网络监控实现计算节点高可用的方法 |
CN114090184B (zh) * | 2021-11-26 | 2022-11-29 | 中电信数智科技有限公司 | 一种虚拟化集群高可用性的实现方法和设备 |
CN115858222B (zh) * | 2022-12-19 | 2024-01-02 | 安超云软件有限公司 | 一种虚拟机故障处理方法、系统及电子设备 |
CN116382850B (zh) * | 2023-04-10 | 2023-11-07 | 北京志凌海纳科技有限公司 | 一种利用多存储心跳检测的虚拟机高可用管理装置及系统 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103684941A (zh) * | 2013-11-23 | 2014-03-26 | 广东新支点技术服务有限公司 | 基于仲裁服务器的集群裂脑预防方法和装置 |
CN107239383A (zh) * | 2017-06-28 | 2017-10-10 | 郑州云海信息技术有限公司 | 一种OpenStack虚拟机的故障监控方法及装置 |
CN107885576A (zh) * | 2017-10-16 | 2018-04-06 | 北京易讯通信息技术股份有限公司 | 一种基于OpenStack的私有云中虚拟机HA的方法 |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104253860B (zh) * | 2014-09-11 | 2017-08-08 | 武汉噢易云计算股份有限公司 | 一种基于共享存储消息队列的虚拟机高可用实现方法 |
-
2018
- 2018-12-04 CN CN201811474780.8A patent/CN109614201B/zh active Active
- 2018-12-18 BR BR112020004407-5A patent/BR112020004407A2/pt not_active IP Right Cessation
- 2018-12-18 WO PCT/CN2018/121655 patent/WO2020113670A1/zh active Application Filing
-
2020
- 2020-02-05 PH PH12020550045A patent/PH12020550045A1/en unknown
Non-Patent Citations (1)
Title |
---|
WU, JIANG: "A Better VM HA Solution: Split-brain Solving & Host Network Fault Awareness", OPEN INFRASTRUCTURE SUMMIT, 14 November 2018 (2018-11-14), pages 1 - 30, XP009521666 * |
Also Published As
Publication number | Publication date |
---|---|
BR112020004407A2 (pt) | 2021-06-22 |
CN109614201B (zh) | 2021-02-09 |
PH12020550045A1 (en) | 2020-10-12 |
CN109614201A (zh) | 2019-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020113669A1 (zh) | 防脑裂的OpenStack虚拟机高可用计算节点装置及管理方法 | |
WO2020113668A1 (zh) | 防脑裂的OpenStack虚拟机高可用管理端装置及管理方法 | |
WO2020113670A1 (zh) | 防脑裂的OpenStack虚拟机高可用系统 | |
US20190065275A1 (en) | Systems and methods for providing zero down time and scalability in orchestration cloud services | |
CN106716360B (zh) | 支持多租户应用服务器环境中的补丁修补的系统和方法 | |
US9684545B2 (en) | Distributed and continuous computing in a fabric environment | |
US9652326B1 (en) | Instance migration for rapid recovery from correlated failures | |
US6477663B1 (en) | Method and apparatus for providing process pair protection for complex applications | |
US9846706B1 (en) | Managing mounting of file systems | |
US20130185716A1 (en) | System and method for providing a virtualized replication and high availability environment | |
US20070067366A1 (en) | Scalable partition memory mapping system | |
US10983877B1 (en) | Backup monitoring with automatic verification | |
US9703651B2 (en) | Providing availability of an agent virtual computing instance during a storage failure | |
US20140173329A1 (en) | Cascading failover of blade servers in a data center | |
Glider et al. | The software architecture of a san storage control system | |
US20220291850A1 (en) | Fast restart of large memory systems | |
US11119872B1 (en) | Log management for a multi-node data processing system | |
US7467324B1 (en) | Method and apparatus for continuing to provide processing on disk outages | |
JP3467750B2 (ja) | 分散オブジェクト処理システム | |
CN114691304A (zh) | 实现集群虚拟机高可用的方法和装置、设备和介质 | |
Dell | ||
WO2022108914A1 (en) | Live migrating virtual machines to a target host upon fatal memory errors | |
US20200125434A1 (en) | Preventing corruption by blocking requests | |
Lee et al. | NCU-HA: A lightweight HA system for kernel-based virtual machine | |
US11977431B2 (en) | Memory error prevention by proactive memory poison recovery |
Legal Events

Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document: 18942243; Country: EP; Kind code: A1)
| REG | Reference to national code (Country: BR; Legal event code: B01A; Ref document: 112020004407)
| NENP | Non-entry into the national phase (Country: DE)
| ENP | Entry into the national phase (Ref document: 112020004407; Country: BR; Kind code: A2; Effective date: 2020-03-04)
| 122 | Ep: PCT application non-entry in European phase (Ref document: 18942243; Country: EP; Kind code: A1)