US20230185680A1 - Cloud restart for vm failover and capacity management - Google Patents
- Publication number
- US20230185680A1 (U.S. application Ser. No. 18/159,593)
- Authority
- US
- United States
- Prior art keywords
- virtual machine
- data center
- cluster
- hosts
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F11/2025—Failover techniques using centralised failover control functionality
- G06F11/2028—Failover techniques eliminating a faulty processor or activating a spare
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45575—Starting, stopping, suspending or resuming virtual machine instances
- G06F2009/45595—Network integration; Enabling network access in virtual machine instances
- G06F11/1451—Management of the data involved in backup or backup restore by selection of backup contents
- G06F11/1469—Backup restoration techniques
- G06F8/63—Image based installation; Cloning; Build to order
- G06F8/658—Incremental updates; Differential updates
- G06F2201/815—Indexing scheme relating to error detection, to error correction, and to monitoring: Virtual
Definitions
- In a virtualization system running within an on-premises network, hosts within the on-premises network have one or more virtual machines (VMs) instantiated in them.
- For high availability (HA) systems, which must maintain services with a high degree of availability, hosts run in clusters that provide load balancing and, upon failure of one host, migration of its VMs to another host in the cluster.
- In today's HA systems, customers must reserve one or more hosts in the on-premises network for failover capacity, yet many of these reserved hosts remain idle.
- Solutions exist to share failover capacity among different clusters in the on-premises network, but they still require an investment in hosts that may remain idle for long periods of time.
- One or more embodiments provide a cloud restart service that enables VMs that had been running on a host in a cluster of hosts within an on-premises network to be restarted on a host running in a cloud computing center.
- In one embodiment, a method of restarting, in a second data center, a virtual machine running in a cluster of hosts in a first data center includes the steps of: (1) transmitting images of virtual machines, including a first virtual machine, running in the cluster of hosts at a first point in time to the second data center for replication there; (2) generating difference data representing the difference between the image of the first virtual machine at a second point in time and its image at the first point in time; (3) transmitting the difference data to the second data center; (4) setting the first virtual machine to be inactive in the first data center; and (5) communicating with a control plane in the second data center to set as active, and power on, a virtual machine in the second data center using the replicated image of the first virtual machine after updating that image with the difference data.
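The claimed sequence can be illustrated with a short sketch. The class and method names below (`CloudRestartSketch`, `replicate_initial`, `failover`) are illustrative assumptions, and shipping the whole new image with a checksum of the old one stands in for whatever block-level difference mechanism a real implementation would use:

```python
import hashlib

class CloudRestartSketch:
    """Toy model of the claimed restart steps (not an actual product API)."""

    def __init__(self):
        self.primary_images = {}    # VM name -> image bytes in the first data center
        self.replica_images = {}    # VM name -> replicated image bytes in the second data center
        self.active_site = {}       # VM name -> "primary" or "secondary"

    def replicate_initial(self, vm, image):
        # Step 1: transmit the image at a first point in time for replication.
        self.primary_images[vm] = image
        self.replica_images[vm] = image
        self.active_site[vm] = "primary"

    def make_difference_data(self, vm, new_image):
        # Step 2: capture the difference between the image now and the image
        # that was replicated earlier; a naive stand-in for block-level diffing.
        old = self.primary_images[vm]
        self.primary_images[vm] = new_image
        return {"base": hashlib.sha256(old).hexdigest(), "image": new_image}

    def failover(self, vm, difference_data):
        # Steps 3-5: transmit the difference data, set the VM inactive in the
        # first data center, then update, activate, and power on the replica.
        assert difference_data["base"] == hashlib.sha256(self.replica_images[vm]).hexdigest()
        self.replica_images[vm] = difference_data["image"]  # apply the update
        self.active_site[vm] = "secondary"
        return f"{vm} powered on in second data center"
```

For example, after `replicate_initial("vm1", b"disk-v1")`, a later call to `failover` with fresh difference data leaves the updated replica active in the second data center.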
- FIG. 1 is a block diagram of a computing system in which one or more embodiments may be utilized.
- FIG. 2 is a block diagram of a virtualization manager of an on-premises network and elements of a cloud computing system that enable restart of VMs in the cloud, according to one or more embodiments.
- FIG. 3 is a flow diagram showing steps performed by a cloud restart service and by an HCX service to enable restart of VMs in the cloud, according to one or more embodiments.
- FIG. 4 is a flow diagram showing steps performed by a local control plane of a cloud computing system to enable migration of VMs from the cloud to the on-premises network, according to one or more embodiments.
- FIG. 5 is a flow diagram showing steps performed by a cloud restart service and by an HCX service to enable restart of lower-priority VMs in the cloud, according to one or more embodiments.
- FIG. 6 is a flow diagram showing steps performed by a cloud restart service and by an HCX service to enable disaster recovery of VMs to the cloud, according to one or more embodiments.
- FIG. 1 is a block diagram of a computing system 100 in which one or more embodiments may be utilized.
- As shown, computing system 100 includes an on-premises network 102 and a cloud computing system 150.
- On-premises network 102 includes a virtualized computing system that includes a plurality of hosts 104a, 104b, 104c, with each host having one or more VMs 120-1, 120-2, . . . , 120-n instantiated therein.
- Hosts 104a, 104b, 104c form a cluster 127 of hosts.
- cluster 127 of hosts is configured to provide high availability (HA) services, such that if one of the hosts in cluster 127 experiences a failure, the VMs in the failed host can be restarted in another host in cluster 127 or as further described below in cloud computing system 150 .
- Each of the hosts 104a, 104b, 104c includes hypervisor 116 and HA agent 121, which run on top of hardware platform 106.
- Hardware platform 106 includes CPU 106, memory 110, network interface card (NIC) 112, and storage 114.
- On-premises network 102 also includes virtualization manager 130, which manages the provisioning of virtual compute, network, and storage resources (e.g., VMs 120-1, 120-2, . . . , 120-n) from physical compute, network, and storage resources in on-premises network 102.
- virtualization manager 130 also includes hybrid cloud exchange service (HCX) 131 , cloud restart service 132 , and HA master 133 , all of which will be further described below.
- Cloud computing system 150 includes the following control plane components: a virtual infrastructure manager 154 and a VM management server 157, through which virtual compute, storage, and network resources are provisioned for different customers of the cloud computing system.
- VM management server 157 is virtualization management software executed in a physical or virtual server (e.g., VMware vCenter Server®) that cooperates with hypervisors installed in hosts 162-1 to 162-M to provision virtual compute, storage, and network resources from hardware resources 160, which include hosts 162-1 to 162-M, storage hardware 164, and network hardware 165.
- Virtual infrastructure manager 154 is virtual infrastructure management software executed in a physical or virtual server (e.g., VMware vCloud Director®) that partitions the virtual compute, storage, and network resources provisioned by VM management server 157 for the different customers of cloud computing system 150.
- Cloud computing system 150 may support multiple cloud computing environments (one of which is depicted as cloud computing environment 170) that are available to different customers in a multi-tenant configuration.
- The virtual compute, storage, and network resources are provisioned in cloud computing environment 170 to form a virtual data center or a software-defined data center.
- The virtual data center includes one or more virtual networks 182 used to communicate amongst VMs 172 and managed by at least one network gateway component (e.g., gateway 184), as well as one or more isolated internal networks 186 not connected to gateway 184.
- Gateway 184 (e.g., executing as a virtual appliance) is configured to provide VMs 172 and other components in cloud computing environment 170 with connectivity to an external network 140 (e.g., the Internet).
- Gateway 184 manages external public IP addresses for the virtual data center and one or more private internal networks interconnecting VMs 172 .
- Gateway 184 is configured to route traffic incoming to and outgoing from the virtual data center and provide networking services, such as firewalls, network address translation (NAT), dynamic host configuration protocol (DHCP), and load balancing.
- Gateway 184 may be configured to provide virtual private network (VPN) connectivity over a network 140 with another VPN endpoint, such as a gateway 124 within on-premises network 102 .
- The VMs 172 may be instantiated on one or more of the hosts 162-1, . . . , 162-M.
- the virtual data center further includes a local control plane (LCP) 174 , implemented as a physical or virtual server, configured to communicate with virtual infrastructure manager 154 and enable control-plane communications between an administrator computer and virtual infrastructure manager 154 .
- FIG. 2 is a block diagram showing elements of computing system 100 used to restart a VM running in on-premises network 102 in cloud computing environment 170 , according to one or more embodiments.
- Each host in cluster 127 includes an HA agent 121 , which monitors the health of the host on which it is installed.
- HA agent 121 provides periodic health reports to HA master 133 of virtualization manager 130 , which acts accordingly by informing cloud restart service 132 to start a VM failover-to-cloud process, to be explained in more detail below.
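A minimal sketch of this monitoring arrangement follows, assuming a simple timeout rule: the HA master treats a host as failed when its last health report is older than a threshold. The 15-second value and the class name are invented for illustration; the patent does not specify how HA master 133 decides that a host has failed:

```python
import time

# Assumed heartbeat timeout, in seconds; not a value from the patent.
HEARTBEAT_TIMEOUT = 15.0

class HAMasterSketch:
    """Hypothetical HA master tracking periodic health reports from HA agents."""

    def __init__(self):
        self.last_report = {}  # host name -> timestamp of last health report

    def receive_report(self, host, now=None):
        # Each HA agent calls this periodically for the host it monitors.
        self.last_report[host] = time.monotonic() if now is None else now

    def failed_hosts(self, now=None):
        # Hosts whose most recent report is older than the timeout are
        # candidates for the VM failover-to-cloud process.
        now = time.monotonic() if now is None else now
        return [h for h, t in self.last_report.items()
                if now - t > HEARTBEAT_TIMEOUT]
```

In this sketch, a host that stops reporting simply ages out; a real system would also have to distinguish host failure from network partition before triggering failover.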
- HCX service 131 provides control plane connectivity to cloud computing environment 170, so that cluster 127 can include both VMs running in on-premises network 102 and VMs running in cloud computing environment 170. Also shown in FIG. 2 are LCP 174 and cloud storage 175.
- LCP 174 is the control plane unit of cloud computing environment 170 that communicates with HCX service 131 of on-premises network 102 and instructs virtual infrastructure manager 154 to provision one or more VMs in cloud computing environment 170.
- Cloud storage 175, which may be an object-based storage, is used to store images of VMs currently running in on-premises network 102, so that when one of those VMs fails, a restart of that VM in cloud computing environment 170 may be performed as explained in detail below.
- FIG. 3 is a flow diagram showing steps performed in a method 300 of restarting a VM in cloud computing environment 170 , according to one or more embodiments.
- Cloud restart service 132 instructs HA agent 121 of each host 104a, 104b, 104c within cluster 127 to monitor VMs 120-1, 120-2, . . . , 120-n to be cloud protected.
- cloud restart service 132 instructs HCX service 131 to initiate VM replication for those VMs to be cloud protected.
- A determination is made as to whether or not a host 104a, 104b, 104c has failed.
- This determination is based on reports from HA agent 121 running within each of the hosts 104a, 104b, 104c of cluster 127. If none of the hosts is experiencing a failure, then the process loops back to step 306, to continue monitoring for a failure of a host.
- In step 307, a determination is made by virtualization manager 130 as to whether there are sufficient available resources within cluster 127 to spin up the VMs that were running in the failed host on another host within cluster 127. If there are sufficient available resources ('Yes' decision in step 307), then in step 310 the VMs of the failed host are migrated to another host in cluster 127 that can accommodate them, and the process ends.
- In step 308, cloud restart service 132 instructs HCX service 131 to synchronize the last bits of data of the failed VMs; that is, cloud restart service 132 generates difference data representing any updates to the failed VM images since they were replicated in step 304 and transmits the difference data to HCX service 131 for synchronization.
- In step 311, HCX service 131 communicates with LCP 174 of cloud computing environment 170 to synchronize the last bits of data of the failed VMs by updating the images of the failed VMs stored in cloud storage 175 with the difference data transmitted by cloud restart service 132.
- In step 312, cloud restart service 132 instructs virtualization manager 130 to unprotect the VMs of the host that failed, which also includes setting those VMs as 'inactive'.
- In step 314, cloud restart service 132 instructs LCP 174 of cloud computing environment 170 to: a) set the replicated VMs of the failed VMs as 'active' within cloud computing environment 170, b) protect the active VMs, and c) power on the active VMs.
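The decision logic of steps 306 through 314 can be sketched as a single function. The scalar "capacity" unit and the action names below are simplifying assumptions, not the patent's terminology:

```python
# Hypothetical control flow for method 300: restart the failed host's VMs
# inside the cluster when capacity allows, and fall back to the cloud only
# when the cluster cannot accommodate them.
def handle_host_failure(failed_vms, cluster_free_capacity):
    """failed_vms: list of (vm_name, required_capacity) tuples."""
    needed = sum(req for _, req in failed_vms)
    if needed <= cluster_free_capacity:
        # Step 310: another host in the cluster can accommodate the VMs.
        return [("migrate_in_cluster", vm) for vm, _ in failed_vms]
    # Steps 308-314: sync the final difference data, deactivate on-premises,
    # then activate and power on the replicas in the cloud.
    actions = []
    for vm, _ in failed_vms:
        actions += [("sync_difference_data", vm),
                    ("set_inactive_on_prem", vm),
                    ("activate_and_power_on_in_cloud", vm)]
    return actions
```

Note the all-or-nothing choice here is itself a simplification; an implementation could also split the failed VMs between the cluster and the cloud.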
- Under this arrangement, a customer pays cloud computing environment 170 for the services required to run VMs in cloud computing environment 170 only when there is a failure in a host of an HA cluster, and only when other hosts in the HA cluster cannot accommodate the VMs of the failed host. While cloud storage resources may be needed for replicating the VM images prior to failure, such costs are generally much lower than the cost of reserving extra hosts in on-premises network 102.
- FIG. 4 is a flow diagram illustrating a method 400 performed by LCP 174 of cloud computing environment 170 to enable migration of a VM that had been restarted in cloud computing environment 170 , back to a host of cluster 127 that has recovered from the failure, according to one or more embodiments.
- LCP 174 causes difference data, representing updates to the images of the VMs running in cloud computing environment 170, to be periodically sent to on-premises HCX service 131 for updating of the images of these VMs stored in on-premises network 102.
- In step 406, a determination is made as to whether or not a previously failed host in cluster 127 has recovered from the failure.
- If the host has not recovered from the failure ('No' decision in step 406), the process loops back to step 404 to continue periodically sending the difference data to on-premises HCX service 131 for updating of the images of these VMs stored in on-premises network 102. If the host has recovered from the failure ('Yes' decision in step 406), then in step 408 the last bits of data obtained from the active VMs running in cloud computing environment 170, i.e., any updates to such VM images since step 404, are sent to on-premises HCX service 131 for synchronization.
- LCP 174 communicates with virtual infrastructure manager 154 and VM management server 157 of cloud computing system 150 to unprotect the active VMs running in cloud computing environment 170 .
- LCP 174 instructs HCX service 131 to: a) set replicated VMs in on-premises network 102 as ‘active’, b) protect the active VMs, and c) power on the active VMs.
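The failback flow of method 400 can be sketched the same way, again with invented action names and with the difference data reduced to an opaque payload:

```python
# Hypothetical sketch of method 400: while a VM runs in the cloud, difference
# data flows back on-premises periodically; once the failed host recovers, a
# final sync precedes deactivation in the cloud and activation on-premises.
def fail_back(vm, host_recovered, pending_difference_data):
    if not host_recovered:
        # Step 404: keep the on-premises image copy up to date.
        return [("send_difference_data_on_prem", vm, pending_difference_data)]
    # Steps 408-412: final sync, then swap the active site back on-premises.
    return [("send_final_difference_data", vm, pending_difference_data),
            ("unprotect_in_cloud", vm),
            ("activate_protect_power_on_on_prem", vm)]
```

The ordering matters: the replica must be unprotected in the cloud before the on-premises copy is powered on, so the two sites never both treat the VM as active.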
- FIG. 5 is a flow diagram illustrating a method 500 of migrating VMs between an on-premises network 102 and a cloud computing environment 170 based on 'priority', according to one or more embodiments. More specifically, if there are a plurality of VMs running in cluster 127 (see FIG. 1), those VMs may be assigned different priority levels, such that high-priority VMs are provided with the resources needed to run at an acceptable speed, whereas low-priority VMs have their resources reduced to accommodate the higher-priority VMs.
- Each of the VMs 120-1, 120-2, . . . , 120-n running within hosts 104a, 104b, 104c of cluster 127 is assigned a priority ranking, such as a ranking from a highest priority of one (1) to a lowest priority of N, with N being an integer greater than one. For example, a scale from 1 (highest priority) to 5 (lowest priority) may be utilized to rank the VMs.
- Cloud restart service 132 instructs HCX service 131 to initiate VM replication for the VMs having the lowest priority ranking (e.g., the VMs ranked with a value of 5), and in step 505 HCX service 131 communicates with LCP 174 to replicate those VMs.
- In step 506, a determination is made as to whether any of the high-priority VMs (the VMs ranked '1') are running sub-optimally.
- Such information may be provided by way of HA agent 121 in cluster 127 to HA master 133, for example, and may involve determining the amount of processor resources consumed by the high-priority VMs over a most recent time period (e.g., the last 10 milliseconds). If the determination in step 506 is 'No', then the process loops back to step 506 to continue monitoring the high-ranked VMs to make sure they are running under acceptable conditions.
- If the determination in step 506 is 'Yes', the lowest-ranked VMs are restarted in cloud computing environment 170 to free up resources for the high-priority VMs. This process may continue until all the high-priority VMs are running optimally. For example, it may be the case that all VMs ranked 2, 3, 4, and 5 have to be transferred to cloud computing environment 170 to enable all VMs ranked 1 within cluster 127 to run at their optimal levels.
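One selection rule consistent with this description is to offload the lowest-ranked VMs first. The function below is a hypothetical sketch; the patent does not prescribe a tie-breaking order among equally ranked VMs, so alphabetical order is used here purely for determinism:

```python
# Hypothetical selection rule for method 500: when high-priority (rank 1) VMs
# run sub-optimally, pick the lowest-priority VMs (largest rank number) as the
# next candidates to restart in the cloud. Rankings follow the 1..N scale.
def next_vms_to_offload(vm_rankings, high_priority_ok):
    """vm_rankings: dict of VM name -> priority rank (1 = highest priority)."""
    if high_priority_ok:
        return []  # nothing to do; keep monitoring
    candidate_ranks = [r for r in vm_rankings.values() if r > 1]
    if not candidate_ranks:
        return []  # only rank-1 VMs remain on-premises
    lowest = max(candidate_ranks)
    return sorted(vm for vm, r in vm_rankings.items() if r == lowest)
```

Calling this repeatedly, as each batch moves to the cloud, reproduces the "ranks 5, then 4, then 3, then 2" progression the text describes.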
- FIG. 6 illustrates a method 600 of transferring VMs from on-premises to the cloud in a disaster recovery (DR) operation, according to one or more embodiments.
- A DR event may involve moving the entire on-premises network to a different location, for example due to a weather catastrophe at the location where the on-premises network is situated.
- In some embodiments, a person, such as an administrator, must manually trigger the operation, such as by clicking a 'restart all VMs in the cloud' button, when a disaster happens at the on-premises network.
- In other embodiments, VM transfer from on-premises to the cloud takes place automatically.
- Detection of a partial-failure DR event triggers VMs to be restarted in cloud computing environment 170 in a manner similar to that described above with respect to FIG. 3. This results in a much quicker RTO (recovery time objective), on the order of minutes, than may be obtained from conventional DR processes, which typically take multiple hours or more.
- In step 602, an RPO (recovery point objective) value is set for DR, such as by an administrator of on-premises network 102.
- cloud restart service 132 instructs HCX service 131 to replicate VMs that are backed up for DR. This may involve just high priority VMs, or it may involve all VMs running in on-premises network 102 .
- In step 605, HCX service 131 communicates with LCP 174 of cloud computing environment 170 to replicate the VMs.
- In step 606, a determination is made as to whether or not the RPO time period has elapsed. If 'Yes', then the process loops back to step 604. If 'No', then the process waits until the RPO time period has elapsed.
- When a DR event occurs, steps 610, 611, 612, and 614 are carried out automatically by cloud restart service 132.
- cloud restart service 132 instructs HCX service 131 to synchronize the last bits of data of VMs backed up for DR, in a similar manner as described above.
- HCX service 131 communicates with LCP 174 of cloud computing environment 170 to synchronize the last bits of data of VMs backed up for DR.
- Cloud restart service 132 instructs virtualization manager 130 to unprotect the VMs backed up for DR, and set those unprotected VMs as 'inactive'.
- cloud restart service 132 instructs LCP 174 of cloud computing environment 170 to: a) set replicated VMs of VMs backed up for DR as ‘active’, b) protect the active VMs, and c) power on the active VMs.
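The RPO-driven pacing of steps 604 through 606 amounts to replicating once per RPO window, which bounds how much data a disaster can lose. A toy sketch, with times as plain numbers rather than a real clock:

```python
# Hypothetical pacing of method 600: replication of DR-protected VMs repeats
# each time the configured RPO window elapses. Times are illustrative seconds.
def replication_times(rpo_seconds, start, end):
    """Return the times at which replication runs between start and end."""
    times = []
    t = start
    while t <= end:
        times.append(t)         # steps 604-605: replicate the protected VMs
        t += rpo_seconds        # step 606: wait for the RPO period to elapse
    return times
```

With a 300-second RPO, for example, at most five minutes of updates can be lost between the last replication and a disaster.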
- the various embodiments described herein may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities—usually, though not necessarily, these quantities may take the form of electrical or magnetic signals, where they or representations of them are capable of being stored, transferred, combined, compared, or otherwise manipulated. Further, such manipulations are often referred to in terms, such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more embodiments of the invention may be useful machine operations.
- one or more embodiments of the invention also relate to a device or an apparatus for performing these operations.
- the apparatus may be specially constructed for specific required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer.
- various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
- One or more embodiments of the present invention may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media.
- The term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system. Computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer.
- Examples of a computer readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD (Compact Disc) such as a CD-ROM, CD-R, or CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices.
- The computer readable medium can also be distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
- Virtualization systems in accordance with the various embodiments may be implemented as hosted embodiments, as non-hosted embodiments, or as embodiments that tend to blur distinctions between the two.
- various virtualization operations may be wholly or partially implemented in hardware.
- a hardware implementation may employ a look-up table for modification of storage access requests to secure non-disk data.
- the virtualization software can therefore include components of a host, console, or guest operating system that performs virtualization functions.
- Plural instances may be provided for components, operations or structures described herein as a single instance. Boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention(s).
- structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component.
- structures and functionality presented as a single component may be implemented as separate components.
Abstract
A method of restarting a virtual machine (VM) running in a cluster of hosts in a first data center, in a second data center, includes: transmitting an image of the VM to the second data center; in response to determining that a host in the cluster in which the VM was running has failed, determining whether or not there are sufficient resources in the cluster to run the VM in another host in the cluster; and upon determining that there are not sufficient resources in the cluster to run the VM in another host in the cluster, setting the VM to be inactive in the first data center, and communicating with a control plane in the second data center to set as active, and power on, a VM in the second data center using the image of the VM that has been transmitted to the second data center.
Description
- This application is a continuation of U.S. patent application Ser. No. 16/744,876, filed Jan. 16, 2020, which is incorporated by reference herein.
- Further embodiments include a non-transitory computer-readable storage medium comprising instructions that cause a computer system to carry out the above method, as well as a computer system configured to carry out the above method.
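The difference-data mechanism in the summarized method can be sketched as a block-level diff between two image snapshots. This is a minimal illustration, not the patented implementation: the block size and helper names are assumptions.

```python
# Illustrative sketch: compare a VM image at two points in time block-by-block,
# ship only the changed blocks as "difference data", and apply them to the
# replicated image. BLOCK and the helper names are assumptions for illustration.

BLOCK = 4096  # assumed granularity for computing difference data

def diff_blocks(old_image: bytes, new_image: bytes) -> dict:
    """Return {offset: block} for every block that changed between snapshots."""
    changes = {}
    for off in range(0, len(new_image), BLOCK):
        if new_image[off:off + BLOCK] != old_image[off:off + BLOCK]:
            changes[off] = new_image[off:off + BLOCK]
    return changes

def apply_diff(replica: bytearray, changes: dict) -> bytearray:
    """Update the replicated image in place with the received difference data."""
    for off, blk in changes.items():
        replica[off:off + len(blk)] = blk
    return replica
```

In this sketch, only the changed blocks cross the network, which is why the final synchronization before a cloud restart can be fast relative to re-sending the full image.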
FIG. 1 is a block diagram of a computing system in which one or more embodiments may be utilized.
FIG. 2 is a block diagram of a virtualization manager of an on-premises network and elements of a cloud computing system that enable restart of VMs in the cloud, according to one or more embodiments.
FIG. 3 is a flow diagram showing steps performed by a cloud restart service and by an HCX service to enable restart of VMs in the cloud, according to one or more embodiments.
FIG. 4 is a flow diagram showing steps performed by a local control plane of a cloud computing system to enable migration of VMs from the cloud to the on-premises network, according to one or more embodiments.
FIG. 5 is a flow diagram showing steps performed by a cloud restart service and by an HCX service to enable restart of lower-priority VMs in the cloud, according to one or more embodiments.
FIG. 6 is a flow diagram showing steps performed by a cloud restart service and by an HCX service to enable disaster recovery of VMs to the cloud, according to one or more embodiments.
FIG. 1 is a block diagram of a computing system 100 in which one or more embodiments may be utilized. As shown, computing system 100 includes an on-premises network 102 and a cloud computing system 150. On-premises network 102 includes a virtualized computing system with a plurality of hosts, in each of which one or more VMs are instantiated, and the hosts communicate with cloud computing system 150.
- Each of the hosts includes a hypervisor 116 and an HA agent 121, which run on top of a hardware platform 106. Hardware platform 106 includes one or more CPUs 108, memory 110, a network interface card (NIC) 112, and storage 114.
- On-premises network 102 also includes virtualization manager 130, which manages the provisioning of virtual compute, network, and storage resources (e.g., VMs) in on-premises network 102. In the embodiments illustrated herein, virtualization manager 130 also includes hybrid cloud exchange service (HCX) 131, cloud restart service 132, and HA master 133, all of which will be further described below.
Cloud computing system 150 includes the following control plane components: a virtual infrastructure manager 154 and a VM management server 157, through which virtual compute, storage, and network resources are provisioned for different customers of cloud computing system 150. VM management server 157 is virtualization management software executed in a physical or virtual server (e.g., VMware vCenter Server®) that cooperates with hypervisors installed in hosts 162 1 to 162 M to provision virtual compute, storage, and network resources from hardware resources 160, which include hosts 162 1 to 162 M, storage hardware 164, and network hardware 165. Virtual infrastructure manager 154 is virtual infrastructure management software executed in a physical or virtual server (e.g., VMware vCloud Director®) that partitions the virtual compute, storage, and network resources provisioned by VM management server 157 among the different customers of cloud computing system 150. As shown in FIG. 1, cloud computing system 150 may support multiple cloud computing environments (one of which is depicted as cloud computing environment 170) that are available to different customers in a multi-tenant configuration.
- The virtual compute, storage, and network resources are provisioned in cloud computing environment 170 to form a virtual data center, or software-defined data center. The virtual data center includes one or more virtual networks 182 used to communicate among VMs 172 and managed by at least one network gateway component (e.g., gateway 184), as well as one or more isolated internal networks 186 not connected to gateway 184. Gateway 184 (e.g., executing as a virtual appliance) is configured to provide VMs 172 and other components in cloud computing environment 170 with connectivity to an external network 140 (e.g., the Internet). Gateway 184 manages external public IP addresses for the virtual data center and one or more private internal networks interconnecting VMs 172. Gateway 184 is configured to route traffic incoming to and outgoing from the virtual data center and to provide networking services, such as firewalls, network address translation (NAT), dynamic host configuration protocol (DHCP), and load balancing. Gateway 184 may be configured to provide virtual private network (VPN) connectivity over a network 140 with another VPN endpoint, such as a gateway 124 within on-premises network 102. As shown in FIG. 1, the VMs 172 may be instantiated on one or more hosts 162 1, . . . , 162 n.
- The virtual data center further includes a local control plane (LCP) 174, implemented as a physical or virtual server, configured to communicate with virtual infrastructure manager 154 and enable control-plane communications between an administrator computer and virtual infrastructure manager 154.
FIG. 2 is a block diagram showing elements of computing system 100 used to restart a VM running in on-premises network 102 in cloud computing environment 170, according to one or more embodiments. Each host in cluster 127 includes an HA agent 121, which monitors the health of the host on which it is installed. HA agent 121 provides periodic health reports to HA master 133 of virtualization manager 130, which acts accordingly by informing cloud restart service 132 to start a VM failover-to-cloud process, to be explained in more detail below. HCX service 131 provides control plane connectivity to cloud computing environment 170, so that cluster 127 can include both VMs running in on-premises network 102 and VMs running in cloud computing environment 170. Also shown in FIG. 2 is LCP 174, the control plane unit of cloud computing environment 170 that communicates with HCX service 131 of on-premises network 102 and instructs virtual infrastructure manager 154 to provision one or more VMs in cloud computing environment 170. Cloud storage 175, which may be object-based storage, is used to store images of VMs currently running in on-premises network 102, so that when one of those VMs fails, a restart of that VM in cloud computing environment 170 may be performed, as explained in detail below.
FIG. 3 is a flow diagram showing steps performed in a method 300 of restarting a VM in cloud computing environment 170, according to one or more embodiments. In step 302, and with reference also to FIG. 1, cloud restart service 132 instructs HA agent 121 of each host in cluster 127 to protect the VMs running therein. In step 304, cloud restart service 132 instructs HCX service 131 to initiate VM replication for those VMs that are to be cloud protected. In step 306, a determination is made as to whether or not a host in cluster 127 has failed, as reported by the HA agent 121 running within each of the hosts. If no host has failed, the process loops back to step 306 to continue monitoring for a failure of a host.
- If a host has experienced a failure, as reported by an HA agent 121 running in that host (or by the HA agent being unable to send a heartbeat signal to HA master 133 at a prescribed time, thereby indicating a problem with the host on which that HA agent is installed), in step 307 a determination is made by virtualization manager 130 as to whether there are sufficient available resources within cluster 127 to spin up the VMs that were running in the failed host on another host within cluster 127. If there are sufficient available resources ('Yes' decision in step 307), then in step 310 the VMs of the failed host are migrated to another host in cluster 127 that can accommodate those VMs, and the process ends. If there are not sufficient available resources ('No' decision in step 307), then in step 308 cloud restart service 132 instructs HCX service 131 to synchronize the last bits of data of the failed VMs; i.e., cloud restart service 132 generates difference data representing any updates to the failed VM images since they were replicated in step 304 and transmits the difference data to HCX service 131 for synchronization. In step 311, HCX service 131 communicates with LCP 174 of cloud computing environment 170 to synchronize the last bits of data of the failed VMs by updating the images of the failed VMs stored in cloud storage 175 with the difference data transmitted by cloud restart service 132. In step 312, cloud restart service 132 instructs virtualization manager 130 to unprotect the VMs of the host that failed, which also includes setting those VMs as 'inactive'. In step 314, cloud restart service 132 instructs LCP 174 of cloud computing environment 170 to: a) set the replicated VMs of the failed VMs as active within cloud computing environment 170, b) protect the active VMs, and c) power on the active VMs.
- By way of the steps shown in FIG. 3, a customer pays cloud computing environment 170 for services required for running VMs in cloud computing environment 170 only when there is a failure in a host of an HA cluster and only when other hosts in the HA cluster cannot accommodate the VMs of the failed host. While cloud storage resources may be needed for replicating the VM images prior to failure, such costs are generally much lower than the cost of reserving extra hosts in on-premises network 102.
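The placement decision of steps 307-314 can be sketched as follows. The first-fit-by-free-memory policy and the host/VM field names are assumptions for illustration; the method itself only requires a sufficiency check.

```python
# Hypothetical sketch of the FIG. 3 decision: try to place each VM of the
# failed host on a surviving cluster host; VMs that do not fit anywhere are
# sent to the cloud for restart. First-fit by free memory is an assumption.

def place_failed_vms(failed_vms, cluster_hosts):
    """Return (local, cloud): VM names restarted in-cluster vs. in the cloud."""
    local, cloud = [], []
    for vm in sorted(failed_vms, key=lambda v: v["mem"], reverse=True):
        host = next((h for h in cluster_hosts
                     if h["healthy"] and h["free_mem"] >= vm["mem"]), None)
        if host is not None:
            host["free_mem"] -= vm["mem"]   # step 310: migrate within cluster
            local.append(vm["name"])
        else:
            cloud.append(vm["name"])        # steps 308-314: restart in cloud
    return local, cloud
```

Placing the largest VMs first is one common heuristic; a real admission-control policy would also weigh CPU and reservation constraints.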
FIG. 4 is a flow diagram illustrating a method 400 performed by LCP 174 of cloud computing environment 170 to enable migration of a VM that had been restarted in cloud computing environment 170 back to a host of cluster 127 that has recovered from the failure, according to one or more embodiments. In step 404, LCP 174 causes difference data, representing updates to the images of the VMs running in cloud computing environment 170, to be periodically sent to on-premises HCX service 131 for updating of the images of these VMs stored in on-premises network 102. In step 406, a determination is made as to whether or not a previously failed host in cluster 127 has recovered from the failure. If the host has not recovered ('No' decision in step 406), the process loops back to step 404 to continue to periodically send the difference data to on-premises HCX service 131 for updating of the images of these VMs stored in on-premises network 102. If the host has recovered ('Yes' decision in step 406), in step 408 the last bits of data obtained from the active VMs running in cloud computing environment 170, i.e., any updates to such VM images since step 404, are sent to on-premises HCX service 131 for synchronization. In step 410, LCP 174 communicates with virtual infrastructure manager 154 and VM management server 157 of cloud computing system 150 to unprotect the active VMs running in cloud computing environment 170. In step 412, LCP 174 instructs HCX service 131 to: a) set the replicated VMs in on-premises network 102 as 'active', b) protect the active VMs, and c) power on the active VMs.
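The fail-back sequence above can be sketched as a loop that ships periodic diffs until the host recovers, then hands the VMs back. The injected callables are hypothetical stand-ins for the HCX and control-plane calls.

```python
# Sketch of the FIG. 4 fail-back loop, with control-plane operations abstracted
# as injected callables (hypothetical names, for illustration only).

def fail_back(cloud_vms, host_recovered, send_diff, activate_on_prem):
    """Ship periodic diffs until the failed host recovers, then hand VMs back."""
    while not host_recovered():            # step 406: host still down?
        for vm in cloud_vms:
            send_diff(vm)                  # step 404: periodic difference data
    for vm in cloud_vms:
        send_diff(vm)                      # step 408: final synchronization
        vm["active"] = False               # step 410: unprotect in the cloud
        activate_on_prem(vm)               # step 412: activate and power on
```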
FIG. 5 is a flow diagram illustrating a method 500 of migrating VMs between on-premises network 102 and cloud computing environment 170 based on priority, according to one or more embodiments. More specifically, if a plurality of VMs are running in cluster 127 (see FIG. 1), those VMs may be assigned different priority levels, such that high priority VMs are provided with the amount of resources necessary to run at an acceptable speed, whereas low priority VMs have their resources lessened to accommodate the higher priority VMs.
- In step 502, each of the VMs running in the hosts of cluster 127 is assigned a priority ranking, e.g., from 1 (highest priority) to N (lowest priority, e.g., N=5). In step 504, cloud restart service 132 instructs HCX service 131 to initiate VM replication for VMs having the lowest priority ranking (e.g., the VMs ranked with a value of 5), and in step 505 HCX service 131 communicates with LCP 174 to replicate those VMs. In step 506, a determination is made as to whether any of the high priority VMs (the VMs ranked '1') are running sub-optimally. Such information may be provided by way of HA agent 121 in cluster 127 to HA master 133, for example, and may involve determining the amount of processor resources consumed by the high-priority VMs over a most recent time period (e.g., the last 10 milliseconds). If the determination in step 506 is 'No', then the process loops back to step 506 to continue to monitor the high-ranked VMs to make sure they are running under acceptable conditions. If the determination in step 506 is 'Yes', then in step 508 cloud restart service 132 instructs HCX service 131 to synchronize the last bits of data of the VMs having priority=N, those being the lowest priority VMs, and in step 509 HCX service 131 communicates with LCP 174 to synchronize the last bits of data of the lowest priority VMs. In step 510, cloud restart service 132 instructs virtualization manager 130 to unprotect the VMs having priority=N and set those VMs as 'inactive'. In step 512, cloud restart service 132 instructs LCP 174 of cloud computing system 150 to: a) set the replicated VMs of the VMs having priority=N as 'active', b) protect the active VMs, and c) power on the active VMs.
- In step 514, a determination is made as to whether any of the high priority VMs (e.g., VMs having priority=1) are still running sub-optimally. If 'Yes', then N is decreased by one (e.g., from N=5 to N=4), and the process loops back to step 508. If 'No', meaning that all of the high priority VMs are running optimally, the process loops back to step 506 to continue monitoring the performance of the high priority VMs.
- So, with reference again to FIG. 5, if at least one of the high priority VMs is running sub-optimally even after all of the lowest ranked (=5) VMs have been transferred to cloud computing environment 170 and are no longer run within cluster 127, the next-lowest ranked (=4) VMs are synchronized with cloud computing system 150 for possible migration of those VMs to cloud computing environment 170, to enable the high priority VMs to run at their optimal levels. This process may continue until all the high priority VMs are running optimally. For example, it may be the case that all VMs ranked 2, 3, 4, and 5 have to be transferred to cloud computing environment 170 to enable all VMs ranked 1 within cluster 127 to run at their optimal levels.
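The tier-by-tier eviction of FIG. 5 can be sketched as the following loop. The performance predicate is injected because real systems would derive it from HA agent reports; the tiering logic here is an illustrative assumption.

```python
# Illustrative sketch of the FIG. 5 loop: while any priority-1 VM performs
# below target, move the lowest-priority tier still on-premises to the cloud
# (priority 1 = highest; larger numbers = lower priority).

def evict_by_priority(vms, high_priority_suboptimal):
    while high_priority_suboptimal():                       # steps 506/514
        candidates = [v for v in vms
                      if v["site"] == "on-prem" and v["priority"] > 1]
        if not candidates:
            break                                           # nothing left to evict
        lowest = max(v["priority"] for v in candidates)     # current priority=N tier
        for v in candidates:
            if v["priority"] == lowest:
                v["site"] = "cloud"                         # steps 508-512
```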
FIG. 6 illustrates a method 600 of transferring VMs from on-premises to the cloud in a disaster recovery (DR) operation, according to one or more embodiments. For DR, a recovery time objective (RTO) and a recovery point objective (RPO) are typically set, and both must be met to salvage the on-premises network after a disaster has occurred. This may involve moving the entire on-premises network to a different location, e.g., due to a weather catastrophe at the location where the on-premises network is situated. Conventionally, to cause DR to occur, a person such as an administrator must manually perform an operation, such as clicking a 'restart all VMs in the cloud' button, after a disaster has happened at the on-premises network. The embodiments described with reference to FIG. 6 do not require a manual operation by an administrator; instead, VM transfer from on-premises to the cloud takes place automatically. Detection of a partial failure DR event triggers VMs to be restarted in cloud computing environment 170 in a manner similar to that described above with respect to FIG. 3. This results in a much quicker RTO, on the order of minutes, than may be obtained from conventional DR processes, which typically take multiple hours or more.
- In step 602, an RPO value is set for DR, such as by an administrator of on-premises network 102. In step 604, cloud restart service 132 instructs HCX service 131 to replicate the VMs that are backed up for DR. This may involve just the high priority VMs, or it may involve all VMs running in on-premises network 102. In step 605, HCX service 131 communicates with LCP 174 to replicate the VMs. In step 606, a determination is made as to whether or not the RPO time period has elapsed. If 'Yes', then the process loops back to step 604. If 'No', then the process waits until the RPO time period has elapsed.
- At any time, if a partial failure DR event has occurred, steps 610 through 614 are performed under control of cloud restart service 132. In step 610, cloud restart service 132 instructs HCX service 131 to synchronize the last bits of data of the VMs backed up for DR, in a similar manner as described above. In step 611, HCX service 131 communicates with LCP 174 of cloud computing environment 170 to synchronize the last bits of data of the VMs backed up for DR. In step 612, cloud restart service 132 instructs virtualization manager 130 to unprotect the VMs backed up for DR and set those unprotected VMs as 'inactive'. In step 614, cloud restart service 132 instructs LCP 174 of cloud computing environment 170 to: a) set the replicated VMs of the VMs backed up for DR as 'active', b) protect the active VMs, and c) power on the active VMs.
- The various embodiments described herein may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities; usually, though not necessarily, these quantities may take the form of electrical or magnetic signals, where they or representations of them are capable of being stored, transferred, combined, compared, or otherwise manipulated. Further, such manipulations are often referred to in terms such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more embodiments of the invention may be useful machine operations. In addition, one or more embodiments of the invention also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for specific required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
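The RPO-driven replication cycle and DR trigger of FIG. 6 above can be sketched as a time-stepped simulation. The tick-based clock and function names are assumptions made so the loop is self-contained and testable.

```python
# Simplified, time-stepped sketch of the FIG. 6 cycle: replicate the protected
# VMs every RPO interval; a DR event cuts the cycle short with a final sync
# followed by a cloud restart. Tick-based timing is an illustrative assumption.

def dr_cycle(ticks, rpo, dr_event_at, log):
    """Run `ticks` steps; record replications in `log`; return resulting state."""
    since_sync = 0
    for t in range(ticks):
        if t == dr_event_at:
            log.append(("final_sync", t))     # steps 610-611: last bits of data
            return "failover_to_cloud"        # steps 612-614: activate in cloud
        since_sync += 1
        if since_sync >= rpo:                 # step 606: RPO period elapsed?
            log.append(("replicate", t))      # step 604: refresh cloud copies
            since_sync = 0
    return "running"
```

Because replication already runs on the RPO cadence, only the data produced since the last interval needs to be synchronized at failover time, which is what keeps the achievable RTO in the range of minutes.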
- The various embodiments described herein may be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
- One or more embodiments of the present invention may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media. The term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system. Computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. Examples of a computer readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD (Compact Disc) such as a CD-ROM, CD-R, or CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
- Although one or more embodiments of the present invention have been described in some detail for clarity of understanding, it will be apparent that certain changes and modifications may be made within the scope of the claims. Accordingly, the described embodiments are to be considered as illustrative and not restrictive, and the scope of the claims is not to be limited to details given herein, but may be modified within the scope and equivalents of the claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims.
- Virtualization systems in accordance with the various embodiments may be implemented as hosted embodiments, as non-hosted embodiments, or as embodiments that tend to blur distinctions between the two; all are envisioned. Furthermore, various virtualization operations may be wholly or partially implemented in hardware. For example, a hardware implementation may employ a look-up table for modification of storage access requests to secure non-disk data.
- Many variations, modifications, additions, and improvements are possible, regardless of the degree of virtualization. The virtualization software can therefore include components of a host, console, or guest operating system that perform virtualization functions. Plural instances may be provided for components, operations, or structures described herein as a single instance. Boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention(s). In general, structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the appended claim(s).
Claims (20)
1. A method of restarting a virtual machine running in a cluster of hosts in a first data center, in a second data center, the method comprising:
transmitting an image of the virtual machine to the second data center;
in response to determining that one of the hosts in the cluster in which the virtual machine was running has failed, determining whether or not there are sufficient resources in the cluster to run the virtual machine in another one of the hosts in the cluster; and
upon determining that there are not sufficient resources in the cluster to run the virtual machine in another one of the hosts in the cluster, setting the virtual machine to be inactive in the first data center, and communicating with a control plane in the second data center to set as active, and power on, a virtual machine in the second data center using the image of the virtual machine that has been transmitted to the second data center.
2. The method of claim 1 , wherein the cluster of hosts is not assigned a spare host in the first data center.
3. The method of claim 1 , wherein the first data center is an on-premises data center serving one tenant and the second data center is a cloud data center serving a plurality of tenants.
4. The method of claim 1 , further comprising:
after setting the virtual machine to be inactive in the first data center, upon detecting that sufficient resources have been freed up in one of the hosts in the cluster of hosts to run the virtual machine therein, notifying the control plane in the second data center that the sufficient resources have been freed up.
5. The method of claim 4 , further comprising:
updating the image of the virtual machine with data received from the second data center in response to said notifying; and
setting as active, and powering on, the virtual machine using the updated image of the virtual machine.
6. The method of claim 1 , further comprising:
in response to determining that a performance of another virtual machine having higher priority than the virtual machine is below a minimum performance threshold, setting the virtual machine to be inactive in the first data center, and communicating with the control plane in the second data center to set as active, and power on, the virtual machine in the second data center using the image of the virtual machine that has been transmitted to the second data center.
7. The method of claim 1 , wherein a portion of the image of the virtual machine is transmitted to the second data center prior to determining that the host in which the virtual machine was running has failed.
8. A computer system comprising:
a memory configured to store executable code for restarting a virtual machine running in a cluster of hosts in a first data center, to a second data center, and a processor configured to execute the code to:
transmit an image of the virtual machine to the second data center;
in response to determining that one of the hosts in the cluster in which the virtual machine was running has failed, determine whether or not there are sufficient resources in the cluster to run the virtual machine in another one of the hosts in the cluster; and
upon determining that there are not sufficient resources in the cluster to run the virtual machine in another one of the hosts in the cluster, set the virtual machine to be inactive in the first data center, and communicate with a control plane in the second data center to set as active, and power on, a virtual machine in the second data center using the image of the virtual machine that has been transmitted to the second data center.
9. The computer system of claim 8 , wherein the cluster of hosts is not assigned a spare host in the first data center.
10. The computer system of claim 8 , wherein the first data center is an on-premises data center serving one tenant and the second data center is a cloud data center serving a plurality of tenants.
11. The computer system of claim 8 , wherein the processor is configured to execute the code to:
after setting the virtual machine to be inactive in the first data center, upon detecting that sufficient resources have been freed up in one of the hosts in the cluster of hosts to run the virtual machine therein, notify the control plane in the second data center that the sufficient resources have been freed up.
12. The computer system of claim 11 , wherein the processor is configured to execute the code to:
update the image of the virtual machine with data received from the second data center in response to said notifying; and
set as active, and power on, the virtual machine using the updated image of the virtual machine.
13. The computer system of claim 8 , wherein the processor is configured to execute the code to:
in response to determining that a performance of another virtual machine having higher priority than the virtual machine is below a minimum performance threshold, set the virtual machine to be inactive in the first data center, and communicate with the control plane in the second data center to set as active, and power on, the virtual machine in the second data center using the image of the virtual machine that has been transmitted to the second data center.
14. The computer system of claim 8 , wherein a portion of the image of the virtual machine is transmitted to the second data center prior to determining that the host in which the virtual machine was running has failed.
15. A non-transitory computer-readable medium storing code for causing, when executed by a processor, restarting of a virtual machine running in a cluster of hosts in a first data center, to a second data center, the processor when executing the code performing the steps of:
transmitting an image of the virtual machine to the second data center;
in response to determining that one of the hosts in the cluster in which the virtual machine was running has failed, determining whether or not there are sufficient resources in the cluster to run the virtual machine in another one of the hosts in the cluster; and
upon determining that there are not sufficient resources in the cluster to run the virtual machine in another one of the hosts in the cluster, setting the virtual machine to be inactive in the first data center, and communicating with a control plane in the second data center to set as active, and power on, a virtual machine in the second data center using the image of the virtual machine that has been transmitted to the second data center.
16. The non-transitory computer readable medium of claim 15 , wherein the cluster of hosts is not assigned a spare host in the first data center.
17. The non-transitory computer readable medium of claim 15 , wherein the steps further comprise:
after setting the virtual machine to be inactive in the first data center, upon detecting that sufficient resources have been freed up in one of the hosts in the cluster of hosts to run the virtual machine therein, notifying the control plane in the second data center that the sufficient resources have been freed up.
18. The non-transitory computer readable medium of claim 17 , wherein the steps further comprise:
updating the image of the virtual machine with data received from the second data center in response to said notifying; and
setting as active, and powering on, the virtual machine using the updated image of the virtual machine.
19. The non-transitory computer readable medium of claim 15 , wherein the steps further comprise:
in response to determining that a performance of another virtual machine having higher priority than the virtual machine is below a minimum performance threshold, setting the virtual machine to be inactive in the first data center, and communicating with the control plane in the second data center to set as active, and power on, the virtual machine in the second data center using the image of the virtual machine that has been transmitted to the second data center.
20. The non-transitory computer readable medium of claim 15 , wherein a portion of the image of the virtual machine is transmitted to the second data center prior to determining that the host in which the virtual machine was running has failed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/159,593 US20230185680A1 (en) | 2020-01-16 | 2023-01-25 | Cloud restart for vm failover and capacity management |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/744,876 US11593234B2 (en) | 2020-01-16 | 2020-01-16 | Cloud restart for VM failover and capacity management |
US18/159,593 US20230185680A1 (en) | 2020-01-16 | 2023-01-25 | Cloud restart for vm failover and capacity management |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/744,876 Continuation US11593234B2 (en) | 2020-01-16 | 2020-01-16 | Cloud restart for VM failover and capacity management |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230185680A1 true US20230185680A1 (en) | 2023-06-15 |
Family
ID=76857431
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/744,876 Active 2041-02-23 US11593234B2 (en) | 2020-01-16 | 2020-01-16 | Cloud restart for VM failover and capacity management |
US18/159,593 Pending US20230185680A1 (en) | 2020-01-16 | 2023-01-25 | Cloud restart for vm failover and capacity management |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/744,876 Active 2041-02-23 US11593234B2 (en) | 2020-01-16 | 2020-01-16 | Cloud restart for VM failover and capacity management |
Country Status (1)
Country | Link |
---|---|
US (2) | US11593234B2 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11347521B2 (en) * | 2020-01-16 | 2022-05-31 | Vmware, Inc. | Cloud restart for non-critical performance virtual machines |
US11593234B2 (en) | 2020-01-16 | 2023-02-28 | Vmware, Inc. | Cloud restart for VM failover and capacity management |
US11314605B2 (en) * | 2020-08-03 | 2022-04-26 | EMC IP Holding Company LLC | Selecting optimal disk types for disaster recovery in the cloud |
US11388231B1 (en) * | 2021-01-28 | 2022-07-12 | Salesforce, Inc. | Multi-substrate fault tolerant continuous delivery of datacenter builds on cloud computing platforms |
CN115118466B (en) * | 2022-06-14 | 2024-04-12 | 深信服科技股份有限公司 | Policy generation method and device, electronic equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110099187A1 (en) * | 2009-10-22 | 2011-04-28 | Vmware, Inc. | Method and System for Locating Update Operations in a Virtual Machine Disk Image |
US9292319B2 (en) * | 2012-03-28 | 2016-03-22 | Google Inc. | Global computing interface |
US20160378622A1 (en) * | 2015-06-29 | 2016-12-29 | Vmware, Inc. | Virtual Machine Recovery On Non-Shared Storage in a Single Virtual Infrastructure Management Instance |
US20180232254A1 (en) * | 2017-02-10 | 2018-08-16 | Xilinx, Inc. | Migrating accelerators between compute systems |
US20200034270A1 (en) * | 2018-07-24 | 2020-01-30 | Vmware, Inc. | Machine learning system for workload failover in a converged infrastructure |
US20210117295A1 (en) * | 2019-10-22 | 2021-04-22 | Cohesity, Inc. | Generating standby cloud versions of a virtual machine |
Family Cites Families (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4339763B2 (en) | 2004-09-07 | 2009-10-07 | 株式会社日立製作所 | Failover method and computer system |
US8135930B1 (en) | 2008-07-14 | 2012-03-13 | Vizioncore, Inc. | Replication systems and methods for a virtual computing environment |
US8429449B2 (en) * | 2010-03-01 | 2013-04-23 | International Business Machines Corporation | Optimized placement of virtual machines in a network environment |
US8413148B2 (en) | 2011-03-10 | 2013-04-02 | Telefonaktiebolaget L M Ericsson (Publ) | Virtualization support in platform management (PLM) information model |
US10454997B2 (en) | 2012-09-07 | 2019-10-22 | Avigilon Corporation | Distributed physical security system |
CN103019861A (en) | 2012-12-11 | 2013-04-03 | 华为技术有限公司 | Distribution method and distribution device of virtual machine |
US9329958B2 (en) | 2013-12-03 | 2016-05-03 | Vmware, Inc. | Efficient incremental checkpointing of virtual devices |
US9294524B2 (en) * | 2013-12-16 | 2016-03-22 | Nicira, Inc. | Mapping virtual machines from a private network to a multi-tenant public datacenter |
US9465704B2 (en) * | 2014-03-26 | 2016-10-11 | Vmware, Inc. | VM availability during management and VM network failures in host computing systems |
US10642635B2 (en) * | 2014-06-07 | 2020-05-05 | Vmware, Inc. | Decentralized demand-based virtual machine migration management |
US9766930B2 (en) * | 2014-06-28 | 2017-09-19 | Vmware, Inc. | Using active/passive asynchronous replicated storage for live migration |
US9760443B2 (en) | 2014-06-28 | 2017-09-12 | Vmware, Inc. | Using a recovery snapshot during live migration |
US9495189B2 (en) * | 2014-12-30 | 2016-11-15 | Vmware, Inc. | Live replication of a virtual machine exported and imported via a portable storage device |
US9910712B2 (en) * | 2015-03-26 | 2018-03-06 | Vmware, Inc. | Replication of a virtualized computing environment to a computing system with offline hosts |
US9535738B2 (en) | 2015-04-03 | 2017-01-03 | International Business Machines Corporation | Migrating virtual machines based on relative priority of virtual machine in the context of a target hypervisor environment |
US9846589B2 (en) * | 2015-06-04 | 2017-12-19 | Cisco Technology, Inc. | Virtual machine placement optimization with generalized organizational scenarios |
DE102016204756B4 (en) | 2015-12-23 | 2024-01-11 | OET GmbH | Electric refrigerant drive |
US11502972B2 (en) * | 2016-08-28 | 2022-11-15 | Vmware, Inc. | Capacity optimization in an automated resource-exchange system |
US10346191B2 (en) * | 2016-12-02 | 2019-07-09 | VMware, Inc. | System and method for managing size of clusters in a computing environment |
US11537419B2 (en) | 2016-12-30 | 2022-12-27 | Intel Corporation | Virtual machine migration while maintaining live network links |
US10509667B1 (en) | 2017-01-19 | 2019-12-17 | Tintri By Ddn, Inc. | Modeling space consumption of a migrated VM |
US10318333B2 (en) | 2017-06-28 | 2019-06-11 | Sap Se | Optimizing allocation of virtual machines in cloud computing environment |
US10963356B2 (en) * | 2018-04-18 | 2021-03-30 | Nutanix, Inc. | Dynamic allocation of compute resources at a recovery site |
US20190370043A1 (en) | 2018-04-30 | 2019-12-05 | Nutanix, Inc. | Cooperative memory management |
US10977068B2 (en) | 2018-10-15 | 2021-04-13 | Microsoft Technology Licensing, Llc | Minimizing impact of migrating virtual services |
US11561999B2 (en) | 2019-01-31 | 2023-01-24 | Rubrik, Inc. | Database recovery time objective optimization with synthetic snapshots |
US10963287B2 (en) | 2019-03-27 | 2021-03-30 | Amazon Technologies, Inc. | Reducing request latency in a multi-tenant web service host |
US11010084B2 (en) | 2019-05-03 | 2021-05-18 | Dell Products L.P. | Virtual machine migration system |
US11086732B2 (en) | 2019-10-28 | 2021-08-10 | Rubrik, Inc. | Scaling single file snapshot performance across clustered system |
US11593234B2 (en) | 2020-01-16 | 2023-02-28 | Vmware, Inc. | Cloud restart for VM failover and capacity management |
- 2020
- 2020-01-16: US application 16/744,876, published as US11593234B2 (status: Active)
- 2023
- 2023-01-25: US application 18/159,593, published as US20230185680A1 (status: Pending)
Also Published As
Publication number | Publication date |
---|---|
US11593234B2 (en) | 2023-02-28 |
US20210224168A1 (en) | 2021-07-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11593234B2 (en) | Cloud restart for VM failover and capacity management | |
US9977688B2 (en) | Live migration of virtual machines across virtual switches in virtual infrastructure | |
US20190384648A1 (en) | Proactive high availability in a virtualized computer system | |
US8874954B1 (en) | Compatibility of high availability clusters supporting application failover with shared storage in a virtualization environment without sacrificing on virtualization features | |
US10243780B2 (en) | Dynamic heartbeating mechanism | |
JP5536878B2 (en) | Changing access to the Fiber Channel fabric | |
US10404795B2 (en) | Virtual machine high availability using shared storage during network isolation | |
US8239863B2 (en) | Method and system for migrating a virtual machine | |
US20170364423A1 (en) | Method and apparatus for failover processing | |
US9632813B2 (en) | High availability for virtual machines in nested hypervisors | |
US10521315B2 (en) | High availability handling network segmentation in a cluster | |
US20110314470A1 (en) | Virtual Machine Infrastructure Capable Of Automatically Resuming Paused Virtual Machines | |
US11556372B2 (en) | Paravirtual storage layer for a container orchestrator in a virtualized computing system | |
US11461123B1 (en) | Dynamic pre-copy and post-copy determination for live migration between cloud regions and edge locations | |
US11604672B2 (en) | Operational health of an integrated application orchestration and virtualized computing system | |
US11573839B1 (en) | Dynamic scheduling for live migration between cloud regions and edge locations | |
US11734038B1 (en) | Multiple simultaneous volume attachments for live migration between cloud regions and edge locations | |
US11347521B2 (en) | Cloud restart for non-critical performance virtual machines | |
US20220197684A1 (en) | Monitoring for workloads managed by a container orchestrator in a virtualized computing system | |
US10855521B2 (en) | Efficient replacement of clients running large scale applications | |
US11307842B2 (en) | Method and system for virtual agent upgrade using upgrade proxy service | |
US20230333758A1 (en) | Maintaining a fault-tolerance threshold of a clusterstore during maintenance activities | |
US11722560B2 (en) | Reconciling host cluster membership during recovery | |
US11841759B2 (en) | Fault tolerance handling for services | |
US20240126582A1 (en) | Disaster recovery of containerized workloads |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | AS | Assignment | Owner name: VMWARE LLC, CALIFORNIA; Free format text: CHANGE OF NAME;ASSIGNOR:VMWARE, INC.;REEL/FRAME:067103/0030; Effective date: 2023-11-21 |