WO2020103627A1 - Service self-healing method and device based on virtual machine disaster recovery, and storage medium - Google Patents

Service self-healing method and device based on virtual machine disaster recovery, and storage medium

Info

Publication number
WO2020103627A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual machine
active
standby
healing
disaster recovery
Application number
PCT/CN2019/112364
Other languages
French (fr)
Chinese (zh)
Inventor
周志军
李华
Original Assignee
中兴通讯股份有限公司
Application filed by 中兴通讯股份有限公司
Publication of WO2020103627A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1658 Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Definitions

  • The present invention relates to the field of computer technology, and in particular to a service self-healing method, device, and storage medium based on virtual machine disaster recovery.
  • Cloud computing can adjust resources dynamically, so many applications, especially load-balanced cluster applications such as web applications, support dynamic scaling: the application servers in the cluster are adjusted according to the application load to improve application reliability and availability.
  • An application that supports dynamic scaling must, however, be stateless.
  • Stateful applications, such as applications that hold state data, file system data, or database data, do not support dynamic scaling under load balancing.
  • For such applications, service self-healing is generally used to improve reliability and availability.
  • Service self-healing is usually implemented by regenerating a virtual machine.
  • The virtual machine mounts a cloud hard disk as its data storage disk, and the state of the virtual machine running the application is monitored. If the state is abnormal, for example a PING (Packet Internet Groper) failure or a URL (Uniform Resource Locator) access failure, the abnormal virtual machine is restarted. If the service is still not restored, the virtual machine is deleted, a new virtual machine with the same IP (Internet Protocol) address is created, and the same cloud hard disk is mounted as its data storage disk to ensure data consistency, thereby achieving service self-healing.
  • However, deleting and rebuilding a virtual machine takes a long time, at least several minutes, which causes a long service interruption.
  • The invention provides a service self-healing method based on virtual machine disaster recovery, which includes: monitoring the state of the active virtual machine while the active virtual machine is running; and, when the monitored state of the active virtual machine meets the service self-healing trigger condition, controlling the standby virtual machine corresponding to the active virtual machine to process the service of the active virtual machine; wherein the standby virtual machine and the active virtual machine are located in different data centers, and the standby virtual machine is configured as a disaster recovery virtual machine of the active virtual machine.
  • The invention also provides a service self-healing device based on virtual machine disaster recovery.
  • The service self-healing device based on virtual machine disaster recovery includes a processor and a memory; the processor is used to execute a service self-healing program based on virtual machine disaster recovery stored in the memory, so as to implement the above service self-healing method based on virtual machine disaster recovery.
  • The invention further provides a storage medium that stores one or more programs, which can be executed by one or more processors to implement the above service self-healing method based on virtual machine disaster recovery.
  • The invention also provides a computer program product that includes a computer program stored on a non-transitory computer-readable storage medium; the computer program includes program instructions that, when executed by a computer, cause the computer to execute the method in any of the above method embodiments.
  • FIG. 1 is a flowchart of a service self-healing method based on virtual machine disaster recovery according to a first embodiment of the present invention.
  • FIG. 2 is a flowchart of a service self-healing method based on virtual machine disaster recovery according to a second embodiment of the present invention.
  • FIG. 3 is a structural diagram of a service self-healing device based on virtual machine disaster recovery according to a third embodiment of the present invention.
  • FIG. 4 is a structural diagram of a service self-healing system based on virtual machine disaster recovery according to a third embodiment of the present invention.
  • This embodiment provides a service self-healing method based on virtual machine disaster recovery.
  • FIG. 1 is a flowchart of the service self-healing method based on virtual machine disaster recovery according to the first embodiment of the present invention.
  • Step S110: while the active virtual machine is running, monitor the state of the active virtual machine.
  • Monitoring the state of the active virtual machine includes: sending monitoring messages to the active virtual machine every preset time period; collecting the return messages corresponding to the monitoring messages; and determining the state of the active virtual machine according to those return messages.
  • The state of the active virtual machine includes a network state and/or an access state.
  • The types of monitoring messages include, but are not limited to, PING messages and URL access messages.
  • The PING message may be the ICMP (Internet Control Message Protocol) message corresponding to the PING command.
  • The return messages corresponding to the monitoring messages include Time Out messages corresponding to PING messages and URL access failure messages corresponding to URL access messages.
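The monitoring steps above (send monitoring messages, collect the return messages, derive the state) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the source: the names `ProbeResult`, `poll_once`, and `derive_state` are hypothetical, and the probes are injected as plain callables so that a real PING or URL check could be substituted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProbeResult:
    kind: str  # "ping" or "url" (hypothetical labels for the two message types)
    ok: bool   # False stands for a Time Out or URL access failure message


def poll_once(probes: Dict[str, Callable[[], bool]]) -> List[ProbeResult]:
    """Send one round of monitoring messages and collect the return messages."""
    results = []
    for kind, probe in probes.items():
        try:
            ok = bool(probe())
        except Exception:
            # Any error raised by a probe is treated as a failed return message.
            ok = False
        results.append(ProbeResult(kind, ok))
    return results


def derive_state(results: List[ProbeResult]) -> Dict[str, str]:
    """Map return messages to a network state and an access state."""
    state = {"network": "unknown", "access": "unknown"}
    for result in results:
        if result.kind == "ping":
            state["network"] = "normal" if result.ok else "abnormal"
        elif result.kind == "url":
            state["access"] = "normal" if result.ok else "abnormal"
    return state
```

A caller would invoke `poll_once` once per preset time period and pass the results to `derive_state`.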
  • Step S120: when the monitored state of the active virtual machine meets the service self-healing trigger condition, control the standby virtual machine corresponding to the active virtual machine to process the service of the active virtual machine. The standby virtual machine and the active virtual machine are located in different data centers, and the standby virtual machine is configured as the disaster recovery virtual machine of the active virtual machine.
  • The disaster recovery virtual machine is a virtual machine that corresponds to the same service as the active virtual machine and is used to replace the active virtual machine in processing that service.
  • The service self-healing trigger condition is used to determine whether, given its current state, the active virtual machine needs to be replaced by the standby virtual machine to achieve service self-healing.
  • The service self-healing trigger conditions include a network abnormality and/or a service abnormality of the active virtual machine.
  • For example, when the state of the active virtual machine meets the service self-healing trigger condition, the active virtual machine is turned off so that its device state becomes the standby state, and the standby virtual machine is turned on so that its device state becomes the active state. Because the standby virtual machine is the disaster recovery virtual machine of the active virtual machine, once the active virtual machine is turned off and the standby virtual machine is turned on, the standby virtual machine replaces the active virtual machine and begins to process its service. In the process, the former active virtual machine becomes the disaster recovery virtual machine of the former standby virtual machine.
  • Configuring the standby virtual machine as the disaster recovery virtual machine of the active virtual machine includes: configuring the standby virtual machine to have the same IP address as the active virtual machine; and configuring the standby virtual machine to synchronize data with the active virtual machine.
  • With the same IP address and synchronized data, the standby virtual machine can replace the active virtual machine.
  • Data synchronization between the active virtual machine and the standby virtual machine can be realized by calling an API (Application Programming Interface) of the cloud resource management system.
  • The data of the active virtual machine is stored in a storage device mounted on the active virtual machine, and the data of the standby virtual machine is stored in a storage device mounted on the standby virtual machine.
  • The storage devices mounted on the active virtual machine and the standby virtual machine may be cloud hard disks.
  • Because the data of the active virtual machine and the standby virtual machine need to be synchronized, while the active virtual machine is running, the data in the storage device mounted on the active virtual machine is mirrored (copied) to the storage device mounted on the standby virtual machine, so that the data of the standby virtual machine stays synchronized with that of the active virtual machine (data mirroring synchronization).
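To make the idea of data mirroring synchronization concrete, here is a toy Python sketch. The source does not specify the mirroring mechanism, and real cloud disks are normally replicated by the storage layer; modelling a disk as a dictionary of block index to bytes, and the function name `mirror_blocks`, are assumptions made purely for illustration.

```python
from typing import Dict


def mirror_blocks(active_disk: Dict[int, bytes], standby_disk: Dict[int, bytes]) -> int:
    """Make the standby disk an exact mirror of the active disk.

    Returns the number of blocks written to or removed from the standby disk.
    """
    changed = 0
    # Copy every block that is missing or different on the standby side.
    for index, block in active_disk.items():
        if standby_disk.get(index) != block:
            standby_disk[index] = block
            changed += 1
    # Remove blocks that no longer exist on the active side.
    for index in list(standby_disk):
        if index not in active_disk:
            del standby_disk[index]
            changed += 1
    return changed
```

Running the function repeatedly while the active virtual machine is in service keeps the standby copy current; a second call with no intervening writes changes nothing.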
  • The standby virtual machine is configured as the disaster recovery virtual machine of the active virtual machine, and when the state of the active virtual machine meets the service self-healing trigger condition, the standby virtual machine replaces the active virtual machine to complete service self-healing.
  • During service self-healing there is no need to restart the active virtual machine, let alone re-create it: the standby virtual machine is directly controlled to replace the active virtual machine and process its service. The process takes less time, so service self-healing is achieved quickly and the service interruption is shortened.
  • The active virtual machine and the standby virtual machine may be disaster recovery virtual machines of each other. That is, after the standby virtual machine corresponding to the active virtual machine is controlled to process the service of the active virtual machine, the active virtual machine that has stopped processing the service becomes the standby virtual machine, and the standby virtual machine that has started processing the service becomes the active virtual machine.
  • The original active virtual machine (now the standby virtual machine) can then be troubleshot so that its state returns to normal. If the original standby virtual machine (now the active virtual machine) is later monitored to meet the service self-healing trigger condition, the original active virtual machine can replace it to complete service self-healing.
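The mutual disaster recovery relationship described above, where the roles swap on every switch, can be sketched as a small state model. This is an illustrative sketch only; `VirtualMachine` and `MutualPair` are hypothetical names, and powering a machine on or off is reduced to flipping a flag.

```python
from dataclasses import dataclass


@dataclass
class VirtualMachine:
    name: str
    powered_on: bool


class MutualPair:
    """Two virtual machines that act as each other's disaster recovery VM."""

    def __init__(self, active: VirtualMachine, standby: VirtualMachine):
        self.active = active
        self.standby = standby

    def switch(self) -> VirtualMachine:
        """Turn off the active VM, turn on the standby VM, and swap roles."""
        self.active.powered_on = False
        self.standby.powered_on = True
        self.active, self.standby = self.standby, self.active
        return self.active
```

Calling `switch` a second time, after the original active machine has been repaired, hands the service back to it.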
  • FIG. 2 is a flowchart of a service self-healing method based on virtual machine disaster recovery according to a second embodiment of the present invention.
  • Step S210: set up a first virtual machine in the first data center, and set up a second virtual machine corresponding to the same service application as the first virtual machine in the second data center.
  • Deploying the first virtual machine and the second virtual machine for the same service application distributes the service application across two different data centers, reduces the risk of service interruption, and achieves data-center-level disaster tolerance.
  • Step S220: configure a virtual machine disaster recovery strategy for the first virtual machine and the second virtual machine so that the first virtual machine serves as the active virtual machine and the second virtual machine serves as the standby virtual machine.
  • Configuring the virtual machine disaster recovery strategy includes: configuring the first virtual machine as the active virtual machine and the second virtual machine as the standby virtual machine, so that the second virtual machine serves as the disaster recovery virtual machine of the first virtual machine.
  • For example: the second virtual machine is configured to have the same IP address as the first virtual machine; the second virtual machine is configured to synchronize data with the first virtual machine; and the device state of the first virtual machine is configured as the active state and the device state of the second virtual machine as the standby state, so that the first virtual machine becomes the active virtual machine and the second virtual machine becomes the standby virtual machine.
  • In this way the standby virtual machine can serve as the disaster recovery virtual machine of the active virtual machine and can replace it for service processing.
  • Which virtual machine is active and which is standby can be adjusted. If the first virtual machine is the active virtual machine, the second virtual machine is the standby virtual machine; if the second virtual machine is the active virtual machine, the first virtual machine is the standby virtual machine.
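The requirements of the disaster recovery strategy (different data centers, same IP address, data synchronization, one active and one standby device state) can be expressed as a simple validation routine. The sketch below is hypothetical: `VmConfig`, `check_policy`, and the field names are illustrative assumptions, and the address used in the usage note comes from the documentation range 192.0.2.0/24.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VmConfig:
    name: str
    data_center: str
    ip_address: str
    device_state: str  # "active" or "standby"


def check_policy(first: VmConfig, second: VmConfig, data_sync_enabled: bool) -> List[str]:
    """Return the list of violated disaster recovery requirements (empty if valid)."""
    problems = []
    if first.data_center == second.data_center:
        problems.append("virtual machines must be in different data centers")
    if first.ip_address != second.ip_address:
        problems.append("virtual machines must share the same IP address")
    if not data_sync_enabled:
        problems.append("data synchronization must be configured")
    if {first.device_state, second.device_state} != {"active", "standby"}:
        problems.append("exactly one virtual machine must be active and one standby")
    return problems
```

For instance, a pair built as `VmConfig("vm-1", "dc-1", "192.0.2.10", "active")` and `VmConfig("vm-2", "dc-2", "192.0.2.10", "standby")` with data synchronization enabled passes every check.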
  • Configuring the first virtual machine and the second virtual machine to have the same IP address can be achieved by using the VRRP (Virtual Router Redundancy Protocol) technology of the network switch.
  • For example, the network switch connecting the first virtual machine (active virtual machine) and the second virtual machine (standby virtual machine) can be controlled by calling the API of the cloud resource management system, so that the network switch uses VRRP to configure the second virtual machine with the same IP address as the first virtual machine.
  • Configuring the first virtual machine and the second virtual machine to synchronize data ensures data consistency between the first virtual machine in the first data center and the second virtual machine in the second data center.
  • The data synchronization may be data mirroring synchronization.
  • For example, by calling the API of the cloud resource management system, the data in the storage device mounted on the first virtual machine (active virtual machine) and the data in the storage device mounted on the second virtual machine (standby virtual machine) can be configured for mirroring synchronization.
  • Step S230: configure a service self-healing strategy for the first virtual machine and the second virtual machine.
  • Configuring the service self-healing strategy includes: configuring the service self-healing process to start when the state of the active virtual machine meets the service self-healing trigger condition.
  • The service self-healing trigger conditions include a network abnormality and/or a service abnormality of the active virtual machine.
  • A network abnormality of the active virtual machine includes, but is not limited to: the network of the active virtual machine is detected to be disconnected N consecutive times.
  • N is a positive integer greater than 1 and can be an empirical value or an experimentally obtained value. For example, PING to the active virtual machine fails N consecutive times.
  • A service abnormality of the active virtual machine includes, but is not limited to: service access to the active virtual machine fails M consecutive times.
  • M is a positive integer greater than 1 and can be an empirical value or an experimentally obtained value. For example, URL access to the active virtual machine fails M consecutive times.
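The trigger conditions above amount to counting consecutive failures per probe type and comparing them with N and M. A minimal Python sketch follows; the class name `SelfHealingTrigger` and the `require_both` switch (which selects between the "and" and "or" readings of "and/or") are assumptions for illustration, not terms from the source.

```python
class SelfHealingTrigger:
    """Tracks consecutive PING and URL failures against thresholds N and M."""

    def __init__(self, n: int, m: int, require_both: bool = False):
        if n <= 1 or m <= 1:
            raise ValueError("N and M must be integers greater than 1")
        self.n = n
        self.m = m
        self.require_both = require_both
        self.ping_failures = 0
        self.url_failures = 0

    def record_ping(self, ok: bool) -> None:
        # A successful probe resets the run of consecutive failures.
        self.ping_failures = 0 if ok else self.ping_failures + 1

    def record_url(self, ok: bool) -> None:
        self.url_failures = 0 if ok else self.url_failures + 1

    def triggered(self) -> bool:
        network_abnormal = self.ping_failures >= self.n
        service_abnormal = self.url_failures >= self.m
        if self.require_both:
            return network_abnormal and service_abnormal
        return network_abnormal or service_abnormal
```

Resetting the counter on any success is what makes the failures "consecutive": two failures, one success, and two more failures do not reach a threshold of 3.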
  • Step S240: while the first virtual machine is running, monitor the state of the first virtual machine.
  • At this point the device state of the first virtual machine is the active state (power-on state), so the first virtual machine runs as the active virtual machine and can process services; the device state of the second virtual machine is the standby state (power-off state), so the second virtual machine, as the standby virtual machine, is temporarily unavailable and cannot process services.
  • A monitoring message is sent to the first virtual machine, which is the active virtual machine, every preset time period, and the return message corresponding to the monitoring message is collected, for example a URL access failure message indicating that access failed, or a Time Out message indicating that the PING did not get through, so as to monitor the state of the first virtual machine. According to the configured service self-healing strategy, it is determined whether the state of the first virtual machine meets the service self-healing trigger condition: if it does, the service self-healing process is started; if it does not, monitoring of the state of the first virtual machine continues.
  • Monitoring messages include, but are not limited to, PING messages and URL access messages.
  • For example, suppose the service self-healing trigger condition is that PING to the active virtual machine fails 3 consecutive times and URL access to the active virtual machine fails 3 consecutive times. A PING message is sent to the first virtual machine every 5 seconds and fails to get through 3 consecutive times, and a URL access message is sent to the first virtual machine every 5 seconds and fails 3 consecutive times. The state of the first virtual machine then meets the service self-healing trigger condition, and the service self-healing process can be started.
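As a rough worked estimate layered on this example (an assumption, not a figure stated in the source): let T be the probe interval and N the number of consecutive failures required. If a fault begins at some point after one successful probe and no later than the next probe, and the time each probe takes to time out is ignored, the delay between the fault and its detection is bounded as follows.

```latex
(N-1)\,T \;\le\; t_{\text{detect}} \;<\; N\,T
\qquad\Longrightarrow\qquad
10\ \text{s} \;\le\; t_{\text{detect}} \;<\; 15\ \text{s}
\quad (T = 5\ \text{s},\ N = 3)
```

The total service interruption would additionally include the per-probe timeout and the time needed to power on the standby virtual machine, neither of which the source quantifies.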
  • Step S250: when the monitored state of the first virtual machine meets the service self-healing trigger condition in the service self-healing strategy, the second virtual machine becomes the active virtual machine and the first virtual machine becomes the standby virtual machine.
  • The API of the cloud resource management system is called to turn off the first virtual machine in the first data center and turn on the second virtual machine in the second data center.
  • The device state of the first virtual machine becomes powered off, that is, the first virtual machine enters the standby state; the device state of the second virtual machine becomes powered on, that is, the second virtual machine enters the active state and replaces the first virtual machine in processing its service.
  • Because the data of the first virtual machine and the second virtual machine are mirror-synchronized and their IP addresses are the same, turning off the first virtual machine and turning on the second virtual machine has no impact on access to the service, thereby achieving service self-healing.
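Steps S240 and S250 together form a monitor-then-switch loop, which can be sketched as below. The source does not describe the actual API of the cloud resource management system, so `CloudResourceApi` with its `power_off` and `power_on` methods is a hypothetical stand-in; `monitor_and_heal` and its parameters are likewise illustrative. The defaults of 3 failures and 5 seconds mirror the example values in the text.

```python
import time
from typing import Callable, Protocol


class CloudResourceApi(Protocol):
    """Hypothetical stand-in for the cloud resource management system API."""

    def power_off(self, vm_id: str) -> None: ...

    def power_on(self, vm_id: str) -> None: ...


def monitor_and_heal(
    probe: Callable[[], bool],
    api: CloudResourceApi,
    active_id: str,
    standby_id: str,
    threshold: int = 3,
    interval_s: float = 5.0,
    max_rounds: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe the active VM periodically; switch to the standby VM on repeated failure.

    Returns True if a switch was performed, False if max_rounds elapsed without one.
    """
    failures = 0
    for _ in range(max_rounds):
        failures = 0 if probe() else failures + 1
        if failures >= threshold:
            api.power_off(active_id)   # first VM enters the standby state
            api.power_on(standby_id)   # second VM enters the active state
            return True
        sleep(interval_s)
    return False
```

Passing `sleep` as a parameter lets the loop be exercised without real waiting, and any object that provides the two methods can play the role of the API.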
  • This embodiment provides a service self-healing device based on virtual machine disaster recovery.
  • FIG. 3 is a structural diagram of the service self-healing device based on virtual machine disaster recovery according to the third embodiment of the present invention.
  • The service self-healing device based on virtual machine disaster recovery includes, but is not limited to, a processor 310 and a memory 320.
  • The processor 310 is used to execute a service self-healing program based on virtual machine disaster recovery stored in the memory 320, so as to implement the above service self-healing method based on virtual machine disaster recovery.
  • Specifically, the processor 310 executes the service self-healing program based on virtual machine disaster recovery stored in the memory 320 to implement the following steps of the service self-healing method based on virtual machine disaster recovery: while the active virtual machine is running, monitoring the state of the active virtual machine; and, when it is detected that the state of the active virtual machine meets the service self-healing trigger condition, controlling the standby virtual machine corresponding to the active virtual machine to process the service of the active virtual machine; wherein the standby virtual machine and the active virtual machine are located in different data centers, and the standby virtual machine is configured as a disaster recovery virtual machine of the active virtual machine.
  • Configuring the standby virtual machine as a disaster recovery virtual machine of the active virtual machine includes: configuring the standby virtual machine to have the same Internet Protocol (IP) address as the active virtual machine; and configuring the standby virtual machine to synchronize data with the active virtual machine.
  • Configuring the standby virtual machine to have the same IP address as the active virtual machine includes: controlling the network switch between the standby virtual machine and the active virtual machine by calling an application programming interface (API) of the cloud resource management system, so that the network switch configures the standby virtual machine to have the same IP address as the active virtual machine.
  • Configuring the standby virtual machine to synchronize data with the active virtual machine includes: by calling an API of the cloud resource management system, configuring the data in the storage device mounted on the standby virtual machine to mirror-synchronize with the data in the storage device mounted on the active virtual machine.
  • While the active virtual machine is running, the data in the storage device mounted on the active virtual machine is mirrored to the storage device mounted on the standby virtual machine, so that the data of the standby virtual machine and the active virtual machine are synchronized.
  • Monitoring the state of the active virtual machine includes: sending monitoring messages to the active virtual machine every preset time period; collecting the return messages corresponding to the monitoring messages; and determining the state of the active virtual machine according to those return messages.
  • Controlling the standby virtual machine corresponding to the active virtual machine to process the service of the active virtual machine includes: turning off the active virtual machine so that its device state becomes the standby state; and turning on the standby virtual machine so that its device state becomes the active state.
  • The service self-healing trigger condition includes a network abnormality and/or a service abnormality of the active virtual machine.
  • The above service self-healing device based on virtual machine disaster recovery can be deployed on the cloud resource management system side or deployed independently.
  • FIG. 4 is a structural diagram of a service self-healing system based on virtual machine disaster recovery according to the third embodiment of the present invention.
  • In this system, the service self-healing device based on virtual machine disaster recovery and the cloud resource management system are deployed independently.
  • The service self-healing system based on virtual machine disaster recovery includes: a cloud resource management system 410, a service self-healing device 420 based on virtual machine disaster recovery, a first data center 430, and a second data center 440.
  • The cloud resource management system 410 includes a network switch (not shown in the figure), and the first data center 430 and the second data center 440 may be connected through the network switch.
  • The first virtual machine 431 and the third virtual machine 432 are set up in the first data center 430, and the second virtual machine 441 and the fourth virtual machine 442 are set up in the second data center 440.
  • The cloud resource management system 410 is used to manage the virtual machines in the first data center 430 and the virtual machines in the second data center 440.
  • The cloud resource management system 410 provides an API interface 411 that is connected to the first data center 430 and the second data center 440 respectively.
  • The service self-healing device 420 based on virtual machine disaster recovery can call the API interface 411 to configure the first virtual machine 431, the second virtual machine 441, the third virtual machine 432, and the fourth virtual machine 442: it configures the second virtual machine 441 as the disaster recovery virtual machine of the first virtual machine 431, so that the first virtual machine 431 and the second virtual machine 441 both correspond to the first service, and configures the fourth virtual machine 442 as the disaster recovery virtual machine of the third virtual machine 432, so that the third virtual machine 432 and the fourth virtual machine 442 both correspond to the second service.
  • The service self-healing device 420 based on virtual machine disaster recovery can also call the API interface 411 to configure the first service self-healing trigger condition for the first virtual machine 431 and the second virtual machine 441, and the second service self-healing trigger condition for the third virtual machine 432 and the fourth virtual machine 442.
  • The first virtual machine 431, as an active virtual machine, is in the power-on state; the second virtual machine 441, as a standby virtual machine, is in the power-off state; the third virtual machine 432, as an active virtual machine, is in the power-on state; and the fourth virtual machine 442, as a standby virtual machine, is in the power-off state.
  • The service self-healing device 420 based on virtual machine disaster recovery can send monitoring messages to the first virtual machine 431 and the third virtual machine 432 respectively, and monitor the states of the first virtual machine 431 and the third virtual machine 432 by collecting the return messages corresponding to the monitoring messages.
  • When the service self-healing device 420 based on virtual machine disaster recovery detects that the state of the first virtual machine 431 meets the first service self-healing trigger condition, it calls the API interface 411 of the cloud resource management system 410 to turn off the first virtual machine 431 and turn on the second virtual machine 441, completing self-healing of the first service.
  • When the service self-healing device 420 based on virtual machine disaster recovery detects that the state of the third virtual machine 432 meets the second service self-healing trigger condition, it calls the API interface 411 of the cloud resource management system 410 to turn off the third virtual machine 432 and turn on the fourth virtual machine 442, completing self-healing of the second service.
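Because each service in the system above has its own active and standby pair, a fault in one service leads to a switch for that pair only. The short sketch below illustrates this; the identifiers `vm-431`, `vm-441`, `vm-432`, and `vm-442` simply reuse the reference numerals as hypothetical names, and `plan_failover` is an illustrative helper, not part of the source.

```python
from typing import Dict, List, Set, Tuple

# service name -> (active VM, standby VM); identifiers are hypothetical
PAIRS: Dict[str, Tuple[str, str]] = {
    "first service": ("vm-431", "vm-441"),
    "second service": ("vm-432", "vm-442"),
}


def plan_failover(
    pairs: Dict[str, Tuple[str, str]], unhealthy: Set[str]
) -> List[Tuple[str, str, str]]:
    """List (service, VM to turn off, VM to turn on) for each affected service."""
    plan = []
    for service, (active, standby) in pairs.items():
        if active in unhealthy:
            plan.append((service, active, standby))
    return plan
```

If only the third virtual machine is unhealthy, the plan contains a single entry for the second service and the first service is left untouched.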
  • An embodiment of the present invention also provides a storage medium (computer-readable storage medium).
  • The storage medium may include volatile memory, such as random access memory; it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid-state drive; it may also include a combination of the above types of memory.
  • The processor is used to execute a service self-healing program based on virtual machine disaster recovery stored in the memory, so as to implement the following steps of a service self-healing method based on virtual machine disaster recovery: while the active virtual machine is running, monitoring the state of the active virtual machine; when the monitored state of the active virtual machine meets the service self-healing trigger condition, controlling the standby virtual machine corresponding to the active virtual machine to process the service of the active virtual machine; wherein the standby virtual machine and the active virtual machine are located in different data centers, and the standby virtual machine is configured as a disaster recovery virtual machine of the active virtual machine.
  • The standby virtual machine being configured as a disaster recovery virtual machine of the active virtual machine includes: the standby virtual machine is configured with the same Internet Protocol (IP) address as the active virtual machine; the standby virtual machine is configured to synchronize data with the active virtual machine.
  • The standby virtual machine being configured with the same IP address as the active virtual machine includes: by calling an application programming interface (API) of the cloud resource management system, controlling the network switch between the standby virtual machine and the active virtual machine, so that the network switch configures the standby virtual machine with the same IP address as the active virtual machine.
  • The standby virtual machine being configured to synchronize data with the active virtual machine includes: by calling an API of the cloud resource management system, configuring the data in the storage device mounted by the standby virtual machine to be mirror-synchronized with the data in the storage device mounted by the active virtual machine.
  • The data image in the storage device mounted by the active virtual machine is copied to the storage device mounted by the standby virtual machine, so that the data of the standby virtual machine and the active virtual machine are synchronized.
  • Monitoring the state of the active virtual machine includes: sending monitoring messages to the active virtual machine every preset time period; collecting the return messages corresponding to the monitoring messages; determining the state of the active virtual machine according to the return messages corresponding to the monitoring messages.
  • Controlling the standby virtual machine corresponding to the active virtual machine to process the service of the active virtual machine includes: shutting down the active virtual machine, so that the device state of the active virtual machine is the standby state; starting the standby virtual machine, so that the device state of the standby virtual machine is the active state.
  • The service self-healing trigger condition includes: a network abnormality and/or a service abnormality of the active virtual machine.
  • An embodiment of the present invention also provides a computer program product.
  • The computer program product includes a computer program stored on a non-transitory computer-readable storage medium.
  • The computer program includes program instructions. When the program instructions are executed by a computer, the computer is caused to execute the method in any of the above method embodiments.
  • Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media generally contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Hardware Redundancy (AREA)

Abstract

Disclosed are a service self-healing method and device based on virtual machine disaster recovery, and a storage medium. The method comprises: during the process of running a primary virtual machine, performing state monitoring on the primary virtual machine (S110); when it is monitored that the state of the primary virtual machine satisfies a service self-healing trigger condition, controlling a standby virtual machine corresponding to the primary virtual machine to process a service of the primary virtual machine, wherein the standby virtual machine and the primary virtual machine are located at different data centers, and the standby virtual machine is configured to be a disaster recovery virtual machine of the primary virtual machine (S120).

Description

Service self-healing method, device, and storage medium based on virtual machine disaster recovery
Cross reference
The present invention claims priority to the Chinese patent application filed with the Chinese Patent Office on November 21, 2018, with application number 201811393959.9 and the invention title "A service self-healing method, device, and storage medium based on virtual machine disaster recovery". The entire content of that application is incorporated into the present invention by reference.
Technical field
The present invention relates to the field of computer technology, and in particular to a service self-healing method, device, and storage medium based on virtual machine disaster recovery.
Background
With the promotion and adoption of cloud computing, more and more applications are deployed in cloud environments. Cloud computing allows resources to be adjusted dynamically, so many applications, especially load-balanced cluster applications such as web applications, support dynamic scaling: the application servers in the cluster are adjusted dynamically according to the application load to improve the reliability and availability of the application. However, dynamic scaling imposes one restriction: the application must be stateless. Stateful applications, such as applications that hold state data, file system data, or database data, do not support dynamic scaling under load balancing.
For stateful applications, service self-healing is generally used to improve reliability and availability. At present, service self-healing is usually implemented by virtual machine regeneration. The virtual machine mounts a cloud disk as its data disk, and the state of the application's virtual machine is monitored. If the state is abnormal, for example PING (Packet Internet Groper) fails or URL (Uniform Resource Locator) access fails, the abnormal virtual machine is restarted. If the service does not recover, the virtual machine is deleted, and a new virtual machine with the same IP (Internet Protocol) address is created and mounts the same cloud disk as its data disk, which ensures data consistency and thereby achieves service self-healing. However, deleting and recreating a virtual machine takes a long time, at least several minutes, which leads to long service interruptions.
Summary of the invention
The present invention addresses the above problem through the following technical solutions:
The present invention provides a service self-healing method based on virtual machine disaster recovery, including: while an active virtual machine is running, monitoring the state of the active virtual machine; when the monitored state of the active virtual machine meets a service self-healing trigger condition, controlling the standby virtual machine corresponding to the active virtual machine to process the service of the active virtual machine; wherein the standby virtual machine and the active virtual machine are located in different data centers, and the standby virtual machine is configured as a disaster recovery virtual machine of the active virtual machine.
The present invention also provides a service self-healing device based on virtual machine disaster recovery. The device includes a processor and a memory; the processor is used to execute a service self-healing program based on virtual machine disaster recovery stored in the memory, so as to implement the above service self-healing method based on virtual machine disaster recovery.
The present invention further provides a storage medium. The storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the above service self-healing method based on virtual machine disaster recovery.
The present invention further provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions; when the program instructions are executed by a computer, the computer is caused to execute the method in any of the above method embodiments.
Brief description of the drawings
FIG. 1 is a flowchart of a service self-healing method based on virtual machine disaster recovery according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a service self-healing method based on virtual machine disaster recovery according to a second embodiment of the present invention;
FIG. 3 is a structural diagram of a service self-healing device based on virtual machine disaster recovery according to a third embodiment of the present invention;
FIG. 4 is a structural diagram of a service self-healing system based on virtual machine disaster recovery according to a third embodiment of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and do not limit it.
Embodiment 1
This embodiment provides a service self-healing method based on virtual machine disaster recovery. FIG. 1 is a flowchart of the service self-healing method based on virtual machine disaster recovery according to the first embodiment of the present invention.
Step S110: while the active virtual machine is running, monitor the state of the active virtual machine.
Monitoring the state of the active virtual machine includes: sending monitoring messages to the active virtual machine every preset time period; collecting the return messages corresponding to the monitoring messages; and determining the state of the active virtual machine according to those return messages.
The state of the active virtual machine includes: network state and/or access state.
The types of monitoring messages include but are not limited to: PING messages and URL access messages. A PING message may be the ICMP (Internet Control Message Protocol) message corresponding to the PING command.
The return messages corresponding to the monitoring messages include: the Time Out message corresponding to a PING message, and the URL access failure message corresponding to a URL access message.
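The monitoring loop of step S110 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described by the patent: the function name `poll_status`, its parameters, and the idea of passing the probe in as a callable are all hypothetical. In practice the probe would wrap an ICMP echo request or a URL request.

```python
import time
from typing import Callable, List


def poll_status(
    probe: Callable[[], bool],
    interval_s: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """Send one monitoring message per interval and collect the outcomes.

    `probe` is a hypothetical callable that returns True when the active
    virtual machine answered (for example a PING reply or a successful URL
    fetch). Any exception is treated as a failed return message, such as a
    Time Out.
    """
    outcomes: List[bool] = []
    for i in range(rounds):
        try:
            outcomes.append(bool(probe()))
        except Exception:
            outcomes.append(False)
        if i < rounds - 1:
            sleep(interval_s)  # wait for the preset time period
    return outcomes
```

Passing `sleep` as a parameter is a design choice of this sketch that lets the loop be exercised without real delays.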
Step S120: when the monitored state of the active virtual machine meets the service self-healing trigger condition, control the standby virtual machine corresponding to the active virtual machine to process the service of the active virtual machine. The standby virtual machine and the active virtual machine are located in different data centers, and the standby virtual machine is configured as the disaster recovery virtual machine of the active virtual machine.
A disaster recovery virtual machine is a virtual machine that corresponds to the same service as the active virtual machine and is used to replace the active virtual machine in processing that service.
The service self-healing trigger condition is used to identify whether the current state of the active virtual machine requires the standby virtual machine to replace the active virtual machine in order to achieve service self-healing.
The service self-healing trigger condition includes: a network abnormality and/or a service abnormality of the active virtual machine.
In this embodiment, whether the state of the active virtual machine meets the service self-healing trigger condition can be determined from the messages returned for the active virtual machine. For example, when the state of the active virtual machine meets the service self-healing trigger condition, the active virtual machine is shut down so that its device state becomes the standby state, and the standby virtual machine is started so that its device state becomes the active state. Because the standby virtual machine is the disaster recovery virtual machine of the active virtual machine, once the active virtual machine has been shut down and the standby virtual machine started, the standby virtual machine replaces the active virtual machine and begins processing its service. In this process, the former active virtual machine becomes the disaster recovery virtual machine of the former standby virtual machine.
The standby virtual machine being configured as the disaster recovery virtual machine of the active virtual machine includes: the standby virtual machine is configured with the same IP address as the active virtual machine; the standby virtual machine is configured to synchronize data with the active virtual machine. Because the two virtual machines share the same IP address and their data is synchronized, the standby virtual machine can replace the active virtual machine.
In some cases, the network switch between the standby virtual machine and the active virtual machine can be controlled by calling the API (Application Programming Interface) of the cloud resource management system, so that the network switch configures the standby virtual machine with the same IP address as the active virtual machine.
In some cases, the data in the storage device mounted by the standby virtual machine can be configured, by calling the API of the cloud resource management system, to be mirror-synchronized with the data in the storage device mounted by the active virtual machine. This configuration achieves data synchronization between the active and standby virtual machines. The data of the active virtual machine is stored in the storage device mounted by the active virtual machine, and the data of the standby virtual machine is stored in the storage device mounted by the standby virtual machine. The storage devices mounted by the active and standby virtual machines may be cloud disks.
Because the data of the active and standby virtual machines must stay synchronized, while the active virtual machine is running, the data image in the storage device mounted by the active virtual machine is copied to the storage device mounted by the standby virtual machine, so that the data of the two virtual machines is synchronized (data mirror synchronization).
In this embodiment, the standby virtual machine is configured as the disaster recovery virtual machine of the active virtual machine. When the state of the active virtual machine meets the service self-healing trigger condition, the standby virtual machine replaces the active virtual machine to complete service self-healing. During self-healing there is no need to restart the active virtual machine, let alone recreate it; the standby virtual machine is simply controlled to take over and process the service of the active virtual machine. This takes little time, so service self-healing is achieved quickly and the service interruption is shortened.
In this embodiment, the active virtual machine and the standby virtual machine can serve as disaster recovery virtual machines for each other. That is, after the standby virtual machine has been controlled to process the service of the active virtual machine, the virtual machine that stopped processing the service has become the standby virtual machine, and the virtual machine that started processing the service has become the active virtual machine. At this point the original active virtual machine (now the standby) can be repaired so that it is again able to process the service normally. Then, when the monitored state of the original standby virtual machine (now the active) meets the service self-healing trigger condition, the original active virtual machine can replace it to complete service self-healing.
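The mutual disaster recovery relationship described above amounts to swapping two roles, which the following sketch models. The class name `DisasterRecoveryPair`, the method `switch_over`, and the identifiers `vm-dc1` and `vm-dc2` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass
class DisasterRecoveryPair:
    """Tracks which virtual machine of a pair currently handles the service."""

    active: str
    standby: str

    def switch_over(self) -> None:
        # The failed active VM becomes the standby (disaster recovery) VM,
        # and the former standby VM takes over the service.
        self.active, self.standby = self.standby, self.active


pair = DisasterRecoveryPair(active="vm-dc1", standby="vm-dc2")
pair.switch_over()  # vm-dc1 fails: vm-dc2 takes over
pair.switch_over()  # vm-dc2 fails later, after vm-dc1 has been repaired
```

After the second switch-over the pair is back in its initial arrangement, which is the fail-back case the paragraph describes.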
Embodiment 2
To make the present invention clearer, a more specific embodiment is provided below to describe the service self-healing method based on virtual machine disaster recovery.
FIG. 2 is a flowchart of the service self-healing method based on virtual machine disaster recovery according to the second embodiment of the present invention.
Step S210: set up a first virtual machine in a first data center, and set up in a second data center a second virtual machine that corresponds to the same service application as the first virtual machine.
Deploying a first and a second virtual machine for the same service application distributes that application across two different data centers, which reduces the risk of service interruption and achieves data-center-level disaster recovery.
Step S220: configure a virtual machine disaster recovery policy for the first and second virtual machines, so that the first virtual machine serves as the active virtual machine and the second virtual machine serves as the standby virtual machine.
Configuring the virtual machine disaster recovery policy includes: configuring the first virtual machine as the active virtual machine and the second virtual machine as the standby virtual machine, so that the second virtual machine serves as the disaster recovery virtual machine of the first virtual machine.
Specifically, the second virtual machine is configured with the same IP address as the first virtual machine; the second virtual machine is configured to synchronize data with the first virtual machine; the device state of the first virtual machine is configured as the active state and the device state of the second virtual machine as the standby state, so that the first virtual machine becomes the active virtual machine and the second becomes the standby virtual machine. With this configuration the standby virtual machine can act as the disaster recovery virtual machine of the active virtual machine and replace it in processing the service.
Configuring the device state of the first virtual machine as the active state and that of the second virtual machine as the standby state includes: configuring the first virtual machine as powered on and the second virtual machine as powered off. The powered-on state represents the active state, and the powered-off state represents the standby state. By changing whether a virtual machine is powered on or off, it can be made the active or the standby virtual machine. In some cases, if the first virtual machine is the active virtual machine, the second virtual machine is the standby virtual machine; if the second virtual machine is the active virtual machine, the first virtual machine is the standby virtual machine.
Configuring the first and second virtual machines with the same IP address can be achieved using the VRRP (Virtual Router Redundancy Protocol) feature of the network switch. In some cases, the network switch connecting the first virtual machine (active) and the second virtual machine (standby) can be controlled by calling the API of the cloud resource management system, so that the switch uses VRRP to configure the second virtual machine with the same IP address as the first virtual machine.
Configuring the first and second virtual machines for data synchronization ensures data consistency between the first virtual machine in the first data center and the second virtual machine in the second data center. The synchronization may be data mirror synchronization. In some cases, the data in the storage device mounted by the first virtual machine (active) and the data in the storage device mounted by the second virtual machine (standby) can be configured for mirror synchronization by calling the API of the cloud resource management system.
Step S230: configure a service self-healing policy for the first and second virtual machines.
Configuring the service self-healing policy includes: configuring the service self-healing process to start when the state of the active virtual machine meets the service self-healing trigger condition.
The service self-healing trigger condition includes: a network abnormality and/or a service abnormality of the active virtual machine.
A network abnormality of the active virtual machine includes but is not limited to: the network of the active virtual machine is detected as unreachable N consecutive times. N is a positive integer greater than 1 and may be an empirical value or a value obtained by experiment. For example: PING to the active virtual machine fails N consecutive times.
A service abnormality of the active virtual machine includes but is not limited to: service access to the active virtual machine fails M consecutive times. M is a positive integer greater than 1 and may be an empirical value or a value obtained by experiment. For example: URL access to the active virtual machine fails M consecutive times.
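The thresholds N and M of the self-healing policy could be held in a small validated configuration object. The sketch below is only an illustration: the class name `SelfHealingPolicy` and its field names are assumptions, and the only rule taken from the text is that both thresholds are positive integers greater than 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelfHealingPolicy:
    """Hypothetical container for the service self-healing trigger thresholds."""

    ping_failures: int  # N: consecutive PING failures that mean "network abnormal"
    url_failures: int   # M: consecutive URL failures that mean "service abnormal"

    def __post_init__(self) -> None:
        for name in ("ping_failures", "url_failures"):
            value = getattr(self, name)
            # The text requires N and M to be positive integers greater than 1.
            if not isinstance(value, int) or value <= 1:
                raise ValueError(f"{name} must be an integer greater than 1")


policy = SelfHealingPolicy(ping_failures=3, url_failures=3)
```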
Step S240: while the first virtual machine is running, monitor the state of the first virtual machine.
Because the device state of the first virtual machine is the active state (powered on), the first virtual machine runs as the active virtual machine and can process the service. Because the device state of the second virtual machine is the standby state (powered off), the second virtual machine, as the standby virtual machine, is temporarily not running and cannot process the service.
Specifically, monitoring messages are sent every preset time period to the first virtual machine, which is the active virtual machine, and the return messages corresponding to the monitoring messages are collected, for example the URL access failure message that indicates a failed access or the Time Out message that indicates a failed PING. The state of the first virtual machine is monitored on this basis. According to the configured service self-healing policy, it is determined whether the state of the first virtual machine meets the service self-healing trigger condition. If it does, the service self-healing process is started; if it does not, monitoring of the first virtual machine continues.
The types of monitoring messages include but are not limited to: PING messages and URL access messages.
For example, suppose the service self-healing trigger condition is that PING to the active virtual machine has failed 3 consecutive times and URL access to the active virtual machine has failed 3 consecutive times. A PING message is sent to the first virtual machine every 5 seconds and fails 3 times in a row, and a URL access message is sent to the first virtual machine every 5 seconds and fails 3 times in a row. At this point it can be determined that the state of the first virtual machine meets the service self-healing trigger condition, and the service self-healing process can be started.
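The example above, with three consecutive PING failures and three consecutive URL failures, can be checked with a short sketch. The function names `consecutive_failures` and `should_self_heal` and the `require_both` switch are assumptions introduced for illustration. `require_both=True` mirrors this example, while `False` corresponds to the "or" reading of the trigger condition.

```python
from typing import Iterable


def consecutive_failures(outcomes: Iterable[bool]) -> int:
    """Count the failures at the end of a probe history (True means success)."""
    count = 0
    for ok in outcomes:
        count = 0 if ok else count + 1
    return count


def should_self_heal(
    ping_history: Iterable[bool],
    url_history: Iterable[bool],
    n: int = 3,
    m: int = 3,
    require_both: bool = True,
) -> bool:
    """Decide whether the trigger condition is met for the given histories."""
    network_abnormal = consecutive_failures(ping_history) >= n
    service_abnormal = consecutive_failures(url_history) >= m
    if require_both:
        return network_abnormal and service_abnormal
    return network_abnormal or service_abnormal
```

A single successful probe resets the count, so only an unbroken run of failures at the end of the history can trigger self-healing.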
Step S250: when the monitored state of the first virtual machine meets the service self-healing trigger condition in the service self-healing policy, make the second virtual machine the active virtual machine and the first virtual machine the standby virtual machine.
After the service self-healing process starts, the API of the cloud resource management system is called to shut down the first virtual machine in the first data center and start the second virtual machine in the second data center. In this way the device state of the first virtual machine becomes powered off, that is, the first virtual machine enters the standby state; the device state of the second virtual machine becomes powered on, that is, the second virtual machine enters the active state, replaces the first virtual machine, and begins processing its service.
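The switch-over in step S250 can be sketched against a hypothetical client interface. The patent does not name the API calls of the cloud resource management system, so `CloudApi`, `stop_vm`, `start_vm`, and `fail_over` are all assumed names. The stop-then-start order follows the order in which the text lists the two actions.

```python
from typing import Protocol, Tuple


class CloudApi(Protocol):
    """Hypothetical subset of a cloud resource management API."""

    def stop_vm(self, vm_id: str) -> None: ...

    def start_vm(self, vm_id: str) -> None: ...


def fail_over(api: CloudApi, active_id: str, standby_id: str) -> Tuple[str, str]:
    """Shut down the active VM, start the standby VM, return (new_active, new_standby)."""
    # Assumption of this sketch: stop first, because both VMs are configured
    # with the same IP address.
    api.stop_vm(active_id)
    api.start_vm(standby_id)
    return standby_id, active_id
```

Any object with matching `stop_vm` and `start_vm` methods satisfies the protocol, so a real client or a test double can be passed in.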
In this embodiment, the first and second virtual machines correspond to the same application, their data is mirror-synchronized, and their IP addresses are the same. Therefore, after the first virtual machine is shut down and the second virtual machine is started, access to the service is not affected, and service self-healing is achieved.
In this embodiment, the self-healing process does not need to recreate the active virtual machine; it only needs to start the preconfigured disaster recovery virtual machine. Starting the disaster recovery virtual machine usually takes less than 1 minute, which speeds up service self-healing and shortens the service interruption.
Embodiment 3
This embodiment provides a service self-healing device based on virtual machine disaster recovery. FIG. 3 is a structural diagram of the service self-healing device based on virtual machine disaster recovery according to the third embodiment of the present invention.
In this embodiment, the service self-healing device based on virtual machine disaster recovery includes, but is not limited to, a processor 310 and a memory 320.
The processor 310 is configured to execute the service self-healing program based on virtual machine disaster recovery stored in the memory 320, so as to implement the service self-healing method based on virtual machine disaster recovery described above.
Specifically, the processor 310 is configured to execute the service self-healing program based on virtual machine disaster recovery stored in the memory 320, so as to implement the following steps of the service self-healing method based on virtual machine disaster recovery: while the active virtual machine is running, monitoring the state of the active virtual machine; when the monitored state of the active virtual machine meets the service self-healing trigger condition, controlling the standby virtual machine corresponding to the active virtual machine to process the services of the active virtual machine; wherein the standby virtual machine and the active virtual machine are located in different data centers, and the standby virtual machine is configured as the disaster recovery virtual machine of the active virtual machine.
In some cases, configuring the standby virtual machine as the disaster recovery virtual machine of the active virtual machine includes: configuring the standby virtual machine with the same Internet Protocol (IP) address as the active virtual machine; and configuring the standby virtual machine to synchronize data with the active virtual machine.
In some cases, configuring the standby virtual machine with the same IP address as the active virtual machine includes: calling the application programming interface (API) of the cloud resource management system to control the network switch between the standby virtual machine and the active virtual machine, so that the network switch configures the standby virtual machine with the same IP address as the active virtual machine.
In some cases, configuring the standby virtual machine to synchronize data with the active virtual machine includes: calling the API of the cloud resource management system to configure the data in the storage device mounted by the standby virtual machine to be mirror-synchronized with the data in the storage device mounted by the active virtual machine.
In some cases, while the active virtual machine is running, the data in the storage device mounted by the active virtual machine is mirror-copied to the storage device mounted by the standby virtual machine, so that the data of the standby virtual machine and the active virtual machine stays synchronized.
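The mirror copy described above can be pictured with a small Python sketch. This is a simplified model, not the storage mechanism of the embodiment: the `MirroredVolume` class and its block dictionaries are hypothetical, and the sketch assumes every write is applied to both storage devices before it returns, which the document does not state.

```python
class MirroredVolume:
    """Toy model of an active storage device mirrored to a standby one."""

    def __init__(self):
        self.active_blocks = {}   # storage mounted by the active VM
        self.standby_blocks = {}  # storage mounted by the standby VM

    def write(self, block_id, data):
        # Apply the write to the active storage, then mirror it.
        self.active_blocks[block_id] = data
        self.standby_blocks[block_id] = data

    def in_sync(self):
        # The standby VM can take over safely only if both copies match.
        return self.active_blocks == self.standby_blocks


volume = MirroredVolume()
volume.write(0, b"order-1")
volume.write(1, b"order-2")
volume.write(0, b"order-1-updated")
```

After these writes both copies hold the same two blocks, so a standby virtual machine started on the mirrored storage would see the latest data.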
In some cases, monitoring the state of the active virtual machine includes: sending a monitoring message to the active virtual machine at preset intervals; collecting the return message corresponding to the monitoring message; and determining the state of the active virtual machine according to that return message.
In some cases, controlling the standby virtual machine corresponding to the active virtual machine to process the services of the active virtual machine includes: shutting down the active virtual machine so that its device state is the standby state; and starting the standby virtual machine so that its device state is the active state.
In some cases, the service self-healing trigger condition includes a network abnormality and/or a service abnormality of the active virtual machine.
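The monitoring steps and the trigger condition can be sketched in Python as follows. The sketch is illustrative only: `ProbeReply`, `trigger_met`, and `monitor` are hypothetical names, the probe is injected as a callable because the document does not define the monitoring message format, and treating a missing return message as a network abnormality is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeReply:
    network_ok: bool
    service_ok: bool


def trigger_met(reply: Optional[ProbeReply]) -> bool:
    """True on a network abnormality and/or a service abnormality."""
    if reply is None:
        # Assumption: no return message counts as a network abnormality.
        return True
    return not (reply.network_ok and reply.service_ok)


def monitor(probe: Callable[[], Optional[ProbeReply]],
            on_trigger: Callable[[], None],
            interval_s: float,
            max_rounds: int,
            sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Probe every interval_s seconds; start self-healing on the first trigger.

    Returns the zero-based round in which the trigger fired, or None.
    """
    for round_no in range(max_rounds):
        reply = probe()           # send monitoring message, collect the reply
        if trigger_met(reply):    # determine the state from the reply
            on_trigger()
            return round_no
        sleep(interval_s)
    return None
```

Passing `sleep` as a parameter keeps the loop testable without real delays; in use, `on_trigger` would be the routine that shuts down the active virtual machine and starts the standby one.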
The service self-healing device based on virtual machine disaster recovery described above may be deployed on the cloud resource management system side or deployed independently. FIG. 4 is a structural diagram of the service self-healing system based on virtual machine disaster recovery according to the third embodiment of the present invention. In FIG. 4, the service self-healing device based on virtual machine disaster recovery and the cloud resource management system are deployed independently of each other.
The service self-healing system based on virtual machine disaster recovery includes a cloud resource management system 410, a service self-healing device 420 based on virtual machine disaster recovery, a first data center 430, and a second data center 440. The cloud resource management system 410 includes a network switch (not shown in the figure), through which the first data center 430 and the second data center 440 can be connected.
A first virtual machine 431 and a third virtual machine 432 are deployed in the first data center 430, and a second virtual machine 441 and a fourth virtual machine 442 are deployed in the second data center 440.
The cloud resource management system 410 manages the virtual machines in the first data center 430 and the virtual machines in the second data center 440. The cloud resource management system 410 provides an API 411, which is connected to the first data center 430 and to the second data center 440.
The service self-healing device 420 based on virtual machine disaster recovery can call the API 411 to configure the first virtual machine 431, the second virtual machine 441, the third virtual machine 432, and the fourth virtual machine 442. It configures the second virtual machine 441 as the disaster recovery virtual machine of the first virtual machine 431, so that the first virtual machine 431 and the second virtual machine 441 are both associated with processing a first service, and configures the fourth virtual machine 442 as the disaster recovery virtual machine of the third virtual machine 432, so that the third virtual machine 432 and the fourth virtual machine 442 are both associated with processing a second service. The service self-healing device 420 can also call the API 411 to configure a first service self-healing trigger condition for the first virtual machine 431 and the second virtual machine 441, and a second service self-healing trigger condition for the third virtual machine 432 and the fourth virtual machine 442.
According to the configuration made by the service self-healing device 420 based on virtual machine disaster recovery, the first virtual machine 431 is powered on as an active virtual machine and the second virtual machine 441 is powered off as a standby virtual machine; the third virtual machine 432 is powered on as an active virtual machine and the fourth virtual machine 442 is powered off as a standby virtual machine.
The service self-healing device 420 based on virtual machine disaster recovery can send monitoring messages to the first virtual machine 431 and the third virtual machine 432 separately and, by collecting the return messages corresponding to the monitoring messages, monitor the state of the first virtual machine 431 and the state of the third virtual machine 432 separately. When the device 420 detects that the state of the first virtual machine 431 meets the first service self-healing trigger condition, it calls the API 411 of the cloud resource management system 410 to shut down the first virtual machine 431 and start the second virtual machine 441, completing self-healing of the first service. When the device 420 detects that the state of the third virtual machine 432 meets the second service self-healing trigger condition, it calls the API 411 of the cloud resource management system 410 to shut down the third virtual machine 432 and start the fourth virtual machine 442, completing self-healing of the second service.
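The per-service arrangement of FIG. 4 can be sketched in Python as a table of active/standby pairs, each with its own trigger condition. All names here are hypothetical: `ServicePair`, `plan_failovers`, the `vm431`-style identifiers (borrowed from the reference numerals), and the status dictionaries are illustrative, and the choice of a network check for the first service and a service check for the second is an assumption, since the document leaves the two trigger conditions open.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ServicePair:
    active: str                       # VM currently powered on
    standby: str                      # disaster recovery VM, powered off
    trigger: Callable[[dict], bool]   # service self-healing trigger condition


def plan_failovers(pairs: Dict[str, ServicePair],
                   statuses: Dict[str, dict]) -> List[Tuple]:
    """List the power operations to request for each service whose trigger is met."""
    plan = []
    for service, pair in pairs.items():
        status = statuses.get(pair.active, {})
        if pair.trigger(status):
            plan.append((service,
                         ("power_off", pair.active),
                         ("power_on", pair.standby)))
    return plan


# Assumed example configuration mirroring FIG. 4.
pairs = {
    "service-1": ServicePair("vm431", "vm441",
                             lambda s: not s.get("network_ok", False)),
    "service-2": ServicePair("vm432", "vm442",
                             lambda s: not s.get("service_ok", False)),
}
```

Keeping the trigger as a per-pair callable reflects that each service is healed independently: a fault on the third virtual machine leads to a switchover for the second service only, while the first service keeps running on the first virtual machine.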
Embodiment 4
An embodiment of the present invention further provides a storage medium (a computer-readable storage medium). The storage medium stores one or more programs. The storage medium may include volatile memory, such as random access memory; it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid-state drive; and it may also include a combination of the above kinds of memory.
The one or more programs in the storage medium can be executed by one or more processors to implement the service self-healing method based on virtual machine disaster recovery described above.
The processor is configured to execute the service self-healing program based on virtual machine disaster recovery stored in the memory, so as to implement the following steps of the service self-healing method based on virtual machine disaster recovery: while the active virtual machine is running, monitoring the state of the active virtual machine; when the monitored state of the active virtual machine meets the service self-healing trigger condition, controlling the standby virtual machine corresponding to the active virtual machine to process the services of the active virtual machine; wherein the standby virtual machine and the active virtual machine are located in different data centers, and the standby virtual machine is configured as the disaster recovery virtual machine of the active virtual machine.
In some cases, configuring the standby virtual machine as the disaster recovery virtual machine of the active virtual machine includes: configuring the standby virtual machine with the same Internet Protocol (IP) address as the active virtual machine; and configuring the standby virtual machine to synchronize data with the active virtual machine.
In some cases, configuring the standby virtual machine with the same IP address as the active virtual machine includes: calling the application programming interface (API) of the cloud resource management system to control the network switch between the standby virtual machine and the active virtual machine, so that the network switch configures the standby virtual machine with the same IP address as the active virtual machine.
In some cases, configuring the standby virtual machine to synchronize data with the active virtual machine includes: calling the API of the cloud resource management system to configure the data in the storage device mounted by the standby virtual machine to be mirror-synchronized with the data in the storage device mounted by the active virtual machine.
In some cases, while the active virtual machine is running, the data in the storage device mounted by the active virtual machine is mirror-copied to the storage device mounted by the standby virtual machine, so that the data of the standby virtual machine and the active virtual machine stays synchronized.
In some cases, monitoring the state of the active virtual machine includes: sending a monitoring message to the active virtual machine at preset intervals; collecting the return message corresponding to the monitoring message; and determining the state of the active virtual machine according to that return message.
In some cases, controlling the standby virtual machine corresponding to the active virtual machine to process the services of the active virtual machine includes: shutting down the active virtual machine so that its device state is the standby state; and starting the standby virtual machine so that its device state is the active state.
In some cases, the service self-healing trigger condition includes a network abnormality and/or a service abnormality of the active virtual machine.
Embodiment 5
An embodiment of the present invention further provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium; the computer program includes program instructions that, when executed by a computer, cause the computer to perform the method of any of the method embodiments above.
Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices, may be implemented as software, firmware, hardware, or an appropriate combination thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Although preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various improvements, additions, and substitutions are also possible; therefore, the scope of the present invention should not be limited to the embodiments described above.

Claims (10)

  1. A service self-healing method based on virtual machine disaster recovery, comprising:
    while an active virtual machine is running, monitoring the state of the active virtual machine;
    when the monitored state of the active virtual machine meets a service self-healing trigger condition, controlling a standby virtual machine corresponding to the active virtual machine to process the services of the active virtual machine;
    wherein the standby virtual machine and the active virtual machine are located in different data centers, and the standby virtual machine is configured as a disaster recovery virtual machine of the active virtual machine.
  2. The method of claim 1, wherein the standby virtual machine being configured as the disaster recovery virtual machine of the active virtual machine comprises:
    the standby virtual machine being configured with the same Internet Protocol (IP) address as the active virtual machine;
    the standby virtual machine being configured to synchronize data with the active virtual machine.
  3. The method of claim 2, wherein the standby virtual machine being configured with the same IP address as the active virtual machine comprises:
    calling an application programming interface (API) of a cloud resource management system to control a network switch between the standby virtual machine and the active virtual machine, so that the network switch configures the standby virtual machine with the same IP address as the active virtual machine.
  4. The method of claim 2, wherein the standby virtual machine being configured to synchronize data with the active virtual machine comprises:
    calling an API of a cloud resource management system to configure the data in a storage device mounted by the standby virtual machine to be mirror-synchronized with the data in a storage device mounted by the active virtual machine.
  5. The method of claim 4, wherein the method further comprises:
    while the active virtual machine is running, mirror-copying the data in the storage device mounted by the active virtual machine to the storage device mounted by the standby virtual machine, so that the data of the standby virtual machine and the active virtual machine is synchronized.
  6. The method of claim 1, wherein monitoring the state of the active virtual machine comprises:
    sending a monitoring message to the active virtual machine at preset intervals;
    collecting a return message corresponding to the monitoring message;
    determining the state of the active virtual machine according to the return message corresponding to the monitoring message.
  7. The method of claim 1, wherein controlling the standby virtual machine corresponding to the active virtual machine to process the services of the active virtual machine comprises:
    shutting down the active virtual machine so that the device state of the active virtual machine is the standby state;
    starting the standby virtual machine so that the device state of the standby virtual machine is the active state.
  8. The method of any one of claims 1 to 7, wherein the service self-healing trigger condition comprises: a network abnormality and/or a service abnormality of the active virtual machine.
  9. A service self-healing device based on virtual machine disaster recovery, wherein the device comprises a processor and a memory; the processor is configured to execute a service self-healing program based on virtual machine disaster recovery stored in the memory, so as to implement the service self-healing method based on virtual machine disaster recovery of any one of claims 1 to 8.
  10. A storage medium, wherein the storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the service self-healing method based on virtual machine disaster recovery of any one of claims 1 to 8.
PCT/CN2019/112364 2018-11-21 2019-10-21 Service self-healing method and device based on virtual machine disaster recovery, and storage medium WO2020103627A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811393959.0A CN111209145A (en) 2018-11-21 2018-11-21 Virtual machine disaster tolerance-based service self-healing method, equipment and storage medium
CN201811393959.0 2018-11-21

Publications (1)

Publication Number Publication Date
WO2020103627A1 true WO2020103627A1 (en) 2020-05-28

Family

ID=70774552

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/112364 WO2020103627A1 (en) 2018-11-21 2019-10-21 Service self-healing method and device based on virtual machine disaster recovery, and storage medium

Country Status (2)

Country Link
CN (1) CN111209145A (en)
WO (1) WO2020103627A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112202853B (en) * 2020-09-17 2022-07-22 杭州安恒信息技术股份有限公司 Data synchronization method, system, computer device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102497288A (en) * 2011-12-13 2012-06-13 华为技术有限公司 Dual-server backup method and dual system implementation device
CN104579791A (en) * 2015-01-26 2015-04-29 浪潮电子信息产业股份有限公司 Method for achieving automatic K-DB main and standby disaster recovery cluster switching
CN106817238A (en) * 2015-11-30 2017-06-09 中兴通讯股份有限公司 Virtual machine repair method, virtual machine, system and business function network element
US20170220371A1 (en) * 2014-03-28 2017-08-03 Ntt Docomo, Inc. Virtualized resource management node and virtual machine migration method
CN107171870A (en) * 2017-07-17 2017-09-15 郑州云海信息技术有限公司 A kind of two-node cluster hot backup method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9461881B2 (en) * 2011-09-30 2016-10-04 Commvault Systems, Inc. Migration of existing computing systems to cloud computing sites or virtual machines
CN204859222U (en) * 2015-06-02 2015-12-09 郑州银行股份有限公司 With two high available systems that live of city data center

Also Published As

Publication number Publication date
CN111209145A (en) 2020-05-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19886518

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29/09/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19886518

Country of ref document: EP

Kind code of ref document: A1