WO2017092539A1 - Virtual machine repairing method, virtual machine device, system, and service functional network element - Google Patents

Publication number
WO2017092539A1
WO2017092539A1 PCT/CN2016/104293 CN2016104293W WO2017092539A1 WO 2017092539 A1 WO2017092539 A1 WO 2017092539A1 CN 2016104293 W CN2016104293 W CN 2016104293W WO 2017092539 A1 WO2017092539 A1 WO 2017092539A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual machine
machine device
alarm
virtual
standby
Application number
PCT/CN2016/104293
Other languages
French (fr)
Chinese (zh)
Inventor
张川
虞振峰
Original Assignee
中兴通讯股份有限公司
Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis

Definitions

  • the present disclosure relates to the field of communications, for example, to a virtual machine repair method, a virtual machine device, a system, and a service function network element.
  • A virtualized network function (VNF) is usually created in a dual-machine manner, that is, with a primary virtual machine and a standby virtual machine.
  • The dual-machine VNF instance is presented as a single function node.
  • The management node monitors the dual-machine VNF instance and raises alarms on its faults.
  • The management node generates a fatal alarm only when both the primary and the standby virtual machines in the dual-machine VNF instance are abnormal; the faulty virtual machines are then repaired by manually restarting or regenerating the service instance.
  • When the dual-machine VNF instance has an internal dual-machine fault or a service function fault, the virtual nodes that constitute the instance are still in the working state, so the management node can neither discover nor repair the fault.
  • An internal dual-machine fault of the VNF instance refers to a failure of either the primary virtual machine or the standby virtual machine: the standby virtual machine fails while the primary virtual machine works normally, or the primary virtual machine fails while the standby virtual machine works normally.
  • By the time the management node finds that the dual-machine VNF instance is faulty, both the primary and the standby virtual machines have failed, and the service of the VNF instance has already been interrupted.
  • the mechanism for creating a VNF instance using the dual-machine mode for disaster recovery backup is not sound enough and the reliability is insufficient.
  • the fault of the virtual machine layer that constitutes the VNF instance is usually reported to the upper-level management node in an alarm manner, and the upper-level management node distinguishes the reported virtual machine alarms.
  • the upper management node needs to rebuild or restart the virtual machine to fix the fault.
  • the instruction to rebuild or restart is issued through the decision of the superior management node.
  • This requires a special software interface specification between the upper-level management node and the service function node (that is, the service function network element, or dual-machine VNF instance).
  • This special software interface specification limits the openness of the system: a virtual machine that does not conform to the agreed interface specification cannot access the system. Moreover, in this top-down management mode, the upper-level management node issues repair instructions through the specific software interface only after making a decision, which is not timely enough and reduces the efficiency of fault repair.
  • the present disclosure provides a virtual machine repairing method, a virtual machine device, a system, and a service function network element, which can solve the problem that the management node cannot discover the internal fault of the dual virtual machine in time and the repair efficiency is low after the fault is found.
  • the embodiment of the present disclosure provides a virtual machine repairing method, including:
  • the first virtual machine device in the dual virtual machine detects that the second virtual machine device generates a target alarm that requires virtual machine repair;
  • the first virtual machine device initiates repairing the second virtual machine device.
  • the first virtual machine device detecting that the second virtual machine device generates the target alarm includes:
  • the first virtual machine device receives a failure notification sent by the second virtual machine device and determines, according to the failure notification, that the second virtual machine device generates the target alarm.
  • before the first virtual machine device detects that the second virtual machine device generates the target alarm, the method further includes:
  • the first virtual machine device is switched from the standby virtual machine to the primary virtual machine;
  • the second virtual machine device is switched from the primary virtual machine to the standby virtual machine.
  • the first virtual machine device detecting that the second virtual machine device generates the target alarm includes:
  • after being switched to the primary virtual machine, the first virtual machine device detects whether the second virtual machine device is normal; when it detects that the second virtual machine device is abnormal, it determines that the second virtual machine device generates a target alarm that requires virtual machine repair.
  • the method further includes:
  • the second virtual machine device has a target fault, where the target fault includes at least one of an abnormal interruption fault and a fatal service function abnormality fault.
  • the first virtual machine device detecting whether the second virtual machine device is normal includes: detecting whether the second virtual machine device is in place or whether its state is abnormal.
  • the first virtual machine device initiates repairing the second virtual machine device, including:
  • the first virtual machine device initiates a restart process of restarting the second virtual machine device or initiates a process of reestablishing the second virtual machine device.
  • the method further includes: when the first virtual machine device determines that there is currently an unprocessed virtual machine rebuilding process, it delays for a preset duration and then initiates the process of reestablishing the second virtual machine device, or it re-detects the target alarm.
  • the process of the first virtual machine device initiating reconstruction of the second virtual machine device includes:
  • the first virtual machine device sends a delete instruction for deleting the second virtual machine device to the virtual machine management node;
  • after the second virtual machine device is deleted, the first virtual machine device selects, according to a preset re-establishment policy, either the original resource settings of the second virtual machine device or adjusted resource settings, and sends a virtual machine creation instruction with the selected settings to the virtual machine management node.
  • the method further includes:
  • after the creation of the new second virtual machine device is completed, the first virtual machine device initiates an active/standby switchover instruction, so that the new second virtual machine device switches to the primary virtual machine and the first virtual machine device switches itself to the standby virtual machine;
  • after the first virtual machine device is deleted, the new second virtual machine device sends a virtual machine creation instruction to the virtual machine management node with its own resource settings, to create a new first virtual machine device that has the same resource configuration as the new second virtual machine device.
  • the embodiment of the present disclosure further provides a first virtual machine device, including an alarm detection module and a virtual machine repair module;
  • the alarm detection module is configured to detect whether the second virtual machine device generates a target alarm that requires virtual machine repair;
  • the virtual machine repair module is configured to initiate repair of the second virtual machine device when the alarm detection module detects that the second virtual machine device generates a target alarm that requires virtual machine repair.
  • the alarm detection module includes a first alarm detection sub-module, configured to receive a failure notification sent by the second virtual machine device when an abnormal interruption occurs, and to determine accordingly that the second virtual machine device generates a target alarm.
  • the first virtual machine device further includes an active/standby switching module, configured to initiate an active/standby switchover when the second virtual machine device generates a fault, and to switch the first virtual machine device to the primary virtual machine.
  • the alarm detection module includes a second alarm detection sub-module, configured to determine, after the first virtual machine device is switched to the primary virtual machine and upon detecting that the second virtual machine device is abnormal, that the second virtual machine device generates a target alarm.
  • the virtual machine repair module includes a restart submodule or a rebuild submodule
  • the restarting submodule is configured to initiate a restart process of restarting the second virtual machine device when the alarm detection module detects that the result is yes;
  • the reestablishing submodule is configured to initiate a process of reestablishing the second virtual machine device when the alarm detection module detects that the result is yes.
  • the rebuild submodule includes a rebuild initiation unit and a reconstruction unit;
  • the reestablishment initiating unit is configured to send, to the virtual machine management node, a delete instruction for deleting the second virtual machine device;
  • the reconstruction unit is configured to: after the second virtual machine device is deleted, select, according to a preset re-establishment policy, either the original resource settings of the second virtual machine device or adjusted resource settings, and send a virtual machine creation instruction with the selected settings to the virtual machine management node.
  • An embodiment of the present disclosure further provides a virtual machine system, including a second virtual machine device and a first virtual machine device as described above;
  • the first virtual machine device is configured to initiate repairing the second virtual machine device when detecting that the second virtual machine device generates a target alarm that requires virtual machine repair.
  • the embodiment of the present disclosure further provides a service function network element, including the virtual machine system as described above.
  • Embodiments of the present disclosure also provide a non-transitory computer readable storage medium storing computer executable instructions for performing the virtual machine repair method described above.
  • Embodiments of the present disclosure also provide an electronic device including one or more processors, a memory, and one or more programs, the one or more programs being stored in a memory when being processed by one or more processors When executed, the above virtual machine repair method is executed.
  • In the virtual machine repairing method, the virtual machine device, the system, and the service function network element provided by the embodiments of the present disclosure, the first virtual machine device in the dual virtual machine detects whether the second virtual machine device generates a target alarm that requires virtual machine repair.
  • Internal faults in the dual virtual machine can thus be discovered in time through self-detection inside the dual machine. When the first virtual machine device detects that the second virtual machine device generates the target alarm, it can initiate the repair of the second virtual machine device directly, without the alarm being reported step by step to the upper management node and the management node analyzing it before issuing a repair instruction. This removes the dependence of the virtual machine system on the upper management node, improves fault recovery efficiency, and makes the method more flexible and effective.
  • FIG. 1 is a schematic flowchart of a virtual machine repairing method according to Embodiment 1 of the present disclosure
  • FIG. 2 is a schematic flowchart of initiating virtual machine reconstruction according to Embodiment 1 of the present disclosure
  • FIG. 3 is a schematic flowchart of performing reconstruction with adjusted resource settings according to Embodiment 1 of the present disclosure
  • FIG. 4 is a schematic structural diagram of a virtual machine system according to Embodiment 2 of the present disclosure.
  • FIG. 5 is a schematic structural diagram of a first virtual machine device according to Embodiment 2 of the present disclosure.
  • FIG. 6 is another schematic structural diagram of a first virtual machine device according to Embodiment 2 of the present disclosure.
  • FIG. 7 is a schematic flowchart of performing reconstruction when a service function of the primary machine is abnormal according to Embodiment 3 of the present disclosure
  • FIG. 8 is a schematic flowchart of performing reconstruction when the primary machine is abnormally interrupted according to Embodiment 3 of the present disclosure
  • FIG. 9 is a schematic flowchart of performing reconstruction when the standby machine is abnormally interrupted according to Embodiment 3 of the present disclosure.
  • FIG. 10 is a schematic structural diagram of hardware of an electronic device according to Embodiment 5 of the present disclosure.
  • The embodiments of the present disclosure can discover an internal fault in the dual virtual machine in time through self-detection within the dual virtual machine. When the first virtual machine device detects that the second virtual machine device generates the target alarm, it can directly initiate the repair of the second virtual machine device, which removes the dependence on the upper management node and improves the reliability and fault repair efficiency of the system.
  • The virtual machine repairing method in this embodiment includes steps 110 and 120.
  • In step 110, the first virtual machine device in the dual virtual machine detects that the second virtual machine device generates a target alarm that requires virtual machine repair;
  • In step 120, the first virtual machine device initiates repair of the second virtual machine device.
  • The active/standby relationship between the first virtual machine device and the second virtual machine device can change dynamically; when the first virtual machine device acts as the standby virtual machine, the second virtual machine device acts as the primary virtual machine.
  • the second virtual machine device can also perform all functions of the first virtual machine device, including but not limited to alarm detection, virtual machine repair, and the like.
  • the first virtual machine device has all the functions of the second virtual machine device, including alarm reporting and function detection by using a third-party monitoring module.
  • The first virtual machine device detecting that the second virtual machine device generates the target alarm includes at least the following cases.
  • Case 1: The first virtual machine device in step 110 is the primary virtual machine and is working normally, while the second virtual machine device, serving as the standby virtual machine, is in the standby state and monitors itself by self-detection. If an abnormal interruption fault occurs, the second virtual machine device may send a failure notification to the first virtual machine device; the first virtual machine device receives the failure notification and thereby determines and records that the second virtual machine device generates a target alarm.
  • The second virtual machine device may detect the abnormal interruption fault through its own fault detection unit and send a failure notification to the first virtual machine device; the failure notification may be a Simple Network Management Protocol (SNMP) Trap message. After receiving the SNMP Trap message, the first virtual machine device, which is in the normal working state, can record the abnormality corresponding to the message as a fatal fault. It should be understood that the abnormal interruption fault of the second virtual machine device in this embodiment may also be monitored by a third-party monitoring module, or jointly by the third-party monitoring module and the device's own fault detection unit.
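  • Case 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: `FailureNotification`, `PrimaryAlarmLog`, and the device names are hypothetical, and a plain in-memory object stands in for the SNMP Trap message so that no real SNMP library call is assumed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Severity(Enum):
    FATAL = "fatal"


@dataclass(frozen=True)
class FailureNotification:
    """Hypothetical stand-in for the failure notification (an SNMP Trap in the text)."""
    sender: str
    fault: str


class PrimaryAlarmLog:
    """Hypothetical alarm record kept by the primary (first) virtual machine device."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, str, Severity]] = []

    def on_failure_notification(self, note: FailureNotification) -> bool:
        # The abnormality carried by the message is recorded as a fatal
        # fault, which is what makes it a target alarm for the sender.
        self.records.append((note.sender, note.fault, Severity.FATAL))
        return True

    def has_target_alarm(self, device: str) -> bool:
        # A device has a target alarm if any fatal record names it.
        return any(sender == device and severity is Severity.FATAL
                   for sender, _, severity in self.records)
```

  • In this sketch the primary device would call `on_failure_notification` whenever a notification arrives and consult `has_target_alarm` before deciding whether to initiate a repair.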
  • Case 2: The first virtual machine device in step 110 is originally the standby virtual machine, and the second virtual machine device is originally the primary virtual machine.
  • When a fault of the second virtual machine device triggers an active/standby switchover, the first virtual machine device switches to the primary virtual machine and the second virtual machine device switches to the standby virtual machine according to the active/standby switchover instruction. After switching to the primary virtual machine, the first virtual machine device detects whether the second virtual machine device (now the standby virtual machine) is normal; if it detects that the second virtual machine device is abnormal, it determines that the second virtual machine device generates a target alarm.
  • Both the first virtual machine device and the second virtual machine device perform self-detection in real time. When either of them detects a fault that requires an active/standby switchover, it sends a failure notification to the other.
  • The faults of the second virtual machine device that require an active/standby switchover include at least one of an abnormal interruption fault and a fatal service function abnormality fault; the two are illustrated separately below.
  • The second virtual machine device is originally the primary virtual machine.
  • When the second virtual machine device is abnormally interrupted, it initiates an active/standby switchover command and switches to the standby virtual machine, and the first virtual machine device switches from the original standby virtual machine to the primary virtual machine.
  • The second virtual machine device (now the standby virtual machine) sends a failure notification to the first virtual machine device (now the primary virtual machine). Because the first virtual machine device is not yet fully started after the switchover, it does not receive the failure notification; once started, it actively checks whether the second virtual machine device (now the standby virtual machine) is normal.
  • Checking whether the second virtual machine device is normal may include detecting whether the second virtual machine device is in place and/or whether its state is abnormal. For example, when the first virtual machine device detects that the second virtual machine device is not in place or that its state is abnormal, it determines that the second virtual machine device is abnormal and that the second virtual machine device generates a target alarm.
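  • The Case 2 sequence (switchover first, then an active check of the new standby) might look like the sketch below. The `Device` fields `in_place` and `state_ok` are assumed names for the "in place" and "state" checks mentioned above; how those values are actually obtained is outside this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    STANDBY = "standby"


@dataclass
class Device:
    name: str
    role: Role
    in_place: bool = True   # assumed flag: the device is present and reachable
    state_ok: bool = True   # assumed flag: the device reports a normal state


def switchover(new_primary: Device, new_standby: Device) -> None:
    """Swap roles as the active/standby switchover instruction requires."""
    new_primary.role = Role.PRIMARY
    new_standby.role = Role.STANDBY


def peer_is_abnormal(peer: Device) -> bool:
    """The peer is abnormal if it is not in place or its state is abnormal."""
    return (not peer.in_place) or (not peer.state_ok)


def detect_after_switchover(first: Device, second: Device) -> bool:
    """Return True when the first device should determine a target alarm."""
    switchover(new_primary=first, new_standby=second)
    # The failure notification may have been missed while the first device
    # was starting, so the new primary actively checks the new standby.
    return peer_is_abnormal(second)
```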
  • The second virtual machine device (now the standby virtual machine) can detect the abnormal interruption fault through its own fault detection unit and send the failure notification to the first virtual machine device (now the primary virtual machine).
  • The failure notification can be an SNMP Trap message. It should be understood that the abnormal interruption fault of the second virtual machine device in this embodiment may also be monitored by a third-party monitoring module, alone or in combination with the fault detection unit.
  • When the second virtual machine device (formerly the primary virtual machine) has a fatal service function abnormality, which may include an abnormal service process status (such as the loss of a critical service process), a virtual machine resource failure, a network resource failure, and the like, the second virtual machine device issues an active/standby switchover command. According to the command, the first virtual machine device switches to the primary virtual machine and the second virtual machine device switches to the standby virtual machine. After the first virtual machine device (now the primary virtual machine) is started, it checks whether the second virtual machine device (now the standby virtual machine) is normal. This check may include detecting whether the second virtual machine device is in place and/or whether its state is abnormal; for example, when an abnormality is detected, the first virtual machine device determines that the second virtual machine device is abnormal and records that the second virtual machine device generates a target alarm.
  • the second virtual machine device can also detect whether a fatal service function abnormality occurs through its own service function polling detection unit, and can also be monitored or combined with a third-party monitoring module.
  • The second virtual machine device may also perform self-detection and, on that basis, send a failure notification to the first virtual machine device. However, if the second virtual machine device goes down after the fault, it can neither work nor perform self-detection. In this case, optionally, the first virtual machine device can access the second virtual machine device and check its status to determine whether the second virtual machine device is abnormal.
  • the first virtual machine device initiating the second virtual machine device repair includes: the first virtual machine device initiates a restart process of restarting the second virtual machine device or initiates a process of reestablishing the second virtual machine device.
  • When the first virtual machine device initiates reconstruction of the second virtual machine device, referring to FIG. 2, the process includes step 210 and step 220.
  • In step 210, the first virtual machine device sends a delete instruction for deleting the second virtual machine device to the virtual machine management node;
  • In step 220, after the second virtual machine device is deleted, the first virtual machine device selects, according to the preset re-establishment policy, either the original resource settings of the second virtual machine device or adjusted resource settings, and sends a virtual machine creation instruction with the selected settings to the virtual machine management node to complete the reconstruction of the second virtual machine device.
  • the preset re-establishment policy in this embodiment can be determined according to the service function network element where the virtual machine is located, so it is more flexible and more scalable. For example, the decision can be made according to the type of the service function network element itself.
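  • Steps 210 and 220 can be illustrated with a small sketch in which the re-establishment policy is a function that maps the original resource settings to the settings to use. Everything named here (`ResourceSetting`, `RecordingManagementNode`, `keep_original`, `double_memory`) is hypothetical; the management node is a recording stub, and the sketch does not model waiting for the deletion to complete.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class ResourceSetting:
    vcpus: int
    memory_mb: int


class RecordingManagementNode:
    """Hypothetical stand-in that only records the instructions it receives."""

    def __init__(self) -> None:
        self.instructions: List[Tuple[str, str, Optional[ResourceSetting]]] = []

    def delete_vm(self, name: str) -> None:
        self.instructions.append(("delete", name, None))

    def create_vm(self, name: str, setting: ResourceSetting) -> None:
        self.instructions.append(("create", name, setting))


# A re-establishment policy maps the original setting to the one to use.
Policy = Callable[[ResourceSetting], ResourceSetting]


def keep_original(setting: ResourceSetting) -> ResourceSetting:
    return setting


def double_memory(setting: ResourceSetting) -> ResourceSetting:
    # One possible "adjusted" setting; the adjustment rule is an assumption.
    return replace(setting, memory_mb=setting.memory_mb * 2)


def rebuild_peer(node: RecordingManagementNode, peer_name: str,
                 original: ResourceSetting, policy: Policy) -> ResourceSetting:
    node.delete_vm(peer_name)            # step 210: delete instruction
    chosen = policy(original)            # original or adjusted settings
    node.create_vm(peer_name, chosen)    # step 220: creation instruction
    return chosen
```

  • Passing the policy in as a parameter mirrors the point that the policy can be decided by the service function network element rather than fixed in the repair logic.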
  • In step 220, after the first virtual machine device sends the virtual machine creation instruction to the virtual machine management node according to the current service requirements and the like, the process continues with steps 310 to 330, as shown in FIG. 3.
  • In step 310, after the second virtual machine device is re-created, the first virtual machine device (formerly the primary virtual machine) initiates an active/standby switchover between itself and the re-created second virtual machine device: the re-created second virtual machine device (formerly the standby virtual machine) is switched to the primary virtual machine, and the first virtual machine device is switched to the standby virtual machine;
  • In step 320, the re-created second virtual machine device (now the primary virtual machine) sends a delete instruction for deleting the first virtual machine device (now the standby virtual machine) to the virtual machine management node;
  • In step 330, after the first virtual machine device is deleted, the re-created second virtual machine device (now the primary virtual machine) sends a virtual machine creation instruction to the virtual machine management node with the same resource settings as its own, so that a first virtual machine device 1 with the same resources as the second virtual machine device 2 is re-created. At this point, the reconstruction process is completed.
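  • The overall order of operations when the resource settings are adjusted (steps 210 to 220 followed by steps 310 to 330) can be summarized in a sketch that only produces the ordered list of instructions. The names `vm1` and `vm2` and the dictionary-based settings are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Setting = Dict[str, int]


def rolling_rebuild(settings: Dict[str, Setting],
                    adjusted: Setting) -> List[Tuple[str, str]]:
    """Return the ordered instructions for steps 210-220 and 310-330.

    `settings` maps the hypothetical names "vm1" (primary) and "vm2"
    (faulty standby) to their resource settings and is updated in place.
    """
    log: List[Tuple[str, str]] = []
    # Steps 210-220: delete the faulty vm2, re-create it with adjusted resources.
    log.append(("delete", "vm2"))
    settings["vm2"] = dict(adjusted)
    log.append(("create", "vm2"))
    # Step 310: vm1 initiates the switchover; vm2 becomes the primary.
    log.append(("switchover_to_primary", "vm2"))
    # Step 320: the new primary vm2 requests deletion of vm1.
    log.append(("delete", "vm1"))
    # Step 330: vm2 requests a new vm1 with the same settings as its own.
    settings["vm1"] = dict(settings["vm2"])
    log.append(("create", "vm1"))
    return log
```

  • After the sequence, both devices carry the adjusted settings, and at every point one of the two devices holds the primary role.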
  • the reconstruction function of the virtual machine can also be controlled by a switch.
  • The method may further include: the first virtual machine device determines whether there is currently an unprocessed virtual machine reconstruction process; if there is, the creation instruction for the second virtual machine device 2 may be initiated after a preset delay, or the target alarm may be re-detected.
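  • The switch control and the handling of an unprocessed reconstruction can be expressed as a small decision function. This is only a sketch: the `Action` names, the `prefer_redetect` flag, and the use of a pending-count are assumptions, and the actual waiting for the preset duration is left to the caller.

```python
from enum import Enum


class Action(Enum):
    REBUILD_NOW = "rebuild_now"
    DELAY_THEN_REBUILD = "delay_then_rebuild"
    REDETECT_ALARM = "redetect_alarm"
    DO_NOTHING = "do_nothing"


def decide(rebuild_enabled: bool, pending_rebuilds: int,
           prefer_redetect: bool = False) -> Action:
    """Choose what the first virtual machine device does on a target alarm."""
    if not rebuild_enabled:
        # The reconstruction function is switched off.
        return Action.DO_NOTHING
    if pending_rebuilds > 0:
        # An unprocessed reconstruction exists: either wait for the preset
        # duration before rebuilding, or go back and re-detect the alarm.
        if prefer_redetect:
            return Action.REDETECT_ALARM
        return Action.DELAY_THEN_REBUILD
    return Action.REBUILD_NOW
```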
  • The internal fault of the virtual machine system in this embodiment can be discovered by the virtual machines' self-detection, and the virtual machine reconstruction can be initiated by the virtual machine itself. This removes the reliance on the management node and complements the disaster recovery technology of the management node.
  • the reconstruction of the virtual machine can also be determined by the service function network element of the virtual machine, which is more convenient and flexible, and has better scalability.
  • Embodiment 2:
  • The virtual machine system includes a dual virtual machine, which includes a first virtual machine device 1 and a second virtual machine device 2. The first virtual machine device 1 is configured to initiate repair of the second virtual machine device 2 when detecting that the second virtual machine device 2 generates the target alarm; the target alarm may be an alarm that requires repair of the second virtual machine device 2.
  • A fault detection module, an alarm detection module, and a virtual machine repair module may be disposed on each virtual machine device of the dual virtual machine in this embodiment. The fault detection module is configured to implement fault detection by itself or through a third-party monitoring module; the alarm detection module is configured to find the target alarm and trigger the virtual machine repair module to perform virtual machine repair.
  • The specific structure of the first virtual machine device 1 is described below in combination with the cases in which the target alarm is generated.
  • the first virtual machine device 1 may include an alarm detecting module 11 and a virtual machine repairing module 12;
  • the alarm detecting module 11 is configured to detect whether the second virtual machine device generates a target alarm that needs to be repaired by the virtual machine;
  • the virtual machine repair module 12 is configured to initiate repair of the second virtual machine device when the alarm detection module 11 detects that the result is yes.
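  • The relationship between the alarm detecting module 11 and the virtual machine repair module 12 can be sketched as two small classes composed by the device. The class and method names, the callable used as the alarm source, and the string returned by the repair module are all illustrative assumptions rather than the interfaces of the disclosure.

```python
from typing import Callable, Optional


class AlarmDetectionModule:
    """Sketch of alarm detecting module 11 (interface names are assumed)."""

    def __init__(self, peer_has_target_alarm: Callable[[], bool]) -> None:
        self._peer_has_target_alarm = peer_has_target_alarm

    def detect(self) -> bool:
        # True means the peer generated a target alarm requiring repair.
        return self._peer_has_target_alarm()


class VirtualMachineRepairModule:
    """Sketch of repair module 12 with a restart or a rebuild sub-module."""

    def __init__(self, mode: str = "rebuild") -> None:
        if mode not in ("restart", "rebuild"):
            raise ValueError("mode must be 'restart' or 'rebuild'")
        self.mode = mode

    def initiate_repair(self, peer_name: str) -> str:
        # A real module would start the restart or rebuild process here.
        return f"{self.mode}:{peer_name}"


class FirstVirtualMachineDevice:
    def __init__(self, detector: AlarmDetectionModule,
                 repairer: VirtualMachineRepairModule, peer_name: str) -> None:
        self.detector = detector
        self.repairer = repairer
        self.peer_name = peer_name

    def run_once(self) -> Optional[str]:
        """Initiate repair only when the detection result is yes."""
        if self.detector.detect():
            return self.repairer.initiate_repair(self.peer_name)
        return None
```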
  • The alarm detecting module 11 of the first virtual machine device 1 may further include a first alarm detecting sub-module 111, configured to: when the first virtual machine device 1 is the primary virtual machine in the normal working state and the second virtual machine device 2 is in the standby state, receive the failure notification sent by the second virtual machine device 2 and determine, according to the failure notification, that the second virtual machine device 2 has generated the target alarm.
  • The second virtual machine device 2 can detect an abnormal interruption fault through its own fault detection unit: the fault detection module of the second virtual machine device 2 can include a fault detection unit which, upon detecting that an abnormal interruption fault has occurred on the second virtual machine device 2, sends a failure notification to the first virtual machine device 1. The failure notification may be an SNMP Trap message.
  • the alarm module of the first virtual machine device 1 in the normal working state may record the abnormality indicated by the message as a fatal failure. It should be understood that the abnormal interruption fault of the second virtual machine device 2 in this embodiment may also be monitored by using a third-party monitoring module, and the fault detection module is notified after the fault is detected.
  • the alarm detection module 11 of the first virtual machine device 1 may further include a second alarm detection sub-module 112.
  • the first virtual machine device 1 is a standby virtual machine, and may have an active/standby switching module.
  • the second virtual machine device 2 is originally a primary virtual machine, and may also have an active/standby switching module.
  • the second virtual machine device 2 discovers a fault through self-detection and initiates the active/standby switchover when the fault occurs.
  • the second virtual machine device 2 switches from the original primary virtual machine to the standby virtual machine.
  • the first virtual machine device 1 switches from the original standby virtual machine to the primary virtual machine.
  • the second alarm detecting sub-module 112 of the first virtual machine device 1 actively detects whether the second virtual machine device 2 (now the standby virtual machine) is normal.
  • the first virtual machine device 1 (now the primary virtual machine) determines that the second virtual machine device 2 (now the standby virtual machine) generates a target alarm.
  • the fatal fault generated by the first virtual machine device 1 includes at least one of an abnormal interruption fault and a fatal business function abnormality; the following is an example of abnormal interruption fault and fatal business function abnormality.
  • the second virtual machine device 2 is originally the primary virtual machine.
  • an active/standby switchover command is generated to switch the second virtual machine device 2 from the original primary virtual machine to the standby device.
  • the first virtual machine device 1 is switched from the original standby virtual machine to the primary virtual machine.
  • the second virtual machine device 2 sends a failure notification to the first virtual machine device 1 (now the primary virtual machine); since the first virtual machine device 1 (now the primary virtual machine) is not fully activated at this time, the second alarm detection sub-module 112 of the first virtual machine device 1 does not receive the failure notification.
  • the second alarm detection sub-module 112 can test the second virtual machine device 2 (now the standby virtual machine) to determine whether the second virtual machine device 2 is normal.
  • the detection of whether the second virtual machine device 2 is normal may include detecting whether the second virtual machine device 2 is present and/or whether its state is abnormal. For example, when the second virtual machine device 2 is detected to be absent or in an abnormal state, it is determined that the second virtual machine device 2 is abnormal, and a target alarm is recorded for the second virtual machine device 2.
  • the second virtual machine device 2 (now the standby virtual machine) can also detect an abnormal interruption fault through the fault detection unit of its own fault detection module and send a failure notification, which can be an SNMP Trap message, to the first virtual machine device 1 (now the primary virtual machine). It should be understood that the abnormal interruption fault of the second virtual machine device 2 in this embodiment may also be monitored by, or in combination with, a third-party monitoring module, which then sends the monitoring result to the fault detection module of the second virtual machine device 2.
  • the second virtual machine device 2 is originally a primary virtual machine, and the first virtual machine device 1 is originally a standby virtual machine.
  • the second virtual machine device 2 has a fatal service function abnormality (including but not limited to a business process state abnormality (such as a business critical process)
  • the active/standby switchover is initiated, the first virtual machine device 1 switches from the original standby virtual machine to the primary virtual machine according to the active/standby switchover command.
  • the second virtual machine device switches from the original primary virtual machine to the standby virtual machine.
  • the second alarm detecting sub-module 112 of the first virtual machine device 1 actively detects the second virtual machine device 2 (now the standby virtual machine).
  • Determining whether the second virtual machine device 2 is normal, and detecting whether the second virtual machine device 2 is normal includes, but is not limited to, detecting whether the second virtual machine device 2 is in position and/or state is abnormal, for example, when the presence is not detected.
  • the second virtual machine device 2 can also detect whether a fatal service function abnormality has occurred through the service function polling detection unit of its own fault detection module; the abnormality can also be monitored by, or in combination with, a third-party monitoring module, which sends its monitoring result to the fault detection module of the second virtual machine device 2.
  • when the second virtual machine device fails, for example because it goes down, the failure can be detected by the first virtual machine device, and the active/standby switching module of the first virtual machine device initiates the active/standby switchover: the first virtual machine device is switched from the original standby virtual machine to the primary virtual machine, and the second virtual machine device is switched from the original primary virtual machine to the standby virtual machine.
  • the fault detection module in the first virtual machine device 1 and the second virtual machine device 2 in this embodiment may include a fault detection unit of the virtual machine itself and a service function polling detection unit, and may also accept the monitoring result sent by a third-party monitoring module, in order to determine whether a target alarm is generated.
  • the fault detection module in this embodiment may also be configured to detect other types of alarms, and send the alarms to the alarm module, and the alarm module may perform different levels of screening and processing on the received alarms; for example;
  • the virtual machine repair module 12 is triggered to perform virtual machine repair.
  • the virtual machine repair module 12 of the first virtual machine device 1 includes a restart submodule 121 or a rebuild submodule 122;
  • the restarting sub-module 121 is configured to initiate a restart process of restarting the second virtual machine device 2 when the detection result of the alarm detecting module 11 is YES;
  • the reconstruction sub-module 122 is configured to initiate a process of reconstructing the second virtual machine device 2 when the detection result of the alarm detection module 11 is YES.
  • the restart of the second virtual machine device 2 may be carried out by the virtual machine management node, by sending a corresponding restart command to the virtual machine management node.
  • the restart of the second virtual machine device 2 can also be completed within the dual virtual machine by a corresponding restart command without passing through the virtual machine management node.
  • the reconstruction sub-module 122 of the first virtual machine device 1 includes a reconstruction initiation unit 1221 and a reconstruction unit 1222;
  • the reconstruction initiation unit 1221 is configured to send, to the virtual machine management node, a delete instruction for deleting the second virtual machine device 2;
  • the reconstruction unit 1222 is configured to, after the second virtual machine device 2 is deleted, select according to the preset reconstruction policy either the original resource setting of the second virtual machine device 2 or an adjusted resource setting, and send a virtual machine creation instruction with that setting to the virtual machine management node.
  • the preset re-establishment policy in this embodiment can be determined according to the service function network element where the virtual machine is located, so it is more flexible and more scalable. For example, the decision can be made according to the type of the service function network element itself.
  • when the virtual machine creation instruction is sent to the virtual machine management node with the adjusted second virtual machine device resource setting (adjusted according to, for example, current service requirements), the reconstruction unit 1222 of the virtual machine repair module 12 further initiates an active/standby switchover after the second virtual machine device has been re-created (the switchover is needed because the resources of the first virtual machine device and the new second virtual machine device differ): the newly created second virtual machine device 2 is switched from the standby virtual machine to the primary virtual machine, and the first virtual machine device 1 is switched from the original primary virtual machine to the standby virtual machine;
  • the virtual machine repair module of the re-created second virtual machine device 2 sends a delete instruction for deleting the first virtual machine device 1 to the virtual machine management node;
  • after the first virtual machine device 1 is deleted, the virtual machine repair module of the re-created second virtual machine device 2 sends a virtual machine creation instruction to the virtual machine management node with the same resource setting as the newly created second virtual machine device 2, so that a first virtual machine device 1 having the same resource configuration as the second virtual machine device 2 is re-created. At this point, the reconstruction process is complete.
  • the virtual machine repair module 12 of the virtual machine can also be controlled by a switch.
  • the virtual machine repair module 12 of the first virtual machine device in the embodiment may further determine whether there is an unprocessed virtual machine rebuild process. If it is determined that there is currently an unprocessed virtual machine re-establishment process, the creation instruction of the second virtual machine device 2 may be initiated after the preset duration is delayed, or the target alarm may be re-detected.
  • the active/standby switching relationship between the first virtual machine device 1 and the second virtual machine device 2 is dynamically changeable, and the first virtual machine device 1 serves as a standby virtual machine, and the second virtual machine device 2 As the primary virtual machine, the second virtual machine device 2 can also perform all the functions of the first virtual machine device 1, including but not limited to functions such as alarm detection and virtual machine repair.
  • the first virtual machine device 1 has all the functions of the second virtual machine device 2 described above, including alarm reporting and fault detection using a third-party monitoring module.
  • the internal fault of the virtual machine system can be discovered by the virtual machine's self-detection, and the regeneration of the virtual machine can also be initiated by the virtual machine itself.
  • the regeneration of the virtual machine can also be decided by the service function network element to which the virtual machine belongs.
  • the dependency on the management node is removed, which makes the scheme more convenient, flexible, reliable, and scalable.
  • the figure shows that the VNF instance deployed in the dual-machine mode is rebuilt and self-healed after the service function is abnormal, and includes steps 701-705.
  • step 701 the virtual machine A serves as the primary virtual machine, and the virtual machine B serves as the standby virtual machine.
  • the virtual machine A polls and checks its own service processes. When it finds that a key service process of the virtual machine A has been lost, an active/standby switchover is performed and the virtual machine B is switched to the primary virtual machine;
  • step 702 the virtual machine B starts and checks the state of the virtual machine A, and if the virtual machine A is not present or its state is abnormal, a fatal alarm is generated;
  • step 703 after the virtual machine B finds the fatal alarm, the virtual machine B sends an instruction to delete the virtual machine A to the management node;
  • step 704 after the virtual machine A is successfully deleted, the virtual machine B initiates the re-creation of a new virtual machine (called virtual machine C) with the resource setting of the original virtual machine A, according to the reconstruction policy that the service function node associates with the alarm type;
  • step 705 after the virtual machine C is successfully created, the virtual machine C is started, and the service data is synchronized from the virtual machine B to become a new standby virtual machine.
  • the virtual machine B can also initiate the re-creation of the new virtual machine (referred to as virtual machine C) with an adjusted version of the resource setting of the virtual machine A.
  • the virtual machine C is successfully started.
  • the virtual machine B initiates the active/standby switchover; the virtual machine C is switched to the primary virtual machine, and the virtual machine B is switched to the standby virtual machine.
  • the virtual machine C sends the instruction to delete the virtual machine B to the management node, and the virtual machine B is successfully deleted.
  • the virtual machine C initiates, with its own resource setting, the re-creation of a new virtual machine (referred to as virtual machine D); after the virtual machine D is successfully created, the virtual machine D is started and synchronizes the service data from the virtual machine C to become the new standby virtual machine.
  • the figure shows the rebuild and self-healing of the primary virtual machine of a dual-machine VNF instance after an abnormal interruption, and includes steps 801 to 805.
  • step 801 virtual machine A is used as the primary virtual machine, and virtual machine B is used as the standby virtual machine.
  • virtual machine A initiates an active/standby switchover; virtual machine B is switched to the primary virtual machine, and virtual machine A is switched to the standby virtual machine;
  • step 802 the virtual machine B starts and detects the state of the standby virtual machine A, and detects whether a fatal alarm sent by the virtual machine A is found.
  • step 803 after discovering the fatal alarm, the virtual machine B initiates an instruction to delete the virtual machine A to the management node;
  • step 804 after the virtual machine A is successfully deleted, the virtual machine B initiates the re-creation of the new virtual machine (called the virtual machine C) according to the resource setting of the original virtual machine A according to the re-establishment policy corresponding to the alarm type.
  • step 805 after the virtual machine C is successfully created, the virtual machine C is started, and the service data is synchronized from the virtual machine B to become a new standby virtual machine.
  • FIG. 9 shows the rebuild and self-healing of the standby virtual machine of a dual-machine VNF instance after an abnormal interruption, and includes steps 901 to 904.
  • step 901 the virtual machine B is used as the standby virtual machine, and the virtual machine A is used as the primary virtual machine.
  • the virtual machine B sends a failure notification to the virtual machine A.
  • step 902 after recording a fatal alarm based on the failure notification, the virtual machine A sends an instruction to delete the virtual machine B to the management node;
  • step 903 after the virtual machine B is successfully deleted, the virtual machine A initiates the re-creation of the new virtual machine (referred to as virtual machine C) according to the resource setting of the original virtual machine B according to the re-establishment policy corresponding to the alarm type.
  • step 904 after the virtual machine C is successfully created, the virtual machine C is started, and the service data is synchronized from the virtual machine A to become a new standby virtual machine.
  • the embodiment further provides a non-transitory computer readable storage medium storing computer executable instructions configured to perform any of the virtual machine repair methods described above.
  • the embodiment also provides an electronic device.
  • the electronic device is applied to the first virtual machine device, and includes:
  • One or more processors 1001, one processor 1001 is taken as an example in FIG. 10;
  • the electronic device may further include: an input device 1003 and an output device 1004.
  • the processor 1001, the memory 1002, the input device 1003, and the output device 1004 in the electronic device may be connected by a bus or other means, and the connection through the bus is taken as an example in FIG.
  • the memory 1002 is a non-transitory computer readable storage medium that can be used to store software programs, computer executable programs, and modules, such as the programs, instructions, or modules corresponding to the virtual machine repair method in the embodiment of the present disclosure (for example, the alarm detection module 11 and the virtual machine repair module 12 shown in FIG. 5).
  • the processor 1001 executes a plurality of functional applications and data processing by executing software programs, instructions or modules stored in the memory 1002, that is, implementing the virtual machine repairing method of the above method embodiments.
  • the memory 1002 can include a storage program area and a storage data area, wherein the storage program area can store an operating system and an application required by at least one function; as an example, a preset reconstruction policy can be stored in the storage program area of the electronic device.
  • the storage data area can store data and the like created according to the use of the electronic device.
  • the memory 1002 may include a high speed random access memory, and may also include a nonvolatile memory such as at least one magnetic disk storage device, flash memory device, or other nonvolatile solid state storage device.
  • the memory 1002 can optionally include a memory remotely located relative to the processor 1001 that can be connected to the electronic device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the input device 1003 can be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device.
  • the output device 1004 can include a display device such as a display screen.
  • the electronic device of the embodiment of the present disclosure may further include a communication device 1005 that transmits and/or receives information through a communication network; for example, the electronic device may receive, through the communication device 1005, a target alarm sent by the second virtual machine device, and may also send, through the communication device 1005, an instruction to delete the second virtual machine device to the virtual machine management node.
  • modules or steps of the embodiments of the present disclosure may be implemented by a general-purpose computing device, and may be centralized on a single computing device or distributed over a network of multiple computing devices. Alternatively, they may be implemented by program code executable by the computing device, so that they may be stored in a storage medium (ROM/RAM, diskette, optical disk) and executed by a computing device, and in some cases the steps shown or described may be performed in a different order than that herein, or they may be separately fabricated into a plurality of integrated circuit modules, or a plurality of the modules or steps may be implemented as a single integrated circuit module. Therefore, embodiments of the present disclosure are not limited to any particular combination of hardware and software.
  • the internal fault of the virtual machine system can be discovered by the virtual machine self-detection, and the internal fault in the dual virtual machine can be discovered in time;
  • when the first virtual machine detects that the second virtual machine device generates the target alarm, the second virtual machine can be repaired directly, which removes the dependence on the upper management node, improves reliability, and makes fault repair more efficient.

Abstract

A virtual machine repairing method, a virtual machine device, a system, and a service functional network element. A first virtual machine device of two virtual machines detects whether a second virtual machine device generates a target warning that a virtual machine repair is required, and upon detecting that the second virtual machine device generates the target warning that a virtual machine repair is required, initiates repair of the second virtual machine device.

Description

Virtual machine repair method, virtual machine device, system, and service function network element
Technical field
The present disclosure relates to the field of communications, for example, to a virtual machine repair method, a virtual machine device, a system, and a service function network element.
Background
In the field of computer/communication virtualization, for example in the telecom Network Function Virtualization (NFV) architecture, a Virtualized Network Function (VNF) instance is usually created in dual-machine mode (a primary and a standby virtual machine) to provide disaster recovery backup. On the management node, the dual-machine VNF instance is presented as a single function node. The management node monitors the dual-machine VNF instance for faults and raises alarms, but it generates a fatal alarm only when both the primary and the standby virtual machines in the instance are abnormal, and after the alarm the service instance can only be restarted or regenerated manually to repair the primary and standby virtual machines.
However, when a dual-machine VNF instance suffers an internal fault of the dual-machine system or a service function fault, the virtual machines that make up the instance are still running, so the management node cannot discover the fault and therefore cannot repair it. Here, an internal fault of the dual-machine system means that one of the primary and standby virtual machines has failed: either the standby virtual machine has failed while the primary virtual machine is working normally, or the primary virtual machine has failed while the standby virtual machine is working normally.
Therefore, by the time the management node discovers that the dual-machine VNF instance is faulty, both the primary and the standby virtual machines have already failed, and the service of the VNF instance has already been interrupted.
It can be seen that the mechanism of creating a VNF instance in dual-machine mode for disaster recovery backup is not robust enough and is insufficiently reliable. In addition, in other virtualization fields, faults at the virtual machine layer that makes up a VNF instance are usually reported as alarms to an upper-level management node, which classifies the reported virtual machine alarms. For an alarm caused by a fatal fault, the upper-level management node needs to rebuild or restart the virtual machine to repair the fault, and the rebuild or restart instruction is issued by decision of the upper-level management node. In this top-down management mode, a special software interface specification must be defined between the management node and the service function node (that is, the service function network element), for example a dual-machine VNF instance. Such a special software interface specification limits the openness of the system: a virtual machine that does not conform to the agreed interface specification cannot be connected to the system. Moreover, in this top-down management mode, the upper-level management node's decision to issue a repair instruction through the specific software interface specification is not timely enough, which reduces the efficiency of system fault repair.
Summary of the invention
The present disclosure provides a virtual machine repair method, a virtual machine device, a system, and a service function network element, which can solve the problems in the related art that the management node cannot discover internal faults of the dual virtual machines in time and that repair efficiency is low once a fault is discovered.
An embodiment of the present disclosure provides a virtual machine repair method, including:
a first virtual machine device of the dual virtual machines detects that a second virtual machine device generates a target alarm that requires virtual machine repair; and
the first virtual machine device initiates repair of the second virtual machine device.
Optionally, the first virtual machine device detecting that the second virtual machine device generates the target alarm includes:
the first virtual machine device receives a failure notification sent by the second virtual machine device when an abnormal interruption fault occurs; and
the first virtual machine device determines, according to the failure notification, that the second virtual machine device generates the target alarm.
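The notification-based detection above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `FailureNotification` type, the `FaultKind` values (including the `MINOR` case), and the function name are assumptions introduced here, and the transport (the source mentions that the notification may be an SNMP Trap message) is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class FaultKind(Enum):
    # The first two kinds mirror the fault types named in the text;
    # MINOR is a hypothetical non-fatal kind added only for contrast.
    ABNORMAL_INTERRUPTION = "abnormal_interruption"
    FATAL_SERVICE_ABNORMALITY = "fatal_service_abnormality"
    MINOR = "minor"


@dataclass(frozen=True)
class FailureNotification:
    sender: str        # name of the VM that sent the notification
    fault: FaultKind   # fault reported by that VM


# Faults that call for virtual machine repair in this sketch.
TARGET_FAULTS = {
    FaultKind.ABNORMAL_INTERRUPTION,
    FaultKind.FATAL_SERVICE_ABNORMALITY,
}


def is_target_alarm(note: FailureNotification, peer: str) -> bool:
    """Return True when the notification comes from the peer VM
    and reports a fault that requires virtual machine repair."""
    return note.sender == peer and note.fault in TARGET_FAULTS
```

In this sketch the first virtual machine would call `is_target_alarm` for each received notification and start the repair flow only when it returns `True`.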
Optionally, before the first virtual machine device detects that the second virtual machine device generates the target alarm, the method further includes:
the first virtual machine device is switched from the standby virtual machine to the primary virtual machine, and the second virtual machine device is switched from the primary virtual machine to the standby virtual machine.
Optionally, the first virtual machine device detecting that the second virtual machine device generates the target alarm includes:
after the first virtual machine device is switched from the standby virtual machine to the primary virtual machine, it detects whether the second virtual machine device is normal; when the first virtual machine device detects that the second virtual machine device is abnormal, it determines that the second virtual machine device generates a target alarm that requires virtual machine repair.
Optionally, after the first virtual machine device detects that the second virtual machine device generates a target alarm that requires virtual machine repair, the method further includes:
determining that a target fault has occurred on the second virtual machine device, wherein the target fault includes at least one of an abnormal interruption fault and a fatal service function abnormality.
Optionally, the first virtual machine device detecting whether the second virtual machine device is normal includes: detecting whether the second virtual machine device is present or whether its state is abnormal.
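The active check performed after the switchover can be illustrated with a small sketch. The `PeerStatus` type and the state strings are hypothetical; the source only says that the first virtual machine checks whether the peer is present or whether its state is abnormal, without naming concrete states or a query mechanism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PeerStatus:
    # State reported for the peer VM; the concrete values
    # ("running", "error", ...) are assumptions for illustration.
    state: str


def peer_generates_target_alarm(status: Optional[PeerStatus]) -> bool:
    """Decide whether the peer VM should be treated as abnormal.

    `status` is None when the peer cannot be found at all (not present).
    Any state other than "running" counts as an abnormal state here.
    """
    if status is None:
        return True
    return status.state != "running"
```

A missing peer and a peer in an abnormal state lead to the same outcome, which matches the "present or state abnormal" wording of the optional step.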
Optionally, the first virtual machine device initiating repair of the second virtual machine device includes:
the first virtual machine device initiates a restart flow for restarting the second virtual machine device, or initiates a flow for rebuilding the second virtual machine device.
Optionally, before the first virtual machine device initiates the flow for rebuilding the second virtual machine device, the method further includes: when the first virtual machine device determines that an unprocessed virtual machine rebuild flow currently exists, it delays for a preset duration before initiating the flow for rebuilding the second virtual machine device, or re-detects the target alarm.
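The guard against overlapping rebuild flows can be expressed as a small decision function. The function name, the returned labels, and the `prefer_recheck` flag are assumptions made for this sketch; the source only states the two alternatives (delay, or re-detect the target alarm) and does not say how one is chosen.

```python
def next_rebuild_action(pending_rebuilds: int, prefer_recheck: bool = False) -> str:
    """Choose what to do before starting a rebuild of the peer VM.

    pending_rebuilds: number of rebuild flows that are still unprocessed.
    prefer_recheck:   hypothetical switch between the two alternatives
                      the text allows when a rebuild is already pending.
    """
    if pending_rebuilds == 0:
        return "rebuild_now"
    if prefer_recheck:
        return "recheck_target_alarm"
    return "delay_then_rebuild"
```

For example, `next_rebuild_action(1)` selects the delayed rebuild, while `next_rebuild_action(1, prefer_recheck=True)` selects re-detection of the target alarm.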
Optionally, the flow in which the first virtual machine device initiates the rebuild of the second virtual machine device includes:
the first virtual machine device sends a delete instruction for deleting the second virtual machine device to the virtual machine management node;
after the second virtual machine device is deleted, the first virtual machine device selects, according to a preset rebuild policy, either the original resource setting of the second virtual machine device or an adjusted resource setting, and sends a virtual machine creation instruction with that setting to the virtual machine management node.
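The delete-then-create rebuild flow can be sketched against an in-memory stand-in for the virtual machine management node. Everything named here (`FakeManagementNode`, `ResourceSetting`, the two policy functions) is hypothetical; a real management node would expose its own interface, which the source does not specify.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ResourceSetting:
    vcpus: int
    memory_mb: int


class FakeManagementNode:
    """In-memory stand-in for the virtual machine management node."""

    def __init__(self, vms: Dict[str, ResourceSetting]) -> None:
        self.vms = dict(vms)
        self.log: List[str] = []  # ordered record of received instructions

    def delete_vm(self, name: str) -> None:
        del self.vms[name]
        self.log.append(f"delete {name}")

    def create_vm(self, name: str, setting: ResourceSetting) -> None:
        self.vms[name] = setting
        self.log.append(f"create {name}")


# A rebuild policy maps the original setting to the setting to create with.
Policy = Callable[[ResourceSetting], ResourceSetting]


def keep_original(setting: ResourceSetting) -> ResourceSetting:
    return setting


def double_memory(setting: ResourceSetting) -> ResourceSetting:
    # Example of an "adjusted" setting; the adjustment itself is arbitrary.
    return replace(setting, memory_mb=setting.memory_mb * 2)


def rebuild_peer(node: FakeManagementNode, peer: str, new_name: str,
                 policy: Policy) -> ResourceSetting:
    """Delete the failed peer, then create its replacement per the policy."""
    original = node.vms[peer]
    node.delete_vm(peer)
    setting = policy(original)
    node.create_vm(new_name, setting)
    return setting
```

Passing the policy as a function keeps the choice between the original and the adjusted resource setting outside the flow itself, which is one way to reflect the source's point that the policy can depend on the service function network element.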
Optionally, after the first virtual machine device sends the virtual machine creation instruction to the virtual machine management node with the adjusted second virtual machine device resource setting, the method further includes:
after the new second virtual machine device has been created, the first virtual machine device initiates an active/standby switchover instruction, so that the new second virtual machine device is switched to the primary virtual machine and the first virtual machine device itself is switched to the standby virtual machine;
the new second virtual machine device sends a delete instruction for deleting the first virtual machine device to the virtual machine management node;
after the first virtual machine device is deleted, the new second virtual machine device sends a virtual machine creation instruction to the virtual machine management node with the resource setting of the new second virtual machine device, to create a new first virtual machine device having the same resource configuration as the new second virtual machine device.
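The full sequence for a rebuild with adjusted resources (rebuild the failed peer, switch roles, then rebuild the other virtual machine so both match) can be traced with a small simulation. The dictionary layout, the `-new` naming, and the step labels are assumptions for illustration only; service data synchronization and the management node interface are not modeled.

```python
from typing import Dict, List, Tuple


def rebuild_with_adjusted_resources(
    pair: Dict[str, dict], failed: str, healthy: str, adjusted: dict
) -> Tuple[Dict[str, dict], List[str]]:
    """Simulate the rebuild of both VMs with an adjusted resource setting.

    Returns the resulting VM pair and the ordered list of steps taken.
    The input `pair` is not modified.
    """
    steps: List[str] = []
    vms = {name: {"role": vm["role"], "resources": dict(vm["resources"])}
           for name, vm in pair.items()}

    # 1. The healthy VM has the failed peer deleted and re-created
    #    with the adjusted resource setting.
    del vms[failed]
    steps.append(f"delete {failed}")
    new_second = f"{failed}-new"
    vms[new_second] = {"role": "standby", "resources": dict(adjusted)}
    steps.append(f"create {new_second}")

    # 2. Active/standby switchover: the new VM becomes primary.
    vms[new_second]["role"] = "primary"
    vms[healthy]["role"] = "standby"
    steps.append("switchover")

    # 3. The new primary has the old VM deleted and re-created
    #    with the same resource setting as itself.
    del vms[healthy]
    steps.append(f"delete {healthy}")
    new_first = f"{healthy}-new"
    vms[new_first] = {"role": "standby",
                      "resources": dict(vms[new_second]["resources"])}
    steps.append(f"create {new_first}")
    return vms, steps
```

After the run, both virtual machines carry the adjusted resource setting, and the re-created second virtual machine holds the primary role.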
An embodiment of the present disclosure further provides a first virtual machine device, including an alarm detection module and a virtual machine repair module;
the alarm detection module is configured to detect whether a second virtual machine device has generated a target alarm that requires virtual machine repair; and
the virtual machine repair module is configured to initiate repair of the second virtual machine device when the alarm detection module detects that the second virtual machine device has generated a target alarm that requires virtual machine repair.
Optionally, the alarm detection module includes a first alarm detection submodule, configured to receive a fault notification sent by the second virtual machine device upon an abnormal interruption fault, and to determine accordingly that the second virtual machine device has generated a target alarm.
Optionally, the first virtual machine device further includes an active/standby switchover module, configured to switch the first virtual machine device to the primary virtual machine when the second virtual machine device initiates an active/standby switchover upon a fault.
Optionally, the alarm detection module includes a second alarm detection submodule, configured to determine, after the first virtual machine device has been switched to the primary virtual machine, that the second virtual machine device has generated a target alarm when the second virtual machine device is detected to be abnormal.
Optionally, the virtual machine repair module includes a restart submodule or a rebuild submodule;
the restart submodule is configured to initiate a restart process for the second virtual machine device when the detection result of the alarm detection module is positive; and
the rebuild submodule is configured to initiate a rebuild process for the second virtual machine device when the detection result of the alarm detection module is positive.
Optionally, when the virtual machine repair module includes the rebuild submodule, the rebuild submodule includes a rebuild initiation unit and a rebuild unit;
the rebuild initiation unit is configured to send, to a virtual machine management node, a delete instruction for deleting the second virtual machine device; and
the rebuild unit is configured to send, after the second virtual machine device is deleted, a virtual machine creation instruction to the virtual machine management node, choosing according to a preset rebuild policy between the original resource settings of the second virtual machine device and adjusted resource settings for the second virtual machine device.
An embodiment of the present disclosure further provides a virtual machine system, including a second virtual machine device and the first virtual machine device described above;
the first virtual machine device is configured to initiate repair of the second virtual machine device upon detecting that the second virtual machine device has generated a target alarm that requires virtual machine repair.
An embodiment of the present disclosure further provides a service function network element, including the virtual machine system described above.
An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the virtual machine repairing method described above.
An embodiment of the present disclosure further provides an electronic device including one or more processors, a memory, and one or more programs; the one or more programs are stored in the memory and, when executed by the one or more processors, perform the virtual machine repairing method described above.
In the virtual machine repairing method, virtual machine device, system, and service function network element provided by the embodiments of the present disclosure, the first virtual machine device of a dual virtual machine detects whether the second virtual machine device has generated a target alarm that requires virtual machine repair, rather than relying solely on the management node. This self-detection inside the dual-machine system allows internal faults of the dual virtual machine to be discovered in time. Moreover, when the first virtual machine device, acting as the primary virtual machine, detects that the second virtual machine device has generated a target alarm, it can itself initiate repair of the standby virtual machine. The alarm does not have to be passed level by level to the upper-layer management node, and there is no need to wait for the management node to re-analyze it and issue a repair instruction. This removes the virtual machine system's dependence on the upper-layer management node, improves reliability and fault repair efficiency, and makes the approach more flexible and effective.
DRAWINGS
FIG. 1 is a schematic flowchart of the virtual machine repairing method provided in Embodiment 1 of the present disclosure;
FIG. 2 is a schematic diagram of initiating the virtual machine regeneration process provided in Embodiment 1 of the present disclosure;
FIG. 3 is a schematic flowchart of regeneration with adjusted resource settings provided in Embodiment 1 of the present disclosure;
FIG. 4 is a schematic structural diagram of the virtual machine system provided in Embodiment 2 of the present disclosure;
FIG. 5 is a schematic structural diagram of the first virtual machine device provided in Embodiment 2 of the present disclosure;
FIG. 6 is another schematic structural diagram of the first virtual machine device provided in Embodiment 2 of the present disclosure;
FIG. 7 is a schematic flowchart of regeneration when a service function of the primary machine is abnormal, provided in Embodiment 3 of the present disclosure;
FIG. 8 is a schematic flowchart of regeneration when the primary machine is abnormally interrupted, provided in Embodiment 3 of the present disclosure;
FIG. 9 is a schematic flowchart of regeneration when the standby machine is abnormally interrupted, provided in Embodiment 3 of the present disclosure; and
FIG. 10 is a schematic diagram of the hardware structure of the electronic device provided in Embodiment 5 of the present disclosure.
DETAILED DESCRIPTION
In the embodiments of the present disclosure, self-detection within the dual virtual machine allows internal faults of the dual virtual machine to be discovered in time. When the first virtual machine device detects that the second virtual machine device has generated a target alarm, it can directly initiate repair of the second virtual machine device. This removes the dependence on the upper-layer management node and improves the reliability and fault repair efficiency of the system.
Embodiment 1
Referring to FIG. 1, the virtual machine repairing method in this embodiment includes steps 110 to 120.
In step 110, the first virtual machine device of the dual virtual machine detects that the second virtual machine device has generated a target alarm that requires virtual machine repair.
In step 120, the first virtual machine device initiates repair of the second virtual machine device.
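Steps 110 and 120 can be pictured as a detect-then-repair pass run by one virtual machine against its peer. The following is only a minimal sketch under stated assumptions: the class `VirtualMachine`, the `target_alarm` flag, and the function `repair_cycle` are hypothetical names introduced for illustration and do not appear in the disclosure.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    STANDBY = "standby"


@dataclass
class VirtualMachine:
    """Hypothetical model of one virtual machine device in the pair."""
    name: str
    role: Role
    target_alarm: bool = False                      # True once a repair-worthy alarm exists
    repair_log: list = field(default_factory=list)  # repairs this VM has initiated

    def detect_target_alarm(self, peer: "VirtualMachine") -> bool:
        # Step 110: check whether the peer has generated a target alarm.
        return peer.target_alarm

    def initiate_repair(self, peer: "VirtualMachine") -> None:
        # Step 120: initiate repair of the peer (restart or rebuild in practice).
        self.repair_log.append(f"repair:{peer.name}")
        peer.target_alarm = False


def repair_cycle(first: VirtualMachine, second: VirtualMachine) -> bool:
    """Run one detect-then-repair pass; return True if a repair was initiated."""
    if first.detect_target_alarm(second):
        first.initiate_repair(second)
        return True
    return False
```

Because either device may hold either role, calling `repair_cycle` with the arguments swapped would model the opposite direction of supervision.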
It should be understood that the active/standby relationship between the first virtual machine device and the second virtual machine device in this embodiment can change dynamically. When the first virtual machine device acts as the standby virtual machine and the second virtual machine device acts as the primary virtual machine, the second virtual machine device can likewise perform all functions of the first virtual machine device, including but not limited to alarm detection and virtual machine repair. The first virtual machine device in turn has all functions of the second virtual machine device, including alarm reporting and fault detection by means of a third-party monitoring module.
In step 110, the situations in which the first virtual machine device detects that the second virtual machine device has generated a target alarm include at least the following.
Case 1: the first virtual machine device in step 110 is the primary virtual machine and is working normally, while the second virtual machine device, as the standby virtual machine, is in the standby state. If self-detection finds that the second virtual machine device has suffered an abnormal interruption fault, the second virtual machine device can send a fault notification to the first virtual machine device. On receiving this notification, the first virtual machine device determines and records that the second virtual machine device has generated a target alarm.
In this embodiment, the second virtual machine device can detect the abnormal interruption fault through its own fault detection unit and send a fault notification to the first virtual machine device. The fault notification may be a Simple Network Management Protocol (SNMP) Trap message. After the first virtual machine device, which is in the normal working state, receives the SNMP Trap message, it can record the abnormality corresponding to the message as a fatal fault. It should be understood that the abnormal interruption fault of the second virtual machine device in this embodiment may also be monitored by a third-party monitoring module, or jointly by a third-party monitoring module and the device's own fault detection unit.
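The handling of a received fault notification in Case 1 can be sketched as follows. This is an illustrative sketch only: it does not parse real SNMP Trap messages, and the names `FaultNotification`, `AlarmDetector`, and the fault type string `"abnormal_interruption"` are assumptions made for the example rather than anything defined by the disclosure or by an SNMP library.

```python
from dataclasses import dataclass, field

FATAL = "fatal"


@dataclass
class FaultNotification:
    """Hypothetical, already-decoded fault notification (e.g. from an SNMP Trap)."""
    source: str       # name of the virtual machine reporting the fault
    fault_type: str   # assumed label, e.g. "abnormal_interruption"


@dataclass
class AlarmDetector:
    """Hypothetical receiver running on the primary virtual machine."""
    fatal_fault_types: frozenset = frozenset({"abnormal_interruption"})
    recorded: dict = field(default_factory=dict)  # source name -> severity

    def on_notification(self, note: FaultNotification) -> bool:
        # Record a fatal fault (a target alarm) for fault types that need repair;
        # ignore any other notification.
        if note.fault_type in self.fatal_fault_types:
            self.recorded[note.source] = FATAL
            return True
        return False
```

In a real deployment the notification would arrive over the network and be decoded by an SNMP stack; the sketch starts after that decoding step.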
Case 2: the first virtual machine device in step 110 was originally the standby virtual machine and the second virtual machine device was originally the primary virtual machine. When the second virtual machine device suffers a fault that requires an active/standby switchover and initiates the switchover, the first virtual machine device switches to the primary virtual machine and the second virtual machine device switches to the standby virtual machine according to the active/standby switchover instruction. After switching to the primary virtual machine, the first virtual machine device checks whether the second virtual machine device (now the standby virtual machine) is normal. If the first virtual machine device detects that the state of the second virtual machine device is abnormal, it determines that the second virtual machine device has generated a target alarm.
It should be noted that the first virtual machine device and the second virtual machine device each perform self-detection in real time. When either of them detects in itself a fault that requires an active/standby switchover, it sends a fault notification to the other. In this embodiment, the faults of the second virtual machine device that require an active/standby switchover include at least one of an abnormal interruption fault and a fatal service function abnormality fault. Each is illustrated below.
The second virtual machine device was originally the primary virtual machine. When the second virtual machine device suffers an abnormal interruption fault, it initiates an active/standby switchover instruction and switches to the standby virtual machine, and the first virtual machine device switches from the standby virtual machine to the primary virtual machine.
After the switchover is completed, the second virtual machine device (now the standby virtual machine) sends a fault notification to the first virtual machine device (now the primary virtual machine). Because the first virtual machine device has not yet fully started at this point, it cannot receive the fault notification. Once the first virtual machine device (now the primary virtual machine) has started, it can actively check the second virtual machine device (now the standby virtual machine) to determine whether it is normal. This check may include detecting whether the second virtual machine device is in position and/or whether its state is abnormal. For example, when the first virtual machine device detects that the second virtual machine device is not in position or that its state is abnormal, it judges the second virtual machine device to be abnormal and records that the second virtual machine device has generated a target alarm.
In this embodiment, the second virtual machine device (now the standby virtual machine) can also detect the abnormal interruption fault through its own fault detection unit and send a fault notification to the first virtual machine device (now the primary virtual machine); the fault notification may be an SNMP Trap message. It should be understood that the abnormal interruption fault of the second virtual machine device in this embodiment may also be monitored by, or in combination with, a third-party monitoring module.
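The active check that the new primary virtual machine performs after the switchover can be sketched as a small predicate. This is a minimal sketch under assumptions: `PeerStatus`, `peer_is_normal`, and `should_raise_target_alarm` are hypothetical names, and how the "in position" and state information is actually obtained is left open by the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerStatus:
    """Hypothetical result of probing the standby virtual machine."""
    in_position: bool   # whether the peer is present
    state_ok: bool      # whether the peer's reported state is normal


def peer_is_normal(status: Optional[PeerStatus]) -> bool:
    # A peer that cannot be probed at all (None) is treated as not in position.
    if status is None:
        return False
    return status.in_position and status.state_ok


def should_raise_target_alarm(status: Optional[PeerStatus]) -> bool:
    """Return True when the new primary should record a target alarm for the peer."""
    return not peer_is_normal(status)
```

Treating an unreachable peer as abnormal means the check still works when the earlier fault notification was missed during startup.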
When the second virtual machine device (originally the primary virtual machine) suffers a fatal service function abnormality, which may include an abnormal service process state (such as the loss of a critical service process), a virtual machine resource fault, a network resource fault, and the like, the second virtual machine device issues an active/standby switchover instruction. According to this instruction, the first virtual machine device switches to the primary virtual machine and the second virtual machine device switches to the standby virtual machine. After the first virtual machine device (now the primary virtual machine) has started, it checks the second virtual machine device (now the standby virtual machine) to determine whether it is normal. This check may include detecting whether the second virtual machine device is in position and/or whether its state is abnormal. For example, when the second virtual machine device is detected to be not in position or in an abnormal state, it is judged to be abnormal, and it is recorded that the second virtual machine device has generated a target alarm. In this embodiment, the second virtual machine device can also detect a fatal service function abnormality through its own service function polling detection unit, or be monitored by, or in combination with, a third-party monitoring module.
It should be noted that the faults described above that require an active/standby switchover do not cause the second virtual machine device to stop working; they can be understood as faults that only prevent the second virtual machine device from working normally. In this situation the second virtual machine device can still perform self-detection and, through self-detection, send a fault notification to the first virtual machine device. If, however, the second virtual machine device goes down after the fault occurs, it can neither work nor perform self-detection. In that case, optionally, the first virtual machine device can check the state of the second virtual machine device to determine whether it is abnormal.
In step 120, the repair of the second virtual machine device initiated by the first virtual machine device includes: the first virtual machine device initiates a restart process for the second virtual machine device, or initiates a rebuild process for the second virtual machine device.
When the first virtual machine device initiates the restart process, it can send a corresponding restart instruction to the virtual machine management node, so that the second virtual machine device is restarted through the virtual machine management node. Alternatively, the restart can be completed inside the dual virtual machine by a corresponding restart instruction, without going through the virtual machine management node.
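The two restart paths just described can be sketched as a small dispatcher. The sketch assumes a hypothetical command dictionary and two caller-supplied send functions; neither the command format nor the function name `restart_peer` comes from the disclosure.

```python
from typing import Callable


def restart_peer(
    peer_name: str,
    via_management_node: bool,
    send_to_node: Callable[[dict], None],
    send_internal: Callable[[dict], None],
) -> str:
    """Send a restart command for the peer over one of the two paths.

    Returns the name of the path used, for logging purposes.
    """
    command = {"op": "restart", "target": peer_name}  # hypothetical command format
    if via_management_node:
        # Path 1: ask the virtual machine management node to restart the peer.
        send_to_node(command)
        return "management_node"
    # Path 2: restart inside the dual virtual machine, bypassing the management node.
    send_internal(command)
    return "internal"
```

Passing the transports in as callables keeps the choice of path separate from how each path is implemented.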
When the first virtual machine device initiates a rebuild of the second virtual machine device, the process, shown in FIG. 2, includes step 210 and step 220.
In step 210, the first virtual machine device sends, to the virtual machine management node, a delete instruction for deleting the second virtual machine device.
In step 220, after the second virtual machine device is deleted, the first virtual machine device chooses, according to the preset rebuild policy, between the original resource settings of the second virtual machine device and adjusted resource settings for the second virtual machine device, and sends a virtual machine creation instruction to the virtual machine management node accordingly, completing the rebuild of the second virtual machine device.
The preset rebuild policy in this embodiment can be decided by the service function network element in which the virtual machine resides, which makes it more flexible and more extensible. For example, the decision can be based on the type of the service function network element itself.
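Steps 210 and 220 amount to a delete followed by a create whose resource settings are chosen by a policy. The sketch below illustrates this with an in-memory stand-in for the virtual machine management node; `ResourceSpec`, `FakeManagementNode`, `rebuild`, and the two example policies are hypothetical and only show the shape of the flow.

```python
from dataclasses import dataclass, field, replace
from typing import Callable


@dataclass(frozen=True)
class ResourceSpec:
    """Hypothetical resource settings of a virtual machine."""
    vcpus: int
    memory_mb: int


@dataclass
class FakeManagementNode:
    """In-memory stand-in for the virtual machine management node."""
    vms: dict = field(default_factory=dict)   # name -> ResourceSpec

    def delete_vm(self, name: str) -> None:
        self.vms.pop(name, None)

    def create_vm(self, name: str, spec: ResourceSpec) -> None:
        self.vms[name] = spec


def keep_original(spec: ResourceSpec) -> ResourceSpec:
    # Policy: rebuild with the original resource settings.
    return spec


def double_memory(spec: ResourceSpec) -> ResourceSpec:
    # Policy: rebuild with adjusted settings (here, twice the memory).
    return replace(spec, memory_mb=spec.memory_mb * 2)


def rebuild(node: FakeManagementNode, name: str,
            policy: Callable[[ResourceSpec], ResourceSpec]) -> ResourceSpec:
    original = node.vms[name]
    node.delete_vm(name)            # step 210: delete the faulty VM
    spec = policy(original)         # step 220: pick original or adjusted settings
    node.create_vm(name, spec)      # step 220: create the VM again
    return spec
```

Modelling the policy as a function reflects the point that the service function network element, not the management node, decides how the virtual machine is rebuilt.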
In step 220, after the first virtual machine device has sent the virtual machine creation instruction to the virtual machine management node with adjusted resource settings for the second virtual machine device, chosen according to factors such as current service needs, the process, shown in FIG. 3, further includes steps 310 to 330.
In step 310, after the second virtual machine device has been recreated, the first virtual machine device (originally the primary virtual machine) initiates an active/standby switchover. This switchover is needed because the resources of the first virtual machine device differ from those of the recreated second virtual machine device. The recreated second virtual machine device (originally the standby virtual machine) is switched to the primary virtual machine, and the first virtual machine device is switched to the standby virtual machine.
In step 320, the recreated second virtual machine device (now the primary virtual machine) sends, to the virtual machine management node, a delete instruction for deleting the first virtual machine device (now the standby virtual machine).
In step 330, after the first virtual machine device is deleted, the recreated second virtual machine device (now the primary virtual machine) sends a virtual machine creation instruction to the virtual machine management node with the same settings as its own resources, so as to recreate a first virtual machine device with the same resources as the second virtual machine device. Only at this point is the rebuild process complete.
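Steps 310 to 330 bring the pair back to identical resource settings after an adjusted rebuild. The following sketch traces the three steps with an event log; the `VM` class, the tuple-shaped `spec`, and the function `align_resources` are illustrative assumptions, and real delete and create operations would go through the virtual machine management node.

```python
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical virtual machine record."""
    name: str
    role: str     # "primary" or "standby"
    spec: tuple   # assumed (vcpus, memory_mb) resource settings


def align_resources(first: VM, rebuilt_second: VM, log: list) -> VM:
    """Replace `first` with a new VM that matches the rebuilt second VM's resources."""
    # Step 310: switch roles so the rebuilt second VM carries the service.
    rebuilt_second.role, first.role = "primary", "standby"
    log.append(("switchover", rebuilt_second.name))
    # Step 320: the new primary requests deletion of the old first VM.
    log.append(("delete", first.name))
    # Step 330: the new primary requests creation of a first VM with its own settings.
    new_first = VM(first.name, "standby", rebuilt_second.spec)
    log.append(("create", new_first.name, new_first.spec))
    return new_first
```

The switchover comes first so that the service is never carried by the virtual machine that is about to be deleted.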
In addition, in this embodiment the rebuild function of the virtual machine can also be enabled or disabled by a switch.
Because a virtual machine rebuild request must be unique, a corresponding mechanism can be used to protect the uniqueness and integrity of the rebuild against target alarms that may occur frequently in the system. Therefore, before the first virtual machine device in this embodiment initiates the rebuild process for the second virtual machine device, the method may further include: the first virtual machine device judges whether an unprocessed virtual machine rebuild process currently exists. If it determines that one exists, it can delay for a preset duration before initiating the creation instruction for the second virtual machine device, or it can re-check the target alarm.
It can be seen that in this embodiment internal faults of the virtual machine system can be discovered through virtual machine self-detection, and the rebuild of a virtual machine can also be initiated by the virtual machine itself. This removes the dependence on the management node and complements the disaster recovery techniques of the management node. In addition, the rebuild of a virtual machine can be decided by the service function network element in which the virtual machine resides, which is more convenient, more flexible, and more extensible.
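One way to picture the check for an unprocessed rebuild process is a guard object that admits a single rebuild at a time. This is a sketch only: `RebuildGuard`, its method names, and the 30-second default for the preset delay are assumptions for illustration, and the disclosure does not specify how the pending state is tracked.

```python
class RebuildGuard:
    """Hypothetical guard that admits one virtual machine rebuild at a time."""

    def __init__(self, delay_seconds: float = 30.0) -> None:
        self.delay_seconds = delay_seconds   # assumed preset delay before retrying
        self._pending = False                # True while a rebuild is unprocessed

    def request_rebuild(self) -> str:
        # If a rebuild is still unprocessed, tell the caller to wait
        # (or re-check the target alarm) instead of starting another one.
        if self._pending:
            return "delay"
        self._pending = True
        return "start"

    def rebuild_finished(self) -> None:
        # Called once the pending rebuild has completed.
        self._pending = False
```

A caller that receives `"delay"` would wait `delay_seconds` and then either call `request_rebuild()` again or re-run target alarm detection, matching the two options in the text.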
Embodiment 2
This embodiment provides a virtual machine system. Referring to FIG. 4, the virtual machine system includes a dual virtual machine, which includes a first virtual machine device 1 and a second virtual machine device 2. The first virtual machine device 1 is configured to initiate repair of the second virtual machine device 2 when it detects that the second virtual machine device 2 has generated a target alarm. The target alarm may be an alarm indicating that the second virtual machine device 2 needs to be repaired.
Each device of the dual virtual machine in this embodiment can be provided with a fault detection module, an alarm detection module, and a virtual machine repair module. The fault detection module is configured to detect faults, either by itself or through a third-party monitoring module. The alarm detection module is configured to discover a target alarm and trigger the virtual machine repair module to perform virtual machine repair. The specific structure of the first virtual machine device 1 is described below by way of example, together with several situations in which a target alarm is generated.
Referring to FIG. 5, the first virtual machine device 1 may include an alarm detection module 11 and a virtual machine repair module 12.
The alarm detection module 11 is configured to detect whether the second virtual machine device has generated a target alarm that requires virtual machine repair.
The virtual machine repair module 12 is configured to initiate repair of the second virtual machine device when the detection result of the alarm detection module 11 is positive.
The alarm detection module 11 of the first virtual machine device 1 may further include a first alarm detection submodule 111. The first alarm detection submodule 111 is configured, when the first virtual machine device 1 is the primary virtual machine in the normal working state and the second virtual machine device is the standby virtual machine, to receive the fault notification sent by the second virtual machine device 2 when the second virtual machine device 2 suffers an abnormal interruption fault, and to determine from this notification that the second virtual machine device 2 has generated a target alarm.
In this embodiment, the second virtual machine device 2 can detect the abnormal interruption fault through its own fault detection unit. The fault detection module of the second virtual machine device 2 may include the fault detection unit, which sends a fault notification to the first virtual machine device 1 when it detects that the second virtual machine device has suffered an abnormal interruption fault. The fault notification may be an SNMP Trap message.
After the alarm detection module of the first virtual machine device 1, which is in the normal working state, receives the SNMP Trap message, it can record the abnormality indicated by the message as a fatal fault. It should be understood that the abnormal interruption fault of the second virtual machine device 2 in this embodiment may also be monitored by a third-party monitoring module, which notifies the fault detection module of the second virtual machine device 2 after detecting the fault.
The alarm detection module 11 of the first virtual machine device 1 may further include a second alarm detection submodule 112. The first virtual machine device 1 was originally the standby virtual machine and may have an active/standby switchover module; the second virtual machine device 2 was originally the primary virtual machine and may also have an active/standby switchover module. When the second virtual machine device 2 discovers through self-detection that it has a fault requiring an active/standby switchover, it initiates the switchover. Through the active/standby switchover module of the second virtual machine device, the second virtual machine device 2 switches from the primary virtual machine to the standby virtual machine, and the first virtual machine device 1 switches from the standby virtual machine to the primary virtual machine. After the first virtual machine device 1 has switched to the primary virtual machine, its second alarm detection submodule 112 actively checks whether the second virtual machine device 2 (now the standby virtual machine) is normal. If it is not, the first virtual machine device 1 (now the primary virtual machine) determines that the second virtual machine device 2 (now the standby virtual machine) has generated a target alarm.
In this embodiment, the fatal fault includes at least one of an abnormal interruption fault and a fatal service function abnormality; the source text attributes it to the first virtual machine device 1, although the examples that follow concern faults of the second virtual machine device 2. Each is illustrated below.
The second virtual machine device 2 was originally the primary virtual machine. When the second virtual machine device 2 suffers an abnormal interruption fault, an active/standby switchover instruction is generated, so that the second virtual machine device 2 switches from the primary virtual machine to the standby virtual machine and the first virtual machine device 1 switches from the standby virtual machine to the primary virtual machine. At the same time, the second virtual machine device 2 sends a fault notification to the first virtual machine device 1 (now the primary virtual machine). Because the first virtual machine device 1 has not yet fully started at this point, its second alarm detection submodule 112 cannot receive the fault notification. After the first virtual machine device 1 has started, the second alarm detection submodule 112 can check the second virtual machine device 2 (now the standby virtual machine) to determine whether it is normal. This check may include detecting whether the second virtual machine device 2 is in position and/or whether its state is abnormal. For example, when the second virtual machine device 2 is detected to be not in position or in an abnormal state, it is judged to be abnormal, and it is recorded that the second virtual machine device 2 has generated a target alarm. In this embodiment, the second virtual machine device 2 (now the standby virtual machine) can also detect the abnormal interruption fault through the fault detection unit included in its own fault detection module and send a fault notification to the first virtual machine device 1 (now the primary virtual machine); the fault notification may be an SNMP Trap message. It should be understood that the abnormal interruption fault of the second virtual machine device 2 in this embodiment may also be monitored by, or in combination with, a third-party monitoring module, which then sends the monitoring result to the fault detection module of the second virtual machine device 2.
The second virtual machine device 2 is originally the primary virtual machine and the first virtual machine device 1 is originally the standby virtual machine. When a fatal service function abnormality occurs on the second virtual machine device 2 (including but not limited to an abnormal service process state, such as the loss of a key service process, a virtual machine resource fault, or a network resource fault), an active/standby switchover is initiated. According to the switchover instruction, the first virtual machine device 1 switches from standby virtual machine to primary virtual machine and the second virtual machine device 2 switches from primary virtual machine to standby virtual machine. After the first virtual machine device 1 (now the primary virtual machine) has started, its second alarm detection submodule 112 actively checks the second virtual machine device 2 (now the standby virtual machine) to determine whether it is normal. This check includes but is not limited to detecting whether the second virtual machine device 2 is in position and/or whether its state is abnormal; for example, when the device is detected to be not in position or in an abnormal state, the second virtual machine device 2 is judged to be abnormal and it is recorded that the second virtual machine device 2 has generated a target alarm. In this embodiment, the second virtual machine device 2 may also detect whether a fatal service function abnormality has occurred through the service function polling detection unit of its own fault detection module, or the abnormality may be monitored by, or in combination with, a third-party monitoring module, which sends the monitoring result to the fault detection module of the second virtual machine device 2.
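The post-switchover check described in the two examples above can be sketched as follows. This is a minimal illustration only: the names `PeerState`, `PeerStatus` and `has_target_alarm` are hypothetical and do not come from the disclosure, and how the "in position" flag and the state are actually obtained is left open.

```python
from dataclasses import dataclass
from enum import Enum


class PeerState(Enum):
    """Hypothetical state values reported for the peer virtual machine."""
    NORMAL = "normal"
    ABNORMAL = "abnormal"


@dataclass(frozen=True)
class PeerStatus:
    in_position: bool   # whether the peer virtual machine is present at all
    state: PeerState    # last observed state of the peer


def has_target_alarm(status: PeerStatus) -> bool:
    """Return True when the peer should be recorded as raising a target alarm.

    Mirrors the rule in the text: the peer is judged abnormal when it is
    not in position or its state is abnormal.
    """
    return (not status.in_position) or status.state is PeerState.ABNORMAL
```

The new primary would call `has_target_alarm` once it has finished starting, since a fault notification sent during the switchover may have been missed.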
In addition, it should be noted that when the second virtual machine device fails, for example through a crash, the first virtual machine device may detect the failure of the second virtual machine device, and the active/standby switching module of the first virtual machine device initiates the active/standby switchover, switching the first virtual machine device from standby virtual machine to primary virtual machine and the second virtual machine device from primary virtual machine to standby virtual machine.
It can be seen that, in this embodiment, the fault detection module in the first virtual machine device 1 and in the second virtual machine device 2 may include the virtual machine's own fault detection unit and service function polling detection unit, and may also learn whether a target alarm has been generated by accepting monitoring results sent by a third-party monitoring module. It should also be understood that the fault detection module in this embodiment may be configured to detect other types of alarms and send them to an alarm module, which may screen and process the received alarms at different levels; for example, when a screened alarm is a target alarm, the virtual machine repair module 12 is triggered to repair the virtual machine.
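The level-based screening performed by the alarm module might look like the sketch below. The three levels and the rule that only the highest level counts as a target alarm are assumptions made for illustration; the disclosure does not enumerate alarm levels, and `handle_alarms` is a hypothetical name.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Iterable, List


class AlarmLevel(IntEnum):
    """Hypothetical alarm levels; FATAL stands in for the target alarm."""
    MINOR = 1
    MAJOR = 2
    FATAL = 3


@dataclass(frozen=True)
class Alarm:
    source: str        # which virtual machine raised the alarm
    level: AlarmLevel
    reason: str


def handle_alarms(alarms: Iterable[Alarm],
                  repair: Callable[[Alarm], None]) -> List[Alarm]:
    """Trigger `repair` for target alarms and return the remaining alarms."""
    remaining: List[Alarm] = []
    for alarm in alarms:
        if alarm.level == AlarmLevel.FATAL:
            repair(alarm)            # hand over to the repair module
        else:
            remaining.append(alarm)  # lower levels are only kept for handling elsewhere
    return remaining
```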
Referring to FIG. 6, the virtual machine repair module 12 of the first virtual machine device 1 includes a restart submodule 121 or a rebuild submodule 122.
The restart submodule 121 is configured to initiate a restart process for the second virtual machine device 2 when the detection result of the alarm detection module 11 is yes.
The rebuild submodule 122 is configured to initiate a rebuild process for the second virtual machine device 2 when the detection result of the alarm detection module 11 is yes.
When the restart submodule 121 of the first virtual machine device 1 initiates the restart process for the second virtual machine device 2, it may send a corresponding restart instruction to the virtual machine management node, so that the second virtual machine device 2 is restarted through the virtual machine management node. Alternatively, the restart of the second virtual machine device 2 may be completed inside the dual virtual machine pair by a corresponding restart instruction, without going through the virtual machine management node.
The rebuild submodule 122 of the first virtual machine device 1 includes a rebuild initiation unit 1221 and a rebuild unit 1222.
The rebuild initiation unit 1221 is configured to send, to the virtual machine management node, a delete instruction for deleting the second virtual machine device 2.
The rebuild unit 1222 is configured to, after the second virtual machine device 2 has been deleted, select according to a preset rebuild policy either the original resource settings of the second virtual machine device 2 or adjusted resource settings for the second virtual machine device 2, and send a virtual machine creation instruction to the virtual machine management node.
The preset rebuild policy in this embodiment may be decided by the service function network element in which the virtual machine resides, which makes it more flexible and more extensible. For example, the decision may be based on the type of the service function network element itself.
When the rebuild unit 1222 of the virtual machine repair module 12 sends the virtual machine creation instruction to the virtual machine management node with adjusted resource settings for the second virtual machine device (chosen according to factors such as current service needs), the process further includes the following. After the second virtual machine device has been recreated, the rebuild unit 1222 initiates an active/standby switchover (this switchover is required because the resources of the first virtual machine device and the new second virtual machine device differ), switching the recreated second virtual machine device 2 from standby virtual machine to primary virtual machine and the first virtual machine device 1 from primary virtual machine to standby virtual machine.
The virtual machine repair module of the recreated second virtual machine device 2 sends, to the virtual machine management node, a delete instruction for deleting the first virtual machine device 1.
After the first virtual machine device 1 has been deleted, the virtual machine repair module of the recreated second virtual machine device 2 sends a virtual machine creation instruction to the virtual machine management node, using the same settings as the resources of the recreated second virtual machine device 2, so as to recreate a first virtual machine device 1 whose resource configuration matches that of the second virtual machine device 2. Only at this point is the rebuild process complete.
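The two rebuild variants described above (original resources versus adjusted resources) can be summarised as an ordered list of steps. The sketch below only builds that list; it does not talk to any real management node. `Resources`, `rebuild_plan` and the step names are hypothetical labels chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource settings of a virtual machine."""
    vcpus: int
    memory_mb: int


# (action, target VM, resources used for the action or None)
Step = Tuple[str, str, Optional[Resources]]


def rebuild_plan(survivor: str, failed: str, original: Resources,
                 adjusted: Optional[Resources] = None) -> List[Step]:
    """Return the ordered steps for rebuilding `failed`.

    With `adjusted` left as None the failed VM is recreated with its
    original resources and becomes the standby. With adjusted resources,
    the recreated VM takes over as primary and the survivor is rebuilt
    with the same adjusted resources so that both VMs match again.
    """
    steps: List[Step] = [("delete", failed, None)]
    if adjusted is None:
        steps.append(("create", failed, original))
        return steps
    steps.append(("create", failed, adjusted))
    steps.append(("switchover_to_primary", failed, None))
    steps.append(("delete", survivor, None))   # issued by the recreated VM
    steps.append(("create", survivor, adjusted))
    return steps
```

The second branch is longer because the pair is temporarily asymmetric: the switchover moves service onto the VM with the adjusted resources before the older VM is replaced.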
In addition, in this embodiment, the virtual machine repair module 12 of a virtual machine may be enabled or disabled by a switch.
Because a virtual machine rebuild request must be unique, a corresponding mechanism may be used to protect the uniqueness and integrity of a virtual machine rebuild against the frequent target alarms that may occur in the system. Therefore, before the virtual machine repair module 12 of the first virtual machine device in this embodiment initiates the rebuild process for the second virtual machine device, the method may further include: the virtual machine repair module 12 determines whether an unprocessed virtual machine rebuild process currently exists; if so, it may delay for a preset duration before issuing the creation instruction for the second virtual machine device 2, or it may re-detect the target alarm.
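One way to keep at most one rebuild in flight is a non-blocking lock, as sketched below. This is an assumption-laden illustration, not the mechanism of the disclosure: `RebuildGuard` and `start_rebuild` are hypothetical names, and the sketch combines the two options from the text by delaying first and then re-detecting the alarm, whereas the text presents them as alternatives.

```python
import threading
import time
from typing import Callable


class RebuildGuard:
    """Tracks whether a rebuild process is already pending."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def try_begin(self) -> bool:
        # Non-blocking: returns False if another rebuild holds the guard.
        return self._lock.acquire(blocking=False)

    def end(self) -> None:
        self._lock.release()


def start_rebuild(guard: RebuildGuard,
                  do_rebuild: Callable[[], None],
                  alarm_still_present: Callable[[], bool],
                  delay_s: float = 0.0) -> bool:
    """Run `do_rebuild` unless another rebuild is pending.

    If a rebuild is pending, wait `delay_s`, re-detect the alarm, and try
    once more. Returns True only if this call performed the rebuild.
    """
    if not guard.try_begin():
        time.sleep(delay_s)                    # delay for the preset duration
        if not alarm_still_present():          # alarm cleared in the meantime
            return False
        if not guard.try_begin():              # still busy: give up this round
            return False
    try:
        do_rebuild()
        return True
    finally:
        guard.end()
```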
It should be understood that the active/standby relationship between the first virtual machine device 1 and the second virtual machine device 2 in this embodiment can change dynamically. When the first virtual machine device 1 acts as the standby virtual machine and the second virtual machine device 2 acts as the primary virtual machine, the second virtual machine device 2 can likewise perform all the functions of the first virtual machine device 1 described above, including but not limited to alarm detection and virtual machine repair. The first virtual machine device 1 in turn has all the functions of the second virtual machine device 2 described above, including alarm reporting and fault detection using a third-party monitoring module.
In this embodiment, internal faults of the virtual machine system can be discovered through the virtual machines' own detection, the rebirth of a virtual machine can be initiated by a virtual machine itself, and the rebirth decision can be made by the service function network element in which the virtual machine resides. This removes the dependence on the management node and is more convenient, flexible, reliable, and extensible.
Embodiment 3
Embodiments of the present disclosure are illustrated below with several usage examples under the telecom NFV protocol specification.
Referring to FIG. 7, the figure shows how, under the telecom NFV protocol architecture, a VNF instance deployed as a dual-machine pair rebuilds and heals itself after a service function abnormality using an embodiment of the present disclosure. The process includes steps 701 to 705.
In step 701, virtual machine A is the primary virtual machine and virtual machine B is the standby virtual machine. Virtual machine A polls its own service processes; when it finds that a key service process of virtual machine A has been lost, an active/standby switchover is performed and virtual machine B becomes the primary virtual machine.
In step 702, virtual machine B starts and checks the state of virtual machine A; if virtual machine A is not in position or its state is abnormal, a fatal alarm is generated.
In step 703, after discovering the fatal alarm, virtual machine B sends the management node an instruction to delete virtual machine A.
In step 704, after virtual machine A has been deleted successfully, virtual machine B initiates the creation of a new virtual machine (referred to as virtual machine C) with the resource settings of the original virtual machine A, according to the rebuild policy that its service function node associates with this alarm type.
In step 705, virtual machine C starts after it has been created successfully, synchronizes service data from virtual machine B, and becomes the new standby virtual machine.
In step 704 above, virtual machine B may instead initiate the creation of the new virtual machine (virtual machine C) with adjusted resource settings for virtual machine A. In that case, in step 705, after virtual machine C has been created successfully and has started, virtual machine B initiates an active/standby switchover, so that virtual machine C becomes the primary virtual machine and virtual machine B becomes the standby virtual machine. Virtual machine C then sends the management node an instruction to delete virtual machine B and, after virtual machine B has been deleted successfully, initiates the creation of a new virtual machine (referred to as virtual machine D) with the resource settings of virtual machine C. Virtual machine D starts after it has been created successfully, synchronizes service data from virtual machine C, and becomes the new standby virtual machine.
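The basic path of steps 701 to 705 (rebuilding with the original resources) can be traced with a small role table, as in the sketch below. It models only who is primary and who is standby after each step; `heal_after_primary_failure` is a hypothetical helper and the trace strings are illustrative, not messages defined by the disclosure.

```python
from typing import Dict, List, Tuple


def heal_after_primary_failure(primary: str, standby: str,
                               new_vm: str) -> Tuple[Dict[str, str], List[str]]:
    """Walk through steps 701-705 and return (final roles, trace)."""
    roles: Dict[str, str] = {primary: "primary", standby: "standby"}
    trace: List[str] = []

    # Step 701: the primary loses a key service process; roles are swapped.
    roles[primary], roles[standby] = "standby", "primary"
    trace.append(f"701: switchover, {standby} is now primary")

    # Step 702: the new primary finds the old one abnormal.
    trace.append(f"702: {standby} records a fatal alarm for {primary}")

    # Step 703: the old primary is deleted through the management node.
    del roles[primary]
    trace.append(f"703: {standby} requests deletion of {primary}")

    # Step 704: a replacement is created with the original resource settings.
    trace.append(f"704: {standby} requests creation of {new_vm}")

    # Step 705: the replacement syncs data and becomes the standby.
    roles[new_vm] = "standby"
    trace.append(f"705: {new_vm} syncs from {standby} and becomes standby")
    return roles, trace
```

Calling `heal_after_primary_failure("A", "B", "C")` leaves B as primary and C as standby, matching the end state described for FIG. 7.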
Referring to FIG. 8, the figure shows how, under the telecom NFV protocol architecture, a VNF instance deployed as a dual-machine pair is reborn and heals itself after the primary virtual machine is abnormally interrupted, using the present disclosure. The process includes steps 801 to 805.
In step 801, virtual machine A is the primary virtual machine and virtual machine B is the standby virtual machine. When an abnormal interruption fault occurs on virtual machine A, it initiates an active/standby switchover, so that virtual machine B becomes the primary virtual machine and virtual machine A becomes the standby virtual machine.
In step 802, virtual machine B starts, checks the state of the standby virtual machine A, and checks whether a fatal alarm sent by virtual machine A has been found.
In step 803, after discovering the fatal alarm, virtual machine B sends the management node an instruction to delete virtual machine A.
In step 804, after virtual machine A has been deleted successfully, virtual machine B initiates the creation of a new virtual machine (referred to as virtual machine C) with the resource settings of the original virtual machine A, according to the rebuild policy corresponding to this alarm type.
In step 805, virtual machine C starts after it has been created successfully, synchronizes service data from virtual machine B, and becomes the new standby virtual machine.
Referring to FIG. 9, the figure shows how, under the telecom NFV protocol architecture, a VNF instance deployed as a dual-machine pair rebuilds and heals itself after the standby virtual machine is abnormally interrupted, using the present disclosure. The process includes steps 901 to 904.
In step 901, virtual machine B is the standby virtual machine and virtual machine A is the primary virtual machine. When an abnormal interruption fault occurs on virtual machine B, it sends a fault notification to virtual machine A.
In step 902, after receiving the fault notification and recording a fatal alarm, virtual machine A sends the management node an instruction to delete virtual machine B.
In step 903, after virtual machine B has been deleted successfully, virtual machine A initiates the creation of a new virtual machine (referred to as virtual machine C) with the resource settings of the original virtual machine B, according to the rebuild policy corresponding to this alarm type.
In step 904, virtual machine C starts after it has been created successfully, synchronizes service data from virtual machine A, and becomes the new standby virtual machine.
Embodiment 4
This embodiment further provides a non-transitory computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform any of the virtual machine repair methods.
Embodiment 5
This embodiment further provides an electronic device. Referring to FIG. 10, the electronic device is applied to the first virtual machine device and includes:
one or more processors 1001 (FIG. 10 takes one processor 1001 as an example); and
a memory 1002.
The electronic device may further include an input device 1003 and an output device 1004.
The processor 1001, the memory 1002, the input device 1003, and the output device 1004 in the electronic device may be connected by a bus or in other ways; FIG. 10 takes a bus connection as an example.
As a non-transitory computer-readable storage medium, the memory 1002 can be used to store software programs, computer-executable programs, and modules, such as the programs, instructions, or modules corresponding to the virtual machine repair method in the embodiments of the present disclosure (for example, the alarm detection module 11 and the virtual machine repair module 12 shown in FIG. 5). By running the software programs, instructions, or modules stored in the memory 1002, the processor 1001 executes various functional applications and data processing, that is, it implements the virtual machine repair method of the method embodiments above.
The memory 1002 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; by way of example, the preset rebuild policy may be stored in the program storage area of the electronic device. The data storage area may store data created through use of the electronic device. In addition, the memory 1002 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 1002 optionally includes memory located remotely from the processor 1001, and such remote memory may be connected to the electronic device through a network. Examples of such networks include but are not limited to the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 1003 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. The output device 1004 may include a display device such as a display screen.
The electronic device of the embodiments of the present disclosure may further include a communication device 1005 that transmits and/or receives information over a communication network. For example, the electronic device may receive, through the communication device 1005, a target alarm sent by the second virtual machine device, and may also send, through the communication device 1005, an instruction to the virtual machine management node to delete the second virtual machine device.
Those skilled in the art should understand that the modules or steps of the embodiments of the present disclosure described above may be implemented with general-purpose computing devices, and may be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage medium (ROM/RAM, magnetic disk, optical disc) and executed by a computing device. In some cases, the steps shown or described may be performed in an order different from the one given here, or the modules or steps may each be made into a separate integrated circuit module, or several of them may be made into a single integrated circuit module. The embodiments of the present disclosure are therefore not limited to any specific combination of hardware and software.
The above is a description of the present disclosure in connection with optional embodiments, and the optional implementations of the present disclosure should not be regarded as limited to this description.
Industrial Applicability
With the virtual machine repair method, virtual machine device, system, and service function network element provided by the embodiments of the present disclosure, internal faults of the virtual machine system can be discovered through the virtual machines' own detection, so that internal faults in the dual virtual machine pair are found promptly. Moreover, when the first virtual machine device detects that the second virtual machine device has generated a target alarm, it can repair the second virtual machine device directly. This removes the dependence on the upper-layer management node, improves reliability, and makes fault repair more efficient.

Claims (19)

  1. A virtual machine repair method, comprising:
    detecting, by a first virtual machine device in a dual virtual machine pair, that a second virtual machine device generates a target alarm requiring virtual machine repair; and
    initiating, by the first virtual machine device, repair of the second virtual machine device.
  2. The virtual machine repair method according to claim 1, wherein the first virtual machine device detecting that the second virtual machine device generates the target alarm comprises: the first virtual machine device receiving a fault notification sent by the second virtual machine device upon an abnormal interruption fault; and
    the first virtual machine device determining, according to the fault notification, that the second virtual machine device generates the target alarm.
  3. The virtual machine repair method according to claim 1, further comprising, before the first virtual machine device detects that the second virtual machine device generates the target alarm: switching the first virtual machine device from standby virtual machine to primary virtual machine, and switching the second virtual machine device from primary virtual machine to standby virtual machine.
  4. The virtual machine repair method according to claim 3, wherein the first virtual machine device detecting that the second virtual machine device generates the target alarm requiring virtual machine repair comprises:
    after the first virtual machine device is switched from standby virtual machine to primary virtual machine, detecting whether the second virtual machine device is normal; and
    when the first virtual machine device detects that the second virtual machine device is abnormal, determining that the second virtual machine device generates the target alarm requiring virtual machine repair.
  5. The virtual machine repair method according to claim 3, further comprising, after the first virtual machine device detects that the second virtual machine device generates the target alarm requiring virtual machine repair:
    determining that a target fault has occurred on the second virtual machine device, wherein
    the target fault comprises at least one of an abnormal interruption fault and a fatal service function abnormality.
  6. The virtual machine repair method according to claim 4, wherein the first virtual machine device detecting whether the second virtual machine device is normal comprises: detecting whether the second virtual machine device is in position or whether its state is abnormal.
  7. The virtual machine repair method according to any one of claims 1 to 6, wherein the first virtual machine device initiating repair of the second virtual machine device comprises:
    the first virtual machine device initiating a restart process for restarting the second virtual machine device, or initiating a process of rebuilding the second virtual machine device.
  8. The virtual machine repair method according to claim 7, further comprising, before the first virtual machine device initiates the process of rebuilding the second virtual machine device:
    when the first virtual machine device determines that an unprocessed virtual machine rebuild process currently exists, delaying for a preset duration before initiating the process of rebuilding the second virtual machine device, or re-detecting the target alarm.
  9. The virtual machine repair method according to claim 7, wherein the process of the first virtual machine device initiating rebuilding of the second virtual machine device comprises:
    the first virtual machine device sending, to a virtual machine management node, a delete instruction for deleting the second virtual machine device; and
    after the second virtual machine device is deleted, the first virtual machine device selecting, according to a preset rebuild policy, either original resource settings of the second virtual machine device or adjusted resource settings of the second virtual machine device, and sending a virtual machine creation instruction to the virtual machine management node.
  10. The virtual machine repair method according to claim 9, further comprising, after the first virtual machine device sends the virtual machine creation instruction to the virtual machine management node with the adjusted resource settings of the second virtual machine device:
    after creation of a new second virtual machine device is completed, the first virtual machine device sending an active/standby switchover instruction to the new second virtual machine device, so that the new second virtual machine device is switched to primary virtual machine and the first virtual machine device is switched to standby virtual machine;
    the new second virtual machine device sending, to the virtual machine management node, a delete instruction for deleting the first virtual machine device; and
    after the first virtual machine device is deleted, the new second virtual machine device sending a virtual machine creation instruction to the virtual machine management node with the resource settings of the new second virtual machine device, so as to create a new first virtual machine device having the same resource configuration as the new second virtual machine device.
  11. A first virtual machine device, comprising an alarm detection module and a virtual machine repair module, wherein
    the alarm detection module is configured to detect whether a second virtual machine device generates a target alarm requiring virtual machine repair; and
    the virtual machine repair module is configured to initiate repair of the second virtual machine device when the alarm detection module detects that the second virtual machine device generates the target alarm requiring virtual machine repair.
  12. The first virtual machine device according to claim 11, wherein
    the alarm detection module comprises a first alarm detection submodule configured to receive a fault notification sent by the second virtual machine device when an abnormal-interruption fault occurs, and to determine that the second virtual machine device has generated the target alarm.
  13. The first virtual machine device according to claim 11, further comprising an active/standby switchover module configured to switch the first virtual machine device to the active virtual machine when the second virtual machine device experiences a fault and initiates an active/standby switchover.
  14. The first virtual machine device according to claim 13, wherein the alarm detection module comprises a second alarm detection submodule configured to, after the first virtual machine device has switched to the active virtual machine, determine that the second virtual machine device has generated the target alarm upon detecting that the second virtual machine device is operating abnormally.
  15. The first virtual machine device according to any one of claims 11 to 14, wherein the virtual machine repair module comprises a restart submodule or a rebuild submodule;
    the restart submodule is configured to initiate a restart procedure for the second virtual machine device when the detection result of the alarm detection module is positive; and
    the rebuild submodule is configured to initiate a procedure for rebuilding the second virtual machine device when the detection result of the alarm detection module is positive.
  16. The first virtual machine device according to claim 15, wherein, when the virtual machine repair module comprises the rebuild submodule, the rebuild submodule comprises a rebuild initiation unit and a rebuild unit;
    the rebuild initiation unit is configured to send, to a virtual machine management node, a delete instruction for deleting the second virtual machine device; and
    the rebuild unit is configured to, after the second virtual machine device is deleted, select according to a preset rebuild policy either the original resource settings of the second virtual machine device or adjusted resource settings of the second virtual machine device, and send a virtual machine creation instruction to the virtual machine management node with the selected settings.
  17. A virtual machine system, comprising a second virtual machine device and the first virtual machine device according to any one of claims 11 to 16;
    the first virtual machine device is configured to initiate repair of the second virtual machine device upon detecting that the second virtual machine device has generated a target alarm that requires virtual machine repair.
  18. A service function network element, comprising the virtual machine system according to claim 17.
  19. A non-transitory computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the virtual machine repairing method according to any one of claims 1 to 10.
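
The rebuild flow described in claims 10 and 16 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the names `ManagementNode`, `VM`, `rebuild_peer`, `swap_and_rebuild_first` and `repair_if_alarmed`, the `"original"`/`"adjusted"` policy strings, and the doubling used as the "adjustment" are all hypothetical, and the assumption that the recreated devices start as standby is ours, since the claims do not state it.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


@dataclass
class VM:
    name: str
    role: Role
    resources: dict  # illustrative keys only, e.g. {"vcpu": 2}


@dataclass
class ManagementNode:
    """Hypothetical stand-in for the virtual machine management node."""
    vms: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def delete(self, name):
        self.vms.pop(name, None)
        self.log.append(("delete", name))

    def create(self, name, role, resources):
        vm = VM(name, role, dict(resources))
        self.vms[name] = vm
        self.log.append(("create", name))
        return vm


def rebuild_peer(second, node, policy, scale=2):
    """Claim 16: delete the faulty peer, then recreate it per the preset policy."""
    node.delete(second.name)
    if policy == "adjusted":
        # Assumption: "adjusted" means scaling every resource by `scale`.
        resources = {k: v * scale for k, v in second.resources.items()}
    else:
        resources = second.resources
    # Assumption: the recreated peer starts in the standby role.
    return node.create(second.name, Role.STANDBY, resources)


def swap_and_rebuild_first(first, new_second, node):
    """Claim 10: promote the new peer, then recreate the first VM to match it."""
    new_second.role, first.role = Role.ACTIVE, Role.STANDBY
    node.delete(first.name)
    return node.create(first.name, Role.STANDBY, new_second.resources)


def repair_if_alarmed(first, second, node, target_alarm, policy="original"):
    """Claim 11: initiate repair only when a target alarm was detected."""
    if not target_alarm:
        return second
    new_second = rebuild_peer(second, node, policy)
    if policy == "adjusted":
        swap_and_rebuild_first(first, new_second, node)
    return new_second
```

In this sketch the role swap of claim 10 runs only after an adjusted rebuild, mirroring the dependency of claim 10 on claim 9; with the `"original"` policy the peer is simply deleted and recreated with its previous settings.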
PCT/CN2016/104293 2015-11-30 2016-11-02 Virtual machine repairing method, virtual machine device, system, and service functional network element WO2017092539A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510863669.8 2015-11-30
CN201510863669.8A CN106817238A (en) 2015-11-30 2015-11-30 Virtual machine repair method, virtual machine, system and business function network element

Publications (1)

Publication Number Publication Date
WO2017092539A1 true WO2017092539A1 (en) 2017-06-08

Family

ID=58796250

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/104293 WO2017092539A1 (en) 2015-11-30 2016-11-02 Virtual machine repairing method, virtual machine device, system, and service functional network element

Country Status (2)

Country Link
CN (1) CN106817238A (en)
WO (1) WO2017092539A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11803452B2 (en) * 2019-02-14 2023-10-31 Nippon Telegraph And Telephone Corporation Duplexed operation system and method therefor

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111209145A (en) * 2018-11-21 2020-05-29 中兴通讯股份有限公司 Virtual machine disaster tolerance-based service self-healing method, equipment and storage medium
CN115396278A (en) * 2022-08-11 2022-11-25 西安雷风电子科技有限公司 System exception handling method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101895540A (en) * 2010-07-12 2010-11-24 中兴通讯股份有限公司 Daemon system and method for application service
CN102110217A (en) * 2009-12-28 2011-06-29 北京安码科技有限公司 Method for automatic repairing through virtual machine station rotation
CN102708027A (en) * 2012-05-11 2012-10-03 中兴通讯股份有限公司 Method and system for avoiding outage of communication device
CN102917064A (en) * 2012-10-23 2013-02-06 广州杰赛科技股份有限公司 Double-machine hot-standby method based on private cloud computing platform
CN103152419A (en) * 2013-03-08 2013-06-12 中标软件有限公司 High availability cluster management method for cloud computing platform
CN103838593A (en) * 2012-11-22 2014-06-04 华为技术有限公司 Method and system for restoring virtual machine, controller, server and hosting host
CN104572241A (en) * 2013-10-18 2015-04-29 南京中兴新软件有限责任公司 Method and device for switching over application programs and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102801806A (en) * 2012-08-10 2012-11-28 薛海强 Cloud computing system and cloud computing resource management method
CN103019849B (en) * 2012-12-31 2015-10-07 无锡城市云计算中心有限公司 Virtual machine management method under cloud computing environment

Also Published As

Publication number Publication date
CN106817238A (en) 2017-06-09

Similar Documents

Publication Publication Date Title
US11249860B2 (en) Node down recovery method and apparatus, electronic device, and storage medium
CN105790980B (en) fault repairing method and device
JP6466003B2 (en) Method and apparatus for VNF failover
WO2016107172A1 (en) Post-cluster brain split quorum processing method and quorum storage device and system
EP2882136B1 (en) Method and system for implementing remote disaster recovery switching of service delivery platform
US10764119B2 (en) Link handover method for service in storage system, and storage device
US20110289344A1 (en) Automated node fencing integrated within a quorum service of a cluster infrastructure
US11892922B2 (en) State management methods, methods for switching between master application server and backup application server, and electronic devices
WO2018095414A1 (en) Method and apparatus for detecting and recovering fault of virtual machine
WO2016045439A1 (en) Vnfm disaster-tolerant protection method and device, nfvo and storage medium
CN106936613B (en) Method and system for rapidly switching main and standby Openflow switch
CN102394914A (en) Cluster brain-split processing method and device
WO2012174893A1 (en) Dual-center disaster recovery-based switching method and device in iptv system
CN110673981B (en) Fault recovery method, device and system
WO2017092539A1 (en) Virtual machine repairing method, virtual machine device, system, and service functional network element
WO2015154620A1 (en) Openflow multi-controller system and management method therefor
CN112380062A (en) Method and system for rapidly recovering system for multiple times based on system backup point
CN111342986B (en) Distributed node management method and device, distributed system and storage medium
WO2018094686A1 (en) Smb service failure handling method, and storage device
JP6421516B2 (en) Server device, redundant server system, information takeover program, and information takeover method
CN113438111A (en) Method for restoring RabbitMQ network partition based on Raft distribution and application
WO2017080362A1 (en) Data managing method and device
US8965199B2 (en) Method and apparatus for automatically restoring node resource state in WSON system
CN110661599B (en) HA implementation method, device and storage medium between main node and standby node
US10514991B2 (en) Failover device ports

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16869849

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16869849

Country of ref document: EP

Kind code of ref document: A1