WO2017092539A1 - Virtual machine repair method, virtual machine device, system, and service function network element - Google Patents

Virtual machine repair method, virtual machine device, system, and service function network element

Info

Publication number
WO2017092539A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual machine
machine device
alarm
virtual
standby
Prior art date
Application number
PCT/CN2016/104293
Other languages
English (en)
Chinese (zh)
Inventor
张川
虞振峰
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2017092539A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis

Definitions

  • The present disclosure relates to the field of communications, for example, to a virtual machine repair method, a virtual machine device, a system, and a service function network element.
  • A virtualized network function (VNF) is usually created in a dual-machine manner, that is, with a primary virtual machine and a standby virtual machine.
  • The dual-machine VNF instance is presented as a single function node.
  • The management node monitors the dual-machine VNF instance and raises alarms for its faults.
  • The management node generates a fatal alarm only when both the primary and standby virtual machines in the dual-machine VNF instance are abnormal. The faulty primary and standby virtual machines are then repaired by manually restarting or regenerating the service instance.
  • When the dual-machine VNF instance has a dual-machine internal fault or a service function fault, the virtual node that constitutes the instance is still in the working state, so the management node can neither find nor repair the fault.
  • An internal fault of the dual-machine VNF instance may refer to a failure of one of the primary virtual machine and the standby virtual machine: either the standby virtual machine fails while the primary virtual machine works normally, or the primary virtual machine fails while the standby virtual machine works normally.
  • By the time the management node finds that the dual-machine VNF instance is faulty, both the primary and standby virtual machines have failed, and the service of the VNF instance has already been interrupted.
  • The mechanism of creating a VNF instance in the dual-machine mode for disaster recovery backup is therefore not sound enough, and its reliability is insufficient.
  • A fault at the virtual machine layer that constitutes the VNF instance is usually reported to the upper-level management node as an alarm, and the upper-level management node distinguishes among the reported virtual machine alarms.
  • The upper-level management node needs to rebuild or restart the virtual machine to fix the fault.
  • The instruction to rebuild or restart is issued by decision of the upper-level management node.
  • This requires developing a special software interface specification for the service function node (that is, the service function network element) and the dual-machine VNF instance.
  • This special software interface specification limits the openness of the system: a virtual machine that does not conform to the agreed interface specification cannot access the system. Moreover, in this top-down management mode, the upper-level management node issues repair instructions through the specific software interface only after making its decision, which is not timely enough and reduces the efficiency of system fault repair.
  • The present disclosure provides a virtual machine repair method, a virtual machine device, a system, and a service function network element, which can solve the problems that the management node cannot discover an internal fault of the dual virtual machine in time and that repair efficiency is low after the fault is found.
  • An embodiment of the present disclosure provides a virtual machine repair method, including:
  • a first virtual machine device in a dual virtual machine detects that a second virtual machine device generates a target alarm that requires virtual machine repair; and
  • the first virtual machine device initiates repair of the second virtual machine device.
  • The first virtual machine device detecting that the second virtual machine device generates the target alarm includes:
  • the first virtual machine device determines, according to a failure notification, that the second virtual machine device generates the target alarm.
  • Before the first virtual machine device detects that the second virtual machine device generates the target alarm, the method further includes:
  • the first virtual machine device is switched from the standby virtual machine to the primary virtual machine, and
  • the second virtual machine device is switched from the primary virtual machine to the standby virtual machine.
  • In this case, the first virtual machine device detecting that the second virtual machine device generates the target alarm includes:
  • after the first virtual machine device is switched to be the primary virtual machine, the first virtual machine device detects whether the second virtual machine device is normal; when it detects that the second virtual machine device is abnormal, it determines that the second virtual machine device generates a target alarm that requires virtual machine repair.
  • The method further includes:
  • the second virtual machine device has a target failure, wherein the target failure includes at least one of an abnormal interruption fault and a fatal service function abnormality.
  • The first virtual machine device detecting whether the second virtual machine device is normal includes: detecting whether the second virtual machine device is in position or whether its state is abnormal.
  • The first virtual machine device initiating repair of the second virtual machine device includes:
  • the first virtual machine device initiates a process of restarting the second virtual machine device or a process of rebuilding the second virtual machine device.
  • The method further includes: when the first virtual machine device determines that there is currently an unprocessed virtual machine rebuilding process, it initiates the process of rebuilding the second virtual machine device after a preset delay, or re-detects the target alarm.
  • The process in which the first virtual machine device initiates rebuilding of the second virtual machine device includes:
  • the first virtual machine device sends, to the virtual machine management node, a delete instruction for deleting the second virtual machine device;
  • the first virtual machine device selects, according to a preset rebuilding policy, either the original resource settings of the second virtual machine device or adjusted resource settings for the second virtual machine device, and sends a virtual machine creation instruction with the selected settings to the virtual machine management node.
  • The method further includes:
  • after the creation of the new second virtual machine device is completed, the first virtual machine device initiates an active/standby switchover instruction, so that the new second virtual machine device switches to the primary virtual machine and the first virtual machine device switches itself to the standby virtual machine;
  • after the first virtual machine device is deleted, the new second virtual machine device sends a virtual machine creation instruction to the virtual machine management node with its own resource settings, to create a new first virtual machine device with the same resource configuration as the new second virtual machine device.
  • An embodiment of the present disclosure further provides a first virtual machine device, including an alarm detection module and a virtual machine repair module;
  • the alarm detection module is configured to detect whether the second virtual machine device generates a target alarm that requires virtual machine repair;
  • the virtual machine repair module is configured to initiate repair of the second virtual machine device when the alarm detection module detects that the second virtual machine device generates a target alarm that requires virtual machine repair.
  • The alarm detection module includes a first alarm detection sub-module, configured to receive a failure notification sent by the second virtual machine device when an abnormal interruption occurs, and to determine that the second virtual machine device generates a target alarm.
  • The first virtual machine device further includes an active/standby switching module, configured to initiate an active/standby switchover when the second virtual machine device fails, switching the first virtual machine device to be the primary virtual machine.
  • The alarm detection module includes a second alarm detection sub-module, configured to determine, after the first virtual machine device is switched to be the primary virtual machine and upon detecting that the second virtual machine device is abnormal, that the second virtual machine device generates a target alarm.
  • The virtual machine repair module includes a restart sub-module or a rebuild sub-module;
  • the restart sub-module is configured to initiate a process of restarting the second virtual machine device when the detection result of the alarm detection module is yes;
  • the rebuild sub-module is configured to initiate a process of rebuilding the second virtual machine device when the detection result of the alarm detection module is yes.
  • The rebuild sub-module includes a rebuild initiating unit and a rebuilding unit;
  • the rebuild initiating unit is configured to send, to the virtual machine management node, a delete instruction for deleting the second virtual machine device;
  • the rebuilding unit is configured to, after the second virtual machine device is deleted, select, according to a preset rebuilding policy, either the original resource settings of the second virtual machine device or adjusted resource settings for the second virtual machine device, and send a virtual machine creation instruction to the virtual machine management node.
  • An embodiment of the present disclosure further provides a virtual machine system, including a second virtual machine device and the first virtual machine device described above;
  • the first virtual machine device is configured to initiate repair of the second virtual machine device when detecting that the second virtual machine device generates a target alarm that requires virtual machine repair.
  • An embodiment of the present disclosure further provides a service function network element, including the virtual machine system described above.
  • Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing computer-executable instructions for performing the virtual machine repair method described above.
  • Embodiments of the present disclosure also provide an electronic device including one or more processors, a memory, and one or more programs, the one or more programs being stored in the memory and, when executed by the one or more processors, performing the above virtual machine repair method.
  • In the virtual machine repair method, the virtual machine device, the system, and the service function network element provided by the embodiments of the present disclosure, the first virtual machine device in the dual virtual machine detects whether the second virtual machine device generates a target alarm that requires virtual machine repair.
  • An internal fault in the dual virtual machine can thus be discovered in time through self-detection inside the dual machine. When the first virtual machine device detects that the second virtual machine device generates the target alarm, it can initiate the repair itself, without the alarm having to be sent up step by step to the upper-level management node and the management node analysing it before issuing a repair instruction.
  • This removes the dependence of the virtual machine system on the upper-level management node, so fault recovery is more efficient and the method is more flexible and effective.
  • FIG. 1 is a schematic flowchart of a virtual machine repair method according to Embodiment 1 of the present disclosure.
  • FIG. 2 is a schematic flowchart of initiating a virtual machine rebuild according to Embodiment 1 of the present disclosure.
  • FIG. 3 is a schematic flowchart of performing a rebuild when resource settings are adjusted according to Embodiment 1 of the present disclosure.
  • FIG. 4 is a schematic structural diagram of a virtual machine system according to Embodiment 2 of the present disclosure.
  • FIG. 5 is a schematic structural diagram of a first virtual machine device according to Embodiment 2 of the present disclosure.
  • FIG. 6 is another schematic structural diagram of a first virtual machine device according to Embodiment 2 of the present disclosure.
  • FIG. 7 is a schematic flowchart of performing a rebuild when a service function of the primary machine is abnormal according to Embodiment 3 of the present disclosure.
  • FIG. 8 is a schematic flowchart of performing a rebuild when the primary machine is abnormally interrupted according to Embodiment 3 of the present disclosure.
  • FIG. 9 is a schematic flowchart of performing a rebuild when the standby machine is abnormally interrupted according to Embodiment 3 of the present disclosure.
  • FIG. 10 is a schematic structural diagram of the hardware of an electronic device according to Embodiment 5 of the present disclosure.
  • The embodiments of the present disclosure can detect an internal fault in the dual virtual machine in time through self-detection within the dual virtual machine. When the first virtual machine device detects that the second virtual machine device generates the target alarm, it can directly initiate repair of the second virtual machine device, which removes the dependence on the upper-level management node and improves the reliability and fault repair efficiency of the system.
  • The virtual machine repair method in this embodiment includes steps 110 and 120.
  • In step 110, the first virtual machine device in the dual virtual machine detects that the second virtual machine device generates a target alarm that requires virtual machine repair.
  • In step 120, the first virtual machine device initiates repair of the second virtual machine device.
  • The primary/standby relationship between the first virtual machine device and the second virtual machine device can change dynamically: when the first virtual machine device acts as the standby virtual machine, the second virtual machine device acts as the primary virtual machine.
  • The second virtual machine device can also perform all functions of the first virtual machine device, including but not limited to alarm detection and virtual machine repair.
  • Likewise, the first virtual machine device has all the functions of the second virtual machine device, including alarm reporting and function detection by using a third-party monitoring module.
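Steps 110 and 120 can be sketched as a small peer-to-peer loop. The following Python is only an illustration under stated assumptions: the class `VirtualMachineDevice`, its fields, and the function `repair_cycle` are hypothetical names that do not appear in the disclosure, and the "repair" is reduced to recording that it was initiated.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Role(Enum):
    PRIMARY = "primary"
    STANDBY = "standby"


@dataclass
class VirtualMachineDevice:
    """Hypothetical model of one member of the dual virtual machine."""
    name: str
    role: Role
    target_alarm: bool = False  # True when this device needs virtual machine repair
    repairs_initiated: List[str] = field(default_factory=list)

    def detect_target_alarm(self, peer: "VirtualMachineDevice") -> bool:
        # Step 110: check whether the peer has generated a target alarm.
        return peer.target_alarm

    def initiate_repair(self, peer: "VirtualMachineDevice") -> None:
        # Step 120: start the repair locally, without consulting a management node.
        self.repairs_initiated.append(peer.name)
        peer.target_alarm = False


def repair_cycle(first: VirtualMachineDevice, second: VirtualMachineDevice) -> bool:
    """Run one detection pass; return True if a repair was initiated."""
    if first.detect_target_alarm(second):
        first.initiate_repair(second)
        return True
    return False
```

Because both devices are instances of the same class, either one can play the detecting role, which mirrors the statement that each device has all the functions of the other.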
  • The first virtual machine device detecting that the second virtual machine device generates the target alarm includes at least the following cases.
  • Case 1: The first virtual machine device in step 110 is the primary virtual machine. During normal operation, the second virtual machine device, serving as the standby virtual machine, is in a standby state and monitors itself by self-detection. If an abnormal interruption fault occurs, the second virtual machine device may send a failure notification to the first virtual machine device; the first virtual machine device receives the failure notification and thereby determines and records that the second virtual machine device generates a target alarm.
  • The second virtual machine device may detect the abnormal interruption fault through its own fault detection unit and send a failure notification to the first virtual machine device; the failure notification may be a Simple Network Management Protocol (SNMP) Trap message. After receiving the SNMP Trap message, the first virtual machine device, in the normal working state, can record the abnormality corresponding to the message as a fatal fault. It should be understood that the abnormal interruption fault of the second virtual machine device in this embodiment may also be monitored by a third-party monitoring module, or jointly by the third-party monitoring module and the device's own fault detection unit.
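A minimal sketch of Case 1 on the receiving side follows. It does not use a real SNMP library: `FailureNotification` is a hypothetical stand-in for the Trap payload, and the fault-type strings and the `FirstDeviceAlarmLog` class are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List

# Assumed labels for the two target failures named in the disclosure.
TARGET_FAULTS = {"abnormal_interruption", "fatal_service_function_abnormality"}


@dataclass(frozen=True)
class FailureNotification:
    """Hypothetical stand-in for the content of an SNMP Trap message."""
    source: str
    fault_type: str


class FirstDeviceAlarmLog:
    """Records notifications from the peer that amount to a target alarm."""

    def __init__(self) -> None:
        self.fatal_faults: List[FailureNotification] = []

    def on_failure_notification(self, note: FailureNotification) -> bool:
        # Only faults that require virtual machine repair are recorded as fatal.
        if note.fault_type not in TARGET_FAULTS:
            return False
        self.fatal_faults.append(note)
        return True  # the peer has generated a target alarm
```

A return value of `True` is what would trigger the repair step; other notifications are ignored in this sketch.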
  • Case 2: The first virtual machine device in step 110 is originally the standby virtual machine, and the second virtual machine device is originally the primary virtual machine.
  • When the second virtual machine device fails and initiates an active/standby switchover, the first virtual machine device switches to the primary virtual machine and the second virtual machine device switches to the standby virtual machine according to the active/standby switchover instruction. After switching to the primary virtual machine, the first virtual machine device detects whether the second virtual machine device (now the standby virtual machine) is normal. If it detects that the second virtual machine device is abnormal, it determines that the second virtual machine device generates a target alarm.
  • Both the first virtual machine device and the second virtual machine device perform self-detection in real time; when either of them detects a fault that requires an active/standby switchover, it sends a failure notification to the other.
  • The faults for which the second virtual machine device needs to perform an active/standby switchover include at least one of an abnormal interruption fault and a fatal service function abnormality; the two are illustrated below.
  • The second virtual machine device is originally the primary virtual machine.
  • When the second virtual machine device is abnormally interrupted, it initiates an active/standby switchover instruction and switches to the standby virtual machine, while the first virtual machine device switches from the original standby virtual machine to the primary virtual machine.
  • The second virtual machine device (now the standby virtual machine) sends a failure notification to the first virtual machine device (now the primary virtual machine). Because the first virtual machine device is not yet fully started after the switch, it does not receive the failure notification; once the first virtual machine device (now the primary virtual machine) has started, it actively checks the second virtual machine device (now the standby virtual machine) to determine whether it is normal.
  • Checking whether the second virtual machine device is normal may include detecting whether the second virtual machine device is in position and/or whether its state is abnormal. For example, when the first virtual machine device detects that the second virtual machine device is not in position or that its state is abnormal, it determines that the second virtual machine device is abnormal and that the second virtual machine device generates a target alarm.
  • The second virtual machine device (now the standby virtual machine) can also detect the abnormal interruption fault through its own fault detection unit and send the failure notification to the first virtual machine device (now the primary virtual machine).
  • The failure notification can be an SNMP Trap message. It should be understood that the abnormal interruption fault of the second virtual machine device in this embodiment may also be monitored by a third-party monitoring module, alone or in combination with self-detection.
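The Case 2 sequence (switchover first, then an active check of the former primary) might look like the sketch below. All names are hypothetical: `PeerStatus`, `Device`, and the injected `probe` callable stand in for whatever mechanism the devices actually use to query each other.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PeerStatus:
    in_position: bool   # the peer is present and answers the probe
    state_normal: bool  # the peer reports a normal working state


@dataclass
class Device:
    name: str
    role: str  # "primary" or "standby"


def is_peer_abnormal(status: PeerStatus) -> bool:
    # Abnormal when the peer is not in position or its state is abnormal.
    return (not status.in_position) or (not status.state_normal)


def check_after_switchover(first: Device, second: Device,
                           probe: Callable[[Device], PeerStatus]) -> bool:
    """Swap roles, then actively probe the former primary.

    Returns True when the second device is judged to have generated
    a target alarm.
    """
    first.role, second.role = "primary", "standby"
    return is_peer_abnormal(probe(second))
```

The active probe covers the situation described above, in which the failure notification is lost because the new primary had not finished starting when it was sent.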
  • When the second virtual machine device (formerly the primary virtual machine) has a fatal service function abnormality (which may include an abnormal service process status, such as loss of a critical service process, a virtual machine resource failure, a network resource failure, and so on), the second virtual machine device issues an active/standby switchover instruction. According to the instruction, the first virtual machine device switches to the primary virtual machine, and the second virtual machine device switches to the standby virtual machine. After the first virtual machine device (now the primary virtual machine) has started, it checks the second virtual machine device (now the standby virtual machine) to determine whether it is normal.
  • Checking whether the second virtual machine device is normal here may include detecting whether the second virtual machine device is in position and/or whether its state is abnormal. For example, when the second virtual machine device is detected to be not in position or in an abnormal state, it is determined to be abnormal, and it is recorded that the second virtual machine device generates a target alarm.
  • The second virtual machine device can also detect whether a fatal service function abnormality occurs through its own service function polling detection unit; the abnormality can also be monitored by a third-party monitoring module, alone or in combination with self-detection.
  • The second virtual machine device may also perform self-detection and send a failure notification to the first virtual machine device on that basis. However, if, for example, the second virtual machine device goes down after failing, it cannot work and cannot perform self-detection. In this case, optionally, the first virtual machine device can check the status of the second virtual machine device to determine whether the second virtual machine device is abnormal.
  • The first virtual machine device initiating repair of the second virtual machine device includes: the first virtual machine device initiates a process of restarting the second virtual machine device or a process of rebuilding the second virtual machine device.
  • When the first virtual machine device initiates rebuilding of the second virtual machine device, the process, referring to FIG. 2, includes step 210 and step 220.
  • In step 210, the first virtual machine device sends, to the virtual machine management node, a delete instruction for deleting the second virtual machine device.
  • In step 220, after the second virtual machine device is deleted, the first virtual machine device selects, according to the preset rebuilding policy, either the original resource settings of the second virtual machine device or adjusted resource settings for the second virtual machine device, and sends a virtual machine creation instruction to the virtual machine management node to complete the rebuilding of the second virtual machine device.
  • The preset rebuilding policy in this embodiment can be determined by the service function network element where the virtual machine is located, which makes it more flexible and more scalable. For example, the decision can be made according to the type of the service function network element itself.
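Steps 210 and 220 can be illustrated with an in-memory stand-in for the virtual machine management node. This is a sketch under assumptions: `FakeManagementNode`, `ResourceSettings`, and the two example policies are hypothetical and do not correspond to any real management API; the rebuilding policy is modelled as a plain function from the original settings to the settings to use.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ResourceSettings:
    vcpus: int
    memory_mb: int


class FakeManagementNode:
    """In-memory stand-in for the virtual machine management node."""

    def __init__(self) -> None:
        self.vms: Dict[str, ResourceSettings] = {}
        self.log: List[Tuple[str, str]] = []

    def delete_vm(self, name: str) -> None:
        self.vms.pop(name, None)
        self.log.append(("delete", name))

    def create_vm(self, name: str, settings: ResourceSettings) -> None:
        self.vms[name] = settings
        self.log.append(("create", name))


Policy = Callable[[ResourceSettings], ResourceSettings]


def keep_original(settings: ResourceSettings) -> ResourceSettings:
    return settings


def double_memory(settings: ResourceSettings) -> ResourceSettings:
    return replace(settings, memory_mb=settings.memory_mb * 2)


def rebuild_peer(node: FakeManagementNode, peer: str, policy: Policy) -> ResourceSettings:
    original = node.vms[peer]
    node.delete_vm(peer)            # step 210: delete the faulty peer
    chosen = policy(original)       # step 220: original or adjusted settings
    node.create_vm(peer, chosen)    # step 220: send the creation instruction
    return chosen
```

Passing the policy in as a function reflects the point that the service function network element, not the management node, decides how the peer is rebuilt.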
  • After step 220, when the first virtual machine device has sent the virtual machine creation instruction to the virtual machine management node with resource settings adjusted according to the current service requirements and the like, the method further includes, as shown in FIG. 3, steps 310 to 330.
  • In step 310, after the second virtual machine device is re-created, the first virtual machine device (formerly the primary virtual machine) initiates an active/standby switchover (the switchover involves the first virtual machine device and the re-created second virtual machine device); the re-created second virtual machine device (formerly the standby virtual machine) becomes the primary virtual machine, and the first virtual machine device switches to the standby virtual machine.
  • In step 320, the re-created second virtual machine device (now the primary virtual machine) sends, to the virtual machine management node, a delete instruction for deleting the first virtual machine device (now the standby virtual machine).
  • In step 330, after the first virtual machine device is deleted, the re-created second virtual machine device (now the primary virtual machine) sends a virtual machine creation instruction to the virtual machine management node with the same resource settings as its own, so that a first virtual machine device 1 having the same resources as the second virtual machine device 2 is re-created. At this point, the rebuilding process is complete.
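Steps 310 to 330 can be traced with a small dictionary-based sketch. The function name `align_first_with_second`, the dictionary layout, and the log entries are all hypothetical; the management node is reduced to direct edits of the `vms` dictionary.

```python
from typing import Dict, List, Tuple


def align_first_with_second(vms: Dict[str, dict], first: str, second: str) -> List[Tuple[str, str]]:
    """Rebuild `first` so that its resources match the re-created `second`."""
    log: List[Tuple[str, str]] = []

    # Step 310: switchover, the re-created second device becomes primary.
    vms[second]["role"] = "primary"
    vms[first]["role"] = "standby"
    log.append(("switchover", second))

    # Step 320: the new primary asks for the old first device to be deleted.
    del vms[first]
    log.append(("delete", first))

    # Step 330: re-create the first device with a copy of the primary's resources.
    vms[first] = {"role": "standby", "resources": dict(vms[second]["resources"])}
    log.append(("create", first))
    return log
```

After the call, both devices carry the same resource settings and the re-created second device remains the primary.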
  • The rebuilding function of the virtual machine can also be controlled by a switch.
  • The method may further include: the first virtual machine device determines whether there is currently an unprocessed virtual machine rebuilding process.
  • If it determines that one exists, the creation instruction for the second virtual machine device 2 may be initiated after a preset delay, or the target alarm may be re-detected.
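The deferral rule above can be sketched as a bounded retry loop. This is one possible reading, not the disclosed implementation: `rebuild_when_idle`, the `max_attempts` bound, and the injected `sleep` parameter are assumptions added for illustration. When the attempts run out, the caller is expected to fall back to re-detecting the target alarm.

```python
import time
from typing import Callable


def rebuild_when_idle(rebuild_pending: Callable[[], bool],
                      start_rebuild: Callable[[], None],
                      delay_s: float,
                      max_attempts: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Start a rebuild once no earlier rebuild is still being processed.

    Returns True if the rebuild was started, False if every attempt found
    an unprocessed rebuild (the caller may then re-detect the target alarm).
    """
    for _ in range(max_attempts):
        if not rebuild_pending():
            start_rebuild()
            return True
        sleep(delay_s)  # wait for the preset delay before checking again
    return False
```

Injecting `sleep` keeps the function testable without real waiting.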
  • The internal fault of the virtual machine system in this embodiment can be discovered by virtual machine self-detection, and virtual machine rebuilding can also be initiated by the virtual machine itself, which removes the reliance on the management node and complements the management node's disaster recovery technology.
  • The rebuilding of the virtual machine can also be determined by the service function network element to which the virtual machine belongs, which is more convenient and flexible and has better scalability.
  • Embodiment 2:
  • The virtual machine system includes a dual virtual machine, where the dual virtual machine includes a first virtual machine device 1 and a second virtual machine device 2. The first virtual machine device 1 is configured to initiate repair of the second virtual machine device 2 when detecting that the second virtual machine device 2 generates the target alarm; the target alarm may be an alarm indicating that the second virtual machine device 2 needs to be repaired.
  • A fault detection module, an alarm detection module, and a virtual machine repair module may be disposed on each virtual machine device of the dual virtual machine in this embodiment. The fault detection module is configured to implement fault detection by itself or through a third-party monitoring module; the alarm detection module is configured to find the target alarm and trigger the virtual machine repair module to perform virtual machine repair.
  • The following describes the specific structure of the first virtual machine device 1 in combination with the cases in which the target alarm is generated.
  • The first virtual machine device 1 may include an alarm detecting module 11 and a virtual machine repair module 12;
  • the alarm detecting module 11 is configured to detect whether the second virtual machine device generates a target alarm that requires virtual machine repair;
  • the virtual machine repair module 12 is configured to initiate repair of the second virtual machine device when the detection result of the alarm detecting module 11 is yes.
  • The alarm detecting module 11 of the first virtual machine device 1 may further include a first alarm detecting sub-module 111, configured to receive, when the first virtual machine device 1 is the primary virtual machine in the normal working state and the second virtual machine device is in the standby state, the failure notification sent by the second virtual machine device 2, and thereby to determine according to the failure notification that the second virtual machine device 2 has generated the target alarm.
  • The second virtual machine device 2 can detect an abnormal interruption fault through its own fault detection unit: the fault detection module of the second virtual machine device 2 may include a fault detection unit which, on detecting that an abnormal interruption fault has occurred in the second virtual machine device, sends a failure notification to the first virtual machine device 1. The failure notification may be an SNMP Trap message.
  • The alarm detecting module of the first virtual machine device 1 in the normal working state may record the abnormality indicated by the message as a fatal fault. It should be understood that the abnormal interruption fault of the second virtual machine device 2 in this embodiment may also be monitored by a third-party monitoring module, which notifies the fault detection module after the fault is detected.
  • the alarm detection module 11 of the first virtual machine device 1 may further include a second alarm detection sub-module 112.
  • the first virtual machine device 1 is a standby virtual machine, and may have an active/standby switching module.
  • the second virtual machine device 2 is originally a primary virtual machine, and may also have an active/standby switching module.
  • when the second virtual machine device 2 discovers through self-detection that a fault has occurred, it initiates the active/standby switchover.
  • the second virtual machine device 2 switches from the original primary virtual machine to the standby virtual machine.
  • the first virtual machine device 1 switches from the original standby virtual machine to the primary virtual machine.
  • the second alarm detecting sub-module 112 of the first virtual machine device 1 actively detects whether the second virtual machine device 2 (now the standby virtual machine) is normal.
  • the first virtual machine device 1 (now the primary virtual machine) determines that the second virtual machine device 2 (now the standby virtual machine) generates a target alarm.
  • the fatal fault generated by the first virtual machine device 1 includes at least one of an abnormal interruption fault and a fatal business function abnormality; the following is an example of abnormal interruption fault and fatal business function abnormality.
  • the second virtual machine device 2 is originally the primary virtual machine.
  • an active/standby switchover command is generated to switch the second virtual machine device 2 from the original primary virtual machine to the standby virtual machine.
  • the first virtual machine device 1 is switched from the original standby virtual machine to the primary virtual machine.
  • the second virtual machine device 2 sends a failure notification to the first virtual machine device 1 (now the primary virtual machine); because the first virtual machine device 1 is not yet fully started at this time, the second alarm detecting sub-module 112 of the first virtual machine device 1 does not receive the failure notification.
  • the second alarm detecting sub-module 112 can instead actively test the second virtual machine device 2 (now the standby virtual machine) to determine whether the second virtual machine device 2 is normal.
  • the detection of whether the second virtual machine device 2 is normal may include detecting whether the second virtual machine device 2 is present and/or whether its state is abnormal. For example, when the second virtual machine device 2 is detected to be absent or in an abnormal state, it is determined that the second virtual machine device 2 is abnormal, and it is recorded that the second virtual machine device 2 has generated a target alarm.
  • the second virtual machine device 2 (now the standby virtual machine) can also detect the occurrence of an abnormal interruption fault through the fault detection unit included in its own fault detection module, and send a failure notification, which may be an SNMP Trap message, to the first virtual machine device 1. It should be understood that the abnormal interruption fault of the second virtual machine device 2 in this embodiment may also be monitored by, or in combination with, a third-party monitoring module, which then sends the monitoring result to the fault detection module of the second virtual machine device 2.
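The active check performed by the new primary after a switchover can be sketched as follows. This is a hedged illustration only: `PeerStatus`, its fields and the `"normal"` state string are hypothetical names, and how the probe actually reaches the peer is not specified by the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerStatus:
    present: bool          # whether the peer answered the probe at all
    state: Optional[str]   # reported state, or None when the peer is absent


def detect_target_alarm(status: PeerStatus) -> bool:
    """Return True when the standby peer should be recorded as having a target alarm."""
    if not status.present:
        # The peer is not present: treated as abnormal.
        return True
    # The peer is present; any state other than "normal" is treated as abnormal.
    return status.state != "normal"
```

Because the check is driven by the new primary rather than by a notification, it still works when the failure notification was sent before the new primary had finished starting.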
  • the second virtual machine device 2 is originally a primary virtual machine, and the first virtual machine device 1 is originally a standby virtual machine.
  • when the second virtual machine device 2 has a fatal service function abnormality (including but not limited to a business process state abnormality, such as that of a business-critical process), the active/standby switchover is initiated.
  • the first virtual machine device 1 switches from the original standby virtual machine to the primary virtual machine according to the active/standby switchover command.
  • the second virtual machine device 2 switches from the original primary virtual machine to the standby virtual machine.
  • the second alarm detecting sub-module 112 of the first virtual machine device 1 actively tests the second virtual machine device 2 (now the standby virtual machine) to determine whether it is normal. This detection includes, but is not limited to, checking whether the second virtual machine device 2 is present and/or whether its state is abnormal, for example when its presence is not detected.
  • the second virtual machine device 2 can also detect whether a fatal service function abnormality has occurred through the service function polling detection unit of its own fault detection module; the abnormality can also be monitored by, or in combination with, a third-party monitoring module, which sends the monitoring result to the fault detection module of the second virtual machine device 2.
  • when the second virtual machine device fails, for example because of a downtime fault, the failure can be detected by the first virtual machine device; the active/standby switching module of the first virtual machine device then initiates the active/standby switchover, the first virtual machine device is switched from the original standby virtual machine to the primary virtual machine, and the second virtual machine device is switched from the original primary virtual machine to the standby virtual machine.
  • the fault detection module in the first virtual machine device 1 and the second virtual machine device 2 in this embodiment may include a fault detection unit of the virtual machine itself and a service function polling detection unit, and may also accept the monitoring result sent by a third-party monitoring module in order to determine whether a target alarm is generated.
  • the fault detection module in this embodiment may also be configured to detect other types of alarms and send them to the alarm module, and the alarm module may screen and process the received alarms according to their level; for example, the virtual machine repair module 12 may be triggered to perform virtual machine repair.
  • the virtual machine repair module 12 of the first virtual machine device 1 includes a restart submodule 121 or a rebuild submodule 122;
  • the restarting sub-module 121 is configured to initiate a process of restarting the second virtual machine device 2 when the detection result of the alarm detecting module 11 is YES;
  • the reconstruction sub-module 122 is configured to initiate a process of reconstructing the second virtual machine device 2 when the detection result of the alarm detection module 11 is YES.
  • the restart of the second virtual machine device 2 may be carried out by the virtual machine management node, by sending a corresponding restart command to the virtual machine management node.
  • the restart of the second virtual machine device 2 can also be completed within the dual virtual machine by a corresponding restart command without passing through the virtual machine management node.
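As an illustration of how the repair module might choose between its restart and rebuild sub-modules, the sketch below dispatches on a repair mode. All names (`RepairMode`, `initiate_repair`, the command strings and the returned labels) are hypothetical; the text does not define a command format for the virtual machine management node.

```python
from enum import Enum
from typing import Callable, Optional


class RepairMode(Enum):
    RESTART = "restart"
    REBUILD = "rebuild"


def initiate_repair(
    target_alarm_detected: bool,
    mode: RepairMode,
    send_to_management_node: Callable[[str], None],
    restart_locally: Optional[Callable[[], None]] = None,
) -> str:
    """Start a repair of the peer VM and return a label for the path taken."""
    if not target_alarm_detected:
        return "none"
    if mode is RepairMode.RESTART:
        if restart_locally is not None:
            # Restart handled inside the dual virtual machines,
            # without passing through the management node.
            restart_locally()
            return "restart-local"
        send_to_management_node("restart")
        return "restart-via-node"
    # Rebuild starts by asking the management node to delete the faulty VM.
    send_to_management_node("delete")
    return "rebuild-started"
```

Passing the transports in as callables keeps the decision logic independent of whether the restart goes through the management node or stays inside the pair.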
  • the reconstruction sub-module 122 of the first virtual machine device 1 includes a reconstruction initiation unit 1221 and a reconstruction unit 1222;
  • the reconstruction initiation unit 1221 is configured to send, to the virtual machine management node, a delete instruction for deleting the second virtual machine device 2;
  • the reconstruction unit 1222 is configured, after the second virtual machine device 2 is deleted, to select either the original resource setting of the second virtual machine device 2 or an adjusted resource setting according to the preset reconstruction policy, and to send a virtual machine creation instruction to the virtual machine management node.
  • the preset re-establishment policy in this embodiment can be determined according to the service function network element where the virtual machine is located, so it is more flexible and more scalable. For example, the decision can be made according to the type of the service function network element itself.
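The delete-then-create sequence and the role of the preset reconstruction policy can be sketched as below. This is an assumption-laden illustration: `FakeManagementNode` is an in-memory stand-in for the virtual machine management node, `ResourceSetting` has invented fields, and the two example policies (`keep_original`, `double_memory`) are hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ResourceSetting:
    vcpus: int
    memory_mb: int


@dataclass
class FakeManagementNode:
    # In-memory stand-in for the virtual machine management node (assumption).
    vms: Dict[str, ResourceSetting] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def delete_vm(self, name: str) -> None:
        self.vms.pop(name)
        self.log.append(f"delete {name}")

    def create_vm(self, name: str, setting: ResourceSetting) -> None:
        self.vms[name] = setting
        self.log.append(f"create {name}")


Policy = Callable[[ResourceSetting], ResourceSetting]


def keep_original(setting: ResourceSetting) -> ResourceSetting:
    # Policy 1: re-create with the original resource setting.
    return setting


def double_memory(setting: ResourceSetting) -> ResourceSetting:
    # Policy 2 (hypothetical adjustment): re-create with twice the memory.
    return replace(setting, memory_mb=setting.memory_mb * 2)


def rebuild(node: FakeManagementNode, old: str, new: str, policy: Policy) -> ResourceSetting:
    original = node.vms[old]
    node.delete_vm(old)                    # delete instruction first
    node.create_vm(new, policy(original))  # creation instruction after deletion
    return node.vms[new]
```

Modelling the policy as a function of the original setting reflects the idea that the service function network element, not the management node, decides how the replacement is sized.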
  • the reconstruction unit 1222 of the virtual machine repair module 12 is further configured as follows: when the virtual machine creation instruction sent to the virtual machine management node uses a second virtual machine device resource setting adjusted according to current service requirements and the like, then after the second virtual machine device has been re-created, the reconstruction unit 1222 initiates an active/standby switchover (this switchover is triggered by the difference between the resources of the first virtual machine device and those of the new second virtual machine device); the newly created second virtual machine device 2 is switched from standby virtual machine to primary virtual machine, and the first virtual machine device 1 is switched from the original primary virtual machine to the standby virtual machine;
  • the virtual machine repair module of the re-created second virtual machine device 2 sends a delete instruction for deleting the first virtual machine device 1 to the virtual machine management node;
  • after the first virtual machine device 1 is deleted, the virtual machine repair module of the re-created second virtual machine device 2 sends the virtual machine management node a virtual machine creation instruction that uses the same resource setting as the newly created second virtual machine device 2, so that a first virtual machine device 1 with the same resource configuration as the second virtual machine device 2 is re-created. At this point, the reconstruction process is completed.
  • the virtual machine repair module 12 of the virtual machine can also be controlled by a switch.
  • the virtual machine repair module 12 of the first virtual machine device in the embodiment may further determine whether there is an unprocessed virtual machine rebuild process. If it is determined that there is currently an unprocessed virtual machine re-establishment process, the creation instruction of the second virtual machine device 2 may be initiated after the preset duration is delayed, or the target alarm may be re-detected.
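The guard against overlapping rebuilds might look like the following sketch, which covers only the delay-and-retry option (re-detecting the target alarm is the other option the text mentions). The function name, the retry limit and the injectable `sleep` are assumptions added so the example can be exercised without real waiting.

```python
import time
from typing import Callable


def create_when_idle(
    has_pending_rebuild: Callable[[], bool],
    send_create: Callable[[], None],
    delay_s: float = 0.0,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send the creation instruction once no rebuild is pending.

    Returns True if the instruction was sent, False if every attempt
    found a rebuild still in progress.
    """
    for _ in range(max_attempts):
        if not has_pending_rebuild():
            send_create()
            return True
        # A rebuild is still being processed: wait for the preset duration.
        sleep(delay_s)
    return False
```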
  • the active/standby relationship between the first virtual machine device 1 and the second virtual machine device 2 can change dynamically. When the first virtual machine device 1 serves as the standby virtual machine and the second virtual machine device 2 serves as the primary virtual machine, the second virtual machine device 2 can also perform all the functions of the first virtual machine device 1, including but not limited to alarm detection and virtual machine repair.
  • the first virtual machine device 1 has all the functions of the second virtual machine device 2 described above, including alarm reporting and fault detection using a third-party monitoring module.
  • internal faults of the virtual machine system can be discovered through the virtual machines' own detection, and re-creation of a virtual machine can also be initiated by the virtual machines themselves.
  • how a virtual machine is re-created can also be decided by the service function network element in which the virtual machine is located.
  • this removes the dependency on the management node and is more convenient, flexible, reliable, and scalable.
  • the figure shows how a VNF instance deployed in dual-machine mode is rebuilt and self-healed after a service function abnormality, and includes steps 701 to 705.
  • step 701 the virtual machine A serves as the primary virtual machine, and the virtual machine B serves as the standby virtual machine.
  • the virtual machine A polls and checks its own business processes. When a key service process of the virtual machine A is found to be lost, an active/standby switchover is performed and virtual machine B is switched to the primary virtual machine;
  • step 702 the virtual machine B starts and checks the state of virtual machine A; if the virtual machine A is not present or its state is abnormal, a fatal alarm is generated;
  • step 703 after the virtual machine B finds the fatal alarm, the virtual machine B initiates an instruction to delete the virtual machine A to the management node;
  • step 704 after the virtual machine A is successfully deleted, the virtual machine B, following the reconstruction policy that corresponds to the alarm type of the service function node, initiates the re-creation of a new virtual machine (called virtual machine C) with the resource setting of the original virtual machine A;
  • step 705 after the virtual machine C is successfully created, the virtual machine C is started, and the service data is synchronized from the virtual machine B to become a new standby virtual machine.
  • the virtual machine B can also initiate the re-creation of the new virtual machine (referred to as virtual machine C) with an adjusted version of the resource setting of the virtual machine A.
  • the virtual machine C is successfully started.
  • the virtual machine B initiates the active/standby switchover; the virtual machine C is switched to the primary virtual machine, and the virtual machine B is switched to the standby virtual machine.
  • the virtual machine C sends the management node an instruction to delete the virtual machine B, and the virtual machine B is deleted successfully.
  • the virtual machine C initiates the re-creation of a new virtual machine (referred to as virtual machine D) with the same resource setting as virtual machine C; after the virtual machine D is successfully created, the virtual machine D is started and synchronizes the service data from the virtual machine C to become the new standby virtual machine.
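The adjusted-resource variant can be traced with a small simulation. This sketch starts from the point where faulty virtual machine A has already been deleted and only primary B remains; the `Vm` class, the single `memory_mb` field and the event strings are illustrative assumptions, and data synchronization is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Vm:
    name: str
    role: str        # "primary" or "standby"
    memory_mb: int


def rebuild_pair_with_adjusted_resources(pair: Dict[str, Vm], adjusted_memory_mb: int) -> List[str]:
    """Replace primary B by a resized pair C (primary) and D (standby)."""
    events: List[str] = []
    b = pair["B"]
    # B creates C with the adjusted resource setting; C starts as standby.
    pair["C"] = Vm("C", "standby", adjusted_memory_mb)
    events.append("create C")
    # B initiates the switchover: C becomes primary, B becomes standby.
    pair["C"].role, b.role = "primary", "standby"
    events.append("switchover")
    # C deletes B, then re-creates a standby D with C's own resource setting.
    del pair["B"]
    events.append("delete B")
    pair["D"] = Vm("D", "standby", pair["C"].memory_mb)
    events.append("create D")
    return events
```

The second round of delete and create exists so that both members of the pair end up with the same resource configuration.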
  • the figure shows how the primary virtual machine of a VNF instance deployed in dual-machine mode is rebuilt and self-healed after an abnormal interruption, and includes steps 801 to 805.
  • step 801 virtual machine A is used as the primary virtual machine, and virtual machine B is used as the standby virtual machine.
  • virtual machine A initiates an active/standby switchover; virtual machine B is switched to the primary virtual machine, and virtual machine A is switched to the standby virtual machine;
  • step 802 the virtual machine B starts, checks the state of the standby virtual machine A, and checks whether a fatal alarm sent by the virtual machine A has been received.
  • step 803 after discovering the fatal alarm, the virtual machine B initiates an instruction to delete the virtual machine A to the management node;
  • step 804 after the virtual machine A is successfully deleted, the virtual machine B initiates the re-creation of the new virtual machine (called the virtual machine C) according to the resource setting of the original virtual machine A according to the re-establishment policy corresponding to the alarm type.
  • step 805 after the virtual machine C is successfully created, the virtual machine C is started, and the service data is synchronized from the virtual machine B to become a new standby virtual machine.
  • FIG. 9 shows how the standby virtual machine of a VNF instance deployed in dual-machine mode is rebuilt and self-healed after an abnormal interruption, and includes steps 901 to 904.
  • step 901 the virtual machine B is used as the standby virtual machine, and the virtual machine A is used as the primary virtual machine.
  • the virtual machine B sends a failure notification to the virtual machine A.
  • step 902 after recording the fatal alarm indicated by the failure notification, the virtual machine A sends the management node an instruction to delete the virtual machine B;
  • step 903 after the virtual machine B is successfully deleted, the virtual machine A, following the reconstruction policy corresponding to the alarm type, initiates the re-creation of a new virtual machine (referred to as virtual machine C) with the resource setting of the original virtual machine B.
  • step 904 after the virtual machine C is successfully created, the virtual machine C is started, and the service data is synchronized from the virtual machine A to become a new standby virtual machine.
  • the embodiment further provides a non-transitory computer readable storage medium storing computer executable instructions configured to perform any of the virtual machine repair methods described above.
  • the embodiment also provides an electronic device.
  • the electronic device is applied to the first virtual machine device, and includes:
  • One or more processors 1001, one processor 1001 is taken as an example in FIG. 10;
  • the electronic device may further include: an input device 1003 and an output device 1004.
  • the processor 1001, the memory 1002, the input device 1003, and the output device 1004 in the electronic device may be connected by a bus or other means, and the connection through the bus is taken as an example in FIG.
  • the memory 1002 is a non-transitory computer readable storage medium that can be used to store software programs, computer executable programs and modules, such as the programs, instructions or modules corresponding to the virtual machine repair method in the embodiment of the present disclosure (for example, the alarm detecting module 11 and the virtual machine repairing module 12 shown in FIG. 5).
  • the processor 1001 executes a plurality of functional applications and data processing by executing software programs, instructions or modules stored in the memory 1002, that is, implementing the virtual machine repairing method of the above method embodiments.
  • the memory 1002 can include a program storage area and a data storage area, wherein the program storage area can store an operating system and an application required by at least one function; for example, a preset reconstruction policy can be stored in the program storage area of the electronic device.
  • the storage data area can store data and the like created according to the use of the electronic device.
  • the memory 1002 may include a high speed random access memory, and may also include a nonvolatile memory such as at least one magnetic disk storage device, flash memory device, or other nonvolatile solid state storage device.
  • the memory 1002 can optionally include a memory remotely located relative to the processor 1001 that can be connected to the electronic device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the input device 1003 can be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device.
  • the output device 1004 can include a display device such as a display screen.
  • the electronic device of the embodiment of the present disclosure may further include a communication device 1005 that transmits and/or receives information through a communication network; for example, the electronic device may receive, through the communication device 1005, a target alarm sent by the second virtual machine device, and may also send, through the communication device 1005, an instruction to delete the second virtual machine device to the virtual machine management node.
  • modules or steps of the embodiments of the present disclosure may be implemented by a general-purpose computing device, and may be centralized on a single computing device or distributed over a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage medium (ROM/RAM, diskette, optical disk) and executed by a computing device; in some cases the steps shown or described may be performed in a different order than that given herein, or they may be separately fabricated into a plurality of integrated circuit modules, or a plurality of the modules or steps may be implemented as a single integrated circuit module. Therefore, embodiments of the present disclosure are not limited to any particular combination of hardware and software.
  • internal faults of the virtual machine system can be discovered through the virtual machines' own detection, so that internal faults within the dual virtual machines are discovered in time;
  • when the first virtual machine device detects that the second virtual machine device has generated the target alarm, the second virtual machine device can be repaired directly, which removes the dependence on the upper management node, improves reliability, and makes fault repair more efficient.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Disclosed are a virtual machine repair method, a virtual machine device, a system and a service function network element. A first virtual machine device of two virtual machines detects whether a second virtual machine device generates a target alarm indicating that virtual machine repair is required, and, upon detecting that the second virtual machine device generates the target alarm indicating that virtual machine repair is required, initiates repair of the second virtual machine device.
PCT/CN2016/104293 2015-11-30 2016-11-02 Virtual machine repair method, virtual machine device, system and service function network element WO2017092539A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510863669.8A CN106817238A (zh) 2015-11-30 2015-11-30 虚拟机修复方法、虚拟机装置、系统及业务功能网元
CN201510863669.8 2015-11-30

Publications (1)

Publication Number Publication Date
WO2017092539A1 true WO2017092539A1 (fr) 2017-06-08

Family

ID=58796250

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/104293 WO2017092539A1 (fr) 2015-11-30 2016-11-02 Virtual machine repair method, virtual machine device, system and service function network element

Country Status (2)

Country Link
CN (1) CN106817238A (fr)
WO (1) WO2017092539A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11803452B2 (en) * 2019-02-14 2023-10-31 Nippon Telegraph And Telephone Corporation Duplexed operation system and method therefor

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111209145A (zh) * 2018-11-21 2020-05-29 中兴通讯股份有限公司 一种基于虚机容灾的业务自愈方法、设备和存储介质
CN115396278A (zh) * 2022-08-11 2022-11-25 西安雷风电子科技有限公司 一种系统异常处理方法及装置

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101895540A (zh) * 2010-07-12 2010-11-24 中兴通讯股份有限公司 用于应用服务进程守护的系统和方法
CN102110217A (zh) * 2009-12-28 2011-06-29 北京安码科技有限公司 一种通过虚拟机岗位轮换实现自动修复的方法
CN102708027A (zh) * 2012-05-11 2012-10-03 中兴通讯股份有限公司 一种避免通信设备运行中断的方法及系统
CN102917064A (zh) * 2012-10-23 2013-02-06 广州杰赛科技股份有限公司 基于私有云计算平台的双机热备方法
CN103152419A (zh) * 2013-03-08 2013-06-12 中标软件有限公司 一种云计算平台的高可用集群管理方法
CN103838593A (zh) * 2012-11-22 2014-06-04 华为技术有限公司 恢复虚拟机的方法、系统及控制器、服务器、寄宿主机
CN104572241A (zh) * 2013-10-18 2015-04-29 南京中兴新软件有限责任公司 应用程序的切换方法及装置、系统

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102801806A (zh) * 2012-08-10 2012-11-28 薛海强 一种云计算系统及云计算资源管理方法
CN103019849B (zh) * 2012-12-31 2015-10-07 无锡城市云计算中心有限公司 云计算环境下的虚拟机管理方法

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102110217A (zh) * 2009-12-28 2011-06-29 北京安码科技有限公司 一种通过虚拟机岗位轮换实现自动修复的方法
CN101895540A (zh) * 2010-07-12 2010-11-24 中兴通讯股份有限公司 用于应用服务进程守护的系统和方法
CN102708027A (zh) * 2012-05-11 2012-10-03 中兴通讯股份有限公司 一种避免通信设备运行中断的方法及系统
CN102917064A (zh) * 2012-10-23 2013-02-06 广州杰赛科技股份有限公司 基于私有云计算平台的双机热备方法
CN103838593A (zh) * 2012-11-22 2014-06-04 华为技术有限公司 恢复虚拟机的方法、系统及控制器、服务器、寄宿主机
CN103152419A (zh) * 2013-03-08 2013-06-12 中标软件有限公司 一种云计算平台的高可用集群管理方法
CN104572241A (zh) * 2013-10-18 2015-04-29 南京中兴新软件有限责任公司 应用程序的切换方法及装置、系统

Also Published As

Publication number Publication date
CN106817238A (zh) 2017-06-09

Similar Documents

Publication Publication Date Title
US11249860B2 (en) Node down recovery method and apparatus, electronic device, and storage medium
CN105790980B (zh) 一种故障修复方法及装置
US7577720B2 (en) Migration method for software application in a multi-computing architecture, method for carrying out functional continuity implementing said migration method and multi-computing system provided therewith
JP6466003B2 (ja) Vnfフェイルオーバの方法及び装置
US9684574B2 (en) Method and system for implementing remote disaster recovery switching of service delivery platform
WO2016107172A1 (fr) Procédé de traitement de quorum après déconnexion de deux parties d'une grappe, et dispositif et système de stockage de quorum
US10764119B2 (en) Link handover method for service in storage system, and storage device
US20110289344A1 (en) Automated node fencing integrated within a quorum service of a cluster infrastructure
US11892922B2 (en) State management methods, methods for switching between master application server and backup application server, and electronic devices
WO2016058307A1 (fr) Appareil et procédé de gestion de défaut pour une ressource
WO2018095414A1 (fr) Procédé et appareil de détection et de reprise après défaillance d'une machine virtuelle
WO2016045439A1 (fr) Procédé et dispositif de protection tolérant aux catastrophes de vnfm, nfvo et support de stockage
CN106936613B (zh) 一种Openflow交换机快速主备切换的方法和系统
CN102394914A (zh) 集群脑裂处理方法和装置
WO2012174893A1 (fr) Procédé et dispositif de commutation sur la base d'une reprise sur sinistre de deux centres dans un système iptv
CN110673981B (zh) 故障恢复方法、装置和系统
WO2017092539A1 (fr) Virtual machine repair method, virtual machine device, system and service function network element
WO2015154620A1 (fr) Système à multiples contrôleurs openflow et son procédé de gestion
CN112380062A (zh) 一种基于系统备份点多次快速恢复系统的方法及系统
CN111342986B (zh) 分布式节点管理方法及装置、分布式系统、存储介质
WO2018094686A1 (fr) Procédé de gestion d'interruption de service smb, et dispositif de stockage
JP6421516B2 (ja) サーバ装置、冗長構成サーバシステム、情報引継プログラム及び情報引継方法
CN113438111A (zh) 基于Raft分布式恢复RabbitMQ网络分区的方法及应用
WO2017080362A1 (fr) Procédé et dispositif de gestion de données
US8965199B2 (en) Method and apparatus for automatically restoring node resource state in WSON system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16869849

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16869849

Country of ref document: EP

Kind code of ref document: A1