WO2017092539A1

WO2017092539A1 - Virtual machine repairing method, virtual machine device, system, and service functional network element

Info

Publication number: WO2017092539A1
Application number: PCT/CN2016/104293
Authority: WO
Inventors: 张川; 虞振峰
Original assignee: 中兴通讯股份有限公司
Priority date: 2015-11-30
Filing date: 2016-11-02
Publication date: 2017-06-08
Also published as: CN106817238A

Abstract

A virtual machine repairing method, a virtual machine device, a system, and a service functional network element. A first virtual machine device of two virtual machines detects whether a second virtual machine device generates a target warning that a virtual machine repair is required, and upon detecting that the second virtual machine device generates the target warning that a virtual machine repair is required, initiates repair of the second virtual machine device.

Description

Virtual machine repair method, virtual machine device, system, and service function network element

Technical field

The present disclosure relates to the field of communications, for example, to a virtual machine repair method, a virtual machine device, a system, and a service function network element.

Background

In the field of computer and communication virtualization, for example in the Network Function Virtualization (NFV) architecture, a Virtualized Network Function (VNF) instance is usually created in a dual-machine manner (a primary virtual machine and a standby virtual machine) to provide disaster recovery backup. On the management node, the dual-machine VNF instance is presented as a single function node, and the management node monitors it and raises alarms for its faults. The management node generates a fatal alarm only when both the primary and the standby virtual machines in the dual-machine VNF instance are abnormal; the service instance can then be manually restarted or regenerated to repair the primary and standby virtual machines.

However, when the dual-machine VNF instance has a fault internal to the dual machine or a service function fault, the virtual node that constitutes the dual-machine VNF instance is still in the working state, so the management node can neither discover nor repair the fault. An internal fault of the dual machine refers to a failure of only one of the primary virtual machine and the standby virtual machine: either the standby virtual machine fails while the primary virtual machine works normally, or the primary virtual machine fails while the standby virtual machine works normally.

Therefore, by the time the management node finds that the dual-machine VNF instance is faulty, both the primary and standby virtual machines have already failed, and the service of the VNF instance has already been interrupted.

It can be seen that the mechanism of creating a VNF instance in the dual-machine mode for disaster recovery backup is not sound enough and its reliability is insufficient. In addition, in other virtualization areas, a fault at the virtual machine layer that constitutes the VNF instance is usually reported to the upper-level management node as an alarm, and the upper-level management node classifies the reported virtual machine alarms. For an alarm generated by a fatal fault, the upper-level management node needs to rebuild or restart the virtual machine to fix the fault, and the rebuild or restart instruction is issued by decision of the upper-level management node. In this top-down management mode, a special software interface specification has to be developed between the management node and the service function node (that is, the service function network element), such as the dual-machine VNF instance. This special software interface specification limits the openness of the system: a virtual machine that does not conform to the agreed interface specification cannot access the system. Moreover, in the top-down management mode, the upper-level management node makes the decision and issues the repair instruction through the specific software interface, which is not timely enough and reduces the efficiency of system fault repair.

Summary of the invention

The present disclosure provides a virtual machine repairing method, a virtual machine device, a system, and a service function network element, which can solve the problem that the management node cannot discover the internal fault of the dual virtual machine in time and the repair efficiency is low after the fault is found.

The embodiment of the present disclosure provides a virtual machine repairing method, including:

The first virtual machine device in the dual virtual machine detects that the second virtual machine device generates a target alarm that requires virtual machine repair;

The first virtual machine device initiates repairing the second virtual machine device.

Optionally, the detecting, by the first virtual machine device, the second virtual machine device to generate the target alarm includes:

Receiving, by the first virtual machine device, a failure notification sent by the second virtual machine device when an abnormal interruption failure occurs;

The first virtual machine device determines that the second virtual machine device generates a target alarm according to the failure notification.

Optionally, before the detecting, by the first virtual machine device, the second virtual machine device to generate the target alarm, the method further includes:

The first virtual machine device is switched from the standby virtual machine to the primary virtual machine, and the second virtual machine device is switched from the primary virtual machine to the standby virtual machine.

After the first virtual machine device is switched to be the primary virtual machine, the first virtual machine device detects whether the second virtual machine device is normal. When the first virtual machine device detects that the second virtual machine device is abnormal, it determines that the second virtual machine device generates a target alarm that requires virtual machine repair.

Optionally, after the first virtual machine device detects that the second virtual machine device generates a target alarm that needs to be repaired by the virtual machine, the method further includes:

Determining that the second virtual machine device has a target failure, wherein the target failure includes at least one of an abnormal interruption failure and a fatal business function abnormality.

Optionally, the detecting, by the first virtual machine device, whether the second virtual machine device is normal comprises: detecting whether the second virtual machine device is in position or whether its state is abnormal.

Optionally, the first virtual machine device initiates repairing the second virtual machine device, including:

The first virtual machine device initiates a restart process of restarting the second virtual machine device or initiates a process of reestablishing the second virtual machine device.

Optionally, before the first virtual machine device initiates a process of reestablishing the second virtual machine device, the method further includes: when the first virtual machine device determines that there is currently an unprocessed virtual machine rebuilding process, initiating the process of reestablishing the second virtual machine device after a preset delay, or re-detecting the target alarm.

Optionally, the process of the first virtual machine device initiating reconstruction of the second virtual machine device includes:

The first virtual machine device sends a delete instruction for deleting the second virtual machine device to the virtual machine management node;

After the second virtual machine device is deleted, the first virtual machine device selects, according to a preset re-establishment policy, either the original resource setting of the second virtual machine device or an adjusted resource setting of the second virtual machine device, and sends a virtual machine creation instruction with the selected setting to the virtual machine management node.

Optionally, after the first virtual machine device sends the virtual machine creation instruction to the virtual machine management node by using the adjusted second virtual machine device resource setting, the method further includes:

After the creation of the new second virtual machine device is completed, the first virtual machine device initiates an active/standby switchover instruction, so that the new second virtual machine device switches to the primary virtual machine, and switches itself to the standby virtual machine;

Sending, by the new second virtual machine device, a delete instruction for deleting the first virtual machine device to the virtual machine management node;

After the first virtual machine device is deleted, the new second virtual machine device sends a virtual machine creation instruction to the virtual machine management node with its own resource setting, to create a new first virtual machine device having the same resource configuration as the new second virtual machine device.

The embodiment of the present disclosure further provides a first virtual machine device, including an alarm detection module and a virtual machine repair module;

The alarm detection module is configured to detect whether the second virtual machine device generates a target alarm that needs to be repaired by the virtual machine;

The virtual machine repair module is configured to initiate repair of the second virtual machine device when the alarm detection module detects that the second virtual machine device generates a target alarm that requires virtual machine repair.

Optionally, the alarm detection module includes a first alarm detection sub-module, configured to receive a failure notification sent by the second virtual machine device when an abnormal interruption fault occurs, and to determine according to the failure notification that the second virtual machine device generates a target alarm.

Optionally, the first virtual machine device further includes an active/standby switching module, configured to initiate an active/standby switchover when a fault occurs on the second virtual machine device, and to switch the first virtual machine device to be the primary virtual machine.

Optionally, the alarm detection module includes a second alarm detection sub-module, configured to determine, after the first virtual machine device is switched to be the primary virtual machine and upon detecting that the second virtual machine device is abnormal, that the second virtual machine device generates a target alarm.

Optionally, the virtual machine repair module includes a restart submodule or a rebuild submodule;

The restarting submodule is configured to initiate a restart process of restarting the second virtual machine device when the alarm detection module detects that the result is yes;

The reestablishing submodule is configured to initiate a process of reestablishing the second virtual machine device when the alarm detection module detects that the result is yes.

Optionally, when the virtual machine repair module includes a rebuild submodule, the rebuild submodule includes a rebuild initiation unit and a reconstruction unit;

The reestablishment initiating unit is configured to send, to the virtual machine management node, a delete instruction for deleting the second virtual machine device;

The reconstruction unit is configured to: after the second virtual machine device is deleted, select, according to a preset reconstruction policy, either the original resource setting of the second virtual machine device or an adjusted resource setting of the second virtual machine device, and send a virtual machine creation instruction with the selected setting to the virtual machine management node.

An embodiment of the present disclosure further provides a virtual machine system, including a second virtual machine device and a first virtual machine device as described above;

The first virtual machine device is configured to initiate repairing the second virtual machine device when detecting that the second virtual machine device generates a target alarm that requires virtual machine repair.

The embodiment of the present disclosure further provides a service function network element, including the virtual machine system as described above.

Embodiments of the present disclosure also provide a non-transitory computer readable storage medium storing computer executable instructions for performing the virtual machine repair method described above.

Embodiments of the present disclosure also provide an electronic device including one or more processors, a memory, and one or more programs, the one or more programs being stored in the memory and, when executed by the one or more processors, performing the above virtual machine repair method.

With the virtual machine repairing method, the virtual machine device, the system, and the service function network element provided by the embodiments of the present disclosure, the first virtual machine device in the dual virtual machine detects whether the second virtual machine device generates a target alarm that requires virtual machine repair. Instead of relying solely on the management node, an internal fault in the dual virtual machine can thus be discovered in time through self-detection inside the dual machine. Moreover, when the first virtual machine device detects that the second virtual machine device generates the target alarm, it can initiate repair of the second virtual machine device directly, rather than reporting the alarm step by step to the upper management node and waiting for the management node to analyze it and issue a repair instruction. This removes the dependence of the virtual machine system on the upper management node, gives higher fault recovery efficiency, and is more flexible and effective.

DRAWINGS

FIG. 1 is a schematic flowchart of a virtual machine repairing method according to Embodiment 1 of the present disclosure;

FIG. 2 is a schematic diagram of a virtual machine reconstruction process initiated according to Embodiment 1 of the present disclosure;

FIG. 3 is a schematic flowchart of performing reconstruction when adjusting resource settings according to Embodiment 1 of the present disclosure;

FIG. 4 is a schematic structural diagram of a virtual machine system according to Embodiment 2 of the present disclosure;

FIG. 5 is a schematic structural diagram of a first virtual machine device according to Embodiment 2 of the present disclosure;

FIG. 6 is another schematic structural diagram of a first virtual machine device according to Embodiment 2 of the present disclosure;

FIG. 7 is a schematic flowchart of performing reconstruction when a service function of the primary virtual machine is abnormal according to Embodiment 3 of the present disclosure;

FIG. 8 is a schematic flowchart of performing reconstruction when the primary virtual machine is abnormally interrupted according to Embodiment 3 of the present disclosure;

FIG. 9 is a schematic flowchart of performing reconstruction when the standby virtual machine is abnormally interrupted according to Embodiment 3 of the present disclosure;

FIG. 10 is a schematic structural diagram of hardware of an electronic device according to Embodiment 5 of the present disclosure.

Detailed description

The embodiments of the present disclosure can detect an internal fault in the dual virtual machine in time through self-detection inside the dual virtual machine. When the first virtual machine device detects that the second virtual machine device generates the target alarm, repair of the second virtual machine device can be initiated directly, which removes the dependence on the upper management node and improves the reliability and fault repair efficiency of the system.

Embodiment 1

Referring to FIG. 1 , the virtual machine repairing method in this embodiment includes steps 110 - 120.

In step 110, the first virtual machine device in the dual virtual machine detects that the second virtual machine device generates a target alarm that needs to be repaired by the virtual machine;

In step 120, the first virtual machine device initiates repairing the second virtual machine device.
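
As a minimal sketch of steps 110 and 120 (not the implementation of the disclosure), the following Python fragment models the first virtual machine device as an object that runs one detection pass and, if a target alarm is found, initiates repair itself instead of escalating to a management node. The class name `PeerMonitor` and both callbacks are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PeerMonitor:
    """Hypothetical stand-in for the first virtual machine device."""

    # Step 110: returns True when the peer has generated a target alarm.
    detect_target_alarm: Callable[[], bool]
    # Step 120: starts the repair (restart or rebuild) of the peer.
    initiate_repair: Callable[[], None]
    log: List[str] = field(default_factory=list)

    def run_once(self) -> bool:
        """Run one detection pass; return True if a repair was initiated."""
        if not self.detect_target_alarm():
            self.log.append("peer healthy")
            return False
        self.log.append("target alarm detected")
        # The decision is taken locally, without asking an upper node.
        self.initiate_repair()
        self.log.append("repair initiated")
        return True
```

In this sketch the detection and repair strategies are injected as callbacks, so the same loop could cover both the notification-driven and the probe-driven cases described in this embodiment.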

It should be understood that, in this embodiment, the active/standby relationship between the first virtual machine device and the second virtual machine device can change dynamically. When the first virtual machine device acts as the standby virtual machine and the second virtual machine device acts as the primary virtual machine, the second virtual machine device can also perform all functions of the first virtual machine device, including but not limited to alarm detection and virtual machine repair. Likewise, the first virtual machine device has all the functions of the second virtual machine device, including alarm reporting and function detection by using a third-party monitoring module.

In the above step 110, the first virtual machine device detects that the second virtual machine device generates the target alarm in at least the following cases.

Case 1: The first virtual machine device in step 110 is the primary virtual machine and is working normally, while the second virtual machine device, serving as the standby virtual machine, is in the standby state and monitors itself by self-detection. If an abnormal interruption fault occurs, the second virtual machine device sends a failure notification to the first virtual machine device. The first virtual machine device receives the failure notification and thereby determines and records that the second virtual machine device generates a target alarm.

In this embodiment, the second virtual machine device may detect the abnormal interruption fault through its own fault detection unit and send a failure notification to the first virtual machine device; the failure notification may be a Simple Network Management Protocol (SNMP) Trap message. After receiving the SNMP Trap message, the first virtual machine device, which is in the normal working state, can record the abnormality corresponding to the message as a fatal fault. It should be understood that the abnormal interruption fault of the second virtual machine device may also be monitored by a third-party monitoring module, or jointly by the third-party monitoring module and the device's own fault detection unit.
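
The handling of a failure notification in Case 1 might look roughly like the sketch below. It deliberately does not use any SNMP library: the notification is modelled as a plain Python object, and the names `FaultKind`, `FailureNotification`, and `AlarmRecorder`, as well as the `MINOR` fault kind, are assumptions made for illustration rather than terms from the disclosure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class FaultKind(Enum):
    ABNORMAL_INTERRUPTION = "abnormal_interruption"
    FATAL_SERVICE_FUNCTION = "fatal_service_function"
    MINOR = "minor"  # hypothetical non-fatal kind, for contrast only


@dataclass(frozen=True)
class FailureNotification:
    """Simplified stand-in for the payload of a trap-style notification."""

    source_vm: str
    kind: FaultKind


# Fault kinds that are treated as fatal and therefore raise a target alarm.
FATAL_KINDS = {FaultKind.ABNORMAL_INTERRUPTION, FaultKind.FATAL_SERVICE_FUNCTION}


class AlarmRecorder:
    """Runs on the healthy (primary) device and records peer target alarms."""

    def __init__(self, peer_name: str) -> None:
        self.peer_name = peer_name
        self.target_alarms: List[FailureNotification] = []

    def on_notification(self, note: FailureNotification) -> bool:
        """Return True if the notification was recorded as a target alarm."""
        if note.source_vm != self.peer_name:
            return False  # not from the paired device, ignore
        if note.kind not in FATAL_KINDS:
            return False  # not a fault that requires virtual machine repair
        self.target_alarms.append(note)
        return True
```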

Case 2: The first virtual machine device in step 110 is originally the standby virtual machine, and the second virtual machine device is originally the primary virtual machine. When a fault occurs on the second virtual machine device, an active/standby switchover is initiated. According to the active/standby switchover instruction, the first virtual machine device switches to the primary virtual machine and the second virtual machine device switches to the standby virtual machine. After the first virtual machine device has switched to the primary virtual machine, it detects whether the second virtual machine device (now the standby virtual machine) is normal. If the first virtual machine device detects that the second virtual machine device is abnormal, it determines that the second virtual machine device generates a target alarm.

It should be noted that both the first virtual machine device and the second virtual machine device perform self-detection in real time. When either of them detects a fault that requires an active/standby switchover, it sends a failure notification to the other. In this embodiment, the fault of the second virtual machine device that requires an active/standby switchover includes at least one of an abnormal interruption fault and a fatal service function abnormality; the two are illustrated separately below.

The second virtual machine device is originally the primary virtual machine. When an abnormal interruption fault occurs on the second virtual machine device, the second virtual machine device issues an active/standby switchover instruction and switches to the standby virtual machine, and the first virtual machine device switches from the original standby virtual machine to the primary virtual machine.

After the switchover is completed, the second virtual machine device (now the standby virtual machine) sends a failure notification to the first virtual machine device (now the primary virtual machine). Because the first virtual machine device has not fully started at this time, it does not receive the failure notification. After the first virtual machine device (now the primary virtual machine) has started, it actively detects whether the second virtual machine device (now the standby virtual machine) is normal. This detection may include detecting whether the second virtual machine device is in position and/or whether its state is abnormal. For example, when the first virtual machine device detects that the second virtual machine device is not in position or that its state is abnormal, it determines that the second virtual machine device is abnormal and that the second virtual machine device generates a target alarm.

In this embodiment, the second virtual machine device (now the standby virtual machine) can also detect the abnormal interruption fault through its own fault detection unit and send the failure notification, which can be an SNMP Trap message, to the first virtual machine device (now the primary virtual machine). It should be understood that the abnormal interruption fault of the second virtual machine device may also be monitored by a third-party monitoring module, alone or in combination with the fault detection unit.

When the second virtual machine device (formerly the primary virtual machine) has a fatal service function abnormality (which may include an abnormal service process status, such as the loss of a critical service process, a virtual machine resource failure, a network resource failure, and so on), the second virtual machine device issues an active/standby switchover instruction. According to this instruction, the first virtual machine device switches to the primary virtual machine and the second virtual machine device switches to the standby virtual machine. After the first virtual machine device (now the primary virtual machine) has started, it detects whether the second virtual machine device (now the standby virtual machine) is normal. This detection may include detecting whether the second virtual machine device is in position and/or whether its state is abnormal; for example, when the second virtual machine device is detected to be not in position or in an abnormal state, it is determined to be abnormal and is recorded as generating a target alarm. In this embodiment, the second virtual machine device can detect whether a fatal service function abnormality has occurred through its own service function polling detection unit; the abnormality may also be monitored by a third-party monitoring module, or by both in combination.

It should be noted that the faults described above that require an active/standby switchover do not necessarily cause the second virtual machine device to stop working; they can be understood as faults that only prevent the second virtual machine device from operating normally. In that case, the second virtual machine device can still perform self-detection and send a failure notification to the first virtual machine device. If, however, the second virtual machine device goes down after the fault occurs, it cannot work and cannot perform self-detection. In this case, optionally, the first virtual machine device can detect the status of the second virtual machine device to determine whether the second virtual machine device is abnormal.
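
The active check used in Case 2, where the new primary virtual machine probes the peer for presence and state, can be sketched as follows. The `PeerStatus` structure and the set of states regarded as normal are assumptions chosen for illustration; the disclosure only states that being not in position or having an abnormal state counts as abnormal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerStatus:
    """Hypothetical result of probing the paired virtual machine device."""

    in_position: bool      # whether the peer is present and reachable at all
    state: Optional[str]   # reported state, or None if nothing was reported


# Assumed set of states regarded as normal for a standby peer.
NORMAL_STATES = {"running"}


def peer_generates_target_alarm(status: PeerStatus) -> bool:
    """Return True if the probe result should be recorded as a target alarm."""
    if not status.in_position:
        # Covers the case where the peer is down and cannot notify by itself.
        return True
    return status.state not in NORMAL_STATES
```

Because the check is driven by the primary virtual machine rather than by a notification from the peer, it still works when the peer has gone down and cannot perform self-detection.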

In step 120, the first virtual machine device initiating the second virtual machine device repair includes: the first virtual machine device initiates a restart process of restarting the second virtual machine device or initiates a process of reestablishing the second virtual machine device.

When the first virtual machine device initiates a restart process of restarting the second virtual machine device, it may send a corresponding restart instruction to the virtual machine management node, so that the second virtual machine device is restarted by the virtual machine management node; alternatively, the restart of the second virtual machine device may be completed inside the dual virtual machine through a corresponding restart command, without passing through the virtual machine management node.

When the first virtual machine device initiates reconstruction of the second virtual machine device, the process, shown in FIG. 2, includes step 210 and step 220.

In step 210, the first virtual machine device sends a delete instruction for deleting the second virtual machine device to the virtual machine management node;

In step 220, after the second virtual machine device is deleted, the first virtual machine device selects, according to the preset reconstruction policy, either the original resource setting of the second virtual machine device or an adjusted resource setting of the second virtual machine device, and sends a virtual machine creation instruction with the selected setting to the virtual machine management node to complete the reconstruction of the second virtual machine device.
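
Steps 210 and 220 can be sketched as a delete-then-create sequence with a pluggable policy. Everything named here is hypothetical: `FakeManagementNode` is an in-memory stand-in for the virtual machine management node, `ResourceSetting` is an assumed shape for a resource setting, and the two policy functions are only examples of what a preset reconstruction policy might select.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ResourceSetting:
    vcpus: int
    memory_mb: int


class FakeManagementNode:
    """In-memory stand-in for the virtual machine management node."""

    def __init__(self) -> None:
        self.vms: Dict[str, ResourceSetting] = {}
        self.calls: List[Tuple[str, str]] = []

    def delete_vm(self, name: str) -> None:
        self.calls.append(("delete", name))
        self.vms.pop(name, None)

    def create_vm(self, name: str, setting: ResourceSetting) -> None:
        self.calls.append(("create", name))
        self.vms[name] = setting


Policy = Callable[[ResourceSetting], ResourceSetting]


def keep_original(setting: ResourceSetting) -> ResourceSetting:
    return setting


def double_memory(setting: ResourceSetting) -> ResourceSetting:
    # Example of an adjusted setting; the factor is an arbitrary assumption.
    return replace(setting, memory_mb=setting.memory_mb * 2)


def rebuild_peer(node: FakeManagementNode, peer: str,
                 original: ResourceSetting, policy: Policy) -> ResourceSetting:
    node.delete_vm(peer)               # step 210: delete instruction
    if peer in node.vms:
        raise RuntimeError("peer was not deleted; refusing to re-create it")
    setting = policy(original)         # step 220: original or adjusted setting
    node.create_vm(peer, setting)      # step 220: creation instruction
    return setting
```

Passing the policy as a function mirrors the statement that the reconstruction policy can be decided by the service function network element rather than fixed in the repair logic.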

The preset re-establishment policy in this embodiment can be determined according to the service function network element where the virtual machine is located, so it is more flexible and more scalable. For example, the decision can be made according to the type of the service function network element itself.

In the foregoing step 220, after the first virtual machine device sends the virtual machine creation instruction to the virtual machine management node with the adjusted resource setting of the second virtual machine device (adjusted, for example, according to current service requirements), the method further includes step 310 to step 330, as shown in FIG. 3.

In step 310, after the second virtual machine device is re-created, the first virtual machine device (formerly the primary virtual machine) initiates an active/standby switchover (the switchover is initiated because the first virtual machine device and the re-created second virtual machine device now have different resource settings); the re-created second virtual machine device (formerly the standby virtual machine) switches to the primary virtual machine, and the first virtual machine device switches to the standby virtual machine;

In step 320, the recreated second virtual machine device (now the primary virtual machine) sends a delete command to delete the first virtual machine device (now the standby virtual machine) to the virtual machine management node;

In step 330, after the first virtual machine device is deleted, the re-created second virtual machine device (now the primary virtual machine) sends a virtual machine creation instruction to the virtual machine management node with the same settings as its own resources, so that a first virtual machine device 1 having the same resources as the second virtual machine device 2 is re-created. At this point, the reconstruction process is completed.
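
The sequence of steps 310 to 330 amounts to a rolling replacement: once one side has been rebuilt with adjusted resources, the roles are swapped and the other side is rebuilt to match. The sketch below illustrates that ordering only; the `DualVm` class, the tuple shape of a setting, and the event strings are assumptions introduced for this example.

```python
from typing import Dict, List, Tuple

Setting = Tuple[int, int]  # (vCPUs, memory in MB); an assumed shape


class DualVm:
    """Toy model of the dual virtual machine and its management operations."""

    def __init__(self, primary: str, standby: str,
                 settings: Dict[str, Setting]) -> None:
        self.primary = primary
        self.standby = standby
        self.settings = dict(settings)
        self.events: List[str] = []

    def switchover(self) -> None:
        self.primary, self.standby = self.standby, self.primary
        self.events.append(f"switchover:{self.primary}")

    def delete(self, name: str) -> None:
        del self.settings[name]
        self.events.append(f"delete:{name}")

    def create(self, name: str, setting: Setting) -> None:
        self.settings[name] = setting
        self.events.append(f"create:{name}")


def align_resources(pair: DualVm) -> None:
    """Assumes the standby was just re-created with an adjusted setting."""
    pair.switchover()                                      # step 310
    new_primary, old_primary = pair.primary, pair.standby
    pair.delete(old_primary)                               # step 320
    pair.create(old_primary, pair.settings[new_primary])   # step 330
```

At every point in this sequence one of the two devices remains in service as the primary virtual machine, which is why the switchover precedes the deletion.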

In addition, in this embodiment, the reconstruction function of the virtual machine can also be controlled by a switch.

Since only one virtual machine reestablishment request should be in progress at a time, a corresponding mechanism can be used to protect this uniqueness against the frequent target alarms that may occur in the system. Therefore, before the first virtual machine device in this embodiment initiates the process of reestablishing the second virtual machine device, the method may further include: determining, by the first virtual machine device, whether there is currently an unprocessed virtual machine reconstruction process; if so, the creation instruction for the second virtual machine device 2 may be initiated after a preset delay, or the target alarm may be re-detected.
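
One possible reading of this guard is sketched below. The text offers delaying and re-detecting as alternatives; this sketch combines them (wait for the preset delay, then re-check the alarm), which is an assumption of the example and not something the disclosure prescribes. The function name, its parameters, the retry limit, and the returned strings are all hypothetical.

```python
import time
from typing import Callable


def guarded_rebuild(rebuild_pending: Callable[[], bool],
                    start_rebuild: Callable[[], None],
                    alarm_still_active: Callable[[], bool],
                    delay_s: float = 1.0,
                    max_waits: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Start a rebuild only when no other rebuild is still unprocessed."""
    waits = 0
    while rebuild_pending():
        if waits >= max_waits:
            return "gave_up"           # assumed limit; not from the source
        sleep(delay_s)                 # delay by the preset duration
        waits += 1
        if not alarm_still_active():   # re-detect the target alarm
            return "alarm_cleared"
    start_rebuild()
    return "rebuild_started"
```

The `sleep` parameter is injected so that the delay can be replaced in tests or driven by a scheduler instead of blocking.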

It can be seen that, in this embodiment, an internal fault of the virtual machine system can be discovered by the self-detection of the virtual machines, and virtual machine reconstruction can also be initiated by the virtual machines themselves. This removes the reliance on the management node and complements the disaster recovery technology of the management node. In addition, the reconstruction of the virtual machine can be decided by the service function network element where the virtual machine is located, which is more convenient and flexible and has better scalability.

Embodiment 2:

This embodiment provides a virtual machine system. As shown in FIG. 4, the virtual machine system includes a dual virtual machine, where the dual virtual machine includes a first virtual machine device 1 and a second virtual machine device 2, and the first virtual machine device 1 is configured to initiate the second virtual machine device 2 repair when detecting that the second virtual machine device 2 generates the target alarm; the target alarm may be an alarm that needs to be repaired to the second virtual machine device 2.

The fault detection module, the alarm detection module, and the virtual machine repair module may be disposed on each virtual machine device of the dual virtual machine in this embodiment. The fault detection module is configured to implement fault detection, either by itself or through a third-party monitoring module; the alarm detection module is configured to discover the target alarm and trigger the virtual machine repair module to perform virtual machine repair. The specific structure of the first virtual machine device 1 is described below in combination with the cases in which the target alarm is generated.

Referring to FIG. 5, the first virtual machine device 1 may include an alarm detecting module 11 and a virtual machine repairing module 12;

The alarm detecting module 11 is configured to detect whether the second virtual machine device generates a target alarm that needs to be repaired by the virtual machine;

The virtual machine repair module 12 is configured to initiate repair of the second virtual machine device when the alarm detection module 11 detects that the result is yes.

The alarm detecting module 11 of the first virtual machine device 1 may further include a first alarm detecting sub-module 111. When the first virtual machine device 1 is the primary virtual machine in the normal working state and the second virtual machine device 2 is the standby virtual machine, the first alarm detecting sub-module 111 is configured to receive the failure notification sent by the second virtual machine device 2 when an abnormal interruption fault occurs on the second virtual machine device 2, and to determine according to the failure notification that the second virtual machine device 2 has generated the target alarm.

In this embodiment, the second virtual machine device 2 can detect an abnormal interruption fault through its own fault detection unit: the fault detection module of the second virtual machine device 2 can include a fault detection unit which, on detecting that an abnormal interruption fault has occurred on the second virtual machine device 2, sends a failure notification to the first virtual machine device 1. The failure notification may be an SNMP Trap message.

After receiving the SNMP Trap message, the alarm module of the first virtual machine device 1 in the normal working state may record the abnormality indicated by the message as a fatal failure. It should be understood that the abnormal interruption fault of the second virtual machine device 2 in this embodiment may also be monitored by using a third-party monitoring module, and the fault detection module is notified after the fault is detected.
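The notification path described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `FailureNotification` record, the `kind` label, and the `FirstAlarmDetector` class are hypothetical names, and a plain in-memory object stands in for a real SNMP Trap message.

```python
from dataclasses import dataclass, field

# Hypothetical notification record; a real system would decode an SNMP Trap.
@dataclass(frozen=True)
class FailureNotification:
    source_vm: str   # which VM reported the fault
    kind: str        # assumed label, e.g. "abnormal_interruption"

@dataclass
class FirstAlarmDetector:
    """Runs on the primary VM and records target alarms raised by its peer."""
    peer_vm: str
    target_alarms: list = field(default_factory=list)

    def on_notification(self, note: FailureNotification) -> bool:
        # Only a notification from the paired VM about an abnormal
        # interruption is recorded as a fatal (target) alarm.
        if note.source_vm == self.peer_vm and note.kind == "abnormal_interruption":
            self.target_alarms.append(note)
            return True
        return False

detector = FirstAlarmDetector(peer_vm="vm2")
raised = detector.on_notification(FailureNotification("vm2", "abnormal_interruption"))
ignored = detector.on_notification(FailureNotification("vm9", "abnormal_interruption"))
```

In this sketch a notification from an unrelated VM is simply ignored; how such messages are actually treated is not specified in the description.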

The alarm detection module 11 of the first virtual machine device 1 may further include a second alarm detection sub-module 112. In this case the first virtual machine device 1 is originally the standby virtual machine and the second virtual machine device 2 is originally the primary virtual machine, and each may have an active/standby switching module. When the second virtual machine device 2 discovers through self-detection that it has a fault, an active/standby switchover is initiated: the second virtual machine device 2 switches from the original primary virtual machine to the standby virtual machine, and the first virtual machine device 1 switches from the original standby virtual machine to the primary virtual machine. After the first virtual machine device 1 has switched to the primary virtual machine, its second alarm detecting sub-module 112 actively detects whether the second virtual machine device 2 (now the standby virtual machine) is normal. If it is not, the first virtual machine device 1 (now the primary virtual machine) determines that the second virtual machine device 2 (now the standby virtual machine) has generated a target alarm.

In this embodiment, the fatal fault of the second virtual machine device 2 includes at least one of an abnormal interruption fault and a fatal service function abnormality; the two cases are described below in turn.

The second virtual machine device 2 is originally the primary virtual machine. When the second virtual machine device 2 has an abnormal interruption fault, an active/standby switchover command is generated, which switches the second virtual machine device 2 from the original primary virtual machine to the standby virtual machine and switches the first virtual machine device 1 from the original standby virtual machine to the primary virtual machine. At the same time, the second virtual machine device 2 sends a failure notification to the first virtual machine device 1 (now the primary virtual machine). Because the first virtual machine device 1 (now the primary virtual machine) is not yet fully started at this time, its second alarm detecting sub-module 112 does not receive the failure notification. After the first virtual machine device 1 has started, the second alarm detecting sub-module 112 can actively check the second virtual machine device 2 (now the standby virtual machine) to determine whether it is normal. This check may include detecting whether the second virtual machine device 2 is in position and/or whether its state is abnormal. For example, when it is detected that the second virtual machine device 2 is not in position or its state is abnormal, the second virtual machine device 2 is determined to be abnormal and is recorded as having generated a target alarm. In this embodiment, the second virtual machine device 2 (now the standby virtual machine) can also detect the occurrence of an abnormal interruption fault through the fault detecting unit included in its own fault detecting module, and send a failure notification, which can be an SNMP Trap message, to the first virtual machine device 1 (now the primary virtual machine).

It should be understood that the abnormal interruption fault of the second virtual machine device 2 in this embodiment may also be monitored by, or in combination with, a third-party monitoring module, which then sends the monitoring result to the fault detection module of the second virtual machine device 2.

The second virtual machine device 2 is originally the primary virtual machine, and the first virtual machine device 1 is originally the standby virtual machine. When the second virtual machine device 2 has a fatal service function abnormality (including but not limited to a service process state abnormality, such as an abnormality of a key service process), an active/standby switchover is initiated: according to the active/standby switchover command, the first virtual machine device 1 switches from the original standby virtual machine to the primary virtual machine, and the second virtual machine device 2 switches from the original primary virtual machine to the standby virtual machine. After the first virtual machine device 1 (now the primary virtual machine) has started, its second alarm detecting sub-module 112 actively checks the second virtual machine device 2 (now the standby virtual machine) to determine whether it is normal. This check includes, but is not limited to, detecting whether the second virtual machine device 2 is in position and/or whether its state is abnormal; for example, when it is detected that the second virtual machine device 2 is not in position or its state is abnormal, the second virtual machine device 2 is determined to be abnormal and is recorded as having generated a target alarm. In this embodiment, the second virtual machine device 2 can detect whether a fatal service function abnormality has occurred through the service function polling detecting unit of its own fault detecting module; the abnormality can also be monitored by, or in combination with, a third-party monitoring module, which sends the monitoring result to the fault detecting module of the second virtual machine device 2.
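The active check performed by the new primary after a switchover can be illustrated with a short sketch. The `PeerStatus` record and the `detect_target_alarm` function are hypothetical names introduced here; treating a missing answer as "not in position" is an assumption of this sketch rather than something the description states.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PeerStatus:
    in_position: bool      # peer VM is present and reachable
    state_abnormal: bool   # peer VM reports an abnormal state

def detect_target_alarm(status: Optional[PeerStatus]) -> bool:
    """Return True when the standby peer should be recorded as having a target alarm."""
    if status is None:
        # Assumption: no answer at all is treated like "not in position".
        return True
    # Either condition alone is enough to judge the peer abnormal.
    return (not status.in_position) or status.state_abnormal

healthy = detect_target_alarm(PeerStatus(in_position=True, state_abnormal=False))
missing = detect_target_alarm(PeerStatus(in_position=False, state_abnormal=False))
```

Because this check runs only after the new primary has started, it also covers the case in which the failure notification was sent while the new primary was still starting and was therefore never received.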

In addition, it should be noted that when the second virtual machine device fails, for example by going down, the failure can also be detected by the first virtual machine device. In that case the active/standby switching module of the first virtual machine device initiates the active/standby switchover, so that the first virtual machine device is switched from the original standby virtual machine to the primary virtual machine, and the second virtual machine device is switched from the original primary virtual machine to the standby virtual machine.

It can be seen that the fault detection module in the first virtual machine device 1 and the second virtual machine device 2 in this embodiment may include a fault detection unit of the virtual machine itself and a service function polling detection unit, and may also accept the monitoring result sent by a third-party monitoring module, in order to determine whether a target alarm is generated. It should be understood that the fault detection module in this embodiment may also be configured to detect other types of alarms and send them to the alarm module, and the alarm module may screen and process the received alarms by level; for example, when a screened alarm is the target alarm, the virtual machine repair module 12 is triggered to perform virtual machine repair.
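The level-based screening performed by the alarm module might look like the sketch below. The `AlarmLevel` scale, the alarm names, and the rule that only fatal alarms count as target alarms are assumptions made for illustration; the description only says that alarms are screened by level and that a target alarm triggers repair.

```python
from enum import IntEnum

class AlarmLevel(IntEnum):   # hypothetical severity scale
    MINOR = 1
    MAJOR = 2
    FATAL = 3

def screen_alarms(alarms, on_target):
    """Call on_target for each alarm treated as a target alarm; return the rest."""
    remaining = []
    for name, level in alarms:
        if level == AlarmLevel.FATAL:   # assumption: fatal alarms are target alarms
            on_target(name)             # e.g. trigger the virtual machine repair module
        else:
            remaining.append((name, level))
    return remaining

repaired = []
left = screen_alarms(
    [("disk_usage_high", AlarmLevel.MINOR), ("abnormal_interruption", AlarmLevel.FATAL)],
    repaired.append,
)
```

Here `repaired.append` stands in for the call into the virtual machine repair module 12.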

Referring to FIG. 6, the virtual machine repair module 12 of the first virtual machine device 1 includes a restart submodule 121 or a rebuild submodule 122;

The restarting sub-module 121 is configured to initiate a process of restarting the second virtual machine device 2 when the detection result of the alarm detecting module 11 is YES;

The reconstruction sub-module 122 is configured to initiate a process of reconstructing the second virtual machine device 2 when the detection result of the alarm detection module 11 is YES.

When the restarting sub-module 121 of the first virtual machine device 1 initiates the process of restarting the second virtual machine device 2, the restart may be carried out by the virtual machine management node, with the restarting sub-module 121 sending a corresponding restart command to the virtual machine management node. Alternatively, the restart of the second virtual machine device 2 can be completed within the dual virtual machine by a corresponding restart command, without passing through the virtual machine management node.
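The two restart paths can be sketched as a small dispatch function. The function name, the callback signatures, and the choice to prefer the management node whenever a channel to it is supplied are all assumptions of this sketch; the description does not say which path takes priority.

```python
from typing import Callable, Optional

Sender = Callable[[str, str], None]   # (command, vm_id) -> None

def initiate_restart(vm_id: str,
                     send_to_manager: Optional[Sender],
                     send_direct: Sender) -> str:
    """Send a restart command for vm_id and return the path that was used."""
    if send_to_manager is not None:
        send_to_manager("restart", vm_id)   # path 1: via the VM management node
        return "management_node"
    send_direct("restart", vm_id)           # path 2: inside the dual-VM pair
    return "direct"

sent_direct, sent_manager = [], []
path_a = initiate_restart("vm2", None, lambda c, v: sent_direct.append((c, v)))
path_b = initiate_restart("vm2",
                          lambda c, v: sent_manager.append((c, v)),
                          lambda c, v: sent_direct.append((c, v)))
```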

The reconstruction sub-module 122 of the first virtual machine device 1 includes a reconstruction initiation unit 1221 and a reconstruction unit 1222;

The reestablishment initiating unit 1221 is configured to send a delete instruction for deleting the second virtual machine device 2 to the virtual machine management node;

The reconstruction unit 1222 is configured to, after the second virtual machine device 2 is deleted, select according to the preset reconstruction policy either the original resource setting of the second virtual machine device 2 or an adjusted resource setting of the second virtual machine device 2, and send a virtual machine creation instruction to the virtual machine management node.
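The delete-then-create sequence and the policy choice between original and adjusted resources can be sketched as follows. `ResourceSetting`, `FakeManagementNode`, and the policy strings are hypothetical, and the management node is modeled as a synchronous in-memory stub, whereas a real system would wait for the deletion to be confirmed before sending the creation instruction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResourceSetting:
    vcpus: int
    memory_mb: int

class FakeManagementNode:
    """Stand-in for the VM management node; records the instructions it receives."""
    def __init__(self):
        self.log = []

    def delete_vm(self, vm_id):
        self.log.append(("delete", vm_id))

    def create_vm(self, vm_id, setting):
        self.log.append(("create", vm_id, setting))

def rebuild(manager, vm_id, original, policy, adjusted=None):
    """Delete the failed VM, then re-create it with the setting chosen by the policy."""
    manager.delete_vm(vm_id)                       # rebuild initiation: delete instruction
    use_adjusted = policy == "adjusted" and adjusted is not None
    setting = adjusted if use_adjusted else original
    manager.create_vm(vm_id, setting)              # reconstruction: creation instruction
    return setting

node = FakeManagementNode()
base = ResourceSetting(vcpus=2, memory_mb=4096)
chosen = rebuild(node, "vm2", base, policy="original")
```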

The reconstruction unit 1222 of the virtual machine repair module 12 is further configured as follows. When the virtual machine creation instruction is sent to the virtual machine management node with a second virtual machine device resource setting that has been adjusted according to factors such as the current service requirement, then after the second virtual machine device 2 has been re-created, the reconstruction unit 1222 initiates an active/standby switchover (the switchover is needed because the resources of the first virtual machine device 1 and the new second virtual machine device 2 differ). The newly created second virtual machine device 2 is switched from the original standby virtual machine to the primary virtual machine, and the first virtual machine device 1 is switched from the original primary virtual machine to the standby virtual machine;

The virtual machine repair module of the re-created second virtual machine device 2 sends a delete instruction for deleting the first virtual machine device 1 to the virtual machine management node;

After the first virtual machine device 1 is deleted, the virtual machine repair module of the re-created second virtual machine device 2 sends a virtual machine creation instruction to the virtual machine management node, using the same resource setting as the newly created second virtual machine device 2, so that a first virtual machine device 1 with the same resource configuration as the second virtual machine device 2 is re-created. At this point, the reconstruction process is complete.
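The rolling replacement described above, in which both virtual machines end up with the adjusted resources, can be traced with a small simulation. The dictionary layout, the VM names, and the `vcpus` field are illustrative assumptions; the point of the sketch is only the order of the steps and the final state.

```python
def rebuild_with_adjusted_resources(pair, adjusted):
    """pair maps VM name -> {"role": ..., "resources": ...}; returns the step trace."""
    trace = []
    # 1. The first VM (primary) has the failed second VM deleted and re-created
    #    with the adjusted resource setting.
    trace.append("delete VM2")
    pair["VM2"] = {"role": "standby", "resources": adjusted}
    trace.append("create VM2 (adjusted)")
    # 2. Active/standby switchover, because the two VMs now differ in resources.
    pair["VM2"]["role"], pair["VM1"]["role"] = "primary", "standby"
    trace.append("switchover: VM2 primary, VM1 standby")
    # 3. The new primary has VM1 deleted and re-created with matching resources.
    trace.append("delete VM1")
    pair["VM1"] = {"role": "standby", "resources": dict(pair["VM2"]["resources"])}
    trace.append("create VM1 (same as VM2)")
    return trace

pair = {
    "VM1": {"role": "primary", "resources": {"vcpus": 2}},
    "VM2": {"role": "standby", "resources": {"vcpus": 2}},
}
trace = rebuild_with_adjusted_resources(pair, {"vcpus": 4})
```

After the last step the two virtual machines again have identical resource configurations, with their primary and standby roles exchanged relative to the start.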

In addition, in this embodiment, the virtual machine repair module 12 of the virtual machine can also be controlled by a switch.

Since a virtual machine rebuild request must be unique, a corresponding mechanism can be used to protect the uniqueness of virtual machine reconstruction against target alarms that may occur frequently in the system. Therefore, before the virtual machine repair module 12 of the first virtual machine device 1 in this embodiment initiates the process of rebuilding the second virtual machine device 2, it may further determine whether there is currently an unprocessed virtual machine rebuild process. If so, the process of rebuilding the second virtual machine device 2 may be initiated after a delay of a preset duration, or the target alarm may be re-detected.
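One way to guard against duplicate rebuilds is sketched below. The description offers two alternatives (delay for a preset duration, or re-detect the target alarm); combining them so that the caller first waits and then falls back to re-detection is an assumption of this sketch, as are the function name and the use of a set to track pending rebuilds.

```python
import time

def start_rebuild(pending_rebuilds: set, vm_id: str, delay_s: float = 0.0,
                  sleep=time.sleep) -> str:
    """Start a rebuild unless one is already pending; otherwise wait and re-check once."""
    if vm_id in pending_rebuilds:
        sleep(delay_s)                    # wait for the preset duration
        if vm_id in pending_rebuilds:     # still unprocessed: re-detect the alarm instead
            return "redetect"
    pending_rebuilds.add(vm_id)           # mark this rebuild as in progress
    return "started"

pending = set()
first = start_rebuild(pending, "vm2")
second = start_rebuild(pending, "vm2", sleep=lambda s: None)
```

The `sleep` parameter is injectable only so that the behavior can be exercised without real waiting.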

It should be understood that, in this embodiment, the active/standby relationship between the first virtual machine device 1 and the second virtual machine device 2 can change dynamically. When the first virtual machine device 1 serves as the standby virtual machine and the second virtual machine device 2 serves as the primary virtual machine, the second virtual machine device 2 can also perform all the functions of the first virtual machine device 1 described above, including but not limited to alarm detection and virtual machine repair. Likewise, the first virtual machine device 1 has all the functions of the second virtual machine device 2 described above, including alarm reporting and fault detection using a third-party monitoring module.

In this embodiment, an internal fault of the virtual machine system can be discovered by the virtual machines' self-detection, and the reconstruction of a virtual machine can be initiated by the virtual machines themselves; the reconstruction can also be decided by the service function network element to which the virtual machines belong. This removes the dependence on the management node and is more convenient, flexible, reliable, and scalable.

Embodiment 3

The related embodiments of the present disclosure are described by way of several examples of use under the telecommunication NFV protocol specification.

Referring to FIG. 7, the figure shows the rebuild self-healing of a VNF instance deployed in dual-machine mode after a service function abnormality, and includes steps 701-705.

In step 701, virtual machine A serves as the primary virtual machine and virtual machine B serves as the standby virtual machine. Virtual machine A polls and checks its own service processes. When a key service process of virtual machine A is found to be lost, an active/standby switchover is performed, and virtual machine B is switched to the primary virtual machine;

In step 702, virtual machine B starts and detects the state of virtual machine A, and if virtual machine A is not in position or its state is abnormal, a fatal alarm is generated;

In step 703, after the virtual machine B finds the fatal alarm, the virtual machine B initiates an instruction to delete the virtual machine A to the management node;

In step 704, after virtual machine A is successfully deleted, virtual machine B, following the reconstruction policy corresponding to the alarm type of the service function node, initiates the re-creation of a new virtual machine (called virtual machine C) with the resource setting of the original virtual machine A;

In step 705, after the virtual machine C is successfully created, the virtual machine C is started, and the service data is synchronized from the virtual machine B to become a new standby virtual machine.
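Steps 701-705 can be followed in a toy simulation that tracks only the roles of the virtual machines. The role strings, the event labels, and the function name are hypothetical, and every step is assumed to succeed; no real management node is involved.

```python
def self_heal_after_service_fault():
    """Trace steps 701-705 for VMs A, B and the replacement VM C."""
    roles = {"A": "primary", "B": "standby"}
    events = []
    # Step 701: A finds a key service process lost and a switchover is performed.
    roles["A"], roles["B"] = "standby", "primary"
    events.append("switchover")
    # Step 702: B probes A; A is abnormal, so a fatal alarm is generated.
    events.append("fatal_alarm")
    # Step 703: B asks the management node to delete A.
    del roles["A"]
    events.append("delete A")
    # Step 704: B requests a new VM C with A's original resource setting.
    roles["C"] = "creating"
    events.append("create C")
    # Step 705: C starts, synchronizes service data from B, and becomes the standby.
    roles["C"] = "standby"
    events.append("sync C from B")
    return roles, events

final_roles, events = self_heal_after_service_fault()
```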

In the above step 704, virtual machine B can also initiate the re-creation of the new virtual machine (referred to as virtual machine C) with an adjusted version of the resource setting of virtual machine A. In that case, after virtual machine C has been successfully created and started, virtual machine B initiates an active/standby switchover, so that virtual machine C is switched to the primary virtual machine and virtual machine B is switched to the standby virtual machine. Virtual machine C then sends the management node an instruction to delete virtual machine B. After virtual machine B is successfully deleted, virtual machine C initiates the re-creation of a new virtual machine (referred to as virtual machine D) with the resource setting of virtual machine C; after virtual machine D is successfully created, virtual machine D is started and synchronizes the service data from virtual machine C to become the new standby virtual machine.

Referring to FIG. 8, the figure shows the rebuild self-healing of the primary virtual machine of a VNF instance deployed in dual-machine mode after an abnormal interruption, and includes steps 801-805.

In step 801, virtual machine A serves as the primary virtual machine and virtual machine B serves as the standby virtual machine. When an abnormal interruption occurs on virtual machine A, virtual machine A initiates an active/standby switchover; virtual machine B is switched to the primary virtual machine and virtual machine A is switched to the standby virtual machine;

In step 802, virtual machine B starts, detects the state of the standby virtual machine A, and checks whether a fatal alarm sent by virtual machine A is found;

In step 803, after discovering the fatal alarm, the virtual machine B initiates an instruction to delete the virtual machine A to the management node;

In step 804, after the virtual machine A is successfully deleted, the virtual machine B initiates the re-creation of the new virtual machine (called the virtual machine C) according to the resource setting of the original virtual machine A according to the re-establishment policy corresponding to the alarm type.

In step 805, after the virtual machine C is successfully created, the virtual machine C is started, and the service data is synchronized from the virtual machine B to become a new standby virtual machine.

Referring to FIG. 9, the figure shows the rebuild self-healing of the standby virtual machine of a VNF instance deployed in dual-machine mode after an abnormal interruption, and includes steps 901-904.

In step 901, virtual machine B serves as the standby virtual machine and virtual machine A serves as the primary virtual machine. When an abnormal interruption occurs on virtual machine B, virtual machine B sends a failure notification to virtual machine A;

In step 902, after receiving the failure notification and recording the fatal alarm, virtual machine A sends the management node an instruction to delete virtual machine B;

In step 903, after the virtual machine B is successfully deleted, the virtual machine A initiates the re-creation of the new virtual machine (referred to as virtual machine C) according to the resource setting of the original virtual machine B according to the re-establishment policy corresponding to the alarm type.

In step 904, after the virtual machine C is successfully created, the virtual machine C is started, and the service data is synchronized from the virtual machine A to become a new standby virtual machine.

Embodiment 4

The embodiment further provides a non-transitory computer readable storage medium storing computer executable instructions for performing any of the above virtual machine repair methods.

Embodiment 5

The embodiment also provides an electronic device. Referring to FIG. 10, the electronic device is applied to the first virtual machine device, and includes:

One or more processors 1001, one processor 1001 is taken as an example in FIG. 10;

Memory 1002;

The electronic device may further include: an input device 1003 and an output device 1004.

The processor 1001, the memory 1002, the input device 1003, and the output device 1004 in the electronic device may be connected by a bus or by other means; connection through a bus is taken as an example in FIG. 10.

The memory 1002 is a non-transitory computer readable storage medium that can be used to store software programs, computer executable programs, and modules, such as the programs, instructions, or modules corresponding to the virtual machine repair method in the embodiments of the present disclosure (for example, the alarm detecting module 11 and the virtual machine repairing module 12 shown in FIG. 5). The processor 1001 executes a plurality of functional applications and data processing by running the software programs, instructions, or modules stored in the memory 1002, thereby implementing the virtual machine repairing method of the above method embodiments.

The memory 1002 can include a program storage area and a data storage area. The program storage area can store an operating system and an application required by at least one function; for example, a preset reconstruction policy can be stored in the program storage area of the electronic device. The data storage area can store data created through use of the electronic device, and the like. Further, the memory 1002 may include a high speed random access memory, and may also include a nonvolatile memory such as at least one magnetic disk storage device, flash memory device, or other nonvolatile solid state storage device. In some embodiments, the memory 1002 can optionally include memory located remotely from the processor 1001, which can be connected to the electronic device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 1003 can be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. The output device 1004 can include a display device such as a display screen.

The electronic device of the embodiment of the present disclosure may further include a communication device 1005 that transmits and/or receives information through a communication network. For example, the electronic device may receive, through the communication device 1005, a target alarm sent by the second virtual machine device, and may also send, through the communication device 1005, an instruction to delete the second virtual machine device to the virtual machine management node.

It will be apparent to those skilled in the art that the above-described modules or steps of the embodiments of the present disclosure may be implemented by a general-purpose computing device, and may be centralized on a single computing device or distributed over a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage medium (ROM/RAM, magnetic disk, optical disk) and executed by the computing device. In some cases the steps shown or described may be performed in a different order than that given herein; the modules or steps may also be fabricated separately as individual integrated circuit modules, or a plurality of them may be implemented as a single integrated circuit module. Therefore, embodiments of the present disclosure are not limited to any particular combination of hardware and software.

The above is a description of the present disclosure in connection with the optional embodiments, and the implementation of the present disclosure should not be considered limited to this description.

Industrial applicability

With the virtual machine repairing method, the virtual machine device, the system, and the service function network element provided by the embodiments of the present disclosure, an internal fault of the virtual machine system can be discovered by the virtual machines' self-detection, so that internal faults in the dual virtual machine are discovered in time. When the first virtual machine device detects that the second virtual machine device generates the target alarm, it can repair the second virtual machine device directly, which removes the dependence on the upper-level management node, improves reliability, and makes fault repair more efficient.

Claims

A virtual machine repair method includes:

The first virtual machine device in the dual virtual machine detects that the second virtual machine device generates a target alarm that requires virtual machine repair;

The first virtual machine device initiates repairing the second virtual machine device.
The virtual machine repairing method according to claim 1, wherein the detecting, by the first virtual machine device, that the second virtual machine device generates the target alarm comprises: the first virtual machine device receiving a failure notification sent by the second virtual machine device when an abnormal interruption occurs;

The first virtual machine device determining, according to the failure notification, that the second virtual machine device generates the target alarm.
The virtual machine repairing method according to claim 1, wherein the first virtual machine device detects that the second virtual machine device generates the target alarm, and further includes: the first virtual machine device is switched from the standby virtual machine to the primary virtual machine, The second virtual machine device is switched from the primary virtual machine to the standby virtual machine.
The virtual machine repairing method of claim 3, wherein the detecting, by the first virtual machine device, that the second virtual machine device generates a target alarm that requires virtual machine repair includes:

After the first virtual machine device is switched to be the primary virtual machine by the standby virtual machine, detecting whether the second virtual machine device is normal;

When the first virtual machine device detects that the second virtual machine device is abnormal, it is determined that the second virtual machine device generates a target alarm that needs to perform virtual machine repair.
The virtual machine repairing method according to claim 3, wherein after the first virtual machine device detects that the second virtual machine device generates a target alarm that requires virtual machine repair, the method further includes:

Determining that the second virtual machine device has a target failure, wherein

The target failure includes at least one of an abnormal interruption failure and a fatal business function abnormality.
The virtual machine repairing method according to claim 4, wherein the detecting, by the first virtual machine device, whether the second virtual machine device is normal comprises: detecting whether the second virtual machine device is in position or whether its state is abnormal.
The virtual machine repairing method according to any one of claims 1 to 6, wherein the first virtual machine device initiates repairing the second virtual machine device, including:

The first virtual machine device initiates a restart process of restarting the second virtual machine device or initiates a process of reestablishing the second virtual machine device.
The virtual machine repairing method of claim 7, wherein before the first virtual machine device initiates the process of rebuilding the second virtual machine device, the method further includes:

When the first virtual machine device determines that there is currently an unprocessed virtual machine rebuild process, initiating the process of rebuilding the second virtual machine device after delaying for a preset duration, or re-detecting the target alarm.
The virtual machine repairing method of claim 7, wherein the process of the first virtual machine device initiating reconstruction of the second virtual machine device comprises:

The first virtual machine device sends a delete instruction to delete the second virtual machine device to the virtual machine management node;

After the second virtual machine device is deleted, the first virtual machine device selects, according to a preset reconstruction policy, the original resource setting of the second virtual machine device or an adjusted resource setting of the second virtual machine device, and sends a virtual machine creation instruction to the virtual machine management node.
The virtual machine repairing method of claim 9, wherein after the first virtual machine device sends the virtual machine creation instruction to the virtual machine management node with the adjusted second virtual machine device resource setting, the method further includes:

After the creation of the new second virtual machine device is completed, the first virtual machine device sends an active/standby switchover instruction to the new second virtual machine device, so that the new second virtual machine device switches to the primary virtual machine and the first virtual machine device switches to the standby virtual machine;

The new second virtual machine device sends a delete instruction for deleting the first virtual machine device to the virtual machine management node;

After the first virtual machine device is deleted, the new second virtual machine device sends a virtual machine creation instruction to the virtual machine management node with the resource setting of the new second virtual machine device, to create a new first virtual machine device having the same resource configuration as the new second virtual machine device.
A first virtual machine device includes an alarm detection module and a virtual machine repair module;

The alarm detection module is configured to detect whether the second virtual machine device generates a target alarm that requires virtual machine repair;

The virtual machine repair module is configured to initiate repair of the second virtual machine device when the alarm detecting module detects that the second virtual machine device generates a target alarm that requires virtual machine repair.
The first virtual machine device according to claim 11, wherein

The alarm detection module includes a first alarm detection sub-module configured to receive a failure notification sent by the second virtual machine device when an abnormal interruption occurs, and to determine accordingly that the second virtual machine device generates a target alarm.
The first virtual machine device according to claim 11, further comprising an active/standby switching module, configured to switch the first virtual machine device to the primary virtual machine when the second virtual machine device has a failure and an active/standby switchover is initiated.
The first virtual machine device according to claim 13, wherein the alarm detecting module comprises a second alarm detecting submodule, configured to determine that the second virtual machine device generates a target alarm when, after the first virtual machine device has switched to the primary virtual machine, the second virtual machine device is detected to be abnormal.
The first virtual machine device according to any one of claims 11-14, wherein the virtual machine repair module comprises a restart submodule or a rebuild submodule;

The restarting submodule is configured to initiate a process of restarting the second virtual machine device when the detection result of the alarm detection module is yes;

The rebuild submodule is configured to initiate a process of rebuilding the second virtual machine device when the detection result of the alarm detection module is yes.
The first virtual machine device according to claim 15, wherein when the virtual machine repair module includes a rebuild submodule, the rebuild submodule includes a rebuild initiation unit and a reconstruction unit;

The reestablishment initiating unit is configured to send a delete instruction to delete the second virtual machine device to the virtual machine management node;

The reconstruction unit is configured to: after the second virtual machine device is deleted, select, according to a preset reconstruction policy, the original resource setting of the second virtual machine device or an adjusted resource setting of the second virtual machine device, and send a virtual machine creation instruction to the virtual machine management node.
A virtual machine system comprising a second virtual machine device and the first virtual machine device according to any one of claims 11-16;

The first virtual machine device is configured to initiate repairing the second virtual machine device when detecting that the second virtual machine device generates a target alarm that requires virtual machine repair.
A service function network element comprising the virtual machine system of claim 17.
A non-transitory computer readable storage medium storing computer executable instructions for performing the virtual machine repair method of any of claims 1-10.