WO2016101486A1 - Fault recovery method, device and computer storage medium

Info

Publication number: WO2016101486A1
Application number: PCT/CN2015/078370
Authority: WO
Inventors: 倪华
Original assignee: 中兴通讯股份有限公司
Priority date: 2014-12-22
Filing date: 2015-05-06
Publication date: 2016-06-30
Also published as: CN105790980B; CN105790980A

Abstract

Disclosed are a fault recovery method, device and computer storage medium, the method comprising: monitoring whether a VNF or a VM reports a fault; if a fault is reported, determining a fault type, the fault type comprising one of the following types: a partial VM fault, a complete VM fault or a VNF fault; and determining a fault recovery policy according to the fault type, and performing a fault recovery according to the fault recovery policy.

Description

Fault repairing method, device and computer storage medium

Technical field

The present invention relates to the field of communications, and in particular, to a fault repair method, apparatus, and computer storage medium.

Background art

Figure 1 shows the management structure after NFV (Network Function Virtualization): the architecture and reference points of NFV-MANO (NFV Management and Orchestration). In this architecture, the NFVO (Network Functions Virtualization Orchestrator) is responsible for network service lifecycle management and for orchestrating NFVI (Network Functions Virtualization Infrastructure) resources across VIMs (Virtualised Infrastructure Managers). The VNFM (Virtualized Network Function Manager) is responsible for the lifecycle management of VNF (Virtualized Network Function) instances; each VNF instance is assumed to have an associated VNFM. The VIM is responsible for controlling and managing NFVI computing, storage and network resources. The figure shows the virtualized architecture but does not show the individual VM (Virtual Machine) entities under the VIM.

Under the NFV architecture, alarm sources fall into several types: physical infrastructure (such as NFVI computing, storage, and network-related alarms), virtual infrastructure (such as virtual machine-related alarms), and application logic (such as VNF instance-related alarms). NFVI-related alarms are generated by the NFVI and reported to the VNFM or NFVO through the VIM. Virtual machine-related alarms are generated by the VIM and reported to the VNFM or NFVO. VNF application layer alarms are generated by the VNF and reported to the VNFM or the EM (Element Management). Whatever the alarm type, a fault may eventually affect the network service and needs to be resolved as soon as possible. Existing repair methods are all manual; manual repair leaves services affected for a long time and wastes considerable labor cost.

Summary of the invention

In view of this, embodiments of the present invention provide a fault repairing method, a device, and a computer storage medium, which address the problem that fault repair under the NFV architecture in the prior art is entirely manual, so that services are affected for a long time and considerable labor cost is wasted.

To solve the above technical problem, in one aspect, an embodiment of the present invention provides a fault repairing method, where the method includes:

monitoring whether the VNF or the VM reports a fault; when a fault is reported, determining the fault type, where the fault type is one of the following: a partial VM fault, an all-VM fault, or a VNF fault; and determining a fault repair strategy according to the fault type and repairing the fault according to the fault repair strategy.

Further, in a case that the fault type is a partial VM fault, determining a fault repair policy according to the fault type, and performing fault repair according to the fault repair policy includes:

determining whether a primary VM and a standby VM exist among the VMs; if so, when the faulty VM is the primary VM, performing an active/standby VM switchover and deleting the faulty primary VM from the VNF, and when the faulty VM is the standby VM, deleting the faulty standby VM from the VNF; if not, reducing the function of the VNF and then deleting the faulty VM from the VNF.

Further, after the faulty VM is deleted from the VNF, the method further includes: sending an allocation request to request allocation of a new VM; and, if the new VM is allocated, adding the new VM to the VNF.

Further, in the case that the fault type is all VM fault or VNF fault, determining a fault repair policy according to the fault type, and performing fault repair according to the fault repair policy includes:

performing fault repair on the VNF according to the fault repair strategy, where the fault repair strategy includes: a switching operation between the primary VNF and the standby VNF, or an internal logical repair operation; and rebuilding a new VNF if the fault repair fails.

Further, before monitoring whether the VNF and the VM are faulty, the method further includes:

recording the mapping relationship between the VNF and the VM.

Further, the method further includes: monitoring in real time whether there is a fault cancellation message; and, when a fault cancellation message is present, reusing the VNF or VM whose fault has been cleared.

Further, after the fault repair is performed according to the fault repair strategy, the method further includes: detecting whether the number of times the fault repair strategy has been executed reaches a preset number; if the preset number has not been reached, continuing to perform fault repair according to the fault repair strategy; and if the preset number has been reached, no longer executing the fault repair strategy.

In another aspect, an embodiment of the present invention further provides a fault repairing apparatus, where the apparatus includes: a monitoring module configured to monitor whether a VNF or a VM reports a fault; a determining module configured to determine a fault type when a fault is reported, where the fault type is one of the following: a partial VM fault, an all-VM fault, or a VNF fault; and a fault repair module configured to determine a fault repair policy according to the fault type and perform fault repair according to the fault repair policy.

Further, the fault repair module includes: a determining unit configured to determine, when the fault type is a partial VM fault, whether a primary VM and a standby VM exist among the VMs; a first repairing unit configured to, when the primary VM and the standby VM exist, perform an active/standby VM switchover and delete the faulty primary VM from the VNF if the faulty VM is the primary VM, and delete the faulty standby VM from the VNF if the faulty VM is the standby VM; and a second repairing unit configured to, when the primary VM and the standby VM do not exist, reduce the function of the VNF and then delete the faulty VM from the VNF.

Further, the fault repair module includes: a third repair unit configured to perform fault repair on the VNF according to the fault repair strategy when the fault type is an all-VM fault or a VNF fault, where the fault repair strategy includes: a switching operation between a primary VNF and a standby VNF, or an internal logical repair operation; and a reconstruction unit configured to rebuild a new VNF if the fault repair fails.

Further, the device further includes: a recording module configured to record a mapping relationship between the VNF and the VM.

Further, the device further includes: a first detecting module configured to monitor in real time whether there is a fault cancellation message; and an adding module configured to reuse the VNF or VM whose fault has been cleared when a fault cancellation message is present.

Further, the device further includes: a second detecting module configured to detect whether the number of times the fault repair policy has been executed reaches a preset number; the fault repair module is further configured to stop executing the fault repair policy when the preset number is reached, or to continue fault repair according to the fault repair policy when the preset number has not been reached.

The embodiment of the invention further provides a computer storage medium, wherein the computer storage medium stores a computer program, and the computer program is used to execute the fault repair method described above.

The fault repairing method, device and computer storage medium provided by the embodiments of the present invention actively monitor whether the VNF or the VM reports a fault and, when a fault is reported, automatically adopt a fault repair strategy that depends on the fault type. In this process, a virtual device under the NFV architecture is repaired automatically when it fails, so the response time is short and labor is saved. This solves the problem that existing fault repair under the NFV architecture is entirely manual, which affects services for a long time and wastes considerable labor cost.

DRAWINGS

FIG. 1 is a diagram of the NFV-MANO architecture in the prior art;

FIG. 2 is a flowchart of a fault repair method in an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a fault repairing apparatus according to an embodiment of the present invention;

FIG. 4 is a first schematic structural diagram of the fault repair module of the fault repairing apparatus according to an embodiment of the present invention;

FIG. 5 is a second schematic structural diagram of the fault repair module of the fault repairing apparatus according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a preferred structure of a fault repairing apparatus according to an embodiment of the present invention;

FIG. 7 is a flowchart of a self-healing processing method after a fault is reported in an alternative embodiment of the present invention;

FIG. 8 is a flowchart of a self-healing processing method after fault recovery is reported in an alternative embodiment of the present invention;

FIG. 9 is a schematic diagram of fault repair by an EM across VNFMs in an alternative embodiment of the present invention.

Detailed description

The invention will be further described in detail below with reference to the drawings and embodiments. It is understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

An embodiment of the present invention provides a fault repairing method. The flowchart of the method is as shown in FIG. 2, and includes S201 to S203:

S201: Monitor whether the VNF or the VM reports a fault.

S202: When a fault is reported, determine the fault type, where the fault type is one of the following: a partial VM fault, an all-VM fault, or a VNF fault.

S203: Determine a fault repair strategy according to the fault type, and perform fault repair according to the fault repair strategy.
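The fault type determination in S202 can be illustrated with a minimal sketch. It assumes that the VMs of a VNF and the VMs that have reported faults are available as sets of identifiers; the names `FaultType` and `classify_fault` are hypothetical and do not come from the original text. The original does not say how a partial VM fault combined with a VNF-reported fault is classified, so treating it as a partial VM fault here is an assumption.

```python
from enum import Enum


class FaultType(Enum):
    PARTIAL_VM = "partial VM fault"
    ALL_VM = "all-VM fault"
    VNF = "VNF fault"


def classify_fault(vnf_vms, failed_vms, vnf_reported_fault):
    """Return the FaultType for one VNF, or None if no fault applies.

    vnf_vms: ids of the VMs that support the VNF (hypothetical representation).
    failed_vms: ids of all VMs that have reported a fault.
    vnf_reported_fault: True if the VNF itself reported a fault.
    """
    vnf_vms = set(vnf_vms)
    failed_here = vnf_vms & set(failed_vms)
    if failed_here and failed_here == vnf_vms:
        return FaultType.ALL_VM
    if failed_here:
        # Assumption: VM faults take precedence over a VNF-reported fault.
        return FaultType.PARTIAL_VM
    if vnf_reported_fault:
        return FaultType.VNF
    return None
```

For example, `classify_fault({"vm1", "vm2"}, {"vm1"}, False)` yields `FaultType.PARTIAL_VM`, which would then select the partial VM repair strategy in S203.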

The fault repairing method provided by this embodiment of the present invention actively monitors whether a VNF or a VM reports a fault and, when a fault is reported, automatically adopts a fault repair strategy that depends on the fault type. In this process, a virtual device under the NFV architecture is repaired automatically when it encounters a fault, so the response time is short and labor is saved. This solves the problem that existing fault repair under the NFV architecture is entirely manual, which affects services for a long time and wastes considerable labor cost.

Preferably, before monitoring whether the VNF or the VM reports a fault, the mapping relationship between the VNF and the VM can be recorded and saved, so that the VNF that has a problem can later be determined from the mapping relationship.

During implementation, different fault repair strategies are adopted for different fault types. If the fault type is a partial VM fault, it is determined whether a primary VM and a standby VM exist among the VMs. If so, when the faulty VM is the primary VM, an active/standby VM switchover is performed and the faulty primary VM is deleted from the VNF; when the faulty VM is the standby VM, the faulty standby VM is deleted from the VNF. If not, the function of the VNF is reduced and the faulty VM is then deleted from the VNF. Whether or not a primary VM and a standby VM exist, after the faulty VM is deleted from the VNF, an allocation request can be sent to request a new VM; if a new VM is allocated, it is added to the VNF.
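The partial VM fault handling described above can be sketched as follows. This is only an illustration under stated assumptions: the `Vnf` data class, the `actions` log, and the `allocate_vm` callback (which returns a new VM id, or `None` when allocation fails) are hypothetical, and a faulty VM that is neither the primary nor the standby is assumed to take the function-reduction path.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Vnf:
    vms: Set[str]
    primary: Optional[str] = None   # set only when the VNF runs active/standby
    standby: Optional[str] = None
    actions: List[str] = field(default_factory=list)  # record of steps taken


def repair_partial_vm_fault(vnf: Vnf, faulty_vm: str,
                            allocate_vm: Callable[[], Optional[str]]) -> bool:
    """Isolate one faulty VM, then try to replace it. True if replaced."""
    has_pair = vnf.primary is not None and vnf.standby is not None
    if has_pair and faulty_vm == vnf.primary:
        # Faulty primary: switch over so the standby takes the primary role.
        vnf.primary, vnf.standby = vnf.standby, None
        vnf.actions.append("switchover")
    elif has_pair and faulty_vm == vnf.standby:
        vnf.standby = None
    else:
        # No active/standby pair for this VM: reduce the VNF's function first.
        vnf.actions.append("shrink")
    vnf.vms.discard(faulty_vm)
    vnf.actions.append("delete " + faulty_vm)

    new_vm = allocate_vm()          # allocation request for a new VM
    if new_vm is None:
        return False
    vnf.vms.add(new_vm)
    vnf.actions.append("add " + new_vm)
    return True
```

With `Vnf(vms={"a", "b"}, primary="a", standby="b")` and a faulty VM `"a"`, the recorded actions are a switchover, the deletion of `"a"`, and the addition of the newly allocated VM.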

If the fault type is an all-VM fault or a VNF fault, the VNF is in an unavailable state and is repaired according to the fault repair strategy. The fault repair strategy may be a switchover between the primary VNF and the standby VNF (when primary and standby VNFs exist) or an internal logical repair operation. If the fault repair fails, a new VNF is rebuilt.
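A minimal sketch of this repair-then-rebuild order is given below. The function name and the three callbacks are hypothetical; each callback is assumed to return `True` on success. Preferring the switchover whenever a standby VNF exists is an assumption, since the text only lists the two repair operations as alternatives.

```python
def repair_unavailable_vnf(has_standby_vnf, switch_to_standby,
                           internal_repair, rebuild):
    """Try the VNF's repair operation first; rebuild only if it fails.

    Returns "repaired", "rebuilt", or "failed".
    """
    # Assumption: use the primary/standby switchover when a standby exists,
    # otherwise fall back to the VNF's internal logical repair.
    repair = switch_to_standby if has_standby_vnf else internal_repair
    if repair():
        return "repaired"
    if rebuild():
        return "rebuilt"
    return "failed"
```

A result of `"failed"` corresponds to the case in which automatic repair has to be retried later or handed over to manual repair.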

After a fault has been found, whether there is a fault cancellation message is detected in real time. A fault cancellation message indicates that the originally faulty VNF or VM is available again, so the VNF or VM whose fault has been cleared can be reused. Instead of detecting in real time, a person skilled in the art may also set a short time interval and check for a fault cancellation message at that interval; such an implementation is a variation of the above embodiment and also falls within the scope of the invention.

After the fault is repaired according to the fault repair policy, it is detected whether the number of times the fault repair policy has been executed reaches a preset number. If the preset number has been reached and the fault is still not repaired, the fault cannot be cleared by automatic repair and manual repair is required, so the fault repair policy is no longer executed. If the preset number has not been reached, fault repair continues according to the fault repair policy.
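The bounded retry described above can be sketched as a simple loop. The names `repair_with_limit`, `attempt_repair`, and `max_attempts` are hypothetical; `attempt_repair` stands for one execution of the fault repair policy and is assumed to return `True` when the fault is repaired.

```python
def repair_with_limit(attempt_repair, max_attempts):
    """Run the repair policy until it succeeds or the preset number is reached.

    Returns (repaired, attempts_used). When repaired is False, the caller
    should stop automatic repair and fall back to manual repair.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_repair():
            return True, attempt
    return False, max_attempts
```

In a real deployment the attempts would typically be spread over time (for example, driven by a timer) rather than executed back to back; the loop only shows the counting logic.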

During implementation, after the fault type is determined, a different fault repair status can be set for the VNF for each fault type, for use in subsequent operations. For example, if the fault type is a partial VM fault, the fault repair status of the VNF is set to VNF partial fault; if the fault type is an all-VM fault or a VNF fault, the fault repair status of the VNF is set to VNF complete fault; after the faulty VM is deleted from the VNF, the fault repair status of the VNF is set to VNF fault isolation repair.

The embodiment of the present invention further describes a computer storage medium, wherein the computer storage medium stores a computer program, and the computer program is used to execute the fault repair method shown in FIG. 2 in the embodiment of the present invention.

An embodiment of the present invention further provides a fault repairing device, shown in FIG. 3, which includes: a monitoring module 10 configured to monitor whether a VNF or a VM reports a fault; a determining module 20, coupled to the monitoring module 10 and configured to determine the fault type when a fault is reported, where the fault type is one of the following: a partial VM fault, an all-VM fault, or a VNF fault; and a fault repair module 30, coupled to the determining module 20 and configured to determine a fault repair policy according to the fault type and perform fault repair according to the fault repair policy. Preferably, the apparatus may further include a recording module, which may be disposed between the monitoring module 10 and the determining module 20 and is configured to record the mapping relationship between the VNF and the VM, so that the VNF that has a problem can subsequently be determined from the mapping relationship.

The structure of the fault repair module 30 can be as shown in FIG. 4, including: a determining unit 301 configured to determine, when the fault type is a partial VM fault, whether a primary VM and a standby VM exist among the VMs; a first repairing unit 302, coupled to the determining unit 301 and configured to, when the primary VM and the standby VM exist, perform an active/standby VM switchover and then delete the faulty primary VM from the VNF if the faulty VM is the primary VM, and delete the faulty standby VM from the VNF if the faulty VM is the standby VM; and a second repairing unit 303, coupled to the determining unit 301 and configured to, when no primary VM and standby VM exist, reduce the function of the VNF and then delete the faulty VM from the VNF.

The structure of the fault repair module 30 can also be as shown in FIG. 5, including: a third repair unit 304 configured to perform fault repair on the VNF according to the fault repair policy when the fault type is an all-VM fault or a VNF fault, where the fault repair strategy includes: a switching operation between the primary VNF and the standby VNF, or an internal logical repair operation; and a reconstruction unit 305, coupled to the third repair unit 304 and configured to rebuild a new VNF if the fault repair fails.

In this embodiment of the present invention, the structures of the fault repair module 30 in FIG. 4 and FIG. 5 are shown separately. Of course, those skilled in the art can combine the two structures in FIG. 4 and FIG. 5, so that the fault repair module 30 has a relatively complete set of functions.

During implementation, the device may also be configured as shown in FIG. 6 and further include: a first detecting module 40 configured to detect in real time whether there is a fault cancellation message; an adding module 50, coupled to the first detecting module 40 and the monitoring module 10 and configured to reuse the VNF or VM whose fault has been cleared when a fault cancellation message is present; and a second detecting module 60, coupled to the fault repair module 30 and configured to detect whether the number of times the fault repair policy has been executed reaches a preset number. The fault repair module 30 is further configured to stop executing the fault repair policy when that number reaches the preset number, or to continue fault repair according to the fault repair policy when the preset number has not been reached. In this embodiment, the first detecting module 40 may be coupled to the monitoring module 10, coupled to the fault repair module 30, or set up independently; this is not limited herein.

Alternative embodiment

At present, no workable method for automatically repairing network faults after network virtualization has been proposed. To solve this problem, the embodiment of the present invention proposes a fault repairing method for use after network functions are virtualized, so as to resolve faults as soon as possible and minimize the impact on the network service.

Generally, NFVI-related alarms are directly related to hardware. Such alarms may cause a VM to fail, either completely or partially. In the prior art, virtualization can isolate some hardware failures: for example, some disks may fail, but because the whole disk array is shared, the virtual hard disk can still be considered normal from its own perspective, only with reduced total capacity. Therefore, the embodiments of the present invention do not directly consider the impact of NFVI alarms and consider only the self-healing process after a VM failure.

The fault repairing method provided by the embodiment of the present invention automatically repairs the fault as soon as possible after a virtual machine fault (referred to as a VM fault) or an application logic fault (referred to as a VNF fault), ensuring that the network service is not interrupted or is recovered as soon as possible. The technical solution includes the following aspects:

(1) Record the mapping relationship between VNF and VM. VNFM records the mapping relationship between VNF and VM when VNF is created, deleted, or changed. A VNF may have multiple VMs.

(2) Record the fault repair status of the VNF. The fault repair status may include, but is not limited to, the following values: normal, full fault, partial fault, fault isolation repair, automatic repair failure.

(3) Using the mapping relationship between VNF and VM and the VNF fault repair status, perform automatic repair, triggered either by a fault or by a timer.

(4) Fault trigger: Monitor the fault reported by the VNF and VM. Both VNF and VM may report failures. Automatic repair processing is started separately for different faults.

If a VM failure is reported, the VM failure is repaired one by one; if one or more VMs supporting a VNF fail, but not all VMs fail, the VNF status is identified as a partial failure, and the following process is performed.

a) First isolate the fault point. If the VMs are in active/standby mode and the faulty VM is the standby VM, the faulty VM is deleted from the VNF directly. If the faulty VM is the primary VM, the faulty VM is deleted from the VNF after an active/standby switchover. If the VMs are not in active/standby mode, perform a reduction (shrink) operation on the VNF to avoid affecting existing services, and then remove the faulty VM from the VNF.

b) After the isolation is completed, set the VNF status to fault isolation repair.

c) Try to apply for a new VM again. After the application is successful, assign the new VM to the VNF and set the VNF status to normal.

d) If a new VM cannot be obtained because of insufficient resources or for other reasons, proceed to step (6), where the timed repair completes the fault repair.

(5) If all VMs fail, or if all VMs are normal but the VNF reports a fault, the following process is performed.

Set the VNF status to complete fault and perform the VNF repair action. If the VNF supports repair, the repair is called first (the repair may be an active/standby switchover or another repair operation defined internally by the VNF). If the repair fails, perform the VNF rebuild action. When rebuilding the VNF, you can indicate whether to use the original VMs (when the original VMs are normal) or to apply for new VMs. If the rebuild succeeds, set the VNF status to normal; if the rebuild fails, go to step (6), where the timing repair module completes the fault repair.

(6) Monitor the VM failure recovery message, or trigger the repair operation periodically.

When a VM fault recovery message is received, it indicates that new VM resources are available and automatic repair can continue; likewise, when new physical devices are added to the network, automatic repair can continue.

Traverse all VNFs and check their status: if a VNF is in the complete fault state, execute (5) and try automatic repair again; if it is in the partial fault state, execute (4) and try automatic repair again; if it is in the fault isolation state, apply for a new VM for the VNF and, once the application succeeds, assign the new VM to the VNF and set the VNF status to normal. If repair still fails, enter (6) again.

(7) The number of automatic repairs for each VNF can be preset; when the number of automatic repairs exceeds the preset number, the automatic repair process exits. A fault that cannot be repaired automatically usually requires manual repair. Maintenance personnel can query the VNF alarms and the fault repair status of the VNF; when the fault repair status of a VNF is “automatic repair failure”, manual processing can be performed.

(8) Whether the repair is automatic or manual, when the VNF alarm is cleared, the fault repair status of the VNF is set to normal.
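The status values in (2), the traversal in (6), and the attempt limit in (7) can be combined into one sketch of a single timed repair pass. All names here (`RepairStatus`, `timed_repair_pass`, the `handlers` mapping) are hypothetical. As a simplification, each handler is assumed to return `True` only when the VNF ends up fully repaired; intermediate transitions, such as partial fault to fault isolation repair, are left out.

```python
from enum import Enum


class RepairStatus(Enum):
    NORMAL = "normal"
    FULL_FAULT = "full fault"
    PARTIAL_FAULT = "partial fault"
    FAULT_ISOLATION_REPAIR = "fault isolation repair"
    AUTO_REPAIR_FAILED = "automatic repair failure"


def timed_repair_pass(statuses, attempts, handlers, max_attempts):
    """One traversal over all VNFs, triggered by a timer or a VM recovery.

    statuses: dict vnf_id -> RepairStatus, updated in place.
    attempts: dict vnf_id -> number of automatic repairs already tried.
    handlers: dict RepairStatus -> callable(vnf_id) -> bool (True = repaired).
    """
    for vnf_id, status in list(statuses.items()):
        if status not in handlers:
            continue  # NORMAL and AUTO_REPAIR_FAILED need no automatic action
        if attempts.get(vnf_id, 0) >= max_attempts:
            # Preset number used up: leave the VNF for manual repair.
            statuses[vnf_id] = RepairStatus.AUTO_REPAIR_FAILED
            continue
        attempts[vnf_id] = attempts.get(vnf_id, 0) + 1
        if handlers[status](vnf_id):
            statuses[vnf_id] = RepairStatus.NORMAL
```

Keeping the per-status handlers in a dictionary mirrors the dispatch in (6): the full fault handler would run step (5), the partial fault handler step (4), and the fault isolation handler would only request a new VM.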

According to the above method, the following fault repair device after the network function is virtualized can be realized:

The device includes:

The fault repair information recording module is responsible for recording the mapping relationship between the VNF and the VM, and recording the fault repair status of the VNF.

Automatic fault repair module (equivalent to the monitoring module, determining module and fault repair module): after detecting a fault reported by a VM or VNF, it initiates automatic repair of the fault and updates the fault repair status of the VNF according to the outcome of the repair.

Fault timing repair module (corresponding to the detection performed by the first detecting module and the second detecting module): monitors VM fault recovery messages, or periodically traverses the fault repair status of the VNFs, and uses the automatic fault repair module to continue automatic repair of VNFs that are in a fault state.

The above fault repair device can be deployed in the VNFM, or in the EM or NFVO. When deployed in the EM or NFVO, it can manage multiple VNFMs: if resources are insufficient or another failure occurs under one VNFM, the VNF can be rebuilt under another VNFM. If the fault repair information recording module is deployed in the EM, the VNFM must carry the identification information of the corresponding VMs when it issues VNF addition, deletion, and modification notifications.

The above embodiments will be further described in detail below with reference to the accompanying drawings.

Embodiment 1

The automatic repair processing after a fault is reported in NFV according to this embodiment of the present invention is shown in the flowchart of FIG. 7 and is as follows. It is assumed that the fault repairing apparatus is deployed on the VNFM, and the process includes S701 to S710.

S701: The automatic fault repair module monitors faults reported by the VM and the VNF, and determines the fault type.

After a VM or VNF fault is received, the fault information recording module is searched for the mapping relationship between the VNF and the VM and for the fault repair status of the VNF.

S702: If a VM is faulty, determine whether all VMs are faulty. If yes, execute S707; otherwise execute S703.

S703: Only some VMs in the corresponding VNF are faulty, and the remaining VMs can still provide services normally. First, the impact of the faulty VM on the service is shielded: the VNFM can send a shrink request to the VNF, with a parameter identifying the faulty VM to be removed. Shrinking the VNF is equivalent to isolating the fault point; the provided network service may be degraded, but the basic functions are still guaranteed. After the shrink succeeds, the VNFM sets the VNF status to fault isolation repair.

The isolated faulty VM is managed in the VIM. The VIM can further analyze the true source of the VM failure: a hardware-related failure caused by the NFVI, or a failure of the VM logical object itself. Depending on the analysis result, self-healing or manual handling by the user is adopted. This part is outside the scope of the present invention.

S704: After isolating the faulty VM, the VNFM issues a new request to expand the VNF, asking for a new VM to be allocated to the VNF, and determines whether a new VM is allocated. If yes, execute S705; otherwise execute S706.

S705: If the system has sufficient resources, the expansion request can succeed. After the VM is successfully allocated, the VNF status is set to normal, and then S710 is executed.

S706: If the expansion request fails because of insufficient resources or for other reasons, set a timer, wait for the next automatic repair (this process is described in Embodiment 2), and then execute S710.

S707: If the fault type is an all-VM fault or a VNF fault, perform fault repair on the VNF according to the fault repair policy. Specifically, for example when a VM fault is reported and the query shows that all VMs of the VNF are faulty, the VNF status is set to complete fault and the repair operation of the VNF is started. The VNFM first queries whether the VNF supports the repair action. If repair is supported, the repair is called first (the repair may be an active/standby switchover or another repair operation defined internally by the VNF; the VNF may provide a repair interface, and the specific repair operation is determined internally by the VNF).

S708: If the repair fails, perform the VNF rebuild action and determine whether the rebuild succeeds. If not, execute S706; otherwise execute S709.

S709: The rebuild succeeded; the VNF status is set to normal. Execute S710.

S710: The process ends.

Embodiment 2

As shown in the flowchart of FIG. 8, the automatic repair processing after fault recovery is reported in NFV according to this embodiment of the present invention is as follows. It is assumed that the fault repairing apparatus is deployed on the VNFM, and the process includes S801 to S810.

S801: The fault timing repair module monitors fault recovery reported by the VM and the VNF. When a VM or VNF fault recovery message or a timer expiry message is received, automatic repair is started again.

S802: If a fault recovery message for the VNF is received, execute S803.

S803: Set the fault repair status of the VNF to normal, and execute S810.

S804: If a fault recovery message for a VM or a timer expiry message is received, execute S805.

S805: Traverse all VNFs and check their status, and execute S806 or S808 according to the situation.

S806: If the VNF is in the complete fault state or the partial fault state, try to repair it and determine whether the repair succeeds. If it succeeds, execute S803; otherwise execute S807.

For the repair process in the complete fault state and the partial fault state, refer to Embodiment 1; details are not described here again.

S807: Wait for the next repair operation. Execute S810.

S808: If the VNF is in the fault isolation state, the VNFM sends a request to expand the VNF, asking for a new VM to be allocated to the VNF, and determines whether the VM is successfully obtained. If the system has sufficient resources, the expansion request can succeed, and S809 is executed. If the expansion request fails because of insufficient resources or for other reasons, set the timer and execute S807.

S809: Set the VNF status to normal, and then execute S810.

S810: The process ends.

In the above process, if the number of times the timer has been set reaches the specified upper limit, the automatic repair process fails and ends.

Embodiment 3

The automatic repair device described in the embodiments of the present invention may be deployed on the VNFM, or in the EM or NFVO. This embodiment describes the case in which the automatic repair device is deployed on the EM, as shown in FIG. 9; deployment in the NFVO is similar.

The EM can receive the creation, deletion, modification, and scaling of the VNF reported from multiple VNFMs. After receiving the message, the mapping relationship between the VNF and the VM is recorded in the fault information recording module.
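A minimal sketch of how such a recording module might keep the VNF-to-VM mapping up to date from these notifications is shown below. The class `MappingRecord`, the notification kinds, and the assumption that every notification except deletion carries the VNF's complete current VM list are all hypothetical; the original only states that the notifications carry the identification information of the corresponding VMs.

```python
class MappingRecord:
    """Keeps vnf_id -> set of VM ids, updated from VNFM notifications."""

    def __init__(self):
        self._vms_by_vnf = {}

    def on_notification(self, kind, vnf_id, vm_ids=()):
        if kind == "delete":
            self._vms_by_vnf.pop(vnf_id, None)
        elif kind in ("create", "modify", "scale"):
            # Assumption: the notification lists all VMs the VNF now uses.
            self._vms_by_vnf[vnf_id] = set(vm_ids)
        else:
            raise ValueError("unknown notification kind: " + kind)

    def vnfs_using(self, vm_id):
        """Reverse lookup: which VNFs are affected when this VM fails."""
        return sorted(vnf for vnf, vms in self._vms_by_vnf.items()
                      if vm_id in vms)
```

The reverse lookup is what lets a VM fault report be attributed to the VNF that has the problem.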

The EM can receive fault reporting and fault recovery of VNFs and VMs reported by multiple VNFMs, and record the fault repair status of the VNFs in the fault information recording module.

After receiving a fault reporting message for a VNF or VM, the EM may use the procedures in Embodiment 1 and Embodiment 2 to repair the fault, except that the instructions issued by the VNFM in Embodiment 1 and Embodiment 2 must first be sent by the EM to the VNFM and are then executed by the VNFM.

For the repair process of VNF reconstruction, if the reconstruction fails in one VNFM managed by the EM, the EM can check whether any other VNFM it manages can manage a VNF of the same type and, if so, try to initiate reconstruction of the faulty VNF in that VNFM. As shown in FIG. 9, the specific process is as follows: the EM first sends a VNF1 stop command to VNFM1 and then sends VNFM2 a request to create VNF2, where the parameters of VNF2 are exactly the same as those of the original VNF1. If the creation succeeds, the state of the faulty VNF is set to normal and a VNF1 delete command is sent to VNFM1. If the creation of VNF2 fails, a VNF1 recovery command is sent to VNFM1, and the VNF remains in the fault state.
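The command sequence of this cross-VNFM reconstruction can be sketched as follows. The `Vnfm` interface, its method names, and the `FakeVnfm` class are hypothetical and exist only to make the ordering of the stop, create, delete, and recovery commands concrete; they do not correspond to any standardized VNFM interface.

```python
from typing import Dict, List, Protocol


class Vnfm(Protocol):
    """Hypothetical command interface the EM uses toward a VNFM."""
    def stop_vnf(self, vnf_id: str) -> None: ...
    def create_vnf(self, vnf_id: str, params: Dict[str, str]) -> bool: ...
    def delete_vnf(self, vnf_id: str) -> None: ...
    def recover_vnf(self, vnf_id: str) -> None: ...


class FakeVnfm:
    """In-memory VNFM used only to make the sketch runnable."""
    def __init__(self, name: str, can_create: bool, log: List[str]) -> None:
        self.name, self.can_create, self.log = name, can_create, log

    def stop_vnf(self, vnf_id: str) -> None:
        self.log.append(f"{self.name}: stop {vnf_id}")

    def create_vnf(self, vnf_id: str, params: Dict[str, str]) -> bool:
        self.log.append(f"{self.name}: create {vnf_id}")
        return self.can_create

    def delete_vnf(self, vnf_id: str) -> None:
        self.log.append(f"{self.name}: delete {vnf_id}")

    def recover_vnf(self, vnf_id: str) -> None:
        self.log.append(f"{self.name}: recover {vnf_id}")


def reconstruct_on_other_vnfm(source: Vnfm, target: Vnfm, old_id: str,
                              new_id: str, params: Dict[str, str]) -> str:
    """Return the resulting VNF state: "normal" or "fault" (assumed labels)."""
    source.stop_vnf(old_id)                      # stop VNF1 on VNFM1
    if target.create_vnf(new_id, dict(params)):  # create VNF2 with identical parameters
        source.delete_vnf(old_id)                # success: delete the old VNF1
        return "normal"
    source.recover_vnf(old_id)                   # failure: send the VNF1 recovery command
    return "fault"
```

Stopping VNF1 before creating VNF2, and deleting VNF1 only after VNF2 exists, means a failed creation can still be rolled back with the recovery command.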

The method and device provided by the embodiments of the present invention can automatically monitor VNF and VM alarms in the NFV system and attempt automatic repair. When automatic repair fails, the fault point can be isolated automatically. After the fault point recovers, automatic repair is attempted again, so that the network service provided by the VNF is kept at its expected target.

While the preferred embodiments of the present invention have been disclosed for purposes of illustration, those skilled in the art will recognize that various modifications, additions and substitutions are possible, and the scope of the invention should not be limited to the embodiments described above.

Industrial applicability

In the embodiments of the present invention, whether a VNF or a VM reports a fault is monitored; when a fault is reported, the fault type is determined; a fault repair policy is determined according to the fault type, and fault repair is performed according to that policy. In this way, when a fault is reported, a different fault repair policy is adopted automatically depending on the fault type. Virtual devices under the NFV architecture are thus repaired automatically when a fault occurs, which shortens the response time and saves manpower.
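As an illustration of choosing a policy by fault type, the sketch below classifies a fault and looks up a policy description. The classification rule is an assumption: it treats a fault reported by the VNF itself as a VNF fault, and otherwise compares the faulty VMs with the VMs mapped to the VNF. All names are hypothetical, and the policy strings merely paraphrase the repair actions described in this document.

```python
from enum import Enum
from typing import Optional, Set


class FaultType(Enum):
    PARTIAL_VM = "partial VM fault"
    COMPLETE_VM = "complete VM fault"
    VNF = "VNF fault"


def classify_fault(vnf_reported_fault: bool, vnf_vms: Set[str],
                   faulty_vms: Set[str]) -> Optional[FaultType]:
    """Assumed rule: a VNF-level report takes precedence over VM reports."""
    if vnf_reported_fault:
        return FaultType.VNF
    affected = vnf_vms & faulty_vms
    if not affected:
        return None  # no fault affects this VNF
    # All of the VNF's VMs faulty: complete VM fault; only some: partial.
    return FaultType.COMPLETE_VM if affected == vnf_vms else FaultType.PARTIAL_VM


# Policy descriptions paraphrased from the document, keyed by fault type.
REPAIR_POLICY = {
    FaultType.PARTIAL_VM: "switch over or reduce function, then delete the faulty VM",
    FaultType.COMPLETE_VM: "VNF switchover or internal logic repair; reconstruct on failure",
    FaultType.VNF: "VNF switchover or internal logic repair; reconstruct on failure",
}
```

For example, `classify_fault(False, {"vm1", "vm2"}, {"vm1"})` yields `FaultType.PARTIAL_VM`, and the matching entry in `REPAIR_POLICY` names the action to attempt.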

Claims

1. A fault repair method, the method comprising:

monitoring whether a virtualized network function (VNF) or a virtual machine (VM) reports a fault;

when a fault is reported, determining a fault type, wherein the fault type is one of: a partial VM fault, a complete VM fault, or a VNF fault; and

determining a fault repair policy according to the fault type, and performing fault repair according to the fault repair policy.
2. The fault repair method according to claim 1, wherein, when the fault type is a partial VM fault, determining the fault repair policy according to the fault type and performing fault repair according to the fault repair policy comprises:

determining whether the VMs include an active VM and a standby VM;

if so, when the faulty VM is the active VM, performing an active/standby VM switchover and then deleting the faulty active VM from the VNF, and when the faulty VM is the standby VM, deleting the faulty standby VM from the VNF; and

if not, reducing the function of the VNF and then deleting the faulty VM from the VNF.
3. The fault repair method according to claim 2, wherein after deleting the faulty VM from the VNF, the method further comprises:

sending an allocation request to request allocation of a new VM; and

when the new VM is allocated, adding the new VM to the VNF.
4. The fault repair method according to claim 1, wherein, when the fault type is a complete VM fault or a VNF fault, determining the fault repair policy according to the fault type and performing fault repair according to the fault repair policy comprises:

performing fault repair on the VNF according to the fault repair policy, wherein the fault repair policy comprises: a switchover operation between an active VNF and a standby VNF, or an internal logic repair operation; and

when the fault repair fails, reconstructing a new VNF.
5. The fault repair method according to claim 1, wherein before monitoring whether the VNF or the VM reports a fault, the method further comprises:

recording the mapping relationship between the VNF and the VM.
6. The fault repair method according to any one of claims 1 to 5, wherein the method further comprises:

monitoring in real time whether a fault release message is present; and

when the fault release message is present, putting the VNF or VM whose fault has been released back into use.
7. The fault repair method according to any one of claims 1 to 5, wherein after performing fault repair according to the fault repair policy, the method further comprises:

detecting whether the number of times the fault repair policy has been executed reaches a preset number;

if the preset number has not been reached, continuing to perform fault repair according to the fault repair policy; and

if the preset number has been reached, no longer executing the fault repair policy.
8. A fault repair device, the device comprising:

a monitoring module, configured to monitor whether a VNF or a VM reports a fault;

a determining module, configured to determine a fault type when a fault is reported, wherein the fault type is one of: a partial VM fault, a complete VM fault, or a VNF fault; and

a fault repair module, configured to determine a fault repair policy according to the fault type and to perform fault repair according to the fault repair policy.
9. The fault repair device according to claim 8, wherein the fault repair module comprises:

a determining unit, configured to determine, when the fault type is a partial VM fault, whether the VMs include an active VM and a standby VM;

a first repair unit, configured to, when the active VM and the standby VM exist: if the faulty VM is the active VM, perform an active/standby VM switchover and then delete the faulty active VM from the VNF; and if the faulty VM is the standby VM, delete the faulty standby VM from the VNF; and

a second repair unit, configured to, when the active VM and the standby VM do not exist, reduce the function of the VNF and then delete the faulty VM from the VNF.
10. The fault repair device according to claim 8, wherein the fault repair module further comprises:

a third repair unit, configured to, when the fault type is a complete VM fault or a VNF fault, perform fault repair on the VNF according to the fault repair policy, wherein the fault repair policy comprises: a switchover operation between an active VNF and a standby VNF, or an internal logic repair operation; and

a reconstruction unit, configured to reconstruct a new VNF when the fault repair fails.
11. The fault repair device according to claim 8, wherein the device further comprises:

a recording module, configured to record the mapping relationship between the VNF and the VM.
12. The fault repair device according to any one of claims 8 to 11, wherein the device further comprises:

a first detecting module, configured to monitor in real time whether a fault release message is present; and

an adding module, configured to put the VNF or VM whose fault has been released back into use when the fault release message is present.
13. The fault repair device according to any one of claims 8 to 11, wherein the device further comprises:

a second detecting module, configured to detect whether the number of times the fault repair policy has been executed reaches a preset number;

wherein the fault repair module is further configured to stop executing the fault repair policy when the preset number has been reached, or to continue performing fault repair according to the fault repair policy when the preset number has not been reached.
14. A computer storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the method of any one of claims 1 to 7.