WO2016101486A1 - Fault recovery method, device and computer storage medium - Google Patents

Fault recovery method, device and computer storage medium

Info

Publication number
WO2016101486A1
Authority
WO
WIPO (PCT)
Prior art keywords
fault
vnf
repair
repairing
standby
Prior art date
Application number
PCT/CN2015/078370
Other languages
French (fr)
Chinese (zh)
Inventor
倪华
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016101486A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/40 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using virtualisation of network functions or resources, e.g. SDN or NFV entities

Definitions

  • the present invention relates to the field of communications, and in particular, to a fault repair method, apparatus, and computer storage medium.
  • NFV Network Function Virtualization
  • NFVO Network Functions Virtualization Orchestrator
  • NFVI Network Functions Virtualization Infrastructure
  • VIM Virtualised Infrastructure Manager
  • VNFM Virtualized Network Function Manager
  • Under the NFV architecture, alarm sources fall into several types, including the physical infrastructure (such as NFVI compute, storage, and network-related alarms), the virtual infrastructure (such as virtual machine-related alarms), and application logic (such as VNF instance-related alarms).
  • the NFVI-related alarms are generated by the NFVI and reported to the VNFM or NFVO through the VIM.
  • the virtual machine-related alarms are generated by the VIM and reported to the VNFM or NFVO.
  • The VNF application layer alarms are generated by the VNF and reported to the VNFM or the EM (Element Management). Whatever the kind of alarm, once a fault occurs it may eventually affect the network service and needs to be resolved as soon as possible.
  • Existing repair methods are all manual, but manual repair affects the service for a long time and wastes considerable labor cost.
  • The embodiments of the present invention are intended to provide a fault repairing method, a device, and a computer storage medium that address the problem that fault repair under the NFV architecture in the prior art is entirely manual, which affects the service for a long time and wastes considerable labor cost.
  • An embodiment of the present invention provides a fault repairing method, where the method includes:
  • monitoring whether the VNF or the VM reports a fault; if a fault is reported, determining the fault type, where the fault type is one of the following: a partial VM fault, all VM faults, or a VNF fault; and determining a fault repair strategy according to the fault type and repairing the fault according to that strategy.
  • If the fault type is a partial VM fault, determining the fault repair strategy according to the fault type and performing fault repair according to that strategy includes:
  • determining whether the VMs include a primary VM and a standby VM. If so: when the faulty VM is the primary VM, performing a primary/standby VM switchover and then deleting the faulty primary VM from the VNF; when the faulty VM is the standby VM, deleting the faulty standby VM from the VNF. If not: reducing the function of the VNF, and then deleting the faulty VM from the VNF.
  • After the faulty VM is deleted from the VNF, the method further includes: sending an allocation request to request a new VM; and, if a new VM is allocated, adding the new VM to the VNF.
  • If the fault type is all VM faults or a VNF fault, determining the fault repair strategy according to the fault type and performing fault repair according to that strategy includes:
  • performing fault repair on the VNF according to the fault repair strategy, where the strategy includes a switchover between the primary VNF and the standby VNF, or an internal logic repair operation; and rebuilding a new VNF if the fault repair fails.
  • Before monitoring whether the VNF and the VM are faulty, the method further includes: recording the mapping relationship between the VNF and the VM.
  • The method further includes: listening in real time for a fault release message; and, when a fault release message is present, reusing the VNF or VM whose fault has been released.
  • After fault repair is performed according to the fault repair strategy, the method further includes: detecting whether the number of times the strategy has been executed reaches a preset number; if the preset number is not reached, continuing fault repair according to the strategy; if it is reached, no longer executing the strategy.
  • An embodiment of the present invention further provides a fault repairing apparatus, where the apparatus includes: a monitoring module configured to monitor whether a VNF or a VM reports a fault; a determining module configured to determine the fault type when a fault is reported, where the fault type is one of the following: a partial VM fault, all VM faults, or a VNF fault; and a fault repair module configured to determine a fault repair policy according to the fault type and perform fault repair according to that policy.
  • The fault repair module includes: a determining unit configured to determine, when the fault type is a partial VM fault, whether the VMs include a primary VM and a standby VM; a first repairing unit configured, when a primary VM and a standby VM exist, to perform a primary/standby VM switchover and then delete the faulty primary VM from the VNF if the faulty VM is the primary VM, or to delete the faulty standby VM from the VNF if the faulty VM is the standby VM; and a second repairing unit configured, when no primary VM and standby VM exist, to reduce the function of the VNF and then delete the faulty VM from the VNF.
  • The fault repair module includes: a third repair unit configured to perform fault repair on the VNF according to the fault repair policy when the fault type is all VM faults or a VNF fault, where the policy includes a switchover between the primary VNF and the standby VNF, or an internal logic repair operation; and a reconstruction unit configured to rebuild a new VNF if the fault repair fails.
  • The device further includes: a recording module configured to record the mapping relationship between the VNF and the VM.
  • The device further includes: a first detecting module configured to listen in real time for a fault release message; and an adding module configured to reuse the VNF or VM whose fault has been released when a fault release message is present.
  • The device further includes: a second detecting module configured to detect whether the number of times the fault repair policy has been executed reaches a preset number; the fault repair module is further configured to stop executing the policy when the preset number is reached, or to continue fault repair according to the policy when it is not reached.
  • the embodiment of the invention further provides a computer storage medium, wherein the computer storage medium stores a computer program, and the computer program is used to execute the fault repair method described above.
  • The fault repairing method, device, and computer storage medium provided by the embodiments of the present invention actively monitor whether the VNF or the VM reports a fault.
  • When a fault is reported, a different fault repair strategy is adopted automatically according to the fault type.
  • As a result, virtual devices under the NFV architecture are repaired automatically when a fault occurs, the response time is short, and manpower is saved. This addresses the problem that existing fault repair under the NFV architecture is entirely manual, which affects the service for a long time and wastes considerable labor cost.
  • FIG. 1 is a diagram of an NFV-MANO architecture in the prior art
  • FIG. 2 is a flowchart of a fault repair method in an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a fault repairing apparatus according to an embodiment of the present invention.
  • FIG. 4 is a first schematic structural diagram of the fault repair module of the fault repairing device according to an embodiment of the present invention.
  • FIG. 5 is a second schematic structural diagram of the fault repair module of the fault repairing device according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a preferred structure of a fault repairing apparatus according to an embodiment of the present invention.
  • FIG. 8 is a flowchart of a self-healing processing method after failure recovery reporting in an alternative embodiment of the present invention.
  • FIG. 9 is a schematic diagram of fault repair of an EM across a VNFM in an alternative embodiment of the present invention.
  • An embodiment of the present invention provides a fault repairing method.
  • The flowchart of the method is shown in FIG. 2 and includes S201 to S203:
  • S201: Monitor whether a VNF or a VM reports a fault.
  • S202: If a fault is reported, determine the fault type, where the fault type is one of the following: a partial VM fault, all VM faults, or a VNF fault.
  • S203: Determine a fault repair policy according to the fault type, and perform fault repair according to that policy.
  • The embodiment of the present invention provides a fault repairing method that actively monitors whether a VNF or a VM reports a fault.
  • When a fault is reported, a different fault repair strategy is adopted automatically according to the fault type.
  • As a result, virtual devices under the NFV architecture are repaired automatically when they encounter a fault, the response time is short, and manpower is saved.
  • This addresses the problem that existing fault repair under the NFV architecture is entirely manual, which affects the service for a long time and wastes considerable labor cost.
  • Before monitoring whether the VNF and the VM are faulty, the mapping relationship between the VNF and the VM can also be recorded.
  • The mapping relationship is saved and later used to determine which VNF has a problem.
  • If the fault type is a partial VM fault, determine whether the VMs include a primary VM and a standby VM.
  • If so: when the faulty VM is the primary VM, perform a primary/standby VM switchover and then delete the faulty primary VM from the VNF; when the faulty VM is a standby VM, delete the faulty standby VM from the VNF. If not: reduce the function of the VNF, and then delete the faulty VM from the VNF.
  • After the faulty VM is deleted, an allocation request can be sent to request a new VM; if a new VM is allocated, it is added to the VNF.
  • If the fault type is all VM faults or a VNF fault, the VNF is in an unavailable state and is repaired according to the fault repair strategy.
  • The fault repair strategy may be a switchover between the primary VNF and the standby VNF (when a primary and a standby VNF exist), or an internal logic repair operation. If the fault repair fails, a new VNF is rebuilt.
  • After the fault is repaired according to the fault repair policy, it is detected whether the number of times the policy has been executed reaches a preset number. If the preset number is reached and the fault is still not repaired, the fault cannot be resolved by automatic repair and requires manual repair, so the fault repair policy is no longer executed. If the preset number is not reached, fault repair continues according to the policy.
  • Different fault repair statuses can be set for the VNF for different fault types, for use in subsequent operations.
  • For example, if the fault type is a partial VM fault, set the fault repair status of the VNF to the VNF partial fault state. If the fault type is all VM faults or a VNF fault, set it to the VNF full fault state. After the faulty VM is removed from the VNF, set it to VNF fault isolation repair.
  • the embodiment of the present invention further describes a computer storage medium, wherein the computer storage medium stores a computer program, and the computer program is used to execute the fault repair method shown in FIG. 2 in the embodiment of the present invention.
  • The embodiment of the present invention further provides a fault repairing device, shown in FIG. 3, which includes: a monitoring module 10 configured to monitor whether a VNF or a VM reports a fault;
  • a determining module 20, coupled to the monitoring module 10 and configured to determine the fault type when a fault is reported, where the fault type is one of the following: a partial VM fault, all VM faults, or a VNF fault; and a fault repair module 30, coupled to the determining module 20 and configured to determine a fault repair policy according to the fault type and perform fault repair according to that policy.
  • The foregoing apparatus may further include a recording module, which may be disposed between the monitoring module 10 and the determining module 20 and is configured to record the mapping relationship between the VNF and the VM, so that the mapping can later be used to determine which VNF has a problem.
  • The structure of the fault repair module 30 can be as shown in FIG. 4, including: a determining unit 301 configured to determine, when the fault type is a partial VM fault, whether the VMs include a primary VM and a standby VM;
  • a first repairing unit 302, coupled to the determining unit 301 and configured, when a primary VM and a standby VM exist, to perform a primary/standby VM switchover and then delete the faulty primary VM from the VNF if the faulty VM is the primary VM, or to delete the faulty standby VM from the VNF if the faulty VM is the standby VM;
  • and a second repairing unit 303, coupled to the determining unit 301 and configured, when no primary VM and standby VM exist, to reduce the function of the VNF and then remove the faulty VM from the VNF.
  • The structure of the fault repair module 30 can also be as shown in FIG. 5, including: a third repair unit 304 configured to perform fault repair on the VNF according to the fault repair policy when the fault type is all VM faults or a VNF fault, where the policy includes a switchover between the primary VNF and the standby VNF, or an internal logic repair operation; and a reconstruction unit 305, coupled to the third repair unit 304 and configured to rebuild a new VNF if the fault repair fails.
  • The fault repair modules 30 of FIG. 4 and FIG. 5 are shown separately, not combined.
  • Those skilled in the art can combine the two structures in FIG. 4 and FIG. 5 when implementing the device, so that the fault repair module 30 has a relatively complete function.
  • The device may also be configured as shown in FIG. 6 and further include: a first detecting module 40 configured to detect in real time whether there is a fault release message; an adding module 50, coupled to the first detecting module 40 and the monitoring module 10 and configured to reuse the VNF or VM whose fault has been released when a fault release message is present; and a second detection module 60, coupled to the fault repair module 30 and configured to detect whether the number of times the fault repair policy has been executed reaches a preset number. The fault repair module 30 is further configured to stop executing the fault repair policy if the preset number is reached, or to continue fault repair according to the policy if it is not reached.
  • the first detecting module 40 may be configured to be coupled to the monitoring module 10, or may be coupled to the fault repairing module 30, or may be independently set, and is not limited herein.
  • the embodiment of the present invention proposes a fault repairing method after the network function is virtualized, so as to solve the fault as soon as possible and minimize the impact on the network service.
  • NFVI-related alarms are directly related to hardware. Such alarms may cause the VM to malfunction and may be completely invalid or partially invalid.
  • the fault repairing method provided by the embodiment of the present invention automatically repairs the fault as soon as possible after the virtual machine fault (referred to as VM fault) and the application logic fault (referred to as VNF fault), and ensures that the network service is not interrupted or recovered as soon as possible.
  • VM fault virtual machine fault
  • VNF fault application logic fault
  • VNFM records the mapping relationship between VNF and VM when VNF is created, deleted, or changed.
  • a VNF may have multiple VMs.
  • the fault repair status may include, but is not limited to, the following values: normal, full fault, partial fault, fault isolation repair, automatic repair failure.
  • When VM faults are reported, they are repaired one by one. If one or more VMs supporting a VNF fail, but not all of them, the VNF status is set to partial fault and the following process is performed.
  • If the VM cannot be successfully applied for due to insufficient resources or other reasons, proceed to step (6), and the fault repair module completes the fault repair.
  • Set the VNF status to full fault and perform the VNF repair action. If the VNF supports repair, the repair is called first (the repair may be an active/standby switchover or another repair operation defined internally by the VNF). If the repair fails, perform the rebuild VNF action. When rebuilding the VNF, you can indicate whether to use the original VM (when the original VM is normal) or to apply for a new VM. If the rebuild succeeds, set the VNF status to normal; if the rebuild fails, go to step (6), where the fault repair is completed by the timing repair module.
  • When a VM fault recovers, new VM resources are available and automatic repair can continue; likewise, when new physical devices are added to the network, automatic repair can continue.
  • The number of automatic repairs of each VNF can be preset; when the number of automatic repairs exceeds the preset number, the automatic repair process exits. A fault that cannot be repaired automatically usually requires manual repair. Maintenance personnel can query the VNF alarms and the fault repair status of the VNF; when the fault repair status of the VNF is “automatic repair failure”, manual processing can be performed.
  • the following fault repair device after the network function is virtualized can be realized:
  • the device includes:
  • the fault repair information recording module is responsible for recording the mapping relationship between the VNF and the VM, and recording the fault repair status of the VNF.
  • Automatic fault repair module (equivalent to the monitoring module, determining module, and fault repair module): after detecting a fault reported by a VM or VNF, it initiates automatic repair of the fault and updates the fault repair status of the VNF according to the outcome.
  • Fault timing repair module (corresponding to the detection performed by the first detecting module and the second detecting module): it monitors VM fault recovery messages, or periodically traverses the fault repair status of the VNFs, and uses the automatic fault repair module to continue automatic repair of any VNF still in a fault state.
  • The above fault repair device can be deployed in the VNFM, the EM, or the NFVO. When deployed in the EM or NFVO, it can manage multiple VNFMs: if resources are insufficient or another failure occurs under one VNFM, the VNF can be rebuilt under another VNFM.
  • the fault repair information recording module if deployed in the EM, requires the VNFM to carry the identification information of the corresponding VM when issuing the VNF addition, deletion, and modification notification.
  • the implementation of the automatic repair processing after the fault is reported in the NFV according to the embodiment of the present invention is as follows. It is assumed that the fault repairing apparatus is deployed on the VNFM, and the process includes S701 to S710.
  • the fault automatic repair module monitors faults reported by the VM and the VNF, and determines the fault type.
  • the fault information record is searched for the mapping relationship between the VNF and the VM and the fault handling state of the VNF.
  • VNFM can send a shrink request to the VNF, and the parameter carries the faulty VM that you want to uninstall. After the VNF is reduced, it is equivalent to isolating the faulty device point, and the provided network service may be degraded, but the basic functions are still guaranteed. After successful shrinking, VNFM sets the VNF state to fault isolation repair.
  • VIM can further analyze the true source of VM failures, hardware-related failures caused by NFVI, or VM logic objects themselves. According to the analysis results, self-healing or manual processing by the user is adopted. This section is outside the scope of the present invention.
  • the VNFM re-issues the request, requests to extend the VNF, requests to allocate a new VM to the VNF, and determines whether to allocate a new VM. If yes, then execute S705, otherwise S706.
  • the application extension may succeed. After the VM is successfully allocated, the VNF state is set to a normal state, and then S710 is performed.
  • the fault type is all VM faults or VNF faults
  • the VNFM first queries whether the VNF supports the repair action. If the repair is supported, the repair is first called (the repair may be an active/standby switchover or other VNF internal defined repair operation, and the VNF may provide a repair interface, and the specific repair operation is determined internally by the VNF).
  • the automatic repair processing after the fault recovery is reported in the NFV of the present invention is as follows. It is assumed that the fault repairing apparatus is deployed on the VNFM, and the process includes S801 to S810.
  • The fault timing repair module monitors fault recovery reported by the VM and the VNF. On receiving a VM or VNF fault recovery message, or a timer expiry message, it starts automatic repair again.
  • the VNFM sends an application requesting to extend the VNF, requesting to allocate a new VM to the VNF, and determining whether the VM is successfully applied. If the system has sufficient resources, the application extension may be successful, and S809 is executed. If the resource is insufficient or other reasons, the application extension fails. Then set the timer and execute S807.
  • the automatic repair processing device after the fault recovery is reported in the NFV may be deployed on the VNFM or in the EM or NFVO.
  • This embodiment describes an embodiment in which the automatic repair device is deployed on the EM, as shown in FIG. 9, which is similarly set in the NFVO.
  • the EM can receive the creation, deletion, modification, and scaling of the VNF reported from multiple VNFMs. After receiving the message, the mapping relationship between the VNF and the VM is recorded in the fault information recording module.
  • the EM can receive fault reporting and fault recovery of VNFs and VMs reported by multiple VNFMs, and record the fault repair status of the VNFs in the fault information recording module.
  • The EM may use the procedures in the first embodiment and the second embodiment to repair the fault, but the instructions issued by the VNFM in those embodiments must first be sent by the EM to the VNFM, which then executes them.
  • For the repair process of VNF reconstruction, if reconstruction fails in one VNFM managed by the EM, the EM can check whether another VNFM it manages can manage the same type of VNF and, if so, try to initiate reconstruction of the faulty VNF in that VNFM. As shown in FIG. 9, the specific process is: the EM first sends a VNF1 stop command to VNFM1, and then sends VNFM2 a request to create VNF2, where the parameters of VNF2 are exactly the same as those of the original VNF1. If the creation succeeds, the status of the faulty VNF is set to normal and a VNF1 delete command is sent to VNFM1. If VNF2 fails to be created, a VNF1 recovery command is sent to VNFM1, and the VNF remains in a fault state.
  • The method and apparatus provided by the embodiments of the present invention can automatically monitor VNF and VM alarms in the NFV system and attempt automatic repair. When automatic repair fails, the fault point can be automatically isolated; after the fault point recovers, automatic repair is attempted again so that the network service provided by the VNF stays at the expected target.
  • Whether the VNF or the VM reports a fault is monitored; if a fault is reported, the fault type is determined; a fault repair strategy is determined according to the fault type, and fault repair is performed according to that strategy. In this way, when a fault is reported, a different fault repair strategy is adopted automatically according to the fault type.
  • the virtual device under the NFV architecture automatically repairs when the fault is encountered, the response time is short, and manpower is saved.
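
The cross-VNFM reconstruction described in the list above (the EM stops VNF1 on VNFM1, asks VNFM2 to create VNF2, then deletes or recovers VNF1 depending on the result) might be sketched as follows. This is an illustration only: `StubVnfm`, its `send` method, and the command strings are hypothetical stand-ins, not an interface defined by the patent or by any NFV specification, and copying the VNF1 parameters into the create request is omitted.

```python
class StubVnfm:
    """Hypothetical stand-in for a VNFM; it only records received commands."""

    def __init__(self, name, create_succeeds=True):
        self.name = name
        self.create_succeeds = create_succeeds
        self.commands = []

    def send(self, command, vnf_id):
        """Record a command and report whether it succeeded."""
        self.commands.append((command, vnf_id))
        # Assumption: only "create" can fail in this sketch.
        return self.create_succeeds if command == "create" else True


def rebuild_on_other_vnfm(source, target, old_vnf, new_vnf):
    """Rebuild a faulty VNF under another VNFM, as driven by the EM.

    Returns the resulting fault repair status of the VNF.
    """
    source.send("stop", old_vnf)
    if target.send("create", new_vnf):
        # The replacement exists, so the stopped original can be deleted.
        source.send("delete", old_vnf)
        return "normal"
    # Creation failed: bring the original back; it stays in a fault state.
    source.send("recover", old_vnf)
    return "fault"
```

Stopping the original before creating the replacement keeps at most one instance active, and the recovery command on failure leaves the system no worse off than before the attempt.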

Abstract

Disclosed are a fault recovery method, device and computer storage medium, the method comprising: monitoring whether a VNF or a VM reports a fault; if a fault is reported, determining a fault type, the fault type comprising one of the following types: a partial VM fault, a complete VM fault or a VNF fault; and determining a fault recovery policy according to the fault type, and performing a fault recovery according to the fault recovery policy.

Description

Fault repairing method, device and computer storage medium
Technical field
The present invention relates to the field of communications, and in particular to a fault repair method, apparatus, and computer storage medium.
Background
FIG. 1 shows the management architecture after NFV (Network Function Virtualization): the architecture diagram and reference points of NFV-MANO (NFV Management and Orchestration). The NFVO (Network Functions Virtualization Orchestrator) is responsible for lifecycle management of network services and for scheduling NFVI (Network Functions Virtualization Infrastructure) resources across VIMs (Virtualised Infrastructure Managers). The VNFM (Virtualised Network Function Manager) is responsible for lifecycle management of VNF (Virtualised Network Function) instances; each VNF instance is assumed to have an associated VNFM. The VIM is responsible for controlling and managing NFVI compute, storage, and network resources. The figure shows the virtual architecture but does not show the individual VM (Virtual Machine) entities under the VIM.
Under the NFV architecture, alarm sources fall into several types, including the physical infrastructure (such as NFVI compute, storage, and network-related alarms), the virtual infrastructure (such as virtual machine-related alarms), and application logic (such as VNF instance-related alarms). NFVI-related alarms are generated by the NFVI and reported to the VNFM or NFVO through the VIM. Virtual machine-related alarms are generated by the VIM and reported to the VNFM or NFVO. VNF application layer alarms are generated by the VNF and reported to the VNFM or the EM (Element Management). Whatever the kind of alarm, once a fault occurs it may eventually affect the network service and needs to be resolved as soon as possible. Existing repair methods are all manual, but manual repair affects the service for a long time and wastes considerable labor cost.
Summary of the invention
In view of this, the embodiments of the present invention are intended to provide a fault repairing method, a device, and a computer storage medium that address the problem that fault repair under the NFV architecture in the prior art is entirely manual, which affects the service for a long time and wastes considerable labor cost.
To solve the above technical problem, in one aspect, an embodiment of the present invention provides a fault repairing method, where the method includes:
Monitoring whether the VNF or the VM reports a fault; if a fault is reported, determining the fault type, where the fault type is one of the following: a partial VM fault, all VM faults, or a VNF fault; and determining a fault repair strategy according to the fault type and performing fault repair according to that strategy.
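The fault type determination can be illustrated with a minimal sketch. The names `FaultType` and `classify_fault` are hypothetical, and the rule that a fault reported by the VNF itself takes precedence over VM faults is an assumption; the text only names the three fault types.

```python
from enum import Enum


class FaultType(Enum):
    PARTIAL_VM = "partial VM fault"
    ALL_VM = "all VM faults"
    VNF = "VNF fault"


def classify_fault(vnf_vms, faulty_vms, vnf_reported_fault=False):
    """Classify the fault of one VNF, or return None if nothing is faulty.

    vnf_vms: IDs of the VMs that support the VNF.
    faulty_vms: IDs of VMs that have reported a fault.
    vnf_reported_fault: True if the VNF itself reported an application fault.
    """
    # Assumption: a fault reported by the VNF itself takes precedence.
    if vnf_reported_fault:
        return FaultType.VNF
    failed = set(vnf_vms) & set(faulty_vms)
    if not failed:
        return None
    if failed == set(vnf_vms):
        return FaultType.ALL_VM
    return FaultType.PARTIAL_VM
```

For example, a VNF backed by `vm1` and `vm2` with only `vm1` faulty is classified as a partial VM fault, while the same VNF with both VMs faulty is classified as all VM faults.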
Further, if the fault type is a partial VM fault, determining the fault repair strategy according to the fault type and performing fault repair according to that strategy includes:
Determining whether the VMs include a primary VM and a standby VM. If so: when the faulty VM is the primary VM, performing a primary/standby VM switchover and then deleting the faulty primary VM from the VNF; when the faulty VM is the standby VM, deleting the faulty standby VM from the VNF. If not: reducing the function of the VNF, and then deleting the faulty VM from the VNF.
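The partial VM fault branch above can be sketched as a small function. The data model is an assumption made for illustration: `vm_roles` maps each VM of one VNF to `"active"`, `"standby"`, or `None`, and the returned action strings merely name the operations the text describes.

```python
def repair_partial_vm_fault(vm_roles, faulty_vm):
    """Apply the partial-VM-fault strategy to one VNF.

    vm_roles maps each VM ID of the VNF to "active", "standby", or None
    (no redundancy role). Returns (remaining vm_roles, list of actions).
    """
    roles = dict(vm_roles)  # work on a copy so the caller's dict is untouched
    actions = []
    has_pair = "active" in roles.values() and "standby" in roles.values()
    if has_pair:
        if roles[faulty_vm] == "active":
            # Switch over: promote a standby VM before removing the faulty one.
            standby = next(vm for vm, r in roles.items() if r == "standby")
            roles[standby] = "active"
            actions.append(f"switchover to {standby}")
        # A faulty standby VM needs no switchover before removal.
    else:
        # No primary/standby pair: reduce the VNF's function first.
        actions.append("scale down VNF function")
    del roles[faulty_vm]
    actions.append(f"delete {faulty_vm}")
    return roles, actions
```

In every branch the faulty VM is deleted last, so the VNF keeps serving (possibly in degraded form) while the fault point is isolated.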
进一步地,将故障VM从所述VNF中删除之后,还包括:发送分配请求,以请求分配新的VM;在分配所述新的VM的情况下,将所述新的VM加入到所述VNF中。Further, after the faulty VM is deleted from the VNF, the method further includes: sending an allocation request to request to allocate a new VM; and adding the new VM to the VNF if the new VM is allocated in.
Further, when the fault type is an all-VM fault or a VNF fault, determining the fault repair strategy according to the fault type and performing fault repair according to the fault repair strategy includes:
performing fault repair on the VNF according to the fault repair strategy, where the fault repair strategy includes a switchover between an active VNF and a standby VNF, or an internal logic repair operation; and, when the fault repair fails, rebuilding a new VNF.
Further, before monitoring whether the VNF or VM has a fault, the method also includes:
recording the mapping between VNFs and VMs.
Further, the method also includes: listening in real time for a fault-cleared message; and, when a fault-cleared message is present, putting the VNF or VM whose fault has been cleared back into use.
Further, after fault repair is performed according to the fault repair strategy, the method also includes: checking whether the number of times the fault repair strategy has been executed has reached a preset number; if the preset number has not been reached, continuing fault repair according to the fault repair strategy; and if the preset number has been reached, no longer executing the fault repair strategy.
In another aspect, an embodiment of the present invention further provides a fault repair apparatus, the apparatus including: a monitoring module configured to monitor whether a VNF or VM reports a fault; a determining module configured to determine the fault type when a fault is reported, where the fault type is one of the following: partial VM fault, all-VM fault, or VNF fault; and a fault repair module configured to determine a fault repair strategy according to the fault type and to perform fault repair according to the fault repair strategy.
Further, the fault repair module includes: a judging unit configured to determine, when the fault type is a partial VM fault, whether the VMs include an active VM and a standby VM; a first repair unit configured to, when the active VM and the standby VM exist, perform an active/standby VM switchover and then delete the faulty active VM from the VNF if the faulty VM is the active VM, and delete the faulty standby VM from the VNF if the faulty VM is the standby VM; and a second repair unit configured to, when no active VM and standby VM exist, reduce the function of the VNF and then delete the faulty VM from the VNF.
Further, the fault repair module includes: a third repair unit configured to perform fault repair on the VNF according to the fault repair strategy when the fault type is an all-VM fault or a VNF fault, where the fault repair strategy includes a switchover between an active VNF and a standby VNF, or an internal logic repair operation; and a rebuilding unit configured to rebuild a new VNF when the fault repair fails.
Further, the apparatus also includes: a recording module configured to record the mapping between VNFs and VMs.
Further, the apparatus also includes: a first detection module configured to listen in real time for a fault-cleared message; and an adding module configured to put the VNF or VM whose fault has been cleared back into use when a fault-cleared message is present.
Further, the apparatus also includes: a second detection module configured to check whether the number of times the fault repair strategy has been executed has reached a preset number; the fault repair module is further configured to stop executing the fault repair strategy when the preset number has been reached, or to continue fault repair according to the fault repair strategy when the preset number has not been reached.
An embodiment of the present invention further provides a computer storage medium storing a computer program, the computer program being used to execute the fault repair method described above.
The fault repair method, apparatus, and computer storage medium provided by the embodiments of the present invention actively monitor whether a VNF or VM reports a fault and, when a fault is reported, automatically apply a fault repair strategy chosen according to the fault type. Virtual devices under the NFV architecture are thus repaired automatically when a fault occurs, with a short response time and less manual effort. This solves the problem that fault repair under the existing NFV architecture is entirely manual, which affects services for a long time and wastes considerable labor cost.
Brief Description of the Drawings
FIG. 1 is a diagram of the NFV-MANO architecture in the prior art;
FIG. 2 is a flowchart of the fault repair method in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of the fault repair apparatus in an embodiment of the present invention;
FIG. 4 is a first schematic structural diagram of the fault repair module of the fault repair apparatus in an embodiment of the present invention;
FIG. 5 is a second schematic structural diagram of the fault repair module of the fault repair apparatus in an embodiment of the present invention;
FIG. 6 is a schematic diagram of a preferred structure of the fault repair apparatus in an embodiment of the present invention;
FIG. 7 is a flowchart of the self-healing processing method after a fault is reported in an optional embodiment of the present invention;
FIG. 8 is a flowchart of the self-healing processing method after a fault recovery is reported in an optional embodiment of the present invention;
FIG. 9 is a schematic diagram of cross-VNFM fault repair by the EM in an optional embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely explain the present invention and do not limit it.
An embodiment of the present invention provides a fault repair method. The flowchart of the method is shown in FIG. 2 and includes S201 to S203:
S201: monitor whether a VNF or VM reports a fault;
S202: when a fault is reported, determine the fault type, where the fault type is one of the following: partial VM fault, all-VM fault, or VNF fault;
S203: determine a fault repair strategy according to the fault type, and perform fault repair according to the fault repair strategy.
This method actively monitors whether a VNF or VM reports a fault and, when a fault is reported, automatically applies a fault repair strategy chosen according to the fault type. Virtual devices under the NFV architecture are thus repaired automatically when a fault occurs, with a short response time and less manual effort. This solves the problem that fault repair under the existing NFV architecture is entirely manual, which affects services for a long time and wastes considerable labor cost.
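As an illustration only, the classification in S202 might be sketched in Python as follows. The names `FaultType` and `classify_fault` are hypothetical and do not appear in the embodiment. The sketch also assumes that VM faults take precedence when a VNF reports a fault while only some of its VMs are faulty, a case the text does not address.

```python
from enum import Enum


class FaultType(Enum):
    """The three fault types named in S202."""
    PARTIAL_VM = "partial VM fault"
    ALL_VM = "all-VM fault"
    VNF = "VNF fault"


def classify_fault(vnf_reported_fault, vm_ids, faulty_vm_ids):
    """Return the FaultType for one VNF, or None when nothing is faulty.

    vm_ids: the VMs mapped to this VNF.
    faulty_vm_ids: VMs that have reported a fault (may include other VNFs' VMs).
    """
    own_vms = set(vm_ids)
    faulty = set(faulty_vm_ids) & own_vms
    if own_vms and faulty == own_vms:
        return FaultType.ALL_VM
    if faulty:
        # Assumption: VM faults take precedence over a simultaneous VNF report.
        return FaultType.PARTIAL_VM
    if vnf_reported_fault:
        return FaultType.VNF
    return None
```

The returned type would then select the strategy in S203: isolation and replacement for a partial VM fault, or repair followed by rebuilding for the other two types.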
Preferably, before monitoring whether the VNF or VM has a fault, the mapping between VNFs and VMs may also be recorded and saved, so that the mapping can later be used to determine which VNF is affected.
During implementation, different fault repair strategies are adopted for different fault types. If the fault type is a partial VM fault, it is determined whether the VMs include an active VM and a standby VM. If so, then when the faulty VM is the active VM, an active/standby VM switchover is performed and the faulty active VM is then deleted from the VNF; when the faulty VM is the standby VM, the faulty standby VM is deleted from the VNF. If not, the function of the VNF is reduced (the VNF is scaled in) and the faulty VM is then deleted from the VNF. Whether or not an active VM and a standby VM exist, after the faulty VM is deleted from the VNF an allocation request can be sent to request a new VM; when a new VM is allocated, it is added to the VNF.
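The partial VM fault handling described above can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionary layout (`vms`, `active`, `standby`), the function names, and the `allocate_vm` callback are all hypothetical, and a faulty VM that belongs to neither role is treated like the no-pair case.

```python
def isolate_faulty_vm(vnf, faulty_vm):
    """Isolate one faulty VM and return the list of actions taken.

    vnf is a dict with keys 'vms' (list), 'active' and 'standby' (VM id or None).
    """
    actions = []
    has_pair = vnf.get("active") is not None and vnf.get("standby") is not None
    if has_pair and faulty_vm == vnf["active"]:
        # Active VM failed: the standby takes over before the faulty VM is removed.
        vnf["active"], vnf["standby"] = vnf["standby"], None
        actions.append("switchover")
    elif has_pair and faulty_vm == vnf["standby"]:
        # Standby VM failed: no switchover needed.
        vnf["standby"] = None
    else:
        # No active/standby pair: scale the VNF in first to protect running services.
        actions.append("scale_in")
    vnf["vms"].remove(faulty_vm)
    actions.append("delete")
    return actions


def replace_vm(vnf, allocate_vm):
    """Request a new VM; add it to the VNF when the request succeeds."""
    new_vm = allocate_vm()  # hypothetical callback: returns a VM id, or None on failure
    if new_vm is None:
        return False
    vnf["vms"].append(new_vm)
    return True
```

Restoring the active/standby pairing with the newly allocated VM is left out of the sketch, since the text does not describe it.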
If the fault type is an all-VM fault or a VNF fault, the VNF becomes unavailable in either case, and fault repair is performed on the VNF according to the fault repair strategy. The strategy may be a switchover between the active VNF and the standby VNF (when an active/standby VNF pair exists) or an internal logic repair operation. If the fault repair fails, a new VNF is rebuilt.
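The repair-then-rebuild order for a whole-VNF fault might look like the sketch below. The function name and the two callbacks are hypothetical stand-ins for whatever repair interface and rebuild operation a real system exposes.

```python
def repair_whole_vnf(repair, rebuild):
    """Try the VNF's own repair first, then fall back to rebuilding.

    repair: callable returning True on success, or None if the VNF offers
            no repair operation (switchover or internal logic repair).
    rebuild: callable returning True when a new VNF was rebuilt.
    """
    if repair is not None and repair():
        return "repaired"
    if rebuild():
        return "rebuilt"
    return "failed"
```

A result of `"failed"` corresponds to the case where the fault is left for a later automatic attempt or for manual handling.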
After a fault has been detected, the method checks in real time for a fault-cleared message. A fault-cleared message indicates that the previously faulty VNF or VM is usable again, so it can be put back into use. Although the check described here is performed in real time, a person skilled in the art may instead set a short time interval and check for a fault-cleared message at that predetermined interval; such an implementation is an equivalent variation of the above embodiment and also falls within the protection scope of the present invention.
After fault repair is performed according to the fault repair strategy, the method checks whether the number of times the strategy has been executed has reached the preset number. If the preset number has been reached and the fault is still not repaired, automatic repair cannot clear the fault and manual repair is needed, so the fault repair strategy is no longer executed. If the preset number has not been reached, fault repair continues according to the fault repair strategy.
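The bounded retry described above can be illustrated with a short sketch. `repair_with_limit` and `attempt_repair` are hypothetical names, and the sketch retries immediately, whereas the optional embodiment spaces attempts out with a timer.

```python
def repair_with_limit(attempt_repair, max_attempts):
    """Run attempt_repair until it succeeds or max_attempts is reached.

    Returns (succeeded, attempts_made). A False result means automatic
    repair has given up and manual repair is needed.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_repair():
            return True, attempt
    return False, max_attempts
```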
During implementation, after the fault type is determined, a fault repair state may also be set for the VNF according to the fault type to support subsequent operations. For example, when the fault type is a partial VM fault, the fault repair state of the VNF is set to "VNF partial fault"; when the fault type is an all-VM fault or a VNF fault, the fault repair state of the VNF is set to "VNF complete fault"; and after the faulty VM is deleted from the VNF, the fault repair state of the VNF is set to "VNF fault isolation repair".
An embodiment of the present invention further describes a computer storage medium storing a computer program, the computer program being used to execute the fault repair method shown in FIG. 2.
An embodiment of the present invention further provides a fault repair apparatus, whose structure is shown in FIG. 3. It includes: a monitoring module 10 configured to monitor whether a VNF or VM reports a fault; a determining module 20, coupled to the monitoring module 10 and configured to determine the fault type when a fault is reported, where the fault type is one of the following: partial VM fault, all-VM fault, or VNF fault; and a fault repair module 30, coupled to the determining module 20 and configured to determine a fault repair strategy according to the fault type and to perform fault repair according to the fault repair strategy. Preferably, the apparatus may also include a recording module, which may be placed between the monitoring module 10 and the determining module 20 and is configured to record the mapping between VNFs and VMs, so that the mapping can later be used to determine which VNF is affected.
The structure of the fault repair module 30 may be as shown in FIG. 4, including: a judging unit 301 configured to determine, when the fault type is a partial VM fault, whether the VMs include an active VM and a standby VM; a first repair unit 302, coupled to the judging unit 301 and configured to, when an active VM and a standby VM exist, perform an active/standby VM switchover and then delete the faulty active VM from the VNF if the faulty VM is the active VM, and delete the faulty standby VM from the VNF if the faulty VM is the standby VM; and a second repair unit 303, coupled to the judging unit 301 and configured to, when no active VM and standby VM exist, reduce the function of the VNF and then delete the faulty VM from the VNF.
The structure of the fault repair module 30 may also be as shown in FIG. 5, including: a third repair unit 304 configured to perform fault repair on the VNF according to the fault repair strategy when the fault type is an all-VM fault or a VNF fault, where the fault repair strategy includes a switchover between an active VNF and a standby VNF, or an internal logic repair operation; and a rebuilding unit 305, coupled to the third repair unit 304 and configured to rebuild a new VNF when the fault repair fails.
In this embodiment, the two forms of the fault repair module 30 shown in FIG. 4 and FIG. 5 are not combined. A person skilled in the art may, of course, combine the two structures of FIG. 4 and FIG. 5 so that the fault repair module 30 provides a more complete set of functions.
During implementation, the apparatus may also be as shown in FIG. 6 and further include: a first detection module 40 configured to listen in real time for a fault-cleared message; an adding module 50, coupled to the first detection module 40 and the monitoring module 10 and configured to put the VNF or VM whose fault has been cleared back into use when a fault-cleared message is present; and a second detection module 60, coupled to the fault repair module 30 and configured to check whether the number of times the fault repair strategy has been executed has reached the preset number. The fault repair module 30 is further configured to stop executing the fault repair strategy when that number has reached the preset number, or to continue fault repair according to the fault repair strategy when it has not. In this embodiment, the first detection module 40 may be coupled to the monitoring module 10, coupled to the fault repair module 30, or set up independently; this is not limited here.
Optional Embodiment
At present, no practicable method for automatically repairing network faults after network virtualization has been proposed. To address this, an embodiment of the present invention proposes a fault repair method for virtualized network functions that resolves faults as quickly as possible and minimizes the impact on network services.
Generally, NFVI alarms are directly related to hardware. Such alarms may cause a VM to work abnormally, failing completely or partially. With existing technology, the virtual machine layer may be able to isolate hardware faults. For example, when some disks fail but the disk array as a whole is shared, the virtual disk can still be regarded as normal from its own point of view, with only its total capacity reduced. Therefore, the embodiments of the present invention do not directly consider the impact of NFVI alarms and consider only self-healing after a VM fault.
The fault repair method provided by the embodiments of the present invention automatically repairs a fault as soon as possible when a virtual machine fault (VM fault for short) or an application logic fault (VNF fault for short) occurs, ensuring that network services are not interrupted or are restored as quickly as possible. The technical solution includes the following aspects:
(1) Record the mapping between VNFs and VMs. The VNFM records the mapping between a VNF and its VMs when the VNF is created, deleted, or changed; one VNF may have multiple VMs.
(2) Record the fault repair state of each VNF. The fault repair state may take, but is not limited to, the following values: normal, complete fault, partial fault, fault isolation repair, and automatic repair failed.
(3) Use the VNF-to-VM mapping and the VNF fault repair state to start automatic repair, triggered either by a fault or by a timer.
(4) Fault trigger: monitor faults reported by VNFs and VMs. Both VNFs and VMs may report faults, and automatic repair is started separately for each kind of fault.
If a VM fault is reported, VM faults are repaired one by one. If one or more, but not all, of the VMs supporting a VNF have failed, the VNF state is marked as partial fault and the following procedure is performed.
a) First, isolate the fault point. If the VMs already form an active/standby pair and the faulty VM is the standby VM, delete the faulty VM from the VNF directly; if the faulty VM is the active VM, perform an active/standby VM switchover and then delete the faulty VM from the VNF. If the VMs are not an active/standby pair, perform a scale-in operation on the VNF to avoid affecting existing services, and then delete the VM from the VNF.
b) After isolation is complete, set the VNF state to fault isolation repair.
c) Then try to request a new VM. If the request succeeds, assign the new VM to the VNF and set the VNF state to normal.
d) If the VM request fails because of insufficient resources or for other reasons, go to step (6), where the timed repair module completes the fault repair.
(5) If all VMs have failed, or all VMs are normal but the VNF reports a fault, the following procedure is performed.
Set the VNF state to complete fault and perform the VNF repair action. If the VNF supports repair, call the repair first (the repair may be an active/standby switchover or another repair operation defined inside the VNF). If the repair fails, perform the VNF rebuild action; when rebuilding the VNF, it can be specified whether to use the original VMs (when they are normal) or to request new VMs. If the rebuild succeeds, set the VNF state to normal; if the rebuild fails, go to step (6), where the timed repair module completes the fault repair.
(6) Monitor VM fault recovery messages, or trigger repair operations on a timer.
A VM fault recovery indicates that new VM resources are available and automatic repair can continue; likewise, automatic repair can continue when new physical equipment is added to the network.
Traverse and check the state of every VNF. If a VNF is in the complete fault state, perform (5) and attempt automatic repair again; if it is in the partial fault state, perform (4) and attempt automatic repair again; if it is in the fault isolation repair state, request a new VM for the VNF and, if the request succeeds, assign the new VM to the VNF and set the VNF state to normal. If the attempt still fails, return to (6).
(7) The number of automatic repair attempts for each VNF can be preset. When the number of automatic repair attempts exceeds the preset number, the automatic repair flow exits. A fault that cannot be repaired automatically usually requires manual repair. Maintenance personnel can query the VNF alarms and the VNF fault repair state; when the fault repair state of a VNF is "automatic repair failed", the fault can be handled manually.
(8) Whether the repair was automatic or manual, when the VNF alarm clears, the fault repair state of the VNF is set to normal.
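The bookkeeping in items (1), (2), (7), and (8) above might be sketched as follows. The class and method names are hypothetical, and resetting the attempt counter when the alarm clears is an assumption that the text does not state.

```python
from enum import Enum


class RepairState(Enum):
    """Fault repair states listed in item (2)."""
    NORMAL = "normal"
    FULL_FAULT = "complete fault"
    PARTIAL_FAULT = "partial fault"
    ISOLATED = "fault isolation repair"
    AUTO_REPAIR_FAILED = "automatic repair failed"


class RepairRecord:
    """Keeps the VNF-to-VM mapping, repair state, and attempt count per VNF."""

    def __init__(self, max_auto_repairs):
        self.max_auto_repairs = max_auto_repairs
        self.vms_by_vnf = {}
        self.state = {}
        self.attempts = {}

    def set_vnf(self, vnf_id, vm_ids):
        """Item (1): called when a VNF is created or changed."""
        self.vms_by_vnf[vnf_id] = set(vm_ids)
        self.state.setdefault(vnf_id, RepairState.NORMAL)
        self.attempts.setdefault(vnf_id, 0)

    def delete_vnf(self, vnf_id):
        """Item (1): called when a VNF is deleted."""
        for table in (self.vms_by_vnf, self.state, self.attempts):
            table.pop(vnf_id, None)

    def vnfs_using(self, vm_id):
        """Find which VNFs are affected by a fault on vm_id."""
        return sorted(v for v, vms in self.vms_by_vnf.items() if vm_id in vms)

    def note_failed_attempt(self, vnf_id):
        """Item (7): give up once the attempt count exceeds the preset number."""
        self.attempts[vnf_id] += 1
        if self.attempts[vnf_id] > self.max_auto_repairs:
            self.state[vnf_id] = RepairState.AUTO_REPAIR_FAILED

    def note_alarm_cleared(self, vnf_id):
        """Item (8): the state returns to normal after any successful repair."""
        self.state[vnf_id] = RepairState.NORMAL
        self.attempts[vnf_id] = 0  # assumption: the counter restarts after recovery
```

Maintenance personnel querying for VNFs in the "automatic repair failed" state would, in this sketch, simply filter `state` for `RepairState.AUTO_REPAIR_FAILED`.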
Based on the above method, the following fault repair apparatus for virtualized network functions can be implemented.
The apparatus includes:
Fault repair information recording module: records the mapping between VNFs and VMs and the fault repair state of each VNF.
Automatic fault repair module (corresponding to the monitoring module, determining module, and fault repair module): after detecting a fault reported by a VM or VNF, starts automatic repair of the fault and updates the fault repair state of the VNF according to the repair result.
Timed fault repair module (corresponding to the detection performed by the first detection module and the second detection module): on receiving a VM fault recovery message, or when periodically traversing the fault repair states of the VNFs, uses the automatic fault repair module to continue automatic repair of VNFs that are in a fault state.
The fault repair apparatus can be deployed in the VNFM, or in the EM or NFVO. When deployed in the EM or NFVO, it can manage multiple VNFMs: if one VNFM runs short of resources or has another fault, the VNF can be rebuilt under another VNFM. If the fault repair information recording module is deployed in the EM, the VNFM needs to include the identifiers of the corresponding VMs when it sends notifications of VNF creation, deletion, or modification.
The above embodiment is described in further detail below with reference to the drawings.
Embodiment 1
As shown in the flow in FIG. 7, automatic repair after a fault is reported in NFV is implemented as follows, assuming that the fault repair apparatus is deployed on the VNFM. The flow includes S701 to S710.
S701: the automatic fault repair module monitors faults reported by VMs and VNFs and determines the fault type.
After a VM or VNF fault is received, the fault information record is searched for the mapping between the VNF and its VMs and for the fault handling state of the VNF, and the fault is then handled accordingly.
S702: if the fault is a VM fault, determine whether all VMs are faulty. If so, go to S707; otherwise go to S703.
S703: only some of the VMs of the corresponding VNF are faulty, and the remaining VMs can still provide service normally. First shield the service from the impact of the faulty VM. The VNFM can send the VNF a scale-in request whose parameters identify the faulty VM to be removed. Scaling in the VNF effectively isolates the faulty device; the network service provided may be degraded, but basic functions are still guaranteed. After the scale-in succeeds, the VNFM sets the VNF state to fault isolation repair.
The isolated faulty VM is then managed in the VIM. The VIM can further analyze the real source of the VM fault, whether it is a hardware-related fault caused by the NFVI or a problem with the VM logical object itself, and resolve it by self-healing or by manual handling according to the analysis result. This part is outside the scope of the present invention.
S704: after the faulty VM is isolated, the VNFM issues a request to scale out the VNF, asking for a new VM to be allocated to the VNF, and determines whether a new VM has been allocated. If so, go to S705; otherwise go to S706.
S705: if the system has sufficient resources, the scale-out request may succeed. After it succeeds, allocate the VM, set the VNF state to normal, and then go to S710.
S706: if the scale-out request fails because of insufficient resources or for other reasons, set a timer and wait for the next automatic repair (this process is described in Embodiment 2), and then go to S710.
S707: if the fault type is an all-VM fault or a VNF fault, perform fault repair on the VNF according to the fault repair strategy. Specifically, if the fault is a VM fault and a query shows that all VMs of the VNF are faulty, set the VNF state to complete fault and start the VNF repair action. The VNFM first queries whether the VNF supports a repair action. If it does, the repair is called first (the repair may be an active/standby switchover or another repair operation defined inside the VNF; the VNF can provide a repair interface, and the specific repair operation is decided inside the VNF).
S708: if the repair fails, perform the VNF rebuild action and determine whether the rebuild succeeded. If not, go to S706; otherwise go to S709.
S709: the rebuild succeeded; set the VNF state to normal and go to S710.
S710: end the flow.
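To make the branching in S701 to S710 easier to follow, the sketch below returns the sequence of step labels visited for a given set of outcomes. The function and its boolean parameters are hypothetical. It also assumes that a successful repair in S707 leads straight to S710, which the flow above does not spell out.

```python
def fault_flow(fault_kind, all_vms_faulty, new_vm_allocated, repair_ok, rebuild_ok):
    """Return the step labels visited in the FIG. 7 flow.

    fault_kind: "VM" or "VNF", as determined in S701.
    The remaining arguments are the outcomes of the individual decisions.
    """
    steps = ["S701"]
    if fault_kind == "VM":
        steps.append("S702")  # are all VMs of the VNF faulty?
    if fault_kind == "VM" and not all_vms_faulty:
        steps += ["S703", "S704"]  # isolate by scale-in, then request a new VM
        steps.append("S705" if new_vm_allocated else "S706")
    else:
        steps.append("S707")  # complete fault: try the VNF's repair action
        if not repair_ok:
            steps.append("S708")  # repair failed: rebuild the VNF
            steps.append("S709" if rebuild_ok else "S706")
    steps.append("S710")
    return steps
```

For example, a partial VM fault for which no new VM can be allocated visits S701, S702, S703, S704, S706, and S710, leaving the VNF in the fault isolation repair state until the timer fires.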
Embodiment 2
As shown in the flow of FIG. 8, automatic repair after a fault recovery is reported in NFV is implemented as follows. Assuming the fault repair apparatus is deployed on the VNFM, the flow comprises S801 to S810.
S801: The timed fault repair module monitors fault recoveries reported by VMs and VNFs. When a VM or VNF fault recovery, or a timer expiry message, is received, automatic repair is attempted again.
S802: If a VNF fault recovery is received, S803 is performed.
S803: The fault recovery state of the VNF is set to normal, and S810 is performed.
S804: If a VM fault recovery or a timer expiry message is received, S805 is performed.
S805: The states of all VNFs are traversed and checked, and S806 or S808 is performed depending on the state.
S806: If the VNF is in the complete fault state or the partial fault state, a repair is attempted and it is determined whether the repair succeeded. If it did, S803 is performed; otherwise S807 is performed.
For the repair process in the complete fault state and the partial fault state, refer to Embodiment 1; details are not repeated here.
S807: Wait for the next repair operation, then perform S810.
S808: If the VNF is in the fault isolation state, the VNFM issues a request to scale out the VNF, asking for a new VM to be allocated to it, and determines whether the VM was obtained. If the system has sufficient resources, the request may succeed and S809 is performed. If the request fails because of insufficient resources or for other reasons, a timer is set and S807 is performed.
S809: The VNF state is set to normal, and S810 is performed.
S810: The process ends.
In the above process, if the number of times the timer has been set has reached the specified upper limit, the automatic repair process fails and ends.
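The recovery-driven flow above, including the cap on timer settings, might look like the following Python sketch. All names (`State`, `handle_event`, `auto_repair`, the `try_repair` and `request_vm` callbacks, and the default cap of 3) are hypothetical. Timer expiry is simulated by looping immediately rather than waiting, and the text does not say whether the cap is counted per VNF or globally; a single global counter is assumed here.

```python
from enum import Enum


class State(Enum):
    NORMAL = 1
    PARTIAL_FAULT = 2
    COMPLETE_FAULT = 3
    ISOLATED = 4


def handle_event(kind, states, try_repair, request_vm, vnf=None):
    """Process one event; return True if a retry timer should be set."""
    # S802-S803: a VNF recovery report marks that VNF normal directly.
    if kind == "vnf_recovered":
        states[vnf] = State.NORMAL
        return False
    # S804-S805: a VM recovery or timer expiry triggers a scan of all VNFs.
    need_timer = False
    for name, state in list(states.items()):
        if state in (State.PARTIAL_FAULT, State.COMPLETE_FAULT):
            ok = try_repair(name)        # S806
        elif state is State.ISOLATED:
            ok = request_vm(name)        # S808
        else:
            continue
        if ok:
            states[name] = State.NORMAL  # S803 / S809
        else:
            need_timer = True            # S807: wait for the next attempt
    return need_timer


def auto_repair(states, try_repair, request_vm, max_timer_sets=3):
    """Rescan on each simulated timer expiry until done or the cap is hit."""
    timer_sets = 0
    while handle_event("timer", states, try_repair, request_vm):
        if timer_sets >= max_timer_sets:
            return False  # upper limit reached: automatic repair fails
        timer_sets += 1
    return True
```

With `max_timer_sets=2`, a VNF that can never be repaired is scanned three times (the initial attempt plus two timer-driven retries) before `auto_repair` gives up.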
Embodiment 3
The automatic repair apparatus of the embodiments of the present invention, which acts after a fault recovery is reported in NFV, may be deployed on the VNFM, or in the EM or the NFVO. This embodiment describes the case where the apparatus is deployed on the EM, as shown in FIG. 9; deployment in the NFVO is similar.
The EM can receive VNF creation, deletion, modification, scaling and similar messages reported by multiple VNFMs. On receiving such a message, it records the mapping between VNFs and VMs in the fault information recording module.
The EM can receive VNF and VM fault reports and fault recoveries reported by multiple VNFMs, and records the fault repair state of each VNF in the fault information recording module.
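As a rough illustration of what such a fault information recording module has to track, the Python sketch below keeps the VNF-to-VM mapping and the set of failed VMs. The class and method names are hypothetical, and the assumption that creation, modification and scaling messages each carry the VNF's complete current VM list is not stated in the text.

```python
class FaultInfoRecord:
    """Toy fault information record: VNF-to-VM mapping plus failed VMs."""

    def __init__(self):
        self.vms_of = {}         # VNF id -> set of VM ids
        self.failed_vms = set()  # VM ids currently reported as failed

    def on_vnf_changed(self, vnf, vms):
        # Assumption: creation, modification and scaling messages each
        # carry the full VM list, so the stored set is simply replaced.
        self.vms_of[vnf] = set(vms)

    def on_vnf_deleted(self, vnf):
        self.vms_of.pop(vnf, None)

    def on_vm_fault(self, vm):
        self.failed_vms.add(vm)

    def on_vm_recovered(self, vm):
        self.failed_vms.discard(vm)

    def vnfs_using(self, vm):
        """Look up which VNFs a reported VM belongs to."""
        return sorted(v for v, vms in self.vms_of.items() if vm in vms)

    def failed_vms_of(self, vnf):
        """Return the failed VMs of one VNF (empty set if none or unknown)."""
        return self.vms_of.get(vnf, set()) & self.failed_vms
```

Comparing `failed_vms_of(vnf)` with `vms_of[vnf]` is enough to tell a partial VM failure from a failure of all VMs.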
After receiving a VNF or VM fault report, the EM can repair the fault using the flows of Embodiment 1 and Embodiment 2. The only difference is that the instructions issued by the VNFM in those embodiments are instead first sent by the EM to the VNFM, which then executes them.
For repair by VNF rebuild, if the rebuild fails in one VNFM managed by the EM, the EM can check whether another VNFM it manages can manage VNFs of the same type; if so, it can attempt to rebuild the faulty VNF through that VNFM. As shown in FIG. 9, the specific process is: the EM first sends a VNF1 stop command to VNFM1, then sends VNFM2 a request to create VNF2, where the parameters of VNF2 are identical to those of the original VNF1. If creation succeeds, the state of the faulty VNF is set to normal and a VNF1 delete command is sent to VNFM1. If creation of VNF2 fails, a VNF1 recovery command is sent to VNFM1 and the VNF remains in the fault state.
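The stop, create, then delete-or-recover sequence of FIG. 9 can be sketched in Python as below. `FakeVnfm` is a hypothetical stand-in for a VNFM command interface, used only so the sketch runs; real VNFM interfaces differ, and the function and method names are illustrative.

```python
class FakeVnfm:
    """Hypothetical stand-in that logs the commands the EM sends it."""

    def __init__(self, name, can_create=True):
        self.name = name
        self.can_create = can_create
        self.log = []

    def stop(self, vnf):
        self.log.append(("stop", vnf))

    def recover(self, vnf):
        self.log.append(("recover", vnf))

    def delete(self, vnf):
        self.log.append(("delete", vnf))

    def create(self, vnf, params):
        self.log.append(("create", vnf, params))
        return self.can_create


def rebuild_on_other_vnfm(src, dst, old_vnf, new_vnf, params):
    """EM-side sequence; returns the resulting state of the faulty VNF."""
    src.stop(old_vnf)                      # stop VNF1 on VNFM1
    if dst.create(new_vnf, dict(params)):  # create VNF2 with VNF1's parameters
        src.delete(old_vnf)                # success: remove the old instance
        return "normal"
    src.recover(old_vnf)                   # failure: send the recovery command
    return "fault"
```

Deleting VNF1 only after VNF2 has been created means a failed attempt leaves the original instance available for the recovery command.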
With the method and apparatus provided by the embodiments of the present invention, VNF and VM alarms in an NFV system can be monitored automatically and automatic repair attempted. When automatic repair fails, the fault point can be isolated automatically. After the fault point recovers, automatic repair is attempted again, so that the network service provided by the VNF stays at its expected target.
Although preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will recognize that various modifications, additions and substitutions are possible; the scope of the present invention should therefore not be limited to the embodiments described above.
Industrial applicability
In the embodiments of the present invention, it is monitored whether a VNF or VM reports a fault; when a fault is reported, the fault type is determined; a fault repair policy is determined according to the fault type, and the fault is repaired according to that policy. Thus, when a fault is reported, a different repair policy is adopted automatically depending on the fault type. Virtual devices under the NFV architecture are repaired automatically when they fail, which shortens response time and saves manpower.
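The classify-then-dispatch idea summarized above can be illustrated with a short Python sketch. The three fault types and the ordering of the repair steps follow the text; the function names, the `source` argument and the action labels are hypothetical, chosen only for illustration.

```python
from enum import Enum


class FaultType(Enum):
    PARTIAL_VM = "partial_vm_fault"
    ALL_VM = "all_vm_fault"
    VNF = "vnf_fault"


def determine_fault_type(source, failed_vms, all_vms):
    """Classify a report; `source` is "vnf" or "vm" (hypothetical labels)."""
    if source == "vnf":
        return FaultType.VNF
    # A VM report counts as a full failure once every VM of the VNF is down.
    if set(all_vms) <= set(failed_vms):
        return FaultType.ALL_VM
    return FaultType.PARTIAL_VM


def select_strategy(fault_type, has_active_standby=False, failed_is_active=False):
    """Return the ordered repair actions for a fault type."""
    if fault_type is FaultType.PARTIAL_VM:
        if has_active_standby:
            # Switch over only when the failed VM is the active one.
            steps = ["switch_over"] if failed_is_active else []
            return steps + ["remove_failed_vm"]
        # No active/standby pair: reduce the VNF's functionality first.
        return ["scale_down", "remove_failed_vm"]
    # All-VM and VNF faults share one policy: repair, then rebuild on failure.
    return ["repair_vnf", "rebuild_vnf_if_repair_fails"]
```

Returning a list of action labels rather than executing them keeps policy selection separate from execution, so the same selection logic could in principle sit in a VNFM, an EM or an NFVO.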

Claims (14)

  1. A fault repair method, the method comprising:
    monitoring whether a virtualized network function (VNF) or a virtual machine (VM) reports a fault;
    when a fault is reported, determining a fault type, wherein the fault type is one of: a partial VM fault, an all-VM fault, and a VNF fault;
    determining a fault repair policy according to the fault type, and performing fault repair according to the fault repair policy.
  2. The fault repair method according to claim 1, wherein, when the fault type is a partial VM fault, determining the fault repair policy according to the fault type and performing fault repair according to the fault repair policy comprises:
    determining whether the VMs include an active VM and a standby VM;
    if so, when the faulty VM is the active VM, performing an active/standby VM switchover and then deleting the faulty active VM from the VNF, and when the faulty VM is the standby VM, deleting the faulty standby VM from the VNF;
    if not, reducing the functionality of the VNF and then deleting the faulty VM from the VNF.
  3. The fault repair method according to claim 2, wherein, after the faulty VM is deleted from the VNF, the method further comprises:
    sending an allocation request to request allocation of a new VM;
    when the new VM is allocated, adding the new VM to the VNF.
  4. The fault repair method according to claim 1, wherein, when the fault type is an all-VM fault or a VNF fault, determining the fault repair policy according to the fault type and performing fault repair according to the fault repair policy comprises:
    performing fault repair on the VNF according to the fault repair policy, wherein the fault repair policy comprises: a switchover between an active VNF and a standby VNF, or an internal logic repair operation;
    when the fault repair fails, rebuilding a new VNF.
  5. The fault repair method according to claim 1, wherein, before monitoring whether the VNF or the VM has a fault, the method further comprises:
    recording a mapping relationship between the VNF and the VM.
  6. The fault repair method according to any one of claims 1 to 5, wherein the method further comprises:
    listening in real time for a fault clearance message;
    when the fault clearance message is present, reusing the VNF or VM whose fault has been cleared.
  7. The fault repair method according to any one of claims 1 to 5, wherein, after performing fault repair according to the fault repair policy, the method further comprises:
    detecting whether the number of times the fault repair policy has been executed has reached a preset number of times;
    when the preset number of times has not been reached, continuing to perform fault repair according to the fault repair policy;
    when the preset number of times has been reached, no longer executing the fault repair policy.
  8. A fault repair apparatus, the apparatus comprising:
    a monitoring module, configured to monitor whether a VNF or a VM reports a fault;
    a determining module, configured to determine a fault type when a fault is reported, wherein the fault type is one of: a partial VM fault, an all-VM fault, and a VNF fault;
    a fault repair module, configured to determine a fault repair policy according to the fault type and to perform fault repair according to the fault repair policy.
  9. The fault repair apparatus according to claim 8, wherein the fault repair module comprises:
    a judging unit, configured to determine, when the fault type is a partial VM fault, whether the VMs include an active VM and a standby VM;
    a first repair unit, configured to, when the active VM and the standby VM exist: if the faulty VM is the active VM, perform an active/standby VM switchover and then delete the faulty active VM from the VNF; and if the faulty VM is the standby VM, delete the faulty standby VM from the VNF;
    a second repair unit, configured to, when the active VM and the standby VM do not exist, reduce the functionality of the VNF and then delete the faulty VM from the VNF.
  10. The fault repair apparatus according to claim 8, wherein the fault repair module further comprises:
    a third repair unit, configured to perform fault repair on the VNF according to the fault repair policy when the fault type is an all-VM fault or a VNF fault, wherein the fault repair policy comprises: a switchover between an active VNF and a standby VNF, or an internal logic repair operation;
    a rebuilding unit, configured to rebuild a new VNF when the fault repair fails.
  11. The fault repair apparatus according to claim 8, wherein the apparatus further comprises:
    a recording module, configured to record a mapping relationship between the VNF and the VM.
  12. The fault repair apparatus according to any one of claims 8 to 11, wherein the apparatus further comprises:
    a first detection module, configured to listen in real time for a fault clearance message;
    an adding module, configured to reuse, when the fault clearance message is present, the VNF or VM whose fault has been cleared.
  13. The fault repair apparatus according to any one of claims 8 to 11, wherein the apparatus further comprises:
    a second detection module, configured to detect whether the number of times the fault repair policy has been executed has reached a preset number of times;
    the fault repair module being further configured to no longer execute the fault repair policy when the preset number of times has been reached, or to continue performing fault repair according to the fault repair policy when the preset number of times has not been reached.
  14. A computer storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the method according to any one of claims 1 to 7.
PCT/CN2015/078370 2014-12-22 2015-05-06 Fault recovery method, device and computer storage medium WO2016101486A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410811245.2 2014-12-22
CN201410811245.2A CN105790980B (en) 2014-12-22 2014-12-22 fault repairing method and device

Publications (1)

Publication Number Publication Date
WO2016101486A1 true WO2016101486A1 (en) 2016-06-30

Family

ID=56149079

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/078370 WO2016101486A1 (en) 2014-12-22 2015-05-06 Fault recovery method, device and computer storage medium

Country Status (2)

Country Link
CN (1) CN105790980B (en)
WO (1) WO2016101486A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106338982A (en) * 2016-09-26 2017-01-18 深圳前海弘稼科技有限公司 Fault processing method, fault processing device and server
CN108268021A (en) * 2016-12-30 2018-07-10 北京金风科创风电设备有限公司 Fault handling method and device
CN108347339B (en) * 2017-01-24 2020-06-16 华为技术有限公司 Service recovery method and device
CN108540298B (en) * 2017-03-01 2022-06-17 中兴通讯股份有限公司 Method and device for automatically processing garbage service
CN106992877B (en) * 2017-03-08 2019-07-09 中国人民解放军国防科学技术大学 Network Fault Detection and restorative procedure based on SDN framework
CN107395710B (en) * 2017-07-17 2020-09-22 苏州浪潮智能科技有限公司 Method and device for realizing configuration and high availability HA of cloud platform network element
CN109391481A (en) * 2017-08-02 2019-02-26 中国电信股份有限公司 Virtualize network element failure self-healing method and device
CN107623596A (en) * 2017-09-15 2018-01-23 郑州云海信息技术有限公司 Start the method for testing network element positioning investigation failure in a kind of NFV platforms
CN109995574A (en) * 2018-01-02 2019-07-09 中兴通讯股份有限公司 It is a kind of to repair the method for VNFM failure, monitor, VIM, VNFM and storage medium
CN112434819B (en) * 2019-08-09 2023-09-05 中国移动通信集团浙江有限公司 Service guarantee method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102325192A (en) * 2011-09-30 2012-01-18 上海宝信软件股份有限公司 Cloud computing implementation method and system
CN102801806A (en) * 2012-08-10 2012-11-28 薛海强 Cloud computing system and cloud computing resource management method
CN103607296A (en) * 2013-11-01 2014-02-26 杭州华三通信技术有限公司 Virtual machine fault processing method and equipment thereof

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107783855A (en) * 2016-08-30 2018-03-09 中兴通讯股份有限公司 The fault self-recovery control device and method of Virtual NE
US10880370B2 (en) 2018-11-27 2020-12-29 At&T Intellectual Property I, L.P. Virtual network manager system
US11451624B2 (en) 2018-11-27 2022-09-20 At&T Intellectual Property I, L.P. Virtual network manager system
CN112366694A (en) * 2020-10-29 2021-02-12 国网山东省电力公司泰安供电公司 Multi-station cooperation based automatic fault repairing method and device for power system

Also Published As

Publication number Publication date
CN105790980B (en) 2020-01-31
CN105790980A (en) 2016-07-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15871562

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15871562

Country of ref document: EP

Kind code of ref document: A1