CN113543180A - Alarm processing method and device - Google Patents

Publication number
CN113543180A
Authority
CN
China
Prior art keywords
alarm
mode
fault
root cause
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010297416.XA
Other languages
Chinese (zh)
Other versions
CN113543180B (en
Inventor
弋景峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Datang Mobile Communications Equipment Co Ltd
Original Assignee
Datang Mobile Communications Equipment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Datang Mobile Communications Equipment Co Ltd filed Critical Datang Mobile Communications Equipment Co Ltd
Priority to CN202010297416.XA priority Critical patent/CN113543180B/en
Publication of CN113543180A publication Critical patent/CN113543180A/en
Application granted granted Critical
Publication of CN113543180B publication Critical patent/CN113543180B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/04 Arrangements for maintaining operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0893 Assignment of logical groups to network elements

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiment of the invention provides an alarm processing method and device. The method comprises the following steps: when it is monitored that a network slice in the running state generates a fault alarm, acquiring an alarm mode corresponding to the fault alarm; matching the alarm mode with an alarm mode library, and determining a first alarm root cause corresponding to the alarm mode; matching the first alarm root cause with an alarm processing strategy library, and determining a first alarm processing strategy corresponding to the first alarm root cause; and processing the first alarm root cause by adopting the first alarm processing strategy. The root cause is thus located by matching against the alarm mode library and handled by a stored strategy, which reduces the difficulty of fault location and processing, improves its efficiency, and saves analysis time.

Description

Alarm processing method and device
Technical Field
The present invention relates to the field of alarm processing technologies, and in particular, to an alarm processing method and an alarm processing apparatus.
Background
5G network slicing is a technology implemented on the basis of NFV (Network Function Virtualization). Network function virtualization brings new capabilities and features to communication networks: network functions can be flexibly combined into network slices that meet the requirements of various scenarios and provide communication services. However, under NFV, operation and maintenance personnel face virtual network functions running on cloud-based, pooled resources instead of network elements at fixed physical locations. Compared with a traditional communication network, the architecture is more complex: it spans multiple layers vertically and multiple domains horizontally. Applying traditional fault location and fault processing methods therefore involves a huge workload and great difficulty. To handle a fault, operation and maintenance personnel must work from top to bottom across the slice management layer, the sub-slice management layer, network service orchestration, the virtual network elements, the virtual machines, and the physical layer, and across the network elements of multiple domains, namely the radio network, the transmission network, and the core network; only after such a comprehensive analysis can the fault source be found and targeted fault processing be carried out. Operation and maintenance personnel for 5G network slices analyze and process faults according to fault, performance, and configuration data, relying entirely on manual experience. The work intensity is high, and the demands on personnel and the difficulty of operation and maintenance are greater than in a traditional network.
The existing fault location processing mode requires that operation and maintenance personnel are very familiar with the association relationship between the functional objects in the system, including the horizontal association relationship and the vertical hierarchical relationship. The operation and maintenance personnel are required to manually analyze a series of fault information in the system, find the position of a fault object in the system, analyze the root cause fault and the secondary fault according to the time relationship of the fault and the relationship between the objects, and then process the root cause fault according to the manual experience. The existing fault processing mode has high requirements on personnel, is more difficult under the condition of virtualization technology, has low processing efficiency and needs to spend more analysis processing time.
Disclosure of Invention
In view of the above problems, embodiments of the present invention are proposed to provide an alarm processing method and a corresponding alarm processing apparatus that overcome or at least partially solve the above problems.
In order to solve the above problems, the embodiment of the present invention discloses an alarm processing method, which is applied to a fault processing system; the fault processing system monitors at least one network slice, and is configured with an alarm mode library and an alarm processing strategy library; the method comprises the following steps:
when it is monitored that the network slice in the running state generates a fault alarm, acquiring an alarm mode corresponding to the fault alarm;
matching the alarm mode with the alarm mode library, and determining a first alarm root cause corresponding to the alarm mode;
matching the first alarm root cause with the alarm processing strategy library, and determining a first alarm processing strategy corresponding to the first alarm root cause;
and processing the first alarm root cause by adopting the first alarm processing strategy.
Optionally, the network slice is composed of a plurality of hierarchical levels of objects; the method further comprises the following steps:
when the network slice is not operated, acquiring configuration data of objects of all levels in the network slice;
acquiring topological hierarchy relations among objects of different hierarchies based on the configuration data;
generating an alarm association relation based on the topological hierarchical relation;
and storing the alarm association relation into the alarm mode library.
Optionally, the method further comprises:
acquiring an alarm processing strategy corresponding to the alarm association relation;
and storing the alarm processing strategy into the alarm processing strategy library.
Optionally, the configuration data includes object identifiers and association relations between different object identifiers; the step of obtaining a topological hierarchical relationship between objects of different hierarchies based on the configuration data includes:
and generating a topological hierarchy relation between the objects of different hierarchies by adopting the object identifications and the association relations between different object identifications.
Optionally, the step of generating an alarm association relationship based on the topological hierarchical relationship includes:
receiving a fault simulation instruction;
responding to the fault simulation instruction, and generating a root cause alarm signal;
and generating an alarm association relation based on the root cause alarm signal and the topological hierarchy relation.
Optionally, the step of generating an alarm association relationship based on the root cause alarm signal and the topology hierarchical relationship includes:
generating a secondary alarm signal based on the root cause alarm signal and the topological hierarchy relation;
and generating an alarm association relation by adopting the root cause alarm signal and the secondary alarm signal.
Optionally, the method further comprises:
when the alarm mode is not matched in the alarm mode library, determining at least one alarm object corresponding to the alarm mode;
matching a plurality of target alarm modes in the alarm mode library by adopting the at least one alarm object;
sequentially determining corresponding second alarm root causes from the target alarm modes;
acquiring a second alarm processing strategy corresponding to the second alarm root cause;
and processing the second alarm root cause by adopting the second alarm processing strategy.
Optionally, the method further comprises:
saving the alarm mode in the alarm mode library;
and storing the alarm processing strategy in the alarm processing strategy library.
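The fallback for an unmatched alarm mode can be illustrated with a minimal Python sketch. It is not the patented implementation: the dictionary-based libraries, the ranking of candidate modes by the number of shared alarm objects, and the `is_cleared` callback that reports whether the fault alarm has disappeared are all assumptions introduced for illustration, since the text only says that the candidate root causes are tried sequentially.

```python
def candidate_root_causes(pattern, pattern_library):
    """Root causes of stored alarm modes that share at least one alarm
    object with the unmatched mode. Ranking by overlap is an assumption."""
    objects = {obj for obj, _ in pattern}
    scored = []
    for stored, root_cause in pattern_library.items():
        overlap = len(objects & {obj for obj, _ in stored})
        if overlap:
            scored.append((overlap, root_cause))
    scored.sort(key=lambda item: -item[0])  # stable: ties keep library order
    ordered = []
    for _, root_cause in scored:
        if root_cause not in ordered:
            ordered.append(root_cause)
    return ordered


def handle_unmatched(pattern, pattern_library, policy_library, is_cleared):
    """Try each candidate (second) root cause in turn. When a strategy
    clears the fault, save the new alarm mode so it matches next time."""
    for root_cause in candidate_root_causes(pattern, pattern_library):
        policy = policy_library.get(root_cause)
        if policy is None:
            continue
        policy(root_cause)          # apply the second alarm processing strategy
        if is_cleared():            # hypothetical check, not from the source
            pattern_library[pattern] = root_cause
            return root_cause
    return None
```

In this sketch an alarm mode is a tuple of `(object_id, alarm_type)` pairs, and a root cause is one such pair; both representations are hypothetical.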
The embodiment of the invention also discloses an alarm processing device which is applied to the fault processing system; the fault processing system monitors at least one network slice, and is configured with an alarm mode library and an alarm processing strategy library; the device comprises:
the alarm mode acquisition module is used for acquiring an alarm mode corresponding to a fault alarm when the network slice in the running state is monitored to generate the fault alarm;
the alarm root cause determining module is used for matching the alarm mode with the alarm mode library and determining a first alarm root cause corresponding to the alarm mode;
the first alarm processing strategy determining module is used for matching the first alarm root cause with the alarm processing strategy library and determining a first alarm processing strategy corresponding to the first alarm root cause;
and the first alarm root cause processing module is used for processing the first alarm root cause by adopting the first alarm processing strategy.
Optionally, the network slice is composed of a plurality of hierarchical levels of objects; the device further comprises:
the configuration data acquisition module is used for acquiring configuration data of objects of all levels in the network slice when the network slice is not operated;
the topological hierarchy relation obtaining module is used for obtaining the topological hierarchy relation among the objects of different hierarchies based on the configuration data;
the alarm association relation generating module is used for generating an alarm association relation based on the topological hierarchical relation;
and the alarm association relation storage module is used for storing the alarm association relation into the alarm mode library.
Optionally, the apparatus further comprises:
the alarm processing strategy acquisition module is used for acquiring an alarm processing strategy corresponding to the alarm association relation;
and the alarm processing strategy storage module is used for storing the alarm processing strategy into the alarm processing strategy library.
Optionally, the configuration data includes object identifiers and association relations between different object identifiers; the topology hierarchy relation obtaining module comprises:
and the topological hierarchical relationship generation submodule is used for generating the topological hierarchical relationship among the objects in different hierarchies by adopting the object identifications and the association relations among the different object identifications.
Optionally, the alarm association relationship generating module includes:
the fault simulation instruction receiving submodule is used for receiving a fault simulation instruction;
a root cause alarm signal generation submodule for responding to the fault simulation instruction and generating a root cause alarm signal;
and the alarm association relation generating submodule is used for generating an alarm association relation based on the root cause alarm signal and the topological hierarchy relation.
Optionally, the alarm association relation generating submodule includes:
the secondary alarm signal generating unit is used for generating a secondary alarm signal based on the root cause alarm signal and the topological hierarchy relation;
and the alarm association relation generating unit is used for generating an alarm association relation by adopting the root cause alarm signal and the secondary alarm signal.
Optionally, the apparatus further comprises:
the alarm object determining module is used for determining at least one alarm object corresponding to the alarm mode when the alarm mode is not matched in the alarm mode library;
the target alarm mode determining module is used for matching a plurality of target alarm modes in the alarm mode library by adopting the at least one alarm object;
the second alarm root cause determining module is used for sequentially determining corresponding second alarm root causes from the target alarm mode;
the second alarm processing strategy acquisition module is used for acquiring a second alarm processing strategy corresponding to the second alarm root cause;
and the second alarm root cause processing module is used for processing the second alarm root cause by adopting the second alarm processing strategy.
Optionally, the apparatus further comprises:
the alarm mode library storage module is used for storing the alarm mode in the alarm mode library;
and the alarm processing strategy library storage module is used for storing the alarm processing strategy in the alarm processing strategy library.
The embodiment of the invention also discloses a device, which comprises:
one or more processors; and
one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform one or more methods as described above.
Embodiments of the invention also disclose one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause the processors to perform one or more of the methods described above.
The embodiment of the invention has the following advantages: a first alarm root cause is determined by matching the alarm mode corresponding to a fault alarm generated by a network slice with an alarm mode library; the first alarm root cause is then matched with an alarm processing strategy library to determine a first alarm processing strategy; and the first alarm root cause is processed by adopting the first alarm processing strategy. This reduces the difficulty of fault location and processing, improves its efficiency, and saves analysis time.
Drawings
FIG. 1 is a block diagram of a fault handling system of the present invention;
FIG. 2 is a flow chart of the steps of an embodiment of a method of alarm handling of the present invention;
FIG. 3 is a flow chart of the steps of an embodiment of a method of alarm handling of the present invention;
FIG. 4 is a topological hierarchical relationship between objects derived from configuration data in accordance with the present invention;
FIG. 5 is a relationship between the alarm resulting from a deliberately induced fault and the secondary alarms in a learning mode of the present invention;
FIG. 6 is a block diagram of an alarm processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
5G network slicing is a technology based on NFV (Network Function Virtualization).
Network slicing is an on-demand networking mode: an operator can separate a plurality of virtual end-to-end networks on a unified infrastructure, and each network slice is logically isolated from the radio access network through the bearer network to the core network, so as to adapt to various types of applications.
Network function virtualization brings new capabilities and features to communication networks. Network functions can be flexibly combined to form network slices that meet the requirements of various scenarios and provide communication services, but the complexity of the network increases, and fault location and fault processing for network slices are more complicated and difficult than in a traditional communication network. Physical layer equipment, virtual layer equipment and network function equipment may come from different manufacturers, so fault location becomes very complicated. How to make full use of the flexibility brought by NFV while reducing the complexity that this flexibility adds to fault processing is a problem that 5G network slicing needs to solve.
The existing fault location processing mode requires that operation and maintenance personnel are very familiar with the association relationship between the functional objects in the system, including the horizontal association relationship and the vertical hierarchical relationship. The operation and maintenance personnel are required to manually analyze a series of fault information in the system, find the position of a fault object in the system, analyze the root cause fault and the secondary fault according to the time relationship of the fault and the relationship between the objects, and then process the root cause fault according to the manual experience. The existing fault processing mode has high requirements on personnel, is more difficult under the condition of virtualization technology, has low processing efficiency and needs to spend more analysis processing time.
In view of the drawbacks of the prior art, a core concept of the embodiment of the present invention is to construct a fault handling system for monitoring network slices, as shown in FIG. 1, so as to optimize the fault location and handling process when a network slice fails. A plurality of fault alarm modes are determined by simulating faults and are collected into an alarm mode library; when a fault occurs, the observed alarms are matched against this library so that the fault can be located quickly. In addition, an alarm processing strategy library is established, which stores the processing strategies corresponding to the fault modes, so that a fault can be processed quickly once its mode has been determined. Compared with the prior art, this simplifies both the location process and the processing process when a network slice fault occurs, thereby reducing manpower consumption and operation and maintenance difficulty.
The alarm processing method of the present invention is described below by way of embodiments:
Referring to FIG. 2, a flowchart of the steps of an embodiment of an alarm processing method of the present invention is shown. The method is applied to a fault handling system; the fault handling system monitors at least one network slice and is configured with an alarm mode library and an alarm processing strategy library. The method specifically comprises the following steps:
Step 201, when it is monitored that the network slice in the running state generates a fault alarm, acquiring an alarm mode corresponding to the fault alarm;
In the embodiment of the invention, when the fault handling system monitors that the network slice generates fault alarms, it collects them and combines the resulting series of alarms, in time order, into a specific alarm mode.
Step 202, matching the alarm mode with the alarm mode library, and determining a first alarm root cause corresponding to the alarm mode;
The alarm root cause is the root fault that causes a series of associated faults in the network slice; among the series of faults occurring in the network slice, it is the fault that is at the lowest hierarchy and occurs first.
The alarm mode library stores a plurality of alarm modes corresponding to a plurality of forms of fault alarms which may appear on the network slice.
The existing fault location processing mode requires operation and maintenance personnel to be very familiar with the association relationships between the functional objects in the system, including the horizontal association relationships and the vertical hierarchical relationships. They must manually analyze a series of fault information to find the position of the faulty object in the system, and can only identify the root cause fault from the time relationship of the faults and the relationships between the objects. In one embodiment of the invention, an alarm mode library is configured for the fault handling system, and the first alarm root cause corresponding to the alarm mode is determined by matching the alarm mode with the alarm mode library, so that the fault can be located quickly.
Step 203, matching the first alarm root cause with the alarm processing strategy library, and determining a first alarm processing strategy corresponding to the first alarm root cause;
The alarm processing strategy library stores the alarm processing strategies corresponding to the alarm modes in the alarm mode library.
Further, the first alarm root cause obtained in the previous step is matched with the alarm processing strategy library, so that the first alarm processing strategy corresponding to the first alarm root cause of the fault alarm can be obtained.
Step 204, processing the first alarm root cause by adopting the first alarm processing strategy.
After the first alarm processing strategy is obtained, the first alarm root cause can be processed by adopting the first alarm processing strategy.
The embodiment of the invention determines a first alarm root cause by matching the alarm mode corresponding to a fault alarm generated by a network slice with an alarm mode library; the first alarm root cause is then matched with an alarm processing strategy library to determine a first alarm processing strategy; and the first alarm root cause is processed by adopting the first alarm processing strategy. This reduces the difficulty of fault location and processing, improves its efficiency, and saves analysis time.
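Steps 201 to 204 can be sketched as two dictionary lookups. The following Python sketch rests on stated assumptions rather than on anything the patent specifies: the `Alarm` record, the representation of an alarm mode as a time-ordered tuple of `(object_id, alarm_type)` pairs, and the representation of a strategy as a callable are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    """Hypothetical alarm record; the field names are illustrative."""
    timestamp: float
    object_id: str
    alarm_type: str


def build_alarm_pattern(alarms):
    """Step 201: combine a series of alarms into one mode, in time order."""
    ordered = sorted(alarms, key=lambda a: a.timestamp)
    return tuple((a.object_id, a.alarm_type) for a in ordered)


def handle_fault(alarms, pattern_library, policy_library):
    """Steps 201-204: alarm mode -> root cause -> strategy -> processing.
    Returns None when either library has no matching entry."""
    pattern = build_alarm_pattern(alarms)        # step 201
    root_cause = pattern_library.get(pattern)    # step 202
    if root_cause is None:
        return None
    policy = policy_library.get(root_cause)      # step 203
    if policy is None:
        return None
    return policy(root_cause)                    # step 204
```

A caller would pass the alarms collected for one fault event together with the two libraries; the `None` result marks the case where the alarm mode is not yet known to the library.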
Referring to fig. 3, a flow diagram illustrating the steps of an alarm handling method embodiment of the present invention is shown, involving a fault handling system and network slices. The fault handling system according to the embodiment of the present invention may include three parts, which are a data acquisition layer 101, a data analysis layer 102, and a decision control layer 103. A failure handling engine 1031 (shown in fig. 1) is configured at the decision control layer 103.
The method specifically comprises the following steps:
Step 301, when the network slice is not running, acquiring configuration data of objects of each hierarchy in the network slice;
In the embodiment of the present invention, the decision control layer 103 provides an automatic learning mode, which is started when deployment of the network slice is complete but the slice has not yet been formally brought online and put into use. The automatic learning mode can simulate the generation of faults and carry out root cause analysis and automatic fault handling.
In the process of simulating faults, in order to learn the association relationship among a plurality of faults, the association relationship among the objects of each hierarchy in the network slice must be known first, so that the faults that may follow from a single fault can be determined. The hierarchies of the network slice may include a physical layer, a virtual layer, a virtual network function layer, and a network slice layer.
To implement this process, configuration data for objects at various levels in a network slice needs to be acquired first.
The configuration data may specifically include:
The configuration data of the physical layer includes the configured physical servers and, for each physical server, its network card information, memory information, disk information, number of processors, and other information;
The configuration data of the virtual layer includes the virtual machines, the physical server corresponding to each virtual machine, the memory, disk and processor resources occupied by each virtual machine, the binding relations between virtual machine ports and physical ports, and the like;
The configuration data of the virtual network function layer includes which VNFs 110 (Virtual Network Functions) exist, which virtual machines each VNF 110 corresponds to, which virtual networks each VNF 110 establishes, the connection relationships between VNFs 110, and the like;
The configuration data of the network slice layer includes which slices exist and which VNFs 110 constitute each slice.
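The four kinds of configuration data listed above could be held in records like the following. This is only an illustrative data model: every class and field name is an assumption, and the text does not prescribe any storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PhysicalServer:          # physical layer
    server_id: str
    nic_info: List[str]
    memory_gb: int
    disk_gb: int
    cpu_count: int


@dataclass
class VirtualMachine:          # virtual layer
    vm_id: str
    host_server_id: str        # physical server the VM runs on
    memory_gb: int
    disk_gb: int
    vcpu_count: int
    # virtual machine port -> physical port binding
    port_bindings: Dict[str, str] = field(default_factory=dict)


@dataclass
class Vnf:                     # virtual network function layer
    vnf_id: str
    vm_ids: List[str]
    virtual_networks: List[str] = field(default_factory=list)
    connected_vnf_ids: List[str] = field(default_factory=list)


@dataclass
class NetworkSlice:            # network slice layer
    slice_id: str
    vnf_ids: List[str]
```

The identifier fields (`host_server_id`, `vm_ids`, `vnf_ids`) carry the cross-layer associations from which a topology can later be derived.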
As shown in FIG. 1, during data collection the EMS 104 (Element Management System) collects the alarm and configuration data of the VNF 110 and the PNF 111 and reports the collected data to the fault handling engine 1031 through the NFVO+ 105. The alarm and configuration data of the VIM 106 (Virtualized Infrastructure Manager) and the PIM 107 (Physical Infrastructure Manager) may also be reported to the fault handling engine 1031 through the NFVO+ 105. Meanwhile, the NSSMF 108 (Network Slice Subnet Management Function) may report the alarm and configuration data of the sub-slice dimension to the fault handling system, and the NSMF 109 (Network Slice Management Function) may report the performance, alarm and configuration data of the slice dimension. The data acquisition layer 101 thus collects the full set of alarm and configuration data at every level: from the physical layer to the virtual layer, from the physical devices to the virtual machines, to the VNF 110, to the sub-slices, and up to the slices.
Step 302, acquiring topological hierarchy relations among objects of different hierarchies based on the configuration data;
After the configuration data of the objects of the different hierarchies is obtained, the lower-layer resources used by each hierarchy can be derived from it, including the sub-slices contained in a slice, the VNF 110/PNF 111 (Physical Network Function) contained in a sub-slice, the deployment units contained in a VNF 110 and the virtual machines contained in each deployment unit, and the physical device 112 on which each virtual machine is located.
After the configuration analysis engine 1021 of the data analysis layer 102 has analyzed the resource usage relationships between the different levels, the topological hierarchy relation can be obtained, covering the associations from the physical layer to the virtual layer, to the VNF 110, to the sub-slices, and to the slices.
In one example, the topological hierarchy relation between the layers may be as shown in FIG. 4: a slice NSIa 401 contains a sub-slice NSSIa 402; the sub-slice NSSIa 402 contains virtual network function network elements VNFa 1101 and VNFb 1102; VNFa 1101 includes VMa 4031 (Virtual Machine) and VMb 4032; VNFb 1102 includes VMc 4033 and VMd 4034; VMa 4031 and VMb 4032 are associated with VIMa 1061, and VMc 4033 and VMd 4034 are associated with VIMb 1062; VIMa 1061 is on physical device a 1121 and VIMb 1062 is on physical device b 1122.
Further, the configuration data may include object identifications and association relations between different object identifications; therefore, in the embodiment of the present invention, the topological hierarchical relationship between objects in different hierarchies may also be generated by using the object identifier and the association relationship between different object identifiers.
For example, a virtual machine ID is associated with the ID of the physical server hosting that virtual machine, and, because the VNF 110 uses virtual machines, the ID of the VNF 110 is associated with the IDs of those virtual machines. The topological hierarchy relation is therefore generated from the ID of each object and the association relations among the different object IDs.
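Generating the topology from object IDs and their associations amounts to building a directed graph. The sketch below assumes that each association is given as an `(upper_id, lower_id)` pair meaning "the upper object uses the lower object"; that encoding and the function names are hypothetical.

```python
from collections import defaultdict


def build_topology(associations):
    """Build two adjacency maps from (upper_id, lower_id) pairs:
    uses[x]    = objects that x is built on
    used_by[x] = objects that are built on x"""
    uses = defaultdict(set)
    used_by = defaultdict(set)
    for upper, lower in associations:
        uses[upper].add(lower)
        used_by[lower].add(upper)
    return uses, used_by


def dependents(object_id, used_by):
    """All objects that rely, directly or indirectly, on object_id."""
    found, stack = set(), [object_id]
    while stack:
        for upper in used_by.get(stack.pop(), ()):
            if upper not in found:
                found.add(upper)
                stack.append(upper)
    return found
```

With the identifiers of FIG. 4 as input, for example `("NSIa", "NSSIa")`, `("NSSIa", "VNFa")` and `("VNFa", "VMa")`, `dependents("VMa", used_by)` yields the VNF, sub-slice and slice that sit above that virtual machine.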
Step 303, generating an alarm association relation based on the topological hierarchical relation;
In an actual scene, because the layers of the network slice are associated with one another, a fault in an object of one layer often causes associated faults along the topological hierarchy relation, so that various fault alarms are generated; that is, there is an association relation between the alarm of the original fault and the fault alarms generated by the associated faults.
In the embodiment of the present invention, the generation of the alarm association relationship may be implemented by the following steps:
S11, receiving a fault simulation instruction;
S12, responding to the fault simulation instruction, and generating a root cause alarm signal;
S13, generating a secondary alarm signal based on the root cause alarm signal and the topological hierarchy relation;
S14, generating an alarm association relation by adopting the root cause alarm signal and the secondary alarm signal.
As shown in FIG. 1, the NSSMF 108 sends a learning-mode fault simulation instruction to the NFVO+ 105. After receiving the instruction to simulate a fault inside a slice, the NFVO+ 105 queries the resources under the slice and, following the topological hierarchy relation and starting from the lower-layer resources of the topology, calls the PM (Performance Management), FM (Fault Management) and CM (Configuration Management) management interfaces of the lower-layer PIM 107, VIM 106 and VNFM 113 to perform operations such as closing, suspending and restarting on the physical resources, the virtual resources NFVI 114 (Network Function Virtualization Infrastructure) and the virtual network functions, thereby causing a root cause fault and generating a root cause alarm signal. Secondary alarm signals are generated based on the secondary faults caused by the root cause fault and on the topological hierarchy relation. The alarm association relation between the root cause alarm signal and the secondary alarm signals is thereby established.
As shown in fig. 5, taking the virtual machine in VIM106 as an example:
The fault handling engine 1031 calls the virtual machine suspend API in VIMa1061 to simulate a virtual machine fault in the slice NSIa401. According to the topological hierarchy, suspending the virtual machine may cause a VNFa1101 fault alarm, may also cause an alarm on VNFb1102, which communicates with VNFa1101, and may further cause a sub-slice NSSIa402 alarm and a slice NSIa401 alarm.
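The way a single simulated fault fans out into secondary alarms along the topology can be illustrated with a short sketch. In the embodiment the secondary alarms are observed from the running system after the fault is injected; the code below only computes which objects would be expected to raise alarms. The `DEPENDENTS` table and the object names are hypothetical stand-ins for the objects of fig. 5.

```python
from collections import deque

# Hypothetical "who is affected by whom" table. The communication
# relation between the two VNFs is modelled as a dependency as well.
DEPENDENTS = {
    "VM1": ["VNFa"],
    "VNFa": ["VNFb", "sub_slice_402"],
    "VNFb": ["sub_slice_402"],
    "sub_slice_402": ["slice_401"],
    "slice_401": [],
}


def simulate_fault(root, dependents):
    """Walk the topology breadth-first from the failed object.

    Returns the alarm association relation as a dict holding the root
    cause and the secondary alarms in the order they are reached.
    """
    seen = {root}
    secondary = []
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for affected in dependents.get(current, []):
            if affected not in seen:
                seen.add(affected)
                secondary.append(affected)
                queue.append(affected)
    return {"root_cause": root, "secondary": secondary}


relation = simulate_fault("VM1", DEPENDENTS)
print(relation)
```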
Step 304, storing the alarm association relation into the alarm mode library.
The root cause alarm and the series of secondary alarms caused by a virtual machine suspension that the decision control layer 103 actively triggers in the automatic learning mode can be recorded as an alarm mode in the alarm analysis engine 1022 in fig. 1. As shown in fig. 5, when virtual machine 1 on physical device a1121 fails, a VNFa1101 failure alarm is triggered; the VNFa1101 failure triggers a VNFb1102 communication alarm; the VNFa1101 and VNFb1102 failures trigger an NSSIa402 failure alarm; and the NSSIa402 failure triggers an NSIa401 failure alarm. This series of alarms may be recorded in the alarm analysis engine 1022 as one alarm mode.
Faults are simulated for each virtual machine and each object one by one, and the series of alarms generated after each fault is recorded as an alarm pattern in the alarm analysis engine 1022, so that an alarm pattern library is formed.
In addition, in the automatic learning mode the decision control layer 103 may also call the interface-down API of a virtual network card in the VIM106 to simulate a network port fault and cause a series of secondary alarms. Each interface failure may also be recorded as a failure mode in the fault handling engine 1031.
In another example, in the automatic learning mode the decision control layer 103 may also call an interface of the PIM107 to suspend a physical server, thereby simulating a physical device failure, and record the alarms caused by each physical device failure as a failure mode in the fault handling engine 1031.
Step 305, acquiring an alarm processing strategy corresponding to the alarm association relation;
Step 306, storing the alarm processing strategy into the alarm processing strategy library.
After the alarm mode is obtained, an alarm processing strategy corresponding to the alarm mode can be determined according to the alarm mode, and the alarm processing strategy is stored in an alarm processing strategy library.
Step 307, when it is monitored that the network slice in the running state generates a fault alarm, acquiring an alarm mode corresponding to the fault alarm;
When the network slice goes online and is put into use, the automatic learning mode is turned off. After an alarm is generated in actual operation, the fault alarm can be analyzed to obtain the corresponding alarm pattern. For example, the obtained alarm pattern may be {1, 2, 3, 5}, where 1, 2, 3, and 5 represent objects of different hierarchy levels.
Step 308, matching the alarm mode with the alarm mode library, and determining a first alarm root cause corresponding to the alarm mode;
As shown in fig. 1, once the alarm pattern corresponding to the fault alarm is determined, the alarm analysis engine 1022 in the data analysis layer 102 may match it against the alarm patterns already collected in the alarm pattern library and thereby provide the alarm root cause.
For example, when the alarm pattern corresponding to the fault alarm is {1, 2, 3} and it is successfully matched in the alarm pattern library, the fault of object 1, which is at the lowest hierarchy level, may be determined as the first root cause fault.
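A minimal sketch of this matching step is given below. It assumes that alarm patterns are stored as sets of object identifiers and that a separate table gives the hierarchy level of each object; `LEVEL`, `PATTERN_LIBRARY`, and `match_root_cause` are hypothetical names used only for illustration.

```python
# Hypothetical hierarchy levels: a smaller number means a lower layer.
LEVEL = {1: 0, 2: 1, 3: 2, 5: 2}

# Hypothetical alarm pattern library collected in the learning mode.
PATTERN_LIBRARY = {frozenset({1, 2, 3}), frozenset({1, 2, 5})}


def match_root_cause(alarm_pattern, library, level):
    """Return the lowest-level object of a known pattern, or None if unknown."""
    pattern = frozenset(alarm_pattern)
    if pattern not in library:
        return None
    return min(pattern, key=lambda obj: level[obj])


print(match_root_cause({1, 2, 3}, PATTERN_LIBRARY, LEVEL))
print(match_root_cause({1, 2, 3, 5}, PATTERN_LIBRARY, LEVEL))
```

Using `frozenset` makes the lookup independent of the order in which the alarms arrive, which is an assumption of this sketch rather than a requirement stated in the embodiment.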
Step 309, matching the first alarm root cause with the alarm processing strategy library, and determining a first alarm processing strategy corresponding to the first alarm root cause;
The first alarm root cause is matched with the alarm processing strategy library to determine the alarm processing strategy for handling the first alarm root cause.
Step 310, processing the first alarm root cause by using the first alarm processing strategy.
After the first alarm processing strategy is obtained, the first alarm root cause can be processed according to the first alarm processing strategy.
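The lookup of a processing strategy for a root cause and its execution can be sketched as a simple dispatch table. The strategy functions and the `POLICY_LIBRARY` contents below are hypothetical placeholders; a real system would call the management interfaces of the lower layers instead of returning a string.

```python
def migrate_virtual_machine(root_cause):
    """Placeholder strategy: describe the action instead of performing it."""
    return f"migrate virtual machine behind object {root_cause}"


def restart_network_function(root_cause):
    """Placeholder strategy: describe the action instead of performing it."""
    return f"restart network function {root_cause}"


# Hypothetical alarm processing strategy library: root cause -> strategy.
POLICY_LIBRARY = {
    1: migrate_virtual_machine,
    2: restart_network_function,
}


def handle_root_cause(root_cause, policy_library):
    """Apply the stored strategy, or return None when no strategy is known."""
    policy = policy_library.get(root_cause)
    if policy is None:
        return None
    return policy(root_cause)


print(handle_root_cause(1, POLICY_LIBRARY))
```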
It should be noted that, if no alarm pattern collected in the alarm pattern library matches the alarm pattern generated by the fault, multiple root cause faults may have occurred simultaneously. In that case, a plurality of alarm patterns may be used together to match the alarm pattern generated by the fault.
The matching process specifically comprises the following steps:
S21, when the alarm mode is not matched in the alarm mode library, determining at least one alarm object corresponding to the alarm mode;
S22, matching a plurality of target alarm modes in the alarm mode library by using the at least one alarm object;
S23, sequentially determining corresponding second alarm root causes from the target alarm modes;
S24, acquiring a second alarm processing strategy corresponding to the second alarm root cause;
S25, processing the second alarm root cause by using the second alarm processing strategy.
In the embodiment of the invention, when the alarm pattern corresponding to the fault alarm cannot be matched in the alarm pattern library, a plurality of root cause faults may have occurred simultaneously. In this case, a plurality of target alarm patterns can be matched in the alarm pattern library according to the objects that generated the fault, the corresponding second alarm root causes are determined from the target alarm patterns in sequence, and each is processed in sequence according to its corresponding alarm processing strategy.
For example, when the alarm pattern generated by the fault is {1, 2, 3, 5}, which is not in the alarm pattern library, the existing alarm patterns {1, 2, 3} and {1, 2, 5} can be used for matching. In a specific implementation, one root cause fault is matched first, and the decision control layer 103 is notified that this root cause fault has occurred, so that the decision control layer 103 performs fault processing in the working state and controls the lower-layer VIM106 to switch or migrate the virtual machine. After the decision control layer 103 finishes processing that root cause fault, if only the alarms of the {2, 3, 5} mode remain, fault processing continues on the {2, 3, 5} mode: an attempt is made to control the NFVO+105 to perform a self-healing operation with 2 as the root cause fault.
The self-healing operation is a management function of the NFVO+105. In essence, it invokes the virtual resource management interfaces provided by the VIM106, which offer management operations such as migration, recovery, and remote regeneration of virtual resources; the self-healing operation is realized through these management operations.
If the {2, 3, 5} alarms are cleared, it indicates that two root cause failures occurred simultaneously.
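The matching of an unknown alarm pattern against several known patterns, as in the {1, 2, 3, 5} example above, can be sketched as a search for known patterns whose union equals the observed pattern. This is only one possible reading of the matching step; the function name and the exhaustive search over combinations are assumptions of this sketch, not the method prescribed by the embodiment.

```python
from itertools import combinations


def cover_with_known_patterns(observed, library):
    """Find the smallest group of known patterns whose union is the observed one.

    `library` is a list of frozensets. Returns a list of patterns, or an
    empty list when no combination of known patterns explains the alarms.
    """
    observed = frozenset(observed)
    # Only patterns fully contained in the observed alarms can take part.
    candidates = [p for p in library if p <= observed]
    for size in range(2, len(candidates) + 1):
        for combo in combinations(candidates, size):
            if frozenset().union(*combo) == observed:
                return list(combo)
    return []


library = [frozenset({1, 2, 3}), frozenset({1, 2, 5}), frozenset({2, 4})]
print(cover_with_known_patterns({1, 2, 3, 5}, library))
print(cover_with_known_patterns({2, 3, 4}, library))
```

An empty result corresponds to the case in which the pattern was not learned in the automatic learning scenario and has to be handled separately.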
It should be noted that, if the alarms cannot be eliminated, the pattern may be one that was not learned in the automatic learning scenario. The data analysis layer 102 then starts the manual learning mode and requests manual processing, confirms the root cause alarm after the manual processing is completed, and learns a new alarm mode and a new processing strategy according to the result of the manual confirmation.
For example, if the failure mode is {2, 3, 4}, which cannot be matched with any existing mode, it is recorded as a new failure mode. Alarm 2, which has the earliest time in the failure mode and the lowest topology level, is treated as the possible root cause fault, the data analysis layer 102 starts the learning mode, and the decision control layer 103 tries to recover the possible fault. If the {2, 3, 4} alarms are eliminated, the new mode is recorded, and the root cause alarm elimination method obtained in the learning mode is recorded into the alarm processing strategy library. If not, manual processing is requested. After the alarms are eliminated by manual processing, the new alarm mode and processing strategy are added and recorded.
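The fallback for an unknown pattern can be sketched as follows. The sketch assumes that each alarm carries an object identifier, a topology level, and a timestamp, and it breaks ties by level first and time second; the embodiment does not state which criterion takes precedence, so that ordering is an assumption. The `try_recover` callback stands in for the recovery attempt of the decision control layer and is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    obj: int          # identifier of the alarming object
    level: int        # topology level, lower number means lower layer
    timestamp: float  # time at which the alarm was raised


def learn_unknown_pattern(alarms, try_recover, pattern_library, policy_library):
    """Try to learn a new alarm pattern; return "learned" or "manual".

    `try_recover(alarm)` is expected to return the name of the strategy
    that cleared the alarms, or None if the alarms could not be cleared.
    """
    candidate = min(alarms, key=lambda a: (a.level, a.timestamp))
    pattern = frozenset(a.obj for a in alarms)
    policy = try_recover(candidate)
    if policy is None:
        # Recovery failed, so manual processing has to be requested.
        return "manual"
    pattern_library[pattern] = candidate.obj
    policy_library[candidate.obj] = policy
    return "learned"


alarms = [Alarm(3, 2, 10.5), Alarm(2, 1, 10.0), Alarm(4, 2, 10.7)]
patterns, policies = {}, {}
print(learn_unknown_pattern(alarms, lambda a: "restart", patterns, policies))
print(patterns, policies)
```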
In the embodiment of the invention, a first alarm root cause is determined by matching the alarm mode corresponding to a fault alarm generated by a network slice with the alarm mode library; the first alarm root cause is then matched with the alarm processing strategy library to determine a first alarm processing strategy; and the first alarm root cause is processed by using the first alarm processing strategy. The embodiment of the invention thereby reduces the difficulty of fault locating and processing, improves the efficiency of fault locating and processing, and saves analysis time.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 6, a block diagram of an embodiment of an alarm processing apparatus according to the present invention is shown, and may specifically include the following modules:
an alarm mode obtaining module 601, configured to obtain an alarm mode corresponding to a fault alarm when it is monitored that the network slice in the running state generates the fault alarm;
an alarm root cause determining module 602, configured to match the alarm pattern with the alarm pattern library, and determine a first alarm root cause corresponding to the alarm pattern;
a first alarm processing policy determining module 603, configured to match the first alarm root cause with the alarm processing policy library, and determine a first alarm processing policy corresponding to the first alarm root cause;
a first alarm root cause processing module 604, configured to process the first alarm root cause by using the first alarm processing policy.
In an embodiment of the present invention, the network slice is composed of a plurality of hierarchical levels of objects; the apparatus may further include:
the configuration data acquisition module is used for acquiring configuration data of objects of all levels in the network slice when the network slice is not operated;
the topological hierarchy relation obtaining module is used for obtaining the topological hierarchy relation among the objects of different hierarchies based on the configuration data;
the alarm association relation generating module is used for generating an alarm association relation based on the topological hierarchical relation;
and the alarm association relation storage module is used for storing the alarm association relation into the alarm mode library.
In the embodiment of the present invention, the apparatus may further include:
the alarm processing strategy acquisition module is used for acquiring an alarm processing strategy corresponding to the alarm association relation;
and the alarm processing strategy storage module is used for storing the alarm processing strategy into the alarm processing strategy library.
In the embodiment of the present invention, the configuration data includes object identifiers and association relations between different object identifiers; the topology hierarchy relationship obtaining module may include:
and the topological hierarchical relationship generation submodule is used for generating the topological hierarchical relationship among the objects in different hierarchies by using the object identifiers and the association relationships among the different object identifiers.
In this embodiment of the present invention, the alarm association relationship generating module may include:
the fault simulation instruction receiving submodule is used for receiving a fault simulation instruction;
a root cause alarm signal generation submodule for responding to the fault simulation instruction and generating a root cause alarm signal;
and the alarm association relation generating submodule is used for generating an alarm association relation based on the root cause alarm signal and the topological hierarchy relation.
In this embodiment of the present invention, the alarm association relation generating sub-module may include:
the secondary alarm signal generating unit is used for generating a secondary alarm signal based on the root cause alarm signal and the topological hierarchy relation;
and the alarm association relation generating unit is used for generating an alarm association relation by using the root cause alarm signal and the secondary alarm signal.
In this embodiment of the present invention, the apparatus may further include:
the alarm object determining module is used for determining at least one alarm object corresponding to the alarm mode when the alarm mode is not matched in the alarm mode library;
the target alarm mode determining module is used for matching a plurality of target alarm modes in the alarm mode library by adopting the at least one alarm object;
the second alarm root cause determining module is used for sequentially determining corresponding second alarm root causes from the target alarm modes;
the second alarm processing strategy acquisition module is used for acquiring a second alarm processing strategy corresponding to the second alarm root cause;
and the second alarm root cause processing module is used for processing the second alarm root cause by using the second alarm processing strategy.
In this embodiment of the present invention, the apparatus may further include:
the alarm mode library storage module is used for storing the alarm modes in the alarm mode library;
and the alarm processing strategy library storage module is used for storing the alarm processing strategy in the alarm processing strategy library.
Because the device embodiment is basically similar to the method embodiment, its description is brief; for relevant details, refer to the corresponding description of the method embodiment.
An embodiment of the present invention further provides an apparatus, including:
one or more processors; and
one or more machine-readable media having instructions stored thereon, which when executed by the one or more processors, cause the apparatus to perform methods as described in embodiments of the invention.
Embodiments of the invention also provide one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause the processors to perform the methods described in embodiments of the invention.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The alarm processing method and the alarm processing device provided by the invention are described in detail, and the principle and the implementation mode of the invention are explained by applying specific examples, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (18)

1. An alarm processing method is characterized in that the method is applied to a fault processing system; the fault processing system monitors at least one network slice, and is configured with an alarm mode library and an alarm processing strategy library; the method comprises the following steps:
when it is monitored that the network slice in the running state generates a fault alarm, acquiring an alarm mode corresponding to the fault alarm;
matching the alarm mode with the alarm mode library, and determining a first alarm root cause corresponding to the alarm mode;
matching the first alarm root cause with the alarm processing strategy library, and determining a first alarm processing strategy corresponding to the first alarm root cause;
and processing the first alarm root cause by adopting the first alarm processing strategy.
2. The method of claim 1, wherein the network slice is composed of a plurality of hierarchical levels of objects; the method further comprises the following steps:
when the network slice is not operated, acquiring configuration data of objects of all levels in the network slice;
acquiring topological hierarchy relations among objects of different hierarchies based on the configuration data;
generating an alarm association relation based on the topological hierarchical relation;
and storing the alarm association relation into the alarm mode library.
3. The method of claim 2, further comprising:
acquiring an alarm processing strategy corresponding to the alarm association relation;
and storing the alarm processing strategy into the alarm processing strategy library.
4. The method of claim 2, wherein the configuration data comprises object identifiers and associations between different object identifiers; the step of obtaining a topological hierarchical relationship between objects of different hierarchies based on the configuration data includes:
and generating a topological hierarchy relation between the objects of different hierarchies by adopting the object identifiers and the association relation between different object identifiers.
5. The method of claim 2, wherein the step of generating an alarm association relationship based on the topological hierarchical relationship comprises:
receiving a fault simulation instruction;
responding to the fault simulation instruction, and generating a root cause alarm signal;
and generating an alarm association relation based on the root cause alarm signal and the topological hierarchy relation.
6. The method of claim 5, wherein the step of generating an alarm association relationship based on the root cause alarm signal and the topological hierarchical relationship comprises:
generating a secondary alarm signal based on the root cause alarm signal and the topological hierarchy relation;
and generating an alarm association relation by adopting the root cause alarm signal and the secondary alarm signal.
7. The method according to any one of claims 1-6, further comprising:
when the alarm mode is not matched in the alarm mode library, determining at least one alarm object corresponding to the alarm mode;
matching a plurality of target alarm modes in the alarm mode library by adopting the at least one alarm object;
sequentially determining corresponding second alarm root causes from the target alarm modes;
acquiring a second alarm processing strategy corresponding to the second alarm root cause;
and processing the second alarm root cause by adopting the alarm processing strategy.
8. The method of claim 7, further comprising:
saving the alarm mode in the alarm mode library;
and storing the alarm processing strategy in the alarm processing strategy library.
9. An alarm processing device is characterized by being applied to a fault processing system; the fault processing system monitors at least one network slice, and is configured with an alarm mode library and an alarm processing strategy library; the device comprises:
the alarm mode acquisition module is used for acquiring an alarm mode corresponding to a fault alarm when the network slice in the running state is monitored to generate the fault alarm;
the alarm root cause determining module is used for matching the alarm mode with the alarm mode library and determining a first alarm root cause corresponding to the alarm mode;
the first alarm processing strategy determining module is used for matching the first alarm root cause with the alarm processing strategy library and determining a first alarm processing strategy corresponding to the first alarm root cause;
and the first alarm root cause processing module is used for processing the first alarm root cause by adopting the first alarm processing strategy.
10. The apparatus of claim 9, wherein the network slice is composed of a plurality of hierarchical levels of objects; the device further comprises:
the configuration data acquisition module is used for acquiring configuration data of objects of all levels in the network slice when the network slice is not operated;
the topological hierarchy relation obtaining module is used for obtaining the topological hierarchy relation among the objects of different hierarchies based on the configuration data;
the alarm association relation generating module is used for generating an alarm association relation based on the topological hierarchical relation;
and the alarm association relation storage module is used for storing the alarm association relation into the alarm mode library.
11. The apparatus of claim 10, further comprising:
the alarm processing strategy acquisition module is used for acquiring an alarm processing strategy corresponding to the alarm association relation;
and the alarm processing strategy storage module is used for storing the alarm processing strategy into the alarm processing strategy library.
12. The apparatus of claim 10, wherein the configuration data comprises object identifiers and associations between different object identifiers; the topology hierarchy relation obtaining module comprises:
and the topological hierarchical relationship generation submodule is used for generating the topological hierarchical relationship among the objects in different hierarchies by adopting the object identifiers and the association relationship among the different object identifiers.
13. The apparatus of claim 10, wherein the alarm association generating module comprises:
the fault simulation instruction receiving submodule is used for receiving a fault simulation instruction;
a root cause alarm signal generation submodule for responding to the fault simulation instruction and generating a root cause alarm signal;
and the alarm association relation generating submodule is used for generating an alarm association relation based on the root cause alarm signal and the topological hierarchy relation.
14. The apparatus of claim 13, wherein the alarm association generation sub-module comprises:
the secondary alarm signal generating unit is used for generating a secondary alarm signal based on the root cause alarm signal and the topological hierarchy relation;
and the alarm association relation generating unit is used for generating an alarm association relation by adopting the root cause alarm signal and the secondary alarm signal.
15. The apparatus according to any one of claims 9-14, further comprising:
the alarm object determining module is used for determining at least one alarm object corresponding to the alarm mode when the alarm mode is not matched in the alarm mode library;
the target alarm mode determining module is used for matching a plurality of target alarm modes in the alarm mode library by adopting the at least one alarm object;
the second alarm root cause determining module is used for sequentially determining corresponding second alarm root causes from the target alarm mode;
the second alarm processing strategy acquisition module is used for acquiring a second alarm processing strategy corresponding to the second alarm root cause;
and the second alarm root cause processing module is used for processing the second alarm root cause by adopting the alarm processing strategy.
16. The apparatus of claim 15, further comprising:
the alarm mode library storage module is used for storing the alarm modes in the alarm mode library;
and the alarm processing strategy library storage module is used for storing the alarm processing strategy in the alarm processing strategy library.
17. An apparatus, comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method of one or more of claims 1-8.
18. One or more machine readable media having instructions stored thereon that, when executed by one or more processors, cause the processors to perform the method of one or more of claims 1-8.
CN202010297416.XA 2020-04-15 2020-04-15 Alarm processing method and device Active CN113543180B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010297416.XA CN113543180B (en) 2020-04-15 2020-04-15 Alarm processing method and device

Publications (2)

Publication Number Publication Date
CN113543180A true CN113543180A (en) 2021-10-22
CN113543180B CN113543180B (en) 2024-06-14

Family

ID=78088294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010297416.XA Active CN113543180B (en) 2020-04-15 2020-04-15 Alarm processing method and device

Country Status (1)

Country Link
CN (1) CN113543180B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110048872A (en) * 2018-01-16 2019-07-23 中兴通讯股份有限公司 A kind of network alarm method, apparatus, system and terminal
CN110611926A (en) * 2018-06-15 2019-12-24 华为技术有限公司 Alarm method and device
US20200065173A1 (en) * 2018-08-22 2020-02-27 Ca, Inc. Controlled monitoring based on root cause analysis recommendations
CN109768895A (en) * 2019-03-29 2019-05-17 南京邮电大学 A kind of network slice failure management method and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114513802A (en) * 2022-01-04 2022-05-17 武汉烽火技术服务有限公司 Event stream-based bearer network fault analysis method and device
CN114513802B (en) * 2022-01-04 2023-06-09 武汉烽火技术服务有限公司 Method and device for analyzing bearing network faults based on event stream
CN116155692A (en) * 2023-02-24 2023-05-23 北京优特捷信息技术有限公司 Alarm solution recommending method and device, electronic equipment and storage medium
CN116155692B (en) * 2023-02-24 2023-11-24 北京优特捷信息技术有限公司 Alarm solution recommending method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113543180B (en) 2024-06-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant