CN115051907A - Alarm log data processing method and device and nonvolatile storage medium - Google Patents

Info

Publication number
CN115051907A
Authority
CN
China
Prior art keywords
alarm
log data
alarm log
graph
preset rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210655548.4A
Other languages
Chinese (zh)
Inventor
张舒朗
郑顺吾
孙迪
巴麒龙
宋泽宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202210655548.4A
Publication of CN115051907A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0604 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
    • H04L41/0627 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time by acting on the notification or alarm source
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0686 Additional information in the notification, e.g. enhancement of specific meta-data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a method and a device for processing alarm log data, and a nonvolatile storage medium. The method comprises: acquiring first alarm log data, wherein the first alarm log data comprises at least one of the following: alarm log data of a base station, alarm log data of a power environment monitoring system, and alarm log data of a transmission network; classifying the first alarm log data according to a first preset rule to obtain second alarm log data; determining a second preset rule, and screening the second alarm log data according to the second preset rule to obtain third alarm log data; and outputting an alarm causal relationship graph by using the first alarm log data and the third alarm log data, and determining the cause of the alarm according to the alarm causal relationship graph. The method and the device solve the technical problem that the cause of an alarm cannot be correctly located because network topology data is missing and alarm log data must be labeled manually.

Description

Alarm log data processing method and device and nonvolatile storage medium
Technical Field
The application relates to the field of big data processing and analysis, in particular to a method and a device for processing alarm log data and a nonvolatile storage medium.
Background
An operator's network is huge and comprises a large number of devices with different functions, service domains, manufacturers, and models; the networking protocols and topologies of operator networks differ, as do the data formats, transmission protocols, and responses to network conditions of the equipment. Among network operation and maintenance matters such as network implementation, service orchestration, availability assurance, and security assurance, fault handling and alarm analysis are essential links in guaranteeing the continuous availability of network services and are the core of network operation and maintenance. Determining the root cause of an alarm by manual analysis alone depends on expert experience and takes a long time. Existing network alarm root cause localization methods mainly fall into two types. The first uses network alarm log data to perform supervised training of machine learning classifiers (such as decision trees and support vector machines) and then uses the trained model to classify new alarm events received in real time. The second constructs network alarm graph data from the physical topology of the operator network, trains a graph neural network (such as GraphSAGE, GCN, or GAT), and uses the trained model to classify alarm graph nodes that appear in real time; its localization effect is better than that of the first type. In addition, both methods require operation and maintenance experts to label the network alarm log data item by item, which is time-consuming and labor-intensive and makes it difficult to achieve the desired effect.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the application provide a method and a device for processing alarm log data, and a nonvolatile storage medium, to at least solve the technical problem that the cause of an alarm cannot be correctly located because network topology data is missing and alarm log data must be labeled manually.
According to an aspect of an embodiment of the present application, a method for processing alarm log data is provided, including: acquiring first alarm log data, wherein the first alarm log data comprises at least one of the following data: alarm log data of a base station, alarm log data of a power environment monitoring system and alarm log data of a transmission network; classifying the first alarm log data according to a first preset rule to obtain second alarm log data; determining a second preset rule, and screening second alarm log data according to the second preset rule to obtain third alarm log data; and outputting an alarm causal relationship graph by using the first alarm log data and the third alarm log data, and determining the reason of the alarm according to the alarm causal relationship graph.
Optionally, classifying the first alarm log data according to a first preset rule to obtain second alarm log data, including: splicing an alarm equipment identification field and an alarm type field in the first alarm log data to obtain a first field; sorting the first alarm log data according to the generated time; setting a first time interval, wherein the sequenced first alarm log data comprises a plurality of first time intervals; and forming second alarm log data according to the plurality of first fields in the first time interval.
Optionally, before classifying the first alarm log data according to a first preset rule to obtain second alarm log data, the method further includes: determining a first preset rule, wherein the first preset rule comprises at least one of the following: the alarm type is a flash alarm, the alarm type is a repeated alarm, or the alarm type is a remote alarm; and clearing the fields of the alarm types of the flash alarm, the repeat alarm or the remote alarm in the second alarm log data.
Optionally, determining a second preset rule, and screening the second alarm log data according to the second preset rule to obtain third alarm log data includes: setting a first hyper-parameter and a second hyper-parameter, wherein the first hyper-parameter and the second hyper-parameter represent the association relationship between the alarm equipment identifier and the alarm type; respectively calculating a third hyper-parameter and a fourth hyper-parameter for each of a plurality of first fields in the second alarm log data through a first preset algorithm; if the third hyper-parameter is greater than or equal to the first hyper-parameter and the fourth hyper-parameter is greater than or equal to the second hyper-parameter, retaining the first field; if the third hyper-parameter is smaller than the first hyper-parameter or the fourth hyper-parameter is smaller than the second hyper-parameter, deleting the first field; and forming the third alarm log data from the plurality of retained first fields.
Optionally, outputting the alarm causal relationship graph by using the first alarm log data and the third alarm log data comprises: determining a first field in the third alarm log data as a node of the alarm causal relationship graph; determining a line connecting the nodes as an undirected edge of the alarm causal relationship graph, and taking the maximum value of the fourth hyper-parameter as the weight of the undirected edge; and outputting the alarm causal relationship graph from the nodes and the weights of the undirected edges.
Optionally, before determining the cause of the alarm according to the alarm causal relationship graph, the method further includes: converting the alarm causal relationship graph into an adjacency matrix, wherein the number of rows and the number of columns of the adjacency matrix are determined by the nodes, and the elements of the adjacency matrix are the weights of the undirected edges of the alarm causal relationship graph.
Optionally, determining the cause of the alarm according to the alarm causal relationship graph includes: computing on the adjacency matrix through a second preset algorithm to obtain directed edges of the alarm causal relationship graph; fusing the obtained directed edges through a third preset algorithm, and converting the alarm causal relationship graph into an alarm causal relationship knowledge graph; recursing backward through the alarm causal relationship knowledge graph to a first root node; if the first alarm log data includes the alarm type field of the first root node, taking the alarm type of the first root node as the cause of the alarm; and if the first alarm log data does not include the alarm type field of the first root node, continuing to recurse backward to a second root node of the alarm causal relationship knowledge graph until the cause of the alarm is determined.
According to another aspect of the embodiments of the present application, there is provided an apparatus for processing alarm log data, the apparatus including: the obtaining module is configured to obtain first alarm log data, where the first alarm log data includes at least one of: alarm log data of a base station, alarm log data of a power environment monitoring system and alarm log data of a transmission network; the first classification module is used for classifying the first alarm log data according to a first preset rule to obtain second alarm log data; the second classification module is used for determining a second preset rule and screening second alarm log data according to the second preset rule to obtain third alarm log data; and the processing module is used for outputting an alarm causal graph by utilizing the first alarm log data and the third alarm log data to determine the reason of the alarm.
According to the embodiment of the application, a nonvolatile storage medium is further provided, and the nonvolatile storage medium comprises a stored program, wherein when the program runs, a device where the nonvolatile storage medium is located is controlled to execute any one of the above alarm log data processing methods.
According to the embodiment of the application, a processor is further provided, and the processor is used for running a program stored in a memory, wherein when the program runs, any one of the processing methods of the alarm log data is executed.
In the embodiments of the application, first alarm log data is acquired, wherein the first alarm log data comprises at least one of the following: alarm log data of a base station, alarm log data of a power environment monitoring system, and alarm log data of a transmission network; the first alarm log data is classified according to a first preset rule to obtain second alarm log data; a second preset rule is determined, and the second alarm log data is screened according to the second preset rule to obtain third alarm log data; an alarm causal relationship graph is output by using the first alarm log data and the third alarm log data, and the cause of the alarm is determined according to the alarm causal relationship graph. By processing the acquired alarm log data in this way, the time-domain relations among network alarms are converted into spatial-domain relations; an association analysis algorithm computes alarm frequent item sets, from which edges are constructed to approximate the real topology graph of the network alarm nodes; a graph causal algorithm based on unsupervised learning restores the undirected alarm association graph into a directed graph with causal relations; and an alarm causal knowledge graph is constructed. This achieves the purpose of determining the cause of the alarm, realizes the technical effect of accurately locating the cause of the alarm when alarm log data is missing, and solves the technical problem that the cause of an alarm cannot be correctly located because network topology data is missing and alarm log data must be labeled manually.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of a processing method of alarm log data according to a related art;
FIG. 2 is a technical architecture diagram of a method for processing alarm data according to an embodiment of the present application;
FIG. 3 is a schematic diagram of relationship graph root cause localization provided in accordance with an embodiment of the present application;
FIG. 4 is a block diagram of a processing apparatus for alarm log data according to an embodiment of the present application;
FIG. 5 is a functional analysis diagram of modules provided according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
To solve the problems mentioned in the background, a method for processing alarm log data is provided that depends neither on the topology data of real network equipment nor on the manual experience of experts. Based on automatic computation over massive alarm data, the method converts alarm node data from the time domain to the spatial domain and associates derived alarms with root cause alarms to generate an alarm knowledge graph, offering a new approach to graph data analysis when network topology data is lacking.
In accordance with an embodiment of the present application, there is provided a method embodiment for processing alarm log data, it is noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions, and that while a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
Fig. 1 is a method for processing alarm log data according to an embodiment of the present application, and as shown in fig. 1, the method includes the following steps:
Step S102, obtaining first alarm log data, wherein the first alarm log data comprises at least one of the following data: alarm log data of a base station, alarm log data of a power environment monitoring system and alarm log data of a transmission network.
The processing method of the alarm log data is divided into three layers: a data layer, an algorithm layer, and an application layer. For the data layer, alarm log data in the network is first acquired, including but not limited to alarm log data of base stations, alarm log data of the power environment monitoring system, and alarm log data of the transmission network. The power environment monitoring system is a three-level monitoring management structure consisting of a monitoring center, monitoring sub-centers, and the monitoring stations under their jurisdiction; it monitors in real time the running states of the operator's network systems and equipment within its monitoring range, and records and processes the related data. The transmission network in the existing network mainly includes two types: data transmission based on Transmission Control Protocol/Internet Protocol (TCP/IP) and data transmission based on Packet Detection Rule (PDR).
Step S104, classifying the first alarm log data according to a first preset rule to obtain second alarm log data.
The alarm log records information about all faults in the network system, including but not limited to the faulty equipment, the fault type, and the fault time. A first rule is set to classify the acquired data while cleaning redundant and invalid data, to facilitate subsequent processing.
Step S106, determining a second preset rule and screening the second alarm log data according to the second preset rule to obtain third alarm log data.
A second rule is set to further screen the data classified by the first rule, retaining the data required for subsequent analysis.
Step S108, outputting an alarm causal relationship graph by using the first alarm log data and the third alarm log data, and determining the cause of the alarm according to the alarm causal relationship graph.
An alarm causal relationship graph is drawn by using the collected first alarm log data and the processed third alarm log data, and is further processed to determine the cause of the alarm.
Through the above steps, a method for processing alarm log data is provided. By using a network alarm root cause localization method that builds the graph from frequent item sets, it solves the problem that graph data cannot be used to analyze the cause of an alarm when topology data is absent or partially missing. The method depends neither on the physical topology data of real network equipment nor on manual labeling by experts, which improves its practicability and achieves the technical effect of accurately locating the cause of the alarm.
According to another optional embodiment of the present application, the second alarm log data is obtained by classifying the first alarm log data according to the first preset rule. During classification, the alarm equipment identification field and the alarm type field in the first alarm log data are first spliced to obtain a first field; the first alarm log data is sorted by generation time; a first time interval is set, and the sorted first alarm log data is split into a plurality of first time intervals; the plurality of first fields within each first time interval constitute the second alarm log data.
In this embodiment, the alarm equipment ID field and the alarm type field of each alarm record in the raw network alarm data are spliced to form an alarm topology data node ID; the raw network alarm data set is sorted by fault occurrence time and then segmented by an adjustable time window (TimeDelta); all alarm nodes within each time window form an item set.
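The splicing and windowing described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the record layout `(device_id, alarm_type, timestamp)`, the `|` separator, the function name `build_item_sets`, and the choice to anchor windows at the earliest alarm are all assumptions; the 90-second default mirrors the window reported in step three of this description.

```python
from datetime import datetime, timedelta


def build_item_sets(records, window=timedelta(seconds=90)):
    """Group alarm records into item sets, one per non-empty time window.

    records: iterable of (device_id, alarm_type, timestamp) tuples
             (hypothetical layout chosen for this sketch).
    """
    ordered = sorted(records, key=lambda r: r[2])  # sort by occurrence time
    if not ordered:
        return []
    start = ordered[0][2]  # assumption: windows are anchored at the first alarm
    buckets = {}
    for device_id, alarm_type, ts in ordered:
        node_id = f"{device_id}|{alarm_type}"  # spliced node ID
        index = (ts - start) // window         # integer index of the time window
        buckets.setdefault(index, set()).add(node_id)
    return [buckets[i] for i in sorted(buckets)]
```

Because each window is stored as a set, the same node appearing several times in one window is counted once.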
According to another optional embodiment of the present application, before classifying the first alarm log data according to the first preset rule to obtain the second alarm log data, the method further includes: determining the first preset rule, constraining the alarm type field in the second alarm log data according to the first preset rule, and clearing the fields in the second alarm log data whose alarm type is flash alarm, repeated alarm, or remote alarm.
In this embodiment, manually defined rules are applied to all alarm items in the same time window to determine the alarm type fields to be cleared, including but not limited to flash alarms, repeated alarms, and remote alarms. The alarm log data is cleaned and filtered to remove redundant, invalid, and unusable data, so that such data does not affect the accuracy of the subsequently determined alarm cause.
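A rule-based cleaning step of this kind might look like the sketch below. The description does not say how flash, repeated, or remote alarms are recognized, so this example simply assumes each record already carries a category label; the field name `category` and the label strings are hypothetical.

```python
# Hypothetical category labels; the description does not define how they are assigned.
EXCLUDED_CATEGORIES = frozenset({"flash", "repeated", "remote"})


def clean_alarms(records, excluded=EXCLUDED_CATEGORIES):
    """Drop records whose category matches one of the manually defined rules.

    records: list of dicts with at least a "category" key (assumed layout).
    """
    return [r for r in records if r.get("category") not in excluded]
```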
According to still another alternative embodiment of the present application, the second alarm log data is screened according to a second preset rule to obtain third alarm log data. A first hyper-parameter and a second hyper-parameter are set as reference thresholds, wherein the first hyper-parameter and the second hyper-parameter represent the association relationship between the alarm equipment identifier and the alarm type; a third hyper-parameter and a fourth hyper-parameter are calculated for each of a plurality of first fields in the second alarm log data through a first preset algorithm; the calculated results are compared with the thresholds: if the third hyper-parameter is greater than or equal to the first hyper-parameter and the fourth hyper-parameter is greater than or equal to the second hyper-parameter, the first field is retained; if the third hyper-parameter is smaller than the first hyper-parameter or the fourth hyper-parameter is smaller than the second hyper-parameter, the first field is deleted; the retained first fields are aggregated into the third alarm log data.
In this embodiment, an association rule algorithm, such as the frequent item set algorithm Apriori or the frequent pattern growth algorithm FP-Growth, is applied to all alarm item sets. By reasonably setting the two hyper-parameters of support and confidence, and filtering out data whose support or confidence is below its threshold, a group of alarm frequent item sets containing different numbers of items is generated. Support and confidence characterize the degree of association between the alarm equipment and the alarm type: support is the probability that the alarm equipment field and the alarm type field occur together in a group of alarm equipment and alarm type fields; confidence is the probability that the alarm type field occurs given that the alarm equipment field is present, or the probability that the alarm equipment field occurs given that the alarm type field is present.
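To make the thresholding concrete, the sketch below computes support and confidence for pairs of alarm nodes by brute-force counting. It is not Apriori or FP-Growth, which also mine larger item sets efficiently, and it follows the usual association-rule definitions over item sets rather than any formula given in this description. Taking the larger of the two directional confidences as the pair's confidence is an assumption made here because the resulting edge is undirected; the function name `frequent_pairs` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(item_sets, min_support, min_confidence):
    """Return {(a, b): (support, confidence)} for pairs passing both thresholds."""
    n = len(item_sets)
    item_count = Counter()
    pair_count = Counter()
    for items in item_sets:
        item_count.update(items)
        pair_count.update(combinations(sorted(items), 2))
    kept = {}
    for (a, b), count in pair_count.items():
        support = count / n  # share of windows containing both nodes
        # Assumption: use the larger of P(b | a) and P(a | b).
        confidence = max(count / item_count[a], count / item_count[b])
        if support >= min_support and confidence >= min_confidence:
            kept[(a, b)] = (support, confidence)
    return kept
```

A pair is kept only when both values reach their thresholds, matching the retain/delete rule stated above.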
According to some alternative embodiments of the present application, an alarm causal relationship graph is output by using the first alarm log data and the third alarm log data. The nodes, the undirected edges, and the weights of the undirected edges of the alarm causal relationship graph are determined respectively: a first field in the third alarm log data is taken as a node; a line connecting the nodes is taken as an undirected edge; the maximum value of the fourth hyper-parameter is taken as the weight of the undirected edge; and the alarm causal relationship graph is output according to the determined nodes, undirected edges, and undirected edge weights.
In some optional embodiments, the data nodes in a frequent item set are set as nodes of the alarm causal relationship graph; all alarm node items in the same frequent item set are connected pairwise by undirected edges; and the weight of each edge is set to the confidence of the frequent item set. If two alarm nodes appear together in several frequent item sets, the maximum confidence is taken as the edge weight. An undirected edge is a connection between two nodes that carries no direction, and the weight of an undirected edge serves as its length. Once the nodes, undirected edges, and edge lengths are determined, the undirected association graph of the alarm nodes can be drawn. Outputting an approximate network alarm association graph from the alarm frequent item sets is equivalent to a standardized dimensionality reduction of the raw data, so the alarm root cause localization result is not affected by data quality factors such as missing graph structure or inaccurate topology, which improves the accuracy of the method.
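The graph construction just described can be sketched as below. The input format, a list of `(items, confidence)` pairs, and the function name `build_undirected_graph` are assumptions made for illustration; edges are keyed by a sorted node pair so that `(a, b)` and `(b, a)` refer to the same undirected edge.

```python
from itertools import combinations


def build_undirected_graph(frequent_item_sets):
    """Build nodes and weighted undirected edges from frequent item sets.

    frequent_item_sets: iterable of (items, confidence) pairs (assumed layout).
    Returns (nodes, edges) where edges maps a sorted node pair to its weight.
    """
    nodes = set()
    edges = {}
    for items, confidence in frequent_item_sets:
        nodes.update(items)
        # Connect every pair of nodes within the same frequent item set.
        for pair in combinations(sorted(items), 2):
            # Keep the maximum confidence when a pair occurs in several sets.
            edges[pair] = max(edges.get(pair, 0.0), confidence)
    return nodes, edges
```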
According to an alternative embodiment of the present application, before the cause of an alarm is determined according to the alarm causal relationship graph, the alarm causal relationship graph is converted into an adjacency matrix, wherein the number of rows and the number of columns of the adjacency matrix are determined by the number of nodes of the alarm causal relationship graph, and the elements of the adjacency matrix are the weights of the undirected edges of the alarm causal relationship graph.
Before the cause of the alarm is determined, directed edges carrying alarm causal relations must be restored from the undirected association graph of the alarm nodes, which requires matrix computation. The alarm node association graph data generated from the frequent item sets is stored in an N × N adjacency matrix, where N is the total number of alarm nodes. The adjacency matrix represents the connections between vertices, and its elements are the weights of the undirected edges.
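As an illustration of this conversion, the sketch below fills a symmetric N × N matrix from weighted undirected edges. The node ordering (alphabetical), the use of 0.0 for absent edges, and the function name `to_adjacency_matrix` are assumptions of this example.

```python
def to_adjacency_matrix(nodes, edges):
    """Convert weighted undirected edges into a symmetric N x N matrix.

    nodes: iterable of node IDs; edges: {(a, b): weight}.
    Returns (order, matrix), where order lists the node for each row/column.
    """
    order = sorted(nodes)
    index = {node: i for i, node in enumerate(order)}
    matrix = [[0.0] * len(order) for _ in order]  # 0.0 marks "no edge"
    for (a, b), weight in edges.items():
        i, j = index[a], index[b]
        matrix[i][j] = weight
        matrix[j][i] = weight  # undirected: mirror across the diagonal
    return order, matrix
```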
According to another alternative embodiment of the application, the cause of the alarm is determined according to the alarm causal relationship graph. The adjacency matrix is computed on through a second preset algorithm to restore the directed edges of the alarm causal relationship graph; the obtained directed edges are fused, converting the alarm causal relationship graph into an alarm causal relationship knowledge graph. When the alarm cause is determined, the alarm causal relationship knowledge graph is recursed backward to a first root node, and the alarm type of the first root node is checked: if the first alarm log data includes the alarm type field of the first root node, the alarm type of the first root node is the cause of the alarm; if the first alarm log data does not include the alarm type field of the first root node, the backward recursion continues to a second root node of the alarm causal relationship knowledge graph until the cause of the alarm is determined.
In this embodiment, different graph causal algorithms are applied to the alarm node association graph data to restore the directed causal edges, such as constraint-based causal discovery methods (e.g., the PC algorithm), Greedy Equivalence Search (GES), Concave penalized Coordinate Descent with reparameterization (CCDr), noise-asymmetry-based methods such as the Additive Noise Model (ANM), and methods based on asymmetry of the data distribution such as the Linear Non-Gaussian Acyclic Model (LiNGAM). The directed graph is then recursed backward to its root node: if the alarm type of the root node appears in the alarm event, that alarm is considered the root cause; if it does not, the next-level node is searched, and so on until the root cause is determined. The method is implemented by network equipment and quickly locates the root cause while reducing the number of alarms that operation and maintenance personnel must handle. In addition, because it depends little on the format of the raw data, the method is highly practical and can be quickly generalized to alarm root cause localization tasks in various networks, such as wireless networks, IT networks, bearer networks, and carrier networks.
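The backward search can be sketched as follows, assuming the causal discovery step has already produced directed edges as `(cause, effect)` pairs forming an acyclic graph. The function name `locate_root_cause`, the use of node IDs rather than bare alarm types, and the fallback of returning the alarm itself when no ancestor is observed are assumptions of this example, not details confirmed by the description.

```python
def locate_root_cause(directed_edges, alarm, observed):
    """Find the highest-level observed ancestor of `alarm` in a causal DAG.

    directed_edges: iterable of (cause, effect) pairs (assumed acyclic).
    observed: set of node IDs present in the current alarm event.
    """
    parents, children = {}, {}
    for cause, effect in directed_edges:
        parents.setdefault(effect, set()).add(cause)
        children.setdefault(cause, set()).add(effect)

    # Walk backward from the alarm to collect all of its ancestors.
    ancestors, stack = set(), [alarm]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in ancestors:
                ancestors.add(parent)
                stack.append(parent)

    # Start at the root nodes (no parents) and descend one level at a time.
    visited = set()
    level = sorted(n for n in ancestors if not parents.get(n))
    while level:
        for node in level:
            if node in observed:
                return node
        visited.update(level)
        level = sorted({c for n in level for c in children.get(n, ())
                        if c in ancestors and c not in visited})
    return alarm  # assumption: with no observed ancestor, the alarm is its own root
```

Checking roots first and descending only when a root is absent from the event follows the "next-level node" order described above.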
In a specific implementation, the alarm log data processing method is divided, by technical architecture, into a data layer, an algorithm layer (referred to below as the core layer), and an application layer. The data layer uses Hive as the data standardization framework, and the core layer uses the Spark MLlib library to build machine learning models, forming an end-to-end big data analysis framework that can analyze and process terabyte (TB) scale device operation logs offline. In addition, the Pandas and scikit-learn data processing frameworks are retained for flexibility in large-scale data analysis scenarios. Together, the data layer and the core layer form a unified big data middle platform. The application layer implements the two functions of alarm compression and root cause analysis. Fig. 2 is a technical architecture diagram according to this embodiment. The alarm log data is processed in the following eight steps:
Step one: data in the wireless network, including topological relations, relations between base stations and the power environment monitoring system, and relations between alarm types and alarm details, are processed into 8 tables containing 510,000 records. After feature screening, feature association and combination, missing value processing, outlier processing, categorical variable processing, data type conversion, and other steps, the data are finally merged into 130,000 alarm topology training records.
Step two: each alarm record in the alarm topology training data is spliced from two fields, the alarm device ID and the alarm type, and the spliced character string is used as the node ID of the constructed alarm topology data.
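As a minimal sketch of the splicing in step two, the two fields can be joined into one string. The separator `|`, the field names `device_id` and `alarm_type`, and the sample records are assumptions made for illustration; the application does not specify them.

```python
def make_node_id(device_id: str, alarm_type: str, sep: str = "|") -> str:
    """Splice the alarm device ID and the alarm type into one node ID string."""
    return f"{device_id}{sep}{alarm_type}"


# Hypothetical alarm records.
records = [
    {"device_id": "ENB-001", "alarm_type": "S1_INTERFACE_FAULT"},
    {"device_id": "PWR-017", "alarm_type": "MAINS_POWER_FAILURE"},
]
node_ids = [make_node_id(r["device_id"], r["alarm_type"]) for r in records]
print(node_ids)
```

A separator that cannot occur in either field keeps the spliced ID unambiguous, so that two different (device, type) pairs never collapse into the same node.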
Step three: a TimeDelta is selected, and the network alarm topology training data is divided by time window into a collection of item sets. After repeated testing and evaluation, segmenting the alarm topology training data with a 90 s time window gives the best alarm association results.
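The time-window segmentation of step three might look like the sketch below. It assumes fixed, non-overlapping windows aligned to timestamp zero and alarm timestamps expressed in seconds; the application states only the 90 s window length, so the alignment and the function name `split_into_item_sets` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple


def split_into_item_sets(alarms: Iterable[Tuple[float, str]],
                         window_s: float = 90.0) -> List[Set[str]]:
    """Group (timestamp_seconds, node_id) alarms into one item set per window."""
    buckets = defaultdict(set)
    for timestamp, node_id in alarms:
        # Alarms whose timestamps fall in the same 90 s slot share an item set.
        buckets[int(timestamp // window_s)].add(node_id)
    # Return the item sets in chronological order; empty windows are skipped.
    return [buckets[slot] for slot in sorted(buckets)]


# Hypothetical alarms: (seconds since start, node ID).
alarms = [(0, "A"), (30, "B"), (95, "C"), (200, "A")]
item_sets = split_into_item_sets(alarms)
print(item_sets)
```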
Step four: manually defined rules are used to perform first-level data filtering on the alarm items within the same item set of the alarm topology training data. The filtering rules cover flash alarms, repeat alarms, and remote alarms.
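The first-level filtering of step four can be illustrated for two of the three rules. The definitions used here are assumptions, since the application names the rules without defining them: a flash alarm is taken to be one that clears within `flash_s` seconds of being raised, and a repeat alarm is a second occurrence of the same node ID within the window. Remote alarms are not covered by this sketch.

```python
from typing import List, NamedTuple, Optional


class Alarm(NamedTuple):
    node_id: str
    raised_at: float               # seconds
    cleared_at: Optional[float]    # None while the alarm is still active


def filter_window(alarms: List[Alarm], flash_s: float = 5.0) -> List[Alarm]:
    """Drop assumed flash alarms and repeat alarms from one time window."""
    kept, seen = [], set()
    for alarm in sorted(alarms, key=lambda a: a.raised_at):
        is_flash = (alarm.cleared_at is not None
                    and alarm.cleared_at - alarm.raised_at < flash_s)
        if is_flash or alarm.node_id in seen:
            continue  # filtered out by the flash or repeat rule
        seen.add(alarm.node_id)
        kept.append(alarm)
    return kept


# Hypothetical window contents.
window_alarms = [
    Alarm("A", 0.0, 2.0),     # clears after 2 s: treated as a flash alarm
    Alarm("B", 1.0, None),
    Alarm("B", 10.0, None),   # same node ID again: treated as a repeat alarm
    Alarm("C", 20.0, 100.0),
]
filtered = filter_window(window_alarms)
print([a.node_id for a in filtered])
```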
Step five: a set of hyperparameters giving the highest support and the highest compression rate must be found as the parameters for rule generation. Using all 130,000 alarm topology training records, 120 rounds of experiments were run with different parameters. The parameters finally selected for alarm compression were an alarm time interval of 90 s, a support greater than 100, and a confidence greater than 0.5, which yielded 9,400 frequent item sets and a compression rate of 92%.
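To make the support and confidence thresholds of step five concrete, the sketch below counts co-occurring alarm pairs across item sets. It is a deliberately simplified pairwise stand-in, not FP-Growth or Apriori, and it assumes that support is an absolute co-occurrence count and that confidence of a rule x -> y is count(x, y) / count(x). The function name `mine_pair_rules` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def mine_pair_rules(item_sets: Iterable[Set[str]],
                    min_support: int = 100,
                    min_confidence: float = 0.5
                    ) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Return {(x, y): (support, confidence)} for pair rules above both thresholds."""
    item_count, pair_count = Counter(), Counter()
    for items in item_sets:
        item_count.update(items)
        pair_count.update(combinations(sorted(items), 2))
    rules = {}
    for (a, b), support in pair_count.items():
        if support <= min_support:
            continue  # the pair does not co-occur often enough
        for x, y in ((a, b), (b, a)):
            confidence = support / item_count[x]
            if confidence > min_confidence:
                rules[(x, y)] = (support, confidence)
    return rules


# Hypothetical item sets; a low threshold is used so the toy data passes.
item_sets = [{"A", "B"}] * 3 + [{"A", "C"}] + [{"B"}] * 4
rules = mine_pair_rules(item_sets, min_support=2, min_confidence=0.5)
print(rules)
```

Here A and B co-occur three times; the rule A -> B has confidence 3/4 and is kept, while B -> A has confidence 3/7 and is discarded.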
Step six: the nodes and edges of a graph are constructed from the device topology data table and the alarm frequent item sets generated in the association analysis module. A graph causal algorithm based on the Bayesian estimation principle then performs unsupervised learning to restore the alarm causal relationships from the original alarm association graph and generate a knowledge graph of alarm causal relationships.
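The graph construction in step six, together with the adjacency matrix that the causal algorithms consume, can be sketched as below. The sketch assumes that the edge weight is the larger of the two directional confidence values between a pair of nodes, which is one reading of the "maximum value" wording in the claims; the function name `build_adjacency` and the input format are hypothetical.

```python
from typing import Dict, List, Tuple


def build_adjacency(confidences: Dict[Tuple[str, str], float]
                    ) -> Tuple[List[str], List[List[float]]]:
    """Build a symmetric weighted adjacency matrix from pairwise confidences."""
    nodes = sorted({node for pair in confidences for node in pair})
    index = {node: i for i, node in enumerate(nodes)}
    # One row and one column per node; 0.0 means "no edge".
    matrix = [[0.0] * len(nodes) for _ in nodes]
    for (a, b), confidence in confidences.items():
        i, j = index[a], index[b]
        weight = max(matrix[i][j], confidence)
        matrix[i][j] = matrix[j][i] = weight  # undirected: keep it symmetric
    return nodes, matrix


# Hypothetical confidences between node IDs.
confidences = {("A", "B"): 0.75, ("B", "A"): 0.6, ("B", "C"): 0.9}
nodes, matrix = build_adjacency(confidences)
print(nodes)
for row in matrix:
    print(row)
```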
Step seven: different graph causal algorithm models are each applied to the alarm association undirected graph, and the causal directed edges restored by the different models are fused, finally yielding an alarm knowledge graph containing 138 causal directed edges.
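The application does not state how the directed edges from different models are fused in step seven. One plausible scheme, shown here purely as an assumption, is a majority vote: an edge is kept when at least `min_votes` models propose it and more models propose it than propose the reverse direction. The function name `fuse_directed_edges` and the sample edge lists are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def fuse_directed_edges(model_edges: Iterable[List[Edge]],
                        min_votes: int = 2) -> Set[Edge]:
    """Fuse per-model directed edges by majority vote (an assumed scheme)."""
    # Each model votes at most once per edge.
    votes = Counter(edge for edges in model_edges for edge in set(edges))
    fused = set()
    for (src, dst), count in votes.items():
        # Keep an edge only if enough models agree and the reverse
        # direction has strictly fewer votes.
        if count >= min_votes and count > votes.get((dst, src), 0):
            fused.add((src, dst))
    return fused


# Hypothetical outputs of three causal discovery models.
pc_edges = [("A", "B"), ("B", "C")]
ges_edges = [("A", "B"), ("C", "B")]
lingam_edges = [("A", "B"), ("B", "C")]
fused = fuse_directed_edges([pc_edges, ges_edges, lingam_edges])
print(sorted(fused))
```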
Step eight: in the application scenario, a group of alarms is received in each time window, and the directed graph is recursed backward to its root node. If the alarm type of the root node occurs in the alarm event, that alarm is considered the root cause; if it does not occur, the next-level node is searched, and so on until the alarm root cause is determined. Fig. 3 is a schematic diagram of root cause localization on the relationship graph according to this embodiment. The alarm relationship graph in Fig. 3 is populated with alarm devices and alarm types; when a received group of alarm information is, for example, an S1 interface fault alarm, the corresponding root node can be found in the graph to determine the cause of the alarm.
Fig. 4 is a block diagram of an apparatus for processing alarm log data according to an embodiment of the present application. The apparatus includes: an obtaining module 40, configured to obtain first alarm log data, where the first alarm log data includes at least one of the following: alarm log data of a base station, alarm log data of a power environment monitoring system, and alarm log data of a transmission network; a first classification module 42, configured to classify the first alarm log data according to a first preset rule to obtain second alarm log data; a second classification module 44, configured to screen the second alarm log data according to a second preset rule to obtain third alarm log data; and a processing module 46, configured to output an alarm causal relationship graph by using the first alarm log data and the third alarm log data, and to determine the cause of the alarm.
According to this embodiment, in a specific implementation the alarm log data processing method is divided by function into five modules: a data storage module, a data preprocessing module, an association analysis module, a root cause localization module, and a real-time analysis module. First, a large amount of alarm log data from base stations, the power environment monitoring system, the transmission network, and so on is extracted from the live network and stored in the data storage module. The data preprocessing module (that is, the data cleaning module) then imports the raw data from the data storage module and cleans and denoises it according to rules.
The alarm association module uses the FP-Growth or Apriori association rule algorithm as the core algorithm for computing alarm association rules, and determines the support, confidence, and cutting frequency of the compression rules through hyperparameter optimization. The nodes and edges of the graph are then constructed from the standardized alarm topology subject training data produced by the data preprocessing module and the alarm frequent item sets generated by model training.
The root cause positioning module adopts a causal algorithm based on a Bayesian estimation principle to perform unsupervised learning, restores alarm causal relations from the alarm associated original data graph, and generates a knowledge graph of the alarm causal relations through multi-model fusion.
The real-time analysis module receives the real-time alarm time window information sent by the network management system and recurses backward to the root node of the alarm relationship graph. If the alarm type of the root node appears in the real-time alarm event, that alarm is considered the root cause; if it does not appear, the module continues searching at the next-level node. Fig. 5 is a functional analysis diagram of the modules according to an embodiment of the present application.
The embodiment of the application also provides a nonvolatile storage medium, which comprises a stored program, wherein when the program runs, the equipment where the nonvolatile storage medium is located is controlled to execute the processing method of the alarm log data.
The nonvolatile storage medium stores a program for executing the following functions: acquiring first alarm log data, wherein the first alarm log data comprises at least one of the following: alarm log data of a base station, alarm log data of a power environment monitoring system and alarm log data of a transmission network; classifying the first alarm log data according to a first preset rule to obtain second alarm log data; determining a second preset rule, and screening second alarm log data according to the second preset rule to obtain third alarm log data; and outputting an alarm causal relationship graph by using the first alarm log data and the third alarm log data, and determining the reason of the alarm according to the alarm causal relationship graph.
The embodiment of the application also provides a processor, wherein the processor is used for running the program stored in the memory, and the alarm log data processing method is executed when the program runs.
The processor is used for executing the following programs: acquiring first alarm log data, wherein the first alarm log data comprises at least one of the following: alarm log data of a base station, alarm log data of a power environment monitoring system and alarm log data of a transmission network; classifying the first alarm log data according to a first preset rule to obtain second alarm log data; determining a second preset rule, and screening second alarm log data according to the second preset rule to obtain third alarm log data; and outputting an alarm causal relationship graph by using the first alarm log data and the third alarm log data, and determining the reason of the alarm according to the alarm causal relationship graph.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, a division of a unit may be a division of a logic function, and an actual implementation may have another division, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or may not be executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered to fall within the protection scope of the present application.

Claims (10)

1. A processing method of alarm log data is characterized by comprising the following steps:
acquiring first alarm log data, wherein the first alarm log data comprises at least one of the following: alarm log data of a base station, alarm log data of a power environment monitoring system and alarm log data of a transmission network;
classifying the first alarm log data according to a first preset rule to obtain second alarm log data;
determining a second preset rule, and screening the second alarm log data according to the second preset rule to obtain third alarm log data;
and outputting an alarm causal relationship graph by using the first alarm log data and the third alarm log data, and determining the reason of the alarm according to the alarm causal relationship graph.
2. The method of claim 1, wherein classifying the first alarm log data according to a first preset rule to obtain second alarm log data comprises:
splicing an alarm device identification field and an alarm type field in the first alarm log data to obtain a first field;
sorting the first alarm log data according to generation time;
setting a first time interval, wherein the sorted first alarm log data comprises a plurality of first time intervals;
and composing the second alarm log data according to the plurality of first fields in the first time interval.
3. The method of claim 2, wherein before the first alarm log data is classified according to a first preset rule and the second alarm log data is obtained, the method further comprises:
determining the first preset rule, wherein the first preset rule comprises at least one of the following rules: the alarm type is a flash alarm, the alarm type is a repeat alarm, or the alarm type is a remote alarm;
and clearing fields of the alarm types of the flash alarms or the repeat alarms or the remote alarms in the second alarm log data.
4. The method of claim 2, wherein determining a second preset rule and screening the second alarm log data according to the second preset rule to obtain third alarm log data comprises:
setting a first hyper-parameter and a second hyper-parameter, wherein the first hyper-parameter and the second hyper-parameter represent the association relationship between the alarm equipment identifier and the alarm type;
respectively calculating a third hyper-parameter and a fourth hyper-parameter of a plurality of first fields in the second alarm log data through a first preset algorithm;
if the third hyper-parameter is greater than or equal to the first hyper-parameter and the fourth hyper-parameter is greater than or equal to the second hyper-parameter, retaining the first field;
deleting the first field if the third hyper-parameter is smaller than the first hyper-parameter or the fourth hyper-parameter is smaller than the second hyper-parameter;
composing the third alarm log data from a plurality of the retained first fields.
5. The method of claim 4, wherein outputting an alarm causal relationship graph using the first alarm log data and the third alarm log data comprises:
determining a first field in the third alarm log data as a node of the alarm causal relationship graph;
determining a connecting line connecting the nodes as an undirected edge of the alarm causal relationship graph, and taking the maximum value of the fourth hyper-parameter as the weight of the undirected edge of the alarm causal relationship graph;
and outputting the alarm causal relationship graph through the nodes of the alarm causal relationship graph and the weights of the undirected edges of the alarm causal relationship graph.
6. The method of claim 5, wherein prior to determining the cause of the alarm from the alarm causal relationship graph, the method further comprises:
converting the alarm causal relationship graph into an adjacency matrix, wherein the number of rows and the number of columns of the adjacency matrix are determined according to the nodes, and the elements of the adjacency matrix are the weights of the undirected edges of the alarm causal relationship graph.
7. The method of claim 6, wherein determining the cause of an alarm from the alarm causal relationship graph comprises:
calculating the adjacency matrix through a second preset algorithm to obtain directed edges of the alarm causal relationship graph;
fusing the directed edges obtained through the second preset algorithm, and converting the alarm causal relationship graph into an alarm causal relationship knowledge graph;
recursing backward to a first root node of the alarm causal relationship knowledge graph according to the alarm causal relationship knowledge graph;
if the first alarm log data comprises an alarm type field of the first root node, the alarm type of the first root node is the reason of the alarm;
if the first alarm log data does not include the alarm type field of the first root node, continuing to recurse backwards to a second root node of the alarm causal relationship knowledge graph until the cause of the alarm is determined.
8. An apparatus for processing alarm log data, comprising:
an obtaining module, configured to obtain first alarm log data, where the first alarm log data includes at least one of: alarm log data of a base station, alarm log data of a power environment monitoring system and alarm log data of a transmission network;
the first classification module is used for classifying the first alarm log data according to a first preset rule to obtain second alarm log data;
the second classification module is used for determining a second preset rule and screening the second alarm log data according to the second preset rule to obtain third alarm log data;
and the processing module is used for outputting an alarm causal relationship graph by using the first alarm log data and the third alarm log data and determining the reason of the alarm.
9. A non-volatile storage medium, characterized in that the non-volatile storage medium includes a stored program, and when the program runs, the apparatus where the non-volatile storage medium is located is controlled to execute the processing method of alarm log data according to any one of claims 1 to 7.
10. A processor, characterized in that the processor is configured to run a program stored in a memory, wherein the program is configured to execute the method for processing alarm log data according to any one of claims 1 to 7 when running.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210655548.4A CN115051907A (en) 2022-06-10 2022-06-10 Alarm log data processing method and device and nonvolatile storage medium

Publications (1)

Publication Number Publication Date
CN115051907A true CN115051907A (en) 2022-09-13

Family

ID=83162051

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination