CN112312443A

CN112312443A - Mass alarm data processing method, system, medium, computer equipment and application

Info

Publication number: CN112312443A
Application number: CN202011088627.9A
Authority: CN
Inventors: 齐小刚; 刘美丽; 刘立芳; 冯海林
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2020-10-13
Filing date: 2020-10-13
Publication date: 2021-02-02

Abstract

The invention belongs to the technical field of network data processing and discloses a method, a system, a medium, computer equipment and an application for processing massive alarm data. Original alarm data are preprocessed with a double-sliding-window data division method that better matches the characteristics of alarms. A visual relation graph between network alarms is constructed from the sequence patterns mined by the PrefixSpan algorithm; the direction of each edge is determined by timestamps and represents, to a certain extent, the causal relation between alarms, and the resulting relation graph is taken as a Bayesian network. The parameters of the Bayesian network are obtained from historical data by maximum likelihood estimation. Alarm information appearing in the network is taken as evidence for Bayesian inference, and alarms whose inferred fault probability exceeds a certain threshold are taken as alarm sources. The communication network alarm system and the algorithm module are combined, and a practical high-performance processing framework is used to process the alarm information in time. The invention improves processing efficiency and reduces labor cost.

Description

Mass alarm data processing method, system, medium, computer equipment and application

Technical Field

The invention belongs to the technical field of network data processing, and particularly relates to a method, a system, a medium, computer equipment and application for processing massive alarm data.

Background

In recent years, with the rapid development of 5G networks, the heterogeneity and complexity of mobile communication networks in scale and topology have increased exponentially. Network operation and maintenance centers (NOCs) receive thousands of network alarm messages each day, and these alarms differ in importance and domain. Managing the network manually becomes almost impossible in this situation, so managing it intelligently and efficiently becomes increasingly important. Mining, from massive alarm data, the alarm knowledge that is related to faults and reflects their root cause is an essential process in network management. This does not prevent excessive alarms from being generated, but it allows operation and maintenance personnel to better handle the huge amount of alarm data.

A network alarm is a notification sent by a network node when it encounters a network problem related to the system or the weather. Alarm data provide valuable reference information for network managers in guaranteeing normal network service, so network alarms can be used to identify potential network faults. However, because network elements are highly interconnected, a network event may activate a chain of alarms, generating a large amount of alarm data within the network, with dependencies among these alarms. To address the problem that the alarm volume exceeds what operators can handle, various methods have been used to improve the performance of network diagnosis based on alarm information. The most intuitive idea is to cluster related alarm data into groups and then analyze each group in detail. Some researchers have introduced methods based on similarity indicators to analyze alarm data. Since delays between alarms may lead to erroneous analysis results, Bo Yang et al. proposed a correlated alarm detection method (BMS-d) based on delay block matching similarity, which can quickly and efficiently estimate correlation coefficients and transmission delays between alarm variables. BMS-d can be further improved by considering the effects of false positives and false negatives. However, clustering alarm data directly often ignores the timing characteristics of the alarms. To address this problem, many studies treat alarm data as a time series. Some scholars compare the similarity between historical alarm sequences and the current alarm sequence to predict impending alarm events.

Much alarm data is generated because of interconnections between devices, so dependencies also exist between alarm data. These dependencies are discovered from patterns identified in alarm sequences that occur within a particular time frame. Such alarm patterns verify the correlation between events and ultimately help to mine the degree of correlation between them. The identified patterns may help locate the root cause of an alarm flood. In telecommunications networks, many common pattern mining methods have been proposed. Generally, these methods are based on the Apriori algorithm. GSP and WINEPI are among the earliest algorithms to apply the Apriori algorithm to find sequence association rules. The PrefixSpan algorithm is a classical sequential pattern mining algorithm that mines patterns by prefix projection: using the concepts of prefix and suffix, the original sequence set is recursively projected into smaller projected data sets, and new sequential patterns are then obtained simply by searching for frequent items in each projected data set. Tahereh Niyazman et al. proposed an improved PrefixSpan sequence pattern recognition algorithm to find alarm patterns.

In practice, sequence patterns are often used to find the source of an alarm, which reduces the workload of operation and maintenance personnel, lets them respond to the alarm source in time, and allows possible faults in the network to be recovered as soon as possible. Most existing work traces alarms by sequence comparison. Shiqi Lai et al. predict an incoming alarm flood early by matching the online alarm sequence against a pattern database and computing similarity, and find the alarm source from historical data and similar past patterns. However, methods based on sequence alignment are all deterministic, i.e. the fault source they find is assumed to fail with probability 1. Moreover, these methods do not take into account the correlation between sequence patterns, i.e. they consider the patterns to be independent. A Bayesian network (BN, also known as a belief network) is a directed acyclic graph (DAG) that can handle uncertainty efficiently and is one of the most powerful and widely used fault analysis tools. In a Bayesian network, nodes represent random variables, each with a conditional probability distribution, and directed edges represent the relationship between a parent node and a child node. Learning a Bayesian network comprises structure learning and parameter learning, where parameter learning computes the conditional probability distribution of each node given the network structure. The most important capability of a BN is Bayesian inference. Khanafer et al. proposed automated diagnosis using BNs for fault diagnosis in Universal Mobile Telecommunications System (UMTS) networks. Network alarm prediction methods based on Bayesian inference estimate the operating condition and development trend of the network by taking the alarms appearing in the network as evidence.
However, a major drawback of the Bayesian network is its high complexity. To overcome the problem that BN complexity grows exponentially with the number of nodes, the Bayesian network can be combined with case-based reasoning, which reduces its complexity and overcomes its limitations in fault diagnosis. Moreover, the edges in a Bayesian network represent causal relationships, which is important when constructing the network, and timestamps help to determine these causal relationships.

The performance of the processing framework that services the algorithm also has a large impact on the actual execution efficiency of the algorithm. An algorithm processing framework is designed, and in the framework, massive alarm data can be traced in time.

Through the above analysis, the problems and defects of the prior art are as follows:

(1) The existing sequence division techniques are unreasonable, and it is difficult for them to truly reflect the characteristics of the alarm sequence.

(2) Existing correlation analysis ignores the correlation between sequence patterns.

(3) The fault inference method is deterministic, ignoring many uncertainty factors in the alarm system.

(4) Causal relationships between variables in a bayesian network are difficult to determine.

(5) The existing algorithm processing framework has low execution efficiency, and the instantaneity of the algorithm is difficult to guarantee.

The difficulty in solving the above problems and defects is:

(1) Simply using a fixed-length sliding window technique for alarm sequence partitioning often yields results that do not meet practical requirements.

(2) The correlation between the modes is difficult to obtain in the prior art.

(3) Uncertainty factors in the alarm system make it difficult to reason for faults.

(4) Causality is not just co-occurrence on data, but also requires reliance on other attributes to better determine causality.

(5) The speed of hard disk reading and the synchronization between algorithms limit the immediacy of the algorithms.

The significance of solving the problems and the defects is as follows:

(1) Sequence division is performed with a variable-length sliding window strategy.

(2) The hidden knowledge between association rules is understood in depth.

(3) Uncertainty factors in the alarm system are considered, and fault inference is performed with a probabilistic inference method.

(4) The causal relationship is determined by using the existing attribute which can represent the causal relationship to a certain extent.

(5) A cache-based asynchronous processing framework is employed.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a method, a system, a medium, computer equipment and application for processing mass alarm data.

The invention is realized as follows. A method for processing massive alarm data comprises:

preprocessing original alarm data by using a double-sliding-window data division method;

constructing a visual relation graph between network alarms from the sequence patterns mined by the PrefixSpan algorithm, wherein the direction of each edge is determined by timestamps and represents, to a certain extent, the causal relation between the alarms, and taking the obtained relation graph as a Bayesian network;

obtaining the parameters of the Bayesian network from historical data by maximum likelihood estimation;

taking alarm information appearing in the network as evidence to carry out Bayesian inference, and taking alarms whose inferred fault probability is greater than a certain threshold as alarm sources;

combining the communication network alarm system with the algorithm module, and processing the alarm information in time by using a practical, high-performance processing framework.

Further, the data association analysis of the massive alarm data processing method applies a sequence pattern mining method to the network alarm events to finally obtain the logical relationship of each alarm event in the network, and a relationship graph is created based on the logical relationship.

Further, alarm sequence division is realized by using a double-sliding time window method;

Definition 1: given an alarm sequence S = (s, T_s, T_e), an ascending (time-ordered) sequence generated within the time interval [T_s, T_e], S_w = [w, t_s, t_e] is a time window of the sequence S, where t_s > T_s and t_e < T_e, and t_e - t_s is the width of the sliding window, denoted W;

dividing the alarm event sequence into a transaction database through a sliding time window;

pattern discovery: the PrefixSpan algorithm is adopted to mine frequent patterns from the alarm sequences; mining starts with prefixes of length 1, and the corresponding projected database is searched to obtain the corresponding frequent sequences; the frequent sequences corresponding to prefixes of length 2 are then mined recursively, and so on, until no longer prefix can be mined;

relation graph construction: the obtained sequence patterns are converted into a relation graph, in which the direction of each edge is determined by the order of the events in the sequence pattern, and this order is determined by the timestamps of the events.
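The prefix-projection mining step described above can be sketched as follows. This is a minimal illustration and not the patented implementation: it assumes each sequence is a list of single alarm types (no item sets inside an element), and the names `prefixspan` and `min_support` are hypothetical.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for all frequent sequential patterns.

    Assumption of this sketch: every sequence is a flat list of alarm types.
    """
    results = {}

    def mine(prefix, projected):
        # projected: list of (sequence index, offset where the suffix starts)
        counts = defaultdict(int)
        for seq_idx, start in projected:
            # Count each item at most once per suffix.
            for item in set(sequences[seq_idx][start:]):
                counts[item] += 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Build the projected database for the extended prefix.
            new_projected = []
            for seq_idx, start in projected:
                seq = sequences[seq_idx]
                try:
                    pos = seq.index(item, start)
                except ValueError:
                    continue
                new_projected.append((seq_idx, pos + 1))
            mine(pattern, new_projected)

    # Start from the empty prefix, so length-1 prefixes are mined first.
    mine((), [(i, 0) for i in range(len(sequences))])
    return results


if __name__ == "__main__":
    alarm_item_sets = [list("ABCA"), list("BCDA"), list("DCEBA")]
    for pattern, support in prefixspan(alarm_item_sets, 3).items():
        print("".join(pattern), support)
```

Storing only offsets into the original sequences (a pseudo-projection) avoids copying suffixes, which matters when the transaction database is large.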

Further, the massive alarm data processing method uses a Bayesian network as a causal model of a diagnosis process, and observation data collected from the network is used for deducing a possible fault source;

converting an alarm sequence mode into a relational graph among alarm events, wherein the relational graph is a directed graph, nodes in the graph are the alarm events and are equivalent to variables, and each variable has two states, namely present and absent, and represents the presence or absence of alarm information; the edges in the graph represent, to some extent, causal relationships of alarm events.

Further, the bayesian network reasoning and fault identification of the massive alarm data processing method comprises the following steps: the observation values c of a group of evidence variables E are given, the posterior probability distribution of a group of query variables X is calculated, and then a root fault source is diagnosed; in the Bayesian network, the evidence variable refers to a non-root node, the query variable refers to a root node, and the posterior probability is calculated by a Bayesian formula:

the posterior probability is determined by prior probability and conditional probability; and calculating the posterior probability of the root alarm according to the occurrence condition of the alarm data in a certain time period, and taking the root alarm with higher probability as a final fault source.
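The formula itself does not appear in this text. As a hedged reconstruction, the standard form of Bayes' rule for the query variables X given the evidence E = c, which matches the statement that the posterior is determined by the prior and the conditional probability, would read:

```latex
P(X \mid E = c) = \frac{P(E = c \mid X)\, P(X)}{P(E = c)},
\qquad
P(E = c) = \sum_{x} P(E = c \mid X = x)\, P(X = x)
```

Here P(X) is the prior probability of the root alarm, P(E = c | X) is the conditional probability of the observed alarms, and the denominator normalizes over all states x of X.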

Further, the processing framework of the massive alarm data processing method comprises:

(1) a cache-based processing framework, which adopts a Redis database and uses message queue technology to synchronize the alarm data cache with the running of the algorithm: when data in the cache need to be processed by the algorithm module, the running state of the algorithm module is added to the message queue; when the algorithm finishes processing the data, the cache is notified to clear the used data;

(2) an asynchronous processing framework: when a task request arrives, the request processing thread creates a new thread in the thread pool and calls the algorithm for processing; the controller then ends the request processing thread and returns to the browser the status information that processing is under way; when the algorithm obtains a result, a callback is made and the result is returned to the front-end browser using techniques such as WebSocket or a message queue (MQ).
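The interplay of cache, message queue and thread pool described in (1) and (2) can be sketched with the Python standard library alone. Everything here is an assumption made for illustration: a plain dict stands in for the Redis cache, `queue.Queue` stands in for the message queue, and the names `handle_request`, `run_algorithm` and `on_done` are hypothetical.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

cache = {}                     # stand-in for the Redis alarm cache (assumption)
status_queue = queue.Queue()   # stand-in for the message queue (assumption)
executor = ThreadPoolExecutor(max_workers=4)


def run_algorithm(batch_id):
    """Placeholder algorithm: count alarms per type in one cached batch."""
    counts = {}
    for alarm in cache[batch_id]:
        counts[alarm] = counts.get(alarm, 0) + 1
    return batch_id, counts


def on_done(future):
    """Callback: release the used cache entry, then publish the result."""
    batch_id, counts = future.result()
    cache.pop(batch_id, None)                     # cache drops the used data
    status_queue.put((batch_id, "done", counts))  # would go to the browser


def handle_request(batch_id, alarms):
    """Request handler: returns at once while the algorithm runs in the pool."""
    cache[batch_id] = alarms
    status_queue.put((batch_id, "running", None))
    future = executor.submit(run_algorithm, batch_id)
    future.add_done_callback(on_done)
    return {"batch_id": batch_id, "status": "processing"}, future


if __name__ == "__main__":
    response, _ = handle_request("batch-1", ["A", "B", "A"])
    print(response)
    print(status_queue.get(timeout=5))
    print(status_queue.get(timeout=5))
    executor.shutdown()
```

In a real deployment the "done" message would be pushed to the front end over WebSocket or MQ rather than read back from a local queue.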

It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:

constructing a visual relation graph between network alarms from the sequence patterns mined by the PrefixSpan algorithm, wherein the direction of each edge is determined by timestamps and represents, to a certain extent, the causal relation between the alarms, and taking the obtained relation graph as a Bayesian network;

It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:

Another object of the present invention is to provide a mass alarm data processing system for implementing the mass alarm data processing method, wherein the mass alarm data processing system comprises:

a preprocessing module, used for processing the original alarm data into a data form suitable for a sequence mining algorithm;

a data association analysis module, used for mining association rules from the preprocessed alarm data;

a graph building module, used for converting the patterns generated from the alarm log file into a directed acyclic graph;

a fault reasoning module, used for inferring possible fault sources in the current alarm data.

The invention also aims to provide a 5G network terminal, and the 5G network terminal is loaded with the mass alarm data processing system.

Combining all the above technical schemes, the advantages and positive effects of the invention are as follows. In a communication network, because a large number of components are interconnected, a mobile network operator needs to run an operation and maintenance support system that generates a large number of alarm events. Finding potential alarm sources in real time from massive alarms and taking remedial measures in time to ensure normal operation of the network is a serious challenge for network operators. The invention proposes discovering the relations between network nodes from the time sequence of network alarms and provides a solution for analyzing alarm sources, which comprises a method for converting alarm sequences based on double sliding time windows, the construction of a relation graph between alarm events, and a root alarm identification method based on a Bayesian network. The results show that, by analyzing network alarms, the relation graph reveals the relations between different network events and can help network operation and maintenance personnel find faults in the network. Large mobile communication providers need intelligent alarm management platforms, which can improve processing efficiency and reduce labor cost. Therefore, a new algorithm processing framework for massive alarm data is provided, which achieves timely processing of alarm information and synchronization between the alarm data cache and the algorithm. Finally, the processing results are visualized with the visualization tool ECharts.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained from the drawings without creative efforts.

Fig. 1 is a flowchart of a method for processing mass alarm data according to an embodiment of the present invention.

FIG. 2 is a schematic structural diagram of a system for processing mass alarm data according to an embodiment of the present invention;

In fig. 2: 1. preprocessing module; 2. data association analysis module; 3. graph building module; 4. fault reasoning module.

Fig. 3 is a flowchart of alarm analysis provided in the embodiment of the present invention.

Fig. 4 is a schematic diagram of a 5G telecommunication network architecture provided by an embodiment of the present invention.

Fig. 5 is a schematic diagram of a sliding window method according to an embodiment of the present invention.

Fig. 6(a) is a schematic diagram of alarm time records in a certain event set in the variable step sliding window method according to the embodiment of the present invention.

Fig. 6(b) is a schematic diagram of an alarm item set obtained by the variable step size sliding window method provided in the embodiment of the present invention.

Fig. 7(a) is a schematic diagram of a prefix with a length of 1 and a suffix sequence thereof according to an embodiment of the present invention.

Fig. 7(b) is a schematic diagram of mining frequent sequences by taking as an example according to an embodiment of the present invention.

Fig. 8(a) is a schematic diagram of patterns obtained by using a sequence mining algorithm and their support degrees according to an embodiment of the present invention.

Fig. 8(b) is a relationship diagram after mode conversion according to the embodiment of the present invention.

Fig. 9 is a schematic diagram of a conditional independence relationship of a bayesian network according to an embodiment of the present invention.

Fig. 10(a) is a schematic diagram of a plot relation provided by an embodiment of the present invention.

Fig. 10(b) is a schematic diagram of a relationship diagram of an apricot green region according to an embodiment of the present invention.

Fig. 10(c) is a schematic diagram of a wanbain area relationship diagram provided by an embodiment of the present invention.

Fig. 11 is a schematic diagram of a non-pipeline framework based on a memory data caching and message queuing technique according to an embodiment of the present invention.

Fig. 12(a) is a performance diagram of a Redis database query provided by an embodiment of the present invention.

Fig. 12(b) is a performance diagram of performing Redis database query and record simultaneously according to the embodiment of the present invention.

Fig. 13(a) is a schematic performance diagram of MySQL database query according to an embodiment of the present invention.

Fig. 13(b) is a performance diagram of MySQL database query and insertion performed simultaneously according to the embodiment of the present invention.

FIG. 14 is a block diagram of an asynchronous processing framework according to an embodiment of the present invention.

Fig. 15 is a diagram of an Apache ab test method provided by an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Aiming at the problems in the prior art, the invention provides a method, a system, a medium, a computer device and an application for processing mass alarm data, and the invention is described in detail with reference to the accompanying drawings.

As shown in fig. 1, the method for processing massive alarm data provided by the present invention includes the following steps:

S101: preprocessing the original alarm sequence by using a double-sliding-window data division method;

S102: constructing a visual relation graph between network alarms from the sequence patterns mined by the PrefixSpan algorithm, wherein the direction of each edge is determined by timestamps and represents, to a certain extent, the causal relation between the alarms, and taking the obtained relation graph as a Bayesian network;

S103: obtaining the parameters of the Bayesian network from historical data by maximum likelihood estimation;

S104: taking alarm information appearing in the network as evidence to carry out Bayesian inference, and taking alarms whose inferred fault probability is greater than a certain threshold as alarm sources;

S105: combining the communication network alarm system with the algorithm module, and processing the alarm information in time by using a practical, high-performance processing framework.

Persons of ordinary skill in the art can also use other steps to implement the method for processing mass alarm data provided by the present invention, and the method for processing mass alarm data provided by the present invention in fig. 1 is only a specific embodiment.

As shown in fig. 2, the system for processing mass alarm data provided by the present invention includes:

a preprocessing module 1, used for processing the original alarm data into a data form suitable for a sequence mining algorithm;

a data association analysis module 2, used for mining association rules from the preprocessed alarm data;

a graph building module 3, used for converting the patterns generated from the alarm log file into a directed acyclic graph;

a fault reasoning module 4, used for inferring possible fault sources in the current alarm data.
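The threshold-based selection of alarm sources performed by the fault reasoning module can be illustrated with a deliberately simplified sketch. It assumes a single root alarm whose child alarms are conditionally independent given the root, which is a special case chosen for brevity and not the general network of the invention; all names and probability values are hypothetical.

```python
def root_posterior(prior, likelihoods, evidence):
    """P(root = 1 | evidence) for one root alarm with independent children.

    prior:       P(root = 1)
    likelihoods: {child: (P(child = 1 | root = 0), P(child = 1 | root = 1))}
    evidence:    {child: 0 or 1} for the observed child alarms
    """
    joint = {}
    for root_state in (0, 1):
        p = prior if root_state == 1 else 1.0 - prior
        for child, observed in evidence.items():
            p_present = likelihoods[child][root_state]
            p *= p_present if observed == 1 else 1.0 - p_present
        joint[root_state] = p
    total = joint[0] + joint[1]
    return joint[1] / total if total > 0 else 0.0


def alarm_sources(candidates, evidence, threshold):
    """Return the root alarms whose posterior exceeds the threshold."""
    selected = []
    for name, (prior, likelihoods) in sorted(candidates.items()):
        observed = {c: v for c, v in evidence.items() if c in likelihoods}
        if root_posterior(prior, likelihoods, observed) > threshold:
            selected.append(name)
    return selected


if __name__ == "__main__":
    candidates = {"B": (0.1, {"A": (0.2, 0.9)})}  # illustrative numbers only
    print(root_posterior(0.1, {"A": (0.2, 0.9)}, {"A": 1}))
    print(alarm_sources(candidates, {"A": 1}, threshold=0.3))
```

For a general directed acyclic graph the posterior would be computed with a full inference procedure (for example variable elimination) instead of this closed form.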

The technical solution of the present invention is further described below with reference to the accompanying drawings.

The overall analysis flow of the massive alarm data processing method provided by the invention is shown in fig. 3. Alarm data comes from the physical world (equipment, etc.) and is aggregated to the NOC and recorded in an alarm database.

In the stage of data association analysis, association rule mining is carried out on the preprocessed alarm data by using a sequence mining algorithm, and finally frequent patterns in the alarm sequence are obtained, wherein the patterns are fixed forms which are identified in the alarm sequence and are related to time sequence, and are hidden knowledge in the alarm data.

The graph build phase converts the patterns generated by the alarm log file into a directed acyclic graph. Nodes in the graph represent alarm variables, and directed edges in the graph represent causal relationships between nodes to some extent. The invention regards this graph as a bayesian network, and parameters in the bayesian network can be obtained based on historical alarm data. Therefore, the present invention can find out the root cause of the alarm by using the fault inference method based on the BN, and effectively take proper action. The performance of the processing framework that services the algorithm also has a large impact on the actual execution efficiency of the algorithm. Therefore, in the system embedding stage, the invention designs an algorithm processing framework, and in the framework, massive alarm data can be traced in time.
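The conversion of mined patterns into a directed acyclic graph can be sketched as below. The text does not say how conflicting orderings are resolved, so skipping any edge that would close a cycle is an assumption of this sketch; the function name `patterns_to_graph` is hypothetical.

```python
def patterns_to_graph(patterns):
    """Convert sequential patterns into a directed graph {node: successors}.

    Consecutive events in a pattern give an edge from the earlier event to
    the later one. An edge that would close a cycle is skipped so that the
    result stays acyclic (assumption of this sketch).
    """
    graph = {}

    def reachable(src, dst):
        # Depth-first search: is dst reachable from src along existing edges?
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph.get(node, ()))
        return False

    for pattern in patterns:
        for node in pattern:
            graph.setdefault(node, set())
        for earlier, later in zip(pattern, pattern[1:]):
            if earlier != later and not reachable(later, earlier):
                graph[earlier].add(later)
    return graph


if __name__ == "__main__":
    print(patterns_to_graph([("B", "A"), ("C", "A")]))
```

Processing patterns in descending order of support would make the higher-support direction win when two patterns disagree; that ordering is left to the caller here.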

In the invention, the original alarm sequence is preprocessed by using a double-sliding-window data division method. A visual relation graph between network alarms is constructed from the sequence patterns mined by the PrefixSpan algorithm; the direction of each edge is determined by timestamps and represents, to a certain extent, the causal relation between the alarms, and the obtained relation graph is taken as a Bayesian network. Based on historical data, the parameters of the Bayesian network are obtained by maximum likelihood estimation. Then, Bayesian inference is carried out by taking the alarm information appearing in the network as evidence, and all alarms whose inferred fault probability is greater than a certain threshold are taken as alarm sources. Finally, the communication network alarm system and the algorithm module are combined, and the alarm information is processed in time by using a practical, high-performance processing framework.
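For binary alarm variables, maximum likelihood estimation of the network parameters reduces to relative frequencies in the historical data. The following sketch assumes that history is available as a list of records mapping each alarm name to 0 (absent) or 1 (present); the record layout and the name `mle_cpt` are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def mle_cpt(records, child, parents):
    """Estimate P(child = 1 | parents) by relative frequency.

    records: list of dicts mapping alarm name -> 0/1 (absent/present).
    Returns {parent_state_tuple: probability}; parent states that never
    occur in the data map to None.
    """
    totals = Counter()
    positives = Counter()
    for rec in records:
        state = tuple(rec[p] for p in parents)
        totals[state] += 1
        positives[state] += rec[child]
    cpt = {}
    for state in product((0, 1), repeat=len(parents)):
        cpt[state] = positives[state] / totals[state] if totals[state] else None
    return cpt


if __name__ == "__main__":
    history = [
        {"B": 1, "A": 1},
        {"B": 1, "A": 1},
        {"B": 1, "A": 0},
        {"B": 0, "A": 0},
    ]
    print(mle_cpt(history, "A", ["B"]))  # conditional table of A given B
    print(mle_cpt(history, "B", []))     # prior of the root node B
```

With sparse history, some parent states may have few or no observations; smoothing the counts is a common remedy but is not mentioned in the text and is therefore left out.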

First, mobile network architecture

The infrastructure of a telecommunications network is made up of a set of components (network nodes), each of which is responsible for a particular network function. Fig. 4 summarizes the main components of a 5G network. The 5G network architecture includes three clouds corresponding to the access network, the bearer network and the core network, respectively. The access network is the "window" and is responsible for collecting data; in the next-generation mobile network it comprises a number of promising key wireless technologies. The bearer network is the "truck" and is responsible for transporting data. The core network is the "management hub", which manages and sorts the data and then forwards it. The 5G network is a system with multiple services, multiple access technologies and multi-layer coverage. It supports network slicing and provides virtual dedicated network resources for different scenarios and users. The virtualization and layered evolution of each network system realize unified control of the network's information transmission capability. Based on SDN, NFV and cloud computing technologies, the 5G network is a more flexible, intelligent, efficient and open network system. With the introduction of the NFV/SDN architecture, control and bearer are further separated, and the systems are deployed on the same platform, which makes interworking and cooperation between systems more convenient. In overview, SDN is the key to connecting and forwarding clouds; NFV replaces the forwarding cloud equipment and the network elements in the multiple control clouds with general-purpose equipment, thereby saving cost. Resource scheduling, elastic scaling and automatic management in the three clouds all depend on the cloud computing platform.

When an abnormal situation occurs, the interconnected components may trigger alarms. Table 1 shows an example of real alarm data. Each row represents one alarm message with the following attributes: alarm ID, alarm title, network element name, alarm level and alarm occurrence time. The alarm ID is the unique identifier of an alarm event, the alarm title specifies the specific cause of the alarm, and the network element name identifies the network element where the alarm occurred, which helps to find the network element in question for timely maintenance. The alarm level indicates the severity of the alarm, and the alarm occurrence time is the timestamp at which the alarm was triggered. This information helps the operator to better understand and address the problem.

TABLE 1 network alarm example

Second, data association analysis

A serious phenomenon in alarm systems is the alarm propagation chain: because of the connections between devices in the telecommunication network, or their logical and functional dependencies, one alarm may trigger a large number of other alarms in a short time. Data association analysis is a method for studying the dependencies or correlations between alarm data. In recent years, sequence pattern mining has become an important research area in alarm correlation analysis.

A. Overview

A network alarm is a message indicating a failure or anomaly triggered by a particular node in the network at a particular time. Alarm information can help network managers locate fault points, but a large volume of alarm information reduces their efficiency. Data association analysis helps to compress the alarm data and to extract information that supports the decisions of network managers. Sequence pattern mining is a data mining technique used for alarm analysis in communication networks. Before pattern mining, the original alarm information must be processed, using the double sliding window technique, into a data form that can be used for sequence mining. Based on the time sequence characteristics of alarm events, the invention applies a sequence pattern mining method to the network alarm events, finally obtains the logical relationships among the alarm events in the network, and creates a relation graph based on these relationships. The problem can then be solved using graph theory.

B. Alarm sequence partitioning

Raw alarm data often cannot be directly input into sequential pattern mining algorithms, which typically require a sliding time window approach to transform the data.

Definition 1: given an alarm sequence $S = (s, T_s, T_e)$, an ascending sequence of alarms occurring in the time interval $[T_s, T_e]$, $S_w = [w, t_s, t_e]$ is a time window of the sequence $S$, where $t_s > T_s$ and $t_e < T_e$; $t_e - t_s$ is the width of the sliding window, denoted as $W$.

The sequence of alarm events is partitioned into a transaction database by sliding a time window. In fig. 5, the present invention gives a simple example to illustrate the sliding window method. In this example, it is assumed that each capital letter represents a particular type of alarm information and that the original alarm records are arranged in ascending chronological order. Given a sequence of alarms A, B, …, D, C, each dashed box in FIG. 5 represents a 180s time window with a sliding step size of 60 s.
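The fixed-width partitioning described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `sliding_windows`, the `(timestamp, alarm type)` tuple layout and the sample timestamps are assumptions made for the example; only the 180 s width and 60 s step come from the text.

```python
from typing import List, Tuple

Alarm = Tuple[float, str]  # (timestamp in seconds, alarm type); layout is an assumption


def sliding_windows(alarms: List[Alarm], width: float = 180.0,
                    step: float = 60.0) -> List[List[str]]:
    """Cut a time-ordered alarm list into overlapping windows (transactions)."""
    if not alarms:
        return []
    alarms = sorted(alarms)                  # ascending chronological order
    start, last = alarms[0][0], alarms[-1][0]
    transactions = []
    t = start
    while t <= last:
        # Each window covers the half-open interval [t, t + width).
        window = [name for ts, name in alarms if t <= ts < t + width]
        if window:                           # skip windows that contain no alarm
            transactions.append(window)
        t += step
    return transactions


# Hypothetical timestamps, chosen only to exercise the function.
sample = [(0, "A"), (50, "B"), (100, "C"), (200, "D"), (250, "C")]
transactions = sliding_windows(sample)
```

Because the step is smaller than the width, consecutive transactions overlap, which is why the same alarm can appear in several transactions.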

The size of the sliding window greatly affects the performance of the mining algorithm. On one hand, if the length of the sliding window is too large, some more important frequent sequences may be ignored; on the other hand, if it is too small, too many transactions are generated and the computational cost of the algorithm increases accordingly. Based on past experience, the present invention can achieve a relatively reasonable window size.

In the present invention, the alarms within each transaction set need to be further divided to obtain alarm item sets. For the alarm data within a given transaction set, as shown in fig. 6(a), the intervals between alarm occurrence times differ, and intuitively, alarms triggered within a short time interval are likely to be associated with each other. Based on this idea, the method adopts a variable-step sliding window strategy to divide the data in each transaction set, which is more practical than the traditional fixed-length sliding window strategy. Specifically, the invention specifies a maximum alarm time interval max_interval in advance, and decides whether two consecutive alarms belong to the same item set by comparing the interval between their occurrence times with max_interval. The present invention defines a quiet period as a period of time in which no alarm event is received. The variable-length sliding strategy can thus be stated simply: any quiet period longer than max_interval acts as a sequence separator that partitions the alarm messages. In the same example, as shown in fig. 6(b), 3 alarm item sets are obtained: ABCA, BCDA, and DCEBA.
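The quiet-period rule can be sketched in a few lines. The function name `split_by_quiet_period` and the timestamps below are hypothetical; the timestamps were chosen so that the result reproduces the three item sets named in the text (ABCA, BCDA, DCEBA).

```python
from typing import List, Tuple


def split_by_quiet_period(alarms: List[Tuple[float, str]],
                          max_interval: float) -> List[List[str]]:
    """Start a new item set whenever the gap to the previous alarm exceeds max_interval."""
    item_sets: List[List[str]] = []
    current: List[str] = []
    previous_ts = None
    for ts, name in sorted(alarms):
        if previous_ts is not None and ts - previous_ts > max_interval:
            item_sets.append(current)        # a quiet period ends the current item set
            current = []
        current.append(name)
        previous_ts = ts
    if current:
        item_sets.append(current)
    return item_sets


# Hypothetical timestamps (seconds) with two quiet periods longer than 30 s.
sample = [(0, "A"), (5, "B"), (8, "C"), (12, "A"),
          (60, "B"), (63, "C"), (70, "D"), (72, "A"),
          (150, "D"), (152, "C"), (155, "E"), (160, "B"), (161, "A")]
item_sets = ["".join(s) for s in split_by_quiet_period(sample, max_interval=30)]
```

Unlike the fixed-width window, the item sets produced here never overlap, and their length adapts to how densely the alarms arrive.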

C. Pattern discovery

The frequent pattern mining problem was first posed by Agrawal et al. and has since become an important area of data mining. In this section, the present invention uses the well-known PrefixSpan algorithm to mine frequent patterns from the alarm sequences. Similar to the Apriori algorithm, PrefixSpan starts mining sequence patterns from prefixes of length 1 and searches the corresponding projected database to obtain the frequent sequences for each prefix. It then recursively mines the frequent sequences corresponding to prefixes of length 2, and so on, until no longer prefix can be mined. PrefixSpan does not need to generate candidate sequences, its projected databases shrink quickly, and its memory consumption is stable, so it performs well for frequent sequence pattern mining. It has clear advantages over other sequence mining algorithms such as GSP and FreeSpan, and is therefore commonly used in production environments. The greatest runtime cost of PrefixSpan lies in recursively constructing the projected databases: if the sequence data set is large and contains many distinct items, the running speed of the algorithm drops noticeably. The procedure of PrefixSpan is shown as Algorithm 1.
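The prefix-growth recursion can be illustrated with a simplified sketch. It is not Algorithm 1 of the invention: it assumes every element of a sequence is a single alarm (no item sets such as (bc)), and it uses index pointers instead of physically copied projected databases. The function name `prefixspan` and the sample data are chosen for the example.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def prefixspan(sequences: Sequence[Sequence[str]],
               min_sup: int) -> List[Tuple[List[str], int]]:
    """Return (pattern, support) pairs for all frequent sequential patterns."""
    results: List[Tuple[List[str], int]] = []

    def mine(prefix: List[str], projected: List[Tuple[int, int]]) -> None:
        # projected holds (sequence index, start position) pairs, i.e. the suffixes.
        counts = defaultdict(int)
        for sid, pos in projected:
            for item in set(sequences[sid][pos:]):   # count each item once per suffix
                counts[item] += 1
        for item in sorted(counts):
            if counts[item] < min_sup:
                continue
            pattern = prefix + [item]
            results.append((pattern, counts[item]))
            # Project on the first occurrence of the item in every suffix.
            next_projected = []
            for sid, pos in projected:
                seq = sequences[sid]
                for k in range(pos, len(seq)):
                    if seq[k] == item:
                        next_projected.append((sid, k + 1))
                        break
            mine(pattern, next_projected)

    mine([], [(i, 0) for i in range(len(sequences))])
    return results


# The three alarm item sets from the partitioning example, treated as sequences.
patterns = prefixspan([list("ABCA"), list("BCDA"), list("DCEBA")], min_sup=3)
```

With a support threshold of 3, the patterns found in this small data set are A, B, C, BA and CA, each supported by all three sequences.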

The present invention gives a simple example to illustrate the PrefixSpan algorithm. In this example, min_sup is assumed to be 2. For the sequences shown in fig. 7(a), the prefixes of length 1 are < a >, < b >, < c >, < d >, < e >, < f >, < g >. The support of each prefix is counted; < g > appears only in sequence 4, so its support count is 1, the support requirement is not satisfied, and it cannot be mined further. The suffixes corresponding to each prefix of length 1 are indicated in fig. 7(a). The invention takes < b > as an example of mining frequent sequences; the specific process is shown in fig. 7(b). The items in the suffixes of prefix < b > are counted first. Because the suffix items < b >, < c >, < f > do not reach the support threshold, the frequent sequences of length 2 with prefix < b > obtained recursively are < (bc) >, < bd >, < be >. The above operations are then repeated with < (bc) >, < bd >, < be > as prefixes, and so on. The resulting frequent sequences are: < b >, < ba >, < bc >, < bd >, < be >, < (bc) a >, < (bc) d >, < (bc) e >, < bda >, < (bc) da >. Note that the generated frequent sequences contain redundancy, i.e. some long sequences contain shorter ones. Only the longest sequences, i.e., < ba >, < be >, < (bc) a >, < (bc) e >, < bda >, < (bc) da >, are retained. Other prefixes, such as < d >, are mined recursively in the same way.

D. Construction of the relationship graph

In order to further discover the relationships between different network alarms, the invention converts the obtained sequence patterns into a relationship graph. Based on the real data, the present invention converts the sequence patterns in fig. 8(a) into the relationship graph shown in fig. 8(b). The direction of each edge in the graph is determined by the order of the events in the sequence pattern, which in turn depends on the timestamps at which the events occurred. Using this principle, sequence patterns can be used to construct a relationship graph that shows the relationships between different network alarms. A frequent sequence pattern represents a set of ordered events that often occur together. Although the events occur in a sequential order, this does not mean that the relationships found by the sequence patterns imply causality. Rather, a sequence pattern simply indicates strong co-occurrence between the items in the pattern, which amounts to a statistically significant relationship. A causal relationship is a partial order relationship, unlike a correlation, which is generally quantified by correlation coefficients. Treating correlation as causation can produce many false positives, because a positive correlation between two events does not always imply a causal relationship. Establishing causality requires a deeper understanding of the causal properties of the alarm data. In practice, alarm information often propagates from a high level to a low level, which helps to determine the causal relationship. In the invention, the order of the alarm data in each transaction set is therefore adjusted according to the alarm level in the preprocessing stage.
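One simple way to turn sequence patterns into a directed graph is to add an edge between each pair of consecutive events in a pattern. The sketch below follows that assumption; the text does not specify whether non-adjacent events are also linked, and the helper names and sample patterns are illustrative. A cycle check is included because the graph is later used as a Bayesian network, which must be acyclic.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def build_relation_graph(patterns: Iterable[List[str]]) -> Set[Edge]:
    """Add an edge earlier -> later for each pair of consecutive events in a pattern."""
    edges: Set[Edge] = set()
    for pattern in patterns:
        for src, dst in zip(pattern, pattern[1:]):
            if src != dst:                    # ignore self-loops
                edges.add((src, dst))
    return edges


def root_nodes(edges: Set[Edge]) -> Set[str]:
    """Nodes without incoming edges, i.e. candidate root alarms."""
    return {s for s, _ in edges} - {d for _, d in edges}


def is_acyclic(edges: Set[Edge]) -> bool:
    """Kahn's algorithm: the graph is a DAG if every node can be removed in order."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for src, dst in edges:
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return removed == len(nodes)


# Illustrative patterns only; they are not taken from fig. 8(a).
graph = build_relation_graph([["x", "b"], ["x", "e", "f"], ["u", "w", "y"]])
```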

Third, Bayesian inference

A. Bayesian networks

The Bayesian network (BN) is a probabilistic graphical model and one of the most effective theoretical models for representing and reasoning about uncertain knowledge. It represents a set of random variables and their conditional independencies by a directed acyclic graph. In this graph, nodes represent variables, and edges represent dependencies or causal relationships between variables. The conditional probabilities in the Bayesian network quantify the relationship between parent nodes and child nodes. A Bayesian network is a non-linear extension of a Markov chain: the structure is no longer restricted to a chain, but both follow the Markov assumption, i.e. a node depends directly only on its immediate predecessors (in a Bayesian network, its parent nodes). The topology of the Bayesian network, combined with the conditional probability tables (CPTs), implicitly specifies a joint probability distribution over all variables. Consider a directed acyclic graph with n nodes and n random variables $X_1, X_2, \ldots, X_n$, and assume that node i ($1 \le i \le n$) of the graph is associated with variable $X_i$. Based on conditional independence, the joint probability distribution of the Bayesian network can be expressed as:

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{parent}(X_i)\bigr)$$

where $\mathrm{parent}(X_i)$ denotes the set of variables of all parent nodes of node i in the graph.

A Bayesian network provides a compact way of representing conditional independence relationships between variables. The conditional independence relationship in a Bayesian network can be described as follows: given its parent nodes, a node is conditionally independent of its non-descendant nodes. This can also be illustrated by the concept of the Markov blanket. In a Bayesian belief network, the Markov blanket of a node consists of its parents, its children, and the other parents of its children, as shown in fig. 9. Thus, conditional independence in a Bayesian network can also be stated as: given the Markov blanket of a node, this node is conditionally independent of all other nodes in the network.
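As a small illustration of this definition, the Markov blanket can be read directly off an edge list. The function name is hypothetical, and the edges used here are the parent/child pairs that appear in parts (a), (b), (c), (k) and (l) of Table 3.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def markov_blanket(edges: Set[Edge], node: str) -> Set[str]:
    """Parents, children, and the other parents of the children of a node."""
    parents = {p for p, c in edges if c == node}
    children = {c for p, c in edges if p == node}
    co_parents = {p for p, c in edges if c in children and p != node}
    return parents | children | co_parents


# Parent -> child pairs taken from Table 3 (a), (b), (c), (k) and (l).
edges = {("x", "b"), ("x", "e"), ("e", "f"),
         ("u", "w"), ("x", "w"), ("o", "y"), ("w", "y")}
blanket_of_w = markov_blanket(edges, "w")
```

For node w the blanket contains its parents u and x, its child y, and o, the other parent of y.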

B. BN structural modeling and parametric modeling

To realize fault diagnosis based on a BN, BN structure modeling and BN parameter modeling are carried out first. Using structure learning methods such as causal relationships and mapping algorithms, the invention can establish a BN structural model for fault diagnosis. Parameter learning for the Bayesian network calculates the conditional probability distribution of each node given the network structure, which can be obtained by a statistical method (e.g., maximum likelihood estimation or Bayesian estimation). In the BN model of the invention, two kinds of parameters need to be determined: (1) the prior probabilities of the root nodes; (2) the CPTs of the non-root nodes. The BN can then be used as a causal model of the diagnostic process, using the observation data collected in the network to infer the likely root cause of a fault.
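For binary alarm variables, maximum likelihood estimation reduces to counting relative frequencies. The sketch below assumes each historical record is a dictionary mapping an alarm name to 0 or 1; the function name and the toy records are invented for the example and are not the data used in the experiments.

```python
from collections import Counter
from typing import Dict, List, Tuple


def estimate_cpt(records: List[Dict[str, int]], child: str,
                 parents: Tuple[str, ...]) -> Dict[Tuple[int, ...], float]:
    """MLE of P(child = 1 | parents) for every parent configuration seen in the data."""
    occurrences = Counter()   # how often each parent configuration occurs
    child_present = Counter() # how often the child is 1 under that configuration
    for record in records:
        config = tuple(record[p] for p in parents)
        occurrences[config] += 1
        child_present[config] += record[child]
    return {config: child_present[config] / occurrences[config]
            for config in occurrences}


# Toy history: 5 windows without alarm x (b present once),
# 4 windows with alarm x (b present once).
records = ([{"x": 0, "b": 1}] + [{"x": 0, "b": 0}] * 4
           + [{"x": 1, "b": 1}] + [{"x": 1, "b": 0}] * 3)

cpt_b = estimate_cpt(records, "b", ("x",))   # CPT of a non-root node
prior_x = estimate_cpt(records, "x", ())     # prior of a root node (no parents)
```

Calling the function with an empty parent tuple yields the prior of a root node, so the same counting routine covers both kinds of parameters named above. Parent configurations that never occur in the data get no entry; Bayesian estimation with a prior count is one common way to handle them.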

The invention converts the alarm sequence patterns into a relationship graph between alarm events. In the present invention, the relationship graph is a directed graph whose nodes are alarm events, each of which corresponds to a variable. Each variable has two states, present and absent, which indicate whether the alarm message has appeared. The edges in the graph represent, to some extent, causal relationships between alarm events, so the graph can, to some extent, be regarded as a Bayesian network. The method is applied to alarm data of the Shanxi mobile communication network. The data comprise two months of continuous alarm records from 2018. The purpose of the experiment is to discover the relationships between network alarms based solely on the alarm data sets. Therefore, the invention does not use any network topology information or any other information about the network in this process. The present invention performs experiments on alarm data of different geographical areas to obtain a plurality of relationship graphs, as shown in fig. 10(a) to 10(c), in which letters represent alarm information. The number of alarms in each selected area must be sufficient.

Taking the alarms of the small store area as an example, the relationship graph is further explained. As shown in fig. 10(a), the root nodes of the relationship graph for the small store area are 'x', 'u', 'j', 't', represented by blue nodes; the non-root nodes are 'b', 'z', 'e', 'w', 'f', 'y', 'o', 'k', 'n', 'i', 'm', 's', 'q', represented by red nodes.

In the present invention, the absent and present states of an alarm variable are represented by 0 and 1, respectively. Based on the database, the invention can obtain the parameters of the BN by maximum likelihood estimation. For the network in fig. 10(a), the prior probabilities of the root nodes (i.e., the query variables) are shown in Table 2, and the conditional probabilities of the non-root nodes (i.e., the evidence variables) are shown in Table 3.

TABLE 2 Prior probability tables for query variables

TABLE 3 conditional probability table of evidence variables

(a)

	x=0	x=1
b=0	0.8	0.77
b=1	0.2	0.23

(b)

	x=0	x=1
e=0	0.76	0.74
e=1	0.24	0.25

(c)

	e=0	e=1
f=0	0.95	0.92
f=1	0.05	0.08

(d)

(e)

	h=0	h=1
i=0	1	0.16
i=1	0	0.84

(f)

	j=0, o=0	j=0, o=1	j=1, o=0	j=1, o=1
k=0	0.95	0.74	0.24	0.23
k=1	0.05	0.26	0.76	0.77

(g)

	j=0	j=1
m=0	0.94	0.18
m=1	0.06	0.82

(h)

	i=0	i=1
o=0	0.94	0.9
o=1	0.06	0.1

(i)

	s=0	s=1
q=0	0.98	0.44
q=1	0.02	0.56

(j)

	t=0	t=1
s=0	0.98	0.25
s=1	0.02	0.75

(k)

	u=0, x=0	u=0, x=1	u=1, x=0	u=1, x=1
w=0	0.95	0.91	0.15	0.1
w=1	0.05	0.09	0.85	0.9

(l)

	o=0, w=0	o=0, w=1	o=1, w=0	o=1, w=1
y=0	0.99	0.71	0.98	0.41
y=1	0.01	0.29	0.02	0.59

(m)

	x=0	x=1
z=0	0.86	0.86
z=1	0.14	0.14

So far, the invention obtains a complete Bayesian network based on the small-store alarm data. The Bayesian network can be used as a model of the fault diagnosis process to realize the reasoning of the fault source.

C. Bayesian network inference and fault identification

The task of Bayesian network-based fault diagnosis is: given a set of observations c of the evidence variables E (i.e. the known internal fault conditions of the network), calculate the posterior probability distribution of a set of query variables X, i.e. diagnose the root fault sources. In the present invention, the evidence variables are the non-root nodes and the query variables are the root nodes. The posterior probability can be calculated by the Bayesian formula:

$$P(X \mid E = c) = \frac{P(E = c \mid X)\,P(X)}{P(E = c)} \qquad (1)$$

As can be seen from equation (1), the posterior probability is determined by the prior probability and the conditional probability.

The invention can calculate the posterior probability of each root alarm (query variable) from the alarms that occur within a certain time period, and the root alarms with higher probability are taken as the final fault sources. Fault reasoning is carried out taking the small store area as an example: based on the observed evidence, the probability that each root variable is in its different states can be calculated using the Bayesian formula.

Case 1: within a certain time period, alarms w and e appear, i.e. their variable states are 1.

Analysis of results: the root alarm that causes alarms w and e to appear is alarm u. This shows that the occurrence of alarm u leads to alarm w more readily than the occurrence of alarm x does, which is also reflected in the conditional probability table: P(w=1 | u=0, x=1) = 0.09, whereas P(w=1 | u=1, x=0) = 0.85. Since the alarm information includes attributes such as network element name and alarm title, detailed information about the fault source can be further determined.
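Case 1 can be reproduced by enumerating the four joint states of the root alarms u and x. The conditional probabilities below are copied from Table 3, parts (b) and (k). The prior probabilities of u and x are NOT available in this text (Table 2 is not reproduced), so the value 0.1 used for both is a purely hypothetical assumption, and the resulting numbers are illustrative only.

```python
from itertools import product

# P(e = 1 | x), from Table 3 (b).
P_E1_GIVEN_X = {0: 0.24, 1: 0.25}
# P(w = 1 | u, x), from Table 3 (k).
P_W1_GIVEN_UX = {(0, 0): 0.05, (0, 1): 0.09, (1, 0): 0.85, (1, 1): 0.90}
# Hypothetical priors: assumptions made for this example, not values from Table 2.
PRIOR_U1 = 0.1
PRIOR_X1 = 0.1


def posterior_given_w_and_e():
    """Return P(u=1 | w=1, e=1) and P(x=1 | w=1, e=1) by enumeration."""
    weights = {}
    for u, x in product((0, 1), repeat=2):
        p_u = PRIOR_U1 if u else 1 - PRIOR_U1
        p_x = PRIOR_X1 if x else 1 - PRIOR_X1
        # Joint probability of this root state and the evidence w=1, e=1.
        weights[(u, x)] = p_u * p_x * P_W1_GIVEN_UX[(u, x)] * P_E1_GIVEN_X[x]
    evidence = sum(weights.values())          # P(w=1, e=1), the normalizer
    post_u = sum(w for (u, _), w in weights.items() if u == 1) / evidence
    post_x = sum(w for (_, x), w in weights.items() if x == 1) / evidence
    return post_u, post_x


post_u, post_x = posterior_given_w_and_e()
```

Under these assumed priors the posterior of u is about 0.64 and that of x about 0.13, so u is ranked as the more likely root alarm, in line with the analysis above. Different priors would change the numbers, although the large gap between 0.85 and 0.09 in Table 3 (k) is what drives the ranking.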

Case 2: within a certain time period, alarms m and k appear.

Analysis of results: the root alarm that causes alarms m and k to appear is alarm j. This is intuitive: the presence of alarm j makes alarms m and k more likely to be observed.

Alarm data from July 8 are used for real fault tracing. Because different fault probabilities can be obtained for the same alarm, the average, maximum and minimum of the fault probabilities are taken as the final results, as shown in Table 4.

TABLE 4 root alarm failure probability

(a)

	Average	Maximum	Minimum
x=0	0.89	0.9	0.85
x=1	0.11	0.15	0.1

(b)

	Average	Maximum	Minimum
u=0	0.92	0.98	0.34
u=1	0.08	0.66	0.02

(c)

	Average	Maximum	Minimum
j=0	0.97	1	0.07
j=1	0.03	0.93	0

(d)

	Average	Maximum	Minimum
t=0	0.96	0.98	0.25
t=1	0.04	0.75	0.02

As can be seen from table 4, the probability of occurrence of alarm x is relatively stable and relatively high compared to the average probability of occurrence of the other three alarms. The occurrence probability of the alarm j fluctuates greatly, the maximum value can be 0.93, and the minimum value is 0. In practice, due to the sparsity of the same alarm data, the probability of alarm occurrence is small, which is consistent with the result obtained by the present invention. Generally, the present invention takes alarms with an average probability of occurrence greater than 0.1 as the ultimate alarm source. In the results of the present invention, x is considered as the ultimate source of the alarm.
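The selection rule can be written out directly. The average probabilities are the "present" rows of Table 4; the function name is chosen for the example.

```python
# Average probability that each root alarm is present (Table 4, rows x=1, u=1, j=1, t=1).
AVERAGE_FAULT_PROBABILITY = {"x": 0.11, "u": 0.08, "j": 0.03, "t": 0.04}


def select_alarm_sources(average_probability, threshold=0.1):
    """Keep the root alarms whose average fault probability exceeds the threshold."""
    return sorted(alarm for alarm, p in average_probability.items() if p > threshold)


sources = select_alarm_sources(AVERAGE_FAULT_PROBABILITY)
```

With the threshold of 0.1 stated in the text, only x is selected.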

Fourth, processing framework

Communication network alarm tracing comprises two aspects: the algorithms for alarm correlation and fault tracing, and the processing framework in which those algorithms run. Large mobile communication providers need intelligent alarm management platforms. The communication network alarm system must be reasonably combined with the tracing algorithm module, and alarm information must be processed in time by a practical, high-performance processing framework. The effectiveness of the algorithm therefore depends to a great extent on the processing framework, which is an important part of the overall alarm tracing work. As 5G construction accelerates, the sharp increase in the number of base stations causes the volume of alarm information to grow exponentially; processing, managing and tracing alarm data quickly is therefore one of the key technologies for 5G to better serve the public. Faced with massive alarms, the traditional processing framework cannot meet the requirements of the 5G era. The traditional processing mode is a linear pipeline whose steps must be carried out in sequence. In this framework, the generated alarm data must first be stored in a database and a formatted data file must be generated; the data processing and alarm algorithms then read and process the data, and the final result is synchronized to a visual interface. Such a framework is often limited by hard disk read-write rates, database performance, and the performance of the middleware that processes formatted files (e.g., Apache POI).

(1) Processing framework based on caching technology

When a large number of alarms are generated by the alarm center, the time consumed by the algorithm is also an important factor in the real-time performance of the system. For the alarm data processing method provided by the invention, the traditional linear pipeline framework is no longer applicable. With an in-memory database, a data caching mechanism is easy to implement: alarm data are received by the in-memory database while, in another thread, the data are recorded in the local database. When the in-memory database has collected the amount of data required by the algorithm parameters, these data are used as the algorithm input, and the cached data in the in-memory database are emptied after the algorithm finishes. The specific flow is shown in fig. 11.

The performance of the alarm processing framework is determined by the execution rate of the database. If the frequency of data read and store operations is lower than the frequency at which the alarm center sends alarm data, the system cannot process newly arriving alarm data in real time. As the running time of the system increases, the alarm processing results are then likely to be inconsistent with the current state of the system, in which case the whole system loses its purpose. Moreover, the alarm data need to accumulate to a certain scale before the algorithm is executed. It is therefore important to select a suitable database for caching data between the alarm source and the algorithm input. To implement a real-time processing framework, the present invention compares the databases Redis and MySQL. By testing the set and get commands of Redis, the invention finds that the read and write performance of Redis can reach 20,000 row/s and 15,000 row/s respectively, and the average statement execution time is within milliseconds, as shown in FIG. 12.

As shown in FIG. 12(a), for the MySQL database in read-only mode, the TPS of the select statement can reach 6000 row/s. When reading and writing are performed simultaneously, the TPS of the select statement drops sharply to 3500 row/s, as shown in FIG. 12(b). Many systems therefore separate read and write tables to improve the operating efficiency of the MySQL database. However, because of the difference in storage media (disk versus memory), the gain from this optimization is limited. In addition, the average statement execution time of MySQL is far longer than that of Redis. In summary, MySQL performs far worse than Redis in both read-write throughput and average execution time.

Redis runs in a single-process, single-thread mode that converts concurrent access into serial access by means of a queue. This does not hurt performance under parallel access; on the contrary, the design makes Redis faster. In this simple design, Redis has no notion of lock contention. Because Redis mainly operates on in-memory data and locates data by hashing, writes and queries do not hit the hard disk read-write bottleneck; in addition, a single command completes almost instantly, so serialized execution does not cause blocking. The simple design not only improves the performance of Redis but also makes it simpler to implement.

Because alarm data association and fault node tracing are time-consuming tasks, synchronization between running tasks is critical in such a non-pipelined mode. The invention uses a message queue to synchronize the alarm data cache with the algorithm. Specifically, when data in the cache need to be processed by the algorithm module, the running state of the algorithm module is added to the message queue; when the algorithm finishes processing the data, the cache is notified to empty the data that have been used.
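The interplay between cache and algorithm can be sketched with the Python standard library alone. In this sketch an in-memory list stands in for the Redis cache and `queue.Queue` stands in for the message queue; the class `AlarmCache`, the function `algorithm_worker` and the batch size are hypothetical names and values, not part of the described system or of any Redis API.

```python
import queue
import threading


class AlarmCache:
    """In-memory stand-in for the Redis cache (hypothetical helper, not a Redis API)."""

    def __init__(self, batch_size: int, tasks: "queue.Queue"):
        self.batch_size = batch_size
        self.tasks = tasks
        self.buffer = []
        self.busy = False              # True while the algorithm holds a batch
        self.lock = threading.Lock()

    def receive(self, alarm: str) -> None:
        """Cache one alarm; post a run request once enough data has accumulated."""
        with self.lock:
            self.buffer.append(alarm)
            if not self.busy and len(self.buffer) >= self.batch_size:
                self.busy = True
                self.tasks.put(list(self.buffer[: self.batch_size]))

    def release(self, count: int) -> None:
        """Called when the algorithm is done: drop the data it has used."""
        with self.lock:
            del self.buffer[:count]
            self.busy = False


def algorithm_worker(cache: AlarmCache, tasks: "queue.Queue", results: list) -> None:
    """Consume run requests until a None sentinel arrives."""
    while True:
        batch = tasks.get()
        if batch is None:
            tasks.task_done()
            return
        results.append(sorted(set(batch)))   # placeholder for correlation and tracing
        cache.release(len(batch))            # notify the cache before marking the task done
        tasks.task_done()
```

A typical use starts `algorithm_worker` in its own thread, calls `receive` for every incoming alarm, and puts `None` on the queue to stop the worker. Alarms that arrive while a batch is being processed stay in the buffer and are picked up by a later run request.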

(2) Asynchronous processing framework

In a synchronous processing framework, the algorithm module and the Web container interact as follows: the user sends an http/https request to the Web container through the browser, and the controller in the Web container processes the request and invokes the associated algorithm. While the algorithm module is reading data and executing the algorithm, the controller in the Web container is blocked. Since the execution time of the algorithm depends on the data and cannot be guaranteed, an "http request timeout" may occur, causing the whole process to fail.

Therefore, the present invention solves the above problem using asynchronous processing. As shown in fig. 14, when a task request arrives, the request processing thread obtains a new thread from the thread pool and calls the algorithm in it; the controller then ends the request processing thread and returns an "in progress" status to the browser. When the algorithm obtains a result, a callback is invoked and the result is returned to the front-end browser using techniques such as WebSocket or a message queue (MQ). Even if the running time of the algorithm exceeds the average processing time, the processing of subsequent data is not affected, which guarantees the stable operation of the whole system.
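The request flow can be sketched with a standard-library thread pool. This is a framework-independent illustration rather than the Tornado-based implementation: `handle_request`, `run_tracing` and the `pushed_results` list (which stands in for a WebSocket or MQ push to the browser) are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)   # thread pool for long-running tasks
pushed_results = []                            # stand-in for a WebSocket/MQ push


def run_tracing(batch):
    """Placeholder for the time-consuming correlation and tracing algorithm."""
    time.sleep(0.05)
    return {"alarms_processed": len(batch)}


def handle_request(batch):
    """Controller: hand the work to the pool and answer immediately."""
    future = executor.submit(run_tracing, batch)
    # The callback fires when the algorithm finishes and pushes the result.
    future.add_done_callback(lambda f: pushed_results.append(f.result()))
    return {"status": "processing"}
```

`handle_request` returns before `run_tracing` has finished, so a slow run of the algorithm does not block the controller or trigger a request timeout.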

To illustrate the importance of asynchronous processing in the communication network alarm tracing framework, the httpd-tools suite of Apache is used to test two asynchronous processing frameworks, Tornado and Sanic, and two non-asynchronous processing frameworks, Django and Flask. Without loss of generality, the number of requests per second, the time per request and the transmission rate are taken as performance indexes. The tests are run on a PC (Intel E3-1246 v3 CPU, 64-bit, 16 GB RAM), and the test method is shown in FIG. 15.

TABLE 5 requests per second (#)

TABLE 6 Unit request time (ms)

TABLE 7 Transmission Rate (kb/s)

The experimental results are shown in Tables 5 to 7. The asynchronous framework Sanic can handle 2600 requests per second and Tornado has similar performance, whereas Flask, a synchronous framework, reaches only 1400 and Django only 25, which hardly meets practical performance requirements. In the transmission rate results, Tornado performs best of all the frameworks, and as the concurrency level increases, the performance of both asynchronous frameworks levels off. The test results show clearly that the asynchronous frameworks perform well in both the frequent request test and the transmission rate test. For communication network alarm tracing, where requests for alarm information are frequent and the data volume is large, an asynchronous processing framework can effectively improve the execution efficiency of the algorithm and reduce the processing time of the whole tracing process.

(3) Data visualization

The processing results can be visualized with the open-source ECharts framework. ECharts is a free, powerful charting and visualization library that provides a simple way to add intuitive, interactive, and highly customizable charts to a commercial product. It is written in pure JavaScript and is based on zrender, a lightweight canvas library.

By combining the open-source ECharts framework, the Redis database and the Tornado asynchronous processing framework, the invention obtains a communication network alarm tracing system and integrates it into the communication network management application.

As telecommunications networks continue to increase in size and complexity, network management becomes more complex. When a network fails, the NOC system receives thousands of network alarms, and the network operator must react quickly to locate and repair the failure. The invention provides a method for finding the relationships between network alarms based on frequent alarm sequence mining. The resulting relationship graph can help operation and maintenance personnel to locate the alarm source quickly and improves efficiency. In addition, the invention introduces a real-time alarm data analysis platform developed for telecommunication network service providers, which helps operators to improve the design of their alarm management systems. The analysis efficiency can be further improved by using clustering techniques, in which case the structure of the corresponding relationship graph may change. The patterns can also be applied to alarm trend analysis so that precautionary measures can be taken in advance.

It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.

The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method for processing massive alarm data is characterized by comprising the following steps:

preprocessing original alarm data by using a double-sliding-window data division method which is more in line with alarm characteristics; constructing a visual relation graph between network alarms by using the sequence patterns obtained by mining with the PrefixSpan algorithm, representing causal relation between the alarms to a certain extent according to the direction of an edge determined by a timestamp, and taking the obtained relation graph as a Bayesian network;

2. The method for processing mass alarm data according to claim 1, wherein the data association analysis of the method for processing mass alarm data applies a sequential pattern mining method to the network alarm events, and finally obtains the logical relationship of each alarm event in the network, and creates a relationship graph based on the logical relationship.

3. The method for processing mass alarm data according to claim 2, wherein the division of the alarm sequence is realized by using a double sliding time window method;

definition 1: given an alarm sequence $S = (s, T_s, T_e)$, an ascending sequence of alarms occurring in the time interval $[T_s, T_e]$, $S_w = [w, t_s, t_e]$ is a time window of the sequence $S$, where $t_s > T_s$ and $t_e < T_e$; $t_e - t_s$ is the width of the sliding window, denoted as $W$;

dividing the alarm event sequence into an alarm object set database through a sliding time window;

dividing the alarm sequence into a transaction set database and then further dividing an item set, wherein the item set is divided according to the principle that alarm data which are similar to the alarm data generated at the same moment are divided together and a sliding window changing strategy is adopted;

pattern discovery, adopting the PrefixSpan algorithm to mine frequent patterns from the alarm sequence, beginning to mine sequence patterns from prefixes with the length of 1, and searching the corresponding projection database to obtain the corresponding frequent sequences; then recursively mining the frequent sequences corresponding to prefixes with the length of 2, until no longer prefix can be mined;

and (3) constructing a relation graph, converting the obtained sequence mode into the relation graph, wherein the direction is determined by the sequence of the events in the sequence mode, and the sequence depends on the time stamp of the event occurrence.

4. The method for processing mass alarm data according to claim 1, wherein a bayesian network is used as a causal model of a diagnostic process, and observation data collected from the network is used to infer a possible root cause of a fault;

5. The mass alarm data processing method of claim 4, wherein the Bayesian network inference and fault identification of the mass alarm data processing method comprises: the observation values c of a group of evidence variables E are given, the posterior probability distribution of a group of query variables X is calculated, and then a root fault source is diagnosed; in the Bayesian network, the evidence variable refers to a non-root node, the query variable refers to a root node, and the posterior probability is calculated by a Bayesian formula:

$$P(X \mid E = c) = \frac{P(E = c \mid X)\,P(X)}{P(E = c)}$$

the posterior probability is determined by prior probability and conditional probability; and calculating the posterior probability of the root alarm according to the occurrence condition of the alarm data in a certain time period, and taking the root alarm with higher fault probability as a fault source.

6. The method for processing mass alarm data according to claim 1, wherein the processing framework of the method for processing mass alarm data comprises:

(1) a Redis database is adopted on the basis of a cache technology processing framework, the operation relation between alarm data cache and an algorithm is synchronized by utilizing a message queue technology, and when data in the cache needs to operate the algorithm module, the operation state of the algorithm module is increased in a message queue; when the algorithm finishes processing the data, informing the cache to empty the used data;

7. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:

8. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:

9. A mass alarm data processing system for implementing the mass alarm data processing method of any one of claims 1 to 6, wherein the mass alarm data processing system comprises:

the data association analysis module is used for mining association rules of the preprocessed alarm data;

the graph building module is used for converting the mode generated by the alarm log file into a directed acyclic graph;

10. A 5G network terminal, wherein the 5G network terminal is equipped with the mass alarm data processing system of claim 9.