CN113326244B - Abnormality detection method based on log event graph and association relation mining

Info

Publication number: CN113326244B
Application number: CN202110592113.5A
Authority: CN
Inventors: 陈双武; 李江明; 杨坚; 杨锋; 徐正欢; 吴枫
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2021-05-28
Filing date: 2021-05-28
Publication date: 2024-04-02
Anticipated expiration: 2041-05-28
Also published as: CN113326244A

Abstract

The invention relates to an anomaly detection method based on log event graphs and association relation mining. The method collects the original logs of a system to obtain log events; divides the log events into groups according to a set time span or task number, the log events in each group forming a log event sequence in order of generation time; mines, by association relation mining, the system log events associated with each anomaly and removes from the log event sequence the log events irrelevant to the anomaly; extracts a semantic vector of each log event as its feature vector; generates a bidirectional fully connected log event graph from the log event sequence; updates the feature vector of each node using a gated graph neural network; computes a weighted sum of the updated feature vectors of all nodes using an attention network to obtain a global feature vector of the log event graph; and finally performs classification using a fully connected network to determine whether the system is normal or which type of anomaly has occurred.

Description

Abnormality detection method based on log event graph and association relation mining

Technical Field

The invention relates to the technical field of computers, in particular to an anomaly detection method based on log event graphs and association relation mining.

Background

Modern computer systems are often complex, large-scale, distributed, software-intensive systems, such as large cloud service systems or centralized data processing and storage systems. These systems often provide online services to as many as millions of users at the same time, and an anomaly may bring down the system's services and cause significant economic loss. A fast and accurate anomaly detection mechanism is therefore necessary so that system maintainers can quickly find and resolve an anomaly and restore the system to normal as soon as possible. Logs are an important component of a modern system: they are semi-structured text messages that record the system state and various events at run time, and they are one of the most important data sources for anomaly detection.

Initially, system maintenance personnel detected anomalies by manual inspection. However, as the scale and complexity of modern systems continue to increase, the number of system logs grows rapidly, and thousands or even tens of thousands of logs can be generated in a single day, which makes manual inspection no longer viable. Modern log anomaly detection systems therefore often employ automated detection methods. The traditional method is mainly rule-based: anomaly rules are formulated manually by maintenance personnel or learned automatically by a machine, and the rules are then matched against logs to find anomalies. Combined with advanced string matching methods such as regular expressions, quite complex rules can be constructed to match various anomaly patterns. However, rule-based methods have the following problems:

1) Labor cost. Formulating rules for increasingly complex log sets requires a huge amount of labor as well as the participation of domain experts.

2) Matching accuracy. Because rule formulation depends heavily on the experience of the person who writes the rules, rules written by different people often differ greatly, and, given the influence of noise, the rules cannot be perfectly adapted to anomalous situations.

3) System heterogeneity. Modern large systems are often developed by a large number of developers in a distributed fashion and may therefore contain a large number of small components; for example, each developer may be responsible for one component. The log structure and style may vary greatly between components, and it is difficult for a set of rules to cover all of these logs.

4) Instability of logs. As the system is updated or repaired, its log set may change: new logs are added, old logs are discarded, or old logs are modified. Meanwhile, log data often picks up considerable noise during access, transmission and processing, so that the log information changes. The matching rules must therefore adapt to the various changes in the logs to avoid false positives and false negatives, which places a greater burden on maintenance personnel. Because systems are updated very quickly, rules that are not updated in time can seriously degrade anomaly detection performance.

5) An anomaly in a single log does not represent an anomaly of the entire system. Since modern computer systems often have built-in self-checking mechanisms, some temporary, obvious errors can be repaired quickly, so the occurrence of an anomaly in a single log does not mean the entire system is abnormal. Meanwhile, studies of real system logs show that switch anomalies are generally accompanied by certain specific log sequences. Sometimes every individual log is in a normal state, but the occurrence of a particular sequence marks the occurrence of an anomaly. Anomaly detection should therefore be performed on a sequence of logs rather than on a single log. However, rule-matching methods work mainly on single logs and detect a specific anomaly by matching a specific pattern appearing in the log; they cannot capture the association relations among multiple logs and are therefore unsuitable for anomaly detection on log sequences.

Therefore, such rule-based anomaly detection methods are often not applicable to modern large-scale systems.

In recent years, anomaly detection has often been combined with machine learning, and the rise of deep learning in particular has greatly promoted the development of anomaly detection technology. Current mainstream anomaly detection technologies fall into two types: 1) Anomaly detection based on supervised learning. A labeled training log set is used to train a model with a supervised learning algorithm such as a support vector machine or a decision tree, and the model is then used to classify unknown log sets so as to detect anomalies. Deep learning has recently been applied widely to log anomaly detection; it mainly mines the temporal relations within log sequences by constructing a recurrent neural network, such as a long short-term memory (LSTM) network, and extracts more feature information from the log sequences, so that log anomaly detection can rely on relations within log sequences rather than on a single log. 2) Anomaly detection based on unsupervised learning. For example, Invariant Mining (IM) mines invariants of the log set, and any log event that deviates from these invariants is considered an anomalous event. Deep learning methods also often adopt unsupervised learning: a model trained on a normal data set predicts the likely log at the next moment, and the real log is considered abnormal if it deviates from the prediction. These methods can find a small number of hidden anomalous events in massive log information, and they greatly improve accuracy and detection efficiency.

Since most modern systems are distributed, logs may be generated separately in different components and then collected, stored and analyzed centrally. During this process, logs and log sequences are highly susceptible to various changes, whose sources include:

1) Semi-structured nature of logs. Log text must be parsed in a preprocessing step, and the quality of parsing affects log analysis and anomaly detection. A log parser cannot guarantee fully accurate results, and the same parser performs quite differently on log sets with different structures, so parsing inevitably introduces noise.

2) Instability of log sets. The system's log set itself may change as the system is updated and patched, which may affect the accuracy of the detection model. In addition, changed logs may not be parsed correctly by the log parser, which introduces more noise at the parsing stage.

3) Loss of logs. Because modern systems generate a huge number of logs, a small number of errors and losses easily occur during collection, transmission and storage. Log loss often results from high-speed log transmission, where the integrity of the logs cannot be guaranteed; in addition, the storage space of the system is limited, and when space runs out, overflow often occurs and some logs are lost.

4) Duplication of logs. Since modern computer systems often employ various fault-tolerance mechanisms, the same task may be executed multiple times, producing duplicate logs. The same log message may also be recorded repeatedly because of errors during system operation. For anomaly detection methods based on log count vectors, log loss and duplication are fatal.

5) Disorder of log sequences. Besides run-time errors of the system itself, both the buffering mechanism and the concurrency mechanism of modern systems may cause log sequences to be out of order. In addition, as the system is continually upgraded and updated, the execution order of programs may change, which also causes disorder.

6) Overlap of multiple anomalies. In modern systems, because of the complexity of structure and tasks, system components depend on one another to some degree, so different anomalies may occur simultaneously and their logs may overlap, which makes constructing and analyzing log sequences very difficult. The anomaly detection model may mistakenly analyze the logs of several anomalies as a single log sequence, resulting in false or missed detections. For LSTM-based deep learning models, loss, repetition, disorder and multi-anomaly overlap in log sequences can seriously affect feature propagation in the model, breaking the propagation chain or building an incorrect one, and thereby reducing detection capability.

Most current anomaly detection models do not completely solve the above problems, which makes them difficult to use effectively in practice. In system anomaly detection, logs, which record the state of the running system and various important events, are often one of the primary data sources. Since the volume of system log data is often very large, automatic anomaly detection methods are needed, and these are usually built with machine learning. Meanwhile, log data is unstable to a considerable degree and changes easily during collection, storage and analysis, for example: 1) the semi-structured nature of logs makes parsing necessary, and noise may be introduced during parsing; 2) log data is unstable, that is, log statements may be modified, added or removed during a system upgrade, so that the log data changes; 3) logs are easily lost during collection, transmission and storage; 4) logs may be recorded repeatedly during system operation; 5) the order of the log sequence may be disturbed while the system is running, resulting in sequence disorder; 6) multiple anomalies may overlap, that is, the logs of several components may be recorded interleaved with one another. These changes in the log set may severely degrade the anomaly detection capability of the detection model.

In order to extract features, conventional methods such as SVM and decision trees only count the occurrence frequency of each log event in a log event sequence and cannot solve the problems of log loss, repetition and multi-anomaly overlap. Moreover, because log sets are unstable, traditional machine learning methods have difficulty identifying anomalies accurately and produce a large number of false alarms, which increases the burden on operation and maintenance personnel. Deep learning-based methods such as DeepLog and LogRobust often employ long short-term memory (LSTM) models and use the sequential patterns of log event sequences to detect anomalies; disorder, repetition, loss and multi-anomaly overlap in log sequences still severely interfere with the effectiveness of such detection models.

SVM: Y. Liang, Y. Zhang, H. Xiong, and R. Sahoo, “Failure prediction in IBM BlueGene/L event logs,” in Seventh IEEE International Conference on Data Mining (ICDM 2007). IEEE, 2007, pp. 583–588.

Decision Trees: M. Chen, A. X. Zheng, J. Lloyd, M. I. Jordan, and E. Brewer, “Failure diagnosis using decision trees,” in International Conference on Autonomic Computing, 2004. Proceedings. IEEE, 2004, pp. 36–43.

DeepLog: M. Du, F. Li, G. Zheng, and V. Srikumar, “DeepLog: Anomaly detection and diagnosis from system logs through deep learning,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 1285–1298.

LogRobust: X. Zhang, Y. Xu, Q. Lin, B. Qiao, H. Zhang, Y. Dang, C. Xie, X. Yang, Q. Cheng, Z. Li et al., “Robust log-based anomaly detection on unstable log data,” in Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019, pp. 807–817.

Disclosure of Invention

The technical problem solved by the invention is as follows: to overcome the shortcomings of the prior art, the method ignores the input order of logs and takes the frequency of log events into account, thereby effectively handling disorder, repetition, loss and multi-anomaly overlap in log sequences. It copes well with the loss of a small number of logs as long as the key log information is retained, and it is highly robust to unstable log information and to multi-anomaly overlap, while making effective use of log information to identify system anomalies accurately.

The technical solution of the invention is an anomaly detection method based on log event graphs and association relation mining, whose main purpose is to use the log event graph to perform anomaly detection on unstable log sequences with overlapping anomalies. First, log events associated with different anomalies are mined from the log sequences at the anomaly points, and logs irrelevant to the anomalies are filtered out, which reduces the number of logs to be examined, reduces the workload and improves efficiency. Then, the log sequence is represented as a complete graph and modeled with a graph neural network. The features of each node can propagate along the edges to all other nodes in the graph, so the input order of the nodes can be ignored and the loss or repetition of logs has little effect on the model; combined with the attention network, the method can cope with the instability of logs and log sequences.

The invention discloses an anomaly detection method based on log event graphs and association relation mining, which specifically comprises the following steps:

step 1, collecting the original logs from a computer system and deleting the machine-language part of the log content to obtain log events; dividing the log events into groups according to a set time span or task number, the log events in each group forming a log event sequence in order of generation time;

step 2, mining, by association relation mining, the system log events associated with each anomaly, and removing from the log event sequence the log events irrelevant to the anomaly;

step 3, extracting a semantic vector of each log event as the feature vector of that log event by a natural language processing method, the semantic vector serving as the input of the gated graph neural network in step 5;

step 4, generating a bidirectional fully connected log event graph from the log event sequence, that is, each log event is a node in the graph and every two nodes are connected;

step 5, updating the feature vector of each node using the gated graph neural network, then computing a weighted sum of the updated feature vectors of all nodes using an attention network to obtain the global feature vector of the log event graph, and finally performing classification using a fully connected network, thereby determining whether the system is normal or which type of anomaly has occurred.

In step 1, for each log message, a Drain log parser is first used to extract a log template, which is also called a log event; log messages with the same template are treated as the same log event.

In step 2, association relation mining is carried out using the FP-tree algorithm.

In step 4, the bidirectional fully connected log event graph is generated from the log event sequence as follows:

The log events are used as nodes in the graph, and each log event sequence $S_2$ is modeled as a log event graph $G_s = (V_s, L_s)$, where $V_s$ is the set of log event nodes in the graph and $L_s$ is the set of edges, that is, the set of connection relationships between nodes $(v_{s,i-1}, v_{s,i})$. Since a log event may occur repeatedly in the log event sequence while each node in the graph is unique and non-repeating, the frequency characteristic of the log events is added to the edges: each edge is assigned a normalized weight, and the weight of edge $L(v_{s,i-1}, v_{s,i})$ is calculated as the frequency of node $v_{s,i}$ divided by the frequency of node $v_{s,i-1}$. The weights of all edges form the adjacency matrix $A_s$ of the log event graph.

In step 5, the feature vector of each node is iteratively updated by the gated graph neural network until convergence to obtain the updated node feature vectors; the update function is as follows:

where $v_{s,i}^{t-1}$ and $v_{s,i}^{t}$ denote the feature vector of node $v_{s,i}$ at time $t-1$ and time $t$, respectively, together with an intermediate vector computed for node $v_{s,i}$ at time $t$; the vector dimension is $d$; $i = 1, \ldots, n$, where $n$ is the number of nodes; $A_{s,i}$ denotes the two columns of the adjacency matrix $A_s$ of the log event graph that correspond to node $v_{s,i}$, covering its outgoing and incoming edges, with $A_s \in \mathbb{R}^{n \times 2n}$; $H \in \mathbb{R}^{d \times 2d}$ controls the weights; $z_{s,i}$ and $r_{s,i}$ denote the reset gate and the update gate of the gated graph neural network, respectively; and all subscripted $W$ and $U$ in the formula are weight parameters of the neural network.
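For reference, a commonly used form of the gated graph neural network update that matches the symbols defined above can be sketched as follows. This is an assumption based on the standard gated formulation, not necessarily the exact formula of the invention: the bias $b$, the sigmoid $\sigma$, the element-wise product $\odot$, the aggregated vector $a_{s,i}^{t}$, the candidate vector $\tilde{v}_{s,i}^{t}$ and the subscripts $z$, $r$, $o$ on $W$ and $U$ are all introduced here for illustration, and the product with $H$ is understood block-wise over the outgoing and incoming halves of $A_{s,i}$.

```latex
\begin{aligned}
a_{s,i}^{t} &= A_{s,i}\,\bigl[v_{s,1}^{t-1}, \ldots, v_{s,n}^{t-1}\bigr]^{\top} H + b, \\
z_{s,i}^{t} &= \sigma\bigl(W_z\, a_{s,i}^{t} + U_z\, v_{s,i}^{t-1}\bigr), \\
r_{s,i}^{t} &= \sigma\bigl(W_r\, a_{s,i}^{t} + U_r\, v_{s,i}^{t-1}\bigr), \\
\tilde{v}_{s,i}^{t} &= \tanh\bigl(W_o\, a_{s,i}^{t} + U_o\,(r_{s,i}^{t} \odot v_{s,i}^{t-1})\bigr), \\
v_{s,i}^{t} &= \bigl(1 - z_{s,i}^{t}\bigr) \odot v_{s,i}^{t-1} + z_{s,i}^{t} \odot \tilde{v}_{s,i}^{t}.
\end{aligned}
```

In this form, the first line aggregates the features of neighboring nodes through the weighted edges, the two gates decide how much of the previous state is kept, and the last line mixes the previous state with the candidate state.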

In step 5, classification by the fully connected network to determine whether the system is normal or which type of anomaly has occurred is specifically implemented as follows:

An attention network is added after the output nodes of the graph neural network. It takes the output vectors of the graph neural network as input and outputs a weight vector of the same size as its input. The weight vector is then multiplied with the output vectors of the graph neural network (dot product) to obtain the global feature vector, that is, the feature representation of normal and abnormal log event sequences. The attention network computes the final feature vector by the following formula:

where $\alpha_i$ is the attention weight of log event $E_i$, $s$ is the global feature vector computed from the output of the graph neural network, and $q^{T}$, $W_1$ and $c$ are neural network parameters. Finally, the system state is classified using the fully connected network.
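As an illustration only, one attention readout that is consistent with the parameters named above ($q^{T}$, $W_1$, $c$) is sketched below. The symbol $v_{s,i}$ for the graph neural network output of node $i$ and the use of a sigmoid $\sigma$ are assumptions made here, not statements of the exact formula of the invention.

```latex
\begin{aligned}
\alpha_i &= q^{T}\, \sigma\bigl(W_1\, v_{s,i} + c\bigr), \\
s &= \sum_{i=1}^{n} \alpha_i\, v_{s,i}.
\end{aligned}
```

Because $s$ is a sum over all nodes, it does not depend on the order in which the nodes are listed.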

Compared with the prior art, the invention has the advantages that:

(1) Modern computer systems are often complex, large-scale, distributed, software-intensive systems that provide online services to as many as millions of users at the same time, and an anomaly may bring down the system's services and cause significant economic loss. The invention provides a system anomaly detection method and device based on log event graphs and association relation mining, which takes the logs generated during system operation as input and performs anomaly detection with a graph neural network model, so that log loss, repetition, disorder and multi-anomaly overlap can be handled effectively. Compared with traditional methods, it achieves higher accuracy and robustness.

(2) Addressing the instability of log sets and log sequence sets and the problem of multi-anomaly overlap, the invention provides a method that solves these problems using semantic vectors and a graph neural network. In addition, the invention greatly reduces the number of logs to be examined through associated log event mining, and further enhances the robustness and anomaly detection capability of the model through the attention network and the graph neural network.

(3) The log anomaly detection method provided by the invention not only detects abnormal logs and abnormal log sequences with high performance, but also handles the instability of logs and log sequence sets, such as log loss, log repetition, log sequence disorder and multi-anomaly overlap. In particular, it completely solves the problems of log disorder and repetition, and therefore has high practical value.

Drawings

FIG. 1 is a flow chart of a method implementation of the present invention;

FIG. 2 is a schematic diagram illustrating the extraction of a log event template according to the present invention;

FIG. 3 is a schematic diagram of a log event diagram according to the present invention;

FIG. 4 is a structural diagram of the graph neural network in the present invention.

Detailed Description

The present invention will be described in detail with reference to the accompanying drawings and examples.

In software-intensive systems, to record the operating state of a computer system, programmers often add output statements to the code to generate system logs. A system log is a semi-structured text message that contains both natural language written by the programmer and machine language recording various states of the system. For example, for the system log "Connection from 10.10.34.12 closed", the natural-language part is "Connection from closed" and "10.10.34.12" is the machine-language part.

As shown in FIG. 1, a flowchart of an anomaly detection method based on log event graphs and association relation mining according to an embodiment of the present invention is provided. The method mainly includes:

Step 1, collecting system logs from a computer system and deleting the machine-language part of the log content to obtain log events. The log events are divided into groups according to a manually set time span or task number, and the log events in each group form a log event sequence in order of generation time.

Step 2, mining, by association relation mining, the system log events associated with each anomaly, and removing from the log event sequence the log events irrelevant to the anomaly.

Step 3, extracting a semantic vector of each log event as the feature vector of that log event by a natural language processing method.

Step 4, generating a bidirectional fully connected log event graph from the log event sequence, that is, each log event is a node in the graph and every two nodes are connected.

Step 5, updating the feature vector of each node using a gated graph neural network, computing the global feature vector of the log event graph through an attention network, and finally performing classification through a fully connected network.

The classification determines whether the system is normal or which specific type of anomaly has occurred.

For ease of understanding, the following detailed description of the various steps of the invention is provided.

1. System log collection and log event extraction

A log collection and storage system is deployed in the computer system; the logs generated by the system are collected in real time, transmitted and stored on the system's hard disk, and serve as the data source for system anomaly detection.

The natural-language part of a log is fixed, whereas the machine-language part can change at any time, which can interfere with the accuracy of anomaly detection. Therefore, the log is parsed, the machine-language part is deleted, and a log event is extracted from each system log.

Many log parsing methods have been proposed for the log parsing task. After comparative analysis, the invention adopts the Drain method. Drain is a log parsing method based on a directed acyclic graph and is widely used in various log processing models owing to its high accuracy, low variance and high efficiency. FIG. 2 illustrates a log event obtained after parsing with Drain: the original log “PacketResponder 1 for block blk_38865049064139660 terminating” is extracted as the log event “PacketResponder for block blk_ terminating”.
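The idea of deleting the machine-language part can be illustrated with a minimal sketch. It is a simplified regular-expression stand-in and not the Drain algorithm; the function name `to_log_event` and the masking patterns are hypothetical choices made for this example, and unlike the template shown above it removes the whole block identifier, including the `blk_` prefix.

```python
import re

# Hypothetical masking rules: each pattern marks a variable ("machine language") field.
# This is a simplified stand-in for a real log parser such as Drain, not Drain itself.
VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),  # IPv4 addresses
    re.compile(r"\bblk_-?\d+\b"),                 # HDFS-style block identifiers
    re.compile(r"\b\d+\b"),                       # bare integers
]


def to_log_event(message: str) -> str:
    """Delete variable fields and collapse whitespace to get a template-like log event."""
    for pattern in VARIABLE_PATTERNS:
        message = pattern.sub(" ", message)
    return " ".join(message.split())


if __name__ == "__main__":
    print(to_log_event("Connection from 10.10.34.12 closed"))
```

Two log messages that differ only in their variable fields map to the same string and are therefore treated as the same log event.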

Some system logs contain task numbers, and the system log events can be divided into groups according to these task numbers. For system logs without task numbers, the log events can be divided into groups according to a set time window. The log events in each group are arranged in chronological order to obtain a log event sequence $S_1 = \{E_1, E_2, \ldots, E_m\}$, where $E_i$ denotes a log event.
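The two grouping strategies described above can be sketched as follows. The record layout (timestamp in seconds, optional task number, log event) and the function names are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Assumed record layout: (timestamp in seconds, task number or None, log event).
Record = Tuple[float, Optional[str], str]


def group_by_task(records: List[Record]) -> Dict[str, List[str]]:
    """Group log events by task number; each group is ordered by generation time."""
    groups: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for timestamp, task, event in records:
        if task is not None:
            groups[task].append((timestamp, event))
    return {
        task: [event for _, event in sorted(items, key=lambda item: item[0])]
        for task, items in groups.items()
    }


def group_by_window(records: List[Record], window: float) -> Dict[int, List[str]]:
    """Group log events into consecutive time windows of `window` seconds."""
    groups: Dict[int, List[Tuple[float, str]]] = defaultdict(list)
    for timestamp, _, event in records:
        groups[int(timestamp // window)].append((timestamp, event))
    return {
        index: [event for _, event in sorted(items, key=lambda item: item[0])]
        for index, items in groups.items()
    }
```

Each returned list plays the role of one log event sequence $S_1$.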

2. Associated log event mining

Each anomaly of the system has some log events that are associated with it. To eliminate the influence of irrelevant log events on anomaly detection, and to separate the log event chains of different anomalies that occur simultaneously, an association relation mining method is used to mine the log events associated with each anomaly.

To perform associated log event mining, the invention adopts the FP-growth algorithm. FP-growth, also called the FP-tree algorithm, is an improvement on the classical Apriori algorithm; it builds data structures such as a header table and an FP-tree and exploits the tree structure to search for associated log events efficiently. FP-growth computes the frequency of each log event in the abnormal log event sequences of each anomaly as the support of that log event being an associated log event of the anomaly. The support of each log event is calculated as follows:

Whether a log event is an associated log event is judged against a threshold: when the support exceeds the threshold of 0.2, the log event is considered an associated log event of the anomaly.

After associated log event mining, the log event sequence $S_1$ is reduced to a more compact log sequence $S_2 = \{E_1, E_2, \ldots, E_n\}$, where $E_i$ is a log event, $m$ is the length of the log event sequence before mining, $n$ is the length after mining, and $n \le m$.
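The filtering step can be illustrated with a minimal sketch. It assumes that the support of a log event is the fraction of abnormal sequences of one anomaly type that contain the event; this definition, the function names and the single-event scope are assumptions for illustration, and the sketch does not build an FP-tree.

```python
from typing import Dict, List, Set


def event_support(abnormal_sequences: List[List[str]]) -> Dict[str, float]:
    """Assumed support: fraction of abnormal sequences that contain each log event."""
    total = len(abnormal_sequences)
    counts: Dict[str, int] = {}
    for sequence in abnormal_sequences:
        for event in set(sequence):  # count each event once per sequence
            counts[event] = counts.get(event, 0) + 1
    return {event: count / total for event, count in counts.items()}


def associated_events(abnormal_sequences: List[List[str]], threshold: float = 0.2) -> Set[str]:
    """Keep the events whose support exceeds the threshold (0.2 in the description)."""
    support = event_support(abnormal_sequences)
    return {event for event, value in support.items() if value > threshold}


def filter_sequence(sequence: List[str], keep: Set[str]) -> List[str]:
    """Remove events unrelated to the anomaly, preserving the original order."""
    return [event for event in sequence if event in keep]
```

Applying `filter_sequence` to a sequence $S_1$ with the mined event set yields the shorter sequence $S_2$.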

3. Log event semantic vectorization

The method performs semantic vectorization on each log event, extracting the latent semantic feature information of the event and converting the log event into a fixed-length semantic vector, which facilitates the subsequent detection steps. The semantic vectorization of the present invention comprises the following three steps:

Word segmentation. First, all words in the log event are screened: non-character tokens (such as numerals and operators) and stop words with little meaning (such as 'a' and 'the') are removed, and the remaining words are separated into word segments. After segmentation, a log event is represented as a list of individual word segments.

Word segment vectorization. The invention uses the Word2vec tool to convert each word segment into a word vector. Word2vec is a family of models for generating word vectors; these models can be trained efficiently on a large corpus using a shallow neural network, and after training each word can be mapped to a fixed-length word vector. After vectorization, a log event is represented as a list of word vectors.

Semantic vectorization. The invention uses the TF-IDF algorithm to compute a weighted sum of the word vectors obtained above, yielding the semantic vector of the log event. TF-IDF effectively measures the relative importance of each word in a log event, so that different words receive different weights. If a word occurs with high frequency in a log event template, the word is likely to be representative of that event; term frequency (TF) measures this importance. However, if a word occurs very frequently across all events, it does not distinguish effectively between different events; inverse document frequency (IDF) accounts for this. Finally, the TF-IDF weight of each word vector can be expressed as: weight = term frequency (TF) × inverse document frequency (IDF). All word vectors in the log event are weighted and summed to obtain the semantic vector of the event. The semantic vector effectively captures the latent semantic features of the event, and the semantic vector of an event does not change much when the log event undergoes small changes or is affected by noise, so the instability of logs can be handled effectively. After semantic vectorization of $S_2$, each log event sequence can be represented by a sequence of fixed-length semantic vectors $S_3 = \{h_1, h_2, \ldots, h_n\}$, where $h_i$ is the semantic vector of log event $E_i$.
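A minimal sketch of the weighted sum is given below. It assumes the textbook definitions TF = (occurrences of the word in the event) / (number of words in the event) and IDF = log(number of events / number of events containing the word); these definitions, the function names and the toy word vectors are assumptions for this example. In practice the word vectors would come from a trained Word2vec model.

```python
import math
from typing import Dict, List


def tf_idf_weights(event_tokens: List[str], all_events: List[List[str]]) -> Dict[str, float]:
    """TF-IDF weight per distinct word; assumes the event itself is part of `all_events`."""
    weights: Dict[str, float] = {}
    for word in set(event_tokens):
        tf = event_tokens.count(word) / len(event_tokens)
        df = sum(1 for tokens in all_events if word in tokens)
        idf = math.log(len(all_events) / df)  # assumed (unsmoothed) IDF definition
        weights[word] = tf * idf
    return weights


def semantic_vector(
    event_tokens: List[str],
    all_events: List[List[str]],
    word_vectors: Dict[str, List[float]],
) -> List[float]:
    """Weighted sum of the word vectors of one log event."""
    weights = tf_idf_weights(event_tokens, all_events)
    dim = len(next(iter(word_vectors.values())))
    vector = [0.0] * dim
    for word, weight in weights.items():
        for k, component in enumerate(word_vectors[word]):
            vector[k] += weight * component
    return vector
```

With this definition, a word that appears in every event receives weight zero and contributes nothing to the semantic vector, which matches the reasoning above about words that do not distinguish between events.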

4. Building log event graphs

The invention constructs a bidirectional fully connected log event graph from the log events in a log event sequence, with the log events as the nodes of the graph. As shown in FIG. 3, each log event sequence $S_2$ can be modeled as a log event graph $G_s = (V_s, L_s)$, where $V_s$ is the set of log event nodes in the graph and $L_s$ is the set of edges, that is, the set of connection relationships between nodes $(v_{s,i-1}, v_{s,i})$. Since a log event may be repeated in the log event sequence while the nodes in the graph are not repeated, the frequency characteristic of the log events is added to the edges: each edge is assigned a normalized weight, and the weight of edge $L(v_{s,i-1}, v_{s,i})$ is calculated as the frequency of node $v_{s,i}$ divided by the frequency of node $v_{s,i-1}$. The weights of all edges form the adjacency matrix $A_s$ of the log event graph, as shown in FIG. 3.
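The graph construction can be sketched as follows. The raw weight of the edge from one event to another is the frequency ratio described above; the row normalization, the sorted node order and the layout of the result as an n × 2n matrix (outgoing half followed by incoming half) are assumptions made for this example.

```python
from collections import Counter
from typing import List, Tuple


def normalize_rows(matrix: List[List[float]]) -> List[List[float]]:
    """Scale each row to sum to 1 (assumed normalization); all-zero rows stay zero."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([value / total if total else 0.0 for value in row])
    return result


def build_event_graph(sequence: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Return the node list and an n x 2n adjacency matrix [outgoing | incoming]."""
    counts = Counter(sequence)
    nodes = sorted(counts)  # one node per distinct event, independent of input order
    n = len(nodes)
    # Raw weight of edge (u -> v): frequency of v divided by frequency of u.
    raw = [[counts[v] / counts[u] if u != v else 0.0 for v in nodes] for u in nodes]
    a_out = normalize_rows(raw)
    a_in = normalize_rows([[raw[j][i] for j in range(n)] for i in range(n)])
    adjacency = [a_out[i] + a_in[i] for i in range(n)]
    return nodes, adjacency
```

Because the graph depends only on which events occur and how often, reordering the input sequence leaves the nodes and the adjacency matrix unchanged.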

5. Neural network-based anomaly classification

The invention uses a graph neural network to mine the contextual relations among log event sequences, thereby achieving more accurate anomaly detection. Compared with other machine learning methods, in particular the deep learning methods based on long short-term memory (LSTM) neural networks that have become popular in recent years, the graph neural network has the following advantages:

(1) The input order of the nodes can be ignored. An LSTM-based anomaly detection method can learn the associations between nodes well, but it depends heavily on the order of the nodes. Because log event sequences are unstable, problems such as duplication, loss, and misordering can occur, which makes an LSTM prone to learning entirely different patterns and degrades detection accuracy. By representing the log event sequence as a bidirectional log event graph, the graph-based method can ignore the input order of the nodes, greatly reducing the influence of ordering on the model and completely resolving the problem of disordered sequences. Since the input order of the log event sequence has little actual effect on anomaly detection, the method can cope with the various instabilities of the sequence without sacrificing detection accuracy. This is the main reason the invention uses a graph neural network.

(2) Semantic information of all other log events can be learned. Because a graph has stronger structural connectivity, the complex associations among log events can be learned, so a model based on a graph neural network can learn the complex interaction patterns among log event sequences, which improves the accuracy of anomaly detection.

(3) The computational efficiency is higher than that of LSTM. Thanks to the maturity and depth of graph research, the computation of graph neural networks has been highly optimized, so their computational efficiency is far higher than that of deep learning networks such as LSTM. This is necessary for fast training and updating of the model.

In the embodiment of the invention, the feature vector of each node is iteratively updated by a gated graph neural network until convergence to obtain the updated node feature vector; the update function is as follows:

where the equations involve the feature vectors of node v_{s,i} at time t-1 and at time t, and the candidate feature vector of node v_{s,i} at time t; the vector dimension is d, i = 1, …, n, and n is the number of nodes; A_{s,i} denotes the two columns of the adjacency matrix A_s of the log event graph that correspond to node v_{s,i} (covering the outgoing edges and the incoming edges), with A_s ∈ R^{n×2n}; H ∈ R^{d×2d} controls the weight; and z_{s,i} and r_{s,i} denote the reset gate and the update gate of the gated graph neural network, respectively.
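
For reference, the standard gated graph neural network update fits the symbol definitions given here. The following is a sketch under assumptions, not a transcription of the patent's formulas: the symbols h, \tilde{h}, a, b and the subscripted W and U are assumed names, σ is the sigmoid function, ⊙ is element-wise multiplication, and the gate roles follow the text (r as update gate, z as reset gate).

```latex
\begin{aligned}
a_{s,i}^{t} &= A_{s,i}\,\bigl[h_{1}^{t-1},\dots,h_{n}^{t-1}\bigr]^{\top} H + b \\
r_{s,i}^{t} &= \sigma\bigl(W_{r}\,a_{s,i}^{t} + U_{r}\,h_{i}^{t-1}\bigr) \\
z_{s,i}^{t} &= \sigma\bigl(W_{z}\,a_{s,i}^{t} + U_{z}\,h_{i}^{t-1}\bigr) \\
\tilde{h}_{i}^{t} &= \tanh\bigl(W_{o}\,a_{s,i}^{t} + U_{o}\,(z_{s,i}^{t} \odot h_{i}^{t-1})\bigr) \\
h_{i}^{t} &= \bigl(1 - r_{s,i}^{t}\bigr) \odot h_{i}^{t-1} + r_{s,i}^{t} \odot \tilde{h}_{i}^{t}
\end{aligned}
```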

The first equation is used for information propagation between different nodes under the constraints given by the adjacency matrix; specifically, it extracts the feature vectors of the neighborhood and feeds them into the graph neural network. The second and third equations (the update gate and reset gate equations) determine which information to retain and which to discard, respectively. In the fourth equation, the candidate feature vector is constructed from the node vector at the previous time, the current state, and the reset gate. In the fifth equation, under the control of the update gate, the node feature vector of the previous time and the candidate feature vector are combined to obtain the node feature vector of the current time. All nodes repeat this update process; when it converges, the resulting feature vector of node v_{s,i} is taken as the node's output, where T denotes the time of convergence. Fig. 4 illustrates the operation flow of the gated graph neural network; for convenience of description, the subscript "s" is omitted in the figure.
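
The five steps described above can be sketched in plain Python. This is an illustrative sketch only: the parameter names (`W_u`, `U_u`, `W_r`, `U_r`, `W_c`, `U_c`), the convergence tolerance, and the step limit are assumptions, and the projection H and the bias of the first step are omitted for brevity.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ggnn_step(h, adjacency, params):
    """Apply one gated update to every node vector.

    h: n node vectors of length d; adjacency: n rows of length 2n,
    [outgoing | incoming]; W_* are d x 2d and U_* are d x d.
    """
    n, d = len(h), len(h[0])
    new_h = []
    for i in range(n):
        out_w, in_w = adjacency[i][:n], adjacency[i][n:]
        # Step 1: gather neighbor features along both edge directions (length 2d).
        a = [sum(out_w[j] * h[j][k] for j in range(n)) for k in range(d)]
        a += [sum(in_w[j] * h[j][k] for j in range(n)) for k in range(d)]
        # Steps 2 and 3: update gate and reset gate.
        update = [sigmoid(x + y) for x, y in
                  zip(matvec(params["W_u"], a), matvec(params["U_u"], h[i]))]
        reset = [sigmoid(x + y) for x, y in
                 zip(matvec(params["W_r"], a), matvec(params["U_r"], h[i]))]
        # Step 4: candidate vector from the aggregate and the reset-gated previous state.
        gated = [r * x for r, x in zip(reset, h[i])]
        candidate = [math.tanh(x + y) for x, y in
                     zip(matvec(params["W_c"], a), matvec(params["U_c"], gated))]
        # Step 5: blend previous state and candidate under the update gate.
        new_h.append([(1 - u) * x + u * c
                      for u, x, c in zip(update, h[i], candidate)])
    return new_h

def run_until_converged(h, adjacency, params, tol=1e-6, max_steps=100):
    """Repeat the update until the largest change falls below tol (assumed criterion)."""
    for _ in range(max_steps):
        new_h = ggnn_step(h, adjacency, params)
        delta = max(abs(x - y) for new_row, old_row in zip(new_h, h)
                    for x, y in zip(new_row, old_row))
        h = new_h
        if delta < tol:
            break
    return h
```

Keeping the outgoing and incoming halves of each adjacency row separate lets a node weigh what it sends to and what it receives from its neighbors differently, which is the role of the two column blocks of A_s.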

The attention network supplements the graph neural network to further increase the accuracy of the detection model and improve its robustness. Different log events in a log event sequence affect anomaly detection to different degrees; by building an attention network that assigns a weight to each event, the model pays more attention to the important nodes. Meanwhile, because unimportant nodes are assigned smaller weights, their presence has less influence on the overall detection, so the loss or repetition of unimportant nodes has less effect on detection performance. This further reduces the influence of sequence instability and improves the robustness of the model.

Specifically, a fully connected network layer is added after the output nodes of the graph neural network. Its input is the output vector of the graph neural network, and the fully connected layer outputs a vector of the same size as the input, which serves as the weight vector. Finally, the weight vector is dot-multiplied with the output vector of the graph neural network to obtain the global feature vector, i.e. the semantic representation of a normal or abnormal log event sequence. The formula by which the attention network calculates the final feature vector is:

where α_i is the weight of log event E_i, s is the global feature vector, the converged node feature vectors are the output of the graph neural network, and q^T, W_1, and c are neural network parameters.
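
The attention pooling can be sketched as follows. The functional form is an assumption chosen to use only the parameters named above: each weight is taken as α_i = q^T σ(W_1 h_i + c), with σ the sigmoid function, and the global vector as s = Σ α_i h_i. The function name `attention_pool` and the toy values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attention_pool(node_vectors, q, w1, c):
    """Return (weights, s) for the node vectors output by the graph network.

    Assumed form: alpha_i = q . sigmoid(W1 h_i + c), s = sum_i alpha_i * h_i.
    """
    d = len(node_vectors[0])
    weights = []
    for h in node_vectors:
        hidden = [sigmoid(sum(w * x for w, x in zip(row, h)) + bias)
                  for row, bias in zip(w1, c)]
        weights.append(sum(qk * hk for qk, hk in zip(q, hidden)))
    s = [sum(alpha * h[k] for alpha, h in zip(weights, node_vectors))
         for k in range(d)]
    return weights, s

# Toy example: two nodes with two-dimensional feature vectors.
node_vectors = [[1.0, 2.0], [3.0, 4.0]]
weights, s = attention_pool(
    node_vectors,
    q=[1.0, 1.0],
    w1=[[0.0, 0.0], [0.0, 0.0]],
    c=[0.0, 0.0],
)
```

With learned parameters, nodes that matter little for detection receive small α_i, so dropping or repeating them changes s only slightly.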

At the end of the model, the invention uses a fully connected network to classify the system state. The invention can therefore not only detect whether the system is abnormal but also determine the type of the anomaly, which facilitates subsequent anomaly localization.

Claims

1. The anomaly detection method based on log event graph and association relation mining is characterized by comprising the following steps:

step 1, collecting an original log of a system to obtain a log event; dividing the log events into different groups according to a set time span or task number, wherein the log events in each group form a log event sequence according to the generated time;

step 2, mining, by association relation mining, the log events associated with each anomaly, and removing from the log event sequence the log events irrelevant to the anomaly;

step 3, extracting a semantic vector of each log event as the feature vector of the log event, the semantic vector serving as input to the gated graph neural network in step 5;

step 4, generating a bidirectional fully connected log event graph according to the log event sequence;

step 5, updating the feature vector of each node by using a gated graph neural network, then performing a weighted summation of the updated feature vectors of all nodes by using an attention network to calculate the global feature vector of the log event graph, and finally performing classification detection by using a fully connected network to obtain the normal state or anomaly type of the system;

in the step 5, the feature vector of each node is iteratively updated by using the gated graph neural network until convergence to obtain the updated node feature vector, wherein the update function is as follows:

wherein the equations involve the feature vectors of node v_{s,i} at time t-1 and at time t and the candidate feature vector of node v_{s,i} at time t; the vector dimension is d, i = 1, …, n, and n is the number of nodes; A_{s,i} denotes the two columns of the adjacency matrix A_s of the log event graph corresponding to node v_{s,i}, comprising an outgoing edge and an incoming edge, A_s ∈ R^{n×2n}; H ∈ R^{d×2d} controls the weight; z_{s,i} and r_{s,i} respectively denote a reset gate and an update gate in the gated graph neural network; and in the formulas, all subscripted W and U are weight parameters of the neural network.

In the step 5, classification detection is performed through the fully connected network to obtain the normal state or anomaly type of the system, specifically as follows:

wherein α_i is the weight of log event E_i, s is the global feature vector, the converged node feature vectors are the output of the graph neural network, and q^T, W_1, and c are neural network parameters;

and finally, classifying the system state by using the fully connected network.

2. The anomaly detection method based on log event graph and association relation mining according to claim 1, wherein: in the step 1, for each piece of log information in the log, a Drain log parser is first used to extract a template to obtain a log template, which is also called a log event, and the same log template is used as the same log event.

3. The anomaly detection method based on log event graph and association relation mining according to claim 1, wherein: in the step 2, the association relation mining is carried out by adopting an FP-tree algorithm.

4. The anomaly detection method based on log event graph and association relation mining according to claim 1, wherein: in the step 4, the generation of the bidirectional fully connected log event graph according to the log event sequence is specifically implemented as follows:

the log events are used as nodes in the graph, and each log event sequence S_2 is modeled as a log event graph G_s = (V_s, L_s), wherein V_s represents the set of log event nodes in the graph and L_s is the set of edges, i.e. the set of connection relationships between nodes (v_{s,i-1}, v_{s,i}); the frequency characteristic of the log events is added on the edges, a normalized weight is assigned to each edge, and the weight of edge L(v_{s,i-1}, v_{s,i}) is computed as the frequency of node v_{s,i} normalized with respect to node v_{s,i-1}; the weights of all edges form the adjacency matrix A_s of the log event graph.