CN113778733B - Log sequence anomaly detection method based on multi-scale MASS - Google Patents

Log sequence anomaly detection method based on multi-scale MASS

Info

Publication number
CN113778733B
CN113778733B
Authority
CN
China
Prior art keywords
log
sequence
scale
head
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111014950.6A
Other languages
Chinese (zh)
Other versions
CN113778733A (en)
Inventor
曹志英
王雪洁
张秀国
王乔正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Maritime University filed Critical Dalian Maritime University
Priority to CN202111014950.6A priority Critical patent/CN113778733B/en
Publication of CN113778733A publication Critical patent/CN113778733A/en
Application granted granted Critical
Publication of CN113778733B publication Critical patent/CN113778733B/en
Legal status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F 11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Quality & Reliability (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a log sequence anomaly detection method based on multi-scale MASS, relating to the technical field of log sequence anomaly detection and comprising a training stage and a detection stage. First, structured log keys are extracted from unstructured logs. Second, in order to capture context dependencies of different scales in the log sequence and improve the ability to acquire log context information, the invention improves the Attention mechanism of the MASS model, replacing it with multi-scale Attention. Finally, normal log sequences are input into the improved MASS model to learn the normal pattern, and the model's Mask mechanism is used to detect anomalies in log sequences. Experimental results on two log data sets show that the method outperforms most existing log-sequence-based anomaly detection methods.

Description

Log sequence anomaly detection method based on multi-scale MASS
Technical Field
The invention relates to the technical field of log sequence anomaly detection, in particular to a log sequence anomaly detection method based on multi-scale MASS.
Background
The system log records detailed information about system operation and is an important data source for monitoring performance, understanding system state, and detecting anomalies. By letting developers and engineers monitor the system and analyze abnormal behaviors and errors through logs, it is vital for preventing system faults.
At present, many log-sequence-based anomaly detection methods have been proposed at home and abroad. They can be broadly divided into methods based on association rules and probability statistics, methods based on traditional machine learning and deep learning algorithms, and methods based on natural language processing.
In terms of association rules and probability statistics, Bao uses source code to analyze the logical relationships between log messages, parses execution logs into event sequences, and generates an ordered list of all possible subsequences by constructing a suffix array of events. Although this method is efficient, source code is not available in most cases. Beehive automatically mines association rules from log data and clusters features in an unsupervised manner to identify suspicious host behaviors reported as abnormal events. It can achieve high accuracy, but it can only identify predefined abnormal scenarios and requires a large amount of manual intervention.
In terms of traditional machine learning and deep learning algorithms, LogCluster identifies log problems with log clustering: each log sequence is represented by a vector, the similarity between two log sequences is calculated, and similar log sequences are grouped into one class using clustering techniques. Although LogCluster improves efficiency, its strategy is too coarse. LogRobust uses an attention-based Bi-LSTM model to detect anomalies; although the model is bidirectional, it cannot adequately capture the context information in a log sequence.
In terms of natural language processing, most existing research extracts semantic vectors through a natural language processing model; few studies apply a natural language processing model directly to the anomaly detection stage. LogBERT trains on log sequences within a session window using the Mask mechanism of the BERT model and flags anomalies that deviate from the latent pattern of normal log sequences by comparing the values before and after masking. On the one hand, although LogBERT's detection approach is novel and accurate, it randomly selects tokens to mask rather than masking continuous spans, so it cannot strengthen sequence modeling; on the other hand, the Attention mechanism in BERT acquires only the global information of the log sequence and cannot learn local context information at different scales.
At present, there is little research on log sequence anomaly detection based on natural language processing methods; most studies use deep learning models to detect anomalies in log sequences, and they have the following shortcomings:
(1) Most prediction-based log anomaly detection models are trained to predict the next log key from the preceding ones, thereby capturing the pattern of a normal sequence and the correlations between the log keys within it. Such models can identify log keys that violate these correlations, but they capture sequence relations in a unidirectional way, so only the context preceding the prediction point is considered and the context after it is ignored, which reduces prediction accuracy.
(2) Most current log anomaly detection models learn only the global information of log sequences; they do not consider the global and local information of a sequence simultaneously as factors in log sequence anomaly detection.
Disclosure of Invention
In view of the above technical problems, a log sequence anomaly detection method based on multi-scale MASS is provided. Considering that acquiring complete context information at different scales is important when detecting anomalies in log sequences, the method learns normal log sequence relations with a Multi-scale MASS pre-trained language model (MSMASS), improving the Attention mechanism of MASS into Multi-scale Attention (MSAttention). The MSMASS model not only acquires the context information of each log in the log sequence, but also acquires global and local information simultaneously, because each of its heads attends to context at a specific scale instead of all heads capturing only global correlations. Meanwhile, the MSMASS model masks consecutive log keys, which fully strengthens its sequence modeling capability; and its Encoder-Decoder structure fully learns the context information of the log sequence and improves the accuracy of log key prediction, thereby improving the accuracy of log sequence anomaly detection.
The invention adopts the following technical means:
the log sequence anomaly detection method based on the multi-scale MASS comprises a training stage and a detection stage, and specifically comprises the following steps:
s1, extracting a structured log key from an unstructured log;
S2, improve the Attention mechanism of the MASS model, replacing it with multi-scale Attention to obtain the MSMASS model, and train the MSMASS model;
S3, input normal log sequences into the MSMASS model to learn the normal pattern, and detect anomalies in log sequences using the model's Mask mechanism.
Further, the specific implementation process of the step S1 is as follows:
An analysis tree of fixed depth is constructed from the log data, and logs are parsed according to the template extraction rules contained in the tree, extracting structured log keys from unstructured log events. All log keys in the data set are classified, counted, and numbered so that different types of log keys have different numbers; K = {k_1, k_2, ..., k_n} denotes the set of log key categories in the data set.
Further, the multi-scale Attention in step S2 acquires context information of different scales at each layer, focuses on local dependence in the shallow layer, balances the relationship between the local dependence and the global dependence in the deep layer, and acquires effective context information.
Further, in the step S2, the training process of the MSMASS model includes:
s21, giving a normal log sequence set χ;
S22, take an unprocessed normal log sequence x ∈ χ from the set χ; if one can be obtained, frame the first fragment of sequence x with a fixed-size sliding window, otherwise end training;
S23, input the log sequence fragment into the Encoder of the MSMASS model, masking the consecutive half of the log keys at the middle position of the fragment;
s24, acquiring context dependencies of different scales in the log sequence fragment based on an improved multi-scale Attention mechanism;
S25, shift the real masked-out log key sequence one position to the right and input it into the Decoder; at the Decoder input, each position can only see the information before the current position, and the information at subsequent positions is invisible;
S26, train the model with the real masked-out log key sequence as the output target of the Decoder;
S27, judge whether the normal log sequence still has unframed log keys; if so, move the sliding window by a step length of 1 to frame the next fragment of log sequence x and go to step S23, otherwise stop training on this sequence and go to step S22.
Further, the specific implementation process of the step S23 is as follows:
The sequence fragment of x from position u to position v is denoted x_(u:v), and the sequence obtained by masking positions u through v of x is denoted x_(\u:v); the number of masked log keys is v - u + 1, where 0 < u < v ≤ m and m is the length of the log sequence x. Each masked log key is replaced by a special mask symbol, ensuring that x_(\u:v) has the same sequence length as x.
Further, the specific implementation process of the step S24 is as follows:
Given the vector sequence H = [h_1, h_2, ..., h_N] ∈ ℝ^(N×D) of a normal log sequence of length N, and assuming the scale of the i-th head is ω, the self-attention of the i-th head at the j-th position can be expressed as:

C_ij(x, ω) = [x_(i,j-ω), ..., x_(i,j+ω)]

Q = HW^Q, K = HW^K, V = HW^V

head(H, ω)_(i,j) = softmax(Q_j K_(C_ij(x,ω))^T / √D) V_(C_ij(x,ω))

where i denotes the i-th head, 1 ≤ i ≤ A; j denotes the j-th position, 1 ≤ j ≤ N; D denotes the dimension of the vectors; Q, K, V denote the query matrix, key matrix, and value matrix, respectively; C_ij denotes the function that extracts the context of the i-th head at the j-th position; W^Q, W^K, W^V are learnable parameters; and ω denotes the scale, which controls the working range of a head.

In MSAttention, each head works on one of the scales of the log sequence, and different heads may work on the same scale. Suppose there are A heads and the working scale of each head differs, i.e., the scale set is Ω = [ω_1, ω_2, ..., ω_A]; then the improved multi-scale Attention mechanism can be expressed as:

head(H, ω_i)_i = (head(H, ω_i)_(i,1), head(H, ω_i)_(i,2), ..., head(H, ω_i)_(i,N))

MSAttention(H, Ω) = [head(H, ω_1)_1; ...; head(H, ω_A)_A] W^O

where head(H, ω_i)_i denotes the self-attention of the i-th head at all positions under scale ω_i, and W^O is a parameter matrix. In an MSMASS model with L layers of the multi-scale Attention mechanism, in order to acquire context information of different scales at each layer, the scale allocation scheme of multi-scale self-attention for text classification is adopted, i.e., Ω = [1, 3, ..., N]; the number of scales is determined by the number of heads, following the principle that each scale receives the same amount of attention in the last layer. Once the scale set is determined, a different number of heads is allocated to the different scales in each layer l; the allocation scheme is written n_l = {hnum_1, hnum_2, ..., hnum_A}, where hnum_A denotes the number of heads allocated to scale ω_A: the more heads a scale is allocated, the more attention it receives. The formula is:

n_l = softmax(z_l) · A

where |Ω| denotes the number of distinct scales; z_l^k is the allocation coefficient determining the number of heads for the k-th scale at layer l, controlled by a scaling factor α; z_l is the coefficient vector of head-allocation proportions over the scales at layer l; softmax(z_l) gives the proportion of heads allocated to each scale at layer l; and n_l is the head-allocation scheme over the scales at layer l.
Further, the specific implementation process of the step S26 is as follows:
The MSMASS model is trained by predicting the sequence fragment x_(u:v) from the input sequence x_(\u:v), choosing the log-likelihood as the objective function:

L(θ; χ) = (1/|χ|) Σ_(x∈χ) log P(x_(u:v) | x_(\u:v); θ) = (1/|χ|) Σ_(x∈χ) Σ_(t=u..v) log P(x_t | x_(<t), x_(\u:v); θ)

where θ denotes the model parameters, and P(x_(u:v) | x_(\u:v); θ) denotes the conditional probability, which the MSMASS model estimates by learning θ; the right-hand sum is obtained by further factorizing the conditional probability according to the chain rule, and x_(<t) denotes the log keys before position t.
Further, the specific implementation process of the step S3 is as follows:
s31, framing the log sequence fragments with a sliding window with a fixed size;
S32, input the log sequence fragments into the trained MSMASS model, masking the consecutive half of the log keys at the middle position;
s33, acquiring context dependencies of different scales in the log sequence fragment based on an improved multi-scale Attention mechanism;
s34, carrying out probability prediction on the log key subjected to Mask;
S35, judge whether the real log key is among the Top-K log keys of the prediction result; if so, take the real log key as the output of the Decoder and go to step S36; otherwise, judge the sequence fragment to be an abnormal log sequence fragment, judge the log sequence containing the fragment to be an abnormal log sequence, and end detection;
s36, judging whether a log key which is not predicted exists, if so, turning to a step S34, otherwise, judging that the sequence segment is a normal log sequence segment;
s37, judging whether the log sequence has an unframed log key, if so, continuing to move the sliding window with the step length of 1, turning to the step S31, otherwise, judging that the sequence is a normal log sequence, and ending detection.
Further, the specific implementation process of the step S34 is as follows:
The context embedding vector of the masked log key at a position in the log sequence fragment is passed to a Softmax function, which predicts the probability distribution over the log keys that may appear at that position.
Compared with the prior art, the invention has the following advantages:
1. The log sequence anomaly detection method based on multi-scale MASS provided by the invention fuses the advantages of masking consecutive log keys and performing efficient prediction into log sequence anomaly detection. The invention replaces the multi-head self-attention mechanism with a multi-scale multi-head self-attention mechanism, introducing locality into the model; the improved MASS model thus has a stronger ability to acquire context information.
2. According to the log sequence anomaly detection method based on the Multi-scale MASS, in consideration of the fact that obtaining complete context information with different scales is important when anomaly detection is carried out based on the log sequence, the method learns normal log sequence relations by using a Multi-scale MASS pre-training language model (MSMASS), and improves an Attention mechanism of the MASS to be Multi-scale Attention (MSAttention).
3. According to the log sequence anomaly detection method based on the multi-scale MASS, the MSMASS model can not only acquire the context information of each log in the log sequence, but also acquire the global and local information simultaneously by paying attention to the context of different scales instead of all the heads of the MSMASS model, wherein the head of each MSMASS model is used for acquiring the correlation between global information; meanwhile, the MSMASS model adopts a continuous Mask log key mode, so that the sequence modeling capability can be fully enhanced; and the Encoder-Decoder structure can fully learn the context information of the log sequence, and improves the accuracy of predicting the log key, thereby improving the accuracy of detecting the abnormality of the log sequence.
Based on the reasons, the method can be widely popularized in the fields of log sequence anomaly detection and the like.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to the drawings without inventive effort to a person skilled in the art.
FIG. 1 is a block diagram of the method of the present invention.
Fig. 2 is a training flowchart of an MSMASS model according to an embodiment of the present invention.
Fig. 3 is a detection flow chart of the log sequence anomaly detection method based on multi-scale MASS according to the present invention.
Fig. 4 is a graph showing the effect of the sliding window size on the accuracy in the HDFS dataset according to an embodiment of the present invention.
Fig. 5 is a graph showing the effect of the size of a sliding window on accuracy in BGL dataset according to an embodiment of the present invention.
Fig. 6 is a cumulative probability graph of Top-K when predicting log keys according to an embodiment of the present invention.
FIG. 7 is a bar graph of the accuracy of various methods provided by embodiments of the present invention on an HDFS data set.
Fig. 8 is a bar graph of the accuracy of various methods provided by embodiments of the present invention on BGL datasets.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention improves the Attention mechanism of the original MASS model, replacing it with multi-scale Attention, and the MSMASS model models the log sequence using a Transformer, as shown in FIG. 1. By predicting consecutive masked log keys rather than discrete ones, the MSMASS model obtains better language modeling capability. The Decoder of the Transformer predicts the masked log keys, and at the Decoder input, the log keys that were not masked at the Encoder input are masked; that is, the Decoder gives up prior information, which forces the Encoder of the Transformer to understand the meaning of the unmasked log keys and encourages the Decoder to extract meaningful contextual information from the log sequence fed to the Encoder. The multi-scale Attention acquires context information of different scales at each layer, focusing on local dependencies in shallow layers and balancing local and global dependencies in deep layers, thereby acquiring more effective context information.
In specific implementation, the invention provides a log sequence anomaly detection method based on multi-scale MASS, which comprises a training stage and a detection stage, and specifically comprises the following steps:
s1, extracting a structured log key from an unstructured log;
in specific implementation, as a preferred embodiment of the present invention, the specific implementation procedure of the step S1 is as follows:
An analysis tree of fixed depth is constructed from the log data, and logs are parsed according to the template extraction rules contained in the tree, extracting structured log keys from unstructured log events. All log keys in the data set are classified, counted, and numbered so that different types of log keys have different numbers; K = {k_1, k_2, ..., k_n} denotes the set of log key categories in the data set.
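For illustration only, the sketch below collapses raw log lines into numbered log keys; the regular expression is a hypothetical stand-in for the template extraction rules of the fixed-depth parse tree, which the invention leaves to the parser:

```python
import re

def to_log_key(raw_line, key_table):
    """Map a raw log message to a numbered log key k_1 ... k_n."""
    # Replace variable fields (block ids, IP addresses, numbers) with a
    # wildcard so that messages from the same print statement collapse
    # onto a single template.
    template = re.sub(r"blk_-?\d+|\d+\.\d+\.\d+\.\d+(?::\d+)?|\d+", "<*>", raw_line)
    if template not in key_table:          # first occurrence of this template
        key_table[template] = len(key_table) + 1
    return key_table[template]

key_table = {}
logs = [
    "Received block blk_-1608999687 of size 91178 from 10.250.19.102",
    "Received block blk_-3544583377 of size 67108864 from 10.250.10.6",
]
print([to_log_key(line, key_table) for line in logs])   # -> [1, 1]
```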
S2, improve the Attention mechanism of the MASS model, replacing it with multi-scale Attention to obtain the MSMASS model, and train the MSMASS model;
In specific implementation, as a preferred embodiment of the present invention, the multi-scale Attention in step S2 acquires context information of different scales at each layer, focuses on local dependencies in the shallow layers, balances the relationship between local and global dependencies in the deep layers, and acquires effective context information.
In specific implementation, as a preferred embodiment of the present invention, in the step S2, the process of training the MSMASS model includes:
s21, giving a normal log sequence set χ;
S22, take an unprocessed normal log sequence x ∈ χ from the set χ; if one can be obtained, frame the first fragment of sequence x with a fixed-size sliding window, otherwise end training;
S23, input the log sequence fragment into the Encoder of the MSMASS model, masking the consecutive half of the log keys at the middle position of the fragment;
the specific implementation process of the step S23 is as follows:
The sequence fragment of x from position u to position v is denoted x_(u:v), and the sequence obtained by masking positions u through v of x is denoted x_(\u:v); the number of masked log keys is v - u + 1, where 0 < u < v ≤ m and m is the length of the log sequence x. Each masked log key is replaced by a special mask symbol, ensuring that x_(\u:v) has the same sequence length as x.
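A minimal sketch of this masking scheme follows; the exact offset of the masked middle half within the window is an assumption, as the text only states that the consecutive half of the log keys at the middle position is masked:

```python
MASK = "[M]"  # stand-in for the special mask symbol

def mask_middle_half(fragment):
    """Mask the consecutive half of the log keys in the middle of a window."""
    m = len(fragment)
    u = m // 4                  # assumed 0-based start of the masked span
    v = u + m // 2              # exclusive end: exactly m // 2 keys are masked
    masked = fragment[:u] + [MASK] * (v - u) + fragment[v:]
    return masked, fragment[u:v], (u, v)

x = [3, 7, 7, 1, 9, 2, 5, 8]    # a window of 8 log keys
x_masked, target, (u, v) = mask_middle_half(x)
print(x_masked)   # [3, 7, '[M]', '[M]', '[M]', '[M]', 5, 8]
print(target)     # [7, 1, 9, 2]: fed to the Decoder shifted one position right
```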
S24, acquire context dependencies of different scales in the log sequence fragment based on the improved multi-scale Attention mechanism (MSAttention);
the specific implementation process of the step S24 is as follows:
Given the vector sequence H = [h_1, h_2, ..., h_N] ∈ ℝ^(N×D) of a normal log sequence of length N, and assuming the scale of the i-th head is ω, the self-attention of the i-th head at the j-th position is expressed as:

C_ij(x, ω) = [x_(i,j-ω), ..., x_(i,j+ω)]

Q = HW^Q, K = HW^K, V = HW^V

head(H, ω)_(i,j) = softmax(Q_j K_(C_ij(x,ω))^T / √D) V_(C_ij(x,ω))

where i denotes the i-th head, 1 ≤ i ≤ A; j denotes the j-th position, 1 ≤ j ≤ N; D denotes the dimension of the vectors; Q, K, V denote the query matrix, key matrix, and value matrix, respectively; C_ij denotes the function that extracts the context of the i-th head at the j-th position; W^Q, W^K, W^V are learnable parameters; and ω denotes the scale, which controls the working range of a head.

In MSAttention, each head works on one of the scales of the log sequence, and different heads may work on the same scale. Suppose there are A heads and the working scale of each head differs, i.e., the scale set is Ω = [ω_1, ω_2, ..., ω_A]; then the improved multi-scale Attention mechanism is expressed as:

head(H, ω_i)_i = (head(H, ω_i)_(i,1), head(H, ω_i)_(i,2), ..., head(H, ω_i)_(i,N))

MSAttention(H, Ω) = [head(H, ω_1)_1; ...; head(H, ω_A)_A] W^O

where head(H, ω_i)_i denotes the self-attention of the i-th head at all positions under scale ω_i, and W^O is a parameter matrix. In an MSMASS model with L layers of the multi-scale Attention mechanism, in order to acquire context information of different scales at each layer, the scale allocation scheme of multi-scale self-attention for text classification is adopted, i.e., Ω = [1, 3, ..., N]; the number of scales is determined by the number of heads, following the principle that each scale receives the same amount of attention in the last layer. Once the scale set is determined, a different number of heads is allocated to the different scales in each layer l; the allocation scheme is written n_l = {hnum_1, hnum_2, ..., hnum_A}, where hnum_A denotes the number of heads allocated to scale ω_A: the more heads a scale is allocated, the more attention it receives. The formula is:

n_l = softmax(z_l) · A

where |Ω| denotes the number of distinct scales; z_l^k is the allocation coefficient determining the number of heads for the k-th scale at layer l, controlled by a scaling factor α; z_l is the coefficient vector of head-allocation proportions over the scales at layer l; softmax(z_l) gives the proportion of heads allocated to each scale at layer l; and n_l is the head-allocation scheme over the scales at layer l.
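A minimal NumPy sketch of the windowed attention above is given below. It simplifies the real model: all heads share one set of projections and keep the full dimension D, one head is assigned per scale instead of allocating n_l = softmax(z_l)·A heads per layer, and a scale of N or more degenerates to ordinary global attention:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ms_attention(H, scales, W_Q, W_K, W_V, W_O):
    """Each head attends only inside a window of radius scales[i] (C_ij)."""
    N, D = H.shape
    Q, K, V = H @ W_Q, H @ W_K, H @ W_V      # shared projections for clarity
    heads = []
    for w in scales:                         # one head per scale in this sketch
        out = np.zeros((N, D))
        for j in range(N):
            lo, hi = max(0, j - w), min(N, j + w + 1)    # window C_ij(x, w)
            att = softmax(Q[j] @ K[lo:hi].T / np.sqrt(D))
            out[j] = att @ V[lo:hi]
        heads.append(out)
    return np.concatenate(heads, axis=-1) @ W_O          # [head_1; ...] W^O

rng = np.random.default_rng(0)
N, D = 16, 8
H = rng.normal(size=(N, D))
W_Q, W_K, W_V = (rng.normal(size=(D, D)) for _ in range(3))
W_O = rng.normal(size=(4 * D, D))            # 4 scales concatenated
print(ms_attention(H, [1, 3, 5, 16], W_Q, W_K, W_V, W_O).shape)  # (16, 8)
```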
S25, shift the real masked-out log key sequence one position to the right and input it into the Decoder; at the Decoder input, each position can only see the information before the current position, and the information at subsequent positions is invisible.
S26, train the model with the real masked-out log key sequence as the output target of the Decoder;
the specific implementation process of the step S26 is as follows:
The MSMASS model is trained by predicting the sequence fragment x_(u:v) from the input sequence x_(\u:v), choosing the log-likelihood as the objective function:

L(θ; χ) = (1/|χ|) Σ_(x∈χ) log P(x_(u:v) | x_(\u:v); θ) = (1/|χ|) Σ_(x∈χ) Σ_(t=u..v) log P(x_t | x_(<t), x_(\u:v); θ)

where θ denotes the model parameters, and P(x_(u:v) | x_(\u:v); θ) denotes the conditional probability, which the MSMASS model estimates by learning θ; the right-hand sum is obtained by further factorizing the conditional probability according to the chain rule, and x_(<t) denotes the log keys before position t.
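The toy sketch below evaluates this objective for a single masked span, assuming the model has already produced a probability distribution over the log-key vocabulary at each masked position:

```python
import numpy as np

def mass_log_likelihood(pred_probs, target_keys):
    """log P(x_(u:v) | x_(\\u:v)) as a chain-rule sum of per-position logs."""
    return sum(np.log(pred_probs[t][k]) for t, k in enumerate(target_keys))

# toy vocabulary of 4 log keys, two masked positions
pred_probs = np.array([[0.1, 0.7, 0.1, 0.1],
                       [0.2, 0.1, 0.6, 0.1]])
print(mass_log_likelihood(pred_probs, [1, 2]))   # log 0.7 + log 0.6 ≈ -0.8675
```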
S27, judge whether the normal log sequence still has unframed log keys; if so, move the sliding window by a step length of 1 to frame the next fragment of log sequence x and go to step S23, otherwise stop training on this sequence and go to step S22.
S3, input the normal log sequences into the MSMASS model to learn the normal pattern, and detect anomalies in log sequences using the model's Mask mechanism.
In specific implementation, as a preferred embodiment of the present invention, the specific implementation procedure of the step S3 is as follows:
s31, framing the log sequence fragments with a sliding window with a fixed size;
S32, input the log sequence fragments into the trained MSMASS model, masking the consecutive half of the log keys at the middle position;
s33, acquiring context dependencies of different scales in the log sequence fragment based on an improved multi-scale Attention mechanism;
s34, carrying out probability prediction on the log key subjected to Mask;
the specific implementation process of the step S34 is as follows:
The context embedding vector of the masked log key at a position in the log sequence fragment is passed to a Softmax function, which predicts the probability distribution over the log keys that may appear at that position.
S35, judge whether the real log key is among the Top-K log keys of the prediction result; if so, take the real log key as the output of the Decoder and go to step S36; otherwise, judge the sequence fragment to be an abnormal log sequence fragment, judge the log sequence containing the fragment to be an abnormal log sequence, and end detection;
s36, judging whether a log key which is not predicted exists, if so, turning to a step S34, otherwise, judging that the sequence segment is a normal log sequence segment;
s37, judging whether the log sequence has an unframed log key, if so, continuing to move the sliding window with the step length of 1, turning to the step S31, otherwise, judging that the sequence is a normal log sequence, and ending detection.
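Putting steps S31–S37 together, a hedged sketch of the detection loop follows. `model_topk` is a hypothetical stand-in for the trained MSMASS model (it must return the K most probable log keys for a masked position), and `mask_middle_half` is the masking sketch given earlier:

```python
def detect_window(model_topk, fragment, K=5):
    """Flag a window as abnormal if any masked key misses the Top-K prediction."""
    masked, target, (u, v) = mask_middle_half(fragment)
    for offset, true_key in enumerate(target):
        if true_key not in model_topk(masked, u + offset, K):
            return "abnormal"     # one miss condemns the whole sequence
        # otherwise the true key is taken as the Decoder output and we move on
    return "normal"

def detect_sequence(model_topk, sequence, window=8, K=5):
    """Slide a fixed-size window over the log sequence with step length 1."""
    for s in range(len(sequence) - window + 1):
        if detect_window(model_topk, sequence[s:s + window], K) == "abnormal":
            return "abnormal"
    return "normal"
```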
Examples
In order to verify the effectiveness of the method of the invention, a comparison experiment was performed. The experimental environment is as follows:
All experiments of the invention were carried out on an NVIDIA Tesla V100 GPU server; the model was built on Keras in a Python 3.6 environment, the MSMASS model was trained with the Adam optimizer using the cross-entropy function as the loss function, and training terminated after 10 iterations. The MSMASS model has 3 MSAttention layers in total, each with 8 heads.
Data set: the invention selects two representative log data sets, HDFS and BGL, whose details are shown in Table 1. The HDFS dataset was collected by LogHub from a 203-node cluster on the Amazon EC2 platform and is common benchmark data for log-based anomaly detection; it contains 11,175,629 raw log messages in total, and labels were assigned to 575,061 sessions to indicate their normal or abnormal state. The BGL dataset was collected by LogHub from the Blue Gene/L supercomputer system at Lawrence Livermore National Laboratory (LLNL) in Livermore, California; it contains 4,747,963 raw log messages in total, each marked as an alarm or non-alarm message. In the experiments, for each data set, 6000 normal log sequences and 6000 abnormal log sequences are selected from top to bottom according to the timestamp information of the logs; the first 80% is used as training data and the remaining 20% as test data. In training, only the normal log sequences are used. Since the labels of both the HDFS and BGL datasets are manually annotated, these labels serve as the ground truth for evaluation.
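As a small illustration of this selection and split (the session structure is hypothetical), the sketch below keeps the first 80% of each class, ordered by timestamp, and trains on normal sequences only:

```python
def split_sessions(sessions, n_per_class=6000, train_frac=0.8):
    """sessions: list of (timestamp, label, key_sequence); label 0=normal, 1=abnormal."""
    sessions = sorted(sessions, key=lambda s: s[0])              # by timestamp
    normal = [s for s in sessions if s[1] == 0][:n_per_class]
    abnormal = [s for s in sessions if s[1] == 1][:n_per_class]
    cut_n = int(len(normal) * train_frac)
    cut_a = int(len(abnormal) * train_frac)
    train = [s[2] for s in normal[:cut_n]]                       # normal only
    test = normal[cut_n:] + abnormal[cut_a:]                     # both classes
    return train, test
```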
Table 1 details of the dataset
Baseline methods: the invention selects IM, DeepLog, LogAnomaly, and the original MASS as baseline methods for the comparison experiments.
IM is good at mining data rules: it mines invariant components among log events in log count vectors, and log sequences that violate the invariants are considered anomalous. DeepLog uses an LSTM to learn the pattern of normal log sequences and is an anomaly detection model based on log keys. LogAnomaly models the log stream as a natural language sequence, extracts semantic information from log templates with template2vec, predicts the next log that may appear with an LSTM, and detects anomalies from the predicted value. To capture context dependencies of different scales in normal log sequences, the MSMASS model improves the Attention mechanism of the MASS model, replacing it with MSAttention, which introduces locality into the model and gives it a stronger ability to acquire context information; moreover, MSMASS combines the Encoder and Decoder of the Transformer to learn sentence context, which is superior to most existing natural language models in context learning.
Evaluation indexes: anomaly detection is a binary classification problem, and the invention uses the widely adopted indexes of precision, recall, and F1-score to evaluate the effectiveness of MSMASS and each baseline method in anomaly detection.
(1) Precision: the percentage of the log sequences judged abnormal by the model that are actually abnormal, as shown in the following formula:

Precision = TP / (TP + FP)

(2) Recall: the percentage of all abnormal log sequences that the model correctly judges abnormal, as shown in the following formula:

Recall = TP / (TP + FN)

(3) F1-score: the harmonic mean of precision and recall, as shown in the following formula:

F1 = 2 × Precision × Recall / (Precision + Recall)

where TP is the number of abnormal log sequences the model correctly detects, FP is the number of normal log sequences the model erroneously identifies as abnormal, and FN is the number of abnormal log sequences the model fails to detect.
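Given those counts, the three indexes reduce to a few lines; the counts passed in below are illustrative, not the paper's results:

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)       # judged abnormal and actually abnormal
    recall = tp / (tp + fn)          # actually abnormal and detected
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return precision, recall, f1

print(precision_recall_f1(tp=96, fp=3, fn=2))   # approx. (0.970, 0.980, 0.975)
```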
Setting experimental parameters: by trial and error, for the HDFS dataset the window size for each training sample is set to 16 (see FIG. 4), the best scale set is Ω = [1, 3, 5, 16], and head numbers are allocated with α = 1.0 (see Table 2); for the BGL dataset the window size for each training sample is set to 8 (see FIG. 5), the optimal scale set is Ω = [1, 3, 5, 8], and head numbers are allocated with α = 0.5 (see Table 3). To explore the influence of different K values, let L_t be the set of Top-K log keys predicted at position t and k_t the actual log key at position t in the log sequence; the experiments examine the probability P_r[k_t ∈ L_t] that the actual log key k_t falls within the predicted log key set L_t under different K values, with the results shown in FIG. 6. The experimental results show that setting K to 5 is most effective: if k_t does not occur among the first 5 log keys of L_t, the sequence is considered abnormal.
TABLE 2 Head number allocation for each layer and precision under different parameters on the HDFS dataset
TABLE 3 Head number allocation for each layer and precision under different parameters on the BGL dataset
The invention experimentally compares MSMASS with the three baseline methods and the original MASS to verify the precision of MSMASS. FIG. 7 and FIG. 8 show the comparison of the present method with the original MASS and the three baseline methods on the HDFS and BGL datasets, respectively. The experimental results on the HDFS and BGL datasets show that MSMASS achieves the highest precision among the five methods, with F1-scores of 0.97 and 0.98 on the HDFS and BGL datasets, respectively. Although IM and DeepLog achieve higher recall on both datasets (for example, recall of 0.99 and 0.96 on the BGL dataset), their high recall comes at the expense of low precision (0.83 and 0.90), meaning that many normal sequences are misreported as abnormal. Compared with the proposed MSMASS, LogAnomaly achieves precision similar to the original MASS on both datasets but a lower recall. Before adding MSAttention, the average precision of the original MASS on the two datasets is 0.96 and 0.98, respectively; after adding MSAttention, the average precision of MSMASS reaches 0.98 and 0.99, an improvement of about 2 and 1 percentage points, respectively. Therefore, compared with the original MASS model, the improved MSMASS model is superior in the sequence anomaly detection task, and context information at different scales is important for detection precision. The experimental results demonstrate that MSMASS can accurately process stable log data sets.
In summary, the invention provides a new log sequence anomaly detection method, MSMASS, based on multi-scale MASS, which fuses the advantages of masking consecutive log keys and performing efficient prediction into log sequence anomaly detection. The invention replaces the multi-head self-attention mechanism with a multi-scale multi-head self-attention mechanism, introducing locality into the model; the improved MASS model has a stronger ability to acquire context information. Experimental evaluation on large system logs shows that MSMASS performs better than previous methods.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (7)

1. The log sequence anomaly detection method based on the multi-scale MASS is characterized by comprising a training stage and a detection stage, and specifically comprises the following steps:
s1, extracting a structured log key from an unstructured log;
S2, improve the Attention mechanism of the MASS model, replacing it with multi-scale Attention to obtain the MSMASS model, and train the MSMASS model; in step S2, the training process for the MSMASS model includes:
s21, giving a normal log sequence set χ;
S22, take an unprocessed normal log sequence x ∈ χ from the set χ; if one can be obtained, frame the first fragment of sequence x with a fixed-size sliding window, otherwise end training;
S23, input the log sequence fragment into the Encoder of the MSMASS model, masking the consecutive half of the log keys at the middle position of the fragment;
s24, acquiring context dependencies of different scales in the log sequence fragment based on an improved multi-scale Attention mechanism; the specific implementation process of the step S24 is as follows:
Given the vector sequence H = [h_1, h_2, ..., h_N] ∈ ℝ^(N×D) of a normal log sequence of length N, and assuming the scale of the i-th head is ω, the self-attention of the i-th head at the j-th position is expressed as:

C_ij(x, ω) = [x_(i,j-ω), ..., x_(i,j+ω)]

Q = HW^Q, K = HW^K, V = HW^V

head(H, ω)_(i,j) = softmax(Q_j K_(C_ij(x,ω))^T / √D) V_(C_ij(x,ω))

where i denotes the i-th head, 1 ≤ i ≤ A; j denotes the j-th position, 1 ≤ j ≤ N; D denotes the dimension of the vectors; Q, K, V denote the query matrix, key matrix, and value matrix, respectively; C_ij denotes the function that extracts the context of the i-th head at the j-th position; W^Q, W^K, W^V are learnable parameters; and ω denotes the scale, which controls the working range of a head;

in MSAttention, each head works on one of the scales of the log sequence, and different heads may act on the same scale; suppose there are A heads and the working scale of each head differs, i.e., the scale set is Ω = [ω_1, ω_2, ..., ω_A]; then the improved multi-scale Attention mechanism is expressed as:

head(H, ω_i)_i = (head(H, ω_i)_(i,1), head(H, ω_i)_(i,2), ..., head(H, ω_i)_(i,N))

MSAttention(H, Ω) = [head(H, ω_1)_1; ...; head(H, ω_A)_A] W^O

where head(H, ω_i)_i denotes the self-attention of the i-th head at all positions under scale ω_i, and W^O is a parameter matrix; in an MSMASS model with L layers of the multi-scale Attention mechanism, in order to acquire context information of different scales at each layer, the scale allocation scheme of multi-scale self-attention for text classification is adopted, i.e., Ω = [1, 3, ..., N]; the number of scales is determined by the number of heads, following the principle that each scale receives the same amount of attention in the last layer; once the scale set is determined, a different number of heads is allocated to the different scales in each layer l; the allocation scheme is written n_l = {hnum_1, hnum_2, ..., hnum_A}, where hnum_A denotes the number of heads allocated to scale ω_A: the more heads a scale is allocated, the more attention it receives; the formula is:

n_l = softmax(z_l) · A

where |Ω| denotes the number of distinct scales; z_l^k is the allocation coefficient determining the number of heads for the k-th scale at layer l, controlled by a scaling factor α; z_l is the coefficient vector of head-allocation proportions over the scales at layer l; softmax(z_l) gives the proportion of heads allocated to each scale at layer l; and n_l is the head-allocation scheme over the scales at layer l;
S25, shift the real masked-out log key sequence one position to the right and input it into the Decoder; at the Decoder input, each position can only see the information before the current position, and the information at subsequent positions is invisible;
S26, train the model with the real masked-out log key sequence as the output target of the Decoder;
S27, judge whether the normal log sequence still has unframed log keys; if so, move the sliding window by a step length of 1 to frame the next fragment of log sequence x and go to step S23, otherwise stop training on this sequence and go to step S22;
s3, inputting the normal log sequence into the MSMASS model to learn the normal mode, and detecting the abnormality of the log sequence by using a Mask mechanism of the normal log sequence.
2. The method for detecting log sequence anomalies based on multi-scale MASS according to claim 1, wherein the specific implementation process of step S1 is as follows:
An analysis tree of fixed depth is constructed from the log data, and logs are parsed according to the template extraction rules contained in the tree, extracting structured log keys from unstructured log events. All log keys in the data set are classified, counted, and numbered so that different types of log keys have different numbers; K = {k_1, k_2, ..., k_n} denotes the set of log key categories in the data set.
3. The method for detecting log sequence anomalies based on multi-scale MASS as claimed in claim 1, wherein the multi-scale Attention in step S2 acquires context information of different scales at each layer, focuses on local dependencies in shallow layers, balances the relationship between local dependencies and global dependencies in deep layers, and acquires valid context information.
4. The method for detecting log sequence anomalies based on multi-scale MASS according to claim 1, wherein the specific implementation process of step S23 is as follows:
The sequence fragment of x from position u to position v is denoted x_(u:v), and the sequence obtained by masking positions u through v of x is denoted x_(\u:v); the number of masked log keys is v - u + 1, where 0 < u < v ≤ m and m is the length of the log sequence x. Each masked log key is replaced by a special mask symbol, ensuring that x_(\u:v) has the same sequence length as x.
5. The method for detecting log sequence anomalies based on multi-scale MASS according to claim 1, wherein the specific implementation process of step S26 is as follows:
The MSMASS model is trained by predicting the sequence fragment x_(u:v) from the input sequence x_(\u:v), choosing the log-likelihood as the objective function:

L(θ; χ) = (1/|χ|) Σ_(x∈χ) log P(x_(u:v) | x_(\u:v); θ) = (1/|χ|) Σ_(x∈χ) Σ_(t=u..v) log P(x_t | x_(<t), x_(\u:v); θ)

where θ denotes the model parameters, and P(x_(u:v) | x_(\u:v); θ) denotes the conditional probability, which the MSMASS model estimates by learning θ; the right-hand sum is obtained by further factorizing the conditional probability according to the chain rule, and x_(<t) denotes the log keys before position t.
6. The method for detecting log sequence anomalies based on multi-scale MASS according to claim 1, wherein the specific implementation process of step S3 is as follows:
s31, framing the log sequence fragments with a sliding window with a fixed size;
S32, input the log sequence fragments into the trained MSMASS model, masking the consecutive half of the log keys at the middle position;
s33, acquiring context dependencies of different scales in the log sequence fragment based on an improved multi-scale Attention mechanism;
s34, carrying out probability prediction on the log key subjected to Mask;
S35, judge whether the real log key is among the Top-K log keys of the prediction result; if so, take the real log key as the output of the Decoder and go to step S36; otherwise, judge the sequence fragment to be an abnormal log sequence fragment, judge the log sequence containing the fragment to be an abnormal log sequence, and end detection;
s36, judging whether a log key which is not predicted exists, if so, turning to a step S34, otherwise, judging that the sequence segment is a normal log sequence segment;
s37, judging whether the log sequence has an unframed log key, if so, continuing to move the sliding window with the step length of 1, turning to the step S31, otherwise, judging that the sequence is a normal log sequence, and ending detection.
7. The method for detecting log sequence anomalies based on multi-scale MASS according to claim 6, wherein the specific implementation process of step S34 is as follows:
The context embedding vector of the masked log key at a position in the log sequence fragment is passed to a Softmax function, which predicts the probability distribution over the log keys that may appear at that position.
CN202111014950.6A 2021-08-31 2021-08-31 Log sequence anomaly detection method based on multi-scale MASS Active CN113778733B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111014950.6A CN113778733B (en) 2021-08-31 2021-08-31 Log sequence anomaly detection method based on multi-scale MASS

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111014950.6A CN113778733B (en) 2021-08-31 2021-08-31 Log sequence anomaly detection method based on multi-scale MASS

Publications (2)

Publication Number Publication Date
CN113778733A CN113778733A (en) 2021-12-10
CN113778733B true CN113778733B (en) 2024-03-15

Family

ID=78840329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111014950.6A Active CN113778733B (en) 2021-08-31 2021-08-31 Log sequence anomaly detection method based on multi-scale MASS

Country Status (1)

Country Link
CN (1) CN113778733B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019060327A1 (en) * 2017-09-20 2019-03-28 University Of Utah Research Foundation Online detection of anomalies within a log using machine learning
CN111209168A (en) * 2020-01-14 2020-05-29 中国人民解放军陆军炮兵防空兵学院郑州校区 Log sequence anomaly detection framework based on nLSTM-self attention
US10692004B1 (en) * 2015-11-15 2020-06-23 ThetaRay Ltd. System and method for anomaly detection in dynamically evolving data using random neural network decomposition
CN111930903A (en) * 2020-06-30 2020-11-13 山东师范大学 System anomaly detection method and system based on deep log sequence analysis

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9037896B2 (en) * 2012-11-05 2015-05-19 Cisco Technology, Inc. Root cause analysis in a sensor-actuator fabric of a connected environment

Also Published As

Publication number Publication date
CN113778733A (en) 2021-12-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant