CN115617614A - Log sequence anomaly detection method based on time interval perception self-attention mechanism - Google Patents

Log sequence anomaly detection method based on time interval perception self-attention mechanism Download PDF

Info

Publication number
CN115617614A
Authority
CN
China
Prior art keywords
log
sequence
time interval
vector
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211339210.4A
Other languages
Chinese (zh)
Inventor
Cao Zhiying
Xu Weigang
Zhang Xiuguo
Li Wangwang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Maritime University filed Critical Dalian Maritime University
Priority to CN202211339210.4A priority Critical patent/CN115617614A/en
Publication of CN115617614A publication Critical patent/CN115617614A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/30: Monitoring
    • G06F 11/3065: Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a log sequence anomaly detection method based on a time-interval-aware self-attention mechanism. The time-interval-aware self-attention mechanism is introduced into the process of extracting log sequence features with a Transformer encoder, so that both the semantic information of log templates and the time-interval information between logs are used when computing attention scores. This strengthens the model's ability to capture the association between logs, lets it learn how the time intervals between the logs in a sequence affect anomaly detection, and improves detection accuracy. In addition, on the basis of word vectors extracted by the BERT language model, a CNN aggregates the word vectors into the vector representation of each log template, learning semantic information about words and their contexts at different scales; the model can therefore adapt to word changes in log statements during software updates, which improves the robustness of anomaly detection.

Description

Log sequence anomaly detection method based on time interval perception self-attention mechanism
Technical Field
The invention relates to the field of information technology, and in particular to a log sequence anomaly detection method based on a time-interval-aware self-attention mechanism.
Background
Log data is an important and valuable data source in online service systems, recording detailed information about system operating status and user behavior. By analyzing whether the log sequences generated by a system deviate from its normal working pattern, errors arising during system operation can be found effectively and software reliability can be improved.
Currently, mainstream log sequence anomaly detection methods can be divided into methods based on machine learning and methods based on deep learning.
Most machine-learning-based log sequence anomaly detection methods use the statistical characteristics of log data together with a machine learning algorithm to detect anomalies in log sequences.
For example, one approach uses the unsupervised PCA algorithm: by analyzing source code, it gathers statistics on the event states and occurrence counts expressed in the logs, and performs anomaly detection with a state-ratio vector and an event-count vector as model inputs.
As another example, an SVM-based approach vectorizes log sequences by the number and distribution of the various log levels within a sliding window, feeds the vectors into the model, and detects anomalies through supervised training.
Most deep-learning-based log sequence anomaly detection methods focus on the sequential relationships among logs, detecting anomalies with recurrent neural networks, attention mechanisms, and similar methods; some researchers have applied techniques from natural language processing to log sequence anomaly detection.
One example is DeepLog, an LSTM-based log anomaly detection model. Although it learns the sequential relationships among logs, its encoding scheme based on log template indices cannot fully extract the semantic information of log templates, so it lacks robustness.
Another example is LogRobust, which converts the words in a log template into word vectors with natural language processing methods, generates log template vectors by aggregating the word vectors with TF-IDF (term frequency-inverse document frequency), and detects anomalies with an attention-based Bi-LSTM model. Although it considers the semantic information contained in log statements, aggregating word vectors with a statistical method cannot cope with word changes in log statements and cannot learn the contextual features of words, resulting in low robustness. Moreover, the LSTM-based model does not consider the position information of log events, suffers from the long-term dependency problem, and risks losing past information, which lowers accuracy.
A further example is NeuralLog, which extracts word vectors, computes each log template vector as the average of the corresponding word vectors, and feeds the sequence of log template vectors into a Transformer-based model for anomaly detection, improving detection accuracy. Although these methods all reach a certain level of detection accuracy, they aggregate word vectors into log template vectors with statistical methods, so they cannot distinguish the semantic information of different words or learn the local features of a log template, and their accuracy drops when the words in a log template change. Furthermore, when a system runs normally, the response times of its tasks stay within a normal range; when anomalies such as hardware problems, network congestion, or component performance defects occur, the response times fluctuate sharply and the time intervals between the log statements produced by those tasks become abnormally long.
These methods attend only to the sequential information of logs and cannot fully exploit the time information in logs that is useful for the anomaly detection task, which limits detection performance. In summary, current log sequence anomaly detection methods mainly have the following shortcomings:
(1) Most methods obtain word vectors with techniques from natural language processing and aggregate them into log template vectors with statistical methods. Although this captures the semantic information of the words in a log, it cannot properly distinguish how the semantics of different words affect the anomaly detection task, cannot obtain the contextual features of words or capture the dependencies between them, and cannot adapt to the word changes caused by irregular updates of log statements when a system or service is upgraded, all of which affects detection accuracy.
(2) Most methods focus primarily on how the sequential information of logs affects anomaly detection. Most system faults do cause logs to deviate from the normal log sequence, so system anomalies can be detected effectively from sequence information; however, the time intervals at which logs are generated are also an important source of information for judging system anomalies. Considering only sequence information while ignoring the time intervals between logs limits detection performance.
Disclosure of Invention
The invention provides a log sequence anomaly detection method based on a time-interval-aware self-attention mechanism.
The technical means adopted by the invention are as follows:
A log sequence anomaly detection method based on a time-interval-aware self-attention mechanism comprises the following steps:
acquiring a log set for training and parsing it to generate a log template sequence and a timestamp sequence; calculating the relative time interval between the log events in the sequence from the timestamp sequence, thereby obtaining a time interval matrix;
inputting the log template sequence and the time interval matrix into an anomaly detection model based on a time-interval-aware self-attention mechanism to carry out supervised training of the whole model; and
acquiring a log set to be detected and inputting it into the trained log sequence anomaly detection model based on the time-interval-aware self-attention mechanism to perform anomaly detection.
Further, the training step of the log sequence anomaly detection model based on the time-interval-aware self-attention mechanism comprises:
inputting the log template sequence into a BERT model to extract word vectors, thereby obtaining the word vectors of the log template sequence;
extracting template vectors from the word vectors of the log template sequence through a one-dimensional convolutional neural network to obtain log template vectors; judging whether the log template sequence still contains unprocessed log templates and, if so, performing word vector extraction and template vector extraction on the next log template until all log templates are processed;
inputting the log template vectors and the time interval matrix into a Transformer encoder to obtain a log sequence vector; and
inputting the log sequence vector into a classifier based on a fully-connected neural network to output a detection result, calculating the loss based on the detection result, and updating the model parameters.
Further, calculating the relative time interval between the log events in the sequence from the timestamp sequence, thereby obtaining a time interval matrix, includes:
for a timestamp sequence T = {t_1, t_2, ..., t_n}, calculating the time difference and taking its absolute value, t_ij = |t_i - t_j|, as the relative time interval between the logs at the i-th and j-th positions, and then normalizing it:
t'_ij = (t_ij - μ) / σ
where μ is the mean and σ the standard deviation of all time-interval data in the training set, yielding a new time interval matrix T' ∈ ℝ^(n×n).
Further, inputting the log template sequence into a BERT model to extract word vectors, thereby obtaining the word vectors of the log template sequence, includes:
treating each log template as a sentence in natural language: for a log template sequence X = {x_1, x_2, ..., x_n}, n is the sequence length and x_i denotes the i-th log template, which is split into a word sequence x_i = {w_1, w_2, ..., w_m}, where m is the sentence length of the log template;
encoding the word sequence with the BERT language model, mapping each word to a d-dimensional vector, so that the log template is represented as a word vector sequence z_i = {v_1, v_2, ..., v_m}, where v_k ∈ ℝ^d.
Further, extracting template vectors from the word vectors of the log template sequence through a one-dimensional convolutional neural network, thereby obtaining log template vectors, includes:
for the log template word vector sequence z_i, letting v_{k:k+j} denote the concatenation of the word vectors from v_k to v_{k+j}; the word vectors are input into the convolutional layer, which contains multiple convolution kernels, and each kernel operates as shown in formula (1):
c_i = f(W · v_{k:k+h-1} + b)   (1)
where W ∈ ℝ^(h×d), h is the height of the convolution kernel, d is its width, b ∈ ℝ is a bias term, and f is the nonlinear activation function ReLU;
each convolution kernel, applied over all word vectors in the log template, yields a new feature map C = {c_1, c_2, ..., c_{m-h+1}}, from which the max-pooling layer takes the maximum value ĉ = max(C) as the feature value produced by that kernel;
concatenating the feature values produced by all convolution kernels gives the log template vector e_i ∈ ℝ^y, where y is the number of convolution kernels.
Further, inputting the log template vectors and the time interval matrix into a Transformer encoder based on the time-interval-aware self-attention mechanism to obtain a log sequence vector includes:
position-encoding the log template vectors: sine and cosine functions generate a vector PE_i ∈ ℝ^y for the log event at each position i in the log sequence, as shown in formula (2):
PE_(i,2t) = sin(i / 10000^(2t/y)),  PE_(i,2t+1) = cos(i / 10000^(2t/y))   (2)
where t = 1, 2, ..., y indexes the different dimensions of the vector; PE_i is added to the log template vector e_i at position i so that the model can learn the relative position information of each log event;
inputting the log template vector sequence with added position information into the time-interval-aware self-attention layer, which adds the time-interval information of log events to the attention computation: the time interval matrix is expanded by one dimension to T' ∈ ℝ^(n×n×1) and multiplied with two parameter matrices W^(Kt), W^(Vt) ∈ ℝ^(1×D_k) to obtain the time-interval key matrix K^t ∈ ℝ^(n×n×D_k) and the time-interval value matrix V^t ∈ ℝ^(n×n×D_v); for the vector e_i in the sequence, the attention of head h is computed as shown in formula (3):
A_h^i = Σ_{j=1..n} softmax( (e_i W_h^Q)(e_j W_h^K + K^t_ij)^T / √D_k ) · (e_j W_h^V + V^t_ij)   (3)
where W_h^Q, W_h^K, W_h^V ∈ ℝ^(y×D_k) are the learnable parameter matrices of the h-th attention head, D_q = D_k = D_v, and H is the number of attention heads; each head uses different learnable parameter matrices, and the results of all heads are concatenated and multiplied with the parameter matrix W^O ∈ ℝ^(H·D_v×y) to obtain a new log template vector representation, as shown in formula (4):
z_i = Concat(A_1, A_2, ..., A_H) W^O   (4);
inputting the log template vectors after attention computation into a feed-forward fully-connected layer and applying two linear transformations, as shown in formula (5), to obtain the final log sequence vector:
r_i = max(0, z_i · W_1 + b_1) W_2 + b_2   (5)
where W_1, W_2 and b_1, b_2 are the weights and biases of the two linear transformations; the final log sequence vector representation is R = {r_1, r_2, ..., r_n}, r_i ∈ ℝ^y.
Further, inputting the log sequence vector into a classifier based on a fully-connected neural network to obtain the classification result of the log sequence, performing supervised training of the whole model, and calculating the loss and updating the model parameters based on the detection result, includes:
inputting the log sequence vector into the classifier based on the fully-connected neural network and outputting the detection result, as shown in formula (6):
pre = Softmax(R · W_s + b_s)   (6)
where W_s and b_s are the weights and bias terms of the fully-connected neural network, and pre is the probability of normal and abnormal output by the model;
calculating the loss between the model's detection result and the label provided in the dataset with a cross-entropy loss function, as shown in formula (7):
L = -Σ_i [ y_i · log(pre_i) + (1 - y_i) · log(1 - pre_i) ]   (7)
where y_i is the ground-truth label of the i-th log sequence; and
back-propagating to update the parameters through the Adam optimizer, the parameters to be updated including those in the CNN, the Transformer encoder, and the fully-connected neural network.
Further, the step of inputting the log set to be detected into the trained log sequence anomaly detection model based on the time-interval-aware self-attention mechanism to perform anomaly detection includes:
acquiring the log set to be detected and parsing it to generate a log template sequence and a timestamp sequence; calculating the relative time interval between the log events in the sequence from the timestamp sequence, thereby obtaining a time interval matrix;
inputting the log template sequence into the BERT model to extract word vectors, thereby obtaining the word vectors of the log template sequence;
extracting template vectors from the word vectors of the log template sequence through the one-dimensional convolutional neural network to obtain log template vectors; judging whether the log template sequence still contains unprocessed log templates and, if so, performing word vector extraction and template vector extraction on the next log template until all log templates are processed;
inputting the log template vectors and the time interval matrix into the Transformer encoder based on the time-interval-aware self-attention mechanism to obtain the log sequence vector; and
inputting the log sequence vector into the classifier based on the fully-connected neural network to output the detection result.
Compared with the prior art, the invention has the following advantages:
The invention provides a log sequence anomaly detection method based on a time-interval-aware self-attention mechanism. To improve detection accuracy, the time-interval-aware self-attention mechanism is introduced into the process of extracting log sequence features with a Transformer encoder; the time-interval information between logs strengthens the model's ability to capture the association between logs, so that it can learn how the time interval between the logs in a sequence affects anomaly detection. In addition, on the basis of word semantic representations extracted by the pre-trained BERT language model, the invention aggregates the word vectors with a CNN to generate the vector representation of each log template, learning semantic information about words and their contexts at different scales, so that the model can adapt to word changes in log statements during software updates, improving the robustness of anomaly detection. Experimental results on two log datasets show that the method outperforms most existing log-sequence-based anomaly detection methods.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a log sequence anomaly detection method based on a time interval aware self-attention mechanism according to the present invention.
Fig. 2 is a schematic diagram of the LogT framework in the example.
Fig. 3 is a schematic diagram of the LogT training process in the embodiment.
Fig. 4 is a schematic diagram of a detection process of the LogT in the embodiment.
FIG. 5 shows the effect of different numbers of heads of attention on the detection performance in the HDFS data set of the embodiment.
Fig. 6 shows the effect of different numbers of attention heads in the BGL data set on detection performance in an embodiment.
FIG. 7 shows the performance of the different methods on the HDFS data set in the example.
Fig. 8 shows the performance of the different methods on the BGL data set in an embodiment.
FIG. 9 is a diagram illustrating log update in an embodiment.
Fig. 10 is a comparison graph of the robustness of the different methods in the examples.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1 to 4, the present invention provides a log sequence anomaly detection method based on a time-interval-aware self-attention mechanism (LogT), which mainly includes:
S1, acquiring a log set for training and parsing it to generate a log template sequence and a timestamp sequence; and calculating the relative time interval between the log events in the sequence from the timestamp sequence, thereby obtaining a time interval matrix.
The goal of log parsing is to convert unstructured log text into structured data [11]. The method selects the well-performing Drain algorithm to parse the logs. Its core idea is to build a parse tree of fixed depth from the log data and to parse each log according to the template extraction rules contained in the tree, extracting the log template and the timestamp from the unstructured log text.
According to the characteristics of the log dataset, the log sequence is preferably partitioned by session id or by a sliding window, yielding the log template sequence and the timestamp sequence; a minimal sketch of this step follows.
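As an illustration only, the sketch below assumes the open-source drain3 package as the Drain implementation and a space-delimited epoch timestamp at the head of each raw line; both assumptions are illustrative rather than prescribed by the patent.

```python
# Hypothetical sketch of the parsing and sequencing step (S1).
from drain3 import TemplateMiner

template_miner = TemplateMiner()

def parse_logs(raw_lines):
    """Extract (timestamp, template) pairs from unstructured log lines.

    Assumes each line starts with a numeric epoch timestamp; real datasets
    (HDFS, BGL) each need their own timestamp format handling.
    """
    events = []
    for line in raw_lines:
        ts_str, _, message = line.partition(" ")
        result = template_miner.add_log_message(message)
        events.append((float(ts_str), result["template_mined"]))
    return events

def sliding_windows(events, size=20):
    """Partition the event stream into fixed-size windows (BGL-style);
    HDFS would instead be grouped by the block/session id in the message."""
    return [events[i:i + size] for i in range(0, len(events) - size + 1, size)]
```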
Further, calculating the relative time interval between the log events in the sequence from the timestamp sequence, thereby obtaining a time interval matrix, includes:
for a timestamp sequence T = {t_1, t_2, ..., t_n}, calculating the time difference and taking its absolute value, t_ij = |t_i - t_j|, as the relative time interval between the logs at the i-th and j-th positions, and then normalizing it:
t'_ij = (t_ij - μ) / σ
where μ is the mean and σ the standard deviation of all time-interval data in the training set, yielding the new time interval matrix T' ∈ ℝ^(n×n).
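A minimal sketch of this computation, assuming NumPy; passing the training-set statistics mu and sigma in explicitly is an implementation choice, not part of the patent:

```python
import numpy as np

def time_interval_matrix(timestamps, mu, sigma):
    """Pairwise absolute timestamp differences, z-score normalized."""
    t = np.asarray(timestamps, dtype=np.float64)   # shape (n,)
    intervals = np.abs(t[:, None] - t[None, :])    # t_ij = |t_i - t_j|
    return (intervals - mu) / sigma                # normalized matrix, shape (n, n)
```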
S2, inputting the log template sequence and the time interval matrix into the anomaly detection model based on the time-interval-aware self-attention mechanism for supervised training.
In a specific implementation, as a preferred embodiment of the present invention, the process of training the anomaly detection model based on the time-interval-aware self-attention mechanism in step S2 includes:
S201, inputting the log template sequence into the BERT model to extract word vectors, thereby obtaining the word vectors of the log template sequence.
In the present invention, to capture the semantic information of log events, each log template is regarded as a sentence in natural language. For a log template sequence X = {x_1, x_2, ..., x_n}, n is the sequence length and x_i denotes the i-th log template, which is split into a word sequence x_i = {w_1, w_2, ..., w_m}, where m is the sentence length of the log template. The word sequence is encoded by the BERT language model, each word is mapped to a d-dimensional vector, and the log template is represented as a word vector sequence z_i = {v_1, v_2, ..., v_m}, where v_k ∈ ℝ^d.
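A sketch of this step, assuming the HuggingFace transformers package and the public bert-base-uncased checkpoint (the patent does not name a specific BERT variant):

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def template_word_vectors(template: str) -> torch.Tensor:
    """Map one log template (treated as a sentence) to its word vector
    sequence z_i = {v_1, ..., v_m}; d = 768 for BERT-base."""
    enc = tokenizer(template, return_tensors="pt", truncation=True)
    hidden = bert(**enc).last_hidden_state   # (1, m + 2, 768) incl. [CLS]/[SEP]
    return hidden[0, 1:-1]                   # (m, 768): one vector per token
```

Note that BERT's WordPiece tokenizer may split rare words into several sub-tokens, so m here counts tokens rather than whitespace-delimited words.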
S202, extracting template vectors from the word vectors of the log template sequence through a one-dimensional convolutional neural network to obtain log template vectors; judging whether the log template sequence still contains unprocessed log templates and, if so, performing word vector extraction and template vector extraction on the next log template until all log templates are processed.
Specifically, for the log template word vector sequence z_i, let v_{k:k+j} denote the concatenation of the word vectors from v_k to v_{k+j}. The word vectors are input into the convolutional layer, which contains multiple convolution kernels, and each kernel operates as shown in formula (1):
c_i = f(W · v_{k:k+h-1} + b)   (1)
where W ∈ ℝ^(h×d), h is the height of the convolution kernel, d is its width, b ∈ ℝ is a bias term, and f is the nonlinear activation function ReLU.
Each convolution kernel, applied over all word vectors in the log template, yields a new feature map C = {c_1, c_2, ..., c_{m-h+1}}, from which the max-pooling layer takes the maximum value ĉ = max(C) as the feature value produced by that kernel.
Concatenating the feature values produced by all convolution kernels gives the log template vector e_i ∈ ℝ^y, where y is the number of convolution kernels.
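A minimal TextCNN-style sketch of this aggregation, using the kernel heights 1, 2, and 3 with 100 kernels each from the experimental settings below (so y = 300); exposing d and the kernel counts as constructor arguments is an implementation choice:

```python
import torch
import torch.nn as nn

class TemplateCNN(nn.Module):
    """Aggregate a template's word vectors (m, d) into one template vector (y,)."""
    def __init__(self, d=768, heights=(1, 2, 3), kernels_per_height=100):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(d, kernels_per_height, kernel_size=h) for h in heights]
        )

    def forward(self, word_vectors: torch.Tensor) -> torch.Tensor:
        # word_vectors: (m, d) -> (1, d, m) for Conv1d; assumes m >= max height
        x = word_vectors.t().unsqueeze(0)
        # formula (1) plus max-pooling: ReLU conv features, max over positions
        feats = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return torch.cat(feats, dim=1).squeeze(0)   # (y,) with y = 3 * 100
```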
Further, it is judged whether the log template sequence still contains unprocessed log templates; if so, S201 is executed, and if not, S203 is executed.
S203, inputting the log template vectors and the time interval matrix into the Transformer encoder based on the time-interval-aware self-attention mechanism to obtain the log sequence vector.
First, the log template vectors are position-encoded: sine and cosine functions generate a vector PE_i ∈ ℝ^y for the log event at each position i in the log sequence, as shown in formula (2):
PE_(i,2t) = sin(i / 10000^(2t/y)),  PE_(i,2t+1) = cos(i / 10000^(2t/y))   (2)
where t = 1, 2, ..., y indexes the different dimensions of the vector; PE_i is added to the log template vector e_i at position i so that the model can learn the relative position information of each log event.
Secondly, the log template vector sequence with added position information is input into the time-interval-aware self-attention layer, which adds the time-interval information of log events to the attention computation: the time interval matrix is expanded by one dimension to T' ∈ ℝ^(n×n×1) and multiplied with two parameter matrices W^(Kt), W^(Vt) ∈ ℝ^(1×D_k) to obtain the time-interval key matrix K^t ∈ ℝ^(n×n×D_k) and the time-interval value matrix V^t ∈ ℝ^(n×n×D_v). For the vector e_i in the sequence, the attention of head h is computed as shown in formula (3):
A_h^i = Σ_{j=1..n} softmax( (e_i W_h^Q)(e_j W_h^K + K^t_ij)^T / √D_k ) · (e_j W_h^V + V^t_ij)   (3)
where W_h^Q, W_h^K, W_h^V ∈ ℝ^(y×D_k) are the learnable parameter matrices of the h-th attention head, D_q = D_k = D_v, and H is the number of attention heads; each head uses different learnable parameter matrices, and the results of all heads are concatenated and multiplied with the parameter matrix W^O ∈ ℝ^(H·D_v×y) to obtain a new log template vector representation, as shown in formula (4):
z_i = Concat(A_1, A_2, ..., A_H) W^O   (4).
Finally, the log template vectors after attention computation are input into a feed-forward fully-connected layer and two linear transformations are applied, as shown in formula (5), to obtain the final log sequence vector:
r_i = max(0, z_i · W_1 + b_1) W_2 + b_2   (5)
where W_1, W_2 and b_1, b_2 are the weights and biases of the two linear transformations. The final log sequence vector representation is R = {r_1, r_2, ..., r_n}, r_i ∈ ℝ^y.
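The following single-head sketch shows how the time-interval matrix can enter the attention computation as learned key and value offsets; multi-head splitting, position encoding, and the feed-forward layer are omitted, and the exact way the offsets combine with the keys and values is reconstructed from the description above rather than quoted from it:

```python
import math
import torch
import torch.nn as nn

class IntervalAwareAttention(nn.Module):
    """One head of time-interval-aware self-attention (a sketch)."""
    def __init__(self, y=300, d_k=64):
        super().__init__()
        self.d_k = d_k
        self.wq = nn.Linear(y, d_k, bias=False)    # W^Q
        self.wk = nn.Linear(y, d_k, bias=False)    # W^K
        self.wv = nn.Linear(y, d_k, bias=False)    # W^V
        self.wkt = nn.Linear(1, d_k, bias=False)   # W^Kt: interval -> key offset
        self.wvt = nn.Linear(1, d_k, bias=False)   # W^Vt: interval -> value offset

    def forward(self, e: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # e: (n, y) template vectors (position encoding already added)
        # t: (n, n) normalized time-interval matrix
        q, k, v = self.wq(e), self.wk(e), self.wv(e)   # each (n, d_k)
        kt = self.wkt(t.unsqueeze(-1))                 # K^t: (n, n, d_k)
        vt = self.wvt(t.unsqueeze(-1))                 # V^t: (n, n, d_k)
        scores = (q.unsqueeze(1) * (k.unsqueeze(0) + kt)).sum(-1)   # (n, n)
        attn = torch.softmax(scores / math.sqrt(self.d_k), dim=-1)
        return (attn.unsqueeze(-1) * (v.unsqueeze(0) + vt)).sum(dim=1)  # (n, d_k)
```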
S204, inputting the log sequence vector into the classifier based on the fully-connected neural network to output the classification result and carry out supervised training of the whole model.
Specifically, the log sequence vector is input into the fully-connected neural network with a Softmax function, the detection result is output, the loss is calculated, and the model parameters are updated, as shown in formula (6):
pre = Softmax(R · W_s + b_s)   (6)
where W_s and b_s are the weights and bias terms of the fully-connected neural network, and pre is the probability of normal and abnormal output by the model.
The loss between the model's detection result and the label provided in the dataset is calculated with a cross-entropy loss function, as shown in formula (7):
L = -Σ_i [ y_i · log(pre_i) + (1 - y_i) · log(1 - pre_i) ]   (7)
where y_i is the ground-truth label of the i-th log sequence.
The parameters are updated by back-propagation through the Adam optimizer; the parameters to be updated include those in the CNN, the Transformer encoder, and the fully-connected neural network.
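A hedged sketch of the classification head and one training step follows; how the n-by-y matrix R is reduced to a single prediction before the Softmax is not spelled out in the text, so the mean-pooling used here is an assumption:

```python
import torch
import torch.nn as nn

classifier = nn.Linear(300, 2)   # W_s, b_s: y = 300 inputs, normal/abnormal logits
# In the full model, the Adam optimizer would also cover the CNN and
# Transformer encoder parameters; only the classifier is shown here.
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()   # cross-entropy loss, as in formula (7)

def train_step(r: torch.Tensor, label: torch.Tensor) -> float:
    """r: (n, y) log sequence vectors; label: scalar long tensor, 0=normal, 1=abnormal."""
    logits = classifier(r.mean(dim=0))        # assumed pooling over positions
    loss = loss_fn(logits.unsqueeze(0), label.unsqueeze(0))
    optimizer.zero_grad()
    loss.backward()                           # back-propagation
    optimizer.step()                          # Adam parameter update
    return loss.item()
```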
S205, judging whether the training set still contains sequences that have not been used for training; if so, go to S2; otherwise, training ends.
S3, acquiring the log set to be detected and inputting it into the trained anomaly detection model based on the time-interval-aware self-attention mechanism for anomaly detection.
As a preferred embodiment of the present invention, the detection step includes:
S301, acquiring the log set to be detected and parsing it to generate a log template sequence and a timestamp sequence, and calculating the relative time interval between the log events in the sequence from the timestamp sequence, thereby obtaining a time interval matrix. This step is performed in the same way as S1 and is not repeated here.
S302, inputting the log template sequence into the BERT model to extract word vectors, thereby obtaining the word vectors of the log template sequence. This step is performed in the same way as S201 and is not repeated here.
S303, extracting template vectors from the word vectors of the log template sequence through the one-dimensional convolutional neural network to obtain log template vectors; judging whether the log template sequence still contains unprocessed log templates and, if so, performing word vector extraction and template vector extraction on the next log template until all log templates are processed. This step is performed in the same way as S202 and is not repeated here.
S304, inputting the log template vectors and the time interval matrix into the Transformer encoder to obtain the log sequence vector. This step is performed in the same way as S203 and is not repeated here.
S305, inputting the log sequence vector into the classifier based on the fully-connected neural network and outputting the detection result.
S306, judging whether the log set to be detected still contains sequences that have not been detected; if so, go to S302; otherwise, detection ends.
The scheme and effect of the present invention will be further explained by specific application examples.
The algorithm of this patent is compared with various current state-of-the-art algorithms in terms of both the accuracy and the robustness of log sequence anomaly detection, illustrating its beneficial effects.
(1) Experimental Environment
The experiments were all performed on an NVIDIA Tesla V100 GPU server. The model was built with PyTorch in a Python 3.6 environment; the LogT model was trained with the Adam optimizer, the cross-entropy function was used as the loss function during training, and training terminated after 10 iterations.
(2) Data set
In 2019, Shilin He et al. published a large number of logs from 16 different systems on Loghub. The invention selects two representative log datasets, HDFS and BGL; their details are shown in Table 1. The HDFS dataset was generated by running Hadoop on more than 200 Amazon EC2 nodes and is a common benchmark for log-based anomaly detection; it contains 11,175,629 raw log messages in total, and all 575,061 sessions are given labels indicating their normal or abnormal state. The BGL dataset was collected from the BlueGene/L supercomputer system at Lawrence Livermore National Laboratory (LLNL) in Livermore, California; it contains 4,747,963 raw log messages in total, each labeled as normal or abnormal. In the experiments, the first 80% of the data, ordered by log timestamp, is taken as training data and the remaining 20% as test data. The sequences of the HDFS dataset are partitioned by block_id, and the sequences of the BGL dataset are framed with a sliding window of size 20. Since the labels of both the HDFS and BGL datasets are manually annotated, these labels are factored into the evaluation.
Table 1 Details of the datasets: HDFS contains 11,175,629 log messages grouped into 575,061 labeled sessions; BGL contains 4,747,963 individually labeled log messages.
(3) Baseline method
The invention selects PCA, SVM, DeepLog, LogRobust, and NeuralLog as the baseline methods for the comparative experiments.
PCA gathers statistics on the event states and occurrence counts expressed in the logs by analyzing source code, and performs unsupervised anomaly detection with a state-ratio vector and an event-count vector as model inputs. SVM vectorizes log sequences by the number and distribution of the various log levels within a sliding window and detects anomalies through supervised training. DeepLog uses log-template-index encoding and an LSTM that learns the sequential relationships among normal logs to detect anomalies. LogRobust extracts word vectors with FastText, aggregates them with TF-IDF to generate log template vectors, and feeds these into an attention-based Bi-LSTM model to detect anomalies. NeuralLog extracts word vectors with BERT, obtains template vectors by averaging the word vectors, and feeds them into a Transformer-based model for anomaly detection.
(4) Evaluation index
Anomaly detection is a binary classification problem. The invention uses the widely adopted metrics precision, recall, and F1-score to evaluate the detection accuracy of LogT and each baseline method.
Precision = TP / (TP + FP) is the percentage of log sequences judged anomalous by the model that are truly anomalous.
Recall = TP / (TP + FN) is the percentage of all anomalous log sequences that the model correctly identifies as anomalous.
F1-score = (2 × Precision × Recall) / (Precision + Recall) is the harmonic mean of precision and recall.
Here TP is the number of anomalous log sequences the model detects correctly, FP is the number of normal log sequences the model wrongly identifies as anomalous, and FN is the number of anomalous log sequences the model fails to detect.
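For concreteness, a small helper computing the three metrics from these counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1-score from TP/FP/FN counts (0.0 on empty denominators)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# e.g. precision_recall_f1(95, 5, 10) -> (0.95, 0.904..., 0.926...)
```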
(5) Experimental parameter settings
For the one-dimensional convolutional layer, following prior experience with convolutional neural networks in natural language processing [14-15], kernels of three different heights (1, 2, and 3) are used, with 100 kernels of each height, ensuring that the network can capture information about words and their contexts at different scales. For the number of attention heads in the self-attention mechanism, the results of repeated experiments (shown in figs. 5 and 6) led to setting the number of heads to 4 for both the HDFS and BGL datasets, balancing model complexity and accuracy.
(6) Set of experimental results comparative analysis
LogT is compared experimentally with the baseline methods to verify its advantages in both the accuracy and the robustness of log sequence anomaly detection.
1) Rate of accuracy
Figs. 7 and 8 compare the method with the baseline methods in terms of accuracy on the HDFS and BGL datasets, respectively.
As can be seen from figs. 7 and 8, LogT achieves the highest accuracy of the compared methods on both datasets, with an F1-score of 0.98 on both HDFS and BGL. PCA derives the statistical features of the log data by analyzing source code, cannot serve as a general anomaly detection method, and performs poorly on both datasets. SVM reaches a precision of 0.98 on the BGL dataset, but its low recall hurts its detection performance, and its results on the HDFS dataset are also unsatisfactory, because it considers only the statistical features of the log data and not the sequential information of the logs. DeepLog achieves a high precision of 0.96 on the HDFS dataset, but its recall is low and its performance on the BGL dataset is poor, because its log-template-index encoding cannot capture the semantic information of the logs. LogRobust achieves high recall on both datasets, but at the cost of low precision, which means many false alarms. NeuralLog, which combines BERT with a Transformer model, improves both precision and recall over the preceding methods on the two datasets. LogT, through its time-interval-aware self-attention mechanism, exploits the time intervals between logs during system operation to achieve higher precision and recall on the HDFS dataset and higher recall on the BGL dataset, so it detects more anomalies while avoiding false alarms, improving the accuracy of anomaly detection.
2) Robustness
As a system or service is upgraded, developers often insert or delete words in log statements while updating the system source code. Such updates to log statements can affect the accuracy of anomaly detection, so improving the robustness of the model against log template updates becomes important. To compare the robustness of LogT and the baseline methods, the invention applies modifications to the original HDFS dataset according to the log update rules proposed by Zhang et al. [6]. The modified log statements do not significantly change the semantics of the original statements, so the corresponding anomaly labels are unaffected; a concrete example of a log update is shown in fig. 9. Anomaly detection is then performed again on HDFS datasets updated in different proportions, and the F1-scores of the different methods are compared in fig. 10.
As can be seen from fig. 10, the F1-score of DeepLog begins to drop significantly when the update ratio of the logs reaches 5%; the F1-scores of SVM, PCA, and NeuralLog drop significantly at 15%; and LogRobust drops significantly at 25%. The F1-score of LogT is only slightly affected and remains high even when the update ratio reaches 30%. The LogT method proposed by the invention thus obtains log template vectors with BERT and the CNN so that the model can better learn the semantic information and contextual features of unseen words, adapts to situations where the words in logs are updated, and is more robust than the other baseline methods.
(7) Conclusion
The invention provides a log sequence anomaly detection method (LogT) based on a time-interval-aware self-attention mechanism. It builds a hierarchical network composed of a CNN and a Transformer to obtain log features at different levels: the CNN captures multi-scale features of words and attends more carefully to the associations between the words within a log statement, so the model can adapt to log statement updates and the robustness of anomaly detection improves. At the same time, when extracting the features of a log sequence, the method considers not only the sequential information of the logs but also the time information they contain: through the time-interval-aware self-attention mechanism, the time intervals between logs are fed into the self-attention computation together with the semantic information to obtain the association information between logs, strengthening the expressiveness of the extracted features and improving the anomaly detection effect. Experimental evaluation on large system log datasets shows that LogT achieves better anomaly detection than current mainstream methods. Future work will further explore feature extraction for log sequence anomaly detection, consider fusing more valuable information from the logs into the anomaly detection model so that the differences between normal and anomalous log sequences can be distinguished effectively, and test the method's efficiency on more log datasets to verify its generality.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A log sequence anomaly detection method based on a time-interval-aware self-attention mechanism, characterized by comprising the following steps:
acquiring a log set for training and parsing it to generate a log template sequence and a timestamp sequence; calculating the relative time interval between the log events in the sequence from the timestamp sequence, thereby obtaining a time interval matrix;
inputting the log template sequence and the time interval matrix into an anomaly detection model based on a time-interval-aware self-attention mechanism to carry out supervised training of the whole model; and
acquiring a log set to be detected and inputting it into the trained log sequence anomaly detection model based on the time-interval-aware self-attention mechanism to perform anomaly detection.
2. The log sequence anomaly detection method based on the time-interval-aware self-attention mechanism according to claim 1, wherein the training step of the log sequence anomaly detection model based on the time-interval-aware self-attention mechanism comprises:
inputting the log template sequence into a BERT model to extract word vectors, thereby obtaining the word vectors of the log template sequence;
extracting template vectors from the word vectors of the log template sequence through a one-dimensional convolutional neural network to obtain log template vectors; judging whether the log template sequence still contains unprocessed log templates and, if so, performing word vector extraction and template vector extraction on the next log template until all log templates are processed;
inputting the log template vectors and the time interval matrix into a Transformer encoder to obtain a log sequence vector; and
inputting the log sequence vector into a classifier based on a fully-connected neural network to output a detection result, calculating the loss based on the detection result, and updating the model parameters.
3. The log sequence anomaly detection method based on the time-interval-aware self-attention mechanism according to claim 1 or 2, wherein calculating the relative time interval between the log events in the sequence from the timestamp sequence, thereby obtaining a time interval matrix, comprises:
for a timestamp sequence T = {t_1, t_2, ..., t_n}, calculating the time difference and taking its absolute value, t_ij = |t_i - t_j|, as the relative time interval between the logs at the i-th and j-th positions, and then normalizing it:
t'_ij = (t_ij - μ) / σ
where μ is the mean and σ the standard deviation of all time-interval data in the training set, yielding a new time interval matrix T' ∈ ℝ^(n×n).
4. The log sequence anomaly detection method based on the time-interval-aware self-attention mechanism according to claim 1 or 2, wherein inputting the log template sequence into a BERT model to extract word vectors, thereby obtaining the word vectors of the log template sequence, comprises:
treating each log template as a sentence in natural language: for a log template sequence X = {x_1, x_2, ..., x_n}, n is the sequence length and x_i denotes the i-th log template, which is split into a word sequence x_i = {w_1, w_2, ..., w_m}, where m is the sentence length of the log template;
encoding the word sequence with the BERT language model, mapping each word to a d-dimensional vector, so that the log template is represented as a word vector sequence z_i = {v_1, v_2, ..., v_m}, where v_k ∈ ℝ^d.
5. The log sequence anomaly detection method based on the time-interval-aware self-attention mechanism according to claim 4, wherein extracting template vectors from the word vectors of the log template sequence through a one-dimensional convolutional neural network to obtain log template vectors comprises:
for the log template word vector sequence z_i, letting v_{k:k+j} denote the concatenation of the word vectors from v_k to v_{k+j}; the word vectors are input into the convolutional layer, which contains multiple convolution kernels, and each kernel operates as shown in formula (1):
c_i = f(W · v_{k:k+h-1} + b)   (1)
where W ∈ ℝ^(h×d), h is the height of the convolution kernel, d is its width, b ∈ ℝ is a bias term, and f is the nonlinear activation function ReLU;
each convolution kernel, applied over all word vectors in the log template, yields a new feature map C = {c_1, c_2, ..., c_{m-h+1}}, from which the max-pooling layer takes the maximum value ĉ = max(C) as the feature value produced by that kernel;
concatenating the feature values produced by all convolution kernels gives the log template vector e_i ∈ ℝ^y, where y is the number of convolution kernels.
6. The log sequence anomaly detection method based on the time-interval-aware self-attention mechanism according to claim 5, wherein inputting the log template vectors and the time interval matrix into a Transformer encoder based on the time-interval-aware self-attention mechanism to obtain a log sequence vector comprises:
position-encoding the log template vectors: sine and cosine functions generate a vector PE_i ∈ ℝ^y for the log event at each position i in the log sequence, as shown in formula (2):
PE_(i,2t) = sin(i / 10000^(2t/y)),  PE_(i,2t+1) = cos(i / 10000^(2t/y))   (2)
where t = 1, 2, ..., y indexes the different dimensions of the vector; PE_i is added to the log template vector e_i at position i so that the model can learn the relative position information of each log event;
inputting the log template vector sequence with added position information into the time-interval-aware self-attention layer, which adds the time-interval information of log events to the attention computation: the time interval matrix is expanded by one dimension to T' ∈ ℝ^(n×n×1) and multiplied with two parameter matrices W^(Kt), W^(Vt) ∈ ℝ^(1×D_k) to obtain the time-interval key matrix K^t ∈ ℝ^(n×n×D_k) and the time-interval value matrix V^t ∈ ℝ^(n×n×D_v); for the vector e_i in the sequence, the attention of head h is computed as shown in formula (3):
A_h^i = Σ_{j=1..n} softmax( (e_i W_h^Q)(e_j W_h^K + K^t_ij)^T / √D_k ) · (e_j W_h^V + V^t_ij)   (3)
where W_h^Q, W_h^K, W_h^V ∈ ℝ^(y×D_k) are the learnable parameter matrices of the h-th attention head, D_q = D_k = D_v, and H is the number of attention heads; each head uses different learnable parameter matrices, and the results of all heads are concatenated and multiplied with the parameter matrix W^O ∈ ℝ^(H·D_v×y) to obtain a new log template vector representation, as shown in formula (4):
z_i = Concat(A_1, A_2, ..., A_H) W^O   (4);
inputting the log template vectors after attention computation into a feed-forward fully-connected layer and applying two linear transformations, as shown in formula (5), to obtain the final log sequence vector:
r_i = max(0, z_i · W_1 + b_1) W_2 + b_2   (5)
where W_1, W_2 and b_1, b_2 are the weights and biases of the two linear transformations; the final log sequence vector representation is R = {r_1, r_2, ..., r_n}, r_i ∈ ℝ^y.
7. The log sequence anomaly detection method based on the time-interval-aware self-attention mechanism according to claim 6, wherein inputting the log sequence vector into a classifier based on a fully-connected neural network to obtain the classification result of the log sequence, performing supervised training of the whole model, and calculating the loss and updating the model parameters based on the detection result comprises:
inputting the log sequence vector into the classifier based on the fully-connected neural network and outputting the detection result, as shown in formula (6):
pre = Softmax(R · W_s + b_s)   (6)
where W_s and b_s are the weights and bias terms of the fully-connected neural network, and pre is the probability of normal and abnormal output by the model;
calculating the loss between the model's detection result and the label provided in the dataset with a cross-entropy loss function, as shown in formula (7):
L = -Σ_i [ y_i · log(pre_i) + (1 - y_i) · log(1 - pre_i) ]   (7)
where y_i is the ground-truth label of the i-th log sequence; and
back-propagating to update the parameters through the Adam optimizer, the parameters to be updated including those in the CNN, the Transformer encoder, and the fully-connected neural network.
8. The log sequence anomaly detection method based on the time-interval-aware self-attention mechanism according to claim 1, wherein the step of inputting the log set to be detected into the trained log sequence anomaly detection model based on the time-interval-aware self-attention mechanism to perform anomaly detection comprises:
acquiring the log set to be detected and parsing it to generate a log template sequence and a timestamp sequence; calculating the relative time interval between the log events in the sequence from the timestamp sequence, thereby obtaining a time interval matrix;
inputting the log template sequence into a BERT model to extract word vectors, thereby obtaining the word vectors of the log template sequence;
extracting template vectors from the word vectors of the log template sequence through a one-dimensional convolutional neural network to obtain log template vectors; judging whether the log template sequence still contains unprocessed log templates and, if so, performing word vector extraction and template vector extraction on the next log template until all log templates are processed;
inputting the log template vectors and the time interval matrix into a Transformer encoder based on the time-interval-aware self-attention mechanism to obtain a log sequence vector; and
inputting the log sequence vector into a classifier based on a fully-connected neural network to output a detection result.
CN202211339210.4A 2022-10-28 2022-10-28 Log sequence anomaly detection method based on time interval perception self-attention mechanism Pending CN115617614A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211339210.4A CN115617614A (en) 2022-10-28 2022-10-28 Log sequence anomaly detection method based on time interval perception self-attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211339210.4A CN115617614A (en) 2022-10-28 2022-10-28 Log sequence anomaly detection method based on time interval perception self-attention mechanism

Publications (1)

Publication Number Publication Date
CN115617614A true CN115617614A (en) 2023-01-17

Family

ID=84876547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211339210.4A Pending CN115617614A (en) 2022-10-28 2022-10-28 Log sequence anomaly detection method based on time interval perception self-attention mechanism

Country Status (1)

Country Link
CN (1) CN115617614A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115794480A (en) * 2023-02-10 2023-03-14 成都工业学院 System abnormal log detection method and system based on log semantic encoder
CN117687890A (en) * 2024-02-02 2024-03-12 山东大学 Abnormal operation identification method, system, medium and equipment based on operation log
CN117687890B (en) * 2024-02-02 2024-05-03 山东大学 Abnormal operation identification method, system, medium and equipment based on operation log

Similar Documents

Publication Publication Date Title
Du et al. Lifelong anomaly detection through unlearning
CN114610515B (en) Multi-feature log anomaly detection method and system based on log full semantics
CN114169330B (en) Chinese named entity recognition method integrating time sequence convolution and transform encoder
US11132248B2 (en) Automated information technology system failure recommendation and mitigation
US20180314835A1 (en) Anomaly and Causation Detection in Computing Environments
CN115617614A (en) Log sequence anomaly detection method based on time interval perception self-attention mechanism
CN116629275B (en) Intelligent decision support system and method based on big data
CN113434357A (en) Log abnormity detection method and device based on sequence prediction
CN107003992B (en) Perceptual associative memory for neural language behavior recognition systems
CN113779272A (en) Data processing method, device and equipment based on knowledge graph and storage medium
CN115794480A (en) System abnormal log detection method and system based on log semantic encoder
Zhou et al. Deepsyslog: Deep anomaly detection on syslog using sentence embedding and metadata
Wang et al. Research on anomaly detection and real-time reliability evaluation with the log of cloud platform
CN116561748A (en) Log abnormality detection device for component subsequence correlation sensing
CN114785606A (en) Log anomaly detection method based on pre-training LogXLNET model, electronic device and storage medium
CN117874662A (en) Micro-service log anomaly detection method based on graph mode
An et al. Real-time Statistical Log Anomaly Detection with Continuous AIOps Learning.
CN116956289B (en) Method for dynamically adjusting potential blacklist and blacklist
CN117688488A (en) Log anomaly detection method based on semantic vectorization representation
US20230376758A1 (en) Multi-modality root cause localization engine
CN117271701A (en) Method and system for extracting system operation abnormal event relation based on TGGAT and CNN
CN114969334B (en) Abnormal log detection method and device, electronic equipment and readable storage medium
Li et al. Glad: Content-aware dynamic graphs for log anomaly detection
WO2018098009A1 (en) Improved automated nonparametric content analysis for information management and retrieval
Yuan et al. PVE: A log parsing method based on VAE using embedding vectors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination