CN118312393A - Log sequence anomaly detection method based on attention domain adaptive transfer learning - Google Patents

Log sequence anomaly detection method based on attention domain adaptive transfer learning

Info

Publication number
CN118312393A
CN118312393A
Authority
CN
China
Prior art keywords
log
domain
vector
sequence
template
Prior art date
Legal status
Pending
Application number
CN202410123465.XA
Other languages
Chinese (zh)
Inventor
曹志英
李旺旺
张秀国
王凯月
Current Assignee
Dalian Maritime University
Original Assignee
Dalian Maritime University
Filing date
2024-01-29
Publication date
2024-07-09
Application filed by Dalian Maritime University filed Critical Dalian Maritime University
Publication of CN118312393A

Abstract

The invention provides a log sequence anomaly detection method based on attention domain adaptive transfer learning, which comprises the following steps: S1, obtaining semantic vectors of log templates by adopting BERT; S2, performing vector optimization on the semantic vectors of the log templates by adopting Normalizing Flow; S3, extracting vectors of the log templates optimized in S2 by using a GRU; S4, training a source domain log sequence anomaly detection model by using the GRU network based on the vectors of the log templates acquired in S3; S5, judging whether training over the source domain log sequences is complete; S6, judging whether the model training result meets expectations; S7, calculating the domain difference of the source domain and target domain log data; S8, performing transfer learning training on the target domain log data by using the optimal source domain log sequence anomaly detection model based on the domain difference; and S9, judging whether the target domain log sequence transfer learning meets expectations. The method can accurately capture the semantic feature information of log datasets and achieves highly accurate detection results on unlabeled log datasets.

Description

Log sequence anomaly detection method based on attention domain adaptive transfer learning
Technical Field
The invention relates to the technical field of computer data processing, in particular to a log sequence anomaly detection method based on attention domain adaptive transfer learning.
Background
The system log records data about system states and major events at various key points; it is present in almost all computer systems and is a precious resource for understanding system status. Collecting, processing, and analyzing log data facilitates debugging of system performance problems and faults and supports root cause analysis. With the increasing size and complexity of modern systems, manually examining logs to find anomalies has become impractical. Therefore, many automatic log sequence anomaly detection methods have been proposed, mainly classified into supervised, semi-supervised, and unsupervised methods.
In supervised log sequence anomaly detection, an anomaly detection model must be trained from labeled log data; labeled training data are the precondition for supervised anomaly detection, and the more labeled training data, the more accurate the model. Liang et al. use a Support Vector Machine (SVM) for supervised training to detect anomalous log sequences: if a log sequence is located above the hyperplane, it is considered anomalous. However, this method cannot cope with updates of log message templates and lacks robustness. Moreover, it focuses only on the raw text and makes no use of textual semantic information, so its anomaly detection accuracy is low. Zhang et al. propose LogRobust, which converts the words of a log template into word vectors by natural language processing techniques, generates log template vectors through TF-IDF (term frequency-inverse document frequency) based aggregation of word vectors, and detects anomalies with an attention-based Bi-LSTM model. Although this method considers the semantic information contained in log statements, aggregating word vectors by a statistical method cannot cope with word changes in log statements and cannot learn word context features, so its robustness is limited. Meanwhile, the LSTM-based model does not consider the position information of log events, suffers from the long-term dependence problem, and risks losing historical information, which reduces detection accuracy. Le et al. propose NeuralLog, a method that uses BERT to extract word vectors, computes each log template vector as the average of its word vectors, and inputs the sequence of log template vectors into a Transformer-based model for anomaly detection, improving detection accuracy. However, NeuralLog focuses only on the order information of logs; the time information in logs that is useful for the anomaly detection task is not fully utilized, which limits anomaly detection performance.
In semi-supervised log sequence anomaly detection, normal log data are generally used for training: the model learns the patterns of normal log sequences and detects anomalous patterns that deviate from them. Du et al. propose DeepLog, the first method to detect log anomalies using deep learning, but it does not respond in time to updates of the log template and lacks robustness. In addition, it lacks analysis of log semantic features, so its anomaly detection accuracy is low.
In unsupervised log sequence anomaly detection, an anomaly detection model can be trained without any labeled log data. Xu et al. use Principal Component Analysis (PCA) to detect anomalies in log sequences: the method selects suitable variables from the preprocessed logs and groups related messages to construct a state ratio vector and a message count vector, then trains on the feature vectors with the unsupervised PCA method, so its anomaly detection accuracy is low. Zhou et al. propose DeepSyslog, which represents syslog with the context of logged events and event metadata. Inspired by the sequential nature of the log stream, DeepSyslog uses unsupervised sentence embedding to extract the semantic and contextual information hidden in the log stream. However, DeepSyslog requires retraining on each new dataset, so its universality needs improvement.
In log sequence anomaly detection based on transfer learning, Chen et al. propose the semi-supervised LogTransfer method. It trains a source LSTM network and a fully connected network using source system labels until both achieve good performance. For the target system, LogTransfer initializes the target LSTM network from the source LSTM network and fine-tunes it based on the target system labels and the shared fully connected network. This approach requires labels for the target domain and is therefore not applicable to unlabeled target log datasets.
In summary, current log anomaly detection methods have several problems, the most prominent being: (1) Most methods adopt natural language processing techniques to acquire log template vectors. Although the semantic information of the log template can be acquired, high-frequency and low-frequency words in it are unevenly distributed; the semantic information of low-frequency words cannot be fully exploited, and the methods cannot adapt to word changes caused by irregular updates of log statements during system or service upgrades, which degrades anomaly detection accuracy. (2) Most methods are supervised or semi-supervised and consider only the context information of the current log data. Although they achieve good anomaly detection results on labeled log data, their detection results on unlabeled datasets are poor. Log data in real systems are unlabeled, so most current log anomaly detection methods cannot be applied to unlabeled log datasets. (3) Most methods have difficulty adapting to different types of large software services, whose logs differ in syntax. Cross-system transfer learning can exploit the similarity of log data between a source system and a target system; however, existing log anomaly detection methods consider only the context information of the current log data and cannot make good use of the similarity among log datasets with different syntaxes.
Disclosure of Invention
Therefore, the invention aims to provide a log sequence abnormality detection method based on attention domain adaptive transfer learning, so as to solve the technical problem of low abnormality detection accuracy caused by uneven distribution of high-frequency words and low-frequency words in the existing log abnormality detection method.
The invention adopts the following technical means:
a log sequence anomaly detection method based on attention domain adaptive transfer learning comprises the following steps:
S1, obtaining a semantic vector of a log template by adopting BERT;
S2, performing vector optimization on the semantic vectors of the log templates by adopting Normalizing Flow, so that the log template vectors are converted into a smooth, isotropic Gaussian distribution;
S3, extracting vectors of the log templates optimized in S2 by using a GRU;
S4, training a source domain log sequence anomaly detection model by using the GRU network based on the vectors of the log templates acquired in S3;
S5, judging whether the source domain log sequence is trained completely, if not, acquiring a source domain log sequence vector of the next batch, and turning to S4;
S6, judging whether a model training result meets the expectation or not, if not, storing a source domain log sequence abnormality detection model, and turning to S4; if yes, the optimal anomaly detection model of the source domain log sequence is saved;
S7, improving a domain adaptation function of a source domain log sequence optimal anomaly detection model by adopting an attention mechanism, and calculating domain differences of log data of a source domain and target domain;
S8, performing migration learning training on the target domain log data by using a source domain log sequence optimal anomaly detection model based on domain differences;
S9, judging whether the target domain log sequence transfer learning reaches the expectations or not, if not, turning to S7; if yes, testing is carried out on the test log sequence data set, and a detection result is output.
Further, S1 specifically includes the following steps:
S11, analyzing the log entries by adopting a Drain algorithm, and converting unstructured log information into a structured log template;
S12, dividing the log sequences according to the log session ID or a sliding time window to obtain the log template sequence set of the log dataset;
S13, extracting log template vectors from the log template sequence set by adopting a BERT model: for a log template set T = {t_1, t_2, ..., t_n}, n represents the number of log template types, T_i = {t_{i,1}, t_{i,2}, ..., t_{i,j}, ..., t_{i,l}} represents the i-th log template, t_{i,j} ∈ T, and l represents the sentence length of the log template; the BERT language model is used to encode the log template T_i, each word in the log template is mapped to a d-dimensional vector, and the semantic vector of the log template is generated and recorded as u_i = {v_1, v_2, ..., v_l}, wherein v_i ∈ R^d.
Further, S2 specifically includes the following steps:
S21, inputting u_i into a reversible neural network f(u) to obtain a latent Gaussian representation g_i of the set U = {u_1, u_2, ..., u_l} of semantic vectors of the log template, wherein p(u) is the true probability distribution and p(g_i) is the distribution of the latent representation; the generation process of Normalizing Flow is as follows:
g_i ~ p(g) = N(0, I), u_i = f_φ(g_i) (1)
S22, calculating the probability density function of the observable u through the change-of-variables theorem, with the following calculation formula:
p(u) = p(f_φ^{-1}(u)) |det(∂f_φ^{-1}(u)/∂u)| (2)
S23, learning the flow-based generative model by maximizing the likelihood of generating the BERT sentence embeddings from the standard Gaussian latent variable, with the following formula:
max_φ E_u[log p(f_φ^{-1}(u)) + log|det(∂f_φ^{-1}(u)/∂u)|] (3)
wherein f_φ is a reversible neural network; during training, only the flow parameters are optimized while the BERT parameters remain unchanged, resulting in a bijective function f_φ^{-1} that transforms each log template vector u_i into a latent Gaussian representation g_i without losing information, yielding the latent Gaussian representation G = {g_1, g_2, ..., g_l} of the set of log template vectors.
Further, S3 specifically includes the following steps:
S31, taking the log template vectors optimized in S2 as input of the GRU model to obtain the vector representation of the log sequence, wherein the GRU unit comprises an update gate z_t and a reset gate r_t, and is computed as follows:
z_t = σ(W_z·[h_{t-1}, x_t] + b_z) (4)
r_t = σ(W_r·[h_{t-1}, x_t] + b_r) (5)
h̃_t = tanh(W_h·[r_t ⊙ h_{t-1}, x_t] + b_h) (6)
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t (7)
wherein z_t is the output of the update gate, h_{t-1} is the output of the GRU unit at time t-1, x_t is the input of the GRU unit at time t, W_z, b_z are the weight and bias of the update gate, r_t is the output of the reset gate, W_r, b_r are the weight and bias of the reset gate, W_h, b_h are the weight and bias of the candidate state, h̃_t is the candidate hidden state at time t, and h_t is the output of the GRU unit at time t;
S32, splicing the log template vectors into a log vector sequence set X = {x_1, x_2, ..., x_m}, wherein m is the number of log sequences, x_i = (g_{i,1}, g_{i,2}, ..., g_{i,l_i}), l_i ≤ l, g_{i,j} represents the j-th log template vector in the log template vector set G, and l_i is the length of the log template vector sequence x_i; the log sequence vector set X is encoded by the GRU layer as shown in the following formula:
h_t = GRU(h_{t-1}, x_t), t ∈ [1, m] (8)
wherein h_t represents the hidden representation of step t, and the resulting log sequence vectors are represented as H = {h_1, h_2, ..., h_m}.
Further, S4 specifically includes the following steps:
S41, inputting the log sequence vector into a fully connected neural network with a Softmax function to output the detection result, with the following formula:
ŷ = Softmax(W_s·h + b_s) (9)
wherein W_s, b_s are the weight and bias vector of the fully connected neural network, and ŷ is the probability of normal and abnormal output by the model;
S42, in the training stage, a cross entropy loss function is used to calculate the loss between the model detection result and the label, with the following formula:
L = -∑[Y·log ŷ + (1 - Y)·log(1 - ŷ)] (10)
wherein Y is the label of a log sequence in the dataset;
S43, updating the parameters of the GRU network and the fully connected neural network by using an Adam optimizer.
Further, S7 specifically includes the following steps:
S71, improving the domain adaptation function by adopting an attention mechanism: for each sample s_i in the source domain S and each sample t_j in the target domain T, calculating the sample correlation scores score_{s_i} = Cos_Sim(s_i, average_feature_T) and score_{t_j} = Cos_Sim(t_j, average_feature_S), wherein Cos_Sim denotes cosine similarity, average_feature_T represents the feature vector average of all samples in the target domain T, and average_feature_S represents the feature vector average of all samples in the source domain S; the source domain attention weight α_i and the target domain attention weight β_j are calculated as follows:
α_i = exp(score_{s_i}) / Σ_k exp(score_{s_k}) (11)
β_j = exp(score_{t_j}) / Σ_k exp(score_{t_k}) (12)
S72, respectively calculating the weighted sample vectors s'_i = α_i·s_i and t'_j = β_j·t_j, and the averages μ'_S and μ'_T of the weighted sample vectors of the source domain S and the target domain T;
S73, calculating the domain difference between the source domain and the target domain, with the following calculation formula:
MMD_Loss = ||μ'_S - μ'_T||² (13)
Further, S8 specifically includes the following steps:
performing transfer learning training on the target domain log data with the linear weighting of the domain difference and the classification loss as the loss function:
loss = clf_Loss + λ·MMD_Loss (14)
wherein clf_Loss ∈ (0, 1), λ ∈ (0, 1), and MMD_Loss ∈ (0, 1).
The invention also provides a storage medium comprising a stored program, wherein, when the program runs, the above log sequence anomaly detection method based on attention domain adaptive transfer learning is executed.
The invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes, through the computer program, any of the above log sequence anomaly detection methods based on attention domain adaptive transfer learning.
Compared with the prior art, the invention has the following advantages:
the invention adopts BERT to obtain the semantic vector of the log template, and uses Normalizing Flow to convert the vector distribution of the log template into a smooth standard Gaussian distribution, thereby solving the problem of uneven characteristic distribution of low-frequency words of the log template and improving the accuracy of the model.
When the maximum mean discrepancy is used to calculate the difference between source domain and target domain log sequence vectors, an attention mechanism is added that adaptively selects features in the source and target domains, so that the model attends to the more important features, improving the accuracy and universality of the model.
According to the invention, the linear weighting of the domain difference and the classification loss is used as a loss function to carry out migration learning training, so that a trained source domain anomaly detection model is adapted to target domain data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to the drawings without inventive effort to a person skilled in the art.
FIG. 1 is a block diagram of the method of the present invention.
FIG. 2 is a flow chart of training and detecting the method of the present invention.
Fig. 3 is a graph showing the influence of different amounts of source domain data on an anomaly detection result when the source domain is BGL and the target domain is HDFS.
Fig. 4 is a graph showing the influence of different amounts of source domain data on an anomaly detection result when the source domain is BGL and the target domain is Thunderbird.
Fig. 5 is a graph of performance of different methods on BGL datasets.
FIG. 6 is a graph of performance of different methods on an HDFS dataset.
FIG. 7 is a graph of performance of different methods on Thunderbird datasets.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1 and 2, the present invention provides a log sequence anomaly detection method based on attention domain adaptive migration learning, which comprises the following steps:
Step 1: and extracting the log template vector by adopting BERT.
Firstly, unstructured log messages are converted into structured log templates by parsing the log entries with the Drain algorithm, which has the best overall performance; then, log sequences are divided according to the log session ID or a sliding time window to obtain the log template sequence set of the log dataset; finally, log template vectors are extracted with a BERT model: for a log template set T = {t_1, t_2, ..., t_n}, n represents the number of log template types, T_i = {t_{i,1}, t_{i,2}, ..., t_{i,j}, ..., t_{i,l}} represents the i-th log template, t_{i,j} ∈ T, and l represents the sentence length of the log template; the BERT language model encodes the log template T_i, maps each word to a d-dimensional vector, and generates the feature vector of the log template, recorded as u_i = {v_1, v_2, ..., v_l}, wherein v_i ∈ R^d.
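A minimal sketch of this step follows. The patent does not name a specific BERT checkpoint, so the Hugging Face transformers package, the bert-base-uncased model, and the example template string are illustrative assumptions:

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

def template_vector(template: str) -> torch.Tensor:
    """Encode one log template t_i into its semantic vector u_i."""
    inputs = tokenizer(template, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = bert(**inputs)
    # One d-dimensional vector v_j per token; the sequence of these
    # vectors is the template's semantic vector u_i = {v_1, ..., v_l}.
    return outputs.last_hidden_state.squeeze(0)

u_i = template_vector("Receiving block <*> src: <*> dest: <*>")  # hypothetical HDFS-style template
print(u_i.shape)  # (l, 768)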
Step 2: and carrying out log template vector optimization by adopting Normalizing Flow.
For the log template vector set U = {u_1, u_2, ..., u_l}, p(u) is the actual probability distribution; u_i is input into a reversible neural network f(u) to obtain the latent Gaussian representation g_i, whose probability distribution is p(g_i). The generation process of Normalizing Flow is shown in formula (1):
g_i ~ p(g) = N(0, I), u_i = f_φ(g_i) (1)
The probability density function (PDF) of the observable u is calculated by the change-of-variables theorem, as shown in formula (2):
p(u) = p(f_φ^{-1}(u)) |det(∂f_φ^{-1}(u)/∂u)| (2)
The flow-based generative model is learned by maximizing the likelihood of generating the BERT sentence embeddings from the standard Gaussian latent variable, as shown in formula (3):
max_φ E_u[log p(f_φ^{-1}(u)) + log|det(∂f_φ^{-1}(u)/∂u)|] (3)
wherein f_φ is a reversible neural network. During training, only the flow parameters are optimized while the BERT parameters remain unchanged, resulting in a bijective function f_φ^{-1} that transforms each log template vector u_i into a latent Gaussian representation g_i without losing information, yielding the latent Gaussian representation G = {g_1, g_2, ..., g_l} of the set of log template vectors.
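A minimal sketch of the flow optimization follows. The patent does not specify the flow architecture, so a single affine coupling layer standing in for f_φ, its layer sizes, and the training loop are illustrative assumptions; the constant term of the Gaussian log density is dropped since it does not affect optimization:

import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One invertible coupling layer standing in for f_phi."""
    def __init__(self, dim: int):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, 128), nn.ReLU(),
            nn.Linear(128, 2 * (dim - self.half)))

    def inverse(self, u):
        """Map u -> (g, log|det J|), the quantities needed in formula (2)."""
        u1, u2 = u[:, :self.half], u[:, self.half:]
        s, t = self.net(u1).chunk(2, dim=-1)
        s = torch.tanh(s)                      # keep scales well conditioned
        g2 = (u2 - t) * torch.exp(-s)
        log_det = -s.sum(dim=-1)               # log-determinant of the inverse map
        return torch.cat([u1, g2], dim=-1), log_det

dim = 768
flow = AffineCoupling(dim)
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)  # only flow parameters; BERT stays frozen
u_batch = torch.randn(16, dim)   # stand-in for a batch of BERT template vectors u_i

for _ in range(100):
    g, log_det = flow.inverse(u_batch)
    # log p(u) = log N(g; 0, I) + log|det|, as in formula (2); constant dropped
    log_prob = -0.5 * (g ** 2).sum(dim=-1) + log_det
    loss = -log_prob.mean()      # maximize the likelihood of formula (3)
    opt.zero_grad(); loss.backward(); opt.step()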
Step 3: and extracting the log sequence vector by using the GRU.
The optimized log template vectors are taken as input of the GRU model to obtain the vector representation of the log sequence. The GRU unit comprises two gates, an update gate z_t and a reset gate r_t, and is computed as follows:
z_t = σ(W_z·[h_{t-1}, x_t] + b_z) (4)
r_t = σ(W_r·[h_{t-1}, x_t] + b_r) (5)
h̃_t = tanh(W_h·[r_t ⊙ h_{t-1}, x_t] + b_h) (6)
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t (7)
wherein z_t is the output of the update gate, h_{t-1} is the output of the GRU unit at time t-1, x_t is the input of the GRU unit at time t, W_z, b_z are the weight and bias of the update gate, r_t is the output of the reset gate, W_r, b_r are the weight and bias of the reset gate, W_h, b_h are the weight and bias of the candidate state, h̃_t is the candidate hidden state at time t, and h_t is the output of the GRU unit at time t.
The log template vectors are spliced into a set of log vector sequences X = {x_1, x_2, ..., x_m}, where m is the number of log sequences, x_i = (g_{i,1}, g_{i,2}, ..., g_{i,l_i}), l_i ≤ l, g_{i,j} represents the j-th log template vector in the log template vector set G, and l_i is the length of the log template vector sequence x_i. The log sequence vector set X is encoded by the GRU layer as shown in formula (8):
h_t = GRU(h_{t-1}, x_t), t ∈ [1, m] (8)
wherein h_t represents the hidden representation of step t, and the resulting log sequence vectors are represented as H = {h_1, h_2, ..., h_m}.
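A minimal sketch of the GRU encoding of formula (8) in PyTorch follows; the hidden size and the padded batch shapes are illustrative assumptions:

import torch
import torch.nn as nn

d, hidden = 768, 128
gru = nn.GRU(input_size=d, hidden_size=hidden, batch_first=True)

# X: m log sequences of template vectors g_{i,j}, padded to a common length.
X = torch.randn(4, 20, d)      # (m, max l_i, d)
H, _ = gru(X)                  # H holds the hidden representation h_t of every step
seq_vec = H[:, -1, :]          # last hidden state as the log sequence vector
print(seq_vec.shape)           # (4, 128)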
Step 4: training a source domain log sequence anomaly detection model using the GRU network.
The log sequence vector is input into a fully connected neural network with a Softmax function to output the detection result, as shown in formula (9):
ŷ = Softmax(W_s·h + b_s) (9)
wherein W_s, b_s are the weight and bias vector of the fully connected neural network, and ŷ is the probability of normal and abnormal output by the model.
During the training phase, a cross entropy loss function is used to calculate the loss between the model detection result and the labels, as shown in formula (10):
L = -∑[Y·log ŷ + (1 - Y)·log(1 - ŷ)] (10)
wherein Y is the label of a log sequence in the dataset.
Parameters in the GRU network and the fully connected neural network are updated using Adam optimizers.
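A minimal sketch of one training step of the source domain detector follows, combining the GRU encoder with the fully connected Softmax classifier of formula (9) and the cross entropy loss of formula (10). The layer sizes are illustrative assumptions, while the optimizer settings (learning rate 2e-4, weight decay 1e-4, batch size 16) follow the experimental parameters given below:

import torch
import torch.nn as nn

class SourceDetector(nn.Module):
    def __init__(self, d=768, hidden=128):
        super().__init__()
        self.gru = nn.GRU(d, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 2)   # W_s, b_s: logits for normal/abnormal

    def forward(self, x):
        h, _ = self.gru(x)
        return self.fc(h[:, -1, :])      # Softmax is folded into the loss below

model = SourceDetector()
opt = torch.optim.Adam(model.parameters(), lr=2e-4, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()        # cross entropy, formula (10)

x = torch.randn(16, 20, 768)             # one batch of log sequence vectors
y = torch.randint(0, 2, (16,))           # labels Y: 0 = normal, 1 = abnormal
loss = criterion(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()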
Step 5: and (3) judging whether the source domain log sequence is trained completely, if not, acquiring the source domain log sequence vector of the next batch, and turning to the step (4).
Step 6: judging whether the model training result meets the expectation or not, if not, storing a source domain log sequence abnormality detection model, and turning to the step 4; otherwise, the optimal anomaly detection model of the source domain log sequence is saved.
Step 7: the attention mechanism is employed to improve the domain adaptation function and calculate the domain differences of the source domain and target domain log data.
First, the attention mechanism is adopted to improve the domain adaptation function: for each sample s_i in the source domain S and each sample t_j in the target domain T, the sample correlation scores score_{s_i} = Cos_Sim(s_i, average_feature_T) and score_{t_j} = Cos_Sim(t_j, average_feature_S) are calculated, where Cos_Sim denotes cosine similarity, average_feature_T represents the feature vector average of all samples in the target domain T, and average_feature_S represents the feature vector average of all samples in the source domain S. The source domain attention weight α_i is calculated according to formula (11), and the target domain attention weight β_j according to formula (12):
α_i = exp(score_{s_i}) / Σ_k exp(score_{s_k}) (11)
β_j = exp(score_{t_j}) / Σ_k exp(score_{t_k}) (12)
Then, the weighted sample vectors s'_i = α_i·s_i and t'_j = β_j·t_j are calculated, as well as the averages μ'_S and μ'_T of the weighted sample vectors of the source domain S and the target domain T.
Finally, the domain difference between the source domain and the target domain is calculated according to formula (13):
MMD_Loss = ||μ'_S - μ'_T||² (13)
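A minimal sketch of the attention-weighted domain difference of formulas (11)-(13) follows; the softmax normalization of the correlation scores matches the reconstruction above and is an assumption:

import torch
import torch.nn.functional as F

def attention_mmd(S: torch.Tensor, T: torch.Tensor) -> torch.Tensor:
    """S: (n_s, d) source sample features; T: (n_t, d) target sample features."""
    mean_T = T.mean(dim=0, keepdim=True)         # average_feature_T
    mean_S = S.mean(dim=0, keepdim=True)         # average_feature_S
    score_s = F.cosine_similarity(S, mean_T)     # Cos_Sim(s_i, average_feature_T)
    score_t = F.cosine_similarity(T, mean_S)     # Cos_Sim(t_j, average_feature_S)
    alpha = torch.softmax(score_s, dim=0)        # source attention weights, formula (11)
    beta = torch.softmax(score_t, dim=0)         # target attention weights, formula (12)
    mu_S = (alpha.unsqueeze(1) * S).mean(dim=0)  # average of weighted sample vectors
    mu_T = (beta.unsqueeze(1) * T).mean(dim=0)
    return torch.norm(mu_S - mu_T, p=2) ** 2     # MMD_Loss, formula (13)

mmd = attention_mmd(torch.randn(32, 128), torch.randn(32, 128))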
Step 8: and performing migration learning training on the target domain log data.
And performing migration learning training on the target domain log data by taking the domain difference and the classification loss linear weight as a loss function, wherein the loss function is shown in a formula (14).
loss=clf_Loss+λ*MMD_Loss (14)
Wherein clf _Loss epsilon (0, 1), lambda epsilon (0, 1), MMD_Loss epsilon (0, 1).
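A minimal sketch of one transfer training step with the loss of formula (14) follows, reusing the SourceDetector and attention_mmd sketches above. Computing the classification term on labelled source batches while the unlabeled target batch contributes only the domain difference is an assumption, as is λ = 0.5:

import torch
import torch.nn as nn

lam = 0.5                                        # assumed value of lambda
criterion = nn.CrossEntropyLoss()

def transfer_step(model, opt, x_src, y_src, x_tgt):
    """One step of transfer training: classification loss plus weighted MMD."""
    feat_src, _ = model.gru(x_src)
    feat_tgt, _ = model.gru(x_tgt)
    clf_loss = criterion(model.fc(feat_src[:, -1, :]), y_src)
    mmd_loss = attention_mmd(feat_src[:, -1, :], feat_tgt[:, -1, :])
    loss = clf_loss + lam * mmd_loss             # formula (14)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss

x_src, y_src = torch.randn(16, 20, 768), torch.randint(0, 2, (16,))
x_tgt = torch.randn(16, 20, 768)                 # unlabeled target domain batch
transfer_step(model, opt, x_src, y_src, x_tgt)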
Step 9: and judging whether the target domain log sequence transfer learning reaches the expectations or not, if not, turning to the step 7, otherwise, testing on a test log sequence data set, and outputting a detection result.
Examples
The algorithm of this patent is compared with current state-of-the-art algorithms in terms of precision, recall, and F1 score of log sequence anomaly detection, demonstrating its advantages.
(1) Experimental environment
The experiments of the invention were performed on an NVIDIA RTX 3090 24G GPU server, using a Python 3.6 environment and a PyTorch-based model implementation; the LogADAT model was trained with an Adam optimizer.
(2) Data set
In 2019, Shilin He et al. published a large amount of log data from 16 different systems on LogHub. LogHub provides a collection of real-world log data containing log information for various machine types, such as distributed systems (HDFS) and supercomputers (BGL, Thunderbird). The invention selects the BGL, HDFS, and Thunderbird log datasets; details are shown in Table 1. The HDFS dataset was collected from a 203-node cluster on the Amazon EC2 platform. It is a common benchmark for log-based anomaly detection, generated by benchmark workloads in a private cloud environment and manually labeled by handcrafted rules to identify anomalies. The logs are partitioned into traces according to block IDs, and each trace associated with a particular block ID is assigned a ground-truth label: normal/abnormal. The BGL dataset was collected from the Blue Gene/L supercomputer system, and each log is labeled as an alert or non-alert message. Thunderbird is an open log dataset collected from the Thunderbird supercomputer system at Sandia National Laboratories (SNL) in Albuquerque. The logs contain both alert and non-alert messages identified by alert category labels.
Table 1 details of the dataset
(3) Baseline method
LogRobust, logTransfer, neuralLog, deepSyslog was chosen as the baseline method for the comparative experiments.
LogRobust is supervised: it uses FastText to extract word vectors and aggregates them via TF-IDF to generate log template vectors, which are input into an attention-based Bi-LSTM model for anomaly detection. LogTransfer is a semi-supervised approach based on transfer learning: it first uses the log template sequences and labels of the source system to train a base model consisting of a source LSTM network and a fully connected network; then, the source LSTM network in the base model is fine-tuned using the log template sequences and labels of the target system to obtain the target LSTM network. NeuralLog is supervised: it uses BERT to extract word vectors, obtains template vectors by computing the mean of the word vectors, and inputs them into a Transformer-based model for anomaly detection. DeepSyslog is unsupervised: it uses character-level word embedding to accommodate changes in log print statements and extracts the semantic and contextual information hidden in the log stream to represent the original logs.
(4) Evaluation index
Anomaly detection is a binary classification problem. The invention uses widely adopted metrics, namely precision, recall, and F1 score, to evaluate the anomaly detection accuracy of LogADAT and each baseline method.
1) Precision, P = TP / (TP + FP), is the percentage of truly anomalous log sequences among all log sequences the model judges to be anomalous.
2) Recall, R = TP / (TP + FN), is the percentage of log sequences correctly discriminated as anomalous by the model among all anomalous log sequences.
3) F1 score, F1 = 2·P·R / (P + R), is the harmonic mean of precision and recall.
Here TP is the number of anomalous log sequences correctly detected by the model, FP is the number of normal log sequences erroneously identified as anomalous by the model, and FN is the number of anomalous log sequences the model failed to detect.
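The three metrics can be computed directly from these counts; a minimal sketch with illustrative counts:

def prf1(tp: int, fp: int, fn: int) -> tuple:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(prf1(tp=95, fp=5, fn=10))  # illustrative counts, not experimental results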
(5) Experimental parameter setting
In our experiments, LogADAT has one GRU layer with an input dimension of 768. The GRU network is trained with the Adam optimizer at an initial learning rate of 2e-4, with the batch size and weight decay set to 16 and 1e-4, using cross entropy as the loss function.
To evaluate the influence of different amounts of source domain data (BGL) on detection precision, 10K, 30K, 50K, 80K, and 100K log sequences were randomly selected from the source domain dataset to train the LogADAT log anomaly detection model. Figs. 3 and 4 show the results of the method on the HDFS and Thunderbird datasets. Notably, LogADAT achieves an F1 score of 0.95 on the HDFS dataset using 50K log sequences from BGL, while 100K BGL log sequences are required to achieve the same anomaly detection performance on the Thunderbird dataset. Since the log templates of HDFS cover most of the BGL templates, while the log templates of Thunderbird cover only a portion of them, the anomaly detection model fits faster on HDFS than on Thunderbird.
(6) Comparative analysis of experimental results
The study experimentally compares the LogADAT method with the four baseline methods. The method and the four baselines are compared on the BGL, HDFS, and Thunderbird datasets using precision, recall, and F1 score; the results are shown in Figs. 5, 6, and 7.
According to Figs. 5, 6, and 7, the experimental results of LogADAT on the BGL, HDFS, and Thunderbird datasets show that its precision, recall, and F1 score on the BGL and HDFS datasets are better than those of the other four methods. The invention is trained on the BGL dataset and then performs anomaly detection on the BGL, HDFS, and Thunderbird datasets. LogADAT's detection results on BGL and HDFS are optimal, reaching the level of supervised anomaly detection. Its anomaly detection precision and F1 score on Thunderbird are better than the traditional supervised method LogRobust and the semi-supervised method LogTransfer, but inferior to the supervised method NeuralLog. On the Thunderbird dataset, LogRobust achieves a higher recall, but at the cost of low precision. LogTransfer trains through a shared fully connected network and fine-tunes on the target domain, but fine-tuning cannot bridge the difference between the two datasets, so its detection precision and F1 score are inferior to LogADAT. NeuralLog performs supervised anomaly detection with BERT and Transformer models and improves precision and recall over LogRobust on all three datasets. DeepSyslog is unsupervised; although it is superior to the invention in recall, its overall F1 score is below that of LogADAT. Compared with the above methods, LogADAT performs transfer training with a combined domain adaptation loss so that the model better adapts to anomaly detection on the target domain, obtaining a higher overall F1 score on the HDFS and Thunderbird datasets.
(7) Conclusion
At present, research on software system log anomaly detection is still data-driven. However, when a system has just come online, it is impractical to collect enough labeled data to train an anomaly detection model. Therefore, the invention proposes a log sequence anomaly detection method based on attention domain adaptive transfer learning (LogADAT). The method first uses Normalizing Flow to optimize the semantic features extracted from log templates by BERT, so as to better utilize the semantic information of low-frequency words in the templates. Second, it constructs a GRU network to capture the relations between log sequences, and combines the attention weights of the source and target domains while computing the domain difference, so that the model can better attend to the more important features. Finally, the domain difference loss and the classification loss are linearly weighted, and transfer learning training is performed on the target domain log data, so that the trained anomaly detection model adapts to different types of log datasets, improving the universality of anomaly detection on unlabeled datasets. Experimental results show that the anomaly detection performance of the LogADAT method on large system log datasets is superior to current mainstream methods. In the future, we will further explore methods that combine other valuable information in logs for anomaly detection, and test the precision, recall, F1 score, and other metrics of the method on more log datasets to verify its universality.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (9)

1. The log sequence anomaly detection method based on the attention domain adaptive transfer learning is characterized by comprising the following steps of:
S1, obtaining a semantic vector of a log template by adopting BERT;
S2, performing vector optimization on the semantic vectors of the log templates by adopting Normalizing Flow, so that the log template vectors are converted into a smooth, isotropic Gaussian distribution;
S3, extracting vectors of the log templates optimized in S2 by using a GRU;
S4, training a source domain log sequence anomaly detection model by using the GRU network based on the vectors of the log templates acquired in S3;
S5, judging whether the source domain log sequence is trained completely, if not, acquiring a source domain log sequence vector of the next batch, and turning to S4;
S6, judging whether a model training result meets the expectation or not, if not, storing a source domain log sequence abnormality detection model, and turning to S4; if yes, the optimal anomaly detection model of the source domain log sequence is saved;
S7, improving a domain adaptation function of a source domain log sequence optimal anomaly detection model by adopting an attention mechanism, and calculating domain differences of log data of a source domain and target domain;
S8, performing migration learning training on the target domain log data by using a source domain log sequence optimal anomaly detection model based on domain differences;
S9, judging whether the target domain log sequence transfer learning reaches the expectations or not, if not, turning to S7; if yes, testing is carried out on the test log sequence data set, and a detection result is output.
2. The method for detecting log sequence anomalies based on attention-domain adaptive transfer learning according to claim 1, wherein S1 specifically comprises the steps of:
S11, analyzing the log entries by adopting a Drain algorithm, and converting unstructured log information into a structured log template;
S12, dividing the log sequences according to the log session ID or a sliding time window to obtain the log template sequence set of the log dataset;
S13, extracting log template vectors from the log template sequence set by adopting a BERT model: for a log template set T = {t_1, t_2, ..., t_n}, n represents the number of log template types, T_i = {t_{i,1}, t_{i,2}, ..., t_{i,j}, ..., t_{i,l}} represents the i-th log template, t_{i,j} ∈ T, and l represents the sentence length of the log template; the BERT language model is used to encode the log template T_i, each word in the log template is mapped to a d-dimensional vector, and the semantic vector of the log template is generated and recorded as u_i = {v_1, v_2, ..., v_l}, wherein v_i ∈ R^d.
3. The log sequence anomaly detection method based on attention domain adaptive transfer learning of claim 1, wherein S2 specifically comprises the steps of:
S21, inputting u_i into a reversible neural network f(u) to obtain a latent Gaussian representation g_i of the set U = {u_1, u_2, ..., u_l} of semantic vectors of the log template, wherein p(u) is the true probability distribution and p(g_i) is the distribution of the latent representation; the generation process of Normalizing Flow is as follows:
g_i ~ p(g) = N(0, I), u_i = f_φ(g_i) (1)
S22, calculating the probability density function of the observable u through the change-of-variables theorem, with the following calculation formula:
p(u) = p(f_φ^{-1}(u)) |det(∂f_φ^{-1}(u)/∂u)| (2)
S23, learning the flow-based generative model by maximizing the likelihood of generating the BERT sentence embeddings from the standard Gaussian latent variable, with the following formula:
max_φ E_u[log p(f_φ^{-1}(u)) + log|det(∂f_φ^{-1}(u)/∂u)|] (3)
wherein f_φ is a reversible neural network; during training, only the flow parameters are optimized while the BERT parameters remain unchanged, resulting in a bijective function f_φ^{-1} that transforms each log template vector u_i into a latent Gaussian representation g_i without losing information, yielding the latent Gaussian representation G = {g_1, g_2, ..., g_l} of the set of log template vectors.
4. The log sequence anomaly detection method based on attention domain adaptive transfer learning of claim 1, wherein S3 specifically comprises the steps of:
S31, taking the log template vectors optimized in S2 as input of the GRU model to obtain the vector representation of the log sequence, wherein the GRU unit comprises an update gate z_t and a reset gate r_t, and is computed as follows:
z_t = σ(W_z·[h_{t-1}, x_t] + b_z) (4)
r_t = σ(W_r·[h_{t-1}, x_t] + b_r) (5)
h̃_t = tanh(W_h·[r_t ⊙ h_{t-1}, x_t] + b_h) (6)
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t (7)
wherein z_t is the output of the update gate, h_{t-1} is the output of the GRU unit at time t-1, x_t is the input of the GRU unit at time t, W_z, b_z are the weight and bias of the update gate, r_t is the output of the reset gate, W_r, b_r are the weight and bias of the reset gate, W_h, b_h are the weight and bias of the candidate state, h̃_t is the candidate hidden state at time t, and h_t is the output of the GRU unit at time t;
S32, splicing the log template vectors into a log vector sequence set X = {x_1, x_2, ..., x_m}, wherein m is the number of log sequences, x_i = (g_{i,1}, g_{i,2}, ..., g_{i,l_i}), l_i ≤ l, g_{i,j} denotes the j-th log template vector in the log template vector set G, and l_i denotes the length of the log template vector sequence x_i; the log sequence vector set X is encoded by the GRU layer as shown in the following formula:
h_t = GRU(h_{t-1}, x_t), t ∈ [1, m] (8)
wherein h_t represents the hidden representation of step t, and the resulting log sequence vectors are represented as H = {h_1, h_2, ..., h_m}.
5. The method for detecting log sequence anomalies based on attention-domain adaptive transfer learning of claim 1, wherein S4 specifically comprises the steps of:
S41, inputting the log sequence vector into a fully connected neural network with a Softmax function to output the detection result, with the following formula:
ŷ = Softmax(W_s·h + b_s) (9)
wherein W_s, b_s are the weight and bias vector of the fully connected neural network, and ŷ is the probability of normal and abnormal output by the model;
S42, in the training stage, a cross entropy loss function is used to calculate the loss between the model detection result and the label, with the following formula:
L = -∑[Y·log ŷ + (1 - Y)·log(1 - ŷ)] (10)
wherein Y is the label of a log sequence in the dataset;
S43, updating the parameters of the GRU network and the fully connected neural network by using an Adam optimizer.
6. The log sequence anomaly detection method based on attention domain adaptive transfer learning of claim 1, wherein S7 specifically comprises the steps of:
S71, improving the domain adaptation function by adopting an attention mechanism: for each sample s_i in the source domain S and each sample t_j in the target domain T, respectively calculating the sample correlation scores score_{s_i} = Cos_Sim(s_i, average_feature_T) and score_{t_j} = Cos_Sim(t_j, average_feature_S), wherein Cos_Sim denotes cosine similarity, average_feature_T represents the feature vector average of all samples in the target domain T, and average_feature_S represents the feature vector average of all samples in the source domain S; the source domain attention weight α_i and the target domain attention weight β_j are calculated as follows:
α_i = exp(score_{s_i}) / Σ_k exp(score_{s_k}) (11)
β_j = exp(score_{t_j}) / Σ_k exp(score_{t_k}) (12)
S72, respectively calculating the weighted sample vectors s'_i = α_i·s_i and t'_j = β_j·t_j, and the averages μ'_S and μ'_T of the weighted sample vectors of the source domain S and the target domain T;
S73, calculating the domain difference between the source domain and the target domain, with the following calculation formula:
MMD_Loss = ||μ'_S - μ'_T||² (13)
7. The log sequence anomaly detection method based on attention domain adaptive transfer learning of claim 1, wherein S8 specifically comprises the steps of:
performing transfer learning training on the target domain log data with the linear weighting of the domain difference and the classification loss as the loss function:
loss = clf_Loss + λ·MMD_Loss (14)
wherein clf_Loss ∈ (0, 1), λ ∈ (0, 1), and MMD_Loss ∈ (0, 1).
8. A storage medium comprising a stored program, wherein the program, when executed, performs the log sequence anomaly detection method based on attention-domain adaptive migration learning of any one of claims 1 to 7.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor is operative with the computer program to perform the attention-domain adaptive transfer learning-based log sequence anomaly detection method of any one of claims 1 to 7.
Application CN202410123465.XA, filed 2024-01-29: Log sequence anomaly detection method based on attention domain adaptive transfer learning; published as CN118312393A (pending).

Publications (1)

Publication Number: CN118312393A (en); Publication Date: 2024-07-09



Legal Events

PB01: Publication