CN114625567A

CN114625567A - Abnormity detection method and system for system fault-tolerant mechanism

Info

Publication number: CN114625567A
Application number: CN202210138381.4A
Authority: CN
Inventors: 莫语; 朱品燕
Original assignee: Beijing Yunji Zhizao Technology Co ltd
Current assignee: Beijing Yunji Zhizao Technology Co ltd
Priority date: 2022-02-15
Filing date: 2022-02-15
Publication date: 2022-06-14

Abstract

The invention provides an anomaly detection method and system oriented to a system fault-tolerant mechanism. The anomaly detection scheme constructs a new log anomaly detection model by analyzing the fault-tolerant mechanism of a software system, so as to solve the problem of pseudo-anomaly false alarms caused by the fault-tolerant mechanism in traditional log anomaly detection, eliminate pseudo-anomaly points, and improve the detection precision of the log anomaly detection model.

Description

Abnormity detection method and system for system fault-tolerant mechanism

Technical Field

The invention relates to the field of anomaly detection, and in particular to an anomaly detection method and system oriented to a system fault-tolerant mechanism.

Background

Large-scale, complex software systems, such as online service systems and data centers, record a large amount of log information for troubleshooting. This log information usually consists of semi-structured text strings that record events or the state of the system. However, because software systems are large in scale and increasingly complex in structure, it is increasingly difficult to inspect logs manually to learn the real state of the system and to resolve its anomalies. Therefore, many solutions for automated, log-based detection of system anomalies have emerged in recent years. These methods retrieve valuable information from the logs, analyze the log data, and detect system anomalies using data mining or machine learning techniques. Although such automated log anomaly detection methods greatly reduce the complexity of software anomaly detection, they ignore the effects of fault-tolerant mechanisms in software systems, which results in false positives. For example, during the interaction of a software system, a timeout exception may be recorded in the log because of network delay, but the retry strategy of the fault-tolerant mechanism may then succeed in a subsequent access, so the system still operates normally and the recorded log exception should not be regarded as a system anomaly.

The disadvantages of the prior art are as follows. The fault-tolerant mechanism built into the software system is not considered during log anomaly detection. Because of this mechanism, an anomaly recorded in the log does not necessarily indicate an anomaly of the software system. Ignoring the fault-tolerant mechanism therefore produces a large number of pseudo-anomalies during log anomaly detection, which limits its precision.

Therefore, there is an urgent need for an anomaly detection method and system oriented to a system fault-tolerant mechanism.

Disclosure of Invention

The technical scheme of the invention is as follows:

For log data collected from a software system, a log data processor first processes the logs: each log is converted into a log template, irrelevant symbols are removed, and the log template is then vectorized so that the log anomaly detection model can extract the semantic information of the log. The vectorized log data is input into an attention-based log anomaly detection model to obtain an anomaly analysis result. According to the pseudo-anomaly log patterns in the anomaly detection result and the corresponding fault-tolerant knowledge of the software system, the fault-tolerant knowledge is recorded as quadruples; the fault-tolerant mechanism is expressed as rules, and a fault-tolerant knowledge base with heterogeneous semantics is established on the basis of the heterogeneous rules of the DataLog data query language, supporting manual query, supplementation, and update of the fault-tolerant knowledge. Finally, taking text similarity as the basic principle, the semantic information obtained by vectorizing the log templates is used to match fault-tolerant knowledge against the log text that requires pseudo-anomaly filtering, and pseudo-anomaly points are eliminated according to the prior fault-tolerant knowledge in the knowledge base, achieving higher-precision log anomaly detection.

To solve the above technical problems, the invention provides the following technical scheme: an anomaly detection method and system oriented to a system fault-tolerant mechanism, comprising:

(1) log data processor

First, a log is looked up according to its length: a node of the parse tree is selected according to the log length, and a leaf node is then selected according to the first Token of the log. The logs are thereby divided among different leaf nodes, and the logs in one leaf node have the same length and the same first Token. A suitable log group is then selected within the leaf node by a similarity matching algorithm, which is as follows:

wherein the function equ is defined as follows:

after a suitable log group is obtained, the logs in the log group are updated: the Tokens of the logs in the same log group are scanned, and a Token is left unchanged if it is the same across the logs and is otherwise replaced by "+"; in this way the original logs are converted into log templates;

(2) log vectorization

A vectorized representation of each log Token is obtained by a distributed representation method based on a neural network: a Word Embedding network layer is added to the log anomaly detection model, all parameters of this layer are learnable, and the layer is trained jointly with the rest of the model. The sequence of sequence numbers passes through this layer to yield the vectorized representation of the log. If the input sequence has length N, the resulting semantic representation is an N × 300 vector matrix, in which each Token is characterized by a 300-dimensional vector;

after the vector matrix is obtained, the invention derives the weight of each Token from the Token statistics of the global log, specifically by calculating the term frequency (TF) and inverse document frequency (IDF) of each Token:

TF × IDF is denoted as Token weight, then the vectorization of a log of length N is represented as:

wherein v_i denotes the vectorized representation of each Token, and w_i denotes the weight of that Token;

(3) log anomaly detection model

After the semantic vectors of the log templates are obtained, a log template sequence can be converted into a numerical sequence; the invention uses a bidirectional LSTM network to analyze the numerical sequence produced by log vectorization; the network divides the hidden layer into two parts, a forward pass and a backward pass, and the combination of the two is the hidden-layer feature extracted by this layer; meanwhile, an attention mechanism is introduced to calculate the weight of the hidden-layer feature at each time t, all hidden-layer features are summed with these weights, and the prediction result is then obtained through a fully connected layer and a Softmax layer, denoted as:

wherein h_t denotes the hidden-layer feature obtained at time t, and w_i denotes the weight of the Token;

(4) pseudo-anomaly discriminator

Because the anomaly detection result produced by the neural network may contain false alarms caused by the system's fault-tolerant mechanism, the invention designs a pseudo-anomaly discriminator directed at that mechanism. Each Session of the log is divided into windows of a fixed size, where the data inside a window is the training data and the first log template after the window is its label; for example, suppose the sequence of log sequence numbers is:

Session_i = [12, 24, 3, 6, 7, 9, 10, 0, 1]

assuming a window size of 6, the division yields the following windows and corresponding labels:

Window₁ = [12, 24, 3, 6, 7, 9], label = 10

Window₂ = [24, 3, 6, 7, 9, 10], label = 0

Window₃ = [3, 6, 7, 9, 10, 0], label = 1

during log anomaly detection, if the prediction for a certain window is abnormal, the pseudo-anomaly discriminator does not directly mark the Session containing that window as abnormal; instead it traverses all windows of the Session and counts the abnormal ones. Taking into account fault-tolerant mechanisms such as retry and checkpoint that exist in the software system, the invention provides two anomaly indices to judge comprehensively whether the Session is abnormal;

(5) fault-tolerant knowledge base

The system can eliminate some pseudo-anomalies through the pseudo-anomaly discriminator. For example, when the retry mechanism of the fault-tolerant strategy leaves network timeout exceptions in the system log, a timeout exception that appears in isolation is a normal condition: after it occurs, the system obtains the network response through the retry mechanism and continues to operate. Timeout exceptions that appear consecutively, by contrast, indicate that the system keeps requesting without obtaining a response, so the system is abnormal. According to the pseudo-anomaly log patterns found by the pseudo-anomaly discriminator, the invention extracts the log template information recorded in the Session and, taking the event as the core, completes a hierarchical description of the log template ontology. The ontology is defined as a quadruple whose elements are the event ID, the log correlation, the pseudo-anomaly log pattern, and the corresponding fault-tolerant rule. The quadruple effectively reflects the core fault-tolerant information and the pseudo-anomaly caused by fault tolerance, and meets the matching requirement of the subsequent pseudo-anomaly filter; the pseudo-anomaly log pattern in the quadruple is obtained by storing the log templates in order in the form of a data table;

(6) pseudo-anomaly filter

The invention provides a log matching algorithm based on text similarity, in which the pseudo-anomaly filter uses the fault-tolerant rules in the fault-tolerant knowledge base to eliminate pseudo-anomaly points from the output of log anomaly detection; in the logs recorded by a software system, two logs are generally not identical even if they correspond to the same fault-tolerant rule, so the invention sets a suitable screening threshold on text similarity to determine the matching fault-tolerant knowledge;

during log vectorization, the vectorized representation of a log template is obtained from the neural-network-based distributed representation and TF × IDF; the similarity of two logs is obtained as the cosine similarity of their vectorized representations, and the similarity calculation formula is as follows:

the formula requires the vectorized representations of the two logs to have the same length, but in practice two logs may differ in length; therefore, before the similarity is calculated, the invention performs a weighted summation over the log template sequence, converting the log template vector sequence into a single vector, and the similarity between the two logs is then obtained by cosine similarity; in this way, the logs in the current Session are matched to the corresponding fault-tolerant knowledge, and pseudo-anomaly points are removed according to that knowledge to obtain the final log anomaly detection result.

Further, the anomaly indices include:

an anomaly count index: if several abnormal windows appear in a Session, the log data in the Session contains several possibly abnormal log sequences, and the Session is then abnormal with high probability; the invention sets the abnormal-window count index to 2;

a consecutive anomaly index: if consecutive abnormal windows appear in a Session, all logs recorded by the software system over a continuous period may be abnormal, which is quite rare when the software system runs normally; the invention regards consecutive window anomalies as one of the signs of a Session anomaly, and sets the number of consecutive abnormal windows to 2.

Compared with the prior art, the invention has the following advantage: it provides a log anomaly detection method and system oriented to a system fault-tolerant mechanism, in which a new log anomaly detection model is constructed by analyzing the fault-tolerant mechanism of the software system, so as to solve the problem of pseudo-anomaly false alarms caused by the fault-tolerant mechanism in traditional log anomaly detection, eliminate pseudo-anomaly points, and improve the detection precision of the log anomaly detection model.

Drawings

FIG. 1 is a block diagram of an anomaly detection method and system for a system fault tolerance mechanism according to the present invention;

FIG. 2 is a flow chart of the system fault-tolerant mechanism-oriented anomaly detection method and system according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings.

In a specific implementation, the invention provides an anomaly detection method and system oriented to a system fault-tolerant mechanism, comprising the following.

(1) log data processor

To process the original logs, some simple, manually defined regular expressions are first used to remove interfering variables such as IP addresses. The invention then constructs a fixed-depth parse tree to parse the logs. First, a log is looked up according to its length: a node of the parse tree is selected according to the log length, and a leaf node is then selected according to the first Token of the log, so that the logs are divided among different leaf nodes and the logs in one leaf node have the same length and the same first Token. A suitable log group is then selected within the leaf node using the following similarity matching algorithm:

wherein the function equ is defined as follows:

After a suitable log group is obtained, the logs in the log group are updated: the Tokens of the logs in the same log group are scanned, and a Token is left unchanged if it is the same across the logs and is otherwise replaced by "+". In this way the original logs are converted into log templates.
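The grouping and template-update steps of the log data processor can be sketched as follows. This is a minimal illustration and not the patented implementation: the two formulas referred to above are not reproduced in this text, so the sketch assumes that `equ` returns 1 for identical Tokens and 0 otherwise, and that the similarity is the fraction of matching positions. The class name `LogParser`, the `threshold` value of 0.5, and whitespace tokenization are hypothetical choices.

```python
from collections import defaultdict

WILDCARD = "+"  # placeholder symbol named in the text above


def equ(t1, t2):
    """Return 1 when two Tokens are identical, otherwise 0 (assumed definition)."""
    return 1 if t1 == t2 else 0


def similarity(log_tokens, template_tokens):
    """Fraction of positions at which the log and the template agree (assumed form)."""
    matches = sum(equ(a, b) for a, b in zip(log_tokens, template_tokens))
    return matches / len(log_tokens)


class LogParser:
    def __init__(self, threshold=0.5):
        self.threshold = threshold  # hypothetical similarity threshold
        # leaf key: (log length, first Token) -> list of templates (log groups)
        self.leaves = defaultdict(list)

    def parse(self, line):
        tokens = line.split()
        if not tokens:
            return []
        groups = self.leaves[(len(tokens), tokens[0])]
        # pick the most similar log group within the leaf
        best = max(groups, key=lambda g: similarity(tokens, g), default=None)
        if best is None or similarity(tokens, best) < self.threshold:
            groups.append(list(tokens))  # start a new log group
            return list(tokens)
        # update the template: keep equal Tokens, replace differing ones
        for i, tok in enumerate(tokens):
            if best[i] != tok:
                best[i] = WILDCARD
        return list(best)
```

Feeding "Connection to node1 timed out" and then "Connection to node2 timed out" into one parser leaves a single log group whose template has the placeholder in the third position.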

(2) Log vectorization

The invention uses regular expressions to delete meaningless symbols from the log template sequence, such as "+" and "the"; the presence or absence of these characters does not affect the semantic representation of the log. The invention then constructs a vocabulary over all logs, assigns to each word in a log template its sequence number in the vocabulary, and thereby converts the log template into a sequence of sequence numbers.
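The clean-up and vocabulary steps can be illustrated with a short sketch. The pattern `NOISE`, the function names, and the convention of reserving sequence number 0 for words missing from the vocabulary are assumptions made for this example; the text does not specify them.

```python
import re

# Hypothetical clean-up pattern: drop the "+" placeholder and the word "the".
NOISE = re.compile(r"^(\+|the)$", re.IGNORECASE)


def clean(template_tokens):
    """Remove Tokens that carry no semantic information."""
    return [t for t in template_tokens if not NOISE.match(t)]


def build_vocab(templates):
    """Map each distinct word to a sequence number; 0 is reserved for unknown words."""
    vocab = {}
    for tokens in templates:
        for tok in clean(tokens):
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def encode(template_tokens, vocab):
    """Convert a log template into its sequence of sequence numbers."""
    return [vocab.get(t, 0) for t in clean(template_tokens)]
```

The resulting integer sequence is what an embedding layer would consume as input.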

The invention obtains the vectorized representation of each log Token by a distributed representation method based on a neural network: a Word Embedding network layer is added to the log anomaly detection model, all parameters of this layer are learnable, and the layer is trained jointly with the rest of the model. The sequence of sequence numbers passes through this layer to yield the vectorized representation of the log. If the input sequence has length N, the resulting semantic representation is an N × 300 vector matrix, in which each Token is characterized by a 300-dimensional vector.

TF × IDF is taken as the Token weight, and the vectorized representation of a log of length N is then:

wherein v_i denotes the vectorized representation of each Token, and w_i denotes the weight of that Token.
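A possible reading of the TF × IDF weighting is sketched below. The exact TF and IDF formulas are not reproduced in this text, so the sketch uses the textbook definitions (TF as the relative frequency of a Token within a log, IDF as the logarithm of the number of logs divided by the number of logs containing the Token) and assumes that the weighted representation is the sum of w_i · v_i. The function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_weights(log_tokens, all_logs):
    """TF x IDF weight for each Token of one log, using textbook definitions."""
    counts = Counter(log_tokens)
    n_logs = len(all_logs)
    weights = []
    for tok in log_tokens:
        tf = counts[tok] / len(log_tokens)
        df = sum(1 for log in all_logs if tok in log)
        idf = math.log(n_logs / df) if df else 0.0
        weights.append(tf * idf)
    return weights


def weighted_log_vector(token_vectors, weights):
    """Collapse an N x d matrix into one d-dimensional vector: sum_i w_i * v_i."""
    dim = len(token_vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, token_vectors)) for k in range(dim)]
```

Under these definitions a Token that occurs in every log receives weight 0, so frequent but uninformative Tokens contribute little to the log vector.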

(3) Log anomaly detection model

After the semantic vectors of the log templates are obtained, a log template sequence can be converted into a numerical sequence. The invention uses a bidirectional LSTM network to analyze the numerical sequence produced by log vectorization. The network divides the hidden layer into two parts, a forward pass and a backward pass, and the combination of the two is the hidden-layer feature extracted by the network. Meanwhile, an attention mechanism is introduced to calculate the weight of the hidden-layer feature at each time t, all hidden-layer features are summed with these weights, and the prediction result is then obtained through a fully connected layer and a Softmax layer. It is denoted as:

wherein h_t denotes the hidden-layer feature obtained at time t, and w_i denotes the weight of the Token.
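The prediction formula is not reproduced in this text. One common way to write attention pooling over bidirectional LSTM hidden states, given here only as an assumed formulation and not as the patent's own formula, is shown below. The symbols W_a, b_a, u, W_o, b_o, the context vector c, and the sequence length T are introduced for illustration only.

```latex
h_t = \bigl[\overrightarrow{h_t}\,;\,\overleftarrow{h_t}\bigr], \qquad
e_t = u^{\top}\tanh\!\left(W_a h_t + b_a\right), \qquad
\alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{T}\exp(e_k)}

c = \sum_{t=1}^{T}\alpha_t\, h_t, \qquad
\hat{y} = \mathrm{softmax}\!\left(W_o\, c + b_o\right)
```

Here the attention weights α_t sum to 1, c is the weighted sum of all hidden-layer features, and ŷ is the output of the fully connected layer followed by Softmax.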

(4) Pseudo-anomaly discriminator

Because the anomaly detection result produced by the neural network may contain false alarms caused by the system's fault-tolerant mechanism, the invention designs a pseudo-anomaly discriminator directed at that mechanism. Each Session of the log is divided into windows of a fixed size, where the data inside a window is the training data and the first log template after it is the label. For example, suppose the sequence of log sequence numbers is:

Session_i = [12, 24, 3, 6, 7, 9, 10, 0, 1]

Window₁ = [12, 24, 3, 6, 7, 9], label = 10

Window₂ = [24, 3, 6, 7, 9, 10], label = 0

Window₃ = [3, 6, 7, 9, 10, 0], label = 1
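The window division can be sketched in a few lines. A window size of 6 is assumed here because the first two windows listed above each hold six entries; the function name `split_windows` is hypothetical.

```python
def split_windows(session, window_size):
    """Slide a fixed-size window over a Session; the entry right after each window is its label."""
    return [
        (session[i:i + window_size], session[i + window_size])
        for i in range(len(session) - window_size)
    ]


session = [12, 24, 3, 6, 7, 9, 10, 0, 1]
windows = split_windows(session, window_size=6)
for data, label in windows:
    print(data, "label =", label)
```

For the nine-entry Session of the example this yields three windows, with labels 10, 0, and 1.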

A Session thus contains multiple consecutive windows. During log anomaly detection, if the prediction for a certain window is abnormal, the pseudo-anomaly discriminator does not directly mark the Session containing that window as abnormal; instead it traverses all windows of the Session and counts the abnormal ones. Taking into account fault-tolerant mechanisms such as retry and checkpoint that exist in the software system, the invention provides two anomaly indices to judge comprehensively whether the Session is abnormal.

Anomaly count index: if several abnormal windows appear in a Session, the log data in the Session contains several possibly abnormal log sequences, and the Session is then abnormal with high probability. The invention sets the abnormal-window count index to 2.

Consecutive anomaly index: if consecutive abnormal windows appear in a Session, all logs recorded by the software system over a continuous period may be abnormal, which is very rare when the software system runs normally. The invention regards consecutive window anomalies as one of the signs of a Session anomaly, and sets the number of consecutive abnormal windows to 2.

By setting the abnormal-window count threshold and the consecutive-abnormal-window threshold, the invention greatly reduces the number of normal samples detected as abnormal (false positives, FP) while preserving the correctly detected anomalies (true positives, TP), which reduces pseudo-anomaly points and improves the precision of log anomaly detection.
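The two indices can be evaluated over the per-window predictions of one Session as sketched below. Both thresholds default to 2, as stated above. How the two indices are combined into a single verdict is not spelled out in the text, so the `require_both` switch is a hypothetical parameter, and the function names are illustrative.

```python
def max_consecutive(flags):
    """Length of the longest run of True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def session_indices(window_flags, count_threshold=2, consecutive_threshold=2):
    """Evaluate both indices; window_flags[i] is True if window i was predicted abnormal."""
    return {
        "count": sum(window_flags) >= count_threshold,
        "consecutive": max_consecutive(window_flags) >= consecutive_threshold,
    }


def is_session_abnormal(window_flags, require_both=True):
    # The combination rule is an assumption: require_both switches
    # between AND (both indices met) and OR (either index met).
    idx = session_indices(window_flags)
    return all(idx.values()) if require_both else any(idx.values())
```

A Session with a single abnormal window, such as an isolated timeout followed by a successful retry, is not flagged under either combination rule.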

(5) Fault tolerant knowledge base

The system can eliminate some pseudo-anomalies through the pseudo-anomaly discriminator. For example, when the retry mechanism of the fault-tolerant strategy leaves network timeout exceptions in the system log, a timeout exception that appears in isolation is a normal condition: after it occurs, the system obtains the network response through the retry mechanism and continues to operate. If timeout exceptions appear consecutively, the system keeps requesting without obtaining a response, and the system is then abnormal. According to the pseudo-anomaly log patterns found by the pseudo-anomaly discriminator, the invention extracts the log template information recorded in the Session and, taking the event as the core, completes a hierarchical description of the log template ontology. The ontology is defined as a quadruple whose elements are the event ID, the log correlation, the pseudo-anomaly log pattern, and the corresponding fault-tolerant rule. The quadruple effectively reflects the core fault-tolerant information and the pseudo-anomaly caused by fault tolerance, and meets the matching requirement of the subsequent pseudo-anomaly filter. The pseudo-anomaly log pattern in the quadruple is obtained by storing the log templates in order in the form of a data table.

For knowledge query, the heterogeneous rules of the DataLog data query language are adopted to express log relations and user access behavior, and the fault-tolerant knowledge base is formally defined as B = (KB, P), where KB denotes the quadruples describing the fault-tolerant knowledge and P denotes an application rule set consisting of DataLog heterogeneous rules. Under the safety constraints of DataLog, a fault-tolerant knowledge base with heterogeneous semantics is constructed for the pseudo-anomaly log patterns. This supports not only iterative updating of the fault-tolerant knowledge base but also manual query and supplementation of the corresponding fault-tolerant knowledge. Building the knowledge base on DataLog heterogeneous rules compensates for the weak reasoning and query capability of the log template ontology and effectively improves the expressive and reasoning power of the knowledge base. The invention records the fault-tolerant knowledge as quadruples, expresses the fault-tolerant mechanism as rules, and expands the fault-tolerant knowledge base.
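The quadruple structure of the knowledge base can be illustrated with a small in-memory sketch. It models only the KB part (the quadruples) and simple add and query operations; the DataLog rule set P is not modeled. The class names, field names, and the choice to key updates on the event ID are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FaultToleranceEntry:
    """One quadruple of fault-tolerant knowledge (field names are hypothetical)."""
    event_id: str
    log_correlation: str                     # how the logs in the pattern relate
    pseudo_anomaly_pattern: Tuple[str, ...]  # log templates, stored in order
    fault_tolerance_rule: str


class KnowledgeBase:
    def __init__(self):
        self.entries: List[FaultToleranceEntry] = []

    def add(self, entry):
        """Supplement or update: an entry with the same event ID is replaced."""
        self.entries = [e for e in self.entries if e.event_id != entry.event_id]
        self.entries.append(entry)

    def query(self, rule):
        """Return every entry recorded for the given fault-tolerant rule."""
        return [e for e in self.entries if e.fault_tolerance_rule == rule]
```

An entry for the retry case described above might hold a pattern such as ("request timed out", "retrying request", "request succeeded") under the rule name "retry"; these strings are invented for illustration.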

(6) Pseudo-anomaly filter

The pseudo-anomaly filter uses the fault-tolerant rules in the fault-tolerant knowledge base to eliminate pseudo-anomaly points from the output of log anomaly detection. In the logs recorded by a software system, two logs are generally not identical even if they correspond to the same fault-tolerant rule, so the invention sets a suitable screening threshold on text similarity to determine the matching fault-tolerant knowledge.
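Threshold-based matching with cosine similarity, which the disclosure names as the similarity measure, can be sketched as follows. The threshold value of 0.8, the function names, and the layout of `knowledge` as (rule name, pattern vector) pairs are hypothetical; each vector is assumed to be the single weighted-sum vector of a log.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_rule(log_vector, knowledge, threshold=0.8):
    """Return the best-matching rule name, or None if nothing reaches the threshold.

    knowledge: list of (rule_name, pattern_vector) pairs (hypothetical layout).
    """
    best_rule, best_score = None, threshold
    for rule, pattern_vector in knowledge:
        score = cosine_similarity(log_vector, pattern_vector)
        if score >= best_score:
            best_rule, best_score = rule, score
    return best_rule
```

A log whose vector matches a stored pseudo-anomaly pattern above the threshold would be treated as covered by that fault-tolerant rule and removed from the anomaly list; a return value of None leaves the detection result unchanged.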

The invention provides a log anomaly detection method and system oriented to a system fault-tolerant mechanism, whose core innovations are as follows:

1. A new log anomaly detection method is provided that uses the fault-tolerant mechanism to reduce the false alarms of automated system anomaly detection tools.

2. A pseudo-anomaly filter based on the fault-tolerant mechanism is embedded in the automated log anomaly detection model, and together they form an anomaly detection system with higher accuracy.

3. A log anomaly detection method and system architecture based on the system fault-tolerant mechanism are provided.

The present invention and its embodiments have been described above, and the description is not intended to be limiting, and the drawings are only one embodiment of the present invention, and the actual structure is not limited thereto. In summary, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. An anomaly detection method and system oriented to a system fault-tolerant mechanism, characterized by comprising:

(1) Log data processor

wherein the function equ is defined as follows:

after a suitable log group is obtained, the logs in the log group are updated: the Tokens of the logs in the same log group are scanned, and a Token is left unchanged if it is the same across the logs and is otherwise replaced by "+"; in this way the original logs are converted into log templates;

(2) log vectorization

wherein v_i denotes the vectorized representation of each Token, and w_i denotes the weight of that Token;

(3) log anomaly detection model

After the semantic vectors of the log templates are obtained, a log template sequence can be converted into a numerical sequence; the invention uses a bidirectional LSTM network to analyze the numerical sequence produced by log vectorization; the network divides the hidden layer into two parts, a forward pass and a backward pass, and the combination of the two is the hidden-layer feature extracted by this layer; meanwhile, an attention mechanism is introduced to calculate the weight of the hidden-layer feature at each time t, all hidden-layer features are summed with these weights, and the prediction result is then obtained through a fully connected layer and a Softmax layer, denoted as:

wherein h_t denotes the hidden-layer feature obtained at time t, and w_i denotes the weight of the Token;

(4) pseudo-anomaly discriminator

Session_i = [12, 24, 3, 6, 7, 9, 10, 0, 1]

Window₁ = [12, 24, 3, 6, 7, 9], label = 10

Window₂ = [24, 3, 6, 7, 9, 10], label = 0

Window₃ = [3, 6, 7, 9, 10, 0], label = 1

(6) pseudo-anomaly filter

The invention provides a log matching algorithm based on text similarity, in which the pseudo-anomaly filter uses the fault-tolerant rules in the fault-tolerant knowledge base to eliminate pseudo-anomaly points from the output of log anomaly detection; in the logs recorded by a software system, two logs are generally not identical even if they correspond to the same fault-tolerant rule, so the invention sets a suitable screening threshold on text similarity to determine the matching fault-tolerant knowledge;

during log vectorization, the vectorized representation of a log template is obtained from the neural-network-based distributed representation and TF × IDF; the similarity of two logs is obtained as the cosine similarity of their vectorized representations, and the similarity calculation formula is as follows:

the formula requires the vectorized representations of the two logs to have the same length, but in practice two logs may differ in length; therefore, before the similarity is calculated, the invention performs a weighted summation over the log template sequence, converting the log template vector sequence into a single vector, and the similarity between the two logs is then obtained by cosine similarity; in this way, the logs in the current Session are matched to the corresponding fault-tolerant knowledge, and pseudo-anomaly points are removed according to that knowledge to obtain the final log anomaly detection result.

2. The anomaly detection method and system oriented to a system fault-tolerant mechanism according to claim 1, wherein the anomaly indices include:

if consecutive abnormal windows appear in a Session, all logs recorded by the software system over a continuous period may be abnormal, which is very rare when the software system runs normally; the invention regards consecutive window anomalies as one of the signs of a Session anomaly, and sets the number of consecutive abnormal windows to 2.