CN111782472B - System abnormality detection method, device, equipment and storage medium - Google Patents


Info

Publication number
CN111782472B
Authority
CN
China
Prior art keywords
log
abnormal
logs
marked
training
Legal status
Active
Application number
CN202010611178.5A
Other languages
Chinese (zh)
Other versions
CN111782472A (en)
Inventor
邓悦
郑立颖
徐亮
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010611178.5A
Priority to PCT/CN2020/118218 (WO2021139235A1)
Publication of CN111782472A
Application granted
Publication of CN111782472B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/30 Monitoring
    • G06F 11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/30 Monitoring
    • G06F 11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operations; recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F 11/3452 Performance evaluation by statistical analysis

Abstract

The invention relates to artificial intelligence and provides a system anomaly detection method, apparatus, device and storage medium. The method comprises the following steps: inputting the marked logs, the unmarked logs and the expanded logs of the system to be detected into three identical training models in a training model set for training, and outputting the probability distributions of the different abnormal levels of the marked logs, the unmarked logs and the expanded logs; calculating the cross entropy loss and the consistency loss of the training model outputs; predicting the abnormal levels of the unmarked logs and the expanded logs according to the consistency loss, and iterating the training model set according to the cross entropy loss until the training model set converges, so as to obtain a log anomaly detection model; and finally, detecting abnormal logs produced during system operation through the log anomaly detection model. In addition, the invention relates to blockchain technology: the marked logs, the unmarked logs and the expanded logs can be stored in a blockchain. By optimizing the model training mode, overfitting of the model is prevented and the difficulty of detecting abnormal points in the system with the detection model is reduced.

Description

System abnormality detection method, device, equipment and storage medium
Technical Field
The present invention relates to artificial intelligence decision making, and in particular, to a method, an apparatus, a device, and a storage medium for system anomaly detection.
Background
As system scale grows, complexity increases and monitoring coverage improves, the volume of monitoring data becomes larger and larger, and operation and maintenance personnel cannot find quality problems in such massive monitoring data. Intelligent anomaly detection uses AI algorithms to find anomalies in monitoring data automatically, in real time and accurately, providing a basis for subsequent diagnosis and self-healing. Anomaly detection is a basic but very important function in an AIOps (intelligent operations) system: it automatically mines KPI time-series data with algorithms and models to discover abnormal behaviors, thereby providing the necessary decision basis for subsequent alerting, automatic loss stopping, root cause analysis and the like.
However, in practical application scenarios normal data generally accounts for a large proportion of the total data volume while abnormal data points are very rare, which makes anomaly detection difficult. In the training phase of the detection model, to keep the positive and negative training samples balanced, the traditional approach is to under-sample the normal samples (discarding part of the data) and over-sample the abnormal samples (repeating part of the data). For the former, a large amount of sample information is lost, causing model overfitting and poor generalization; for the latter, simple random sampling also creates a risk of overfitting. Therefore, whether because the abnormal data points themselves are rare or because an accurate detection model for abnormal points is hard to construct, the difficulty of data detection in an intelligent operations system is increased.
Disclosure of Invention
The invention mainly aims to solve the problem of high difficulty in anomaly detection of an intelligent operation system.
The invention provides a system anomaly detection method in a first aspect, which comprises the following steps:
acquiring a marked log and a non-marked log of a system to be detected, and expanding the non-marked log to obtain an expanded log;
respectively inputting the marked logs, the unmarked logs and the expanded logs into three identical abnormal grade training models for training, and correspondingly outputting a first probability distribution of each abnormal grade of the marked logs, a second probability distribution of each abnormal grade of the unmarked logs and a third probability distribution of each abnormal grade of the expanded logs, wherein the three identical abnormal grade training models form an abnormal grade training model set;
calculating a cross-entropy loss of the first probability distribution corresponding to a preset abnormal level marker of the marked log, and calculating a consistency loss between the second probability distribution and the third probability distribution;
predicting abnormal grade marks of the unmarked logs and the extended logs according to the consistency loss, and iterating the abnormal grade training model set according to the cross entropy loss until the abnormal grade training model set is converged to obtain a log abnormal detection model;
and acquiring a log to be detected of the system to be detected, inputting the log to be detected into the log abnormity detection model for detection, outputting an abnormity grade corresponding to the log to be detected, and taking the abnormity grade corresponding to the log to be detected as an analysis result of the current system operation state.
Optionally, in a first implementation manner of the first aspect of the present invention, the expanding the unmarked log to obtain an expanded log includes:
analyzing the unmarked log to obtain a plurality of log fields with different semantics;
screening key fields related to abnormal levels from the log fields according to preset semantic structure prior knowledge and the occurrence frequency of the log fields;
acquiring one or more synonymous fields corresponding to the key fields, and replacing the corresponding key fields with the synonymous fields;
and splicing the synonymous field and other log fields except the key field according to a random field processing strategy to obtain a plurality of corresponding expansion logs, wherein the random field processing strategy comprises replacing, deleting, inserting or exchanging the other log fields.
Optionally, in a second implementation manner of the first aspect of the present invention, the inputting the marked log, the unmarked log, and the extended log into three identical anomaly level training models for training respectively, and outputting a first probability distribution of each anomaly level of the marked log, a second probability distribution of each anomaly level of the unmarked log, and a third probability distribution of each anomaly level of the extended log in a corresponding manner includes:
uniformly adjusting the length of each log data in the marked log, the unmarked log and the expanded log to a preset length, and constructing a corresponding data vector;
determining the characteristic dimension of the data vector according to the length of the data vector, and extracting semantic features of the data vector according to the characteristic dimension to obtain initial semantic features;
and screening and combining the prominent features of the initial semantic features to obtain final semantic features, and calculating and outputting the probability distribution of the abnormal levels of the marked logs, the unmarked logs and the expanded logs according to the final semantic features.
Optionally, in a third implementation manner of the first aspect of the present invention, the calculating a cross-entropy loss of the first probability distribution corresponding to a preset abnormal level flag of the flag log includes:
calculating the correct prediction probability of the abnormal grade of each marked log according to the first probability distribution and the preset abnormal grade mark of the marked log;
and calculating the cross entropy loss of the first probability distribution according to preset model training parameters and the correct prediction probability so as to measure the difference between the abnormal grade prediction of the labeled log by the classification model and the real abnormal grade of the labeled log.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the iterating the abnormal level training model set according to the cross entropy loss until the abnormal level training model set converges to obtain a log abnormal detection model includes:
determining the correct prediction probability corresponding to each marked log according to the cross entropy loss;
judging whether a correct prediction probability larger than a preset probability threshold exists or not;
if so, deleting the first probability distribution corresponding to the correct prediction probability which is greater than the probability threshold value, and continuing to iterate the log anomaly detection model, otherwise, directly iterating the log anomaly detection model, and updating the model training parameters after the log anomaly detection model is iterated;
calculating the sum of the cross entropy loss and the consistency loss to obtain a corresponding final loss value, and judging whether the final loss value is smaller than a preset final loss threshold value or not;
and if the final loss value is smaller than the final loss threshold value, the abnormal level training model set is converged and stops iteration to obtain a log abnormal detection model.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the calculation formula of the correct prediction probability is:
P = −Σ_i p(y_i)·log(q(y_i)),
and the probability threshold used for screening the marked logs is:
η_t = α_t·(1 − 1/K) + 1/K,
wherein p(y_i) is the real probability of the i-th abnormal level of the marked log, q(y_i) is the predicted probability of the i-th abnormal level of the marked log, η_t is the probability threshold, α_t is a growth coefficient, K is the number of the abnormal level categories, t is the current iteration number, and T is the preset total iteration number;
when the data amount in the marked log is smaller than a preset normal data amount range, α_t = exp((t/T − 1)·5);
when the data amount in the marked log is larger than the normal data amount range, α_t = 1 − exp(−(t/T)·5).
Optionally, in a sixth implementation manner of the first aspect of the present invention, the acquiring a log to be detected of a system to be detected, inputting the log to be detected into the log anomaly detection model for detection, outputting an anomaly level corresponding to the log to be detected, and taking the anomaly level corresponding to the log to be detected as an analysis result of the current system operation state includes:
acquiring a log to be detected of a system to be detected, wherein the log to be detected comprises a plurality of pieces of log information, and the log information comprises identification information of system operation management priority;
inputting the log to be detected into the log abnormity detection model for detection, and predicting the abnormity grade of the log to be detected through the log abnormity detection model;
screening logs to be detected with abnormal grades higher than a preset abnormal grade threshold value, and determining log information with priority higher than a preset priority threshold value from the screened logs to be detected according to the identification information;
and highlighting the log information with the priority greater than a preset priority threshold value, and taking the abnormal grade corresponding to the highlighted log information and other log information except the highlighted log information as an analysis result of the current system running state.
A second aspect of the present invention provides a system abnormality detection apparatus, including:
the acquisition module is used for acquiring a marked log and a non-marked log of the system to be detected and expanding the non-marked log to obtain an expanded log;
a training module, configured to input the marked log, the unmarked log, and the extended log into three identical abnormal level training models for training, and output a first probability distribution of each abnormal level of the marked log, a second probability distribution of each abnormal level of the unmarked log, and a third probability distribution of each abnormal level of the extended log, where the three identical abnormal level training models form an abnormal level training model set;
a calculation module, configured to calculate a cross entropy loss between the first probability distribution and a preset abnormal level flag of the flag log, and calculate a consistency loss between the second probability distribution and the third probability distribution;
the generation module is used for predicting the abnormal grade marks of the unmarked logs and the extended logs according to the consistency loss, and iterating the abnormal grade training model set according to the cross entropy loss until the abnormal grade training model set is converged to obtain a log abnormal detection model;
the detection module is used for acquiring the log to be detected of the system to be detected, inputting the log to be detected into the log abnormity detection model for detection, outputting the abnormity grade corresponding to the log to be detected, and taking the abnormity grade corresponding to the log to be detected as the analysis result of the current system running state.
Optionally, in a first implementation manner of the second aspect of the present invention, the obtaining module is further configured to:
analyzing the unmarked log to obtain a plurality of log fields with different semantics;
screening key fields related to abnormal levels from the log fields according to preset semantic structure prior knowledge and the occurrence frequency of the log fields;
acquiring one or more synonymous fields corresponding to the key fields, and replacing the corresponding key fields with the synonymous fields;
and splicing the synonymous field and other log fields except the key field according to a random field processing strategy to obtain a plurality of corresponding expansion logs, wherein the random field processing strategy comprises replacing, deleting, inserting or exchanging the other log fields.
Optionally, in a second implementation manner of the second aspect of the present invention, the training module further includes:
the construction unit is used for uniformly adjusting the length of each log data in the marked log, the unmarked log and the expanded log to a preset length and constructing a corresponding data vector;
the feature extraction unit is used for determining the feature dimension of the data vector according to the length of the data vector and extracting the semantic features of the data vector according to the feature dimension to obtain initial semantic features;
and the probability distribution generating unit is used for screening and combining the prominent features of the initial semantic features to obtain final semantic features, and calculating and outputting the probability distribution of the abnormal levels of the marked logs, the unmarked logs and the extended logs according to the final semantic features.
Optionally, in a third implementation manner of the second aspect of the present invention, the calculation module further includes:
the first calculation unit is used for calculating the correct prediction probability of the abnormal grade of each marked log according to the first probability distribution and the preset abnormal grade marks of the marked logs;
and the second calculation unit is used for calculating the cross entropy loss of the first probability distribution according to preset model training parameters and the correct prediction probability so as to measure the difference between the abnormal grade prediction of the labeled log by the classification model and the real abnormal grade of the labeled log.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the generating module further includes:
the iteration unit is used for determining the correct prediction probability corresponding to each marked log according to the cross entropy loss; judging whether a correct prediction probability larger than a preset probability threshold exists or not; if so, deleting the first probability distribution corresponding to the correct prediction probability which is greater than the probability threshold value, and continuing to iterate the log anomaly detection model, otherwise, directly iterating the log anomaly detection model, and updating the model training parameters after the log anomaly detection model is iterated;
the model generation unit is used for calculating the sum of the cross entropy loss and the consistency loss to obtain a corresponding final loss value and judging whether the final loss value is smaller than a preset final loss threshold value or not; and if the final loss value is smaller than the final loss threshold value, the abnormal level training model set is converged and stops iteration to obtain a log abnormal detection model.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the calculation formula of the correct prediction probability is:
P = −Σ_i p(y_i)·log(q(y_i)),
and the probability threshold used for screening the marked logs is:
η_t = α_t·(1 − 1/K) + 1/K,
wherein p(y_i) is the real probability of the i-th abnormal level of the marked log, q(y_i) is the predicted probability of the i-th abnormal level of the marked log, η_t is the probability threshold, α_t is a growth coefficient, K is the number of the abnormal level categories, t is the current iteration number, and T is the preset total iteration number;
when the data amount in the marked log is smaller than a preset normal data amount range, α_t = exp((t/T − 1)·5);
when the data amount in the marked log is larger than the normal data amount range, α_t = 1 − exp(−(t/T)·5).
Optionally, in a sixth implementation manner of the second aspect of the present invention, the detection module further includes:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a log to be detected of a system to be detected, wherein the log to be detected comprises a plurality of pieces of log information, and the log information comprises identification information of system operation management priority;
the detection unit is used for inputting the log to be detected into the log abnormity detection model for detection and predicting the abnormity grade of the log to be detected through the log abnormity detection model;
the screening unit is used for screening the logs to be detected with the abnormal grade higher than a preset abnormal grade threshold value, and determining log information with the priority higher than a preset priority threshold value from the screened logs to be detected according to the identification information;
and the analysis result generation unit is used for highlighting the log information with the priority level larger than a preset priority threshold value, and taking the abnormal grade corresponding to the highlighted log information and other log information except the highlighted log information as the analysis result of the current system operation state.
A third aspect of the present invention provides a system abnormality detection device, including: a memory having instructions stored therein and at least one processor, the memory and the at least one processor being interconnected by a line; the at least one processor invokes the instructions in the memory to cause the system abnormality detection device to perform the system abnormality detection method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to execute the above-described system abnormality detection method.
In the technical scheme provided by the invention, a marked log, a non-marked log and an expanded log of a system to be detected are obtained and are respectively input into three identical abnormal grade training models in an abnormal grade training model set for training, and the probability distribution of each abnormal grade of the three abnormal grade training models is output; then calculating cross entropy loss and consistency loss output by the abnormal level training model; predicting the abnormal levels of the unmarked logs and the extended logs according to the consistency loss, and iterating the abnormal level training model set according to the cross entropy loss until the abnormal level training model set is converged to obtain a log abnormal detection model; and finally, detecting the abnormal logs in the system operation through a log abnormality detection model. And the model training mode is optimized, model overfitting is prevented, and the difficulty in detecting abnormal points in the system by the detection model is reduced.
Drawings
FIG. 1 is a schematic diagram of a first embodiment of the system abnormality detection method of the present invention;
FIG. 2 is a schematic diagram of a system abnormality detection method according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of a system abnormality detection method according to a third embodiment of the present invention;
FIG. 4 is a schematic diagram of a fourth embodiment of the system abnormality detection method according to the present invention;
FIG. 5 is a schematic diagram of an embodiment of the system abnormality detection apparatus of the present invention;
FIG. 6 is a schematic diagram of another embodiment of the system abnormality detection apparatus of the present invention;
fig. 7 is a schematic diagram of an embodiment of the system abnormality detection apparatus of the present invention.
Detailed Description
The embodiment of the invention provides a system anomaly detection method, a device, equipment and a storage medium, which comprises the steps of obtaining a marked log, a non-marked log and an expanded log of a system to be detected, respectively inputting the marked log, the non-marked log and the expanded log into three same anomaly level training models in an anomaly level training model set for training, and outputting probability distribution of each anomaly level of the marked log, the non-marked log and the expanded log; then calculating cross entropy loss and consistency loss output by the abnormal level training model; predicting the abnormal levels of the unmarked logs and the extended logs according to the consistency loss, and iterating the abnormal level training model set according to the cross entropy loss until the abnormal level training model set is converged to obtain a log abnormal detection model; and finally, detecting the abnormal logs in the system operation through a log abnormality detection model. And the model training mode is optimized, model overfitting is prevented, and the difficulty in detecting abnormal points in the system by the detection model is reduced.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of understanding, a specific flow of the embodiment of the present invention is described below, and referring to fig. 1, a first embodiment of the system anomaly detection method in the embodiment of the present invention includes:
101. acquiring a marked log and a non-marked log of a system to be detected, and expanding the non-marked log to obtain an expanded log;
it is to be understood that the execution subject of the present invention may be a system abnormality detection apparatus, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as an execution subject. It is emphasized that, in order to further ensure the privacy and security of the marked log, the unmarked log and the extended log, the marked log, the unmarked log and the extended log may also be stored in a node of a block chain.
In this embodiment, a log generated during past operation of the system may be acquired from the system memory, where the log is text information for recording a system state and an operation state, and the content includes a timestamp and text information indicating a transmission content.
In the practical application scenario of this embodiment, a small number of marked logs are used to predict a large number of unmarked logs and obtain the abnormal levels of the unmarked logs. Logs generated by the system in the past are obtained as unmarked logs, and a small number of logs are then screened from them and marked with abnormal levels to serve as marked logs. On the other hand, during model training the number of normal samples is significantly higher than the number of abnormal samples, so in order to keep the positive and negative training samples balanced and to stabilize model training, the unmarked logs need to be expanded to increase the number of negative samples; the expanded logs are obtained by replacing the key words describing the system running state in the unmarked logs with words having the same semantics.
Preferably, the abnormal unmarked logs can be expanded by back-translation and TF-IDF (term frequency-inverse document frequency) based word replacement. First, the importance of each field in an unmarked log is evaluated with TF-IDF; specifically, the fields with a high occurrence frequency in the unmarked log are taken as the key fields of that log. The key fields of different unmarked logs are then classified by semantics using DBpedia prior knowledge to obtain key fields of several categories, and the key fields are replaced with synonyms having the same semantics. Finally, the other, non-key fields are processed by replacement, deletion, insertion, exchange and similar operations. In this way the number of abnormal unmarked logs is expanded while the semantics of the abnormal information content are preserved, giving the expanded logs.
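The expansion step can be sketched as below. This is a minimal illustration, not the patent's implementation: it assumes a whitespace-tokenized log, per-log TF-IDF scoring and a hand-built synonym table, and the helper names are hypothetical.

```python
import math
import random
from collections import Counter

def tf_idf_scores(log_fields, corpus):
    """Score each field of one unmarked log: term frequency within the log
    times inverse document frequency over the whole unmarked-log corpus."""
    counts = Counter(log_fields)
    n_logs = len(corpus)
    scores = {}
    for field, c in counts.items():
        tf = c / len(log_fields)
        df = sum(1 for other in corpus if field in other)
        scores[field] = tf * math.log10(n_logs / max(df, 1))
    return scores

def expand_log(log_fields, corpus, synonyms, threshold=0.4, n_variants=3):
    """Build expanded logs: swap the key field for a synonym with the same
    semantics, then apply one random replace/delete/insert/swap operation
    to the remaining (non-key) fields."""
    scores = tf_idf_scores(log_fields, corpus)
    key = max(scores, key=scores.get)
    other_idx = [i for i, f in enumerate(log_fields) if f != key]
    if scores[key] < threshold or key not in synonyms or not other_idx:
        return []                                   # no usable key field
    variants = []
    for _ in range(n_variants):
        fields = list(log_fields)
        fields[fields.index(key)] = random.choice(synonyms[key])  # keep semantics
        op = random.choice(["replace", "delete", "insert", "swap"])
        i = random.choice(other_idx)
        if op == "replace":
            fields[i] = random.choice(fields)       # reuse another field
        elif op == "delete" and len(fields) > 2:
            del fields[i]
        elif op == "insert":
            fields.insert(i, random.choice(fields))
        else:                                       # swap two non-key fields
            j = random.choice(other_idx)
            fields[i], fields[j] = fields[j], fields[i]
        variants.append(" ".join(fields))
    return variants
```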
102. Respectively inputting the marked logs, the unmarked logs and the expanded logs into three identical abnormal grade training models for training, and correspondingly outputting a first probability distribution of each abnormal grade of the marked logs, a second probability distribution of each abnormal grade of the unmarked logs and a third probability distribution of each abnormal grade of the expanded logs, wherein the three identical abnormal grade training models form an abnormal grade training model set;
in this embodiment, the abnormal class training model set is formed by stacking three identical abnormal class training models, and the training process of each abnormal class training model specifically includes:
uniformly adjusting the length of each log data in the marked log, the unmarked log and the expanded log to a preset length, and constructing a corresponding data vector;
determining the characteristic dimension of the data vector according to the length of the data vector, and extracting semantic features of the data vector according to the characteristic dimension to obtain initial semantic features;
and screening and combining the prominent features of the initial semantic features to obtain final semantic features, and calculating and outputting the probability distribution of the abnormal levels of the marked logs, the unmarked logs and the expanded logs according to the final semantic features.
In this embodiment, the training mode in which the marked log is input into its abnormal level training model for training belongs to supervised learning, while the training mode in which the unmarked log and the extended log are input into their respective abnormal level training models for training belongs to unsupervised learning. It should be noted that, according to the feature distribution of the different fields of the system log, the probabilities of the different abnormal levels are output, and the abnormal level with the highest probability is finally selected as the abnormal level of the system log, instead of the abnormal level being output directly.
Preferably, Text-CNN (Text-Convolutional Neural Network) is used herein to train the abnormal level training model corresponding to the labeled log in a supervised learning manner and the abnormal level training model corresponding to the unlabeled log and the augmented data in an unsupervised learning manner. The method specifically comprises the following steps:
the input layer adjusts the text vocabularies of the marked logs, unmarked logs or extended logs input into the Text-CNN model to the same length L and obtains a word vector for each text vocabulary;
the convolution layer, taking the number of abnormal level categories as the dimension of the abnormal level training model, uses a plurality of convolution kernels of different sizes to extract the feature vocabularies of the word vectors that describe the level abnormality;
the pooling layer combines the different feature vocabularies obtained by the convolution layer through Max-pooling (maximum pooling) to serve as the classification features of the different system logs;
and the fully connected layer inputs the classification features into an LR (Logistic Regression) classifier for classification; for example, if the abnormal levels in the set output rules include major abnormality, common abnormality, slight abnormality and normal, the probability of each abnormal level is output for the different system logs.
And finally, according to the abnormal grade probabilities output by the model, taking the abnormal grade with the highest probability as a prediction result of the abnormal grade of the current system log. For example, in the anomaly level of the unmarked log A of the input Text-CNN, the probabilities of major anomaly, common anomaly, slight anomaly and normal are respectively [0.5,0.2,0.2 and 0.1], and then the anomaly level of the unmarked log A is predicted to be the major anomaly through the model.
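A compact sketch of the Text-CNN classifier described above, written with PyTorch as an assumed framework (the patent does not name one); the vocabulary size, embedding size, kernel sizes and the four anomaly levels are illustrative values rather than figures from the source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Input layer -> convolution kernels of several sizes -> max pooling ->
    fully connected classifier producing softmax probabilities over K levels."""
    def __init__(self, vocab_size, embed_dim=128, kernel_sizes=(2, 3, 4),
                 n_filters=64, n_levels=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, n_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_levels)

    def forward(self, token_ids):                     # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)     # (batch, embed_dim, seq_len)
        feats = []
        for conv in self.convs:
            c = F.relu(conv(x))                       # (batch, n_filters, L')
            feats.append(F.max_pool1d(c, c.shape[-1]).squeeze(-1))
        logits = self.fc(torch.cat(feats, dim=1))     # combine pooled features
        return F.softmax(logits, dim=1)               # probability per anomaly level

# Usage: probs = TextCNN(vocab_size=5000)(token_ids) with token_ids of shape
# (batch, seq_len); a row like [0.5, 0.2, 0.2, 0.1] yields "major anomaly" via argmax.
```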
103. Calculating a cross-entropy loss of the first probability distribution corresponding to a preset abnormal level marker of the marked log, and calculating a consistency loss between the second probability distribution and the third probability distribution;
in this embodiment, the cross entropy loss represents a difference value between the prediction of the first probability distribution of the marked log and the real level thereof, and the consistency loss represents a difference value between the unmarked log and the corresponding extended log. Finally, the unmarked log to be detected can be directly input, and the abnormal grade of the log can be directly predicted.
Specifically, the abnormal level training models corresponding to the unmarked logs and to the extended logs are stacked and consistency training is performed, the iteration of the abnormal level training model set being controlled through a consistency loss function. As training iterates, the features of an unmarked log and of its corresponding extended logs become more concentrated, their similarity increases and the similarity distance between the two model outputs shrinks, so the corresponding consistency loss decreases; when the similarity distance falls below a preset threshold value, the corresponding abnormal level labels are propagated to the extended logs, so that the abnormal level of the unmarked logs is obtained. The consistency loss is calculated by the following function:
L_con(θ) = E_{x∈U} [ D_KL( p_θ(y|x) ‖ p_θ(y|x̂) ) ],
wherein U is the set of unmarked logs, x̂ is the extended log generated from the unmarked log x, p_θ(y|x) is the second probability distribution and p_θ(y|x̂) is the third probability distribution.
In addition, the difference between the abnormal grade probability distribution output by the abnormal grade training model corresponding to the marked log and the actual abnormal grade is measured by cross entropy loss, calculated by the following function:
L_sup(θ) = −E_{(x, y*)∈M} [ I( p_θ(y*|x) < η_t ) · log p_θ(y*|x) ],
wherein M is the set of marked logs and p_θ(y*|x) represents the probability that the marked log x is correctly predicted as its real abnormal level y*; when training reaches step t, any marked data whose p_θ(y*|x) is greater than the threshold η_t is removed from the loss function. Here,
η_t = α_t·(1 − 1/K) + 1/K,
and in the normal case α_t = t/T, where α_t is the growth coefficient, K is the number of abnormal level categories and T is the total number of training steps.
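The two losses and the threshold-based screening can be put together as in the sketch below. It follows the consistency-training formulation described above, but the exact KL form of the consistency term, the indicator masking and the λ weighting are stated here as assumptions, not text taken verbatim from the patent.

```python
import torch
import torch.nn.functional as F

def supervised_loss(probs_marked, true_levels, eta_t):
    """Cross entropy over marked logs; samples whose correct-class probability
    already exceeds the threshold eta_t are removed from the loss (annealing).
    true_levels is a LongTensor of class indices."""
    p_correct = probs_marked.gather(1, true_levels.unsqueeze(1)).squeeze(1)
    keep = (p_correct < eta_t).float()                  # drop confident samples
    ce = -torch.log(p_correct.clamp_min(1e-12))
    return (keep * ce).sum() / keep.sum().clamp_min(1.0)

def consistency_loss(probs_unmarked, probs_expanded):
    """KL divergence between the prediction on an unmarked log and on its
    expansion; the unmarked-side distribution is treated as a fixed target."""
    target = probs_unmarked.detach()
    return F.kl_div(torch.log(probs_expanded.clamp_min(1e-12)),
                    target, reduction="batchmean")

def final_loss(probs_marked, true_levels, probs_unmarked, probs_expanded,
               eta_t, lam=1.0):
    """Final loss = cross entropy loss + lam * consistency loss; training stops
    once this value falls below the preset final-loss threshold."""
    return (supervised_loss(probs_marked, true_levels, eta_t)
            + lam * consistency_loss(probs_unmarked, probs_expanded))
```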
104. predicting abnormal grade marks of the unmarked logs and the extended logs according to the consistency loss, and iterating the abnormal grade training model set according to the cross entropy loss until the abnormal grade training model set is converged to obtain a log abnormal detection model;
In this embodiment, the consistency loss and the cross entropy loss are combined to evaluate the log anomaly detection model, i.e.
min_θ J(θ) = L_sup(θ) + λ·L_con(θ),
wherein θ is the preset model parameter and J(θ) is the final loss; λ is used to balance the consistency loss and the cross entropy loss, and when the final loss is smaller than a preset threshold value, the log anomaly detection model stops iterating.
When the consistency loss is smaller than a preset consistency loss threshold value, the predicted abnormal level marks of the unmarked logs and the expanded logs can be considered credible; when the cross entropy loss is smaller than the preset cross entropy loss threshold, the probability distributions of each abnormal level output in step 102 can be considered credible; and when the final loss obtained by adding the cross entropy loss and the consistency loss is smaller than the final loss threshold, the output result of the whole abnormal level training model set can be considered credible.
105. And acquiring a log to be detected of the system to be detected, inputting the log to be detected into the log abnormity detection model for detection, outputting an abnormity grade corresponding to the log to be detected, and taking the abnormity grade corresponding to the log to be detected as an analysis result of the current system operation state.
In this embodiment, the logs to be detected can be obtained from one or more systems. Different operating states of different systems are managed by priority, and operating conditions in which a major abnormality is likely to occur receive particular attention during monitoring; for a high-priority abnormal log, once a major abnormality occurs, emergency measures need to be taken in time to respond quickly, locate the specific cause of the fault and remove it. Therefore, the log to be detected carries priority identification information; when the abnormality level output by the log abnormality detection model is high, it is judged whether the log to be detected carries a high-priority identification, and if so, the log is highlighted when it is recorded in the analysis result and, if necessary, an alarm is raised.
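A short sketch of this detection stage: each log is scored by the trained model, logs above the anomaly-level threshold are kept, and among those the entries whose priority identifier exceeds the priority threshold are flagged for highlighting or alerting. The record fields, thresholds and level ordering are illustrative, and the model is assumed to expose the interface of the Text-CNN sketch above.

```python
def analyze_logs(model, logs, level_threshold=2, priority_threshold=7):
    """Return (level, highlight) per log; `logs` holds dicts with the raw text,
    its token ids and a 'priority' identifier carried in the log information."""
    results = []
    for entry in logs:
        probs = model(entry["token_ids"].unsqueeze(0))        # 1 x K level probabilities
        level = int(probs.argmax(dim=1)) + 1                  # 1=major ... 4=normal, as in the example
        highlight = (level <= level_threshold                 # severe enough
                     and entry["priority"] > priority_threshold)
        if highlight:
            print(f"[ALERT] priority {entry['priority']}: {entry['text']}")
        results.append({"text": entry["text"], "level": level,
                        "highlight": highlight})
    return results                                            # analysis of the system state
```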
In the embodiment of the invention, the marked logs and the unmarked logs are obtained, the unmarked logs are expanded to obtain expanded logs, and the expanded logs are respectively input into three corresponding abnormal grade training models in an abnormal grade training model set for training so as to predict the probability distribution of each abnormal grade of the three abnormal grade training models; the abnormal grade training model set is subjected to iterative training in a mode of gradually reducing the marking information, and the finally generated abnormal log detection model can be used for predicting the abnormal grade corresponding to the log to be detected generated in the system operation process to obtain the analysis result of the system operation state, so that the over-fitting resisting strength of the model is improved, and the detection difficulty of the model is reduced.
Referring to fig. 2, a second embodiment of the system anomaly detection method according to the embodiment of the present invention includes:
201. acquiring a marked log and a non-marked log of a system to be detected;
202. analyzing the unmarked log to obtain a plurality of log fields with different semantics;
This embodiment mainly describes expanding the unmarked log to obtain the expanded log. The unmarked log can be a system log generated while the system is in an abnormal state; the log content includes time, session identification, function identification, refined content and other information such as the system version number, the thread number and the log level (for example DEBUG, INFO, WARN or ERROR). The fields with different semantics in the unmarked log are parsed to obtain a plurality of semantic fields; obviously, the log level in the log content is a target key field to be obtained.
203. Screening key fields related to abnormal levels from the log fields according to preset semantic structure prior knowledge and the occurrence frequency of the log fields;
in the embodiment, preset semantic structure prior knowledge is used for associating key fields with the same semantics in each unmarked log, wherein the same semantics refer to expressed content meanings with the same abnormal level; then, the occurrence frequency of each semantic field in the same unmarked log is counted, the occurrence frequency of each semantic field in all unmarked logs is counted, the product of the two is calculated, and according to the calculation result and the set threshold value, which fields are key fields, namely, the fields which represent the abnormal level of the unmarked logs can be screened.
Preferably, a TF-IDF (Term Frequency-Inverse Document Frequency) technique may be used to determine the key field. If the unmarked log is a system log containing 100 fields and field a occurs 15 times, the TF of the field is 15/100 = 0.15; if 100,000 system logs are used for training and field a appears in 100 of them, the IDF of field a is lg(100000/100) = 3, so the TF-IDF of field a is 0.15 x 3 = 0.45. If the TF-IDF threshold for a key field is set to 0.4, field a is determined to be a key field.
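The arithmetic of this example in a few lines, reading the counts as 15 occurrences in a 100-field log and 100 out of 100,000 training logs so that the stated IDF of 3 and score of 0.45 hold:

```python
import math

tf = 15 / 100                        # field a: 15 occurrences in a 100-field log
idf = math.log10(100_000 / 100)      # 100,000 training logs, field a present in 100 of them
tf_idf = tf * idf
print(round(idf, 2), round(tf_idf, 2))   # 3.0 0.45 -> exceeds the 0.4 key-field threshold
```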
204. Acquiring one or more synonymous fields corresponding to the key fields, and replacing the corresponding key fields with the synonymous fields;
205. according to a random field processing strategy, splicing the synonymous field with other log fields except the key field to obtain a plurality of corresponding expansion logs, wherein the random field processing strategy comprises replacing, deleting, inserting or exchanging the other log fields;
in this embodiment, after the key fields in the unmarked log are confirmed, the unmarked log is extended in a retranslation manner. Firstly, the content expression of the unmarked log is required to be kept the same, the unmarked log is realized through other multiple synonymous fields, and then the difference between the whole content of the expanded log and the unmarked log is required to be ensured, so that other semantic fields except the key fields are required to be processed, including modes of replacement, deletion, insertion, exchange and the like. And then splicing the processed key field with other fields to obtain a plurality of expansion logs with the same meaning and different contents.
206. Respectively inputting the marked logs, the unmarked logs and the expanded logs into three identical abnormal grade training models for training, and correspondingly outputting a first probability distribution of each abnormal grade of the marked logs, a second probability distribution of each abnormal grade of the unmarked logs and a third probability distribution of each abnormal grade of the expanded logs, wherein the three identical abnormal grade training models form an abnormal grade training model set;
207. calculating a cross-entropy loss of the first probability distribution corresponding to a preset abnormal level marker of the marked log, and calculating a consistency loss between the second probability distribution and the third probability distribution;
208. predicting abnormal grade marks of the unmarked logs and the extended logs according to the consistency loss, and iterating the abnormal grade training model set according to the cross entropy loss until the abnormal grade training model set is converged to obtain a log abnormal detection model;
209. and acquiring a log to be detected of the system to be detected, inputting the log to be detected into the log abnormity detection model for detection, outputting an abnormity grade corresponding to the log to be detected, and taking the abnormity grade corresponding to the log to be detected as an analysis result of the current system operation state.
In the embodiment of the invention, unmarked logs generated under the condition of system abnormity are introduced to be expanded, the number of the abnormal unmarked logs is increased under the condition of ensuring the data difference, the over-fitting resisting capability of the model is increased in the following training process of the detection model, and the difficulty of the training of the detection model is reduced.
Referring to fig. 3, a third embodiment of the system anomaly detection method according to the embodiment of the present invention includes:
301. acquiring a marked log and a non-marked log of a system to be detected, and expanding the non-marked log to obtain an expanded log;
302. respectively inputting the marked logs, the unmarked logs and the expanded logs into three identical abnormal grade training models for training, and correspondingly outputting a first probability distribution of each abnormal grade of the marked logs, a second probability distribution of each abnormal grade of the unmarked logs and a third probability distribution of each abnormal grade of the expanded logs, wherein the three identical abnormal grade training models form an abnormal grade training model set;
303. calculating the correct prediction probability of the abnormal grade of each marked log according to the first probability distribution and the preset abnormal grade mark of the marked log;
In this embodiment, the correct prediction probability is calculated from the abnormal level probability distribution predicted by the abnormal level training model for the marked log and the real abnormal level of the marked log, using the following formula:
P = −Σ_i p(y_i)·log(q(y_i)),
wherein p(y_i) is the real probability of the i-th abnormal level of the marked log and q(y_i) is the predicted probability of the i-th abnormal level of the marked log. It should be noted that the probability of the real abnormal level is 1 and the probabilities of the other abnormal levels are 0, so the formula reduces to −log(q(y_i)) for the real level i.
Specifically, if the abnormality levels major anomaly, common anomaly, slight anomaly and normal are represented by 1, 2, 3 and 4 respectively, the abnormal level probability distribution Z of a marked log assigns probabilities [0.5, 0.2, 0.2, 0.1] to levels 1, 2, 3 and 4, and the real abnormal level of that marked log is a major anomaly, then the corresponding correct prediction probability is −1 x log(0.5) ≈ 0.301.
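The same calculation in code form, using the base-10 logarithm that the example's value of 0.301 implies:

```python
import math

probs = {"major": 0.5, "common": 0.2, "slight": 0.2, "normal": 0.1}  # predicted distribution Z
true_level = "major"                                                 # real anomaly level of the marked log
correct_pred = -math.log10(probs[true_level])                        # -1 * log10(0.5)
print(round(correct_pred, 3))                                        # 0.301
```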
304. Calculating cross entropy loss of the first probability distribution according to preset model training parameters and the correct prediction probability, so as to measure the difference between the abnormal grade prediction of the labeled log by a classification model and the real abnormal grade of the labeled log;
in this embodiment, the cross entropy loss of the first probability distribution can be obtained by accumulating and averaging the correct prediction probabilities corresponding to all the first probability distributions. The classification accuracy of the abnormal grade training model corresponding to the marked log can be evaluated through the cross entropy loss, namely the quantitative difference index between the classification result and the real result.
305. Calculating a loss of consistency between the second probability distribution and the third probability distribution;
306. predicting the abnormal grades of the unmarked logs and the extended logs according to the consistency loss, and iterating the abnormal grade training model set according to the cross entropy loss until the abnormal grade training model set is converged to obtain a log abnormal detection model;
307. and acquiring a log to be detected of the system to be detected, inputting the log to be detected into the log abnormity detection model for detection, outputting an abnormity grade corresponding to the log to be detected, and taking the abnormity grade corresponding to the log to be detected as an analysis result of the current system operation state.
In the embodiment of the invention, the cross entropy loss of the first probability distribution is calculated to be used for calculating the final loss by subsequently combining the consistency loss, and the abnormal log detection model is evaluated to be used as one of indexes for measuring the abnormal log detection model.
Referring to fig. 4, a fourth embodiment of the system anomaly detection method according to the embodiment of the present invention includes:
401. acquiring a marked log and a non-marked log of a system to be detected, and expanding the non-marked log to obtain an expanded log;
402. respectively inputting the marked logs, the unmarked logs and the expanded logs into three identical abnormal grade training models for training, and correspondingly outputting a first probability distribution of each abnormal grade of the marked logs, a second probability distribution of each abnormal grade of the unmarked logs and a third probability distribution of each abnormal grade of the expanded logs, wherein the three identical abnormal grade training models form an abnormal grade training model set;
403. calculating a cross-entropy loss between the first probability distribution and a preset anomaly level of the marked log, and calculating a consistency loss between the second probability distribution and the third probability distribution;
404. determining the correct prediction probability corresponding to each marked log according to the cross entropy loss, and judging whether the correct prediction probability larger than a preset probability threshold exists or not;
in this embodiment, in the iterative training process of the log anomaly detection model, labeled logs of labels need to be deleted step by step, and overfitting of the model is prevented in a training signal annealing manner, so that the generalization capability of the model is increased. And when the correct prediction probability is larger than the set probability threshold, deleting the marked log corresponding to the correct prediction probability.
Specifically, when the amount of marked data in the marked log is normal, the probability threshold is calculated as:
η_t = (t/T)·(1 − 1/K) + 1/K.
When the amount of marked data in the marked log is small, the model easily overfits and can make high-confidence predictions on the data within a short time, so the probability threshold calculation is converted into:
η_t = exp((t/T − 1)·5)·(1 − 1/K) + 1/K,
which reduces the growth rate of the threshold so that more invalid samples are removed. When the amount of marked data in the marked log is large, the model is hard to overfit and takes longer to converge, so it outputs fewer high-confidence predictions in the same time and fewer samples need to be deleted; the probability threshold calculation can therefore be converted into:
η_t = (1 − exp(−(t/T)·5))·(1 − 1/K) + 1/K,
which increases the growth rate of the threshold so that fewer samples are deleted.
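The three threshold schedules can be written as below; the exact exponential forms are assumptions in the spirit of the training-signal annealing described here (the text only fixes the qualitative slow/normal/fast growth behaviour), with K anomaly-level categories, current step t and total steps T.

```python
import math

def eta_threshold(t, T, K, labeled_data="normal"):
    """Probability threshold eta_t = alpha_t * (1 - 1/K) + 1/K, where the
    growth coefficient alpha_t rises slowly when marked data is scarce (more
    samples removed) and quickly when it is abundant (fewer samples removed)."""
    progress = t / T
    if labeled_data == "small":        # scarce marks: slow growth, prune more
        alpha = math.exp((progress - 1) * 5)
    elif labeled_data == "large":      # abundant marks: fast growth, prune less
        alpha = 1 - math.exp(-progress * 5)
    else:                              # normal amount: linear growth
        alpha = progress
    return alpha * (1 - 1 / K) + 1 / K

# Example: 4 anomaly levels, halfway through training
print(round(eta_threshold(t=50, T=100, K=4, labeled_data="small"), 3))   # ≈ 0.312
```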
405. If so, deleting the first probability distribution corresponding to the correct prediction probability which is greater than the probability threshold value, and continuing to iterate the log anomaly detection model, otherwise, directly iterating the log anomaly detection model, and updating the model training parameters after the log anomaly detection model is iterated;
in this embodiment, iteration is performed on the log anomaly detection model in a training signal annealing manner, and labeled logs which easily cause overfitting of the model are deleted step by step until the final loss is smaller than a set threshold, so that it can be confirmed that the log anomaly detection model can be used in detection practice.
406. Calculating the sum of the cross entropy loss and the consistency loss to obtain a corresponding final loss value, and judging whether the final loss value is smaller than a preset final loss threshold value or not;
407. if the final loss value is smaller than the final loss threshold value, the abnormal level training model set is converged and iteration is stopped, and a log abnormal detection model is obtained;
in this embodiment, the cross entropy loss and the consistency loss are combined to evaluate the correct prediction probability of the log anomaly detection model, which is used as a criterion for model iteration, and here, only the two are added, that is:
Figure BDA0002561847440000114
408. and acquiring a log to be detected of the system to be detected, inputting the log to be detected into the log abnormity detection model for detection, outputting an abnormity grade corresponding to the log to be detected, and taking the abnormity grade corresponding to the log to be detected as an analysis result of the current system operation state.
In the embodiment of the invention, in the iterative training process of the log anomaly detection model, the first probability distribution of training is gradually deleted along with the increase of unmarked data, and the overfitting risk can be effectively resisted by the training signal annealing method.
With reference to fig. 5, the system anomaly detection method in the embodiment of the present invention is described above, and a system anomaly detection apparatus in the embodiment of the present invention is described below, where an embodiment of the system anomaly detection apparatus in the embodiment of the present invention includes:
an obtaining module 501, configured to obtain a marked log and a unmarked log of a system to be detected, and expand the unmarked log to obtain an expanded log;
a training module 502, configured to input the marked log, the unmarked log, and the extended log into three identical abnormal level training models for training, and output a first probability distribution of each abnormal level of the marked log, a second probability distribution of each abnormal level of the unmarked log, and a third probability distribution of each abnormal level of the extended log, where the three identical abnormal level training models form an abnormal level training model set;
a calculating module 503, configured to calculate a cross entropy loss between the first probability distribution and a preset abnormal level flag of the flag log, and calculate a consistency loss between the second probability distribution and the third probability distribution;
a generating module 504, configured to predict an abnormal level flag of the unmarked log and the extended log according to the consistency loss, and iterate the abnormal level training model set according to the cross entropy loss until the abnormal level training model set converges to obtain a log abnormality detection model;
the detection module 505 is configured to obtain a log to be detected of the system to be detected, input the log to be detected into the log anomaly detection model for detection, output an anomaly level corresponding to the log to be detected, and use the anomaly level corresponding to the log to be detected as an analysis result of the current system operation state.
In the embodiment of the invention, a marked log, a non-marked log and an expanded log of a system to be detected are obtained and are respectively input into three same abnormal grade training models in an abnormal grade training model set for training, and the probability distribution of each abnormal grade of the three is output; then calculating cross entropy loss and consistency loss output by the abnormal level training model; predicting the abnormal levels of the unmarked logs and the extended logs according to the consistency loss, and iterating the abnormal level training model set according to the cross entropy loss until the abnormal level training model set is converged to obtain a log abnormal detection model; and finally, detecting the abnormal logs in the system operation through a log abnormality detection model. And the model training mode is optimized, model overfitting is prevented, and the difficulty in detecting abnormal points in the system by the detection model is reduced.
Referring to fig. 6, another embodiment of the system abnormality detection apparatus according to the embodiment of the present invention includes:
an obtaining module 501, configured to obtain a marked log and an unmarked log of a system to be detected, and expand the unmarked log to obtain an expanded log;
a training module 502, configured to input the marked log, the unmarked log, and the extended log into three identical abnormal level training models for training, and output a first probability distribution of each abnormal level of the marked log, a second probability distribution of each abnormal level of the unmarked log, and a third probability distribution of each abnormal level of the extended log, where the three identical abnormal level training models form an abnormal level training model set;
a calculating module 503, configured to calculate a cross entropy loss between the first probability distribution and a preset abnormal level mark of the marked log, and calculate a consistency loss between the second probability distribution and the third probability distribution;
a generating module 504, configured to predict abnormal level marks of the unmarked log and the extended log according to the consistency loss, and iterate the abnormal level training model set according to the cross entropy loss until the abnormal level training model set converges to obtain a log anomaly detection model;
the detection module 505 is configured to obtain a log to be detected of the system to be detected, input the log to be detected into the log anomaly detection model for detection, output an anomaly level corresponding to the log to be detected, and use the anomaly level corresponding to the log to be detected as an analysis result of the current system operation state.
Specifically, the obtaining module 501 is further configured to:
analyzing the unmarked log to obtain a plurality of log fields with different semantics;
screening key fields related to abnormal levels from the log fields according to preset semantic structure prior knowledge and the occurrence frequency of the log fields;
acquiring one or more synonymous fields corresponding to the key fields, and replacing the corresponding key fields with the synonymous fields;
and splicing the synonymous fields and the other log fields except the key fields according to a random field processing strategy to obtain a plurality of corresponding expanded logs, wherein the random field processing strategy comprises replacing, deleting, inserting or exchanging the other log fields (a minimal sketch of this expansion follows below).
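As an illustration of this expansion strategy, here is a minimal Python sketch. The synonym table, the field values and the particular random operations (keep/delete/duplicate plus a swap) are hypothetical stand-ins for the patent's synonym replacement and random field processing, not its actual implementation.

```python
import random

# Hypothetical synonym table for key fields related to abnormal levels.
SYNONYMS = {
    "error": ["failure", "fault"],
    "timeout": ["time-out", "deadline exceeded"],
}

def expand_log(fields, num_copies=3):
    """Generate several expanded logs from one parsed, unmarked log.

    `fields` is the list of semantic fields produced by log parsing.
    Key fields found in SYNONYMS are replaced by a synonym; the remaining
    fields are randomly kept, deleted or duplicated, then two are swapped.
    """
    expanded = []
    for _ in range(num_copies):
        new_fields = []
        for f in fields:
            if f in SYNONYMS:                      # key field: synonym replacement
                new_fields.append(random.choice(SYNONYMS[f]))
            else:                                  # other field: random operation
                op = random.choice(["keep", "delete", "duplicate"])
                if op == "keep":
                    new_fields.append(f)
                elif op == "duplicate":
                    new_fields.extend([f, f])
                # "delete" simply drops the field
        # randomly swap two fields to add further variation
        if len(new_fields) > 2:
            i, j = random.sample(range(len(new_fields)), 2)
            new_fields[i], new_fields[j] = new_fields[j], new_fields[i]
        expanded.append(" ".join(new_fields))
    return expanded

print(expand_log(["service", "timeout", "after", "30s"]))
```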
Specifically, the training module 502 further includes:
a constructing unit 5021, configured to uniformly adjust the length of each piece of log data in the marked log, the unmarked log, and the extended log to a preset length, and construct corresponding data vectors;
a feature extraction unit 5022, configured to determine a feature dimension of the data vector according to the length of the data vector, and perform semantic feature extraction on the data vector according to the feature dimension to obtain an initial semantic feature;
and a probability distribution generating unit 5023, configured to screen and combine salient features from the initial semantic features to obtain final semantic features, and calculate and output the probability distributions of the abnormal levels of the marked logs, the unmarked logs and the extended logs according to the final semantic features.
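The patent does not name a specific network, so the PyTorch sketch below is only one plausible arrangement of the three units (fixed-length data vector, semantic feature extraction, salient-feature pooling and per-level probabilities). The vocabulary size, embedding dimension, convolutional extractor and max-pooling are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnomalyLevelModel(nn.Module):
    """One of the three identical abnormal-level training models (a sketch)."""
    def __init__(self, vocab_size=10000, max_len=64, embed_dim=128, num_levels=4):
        super().__init__()
        self.max_len = max_len
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # semantic feature extraction over the fixed-length data vector
        self.conv = nn.Conv1d(embed_dim, 256, kernel_size=3, padding=1)
        self.classifier = nn.Linear(256, num_levels)

    def forward(self, token_ids):
        # construct a fixed-length data vector (pad or truncate to max_len)
        padded = torch.zeros(token_ids.size(0), self.max_len, dtype=torch.long)
        length = min(token_ids.size(1), self.max_len)
        padded[:, :length] = token_ids[:, :length]
        x = self.embed(padded).transpose(1, 2)                   # (batch, embed_dim, max_len)
        feats = torch.relu(self.conv(x))                         # initial semantic features
        salient = F.adaptive_max_pool1d(feats, 1).squeeze(-1)    # keep salient features
        return F.softmax(self.classifier(salient), dim=-1)       # per-level probabilities

model = AnomalyLevelModel()
probs = model(torch.randint(1, 10000, (2, 40)))   # two toy logs of 40 tokens each
print(probs.shape)                                 # torch.Size([2, 4])
```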
Specifically, the calculating module 503 further includes:
a first calculating unit 5031, configured to calculate, according to the first probability distribution and preset abnormal level marks of the marked logs, a correct prediction probability of an abnormal level of each marked log;
a second calculating unit 5032, configured to calculate, according to preset model training parameters and the correct prediction probability, the cross entropy loss of the first probability distribution, so as to measure the difference between the classification model's predicted abnormal level for the marked log and the true abnormal level of the marked log.
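As a concrete reading of these two units, the sketch below computes the correct prediction probability of each marked log and a cross entropy from it. Taking the cross entropy as the mean negative log of the correct-class probability is an assumption; the exact role of the model training parameters is given only in the patent's formula images.

```python
import torch

def cross_entropy_from_probs(first_probs, true_levels):
    """first_probs: (batch, K) first probability distribution for each marked log.
    true_levels: (batch,) preset abnormal-level marks.
    Returns the per-log correct prediction probability and the mean cross entropy."""
    correct_prob = first_probs.gather(1, true_levels.unsqueeze(1)).squeeze(1)
    loss = -(correct_prob.clamp_min(1e-8)).log().mean()
    return correct_prob, loss

probs = torch.tensor([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]])
levels = torch.tensor([0, 2])
p_correct, ce = cross_entropy_from_probs(probs, levels)
print(p_correct, ce)
```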
Specifically, the generating module 504 further includes:
an iteration unit 5041, configured to determine, according to the cross entropy loss, the correct prediction probability corresponding to each marked log, and judge whether any correct prediction probability is greater than a preset probability threshold; if so, delete the first probability distributions corresponding to the correct prediction probabilities greater than the probability threshold and continue iterating the log anomaly detection model; otherwise, directly iterate the log anomaly detection model, and update the model training parameters after the log anomaly detection model is iterated;
a model generating unit 5042, configured to calculate the sum of the cross entropy loss and the consistency loss to obtain a corresponding final loss value, and judge whether the final loss value is smaller than a preset final loss threshold; if the final loss value is smaller than the final loss threshold, the abnormal level training model set has converged and stops iterating, yielding the log anomaly detection model.
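The model generating unit's convergence test can be sketched as follows. Using a KL divergence between the second and third probability distributions as the consistency loss is an assumption (the patent only says "consistency loss"), and the final loss threshold is a placeholder value.

```python
import torch
import torch.nn.functional as F

FINAL_LOSS_THRESHOLD = 0.05   # placeholder for the preset final loss threshold

def consistency_loss(second_probs, third_probs):
    # KL divergence between predictions on unmarked logs and on their expansions
    # (one common choice; the patent does not fix the exact measure)
    return F.kl_div(third_probs.clamp_min(1e-8).log(), second_probs,
                    reduction="batchmean")

def has_converged(cross_entropy, second_probs, third_probs):
    final_loss = cross_entropy + consistency_loss(second_probs, third_probs)
    return final_loss.item() < FINAL_LOSS_THRESHOLD, final_loss

second = torch.tensor([[0.6, 0.3, 0.1]])
third = torch.tensor([[0.5, 0.4, 0.1]])
done, loss = has_converged(torch.tensor(0.02), second, third)
print(done, loss)
```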
Specifically, the calculation formula of the correct prediction probability is as follows:
[Formula image: Figure BDA0002561847440000131 (not rendered in this text)]
and
[Formula image: Figure BDA0002561847440000132 (not rendered in this text)]
wherein η_t is a probability threshold, the t-subscripted coefficient is a growth coefficient, K is the number of the abnormal grade categories, t is the current iteration number, and T is the preset total iteration number;
when the data amount in the marked log is smaller than a preset normal data amount range, the formula shown in Figure BDA0002561847440000133 applies;
when the data amount in the marked log is larger than the normal data amount range, the formula shown in Figure BDA0002561847440000134 applies.
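The formulas above are published only as images, but the surrounding description matches the training-signal-annealing idea referred to earlier: a probability threshold η_t that grows with the iteration t, with different growth schedules depending on whether the marked data is scarce or plentiful. Which concrete schedule each image specifies is not recoverable here, so the exponential/logarithmic choice in the sketch below is purely illustrative and is not the patent's exact formula.

```python
import math

def tsa_threshold(t, T, K, labeled_is_scarce):
    """Illustrative training-signal-annealing threshold eta_t.

    t: current iteration, T: total iterations, K: number of abnormal-level classes.
    The growth coefficient alpha_t rises from ~0 to 1, so eta_t rises from ~1/K to 1.
    """
    if labeled_is_scarce:                      # few marked logs: release training signal slowly
        alpha_t = math.exp((t / T - 1.0) * 5.0)
    else:                                      # many marked logs: release training signal quickly
        alpha_t = 1.0 - math.exp(-(t / T) * 5.0)
    return alpha_t * (1.0 - 1.0 / K) + 1.0 / K

def keep_mask(correct_probs, t, T, K, labeled_is_scarce):
    """Marked logs whose correct prediction probability already exceeds eta_t are
    dropped from the cross-entropy loss for this iteration; the rest are kept."""
    eta_t = tsa_threshold(t, T, K, labeled_is_scarce)
    return [p <= eta_t for p in correct_probs]

print(keep_mask([0.20, 0.40, 0.90], t=10, T=100, K=4, labeled_is_scarce=False))
```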
Specifically, the detecting module 505 further includes:
an obtaining unit 5051, configured to obtain a log to be detected of the system to be detected, where the log to be detected includes multiple pieces of log information, and each piece of log information carries identification information indicating its system operation and management priority;
the detection unit 5052 is configured to input the log to be detected into the log abnormality detection model for detection, and predict an abnormality level of the log to be detected through the log abnormality detection model;
a screening unit 5053, configured to screen logs to be detected whose exception level is higher than a preset exception level threshold, and determine log information whose priority is higher than a preset priority threshold from the screened logs to be detected according to the identification information;
the analysis result generation unit 5054 is configured to highlight the log information with the priority greater than the preset priority threshold, and use an abnormal level corresponding to the highlighted log information and other log information except the highlighted log information as an analysis result of the current system operation state.
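The screening and highlighting performed by units 5052 through 5054 can be illustrated as follows; the threshold values, the dictionary field names and the use of a ">>>" marker for highlighting are placeholders, not taken from the patent.

```python
ANOMALY_LEVEL_THRESHOLD = 2     # placeholder for the preset abnormal-level threshold
PRIORITY_THRESHOLD = 5          # placeholder for the preset priority threshold

def analyze(detected_logs):
    """detected_logs: list of dicts holding the model's predicted 'level',
    the log 'text', and the 'priority' identification carried in the log."""
    report = []
    for log in detected_logs:
        if log["level"] <= ANOMALY_LEVEL_THRESHOLD:
            continue                                   # keep only sufficiently abnormal logs
        if log["priority"] > PRIORITY_THRESHOLD:
            report.append(f">>> [level {log['level']}] {log['text']}")   # highlighted
        else:
            report.append(f"    [level {log['level']}] {log['text']}")
    return "\n".join(report)

print(analyze([
    {"level": 3, "priority": 8, "text": "disk write failure on /data"},
    {"level": 4, "priority": 2, "text": "retrying RPC call"},
    {"level": 1, "priority": 9, "text": "heartbeat ok"},
]))
```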
In the embodiment of the invention, marked logs and unmarked logs are obtained, and the unmarked logs generated when the system is abnormal are expanded, so that the number of abnormal unmarked logs is increased while data diversity is preserved, which strengthens the model's resistance to overfitting in the subsequent training of the detection model. The marked logs, the unmarked logs and the expanded logs are respectively input into the three corresponding abnormal level training models in the abnormal level training model set for training, so as to predict the probability distribution of each abnormal level for each of them. During the iterative training of the log anomaly detection model, the first probability distributions used for training are gradually removed as the amount of unmarked data increases, and the finally generated log anomaly detection model can predict the anomaly levels of the logs to be detected that are generated while the system runs, giving an analysis result of the system operation state. This improves the model's resistance to overfitting and reduces the detection difficulty of the detection model.
Fig. 5 and fig. 6 describe the system anomaly detection apparatus in the embodiment of the present invention in detail from the perspective of modular functional entities, and the system anomaly detection device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 7 is a schematic structural diagram of a system anomaly detection device 700 according to an embodiment of the present invention. The device may vary considerably in configuration or performance, and may include one or more processors (CPUs) 710, a memory 720, and one or more storage media 730 (e.g., one or more mass storage devices) storing applications 733 or data 732. The memory 720 and the storage medium 730 may be transient storage or persistent storage. The program stored in the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations on the system anomaly detection device 700. Further, the processor 710 may be configured to communicate with the storage medium 730 to execute the series of instruction operations in the storage medium 730 on the system anomaly detection device 700.
The system anomaly detection device 700 may also include one or more power supplies 740, one or more wired or wireless network interfaces 750, one or more input-output interfaces 760, and/or one or more operating systems 731, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be understood by those skilled in the art that the structure shown in fig. 7 does not constitute a limitation of the system anomaly detection device, which may include more or fewer components than those shown, a combination of some components, or a different arrangement of components.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the system anomaly detection method.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks associated with each other by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A system anomaly detection method, characterized in that the system anomaly detection method comprises:
acquiring a marked log and an unmarked log of a system to be detected, and expanding the unmarked log to obtain an expanded log; wherein the marked log refers to logs that are screened from the unmarked logs and marked with abnormal grades;
respectively inputting the marked logs, the unmarked logs and the expanded logs into three identical abnormal grade training models for training, and correspondingly outputting a first probability distribution of each abnormal grade of the marked logs, a second probability distribution of each abnormal grade of the unmarked logs and a third probability distribution of each abnormal grade of the expanded logs, wherein the three identical abnormal grade training models form an abnormal grade training model set;
calculating a cross-entropy loss between the first probability distribution and a preset anomaly level of the marked log, and calculating a consistency loss between the second probability distribution and the third probability distribution;
predicting the abnormal grades of the unmarked logs and the extended logs according to the consistency loss, and iterating the abnormal grade training model set according to the cross entropy loss until the abnormal grade training model set is converged to obtain a log abnormal detection model;
and acquiring a log to be detected of a current system, inputting the log to be detected into the log anomaly detection model for detection, outputting an anomaly level corresponding to the log to be detected, and taking the anomaly level corresponding to the log to be detected as an analysis result of the current system operation state.
2. The method according to claim 1, wherein the expanding the unmarked log to obtain an expanded log comprises:
analyzing the unmarked log to obtain a plurality of log fields with different semantics;
screening key fields related to abnormal levels from the log fields according to preset semantic structure prior knowledge and the occurrence frequency of the log fields;
acquiring one or more synonymous fields corresponding to the key fields, and replacing the corresponding key fields with the synonymous fields;
and splicing the synonymous fields and the other log fields except the key fields according to a random field processing strategy to obtain a plurality of corresponding expanded logs, wherein the random field processing strategy comprises replacing, deleting, inserting or exchanging the other log fields.
3. The system abnormality detection method according to claim 1, wherein the inputting of the marked log, the unmarked log, and the expanded log into three identical abnormal level training models respectively for training, and the corresponding outputting of a first probability distribution of each abnormal level of the marked log, a second probability distribution of each abnormal level of the unmarked log, and a third probability distribution of each abnormal level of the expanded log comprise:
uniformly adjusting the length of each log data in the marked log, the unmarked log and the expanded log to a preset length, and constructing a corresponding data vector;
determining the characteristic dimension of the data vector according to the length of the data vector, and extracting semantic features of the data vector according to the characteristic dimension to obtain initial semantic features;
and screening and combining salient features from the initial semantic features to obtain final semantic features, and calculating and outputting the probability distributions of the abnormal levels of the marked logs, the unmarked logs and the expanded logs according to the final semantic features.
4. The system anomaly detection method according to any one of claims 1-3, wherein said calculating a cross-entropy loss between said first probability distribution and a preset anomaly level of said labeled log comprises:
calculating the correct prediction probability of the abnormal grade of each marked log according to the first probability distribution and the preset abnormal grade mark of the marked log;
calculating cross entropy loss of the first probability distribution according to preset model training parameters and the correct prediction probability, so as to measure the difference between the abnormal grade prediction of the marked log and the real abnormal grade of the marked log.
5. The method according to claim 4, wherein the iterating the abnormal level training model set according to the cross entropy loss until the abnormal level training model set converges to obtain a log abnormal detection model comprises:
determining the correct prediction probability corresponding to each marked log according to the cross entropy loss, and judging whether the correct prediction probability larger than a preset probability threshold exists or not;
if so, deleting the first probability distribution corresponding to the correct prediction probability which is greater than the probability threshold value, and continuing to iterate the log anomaly detection model, otherwise, directly iterating the log anomaly detection model, and updating the model training parameters after the log anomaly detection model is iterated;
calculating the sum of the cross entropy loss and the consistency loss to obtain a corresponding final loss value, and judging whether the final loss value is smaller than a preset final loss threshold value or not;
and if the final loss value is smaller than the final loss threshold value, the abnormal level training model set is converged and stops iteration to obtain a log abnormal detection model.
6. The system anomaly detection method according to claim 5, wherein said correct prediction probability is calculated by the formula:
[Formula image: Figure FDA0003524642200000021 (not rendered in this text)]
and
[Formula image: Figure FDA0003524642200000022 (not rendered in this text)]
wherein η_t is a probability threshold, the t-subscripted coefficient is a growth coefficient, K is the number of the abnormal grade categories, t is the current iteration number, and T is the preset total iteration number;
when the data amount in the marked log is smaller than a preset normal data amount range, the formula shown in Figure FDA0003524642200000023 applies;
when the data amount in the marked log is larger than the normal data amount range, the formula shown in Figure FDA0003524642200000024 applies.
7. The system anomaly detection method according to claim 1, wherein the acquiring a log to be detected of a current system, inputting the log to be detected into the log anomaly detection model for detection, outputting an anomaly level corresponding to the log to be detected, and taking the anomaly level corresponding to the log to be detected as an analysis result of the current system operation state comprises:
acquiring a log to be detected of a current system, wherein the log to be detected comprises a plurality of pieces of log information, and the log information comprises identification information of system operation management priority;
inputting the log to be detected into the log anomaly detection model for detection, and predicting the anomaly level of the log to be detected through the log anomaly detection model;
screening logs to be detected with abnormal grades higher than a preset abnormal grade threshold value, and determining log information with priority higher than a preset priority threshold value from the screened logs to be detected according to the identification information;
and highlighting the log information with the priority greater than a preset priority threshold value, and taking the abnormal grade corresponding to the highlighted log information and other log information except the highlighted log information as an analysis result of the current system running state.
8. A system abnormality detection device, characterized by comprising:
the acquisition module is used for acquiring a marked log and an unmarked log of the system to be detected, and expanding the unmarked log to obtain an expanded log; wherein the marked log refers to logs that are screened from the unmarked logs and marked with abnormal grades;
a training module, configured to input the marked log, the unmarked log, and the extended log into three identical abnormal level training models for training, and output a first probability distribution of each abnormal level of the marked log, a second probability distribution of each abnormal level of the unmarked log, and a third probability distribution of each abnormal level of the extended log, where the three identical abnormal level training models form an abnormal level training model set;
a calculation module for calculating a cross-entropy loss between the first probability distribution and a preset anomaly level of the marked log, and for calculating a consistency loss between the second probability distribution and the third probability distribution;
the generation module is used for predicting the abnormal grades of the unmarked logs and the extended logs according to the consistency loss, and iterating the abnormal grade training model set according to the cross entropy loss until the abnormal grade training model set is converged to obtain a log abnormal detection model;
and the detection module is used for acquiring the log to be detected of the current system, inputting the log to be detected into the log anomaly detection model for detection, outputting the anomaly level corresponding to the log to be detected, and taking the anomaly level corresponding to the log to be detected as the analysis result of the current system operation state.
9. A system abnormality detection apparatus characterized by comprising: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;
the at least one processor invokes the instructions in the memory to cause the system anomaly detection device to perform the system anomaly detection method of any one of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the system anomaly detection method according to any one of claims 1-7.
CN202010611178.5A 2020-06-30 2020-06-30 System abnormality detection method, device, equipment and storage medium Active CN111782472B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010611178.5A CN111782472B (en) 2020-06-30 2020-06-30 System abnormality detection method, device, equipment and storage medium
PCT/CN2020/118218 WO2021139235A1 (en) 2020-06-30 2020-09-28 Method and apparatus for system exception testing, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010611178.5A CN111782472B (en) 2020-06-30 2020-06-30 System abnormality detection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111782472A CN111782472A (en) 2020-10-16
CN111782472B true CN111782472B (en) 2022-04-26

Family

ID=72760356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010611178.5A Active CN111782472B (en) 2020-06-30 2020-06-30 System abnormality detection method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111782472B (en)
WO (1) WO2021139235A1 (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308455B (en) * 2020-11-20 2024-04-09 深圳前海微众银行股份有限公司 Root cause positioning method, root cause positioning device, root cause positioning equipment and computer storage medium
CN112446335A (en) * 2020-12-02 2021-03-05 电子科技大学中山学院 Terahertz contraband detection method based on deep learning
CN112883193A (en) * 2021-02-25 2021-06-01 中国平安人寿保险股份有限公司 Training method, device and equipment of text classification model and readable medium
CN113347033B (en) * 2021-05-31 2022-05-27 中国工商银行股份有限公司 Root cause positioning method and system based on block chain and verification node
CN113256434B (en) * 2021-06-08 2021-11-23 平安科技(深圳)有限公司 Method, device, equipment and storage medium for recognizing vehicle insurance claim settlement behaviors
CN113297051B (en) * 2021-07-26 2022-03-04 云智慧(北京)科技有限公司 Log analysis processing method and device
CN113672870A (en) * 2021-08-20 2021-11-19 中国南方电网有限责任公司超高压输电公司柳州局 Fault event probability estimation method, device, computer equipment and storage medium
CN114238965A (en) * 2021-11-17 2022-03-25 北京华清信安科技有限公司 Detection analysis method and system for malicious access
CN114297054B (en) * 2021-12-17 2023-06-30 北京交通大学 Software defect number prediction method based on subspace mixed sampling
CN114338129B (en) * 2021-12-24 2023-10-31 中汽创智科技有限公司 Message anomaly detection method, device, equipment and medium
CN114881112A (en) * 2022-03-31 2022-08-09 北京优特捷信息技术有限公司 System anomaly detection method, device, equipment and medium
CN114706709B (en) * 2022-06-01 2022-08-23 成都运荔枝科技有限公司 Saas service exception handling method and device and readable storage medium
CN115146718A (en) * 2022-06-27 2022-10-04 北京华能新锐控制技术有限公司 Depth representation-based wind turbine generator anomaly detection method
CN115099676A (en) * 2022-07-14 2022-09-23 华能罗源发电有限责任公司 Method for detecting state quantity of bus in GIS of thermal power energy storage system
CN115174251B (en) * 2022-07-19 2023-09-05 深信服科技股份有限公司 False alarm identification method and device for safety alarm and storage medium
CN115168154B (en) * 2022-07-26 2023-06-23 北京优特捷信息技术有限公司 Abnormal log detection method, device and equipment based on dynamic baseline
CN115499159A (en) * 2022-08-09 2022-12-20 重庆长安汽车股份有限公司 CAN signal abnormality detection method, device, vehicle and storage medium
CN115883346B (en) * 2023-02-23 2023-05-23 广州嘉为科技有限公司 Abnormality detection method and device based on FDEP log and storage medium
CN116070206B (en) * 2023-03-28 2023-06-30 上海观安信息技术股份有限公司 Abnormal behavior detection method, system, electronic equipment and storage medium
CN116863638B (en) * 2023-06-01 2024-02-23 国药集团重庆医药设计院有限公司 Personnel abnormal behavior detection method and security system based on active early warning
CN116405326B (en) * 2023-06-07 2023-10-20 厦门瞳景智能科技有限公司 Information security management method and system based on block chain
CN116911852B (en) * 2023-07-21 2024-01-26 广州嘉磊元新信息科技有限公司 RPA user dynamic information monitoring method and system
CN117149846A (en) * 2023-08-16 2023-12-01 湖北中恒电测科技有限公司 Power data analysis method and system based on data fusion
CN117271350A (en) * 2023-09-28 2023-12-22 江苏天好富兴数据技术有限公司 Software quality assessment system and method based on log analysis
CN117290380B (en) * 2023-11-14 2024-02-06 华青融天(北京)软件股份有限公司 Abnormal dimension data generation method, device, equipment and computer readable medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103389701A (en) * 2013-07-15 2013-11-13 浙江大学 Plant-level process fault detection and diagnosis method based on distributed data model
CN106951353A (en) * 2017-03-20 2017-07-14 北京搜狐新媒体信息技术有限公司 Work data method for detecting abnormality and device
CN107463455A (en) * 2017-08-01 2017-12-12 联想(北京)有限公司 A kind of method and device for detecting memory failure

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7457985B2 (en) * 2005-09-09 2008-11-25 International Business Machines Corporation Method to detect errors in computer systems by using state tracking
CN101360023A (en) * 2008-09-09 2009-02-04 成都市华为赛门铁克科技有限公司 Exception detection method, apparatus and system
CN102821002B (en) * 2011-06-09 2015-08-26 中国移动通信集团河南有限公司信阳分公司 Network flow abnormal detecting method and system
JP2014032516A (en) * 2012-08-02 2014-02-20 Fujitsu Ltd Storage device, controller, and data protection method
CN108090615B (en) * 2017-12-21 2021-10-08 东南大学溧阳研究院 Minimum frequency prediction method after power system fault based on cross entropy integrated learning
CN109284606B (en) * 2018-09-04 2019-08-27 中国人民解放军陆军工程大学 Data flow anomaly detection system based on empirical features and convolutional neural networks
CN109343990A (en) * 2018-09-25 2019-02-15 江苏润和软件股份有限公司 A kind of cloud computing system method for detecting abnormality based on deep learning
CN110365648A (en) * 2019-06-14 2019-10-22 东南大学 A kind of vehicle-mounted CAN bus method for detecting abnormality based on decision tree

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103389701A (en) * 2013-07-15 2013-11-13 浙江大学 Plant-level process fault detection and diagnosis method based on distributed data model
CN106951353A (en) * 2017-03-20 2017-07-14 北京搜狐新媒体信息技术有限公司 Work data method for detecting abnormality and device
CN107463455A (en) * 2017-08-01 2017-12-12 联想(北京)有限公司 A kind of method and device for detecting memory failure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Security Auditing and Audit-Based Intrusion Detection; Zhang Xiangfeng; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2004-12-15; I139-11 *

Also Published As

Publication number Publication date
CN111782472A (en) 2020-10-16
WO2021139235A1 (en) 2021-07-15

Similar Documents

Publication Publication Date Title
CN111782472B (en) System abnormality detection method, device, equipment and storage medium
CN111178456B (en) Abnormal index detection method and device, computer equipment and storage medium
CN105677791B (en) For analyzing the method and system of the operation data of wind power generating set
CN112910859B (en) Internet of things equipment monitoring and early warning method based on C5.0 decision tree and time sequence analysis
EP1958034B1 (en) Use of sequential clustering for instance selection in machine condition monitoring
CN112906764B (en) Communication safety equipment intelligent diagnosis method and system based on improved BP neural network
CN112685324A (en) Method and system for generating test scheme
CN112579414A (en) Log abnormity detection method and device
Xie et al. Logm: Log analysis for multiple components of hadoop platform
CN113221960A (en) Construction method and collection method of high-quality vulnerability data collection model
Chu et al. Co-training based on semi-supervised ensemble classification approach for multi-label data stream
Rücker et al. FlexParser—The adaptive log file parser for continuous results in a changing world
CN117370548A (en) User behavior risk identification method, device, electronic equipment and medium
CN111949459A (en) Hard disk failure prediction method and system based on transfer learning and active learning
CN117131449A (en) Data management-oriented anomaly identification method and system with propagation learning capability
CN113891342A (en) Base station inspection method and device, electronic equipment and storage medium
CN116861924A (en) Project risk early warning method and system based on artificial intelligence
CN116167370A (en) Log space-time characteristic analysis-based distributed system anomaly detection method
CN116599743A (en) 4A abnormal detour detection method and device, electronic equipment and storage medium
CN111352820A (en) Method, equipment and device for predicting and monitoring running state of high-performance application
CN113076217B (en) Disk fault prediction method based on domestic platform
CN115686995A (en) Data monitoring processing method and device
CN109978038B (en) Cluster abnormity judgment method and device
Gaykar et al. A Hybrid Supervised Learning Approach for Detection and Mitigation of Job Failure with Virtual Machines in Distributed Environments.
Du et al. Unstructured log oriented fault diagnosis for operation and maintenance management

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40031385

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant