WO2021139235A1 - System anomaly detection method, apparatus, device and storage medium - Google Patents

System anomaly detection method, apparatus, device and storage medium

Info

Publication number
WO2021139235A1
Authority
WO
WIPO (PCT)
Prior art keywords
log
abnormality level
abnormality
detected
unmarked
Prior art date
Application number
PCT/CN2020/118218
Other languages
English (en)
French (fr)
Inventor
邓悦
郑立颖
徐亮
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Publication of WO2021139235A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis

Definitions

  • This application relates to artificial intelligence decision-making, and in particular to a system anomaly detection method, device, equipment and storage medium.
  • Intelligent anomaly detection uses AI algorithms to find anomalies in monitoring data automatically, accurately and in real time, providing a basis for subsequent diagnosis and self-healing.
  • Anomaly detection is a basic but very important function of an AIOps (Algorithmic IT Operations) system. It mainly uses algorithms and models to automatically discover abnormal behavior in KPI time-series data, providing the necessary decision-making basis for subsequent alarms, automatic stop-loss, root cause analysis and similar tasks.
  • The main purpose of this application is to solve the difficult problem of anomaly detection in intelligent operations systems.
  • The first aspect of this application provides a system anomaly detection method, including:
  • inputting the marked log, the unmarked log and the expanded log into three identical abnormality level training models, respectively, for training, and correspondingly outputting a first probability distribution over the abnormality levels of the marked log, a second probability distribution over the abnormality levels of the unmarked log, and a third probability distribution over the abnormality levels of the expanded log, wherein the three identical abnormality level training models form an abnormality level training model set;
  • The second aspect of this application provides a system anomaly detection device, including:
  • an acquisition module, configured to acquire the marked log and the unmarked log of the system to be detected, and to expand the unmarked log to obtain an expanded log;
  • a training module, configured to input the marked log, the unmarked log and the expanded log into three identical abnormality level training models, respectively, for training, and correspondingly output a first probability distribution over the abnormality levels of the marked log, a second probability distribution over the abnormality levels of the unmarked log, and a third probability distribution over the abnormality levels of the expanded log;
  • a calculation module, configured to calculate the cross-entropy loss between the first probability distribution and the preset abnormality level marks of the marked log, and to calculate the consistency loss between the second probability distribution and the third probability distribution;
  • a detection module, configured to obtain a to-be-detected log of the system to be detected, input it into the log anomaly detection model for detection, output the abnormality level corresponding to the to-be-detected log, and use that abnormality level as the analysis result of the current system operating status.
  • The third aspect of this application provides a system anomaly detection device, including a memory and at least one processor, where the memory stores instructions and is interconnected with the at least one processor through a wire; the at least one processor calls the instructions in the memory so that the system anomaly detection device executes the steps of the system anomaly detection method of the first aspect.
  • The fourth aspect of this application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the steps of the system anomaly detection method of the first aspect.
  • In this application, the marked log, the unmarked log and the expanded log of the system to be detected are input into the three identical abnormality level training models of the abnormality level training model set for training, and a probability distribution over abnormality levels is output for each of the three. The cross-entropy loss and consistency loss of the abnormality level training models are then calculated; the abnormality levels of the unmarked log and the expanded log are predicted based on the consistency loss, and the abnormality level training model set is iterated according to the cross-entropy loss until it converges, yielding the log anomaly detection model. Finally, abnormal logs produced during system operation are detected with the log anomaly detection model.
  • This optimizes the model training method, prevents the model from overfitting, and makes it easier for the detection model to find anomalies in the system.
  • FIG. 1 is a schematic diagram of a first embodiment of the system anomaly detection method in this application.
  • FIG. 2 is a schematic diagram of a second embodiment of the system anomaly detection method in this application.
  • FIG. 3 is a schematic diagram of a third embodiment of the system anomaly detection method in this application.
  • FIG. 4 is a schematic diagram of a fourth embodiment of the system anomaly detection method in this application.
  • FIG. 5 is a schematic diagram of an embodiment of the system anomaly detection device in this application.
  • FIG. 6 is a schematic diagram of another embodiment of the system anomaly detection device in this application.
  • FIG. 7 is a schematic structural diagram of an embodiment of the system anomaly detection device in this application.
  • The embodiments of this application provide a system anomaly detection method, device, equipment and storage medium. The marked log, unmarked log and expanded log of the system to be detected are obtained and input into the three identical abnormality level training models of the abnormality level training model set, which output a probability distribution over abnormality levels for each. The cross-entropy loss and consistency loss are then calculated; the abnormality levels of the unmarked log and the expanded log are predicted based on the consistency loss, and the model set is iterated according to the cross-entropy loss until it converges, yielding the log anomaly detection model, which is finally used to detect abnormal logs during system operation. This optimizes model training, prevents overfitting, and makes anomalies in the system easier to detect.
  • The first embodiment of the system anomaly detection method in the embodiments of this application includes:
  • The execution subject of this application may be a system anomaly detection device, a terminal or a server, which is not specifically limited here.
  • The embodiments of this application are described with a server as the execution subject. It should be emphasized that, to further ensure the privacy and security of the marked log, unmarked log and expanded log, these logs may also be stored in a node of a blockchain.
  • The logs generated during past operation of the system can be obtained from the system memory.
  • A log is text information that records the system status and operating status; its content includes a timestamp and text indicating the message content.
  • TF-IDF: term frequency-inverse document frequency
  • Back translation and TF-IDF-based word replacement can be used to expand the abnormal unmarked logs.
  • TF-IDF is used to evaluate the importance of each field in the unmarked logs. Specifically, the fields that occur frequently in an unmarked log are taken as its key fields; the key fields of different unmarked logs are then classified by semantics using prior knowledge from DBPedia, yielding key fields of multiple categories, and each key field is replaced with a synonym of the same meaning. Finally, the remaining non-key fields are processed by replacement, deletion, insertion, exchange and similar operations. This increases the number of abnormal unmarked logs while keeping their semantics unchanged and varying their content, producing the expanded logs.
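The TF-IDF step above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes each unmarked log has already been split into field tokens and uses the textbook TF-IDF formula (frequency of the field within a log times log(N / number of logs containing it)). The function name `tfidf_key_fields` and the `threshold` cut-off are hypothetical.

```python
import math
from collections import Counter


def tfidf_key_fields(logs, threshold):
    """For each tokenized log, return the set of fields whose TF-IDF score exceeds threshold."""
    n_logs = len(logs)
    # Document frequency: how many logs contain each field at least once.
    doc_freq = Counter()
    for fields in logs:
        doc_freq.update(set(fields))
    result = []
    for fields in logs:
        term_freq = Counter(fields)
        keys = set()
        for field, count in term_freq.items():
            tf = count / len(fields)
            idf = math.log(n_logs / doc_freq[field])  # 0 for a field present in every log
            if tf * idf > threshold:
                keys.add(field)
        result.append(keys)
    return result


logs = [
    ["ERROR", "db", "timeout", "timeout"],
    ["INFO", "db", "connected"],
    ["WARN", "db", "slow", "query"],
]
print(tfidf_key_fields(logs, threshold=0.2))
```

Because a field that appears in every log gets an IDF of zero, it can never become a key field; here `db` is dropped from all three logs.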
  • The abnormality level training model set is formed by stacking three identical abnormality level training models, and the training process of each abnormality level training model specifically includes:
  • screening and combining the initial semantic features to obtain final semantic features, and calculating and outputting, according to the final semantic features, the probability distributions over abnormality levels of the marked log, the unmarked log and the expanded log.
  • Inputting the marked log into the abnormality level training model is supervised training, while inputting the unmarked log and the expanded log into the abnormality level training model is unsupervised training. Note that, based on the feature distribution of the different fields in a system log, the model outputs a probability for each abnormality level rather than an abnormality level directly; the abnormality level with the highest probability is then selected as the abnormality level of the system log.
  • Text-CNN: text convolutional neural network
  • The input layer adjusts the text of each marked, unmarked or expanded log input to the Text-CNN model to the same length L and obtains a word vector for each word;
  • the convolutional layer uses the number of abnormality level categories as the dimension of the abnormality level training model, and uses multiple convolution kernels of different sizes to extract, from the word vectors, the feature words that describe the abnormality level;
  • the pooling layer uses max pooling (Max-pool) to combine the different feature words obtained by the convolutional layer into classification features for the different system logs;
  • the fully connected layer inputs the classification features into an LR (logistic regression) classifier for classification.
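The four Text-CNN layers can be illustrated with a minimal forward pass in plain Python. This sketch is an assumption-laden illustration, not the applicant's model: it uses integer token ids, randomly initialized weights with no training loop, a ReLU after each convolution, and a softmax output layer standing in for the LR classifier (softmax is the multi-class form of logistic regression). All class and function names are hypothetical.

```python
import math
import random


def pad_or_truncate(token_ids, length, pad_id=0):
    """Input layer: force every log to the same length L."""
    return (list(token_ids) + [pad_id] * length)[:length]


def conv_relu_maxpool(embedded, kernel, bias):
    """Slide one kernel (width rows x dim) over the sequence, apply ReLU, keep the max."""
    width = len(kernel)
    best = 0.0  # ReLU outputs are >= 0, so 0 is a safe starting maximum
    for start in range(len(embedded) - width + 1):
        score = bias
        for offset, kernel_row in enumerate(kernel):
            score += sum(w * x for w, x in zip(kernel_row, embedded[start + offset]))
        best = max(best, score)
    return best


def softmax(logits):
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyTextCNN:
    """Hypothetical, untrained Text-CNN used only to show the data flow."""

    def __init__(self, vocab_size, dim, kernel_sizes, filters_per_size, num_levels, seed=0):
        rng = random.Random(seed)

        def rand():
            return rng.uniform(-0.1, 0.1)

        # Input layer: one word vector per token id.
        self.embedding = [[rand() for _ in range(dim)] for _ in range(vocab_size)]
        # Convolutional layer: several kernels for each window width.
        self.kernels = [
            ([[rand() for _ in range(dim)] for _ in range(width)], rand())
            for width in kernel_sizes
            for _ in range(filters_per_size)
        ]
        self.max_width = max(kernel_sizes)
        # Fully connected layer: one weight row per abnormality level.
        n_features = len(self.kernels)
        self.out_w = [[rand() for _ in range(n_features)] for _ in range(num_levels)]
        self.out_b = [0.0] * num_levels

    def predict_proba(self, token_ids, length):
        if length < self.max_width:
            raise ValueError("length must be at least the widest kernel")
        embedded = [self.embedding[i] for i in pad_or_truncate(token_ids, length)]
        # Pooling layer: one max-pooled value per kernel forms the classification features.
        features = [conv_relu_maxpool(embedded, k, b) for k, b in self.kernels]
        logits = [sum(w * f for w, f in zip(row, features)) + b
                  for row, b in zip(self.out_w, self.out_b)]
        return softmax(logits)


model = TinyTextCNN(vocab_size=50, dim=8, kernel_sizes=(2, 3, 4), filters_per_size=2, num_levels=4)
probs = model.predict_proba([3, 7, 9, 12], length=10)
```

Each kernel width captures word n-grams of that size, and max pooling keeps only the strongest match per kernel, so the feature vector has one entry per kernel regardless of log length.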
  • The abnormality levels in the set output rule are major abnormality, moderate abnormality, minor abnormality and normal; for each system log, the model outputs the probability of each of these levels.
  • The abnormality level with the highest probability is used as the predicted abnormality level of the current system log.
  • For example, if the probabilities of major abnormality, moderate abnormality, minor abnormality and normal for unmarked log A are [0.5, 0.2, 0.2, 0.1], the model predicts the abnormality level of log A to be major abnormality.
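Selecting the final level is an argmax over the output distribution. A minimal sketch, assuming the four levels are listed in the order given above; `LEVELS` and `predict_level` are hypothetical names:

```python
# Order follows the output rule above: index 0 is the most severe level.
LEVELS = ["major abnormality", "moderate abnormality", "minor abnormality", "normal"]


def predict_level(probs, levels=LEVELS):
    """Return the level with the highest predicted probability (first one wins ties)."""
    if len(probs) != len(levels):
        raise ValueError("expected one probability per abnormality level")
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    return levels[best_index]


print(predict_level([0.5, 0.2, 0.2, 0.1]))  # major abnormality
```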
  • The cross-entropy loss represents the difference between the first probability distribution predicted for the marked log and its true abnormality level.
  • The consistency loss represents the difference between the predictions for an unmarked log and for its corresponding expanded log.
  • By minimizing the consistency loss, the abnormality level label is propagated between each unmarked log and its corresponding expanded log, which yields the abnormality level of the unmarked log.
  • The consistency loss is calculated by the following function:
  • The cross-entropy loss measures the difference between the abnormality level probability distribution predicted by the abnormality level training model for the marked log and the actual abnormality level. It is calculated by the cross-entropy function $-\log p_{\theta}(y^{*} \mid x)$, where $p_{\theta}(y^{*} \mid x)$ is the probability that the labeled log $x$ is correctly predicted as its true level $y^{*}$.
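  • The log anomaly detection model is evaluated by combining the consistency loss and the cross-entropy loss: $\mathcal{J}(\theta) = \mathcal{L}_{\mathrm{CE}}(\theta) + \lambda\,\mathcal{L}_{\mathrm{cons}}(\theta)$, where $\mathcal{L}_{\mathrm{CE}}$ is the cross-entropy loss, $\mathcal{L}_{\mathrm{cons}}$ is the consistency loss, $\theta$ is the preset model parameter, $\mathcal{J}(\theta)$ is the final loss, and $\lambda$ balances the consistency loss against the cross-entropy loss. When the final loss is below a preset threshold, the log anomaly detection model stops iterating.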
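A minimal sketch of the combined loss and the stopping test follows. The source does not reproduce the consistency-loss function, so this sketch assumes a KL divergence between the unmarked-log and expanded-log distributions, as used in UDA-style (Unsupervised Data Augmentation) consistency training. The function names, the averaging over samples, and the default `lam=1.0` are likewise assumptions.

```python
import math


def cross_entropy(probs, true_index):
    """-log p_theta(y* | x): only the predicted probability of the true level matters."""
    return -math.log(probs[true_index])


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q), the assumed consistency loss between two level distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def final_loss(marked, unmarked_pairs, lam=1.0):
    """Cross-entropy on marked logs plus lam times consistency on (unmarked, expanded) pairs.

    marked: list of (predicted distribution, index of the true level).
    unmarked_pairs: list of (distribution for an unmarked log, distribution for its expanded log).
    """
    ce = sum(cross_entropy(p, y) for p, y in marked) / len(marked)
    cons = sum(kl_divergence(p, q) for p, q in unmarked_pairs) / len(unmarked_pairs)
    return ce + lam * cons


def has_converged(loss, threshold):
    """Stop iterating once the final loss drops below the preset threshold."""
    return loss < threshold


loss = final_loss([([0.5, 0.2, 0.2, 0.1], 0)],
                  [([0.7, 0.1, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1])])
```

With `lam=1.0` the two losses are simply added; a larger `lam` puts more weight on agreement between each unmarked log and its expanded copy.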
  • This makes it possible to determine whether the probability distributions over abnormality levels output in step 102 are credible.
  • If the final loss obtained by adding the two losses is below the final loss threshold, this indicates that the output of the entire abnormality level training model set is credible.
  • The logs to be detected can be obtained from one or more systems. Different systems are managed with different priorities according to their operating states, and monitoring focuses on operating conditions prone to major abnormalities. For high-priority abnormal logs, once a major abnormality occurs, emergency measures must be taken promptly: respond quickly, then locate and eliminate the specific cause of the failure. Therefore, each log to be detected carries priority identification information. When the abnormality level output by the log anomaly detection model is high, it is determined whether the log is identified as high priority; if so, it is highlighted in the analysis result and an alarm is raised when necessary.
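The priority screening described above might look like the sketch below. It assumes a numeric priority field, a four-level severity ordering, and the hypothetical thresholds `level_threshold` and `priority_threshold`; raising an alarm whenever at least one log is highlighted is a simplification of "give an alarm when necessary". `DetectedLog` and `build_report` are illustrative names.

```python
from dataclasses import dataclass

# Hypothetical severity ordering for the four levels used in this document.
LEVEL_RANK = {"normal": 0, "minor": 1, "moderate": 2, "major": 3}


@dataclass
class DetectedLog:
    text: str
    priority: int  # hypothetical numeric form of the priority identification info
    level: str     # abnormality level predicted by the log anomaly detection model


def build_report(logs, level_threshold="moderate", priority_threshold=2):
    """Highlight severe, high-priority logs; keep the rest with their levels."""
    highlighted, others = [], []
    for log in logs:
        is_severe = LEVEL_RANK[log.level] > LEVEL_RANK[level_threshold]
        if is_severe and log.priority > priority_threshold:
            highlighted.append(log)
        else:
            others.append(log)
    return {"highlighted": highlighted, "others": others, "alarm": bool(highlighted)}
```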
  • The expanded log is obtained, and the marked, unmarked and expanded logs are input into the three corresponding abnormality level training models in the abnormality level training model set for training.
  • The resulting log anomaly detection model can then predict the abnormality level of the logs to be detected that are generated during system operation, giving an analysis result of the system's operating status. This increases the model's resistance to overfitting and makes detection easier.
  • The second embodiment of the system anomaly detection method in the embodiments of this application includes:
  • The unmarked logs can be system logs generated while the system is in an abnormal state.
  • The log content includes the time, session ID, function ID, refined content and other information, such as the system version number, thread number and log level (for example DEBUG, INFO, WARN, ERROR).
  • The unmarked logs are parsed to obtain a variety of semantic fields.
  • The log level in the log content is the target key field to be obtained.
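Parsing a log line into semantic fields could be done with a regular expression. The line layout below is invented for illustration (the source lists the fields but not their format), so `LOG_PATTERN` and `parse_log` are hypothetical and would need to match the real log format.

```python
import re

# Hypothetical layout: time, session ID, function ID, thread, version, level, content.
LOG_PATTERN = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"session=(?P<session_id>\S+)\s+"
    r"func=(?P<function_id>\S+)\s+"
    r"thread=(?P<thread>\d+)\s+"
    r"version=(?P<version>\S+)\s+"
    r"(?P<level>DEBUG|INFO|WARNING|WARN|ERROR)\s+"
    r"(?P<content>.*)"
)


def parse_log(line):
    """Return a dict of semantic fields, or None if the line does not match the layout."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


fields = parse_log("2020-09-27 10:15:02 session=ab12 func=login thread=7 version=1.2.0 ERROR connection refused")
```

The `level` group is the target key field; the other groups supply the remaining semantic fields.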
  • Using prior knowledge of a preset semantic structure, the key fields with the same semantics in each unmarked log are associated, where "the same semantics" means the fields express the same abnormality level. The frequency of each semantic field within a single unmarked log and its frequency across all unmarked logs are then counted and multiplied; comparing the product with a set threshold determines which fields are key fields, that is, the abnormality level fields of the unmarked log. Note that a field that appears in all unmarked logs is not representative and is therefore not considered.
  • The synonymous fields are spliced with the other log fields (all fields except the key fields), which are processed by a random field processing strategy, to obtain a plurality of corresponding expanded logs, wherein the random field processing strategy includes replacing, deleting, inserting or exchanging log fields.
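The splicing step can be sketched as follows: key fields are swapped for same-meaning synonyms, and one random replace, delete, insert or swap is applied to a non-key field per variant. The synonym table, the filler `vocab`, the one-edit-per-variant rule and the function name `expand_log` are assumptions for illustration.

```python
import random


def expand_log(fields, key_fields, synonyms, vocab, n_variants=3, seed=0):
    """Return n_variants expanded copies of one tokenized unmarked log."""
    rng = random.Random(seed)
    non_key = [i for i, f in enumerate(fields) if f not in key_fields]
    variants = []
    for _ in range(n_variants):
        # Key fields: swap in a same-meaning synonym so the abnormality semantics survive.
        out = [rng.choice(synonyms.get(f, [f])) if f in key_fields else f for f in fields]
        # Non-key fields: apply one random edit to add surface diversity.
        if non_key:
            i = rng.choice(non_key)
            op = rng.choice(["replace", "delete", "insert", "swap"])
            if op == "replace":
                out[i] = rng.choice(vocab)
            elif op == "delete":
                del out[i]
            elif op == "insert":
                out.insert(i, rng.choice(vocab))
            elif len(non_key) > 1:  # swap needs a second non-key position
                j = rng.choice([k for k in non_key if k != i])
                out[i], out[j] = out[j], out[i]
        variants.append(out)
    return variants


syn = {"ERROR": ["ERROR", "FATAL"], "timeout": ["timeout", "timed-out"]}
variants = expand_log(["ERROR", "db", "timeout", "after", "30s"], {"ERROR", "timeout"}, syn,
                      vocab=["retry", "host"])
```

Restricting the random edits to non-key positions is what keeps each variant's abnormality level the same as the source log's.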
  • The unmarked logs can also be expanded by back translation.
  • The third embodiment of the system anomaly detection method in the embodiments of this application includes:
  • The probability distribution over abnormality levels predicted by the abnormality level training model for the marked log, together with the true abnormality level of the marked log, is used to calculate the probability that the prediction is correct.
  • The calculation formula is $H(p, q) = -\sum_{i} p(y_i)\log q(y_i)$, where $p(y_i)$ is the true probability of the $i$-th abnormality level of the marked log and $q(y_i)$ is the predicted probability of the $i$-th abnormality level. Because the true abnormality level has probability 1 and every other level has probability 0, the calculation reduces to $-\log q(y_i)$ for the true level $y_i$.
  • For example, let Z be the predicted probability distribution over abnormality levels for a marked log, with probabilities [0.5, 0.2, 0.2, 0.1] for levels 1, 2, 3 and 4 (level 1 being major abnormality), and let the true abnormality level of the marked log be major abnormality.
  • From this, the cross-entropy loss of the first probability distribution is obtained. The cross-entropy loss thus evaluates the classification accuracy of the abnormality level training model on the marked log: it is a quantitative measure of the difference between the classification result and the true result.
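Working the example through, assuming the natural logarithm (with base 2 the loss would be exactly 1 bit):

```latex
H(p, q) = -\sum_{i=1}^{4} p(y_i)\log q(y_i)
        = -\left(1 \cdot \log 0.5 + 0 \cdot \log 0.2 + 0 \cdot \log 0.2 + 0 \cdot \log 0.1\right)
        = -\log 0.5 = \ln 2 \approx 0.693
```

A more confident correct prediction lowers the loss; for instance, a predicted probability of 0.9 for the true level would give $-\ln 0.9 \approx 0.105$.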
  • The cross-entropy loss of the first probability distribution is later combined with the consistency loss to compute the final loss, which serves as one of the indicators for evaluating the abnormal log detection model.
  • The fourth embodiment of the system anomaly detection method in the embodiments of this application includes:
  • The probability threshold calculation formula is:
  • When the marked log contains little marked data, the model overfits easily, so the probability threshold formula is transformed to slow the growth of the threshold and delete more invalid samples. When the marked log contains a large amount of marked data, the model is hard to overfit and takes a long time to converge; within the same time it outputs fewer high-probability predictions, so fewer samples need to be deleted, and the probability threshold formula is transformed to accelerate the growth of the threshold, reducing the number of deleted samples.
  • The log anomaly detection model is iterated with training signal annealing, which gradually deletes the marked logs likely to cause overfitting, until the final loss is below the set threshold; the log anomaly detection model can then be used for detection.
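Training signal annealing can be sketched using the schedule published for Unsupervised Data Augmentation (UDA), because the source's own threshold formulas are not reproduced here. In that formulation the threshold is alpha_t * (1 - 1/K) + 1/K, with a linear, exponential (slow early growth, suited to small marked sets) or logarithmic (fast early growth, suited to large marked sets) growth coefficient alpha_t. Treating this as the applicant's exact formula is an assumption, and the function names are hypothetical.

```python
import math


def tsa_threshold(t, total_steps, num_levels, schedule="linear"):
    """Probability threshold at iteration t: alpha_t * (1 - 1/K) + 1/K."""
    progress = t / total_steps
    if schedule == "linear":
        alpha = progress
    elif schedule == "exp":  # grows slowly at first: more marked logs are dropped early
        alpha = math.exp((progress - 1) * 5)
    elif schedule == "log":  # grows quickly at first: fewer marked logs are dropped
        alpha = 1 - math.exp(-progress * 5)
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    return alpha * (1 - 1 / num_levels) + 1 / num_levels


def annealed_cross_entropy(marked, threshold):
    """Average -log p(y*|x) over marked logs whose correct-prediction probability <= threshold.

    marked: list of (predicted distribution, index of the true level).
    Returns (loss, number of marked logs kept).
    """
    kept = [(probs, y) for probs, y in marked if probs[y] <= threshold]
    if not kept:
        return 0.0, 0
    loss = sum(-math.log(probs[y]) for probs, y in kept) / len(kept)
    return loss, len(kept)
```

Marked logs the model already predicts with confidence above the threshold contribute no gradient at that step, which is how the annealing keeps the model from memorizing a small marked set.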
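  • The abnormality level training model set then converges and stops iterating, yielding the log anomaly detection model.
  • The cross-entropy loss and the consistency loss are combined to evaluate the correct prediction probability of the log anomaly detection model, which serves as the basis for deciding whether to continue iterating.
  • The two losses are simply added: $\mathcal{J}(\theta) = \mathcal{L}_{\mathrm{CE}}(\theta) + \mathcal{L}_{\mathrm{cons}}(\theta)$, where $\mathcal{L}_{\mathrm{CE}}$ is the cross-entropy loss and $\mathcal{L}_{\mathrm{cons}}$ is the consistency loss.
  • Training signal annealing effectively reduces the risk of overfitting.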
  • An embodiment of the system anomaly detection device in the embodiments of this application includes:
  • the obtaining module 501, configured to obtain the marked log and the unmarked log of the system to be detected, and to expand the unmarked log to obtain an expanded log;
  • the training module 502, configured to input the marked log, the unmarked log and the expanded log into three identical abnormality level training models, respectively, for training, and correspondingly output a first probability distribution over the abnormality levels of the marked log, a second probability distribution over the abnormality levels of the unmarked log, and a third probability distribution over the abnormality levels of the expanded log, wherein the three identical abnormality level training models form an abnormality level training model set;
  • the calculation module 503, configured to calculate the cross-entropy loss between the first probability distribution and the preset abnormality level marks of the marked log, and to calculate the consistency loss between the second probability distribution and the third probability distribution;
  • the generating module 504, configured to predict the abnormality level marks of the unmarked log and the expanded log according to the consistency loss, and to iterate the abnormality level training model set according to the cross-entropy loss until the set converges, obtaining the log anomaly detection model;
  • the detection module 505, configured to obtain the log to be detected of the system to be detected, input it into the log anomaly detection model for detection, output the abnormality level corresponding to the log to be detected, and use that abnormality level as the analysis result of the current system operating status.
  • In this embodiment, the marked, unmarked and expanded logs of the system to be detected are input into the three identical abnormality level training models of the abnormality level training model set, which output a probability distribution over abnormality levels for each. The cross-entropy loss and consistency loss are then calculated; the abnormality levels of the unmarked and expanded logs are predicted according to the consistency loss, and the model set is iterated according to the cross-entropy loss until it converges, yielding the log anomaly detection model, which then detects abnormal logs during system operation.
  • This optimizes the model training method, prevents overfitting, and makes anomalies in the system easier to detect.
  • Referring to FIG. 6, another embodiment of the system anomaly detection device in the embodiments of this application includes the obtaining module 501, training module 502, calculation module 503, generating module 504 and detection module 505, configured as in the previous embodiment, with the following refinements:
  • The obtaining module 501 is further configured to splice the synonymous fields with the other log fields (all fields except the key fields) to obtain a plurality of corresponding expanded logs, wherein the random field processing strategy applied to the other log fields includes replacement, deletion, insertion or exchange.
  • The training module 502 further includes:
  • the constructing unit 5021, configured to uniformly adjust the length of each log in the marked log, the unmarked log and the expanded log to a preset length and construct the corresponding data vectors;
  • the feature extraction unit 5022, configured to determine the feature dimension of each data vector according to its length, and to extract semantic features from the data vector according to that feature dimension to obtain initial semantic features;
  • the probability distribution generating unit 5023, configured to screen and combine the prominent features of the initial semantic features to obtain final semantic features, and to calculate and output, according to the final semantic features, the probability distributions over abnormality levels of the marked log, the unmarked log and the expanded log.
  • The calculation module 503 further includes:
  • the first calculation unit 5031, configured to calculate the correct prediction probability of the abnormality level of each marked log according to the first probability distribution and the preset abnormality level marks of the marked logs;
  • the second calculation unit 5032, configured to calculate the cross-entropy loss of the first probability distribution from the preset model training parameters and the correct prediction probability, so as to measure the difference between the classification model's predicted abnormality levels for the marked logs and their true abnormality levels.
  • The generating module 504 further includes:
  • the iteration unit 5041, configured to determine the correct prediction probability of each marked log according to the cross-entropy loss and determine whether any correct prediction probability exceeds the preset probability threshold; if so, to delete the first probability distributions whose correct prediction probability exceeds the threshold and continue iterating the log anomaly detection model; otherwise, to iterate the log anomaly detection model directly; and, after each iteration, to update the model training parameters;
  • the model generation unit 5042, configured to add the cross-entropy loss and the consistency loss to obtain the final loss value and determine whether it is below a preset final loss threshold; if it is, the abnormality level training model set has converged, iteration stops, and the log anomaly detection model is obtained.
  • The formula for the correct prediction probability threshold uses the following quantities:
  • the predetermined probability threshold at iteration $t$;
  • $T$, the total number of iterations;
  • $\alpha_t$, the growth coefficient;
  • $K$, the number of abnormality level categories;
  • $t$, the current iteration number.
  • The detection module 505 further includes:
  • the acquiring unit 5051, configured to acquire the logs to be detected of the system to be detected, where the logs to be detected include multiple pieces of log information and each piece includes identification information of its system operation management priority;
  • the detection unit 5052, configured to input the logs to be detected into the log anomaly detection model and predict their abnormality levels with that model;
  • the screening unit 5053, configured to screen out the logs to be detected whose abnormality level is higher than a preset abnormality level threshold and, according to the identification information, determine among them the log information whose priority exceeds a preset priority threshold;
  • the analysis result generating unit 5054, configured to highlight the log information whose priority exceeds the preset priority threshold, and to use the highlighted log information, together with the abnormality levels of the remaining log information, as the analysis result of the current system operating status.
  • In this embodiment, the unmarked logs generated when the system is in an abnormal state are expanded, increasing the number of abnormal unmarked logs while ensuring the data differ, which improves the detection model's resistance to overfitting during training. The marked, unmarked and expanded logs are input into the three corresponding abnormality level training models of the abnormality level training model set for training, to predict the abnormality levels of all three.
  • During iterative training of the log anomaly detection model, the first probability distributions used for training are gradually deleted as the unlabeled data increase, and the final log anomaly detection model can be used to predict the abnormality level of the logs to be detected that are generated during system operation.
  • This yields an analysis result of the system's operating status, increases the model's resistance to overfitting, and makes detection easier.
  • FIG. 7 is a schematic structural diagram of a system anomaly detection device provided by an embodiment of the present application.
  • the system anomaly detection device 700 may vary considerably depending on configuration or performance. It may include one or more processors (central processing units, CPU) 710, a memory 720, and one or more storage media 730 (for example, one or more mass storage devices) that store application programs 733 or data 732.
  • the memory 720 and the storage medium 730 may be short-term storage or persistent storage.
  • the program stored in the storage medium 730 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the system anomaly detection device 700.
  • the processor 710 may be configured to communicate with the storage medium 730 and to execute, on the system anomaly detection device 700, the series of instruction operations in the storage medium 730.
  • the system anomaly detection device 700 may also include one or more power supplies 740, one or more wired or wireless network interfaces 750, one or more input/output interfaces 760, and/or one or more operating systems 731, such as Windows Server, Mac OS X, Unix, Linux and FreeBSD.
  • the structure shown in FIG. 7 does not limit the system anomaly detection device. The device may include more or fewer components than shown, combine certain components, or arrange its components differently.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium, and the computer-readable storage medium may also be a volatile computer-readable storage medium.
  • the computer-readable storage medium stores instructions, and when the instructions run on a computer, the computer executes the steps of the system abnormality detection method.
  • if the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • based on this understanding, the technical solution of the present application can be embodied as a software product. This applies to the solution in essence, to the part that contributes to the prior art, or to all or part of the solution. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs and other media that can store program code.
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

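This application relates to artificial intelligence and provides a system anomaly detection method, device, equipment and storage medium. The method works as follows:
- The labeled logs, unlabeled logs and augmented logs of a system to be detected are input into three identical training models of a training model set for training, and the model set outputs a probability distribution over anomaly levels for each of the three.
- The cross-entropy loss and the consistency loss of the training-model outputs are calculated.
- The anomaly levels of the unlabeled logs and the augmented logs are predicted according to the consistency loss, and the training model set is iterated according to the cross-entropy loss until it converges, yielding a log anomaly detection model.
- The log anomaly detection model is used to detect abnormal logs while the system is running.
This application also relates to blockchain technology: the labeled logs, unlabeled logs and augmented logs may be stored in a blockchain. By optimizing how the model is trained, the method prevents over-fitting and makes it easier for the detection model to detect anomalies in the system.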

Description

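System Anomaly Detection Method, Device, Equipment and Storage Medium
This application claims priority to Chinese patent application No. 202010611178.5, entitled "System Anomaly Detection Method, Device, Equipment and Storage Medium" and filed with the China Patent Office on June 30, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to artificial intelligence decision-making, and in particular to a system anomaly detection method, device, equipment and storage medium.
Background
As systems grow larger and more complex and monitoring coverage becomes more complete, the volume of monitoring data keeps increasing, and operations staff can no longer find quality problems in this massive data by hand. Intelligent anomaly detection uses AI algorithms to find anomalies in monitoring data automatically, in real time and accurately, providing a basis for subsequent diagnosis and self-healing. Anomaly detection is a basic but very important function of an AIOps (Algorithmic IT Operations) system. It mainly uses algorithms and models to mine abnormal behavior in KPI time-series data automatically, providing the decision basis for subsequent alerting, automatic loss mitigation, root-cause analysis and so on.
In real application scenarios, however, normal data usually makes up a very large proportion of the total while abnormal data points are very scarce, which makes anomaly detection extremely difficult. When a detection model is trained, the traditional way to balance positive and negative samples is to undersample normal samples (discarding part of the data) and oversample abnormal samples (repeating part of the data). The inventors found that both approaches have drawbacks:
- Undersampling loses a large amount of sample information, which causes the model to over-fit and generalize poorly.
- Oversampling by simple random sampling also exposes the model to the risk of over-fitting.
Abnormal data points are scarce, and an accurate detection model for them is hard to build. Both problems make data anomaly detection in intelligent operations systems harder.
Summary
The main purpose of this application is to solve the problem that anomaly detection in intelligent operations systems is difficult.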
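A first aspect of this application provides a system anomaly detection method, including:
obtaining labeled logs and unlabeled logs of a system to be detected, and augmenting the unlabeled logs to obtain augmented logs;
inputting the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and correspondingly outputting a first probability distribution over anomaly levels for the labeled logs, a second probability distribution over anomaly levels for the unlabeled logs and a third probability distribution over anomaly levels for the augmented logs, wherein the three identical anomaly-level training models form an anomaly-level training model set;
calculating a cross-entropy loss between the first probability distribution and the preset anomaly-level labels of the labeled logs, and calculating a consistency loss between the second probability distribution and the third probability distribution;
predicting anomaly-level labels of the unlabeled logs and the augmented logs according to the consistency loss, and iterating the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges, to obtain a log anomaly detection model;
obtaining logs to be detected of the system to be detected, inputting the logs to be detected into the log anomaly detection model for detection, outputting the anomaly level corresponding to the logs to be detected, and using that anomaly level as the analysis result of the current system operating state.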
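A second aspect of this application provides a system anomaly detection device, including:
an acquisition module, configured to obtain labeled logs and unlabeled logs of a system to be detected, and to augment the unlabeled logs to obtain augmented logs;
a training module, configured to input the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and to correspondingly output a first probability distribution over anomaly levels for the labeled logs, a second probability distribution over anomaly levels for the unlabeled logs and a third probability distribution over anomaly levels for the augmented logs, wherein the three identical anomaly-level training models form an anomaly-level training model set;
a calculation module, configured to calculate a cross-entropy loss between the first probability distribution and the preset anomaly-level labels of the labeled logs, and to calculate a consistency loss between the second probability distribution and the third probability distribution;
a generation module, configured to predict anomaly-level labels of the unlabeled logs and the augmented logs according to the consistency loss, and to iterate the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges, to obtain a log anomaly detection model;
a detection module, configured to obtain logs to be detected of the system to be detected, input the logs to be detected into the log anomaly detection model for detection, output the anomaly level corresponding to the logs to be detected, and use that anomaly level as the analysis result of the current system operating state.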
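A third aspect of this application provides system anomaly detection equipment, including a memory and at least one processor. The memory stores instructions, and the memory and the at least one processor are interconnected by lines. The at least one processor invokes the instructions in the memory so that the system anomaly detection equipment performs the following steps of the system anomaly detection method:
obtaining labeled logs and unlabeled logs of a system to be detected, and augmenting the unlabeled logs to obtain augmented logs;
inputting the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and correspondingly outputting a first probability distribution over anomaly levels for the labeled logs, a second probability distribution over anomaly levels for the unlabeled logs and a third probability distribution over anomaly levels for the augmented logs, wherein the three identical anomaly-level training models form an anomaly-level training model set;
calculating a cross-entropy loss between the first probability distribution and the preset anomaly-level labels of the labeled logs, and calculating a consistency loss between the second probability distribution and the third probability distribution;
predicting anomaly-level labels of the unlabeled logs and the augmented logs according to the consistency loss, and iterating the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges, to obtain a log anomaly detection model;
obtaining logs to be detected of the system to be detected, inputting the logs to be detected into the log anomaly detection model for detection, outputting the anomaly level corresponding to the logs to be detected, and using that anomaly level as the analysis result of the current system operating state.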
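A fourth aspect of this application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the following steps of the system anomaly detection method:
obtaining labeled logs and unlabeled logs of a system to be detected, and augmenting the unlabeled logs to obtain augmented logs;
inputting the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and correspondingly outputting a first probability distribution over anomaly levels for the labeled logs, a second probability distribution over anomaly levels for the unlabeled logs and a third probability distribution over anomaly levels for the augmented logs, wherein the three identical anomaly-level training models form an anomaly-level training model set;
calculating a cross-entropy loss between the first probability distribution and the preset anomaly-level labels of the labeled logs, and calculating a consistency loss between the second probability distribution and the third probability distribution;
predicting anomaly-level labels of the unlabeled logs and the augmented logs according to the consistency loss, and iterating the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges, to obtain a log anomaly detection model;
obtaining logs to be detected of the system to be detected, inputting the logs to be detected into the log anomaly detection model for detection, outputting the anomaly level corresponding to the logs to be detected, and using that anomaly level as the analysis result of the current system operating state.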
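In the technical solution provided by this application:
- The labeled logs, unlabeled logs and augmented logs of the system to be detected are input respectively into three identical anomaly-level training models of an anomaly-level training model set for training. The models output a probability distribution over anomaly levels for each of the three.
- The cross-entropy loss and the consistency loss of the model outputs are calculated.
- The anomaly levels of the unlabeled and augmented logs are predicted according to the consistency loss, and the anomaly-level training model set is iterated according to the cross-entropy loss until it converges, yielding a log anomaly detection model.
- The log anomaly detection model detects abnormal logs while the system runs.
This optimized training prevents over-fitting and makes it easier for the detection model to detect anomalies in the system.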
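Brief Description of the Drawings
FIG. 1 is a schematic diagram of a first embodiment of the system anomaly detection method in this application;
FIG. 2 is a schematic diagram of a second embodiment of the system anomaly detection method in this application;
FIG. 3 is a schematic diagram of a third embodiment of the system anomaly detection method in this application;
FIG. 4 is a schematic diagram of a fourth embodiment of the system anomaly detection method in this application;
FIG. 5 is a schematic diagram of an embodiment of the system anomaly detection device in this application;
FIG. 6 is a schematic diagram of another embodiment of the system anomaly detection device in this application;
FIG. 7 is a schematic diagram of an embodiment of the system anomaly detection equipment in this application.
Detailed Description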
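The embodiments of this application provide a system anomaly detection method, device, equipment and storage medium, which work as follows:
- The labeled logs, unlabeled logs and augmented logs of the system to be detected are input respectively into three identical anomaly-level training models of an anomaly-level training model set for training. The models output a probability distribution over anomaly levels for each of the three.
- The cross-entropy loss and the consistency loss of the model outputs are calculated.
- The anomaly levels of the unlabeled and augmented logs are predicted according to the consistency loss, and the anomaly-level training model set is iterated according to the cross-entropy loss until it converges, yielding a log anomaly detection model.
- The log anomaly detection model detects abnormal logs while the system runs.
This optimized training prevents over-fitting and makes it easier for the detection model to detect anomalies in the system.
The terms "first", "second", "third", "fourth" and so on (if any) in the description, claims and drawings of this application distinguish similar objects and do not necessarily describe a particular order or sequence. Data so labeled are interchangeable where appropriate, so the embodiments described here can be implemented in orders other than those illustrated or described. The terms "include" and "have" and their variants are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units explicitly listed, and may include other steps or units that are not listed or that are inherent to that process, method, product or device.
For ease of understanding, the specific flow of the embodiments of this application is described below. Referring to FIG. 1, a first embodiment of the system anomaly detection method in the embodiments of this application includes: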
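101. Obtain labeled logs and unlabeled logs of the system to be detected, and augment the unlabeled logs to obtain augmented logs;
The execution subject of this application may be a system anomaly detection device, a terminal or a server, and is not specifically limited here; the embodiments are described with a server as the execution subject. To further ensure the privacy and security of the labeled logs, unlabeled logs and augmented logs, these logs may also be stored in a node of a blockchain.
In this embodiment, the logs generated during past operation of the system can be obtained from the system memory. A log is text that records the system state and operating state; its content includes a timestamp and text describing what was sent.
In the practical scenario of this embodiment, a small number of labeled logs are used to predict the anomaly levels of a large number of unlabeled logs. Logs generated by the system in the past are taken as unlabeled logs, and a small number of them are annotated with anomaly levels to serve as labeled logs. During model training, normal samples greatly outnumber abnormal samples. To balance positive and negative samples and keep training stable, the unlabeled logs are augmented to increase the number of abnormal (negative) samples. Here, the augmented logs are obtained by replacing the keywords that describe the system operating state in the unlabeled logs with words of the same meaning.
Preferably, back-translation and TF-IDF (term frequency-inverse document frequency) word replacement are used to augment the abnormal unlabeled logs, in the following steps:
1. Use TF-IDF to assess how important each field of an unlabeled log is to that log, and take the high-scoring fields (those that occur frequently in that log) as its key fields.
2. Classify the key fields of the different unlabeled logs by meaning using DBPedia prior knowledge, giving several categories of key fields.
3. Replace each key field with a synonym of the same meaning.
4. Process the remaining non-key fields by replacement, deletion, insertion, swapping and similar operations.
This increases the number of abnormal unlabeled logs whose content differs while their semantics stay the same, giving the augmented logs.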
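102. Input the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and correspondingly output a first probability distribution over anomaly levels for the labeled logs, a second probability distribution over anomaly levels for the unlabeled logs and a third probability distribution over anomaly levels for the augmented logs, wherein the three identical anomaly-level training models form an anomaly-level training model set;
In this embodiment, the anomaly-level training model set is formed by stacking three identical anomaly-level training models, and the training process of each anomaly-level training model specifically includes:
adjusting each log record in the labeled logs, the unlabeled logs and the augmented logs to a uniform preset length, and constructing the corresponding data vectors;
determining the feature dimension of the data vectors according to their length, and extracting semantic features from the data vectors according to the feature dimension to obtain initial semantic features;
selecting and combining the salient features among the initial semantic features to obtain final semantic features, and calculating and outputting, according to the final semantic features, the probability distributions over anomaly levels of the labeled logs, the unlabeled logs and the augmented logs.
In this embodiment, training the anomaly-level training model on the labeled logs is supervised learning. Training the respective anomaly-level training models on the unlabeled logs and the augmented logs is unsupervised learning. The model does not output an anomaly level directly. It outputs the probability of each anomaly level according to the feature distribution of the log's fields, and the level with the highest probability is then taken as the anomaly level of that system log.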
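Preferably, a Text-CNN (text convolutional neural network) is used for all three models. It trains the model for the labeled logs in a supervised manner and the models for the unlabeled logs and augmented data in an unsupervised manner. It specifically includes:
Input layer: the text tokens of the labeled, unlabeled or augmented logs fed into the Text-CNN model are adjusted to the same length L, giving word vectors for each log text;
Convolutional layer: taking the number of anomaly-level categories as the dimension of the anomaly-level training model, several convolution kernels of different sizes are applied to the word vectors to extract the feature words that describe the anomaly level;
Pooling layer: max pooling combines the different feature words produced by the convolutional layer into classification features for each system log;
Fully connected layer: the classification features are fed into a logistic regression (LR) classifier. For example, if the configured output rule defines the anomaly levels as major anomaly, ordinary anomaly, minor anomaly and normal, the classifier outputs the probability of each of these levels for each system log.
Finally, the anomaly level with the highest output probability is taken as the predicted anomaly level of the current system log. For example, suppose unlabeled log A is input into the Text-CNN and the probabilities of major anomaly, ordinary anomaly, minor anomaly and normal are [0.5, 0.2, 0.2, 0.1]. The model then predicts that unlabeled log A is a major anomaly.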
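The layers above can be illustrated with a minimal, dependency-free Text-CNN forward pass. This is a sketch under stated assumptions, not the applicant's implementation. The weights are random and untrained. The embedding size, kernel sizes (2, 3, 4), filter count and sequence length L = 16 are assumed values. The fully connected LR layer is written as a softmax (multinomial logistic regression) layer. The helper names `pad_or_truncate`, `TinyTextCNN` and `predict_level` are hypothetical.

```python
import math
import random

# Example anomaly levels from the text, highest severity first.
LEVELS = ["major", "ordinary", "minor", "normal"]


def pad_or_truncate(tokens, length, pad="<pad>"):
    """Input layer: force every log to exactly `length` tokens."""
    return (list(tokens) + [pad] * length)[:length]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


class TinyTextCNN:
    """Untrained Text-CNN forward pass with random weights (illustration only)."""

    def __init__(self, dim=8, kernel_sizes=(2, 3, 4), n_filters=2,
                 n_classes=len(LEVELS), length=16, seed=0):
        self.rng = random.Random(seed)
        self.dim, self.length = dim, length
        self.emb = {}  # word vectors, created lazily per token
        # One (kernel, bias) pair per filter; a kernel is k rows of `dim` weights.
        self.filters = [([self._vec(dim) for _ in range(k)], self._rnd())
                        for k in kernel_sizes for _ in range(n_filters)]
        # Fully connected softmax layer: one weight row per anomaly level.
        self.w_out = [self._vec(len(self.filters)) for _ in range(n_classes)]
        self.b_out = [0.0] * n_classes

    def _rnd(self):
        return self.rng.uniform(-0.5, 0.5)

    def _vec(self, n):
        return [self._rnd() for _ in range(n)]

    def embed(self, token):
        if token not in self.emb:
            self.emb[token] = self._vec(self.dim)
        return self.emb[token]

    def forward(self, tokens):
        seq = [self.embed(t) for t in pad_or_truncate(tokens, self.length)]
        features = []
        for kernel, bias in self.filters:
            k = len(kernel)
            # Convolution + ReLU over every window of k consecutive tokens.
            acts = [max(0.0, bias + sum(w * x
                                        for row, vec in zip(kernel, seq[i:i + k])
                                        for w, x in zip(row, vec)))
                    for i in range(len(seq) - k + 1)]
            features.append(max(acts))  # max-pool over time
        logits = [b + sum(w * f for w, f in zip(ws, features))
                  for ws, b in zip(self.w_out, self.b_out)]
        return softmax(logits)  # one probability per anomaly level


def predict_level(probs):
    """Take the anomaly level with the highest probability."""
    return LEVELS[max(range(len(probs)), key=probs.__getitem__)]


model = TinyTextCNN()
probs = model.forward("ERROR db connection timeout".split())
print(len(probs), round(sum(probs), 6))    # 4 1.0
print(predict_level([0.5, 0.2, 0.2, 0.1]))  # major (the example above)
```

Only the shape of the computation is meant to carry over: fixed-length input, several kernel widths, max pooling over time, then one probability per level and an argmax. In practice the weights would be learned with the losses described in the following steps.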
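103. Calculate the cross-entropy loss between the first probability distribution and the preset anomaly-level labels of the labeled logs, and calculate the consistency loss between the second probability distribution and the third probability distribution;
In this embodiment, the cross-entropy loss represents the difference between the first probability distribution predicted for a labeled log and its true level. The consistency loss represents the difference between an unlabeled log and its corresponding augmented log. Once training is complete, an unlabeled log to be detected can be input directly to predict its anomaly level.
Specifically, the anomaly-level training models for the unlabeled logs and the augmented logs are first stacked and trained for consistency, and the consistency loss function controls the iterative updating of the anomaly-level training model set. As training proceeds, the features of each unlabeled log and its augmented log become more concentrated and more similar. The distance between the two models' outputs therefore shrinks and the consistency loss decreases. When it falls to a preset threshold, the corresponding anomaly-level label is propagated to the augmented log, giving the anomaly level of the unlabeled log. The consistency loss is calculated with the following function: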
Figure PCTCN2020118218-appb-000001
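The consistency-loss formula survives only as an image reference. The sketch below assumes it is the KL divergence between the distribution predicted for an unlabeled log and the one predicted for its augmented copy, which is the usual choice in UDA-style consistency training. The original image may define it differently, and the function names are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two distributions over the same anomaly levels."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def consistency_loss(unlabeled_probs, augmented_probs):
    """Mean KL between each unlabeled log's prediction and its augmented copy's."""
    pairs = list(zip(unlabeled_probs, augmented_probs))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


same = [0.5, 0.2, 0.2, 0.1]
print(consistency_loss([same], [same]))                    # 0.0
print(round(kl_divergence([0.5, 0.5], [0.25, 0.75]), 4))  # 0.1438
```

A smaller value means the two predictions agree more closely. Under this reading, "falls to a preset threshold" means the mean KL over the unlabeled/augmented pairs drops below that value.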
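In addition, the cross-entropy loss measures the difference between the anomaly-level probability distribution predicted for the labeled logs and their actual anomaly levels. It is calculated with the following cross-entropy function: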
Figure PCTCN2020118218-appb-000002
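where $p_\theta(y^* \mid x)$ denotes the predicted probability of the correct (true) anomaly level of a labeled log. At training step $t$, if $p_\theta(y^* \mid x)$ for a labeled example is greater than the threshold $\eta_t$, that example is removed from the loss function. Here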
Figure PCTCN2020118218-appb-000003
and, in the normal case,
Figure PCTCN2020118218-appb-000004
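The equation images above are not reproduced here. This reconstruction assumes they follow the training signal annealing (TSA) scheme that the fourth embodiment names. It is an interpretation consistent with the symbols this document defines ($p_\theta(y^*\mid x)$, $\eta_t$, $\alpha_t$, $K$, $t$, $T$), not a transcription of the images. $\mathcal{D}_L$ denotes the set of labeled logs and $\mathbb{1}[\cdot]$ is the indicator function.

```latex
\mathcal{L}_{\mathrm{CE}}(\theta) =
  \frac{1}{Z} \sum_{(x, y^*) \in \mathcal{D}_L}
  -\log p_\theta(y^* \mid x)\,
  \mathbb{1}\!\left[p_\theta(y^* \mid x) \le \eta_t\right],
\qquad
Z = \sum_{(x, y^*) \in \mathcal{D}_L} \mathbb{1}\!\left[p_\theta(y^* \mid x) \le \eta_t\right]

\eta_t = \alpha_t \left(1 - \frac{1}{K}\right) + \frac{1}{K},
\qquad
\alpha_t = \frac{t}{T} \quad \text{(normal amount of labeled data)}
```

Under this reading, $\eta_t$ rises from $1/K$ at the start of training to 1 at step $T$. A labeled log that the model already predicts confidently is therefore excluded from the loss early on, and more of them are admitted as training proceeds.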
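104. Predict the anomaly-level labels of the unlabeled logs and the augmented logs according to the consistency loss, and iterate the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges, to obtain a log anomaly detection model;
In this embodiment, the consistency loss and the cross-entropy loss are combined to evaluate the log anomaly detection model, namely: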
Figure PCTCN2020118218-appb-000005
where $\theta$ denotes the preset model parameters,
Figure PCTCN2020118218-appb-000006
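is the final loss, and $\lambda$ balances the consistency loss against the cross-entropy loss. When the final loss is smaller than a preset threshold, the log anomaly detection model stops iterating.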
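The objective above also survives only as image references. The form below assumes a UDA-style objective. $\mathcal{D}_L$ and $\mathcal{D}_U$ are the sets of labeled and unlabeled logs, and $q(\hat{x} \mid x)$ is the augmentation procedure that produces an augmented log $\hat{x}$ from $x$. $\tilde{\theta}$ is a copy of $\theta$ through which no gradient flows; this detail is borrowed from UDA and is not stated in this document. The fourth embodiment simply adds the two losses, which corresponds to $\lambda = 1$.

```latex
\min_{\theta} \; \mathcal{J}(\theta) =
  \mathbb{E}_{(x, y^*) \in \mathcal{D}_L}
  \left[ -\log p_\theta(y^* \mid x) \right]
  + \lambda \,
  \mathbb{E}_{x \in \mathcal{D}_U}\,
  \mathbb{E}_{\hat{x} \sim q(\hat{x} \mid x)}
  \left[ \mathrm{KL}\!\left( p_{\tilde{\theta}}(y \mid x) \,\middle\|\, p_{\theta}(y \mid \hat{x}) \right) \right]
```

In this form, the supervised term is computed only over the labeled logs that survive the probability threshold $\eta_t$ at the current step.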
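The losses are judged against their thresholds as follows:
- When the consistency loss is smaller than a preset consistency-loss threshold, the predicted anomaly-level labels of the unlabeled logs and augmented logs can be considered reliable.
- When the cross-entropy loss is smaller than a preset cross-entropy-loss threshold, the probability distributions over anomaly levels output in step 102 can be considered reliable.
- When the final loss, the sum of the two, is smaller than the final-loss threshold, the output of the whole anomaly-level training model set can be considered reliable.
105. Obtain logs to be detected of the system to be detected, input the logs to be detected into the log anomaly detection model for detection, output the anomaly level corresponding to the logs to be detected, and use that anomaly level as the analysis result of the current system operating state.
In this embodiment, the logs to be detected may come from one or more systems. The operating states of different systems are managed by priority, and monitoring focuses on operating conditions prone to major anomalies. For a high-priority abnormal log, once a major anomaly occurs, emergency measures must be taken promptly: respond quickly, locate the specific cause of the fault and eliminate it. The logs to be detected therefore carry priority identification information. When the model outputs a high anomaly level for a log, the system checks whether the log is marked as high priority. If so, the log is highlighted in the analysis result, and an alarm is raised if necessary.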
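The priority-aware reporting step can be sketched as follows. The document states only that logs carry priority identification information and that high-level, high-priority logs are highlighted. The four-level ranking, the integer priority scale, the field names and both default thresholds are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical severity ranking of the example anomaly levels.
LEVEL_RANK = {"normal": 0, "minor": 1, "ordinary": 2, "major": 3}


@dataclass
class DetectedLog:
    text: str
    priority: int  # hypothetical management-priority tag carried by the log
    level: str     # anomaly level predicted by the detection model


def analyse(logs, level_threshold="minor", priority_threshold=2):
    """Build the analysis result, highlighting severe logs from high-priority sources."""
    report = []
    for log in logs:
        severe = LEVEL_RANK[log.level] > LEVEL_RANK[level_threshold]
        highlight = severe and log.priority > priority_threshold
        report.append({"text": log.text, "level": log.level,
                       "highlight": highlight})
    return report


result = analyse([
    DetectedLog("payment db unreachable", priority=3, level="major"),
    DetectedLog("report job slow", priority=1, level="major"),
    DetectedLog("cache miss spike", priority=3, level="minor"),
])
print([r["highlight"] for r in result])  # [True, False, False]
```

Every log keeps its predicted level in the report; only the highlight flag depends on both thresholds. An alarm hook could be attached wherever `highlight` is true.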
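In this embodiment of the application, the method proceeds as follows:
- Labeled and unlabeled logs are obtained, and the unlabeled logs are augmented to obtain augmented logs.
- All three are input into the three corresponding anomaly-level training models of the anomaly-level training model set for training, to predict the probability distribution over anomaly levels for each.
- The model set is trained iteratively while the labeled information is gradually reduced.
- The resulting log anomaly detection model predicts the anomaly level of the logs to be detected that are generated while the system runs, giving the analysis result of the system operating state.
This improves the model's resistance to over-fitting and makes detection easier for the detection model.
Referring to FIG. 2, a second embodiment of the system anomaly detection method in the embodiments of this application includes:
201. Obtain labeled logs and unlabeled logs of the system to be detected;
202. Parse the unlabeled logs to obtain multiple log fields with different semantics;
This embodiment mainly describes how the unlabeled logs are augmented to obtain augmented logs. The unlabeled logs may be system logs generated while the system is in an abnormal state. Their content includes the time, session ID, function ID, condensed content, and other information such as the system version number, thread number and log level (for example DEBUG, INFO, WARN, ERROR). The fields of the unlabeled logs that carry different semantics are parsed to obtain the various semantic fields. Clearly, the log level in this content is the target key field to be obtained.
203. According to preset semantic-structure prior knowledge and the occurrence frequency of the log fields, select the key fields related to the anomaly level from the log fields;
In this embodiment, the preset semantic-structure prior knowledge links key fields with the same semantics across the unlabeled logs; "same semantics" means content expressing the same anomaly level. Key fields are then selected in three steps:
1. Count how often each semantic field occurs within a single unlabeled log, and how often it occurs across all unlabeled logs (taken as an inverse document frequency).
2. Multiply the two.
3. Compare the product with a set threshold. A field above the threshold is a key field, that is, a field that represents the anomaly level of the unlabeled log.
A field type that appears in every unlabeled log is not representative and is not considered.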
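Preferably, TF-IDF (term frequency-inverse document frequency) is used to determine the key fields. Suppose the unlabeled logs are system logs. A system log contains 100 fields, and field a occurs 15 times in it, so the TF of field a is 15/100 = 0.15. This training run uses 100,000 system logs, and field a appears in 1,000 of them, so the IDF of field a is lg(100000/1000) = lg 100 = 2. The TF-IDF of field a is therefore 0.15 × 2 = 0.30. With the key-field TF-IDF threshold set to 0.4, field a falls short of the threshold and is not selected as a key field.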
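The TF-IDF arithmetic above, and the key-field selection rule of step 203, can be checked with a short sketch. It treats each log as a list of field tokens and uses the base-10 logarithm, as in the example. The helper names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(count_in_log, fields_in_log, n_logs, n_logs_with_field):
    tf = count_in_log / fields_in_log
    idf = math.log10(n_logs / n_logs_with_field)
    return tf * idf


def key_fields(logs, threshold):
    """Fields whose TF-IDF in some log reaches `threshold`."""
    n = len(logs)
    doc_freq = Counter(field for log in logs for field in set(log))
    selected = set()
    for log in logs:
        for field, count in Counter(log).items():
            if doc_freq[field] == n:
                continue  # present in every log: not representative
            if tf_idf(count, len(log), n, doc_freq[field]) >= threshold:
                selected.add(field)
    return selected


score = tf_idf(15, 100, 100_000, 1_000)
print(round(score, 2), score >= 0.4)  # 0.3 False

logs = [["ERROR", "disk", "full"], ["INFO", "ok"], ["INFO", "ok"], ["INFO", "started"]]
print(sorted(key_fields(logs, 0.19)))  # ['ERROR', 'disk', 'full', 'started']
```

The per-log loop reflects the text's wording that importance is judged relative to each individual log. A field counts as key if it scores high in at least one log.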
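204. Obtain one or more synonymous fields corresponding to each key field, and replace the key field with a synonymous field;
205. According to a random field-processing strategy, concatenate the synonymous fields with the log fields other than the key fields to obtain multiple corresponding augmented logs, wherein the random field-processing strategy includes replacing, deleting, inserting or swapping the other log fields;
In this embodiment, once the key fields of an unlabeled log are identified, the log is augmented by back-translation. Two requirements apply:
- The content expressed by the unlabeled log must stay the same. This is achieved through multiple synonymous fields.
- The augmented logs must differ from the unlabeled log as a whole. So the semantic fields other than the key fields are processed by replacement, deletion, insertion, swapping and similar operations.
The processed key fields are then concatenated with the other fields, giving multiple augmented logs with the same meaning but different content.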
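A minimal sketch of steps 204 and 205 follows. It covers only synonym replacement of key fields plus one random replace, delete, insert or swap on the non-key fields; it does not implement back-translation. The synonym table, the filler tokens and the function names are hypothetical.

```python
import random

# Hypothetical synonym table for key fields (illustration only).
SYNONYMS = {"ERROR": ["FAILURE", "FAULT"], "timeout": ["timed-out", "expired"]}
FILLER = ["retry", "node-7", "module", "pending"]  # hypothetical filler tokens


def augment_once(fields, key_fields, synonyms, filler, rng):
    # Tag each field as (token, is_key); key fields get a random synonym.
    out = []
    for f in fields:
        if f in key_fields and synonyms.get(f):
            out.append((rng.choice(synonyms[f]), True))
        else:
            out.append((f, False))
    # One random operation, applied to the non-key fields only.
    others = [i for i, (_, is_key) in enumerate(out) if not is_key]
    op = rng.choice(["replace", "delete", "insert", "swap"])
    if op == "insert":
        out.insert(rng.randrange(len(out) + 1), (rng.choice(filler), False))
    elif op == "replace" and others:
        out[rng.choice(others)] = (rng.choice(filler), False)
    elif op == "delete" and others:
        del out[rng.choice(others)]
    elif op == "swap" and len(others) >= 2:
        i, j = rng.sample(others, 2)
        out[i], out[j] = out[j], out[i]
    return [token for token, _ in out]


def expand(fields, key_fields, n_copies=3, seed=0):
    """Produce several augmented variants of one unlabeled log."""
    rng = random.Random(seed)
    return [augment_once(fields, key_fields, SYNONYMS, FILLER, rng)
            for _ in range(n_copies)]


log = ["2020-06-30", "sess-42", "ERROR", "db", "timeout", "on", "write"]
for variant in expand(log, key_fields={"ERROR", "timeout"}):
    print(" ".join(variant))
```

Each variant keeps the meaning carried by the key fields, because they are only ever swapped for synonyms and never deleted or moved. The surrounding content still differs between variants, which is what step 205 requires.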
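206. Input the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and correspondingly output a first probability distribution over anomaly levels for the labeled logs, a second probability distribution over anomaly levels for the unlabeled logs and a third probability distribution over anomaly levels for the augmented logs, wherein the three identical anomaly-level training models form an anomaly-level training model set;
207. Calculate the cross-entropy loss between the first probability distribution and the preset anomaly-level labels of the labeled logs, and calculate the consistency loss between the second probability distribution and the third probability distribution;
208. Predict the anomaly-level labels of the unlabeled logs and the augmented logs according to the consistency loss, and iterate the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges, to obtain a log anomaly detection model;
209. Obtain logs to be detected of the system to be detected, input the logs to be detected into the log anomaly detection model for detection, output the anomaly level corresponding to the logs to be detected, and use that anomaly level as the analysis result of the current system operating state.
This embodiment describes augmenting the unlabeled logs generated when the system is in an abnormal state. The number of abnormal unlabeled logs is increased while data diversity is preserved. This improves the model's resistance to over-fitting when the detection model is trained later, and makes the model easier to train.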
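Referring to FIG. 3, a third embodiment of the system anomaly detection method in the embodiments of this application includes:
301. Obtain labeled logs and unlabeled logs of the system to be detected, and augment the unlabeled logs to obtain augmented logs;
302. Input the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and correspondingly output a first probability distribution over anomaly levels for the labeled logs, a second probability distribution over anomaly levels for the unlabeled logs and a third probability distribution over anomaly levels for the augmented logs, wherein the three identical anomaly-level training models form an anomaly-level training model set;
303. According to the first probability distribution and the preset anomaly-level labels of the labeled logs, calculate the correct-prediction probability of the anomaly level of each labeled log;
In this embodiment, the correct-prediction term is calculated from two inputs: the anomaly-level probability distribution that the model predicts for a labeled log, and that log's true anomaly level. The formula is: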
Figure PCTCN2020118218-appb-000007
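where $p(y_i)$ is the true probability of the $i$-th anomaly level for the labeled log and $q(y_i)$ is the predicted probability of the $i$-th anomaly level. The true anomaly level has probability 1 and all other levels have probability 0, so the expression reduces to $-\log q(y_i)$, with $i$ the true level.
Specifically, suppose the anomaly levels are major anomaly, ordinary anomaly, minor anomaly and normal, denoted 1, 2, 3 and 4, and let $Z$ be the predicted anomaly-level distribution of a labeled log. Suppose $Z$ assigns probabilities [0.5, 0.2, 0.2, 0.1] to levels 1, 2, 3 and 4, and the true anomaly level of the labeled log is major anomaly. The corresponding term is then $-1 \times \log_{10} 0.5 \approx 0.301$; with the natural logarithm it would be about 0.693.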
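304. According to preset model training parameters and the correct-prediction probability, calculate the cross-entropy loss of the first probability distribution, which measures the difference between the classification model's anomaly-level predictions for the labeled logs and their true anomaly levels;
In this embodiment, the correct-prediction terms of all first probability distributions are summed and averaged to obtain the cross-entropy loss of the first probability distribution. This loss evaluates the classification accuracy of the anomaly-level training model for the labeled logs: it quantifies the difference between the classification result and the true result.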
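The worked example and the averaging in step 304 can be reproduced as below. The base-10 logarithm matches the example in the text. Training code would normally use the natural logarithm, which changes the value (0.693 instead of 0.301) but not which predictions are better. The function names are hypothetical.

```python
import math


def cross_entropy(p_true, q_pred, log=math.log10):
    """H(p, q) = -sum_i p(y_i) * log q(y_i); with one-hot p only the true level counts."""
    return -sum(p * log(q) for p, q in zip(p_true, q_pred) if p > 0)


def mean_cross_entropy(preds, true_levels, n_levels=4, log=math.log10):
    """Average per-log cross-entropy over a batch of labeled logs (step 304).

    Levels are indexed 0-3 here; the text numbers them 1-4.
    """
    losses = []
    for q_pred, level in zip(preds, true_levels):
        one_hot = [1.0 if i == level else 0.0 for i in range(n_levels)]
        losses.append(cross_entropy(one_hot, q_pred, log))
    return sum(losses) / len(losses)


Z = [0.5, 0.2, 0.2, 0.1]      # major, ordinary, minor, normal
truth = [1.0, 0.0, 0.0, 0.0]  # true level: major anomaly
print(round(cross_entropy(truth, Z), 3))            # 0.301
print(round(cross_entropy(truth, Z, math.log), 3))  # 0.693
print(round(mean_cross_entropy([Z, [0.1, 0.2, 0.3, 0.4]], [0, 3]), 4))  # 0.3495
```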
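305. Calculate the consistency loss between the second probability distribution and the third probability distribution;
306. Predict the anomaly levels of the unlabeled logs and the augmented logs according to the consistency loss, and iterate the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges, to obtain a log anomaly detection model;
307. Obtain logs to be detected of the system to be detected, input the logs to be detected into the log anomaly detection model for detection, output the anomaly level corresponding to the logs to be detected, and use that anomaly level as the analysis result of the current system operating state.
In this embodiment, the cross-entropy loss of the first probability distribution is calculated so that it can later be combined with the consistency loss into the final loss. The final loss is one of the metrics used to evaluate the log anomaly detection model.
Referring to FIG. 4, a fourth embodiment of the system anomaly detection method in the embodiments of this application includes:
401. Obtain labeled logs and unlabeled logs of the system to be detected, and augment the unlabeled logs to obtain augmented logs;
402. Input the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and correspondingly output a first probability distribution over anomaly levels for the labeled logs, a second probability distribution over anomaly levels for the unlabeled logs and a third probability distribution over anomaly levels for the augmented logs, wherein the three identical anomaly-level training models form an anomaly-level training model set;
403. Calculate the cross-entropy loss between the first probability distribution and the preset anomaly levels of the labeled logs, and calculate the consistency loss between the second probability distribution and the third probability distribution;
404. Determine the correct-prediction probability of each labeled log according to the cross-entropy loss, and determine whether any correct-prediction probability is greater than a preset probability threshold;
In this embodiment, labeled logs are gradually removed during iterative training of the log anomaly detection model. This training signal annealing prevents over-fitting and improves generalization. When a correct-prediction probability exceeds the set probability threshold, the corresponding labeled log is removed.
Specifically, when the amount of labeled data is normal, the probability threshold is calculated as: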
Figure PCTCN2020118218-appb-000008
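When the amount of labeled data is small, the model over-fits easily and can make high-probability predictions on the data within a short time, so the probability threshold formula is changed to: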
Figure PCTCN2020118218-appb-000009
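This slows the growth of the threshold so that more invalid samples are removed. When the amount of labeled data is large, the model is unlikely to over-fit and takes a long time to converge. It produces fewer high-probability predictions in the same time, so fewer samples need to be removed, and the probability threshold formula can be changed to: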
Figure PCTCN2020118218-appb-000010
This speeds up the growth of the threshold, so fewer samples are removed.
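The three threshold schedules can be compared with the sketch below. The text supplies only the qualitative behavior: slower threshold growth for little labeled data and faster growth for plenty. The concrete forms are an assumption taken from the published training-signal-annealing schedules, since the formula images are not reproduced: linear $\alpha_t = t/T$, exponential $\alpha_t = e^{5(t/T-1)}$ and logarithmic $\alpha_t = 1 - e^{-5t/T}$, including the constant 5. The function name is hypothetical.

```python
import math


def tsa_threshold(t, T, K, schedule="linear"):
    """Probability threshold eta_t = alpha_t * (1 - 1/K) + 1/K."""
    frac = t / T
    if schedule == "linear":   # normal amount of labeled data
        alpha = frac
    elif schedule == "exp":    # little labeled data: grows slowly at first
        alpha = math.exp((frac - 1) * 5)
    elif schedule == "log":    # plenty of labeled data: grows quickly
        alpha = 1 - math.exp(-frac * 5)
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    return alpha * (1 - 1 / K) + 1 / K


K, T = 4, 1000  # four anomaly levels, 1000 iterations (example values)
for name in ("exp", "linear", "log"):
    print(name, [round(tsa_threshold(t, T, K, name), 3) for t in (0, 250, 500, 1000)])
```

With the exponential schedule the threshold stays close to $1/K$ for longer, so more confidently predicted labeled logs are dropped early. This matches the text's concern about over-fitting on small labeled sets.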
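405. If so, delete the first probability distributions corresponding to the correct-prediction probabilities greater than the probability threshold and continue iterating the log anomaly detection model; otherwise, iterate the log anomaly detection model directly; after each iteration, update the model training parameters;
In this embodiment, the log anomaly detection model is iterated with training signal annealing, which gradually removes the labeled logs that tend to cause over-fitting. Once the final loss is smaller than the set threshold, the log anomaly detection model can be used for detection in practice.
406. Calculate the sum of the cross-entropy loss and the consistency loss to obtain the corresponding final loss value, and determine whether the final loss value is smaller than a preset final-loss threshold;
407. If the final loss value is smaller than the final-loss threshold, the anomaly-level training model set has converged and iteration stops, giving the log anomaly detection model;
In this embodiment, the cross-entropy loss and the consistency loss are combined to evaluate the prediction quality of the log anomaly detection model, which serves as the criterion for model iteration. Here the two are simply added, namely: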
Figure PCTCN2020118218-appb-000011
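Steps 404 to 407 can be sketched as one check per training iteration. Several details are assumptions: the natural logarithm, the threshold values and the function names. The consistency loss is passed in as a plain number; computing it as a KL divergence would itself be an assumption. Parameter updates are outside the scope of the sketch.

```python
import math


def tsa_cross_entropy(preds, labels, eta_t):
    """Cross-entropy over labeled logs whose correct-class probability is <= eta_t.

    Logs already predicted correctly with probability above eta_t are removed
    from the loss for this iteration (steps 404-405). Returns (loss, number_kept).
    """
    kept = [-math.log(p[y]) for p, y in zip(preds, labels) if p[y] <= eta_t]
    return (sum(kept) / len(kept) if kept else 0.0), len(kept)


def check_convergence(preds, labels, eta_t, consistency, final_threshold):
    """Steps 406-407: final loss = cross-entropy + consistency; stop when below threshold."""
    ce, kept = tsa_cross_entropy(preds, labels, eta_t)
    final = ce + consistency
    return final, kept, final < final_threshold


preds = [[0.9, 0.05, 0.03, 0.02], [0.4, 0.3, 0.2, 0.1]]
labels = [0, 0]
print(check_convergence(preds, labels, eta_t=0.8, consistency=0.1, final_threshold=0.5))
```

In this example the first labeled log (correct-class probability 0.9 > 0.8) is excluded, so only the second contributes to the cross-entropy. The final loss stays above the threshold, and training would continue.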
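408. Obtain logs to be detected of the system to be detected, input the logs to be detected into the log anomaly detection model for detection, output the anomaly level corresponding to the logs to be detected, and use that anomaly level as the analysis result of the current system operating state.
In this embodiment, the first probability distributions used in training are gradually removed during iterative training as unlabeled data increases. This training signal annealing effectively counters the risk of over-fitting.
The system anomaly detection method in the embodiments of this application has been described above; the system anomaly detection device is described below. Referring to FIG. 5, an embodiment of the system anomaly detection device in the embodiments of this application includes:
an acquisition module 501, configured to obtain labeled logs and unlabeled logs of a system to be detected, and to augment the unlabeled logs to obtain augmented logs;
a training module 502, configured to input the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and to correspondingly output a first probability distribution over anomaly levels for the labeled logs, a second probability distribution over anomaly levels for the unlabeled logs and a third probability distribution over anomaly levels for the augmented logs, wherein the three identical anomaly-level training models form an anomaly-level training model set;
a calculation module 503, configured to calculate a cross-entropy loss between the first probability distribution and the preset anomaly-level labels of the labeled logs, and to calculate a consistency loss between the second probability distribution and the third probability distribution;
a generation module 504, configured to predict anomaly-level labels of the unlabeled logs and the augmented logs according to the consistency loss, and to iterate the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges, to obtain a log anomaly detection model;
a detection module 505, configured to obtain logs to be detected of the system to be detected, input the logs to be detected into the log anomaly detection model for detection, output the anomaly level corresponding to the logs to be detected, and use that anomaly level as the analysis result of the current system operating state.
In this embodiment of the application, the device works as follows:
- The labeled logs, unlabeled logs and augmented logs of the system to be detected are input respectively into three identical anomaly-level training models of the anomaly-level training model set for training, which output a probability distribution over anomaly levels for each of the three.
- The cross-entropy loss and the consistency loss of the model outputs are calculated.
- The anomaly levels of the unlabeled and augmented logs are predicted according to the consistency loss, and the model set is iterated according to the cross-entropy loss until it converges, yielding a log anomaly detection model.
- The log anomaly detection model detects abnormal logs while the system runs.
This optimized training prevents over-fitting and makes it easier for the detection model to detect anomalies in the system.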
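Referring to FIG. 6, another embodiment of the system anomaly detection device in the embodiments of this application includes:
an acquisition module 501, configured to obtain labeled logs and unlabeled logs of a system to be detected, and to augment the unlabeled logs to obtain augmented logs;
a training module 502, configured to input the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and to correspondingly output a first probability distribution over anomaly levels for the labeled logs, a second probability distribution over anomaly levels for the unlabeled logs and a third probability distribution over anomaly levels for the augmented logs, wherein the three identical anomaly-level training models form an anomaly-level training model set;
a calculation module 503, configured to calculate a cross-entropy loss between the first probability distribution and the preset anomaly-level labels of the labeled logs, and to calculate a consistency loss between the second probability distribution and the third probability distribution;
a generation module 504, configured to predict anomaly-level labels of the unlabeled logs and the augmented logs according to the consistency loss, and to iterate the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges, to obtain a log anomaly detection model;
a detection module 505, configured to obtain logs to be detected of the system to be detected, input the logs to be detected into the log anomaly detection model for detection, output the anomaly level corresponding to the logs to be detected, and use that anomaly level as the analysis result of the current system operating state.
Specifically, the acquisition module 501 is further configured to:
parse the unlabeled logs to obtain multiple log fields with different semantics;
select, from the log fields, the key fields related to the anomaly level according to preset semantic-structure prior knowledge and the occurrence frequency of the log fields;
obtain one or more synonymous fields corresponding to each key field, and replace the corresponding key field with the synonymous field;
concatenate, according to a random field-processing strategy, the synonymous fields with the log fields other than the key fields to obtain multiple corresponding augmented logs, wherein the random field-processing strategy includes replacing, deleting, inserting or swapping the other log fields.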
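Specifically, the training module 502 further includes:
a construction unit 5021, configured to adjust each log record in the labeled logs, the unlabeled logs and the augmented logs to a uniform preset length, and to construct the corresponding data vectors;
a feature extraction unit 5022, configured to determine the feature dimension of the data vectors according to their length, and to extract semantic features from the data vectors according to the feature dimension to obtain initial semantic features;
a probability distribution generation unit 5023, configured to select and combine the salient features among the initial semantic features to obtain final semantic features, and to calculate and output, according to the final semantic features, the probability distributions over anomaly levels of the labeled logs, the unlabeled logs and the augmented logs.
Specifically, the calculation module 503 further includes:
a first calculation unit 5031, configured to calculate the correct-prediction probability of the anomaly level of each labeled log according to the first probability distribution and the preset anomaly-level labels of the labeled logs;
a second calculation unit 5032, configured to calculate the cross-entropy loss of the first probability distribution according to preset model training parameters and the correct-prediction probability, to measure the difference between the classification model's anomaly-level predictions for the labeled logs and the true anomaly levels of the labeled logs.
Specifically, the generation module 504 further includes:
an iteration unit 5041, configured to do the following:
- determine the correct-prediction probability of each labeled log according to the cross-entropy loss;
- determine whether any correct-prediction probability is greater than a preset probability threshold;
- if so, delete the first probability distributions corresponding to the correct-prediction probabilities greater than the probability threshold and continue iterating the log anomaly detection model; otherwise, iterate the log anomaly detection model directly;
- update the model training parameters after each iteration of the log anomaly detection model;
a model generation unit 5042, configured to calculate the sum of the cross-entropy loss and the consistency loss to obtain the corresponding final loss value and determine whether the final loss value is smaller than a preset final-loss threshold; if it is, the anomaly-level training model set has converged, iteration stops, and the log anomaly detection model is obtained.
Specifically, the probability threshold applied to the correct-prediction probability is calculated as: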
Figure PCTCN2020118218-appb-000012
Figure PCTCN2020118218-appb-000013
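where $\eta_t$ is the probability threshold, $\alpha_t$ is the growth coefficient, $K$ is the number of anomaly-level categories, $t$ is the current iteration number, and $T$ is the preset total number of iterations;
when the amount of data in the labeled logs is below the preset normal data-volume range,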
Figure PCTCN2020118218-appb-000014
when the amount of data in the labeled logs is above the normal data-volume range,
Figure PCTCN2020118218-appb-000015
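Specifically, the detection module 505 further includes:
an acquisition unit 5051, configured to obtain logs to be detected of the system to be detected, wherein the logs to be detected contain multiple pieces of log information, and the log information contains identification information of the system operation management priority;
a detection unit 5052, configured to input the logs to be detected into the log anomaly detection model for detection, and to predict the anomaly level of the logs to be detected with the log anomaly detection model;
a screening unit 5053, configured to select the logs to be detected whose anomaly level is higher than a preset anomaly-level threshold, and to determine, according to the identification information, the log information whose priority is greater than a preset priority threshold from the selected logs;
an analysis result generation unit 5054, configured to highlight the log information whose priority is greater than the preset priority threshold, and to use the highlighted log information, together with the anomaly levels corresponding to the other log information, as the analysis result of the current system operating state.
In this embodiment of the application, the device works as follows:
- Labeled and unlabeled logs are obtained. The unlabeled logs generated when the system is in an abnormal state are augmented, which increases the number of abnormal unlabeled logs while preserving data diversity and improves the model's resistance to over-fitting when the detection model is trained later.
- The labeled, unlabeled and augmented logs are input respectively into the three corresponding anomaly-level training models of the anomaly-level training model set for training, to predict the probability distribution over anomaly levels for each.
- During iterative training of the log anomaly detection model, the first probability distributions used in training are gradually removed as unlabeled data increases.
- The resulting log anomaly detection model predicts the anomaly level of the logs to be detected that are generated while the system runs, giving the analysis result of the system operating state.
This improves the model's resistance to over-fitting and makes detection easier for the detection model.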
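FIG. 5 and FIG. 6 above describe the system anomaly detection device in detail from the perspective of modular functional entities; the equipment in the embodiments of this application is described below from the perspective of hardware processing.
FIG. 7 is a schematic structural diagram of system anomaly detection equipment provided by an embodiment of this application. The system anomaly detection equipment 700 may vary considerably depending on configuration or performance. It may include one or more processors (central processing units, CPU) 710, a memory 720, and one or more storage media 730 (for example, one or more mass storage devices) storing application programs 733 or data 732. The memory 720 and the storage medium 730 may provide short-term or persistent storage. The program stored in the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations for the system anomaly detection equipment 700. Further, the processor 710 may be configured to communicate with the storage medium 730 and to execute, on the system anomaly detection equipment 700, the series of instruction operations in the storage medium 730.
The system anomaly detection equipment 700 may further include one or more power supplies 740, one or more wired or wireless network interfaces 750, one or more input/output interfaces 760, and/or one or more operating systems 731, such as Windows Server, Mac OS X, Unix, Linux and FreeBSD. Those skilled in the art will understand that the structure shown in FIG. 7 does not limit the system anomaly detection equipment. The equipment may include more or fewer components than shown, combine certain components, or arrange its components differently.
This application also provides a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores instructions which, when run on a computer, cause the computer to perform the steps of the system anomaly detection method.
For convenience and brevity, the specific working processes of the systems, devices and units described above are not repeated here; those skilled in the art can refer to the corresponding processes in the foregoing method embodiments.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. On this understanding, the technical solution of this application can be embodied as a software product. This applies to the solution in essence, to the part that contributes to the prior art, or to all or part of the solution. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs and other media that can store program code.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer and so on.
The above embodiments only illustrate the technical solutions of this application and do not limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand the following. The technical solutions described in those embodiments can still be modified, or some of their technical features replaced with equivalents. Such modifications or replacements do not take the essence of the corresponding technical solutions outside the spirit and scope of the technical solutions of the embodiments of this application.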

Claims (20)

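  1. A system anomaly detection method, wherein the system anomaly detection method includes:
    obtaining labeled logs and unlabeled logs of a system to be detected, and augmenting the unlabeled logs to obtain augmented logs;
    inputting the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and correspondingly outputting a first probability distribution over anomaly levels for the labeled logs, a second probability distribution over anomaly levels for the unlabeled logs and a third probability distribution over anomaly levels for the augmented logs, wherein the three identical anomaly-level training models form an anomaly-level training model set;
    calculating a cross-entropy loss between the first probability distribution and the preset anomaly levels of the labeled logs, and calculating a consistency loss between the second probability distribution and the third probability distribution;
    predicting the anomaly levels of the unlabeled logs and the augmented logs according to the consistency loss, and iterating the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges, to obtain a log anomaly detection model;
    obtaining logs to be detected of the current system, inputting the logs to be detected into the log anomaly detection model for detection, outputting the anomaly level corresponding to the logs to be detected, and using that anomaly level as the analysis result of the current system operating state.
  2. The system anomaly detection method according to claim 1, wherein augmenting the unlabeled logs to obtain augmented logs includes:
    parsing the unlabeled logs to obtain multiple log fields with different semantics;
    selecting, from the log fields, key fields related to the anomaly level according to preset semantic-structure prior knowledge and the occurrence frequency of the log fields;
    obtaining one or more synonymous fields corresponding to each key field, and replacing the corresponding key field with the synonymous field;
    concatenating, according to a random field-processing strategy, the synonymous fields with the log fields other than the key fields to obtain multiple corresponding augmented logs, wherein the random field-processing strategy includes replacing, deleting, inserting or swapping the other log fields.
  3. The system anomaly detection method according to claim 1, wherein inputting the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and correspondingly outputting the first probability distribution over anomaly levels for the labeled logs, the second probability distribution over anomaly levels for the unlabeled logs and the third probability distribution over anomaly levels for the augmented logs, includes:
    adjusting each log record in the labeled logs, the unlabeled logs and the augmented logs to a uniform preset length, and constructing corresponding data vectors;
    determining the feature dimension of the data vectors according to their length, and extracting semantic features from the data vectors according to the feature dimension to obtain initial semantic features;
    selecting and combining salient features from the initial semantic features to obtain final semantic features, and calculating and outputting, according to the final semantic features, the probability distributions over anomaly levels of the labeled logs, the unlabeled logs and the augmented logs.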
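  4. The system anomaly detection method according to any one of claims 1 to 3, wherein calculating the cross-entropy loss between the first probability distribution and the preset anomaly levels of the labeled logs includes:
    calculating the correct-prediction probability of the anomaly level of each labeled log according to the first probability distribution and the preset anomaly-level labels of the labeled logs;
    calculating the cross-entropy loss of the first probability distribution according to preset model training parameters and the correct-prediction probability, to measure the difference between the classification model's anomaly-level predictions for the labeled logs and the true anomaly levels of the labeled logs.
  5. The system anomaly detection method according to claim 4, wherein iterating the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges, to obtain the log anomaly detection model, includes:
    determining the correct-prediction probability of each labeled log according to the cross-entropy loss, and determining whether any correct-prediction probability is greater than a preset probability threshold;
    if so, deleting the first probability distributions corresponding to the correct-prediction probabilities greater than the probability threshold and continuing to iterate the log anomaly detection model; otherwise, iterating the log anomaly detection model directly; and updating the model training parameters after each iteration of the log anomaly detection model;
    calculating the sum of the cross-entropy loss and the consistency loss to obtain a corresponding final loss value, and determining whether the final loss value is smaller than a preset final-loss threshold;
    if the final loss value is smaller than the final-loss threshold, determining that the anomaly-level training model set has converged and stopping iteration, to obtain the log anomaly detection model.
  6. The system anomaly detection method according to claim 5, wherein the probability threshold for the correct-prediction probability is calculated as: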
    Figure PCTCN2020118218-appb-100001
    Figure PCTCN2020118218-appb-100002
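    where $\eta_t$ is the probability threshold, $\alpha_t$ is the growth coefficient, $K$ is the number of anomaly-level categories, $t$ is the current iteration number, and $T$ is the preset total number of iterations;
    when the amount of data in the labeled logs is below the preset normal data-volume range,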
    Figure PCTCN2020118218-appb-100003
    when the amount of data in the labeled logs is above the normal data-volume range,
    Figure PCTCN2020118218-appb-100004
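  7. The system anomaly detection method according to claim 1, wherein obtaining the logs to be detected of the system to be detected, inputting the logs to be detected into the log anomaly detection model for detection, outputting the anomaly level corresponding to the logs to be detected, and using that anomaly level as the analysis result of the current system operating state includes:
    obtaining logs to be detected of the system to be detected, wherein the logs to be detected contain multiple pieces of log information, and the log information contains identification information of the system operation management priority;
    inputting the logs to be detected into the log anomaly detection model for detection, and predicting the anomaly level of the logs to be detected with the log anomaly detection model;
    selecting the logs to be detected whose anomaly level is higher than a preset anomaly-level threshold, and determining, according to the identification information, the log information whose priority is greater than a preset priority threshold from the selected logs;
    highlighting the log information whose priority is greater than the preset priority threshold, and using the highlighted log information together with the anomaly levels corresponding to the other log information as the analysis result of the current system operating state.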
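  8. A system anomaly detection device, wherein the system anomaly detection device includes:
    an acquisition module, configured to obtain labeled logs and unlabeled logs of a system to be detected, and to augment the unlabeled logs to obtain augmented logs;
    a training module, configured to input the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and to correspondingly output a first probability distribution over anomaly levels for the labeled logs, a second probability distribution over anomaly levels for the unlabeled logs and a third probability distribution over anomaly levels for the augmented logs, wherein the three identical anomaly-level training models form an anomaly-level training model set;
    a calculation module, configured to calculate a cross-entropy loss between the first probability distribution and the preset anomaly levels of the labeled logs, and to calculate a consistency loss between the second probability distribution and the third probability distribution;
    a generation module, configured to predict the anomaly levels of the unlabeled logs and the augmented logs according to the consistency loss, and to iterate the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges, to obtain a log anomaly detection model;
    a detection module, configured to obtain logs to be detected of the system to be detected, input the logs to be detected into the log anomaly detection model for detection, output the anomaly level corresponding to the logs to be detected, and use that anomaly level as the analysis result of the current system operating state.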
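  9. System anomaly detection equipment, wherein the system anomaly detection equipment includes a memory and at least one processor, the memory storing instructions, and the memory and the at least one processor being interconnected by lines;
    the at least one processor invokes the instructions in the memory so that the system anomaly detection equipment performs the following system anomaly detection method:
    obtaining labeled logs and unlabeled logs of a system to be detected, and augmenting the unlabeled logs to obtain augmented logs;
    inputting the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and correspondingly outputting a first probability distribution over anomaly levels for the labeled logs, a second probability distribution over anomaly levels for the unlabeled logs and a third probability distribution over anomaly levels for the augmented logs, wherein the three identical anomaly-level training models form an anomaly-level training model set;
    calculating a cross-entropy loss between the first probability distribution and the preset anomaly levels of the labeled logs, and calculating a consistency loss between the second probability distribution and the third probability distribution;
    predicting the anomaly levels of the unlabeled logs and the augmented logs according to the consistency loss, and iterating the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges, to obtain a log anomaly detection model;
    obtaining logs to be detected of the current system, inputting the logs to be detected into the log anomaly detection model for detection, outputting the anomaly level corresponding to the logs to be detected, and using that anomaly level as the analysis result of the current system operating state.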
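  10. The system anomaly detection equipment according to claim 9, wherein augmenting the unlabeled logs to obtain augmented logs includes the following steps:
    parsing the unlabeled logs to obtain multiple log fields with different semantics;
    selecting, from the log fields, key fields related to the anomaly level according to preset semantic-structure prior knowledge and the occurrence frequency of the log fields;
    obtaining one or more synonymous fields corresponding to each key field, and replacing the corresponding key field with the synonymous field;
    concatenating, according to a random field-processing strategy, the synonymous fields with the log fields other than the key fields to obtain multiple corresponding augmented logs, wherein the random field-processing strategy includes replacing, deleting, inserting or swapping the other log fields.
  11. The system anomaly detection equipment according to claim 9, wherein inputting the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and correspondingly outputting the first probability distribution over anomaly levels for the labeled logs, the second probability distribution over anomaly levels for the unlabeled logs and the third probability distribution over anomaly levels for the augmented logs, includes the following steps:
    adjusting each log record in the labeled logs, the unlabeled logs and the augmented logs to a uniform preset length, and constructing corresponding data vectors;
    determining the feature dimension of the data vectors according to their length, and extracting semantic features from the data vectors according to the feature dimension to obtain initial semantic features;
    selecting and combining salient features from the initial semantic features to obtain final semantic features, and calculating and outputting, according to the final semantic features, the probability distributions over anomaly levels of the labeled logs, the unlabeled logs and the augmented logs.
  12. The system anomaly detection equipment according to any one of claims 9 to 11, wherein calculating the cross-entropy loss between the first probability distribution and the preset anomaly levels of the labeled logs includes the following steps:
    calculating the correct-prediction probability of the anomaly level of each labeled log according to the first probability distribution and the preset anomaly-level labels of the labeled logs;
    calculating the cross-entropy loss of the first probability distribution according to preset model training parameters and the correct-prediction probability, to measure the difference between the classification model's anomaly-level predictions for the labeled logs and the true anomaly levels of the labeled logs.
  13. The system anomaly detection equipment according to claim 12, wherein iterating the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges, to obtain the log anomaly detection model, includes the following steps:
    determining the correct-prediction probability of each labeled log according to the cross-entropy loss, and determining whether any correct-prediction probability is greater than a preset probability threshold;
    if so, deleting the first probability distributions corresponding to the correct-prediction probabilities greater than the probability threshold and continuing to iterate the log anomaly detection model; otherwise, iterating the log anomaly detection model directly; and updating the model training parameters after each iteration of the log anomaly detection model;
    calculating the sum of the cross-entropy loss and the consistency loss to obtain a corresponding final loss value, and determining whether the final loss value is smaller than a preset final-loss threshold;
    if the final loss value is smaller than the final-loss threshold, determining that the anomaly-level training model set has converged and stopping iteration, to obtain the log anomaly detection model.
  14. The system anomaly detection equipment according to claim 13, wherein the probability threshold for the correct-prediction probability is calculated as: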
    Figure PCTCN2020118218-appb-100005
    Figure PCTCN2020118218-appb-100006
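    where $\eta_t$ is the probability threshold, $\alpha_t$ is the growth coefficient, $K$ is the number of anomaly-level categories, $t$ is the current iteration number, and $T$ is the preset total number of iterations;
    when the amount of data in the labeled logs is below the preset normal data-volume range,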
    Figure PCTCN2020118218-appb-100007
    when the amount of data in the labeled logs is above the normal data-volume range,
    Figure PCTCN2020118218-appb-100008
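  15. The system anomaly detection equipment according to claim 9, wherein obtaining the logs to be detected of the system to be detected, inputting the logs to be detected into the log anomaly detection model for detection, outputting the anomaly level corresponding to the logs to be detected, and using that anomaly level as the analysis result of the current system operating state includes the following steps:
    obtaining logs to be detected of the system to be detected, wherein the logs to be detected contain multiple pieces of log information, and the log information contains identification information of the system operation management priority;
    inputting the logs to be detected into the log anomaly detection model for detection, and predicting the anomaly level of the logs to be detected with the log anomaly detection model;
    selecting the logs to be detected whose anomaly level is higher than a preset anomaly-level threshold, and determining, according to the identification information, the log information whose priority is greater than a preset priority threshold from the selected logs;
    highlighting the log information whose priority is greater than the preset priority threshold, and using the highlighted log information together with the anomaly levels corresponding to the other log information as the analysis result of the current system operating state.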
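  16. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the following steps of the system anomaly detection method are implemented:
    obtaining labeled logs and unlabeled logs of a system to be detected, and augmenting the unlabeled logs to obtain augmented logs;
    inputting the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and correspondingly outputting a first probability distribution over anomaly levels for the labeled logs, a second probability distribution over anomaly levels for the unlabeled logs and a third probability distribution over anomaly levels for the augmented logs, wherein the three identical anomaly-level training models form an anomaly-level training model set;
    calculating a cross-entropy loss between the first probability distribution and the preset anomaly levels of the labeled logs, and calculating a consistency loss between the second probability distribution and the third probability distribution;
    predicting the anomaly levels of the unlabeled logs and the augmented logs according to the consistency loss, and iterating the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges, to obtain a log anomaly detection model;
    obtaining logs to be detected of the current system, inputting the logs to be detected into the log anomaly detection model for detection, outputting the anomaly level corresponding to the logs to be detected, and using that anomaly level as the analysis result of the current system operating state.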
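  17. The computer-readable storage medium according to claim 16, wherein, when the processor executes the computer program of the system anomaly detection method to perform the step of augmenting the unlabeled logs to obtain augmented logs, the following steps are included:
    parsing the unlabeled logs to obtain multiple log fields with different semantics;
    selecting, from the log fields, key fields related to the anomaly level according to preset semantic-structure prior knowledge and the occurrence frequency of the log fields;
    obtaining one or more synonymous fields corresponding to each key field, and replacing the corresponding key field with the synonymous field;
    concatenating, according to a random field-processing strategy, the synonymous fields with the log fields other than the key fields to obtain multiple corresponding augmented logs, wherein the random field-processing strategy includes replacing, deleting, inserting or swapping the other log fields.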
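  18. The computer-readable storage medium according to claim 16, wherein, when the processor executes the computer program of the system anomaly detection method to perform the step of inputting the labeled logs, the unlabeled logs and the augmented logs respectively into three identical anomaly-level training models for training, and correspondingly outputting the first probability distribution over anomaly levels for the labeled logs, the second probability distribution over anomaly levels for the unlabeled logs and the third probability distribution over anomaly levels for the augmented logs, the following steps are included:
    adjusting each log record in the labeled logs, the unlabeled logs and the augmented logs to a uniform preset length, and constructing corresponding data vectors;
    determining the feature dimension of the data vectors according to their length, and extracting semantic features from the data vectors according to the feature dimension to obtain initial semantic features;
    selecting and combining salient features from the initial semantic features to obtain final semantic features, and calculating and outputting, according to the final semantic features, the probability distributions over anomaly levels of the labeled logs, the unlabeled logs and the augmented logs.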
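  19. The computer-readable storage medium according to any one of claims 16 to 18, wherein, when the processor executes the computer program of the system anomaly detection method to perform the step of calculating the cross-entropy loss between the first probability distribution and the preset anomaly levels of the labeled logs, the following steps are included:
    calculating the correct-prediction probability of the anomaly level of each labeled log according to the first probability distribution and the preset anomaly-level labels of the labeled logs;
    calculating the cross-entropy loss of the first probability distribution according to preset model training parameters and the correct-prediction probability, to measure the difference between the classification model's anomaly-level predictions for the labeled logs and the true anomaly levels of the labeled logs.
  20. The computer-readable storage medium according to claim 19, wherein, when the processor executes the computer program of the system anomaly detection method to perform the step of iterating the anomaly-level training model set according to the cross-entropy loss until the anomaly-level training model set converges to obtain the log anomaly detection model, the following steps are included:
    determining the correct-prediction probability of each labeled log according to the cross-entropy loss, and determining whether any correct-prediction probability is greater than a preset probability threshold;
    if so, deleting the first probability distributions corresponding to the correct-prediction probabilities greater than the probability threshold and continuing to iterate the log anomaly detection model; otherwise, iterating the log anomaly detection model directly; and updating the model training parameters after each iteration of the log anomaly detection model;
    calculating the sum of the cross-entropy loss and the consistency loss to obtain a corresponding final loss value, and determining whether the final loss value is smaller than a preset final-loss threshold;
    if the final loss value is smaller than the final-loss threshold, determining that the anomaly-level training model set has converged and stopping iteration, to obtain the log anomaly detection model.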
PCT/CN2020/118218 2020-06-30 2020-09-28 System anomaly detection method, device, equipment and storage medium WO2021139235A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010611178.5A CN111782472B (zh) 2020-06-30 2020-06-30 System anomaly detection method, device, equipment and storage medium
CN202010611178.5 2020-06-30

Publications (1)

Publication Number Publication Date
WO2021139235A1 true WO2021139235A1 (zh) 2021-07-15

Family

ID=72760356

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118218 WO2021139235A1 (zh) 2020-06-30 2020-09-28 System anomaly detection method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111782472B (zh)
WO (1) WO2021139235A1 (zh)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114297054A (zh) * 2021-12-17 2022-04-08 北京交通大学 一种基于子空间混合抽样的软件缺陷数目预测方法
CN114338129A (zh) * 2021-12-24 2022-04-12 中汽创智科技有限公司 一种报文异常检测方法、装置、设备及介质
CN114819458A (zh) * 2021-12-31 2022-07-29 第四范式(北京)技术有限公司 仿真模型的构建方法和仿真模型的构建装置
CN115099676A (zh) * 2022-07-14 2022-09-23 华能罗源发电有限责任公司 火电储能系统的gis之中母线状态量检测方法
CN115146718A (zh) * 2022-06-27 2022-10-04 北京华能新锐控制技术有限公司 基于深度表示的风电机组异常检测方法
CN115174251A (zh) * 2022-07-19 2022-10-11 深信服科技股份有限公司 一种安全告警的误报识别方法、装置以及存储介质
CN115168154A (zh) * 2022-07-26 2022-10-11 北京优特捷信息技术有限公司 一种基于动态基线的异常日志检测方法、装置及设备
CN115499159A (zh) * 2022-08-09 2022-12-20 重庆长安汽车股份有限公司 Can信号异常检测方法、装置、车辆及存储介质
CN115883346A (zh) * 2023-02-23 2023-03-31 广州嘉为科技有限公司 一种基于fdep日志的异常检测方法、装置及存储介质
CN116070206A (zh) * 2023-03-28 2023-05-05 上海观安信息技术股份有限公司 一种异常行为检测方法、系统、电子设备及存储介质
CN116405326A (zh) * 2023-06-07 2023-07-07 厦门瞳景智能科技有限公司 基于区块链的信息安全管理方法及其系统
CN116863638A (zh) * 2023-06-01 2023-10-10 国药集团重庆医药设计院有限公司 一种基于主动预警的人员异常行为检测方法及安防系统
CN116911852A (zh) * 2023-07-21 2023-10-20 广州嘉磊元新信息科技有限公司 一种rpa的用户动态信息监控方法及系统
CN117149846A (zh) * 2023-08-16 2023-12-01 湖北中恒电测科技有限公司 一种基于数据融合的电力数据分析方法和系统
CN117271350A (zh) * 2023-09-28 2023-12-22 江苏天好富兴数据技术有限公司 一种基于日志分析的软件质量评估系统及方法
CN117290380A (zh) * 2023-11-14 2023-12-26 华青融天(北京)软件股份有限公司 异常维度数据生成方法、装置、设备和计算机可读介质
CN117538910A (zh) * 2023-12-20 2024-02-09 广东邦盛北斗科技股份公司 基于ai的北斗定位信号测试分析方法及系统

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308455B (zh) * 2020-11-20 2024-04-09 深圳前海微众银行股份有限公司 根因定位方法、装置、设备及计算机存储介质
CN112446335A (zh) * 2020-12-02 2021-03-05 电子科技大学中山学院 一种基于深度学习的太赫兹违禁物品检测方法
CN112883193A (zh) * 2021-02-25 2021-06-01 中国平安人寿保险股份有限公司 一种文本分类模型的训练方法、装置、设备以及可读介质
CN113347033B (zh) * 2021-05-31 2022-05-27 中国工商银行股份有限公司 基于区块链的根因定位方法、系统及验证节点
CN113256434B (zh) * 2021-06-08 2021-11-23 平安科技(深圳)有限公司 车险理赔行为识别方法、装置、设备及存储介质
CN113297051B (zh) * 2021-07-26 2022-03-04 云智慧(北京)科技有限公司 一种日志分析处理方法及装置
CN113672870A (zh) * 2021-08-20 2021-11-19 中国南方电网有限责任公司超高压输电公司柳州局 故障事件概率估算方法、装置、计算机设备和存储介质
CN114238965A (zh) * 2021-11-17 2022-03-25 北京华清信安科技有限公司 用于恶意访问的检测分析方法及系统
CN114881112A (zh) * 2022-03-31 2022-08-09 北京优特捷信息技术有限公司 一种系统异常检测方法、装置、设备及介质
CN114706709B (zh) * 2022-06-01 2022-08-23 成都运荔枝科技有限公司 一种saas服务异常处理方法、装置及可读存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101360023A (zh) * 2008-09-09 2009-02-04 成都市华为赛门铁克科技有限公司 一种异常检测方法、装置及系统
CN102821002A (zh) * 2011-06-09 2012-12-12 中国移动通信集团河南有限公司信阳分公司 网络流量异常检测方法和系统
US20140040677A1 (en) * 2012-08-02 2014-02-06 Fujitsu Limited Storage device, control device and data protection method
CN106951353A (zh) * 2017-03-20 2017-07-14 北京搜狐新媒体信息技术有限公司 作业数据异常检测方法及装置
CN109343990A (zh) * 2018-09-25 2019-02-15 江苏润和软件股份有限公司 一种基于深度学习的云计算系统异常检测方法
CN110365648A (zh) * 2019-06-14 2019-10-22 东南大学 一种基于决策树的车载can总线异常检测方法

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7457985B2 (en) * 2005-09-09 2008-11-25 International Business Machines Corporation Method to detect errors in computer systems by using state tracking
CN103389701B (zh) * 2013-07-15 2015-08-19 浙江大学 基于分布式数据模型的厂级过程故障检测与诊断方法
CN107463455B (zh) * 2017-08-01 2020-10-30 联想(北京)有限公司 一种检测内存故障的方法及装置
CN108090615B (zh) * 2017-12-21 2021-10-08 东南大学溧阳研究院 基于交叉熵集成学习的电力系统故障后最低频率预测方法
CN109284606B (zh) * 2018-09-04 2019-08-27 中国人民解放军陆军工程大学 基于经验特征与卷积神经网络的数据流异常检测系统

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
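CN114297054B (zh) * 2021-12-17 2023-06-30 北京交通大学 Software defect number prediction method based on subspace hybrid sampling
CN114297054A (zh) * 2021-12-17 2022-04-08 北京交通大学 Software defect number prediction method based on subspace hybrid sampling
CN114338129A (zh) * 2021-12-24 2022-04-12 中汽创智科技有限公司 Message anomaly detection method, apparatus, device and medium
CN114338129B (zh) * 2021-12-24 2023-10-31 中汽创智科技有限公司 Message anomaly detection method, apparatus, device and medium
CN114819458A (zh) * 2021-12-31 2022-07-29 第四范式(北京)技术有限公司 Simulation model construction method and simulation model construction apparatus
CN115146718A (zh) * 2022-06-27 2022-10-04 北京华能新锐控制技术有限公司 Wind turbine anomaly detection method based on deep representation
CN115099676A (zh) * 2022-07-14 2022-09-23 华能罗源发电有限责任公司 Method for detecting busbar state quantities in the GIS of a thermal power energy storage system
CN115174251A (zh) * 2022-07-19 2022-10-11 深信服科技股份有限公司 False positive identification method and apparatus for security alarms, and storage medium
CN115174251B (zh) * 2022-07-19 2023-09-05 深信服科技股份有限公司 False positive identification method and apparatus for security alarms, and storage medium
CN115168154A (zh) * 2022-07-26 2022-10-11 北京优特捷信息技术有限公司 Dynamic baseline-based abnormal log detection method, apparatus and device
CN115168154B (zh) * 2022-07-26 2023-06-23 北京优特捷信息技术有限公司 Dynamic baseline-based abnormal log detection method, apparatus and device
CN115499159A (zh) * 2022-08-09 2022-12-20 重庆长安汽车股份有限公司 CAN signal anomaly detection method, apparatus, vehicle and storage medium
CN115499159B (zh) * 2022-08-09 2024-05-07 重庆长安汽车股份有限公司 CAN signal anomaly detection method, apparatus, vehicle and storage medium
CN115883346A (zh) * 2023-02-23 2023-03-31 广州嘉为科技有限公司 FDEP log-based anomaly detection method, apparatus and storage medium
CN116070206A (zh) * 2023-03-28 2023-05-05 上海观安信息技术股份有限公司 Abnormal behavior detection method and system, electronic device and storage medium
CN116070206B (zh) * 2023-03-28 2023-06-30 上海观安信息技术股份有限公司 Abnormal behavior detection method and system, electronic device and storage medium
CN116863638B (zh) * 2023-06-01 2024-02-23 国药集团重庆医药设计院有限公司 Active early warning-based method for detecting abnormal personnel behavior, and security system
CN116863638A (zh) * 2023-06-01 2023-10-10 国药集团重庆医药设计院有限公司 Active early warning-based method for detecting abnormal personnel behavior, and security system
CN116405326A (zh) * 2023-06-07 2023-07-07 厦门瞳景智能科技有限公司 Blockchain-based information security management method and system
CN116405326B (zh) * 2023-06-07 2023-10-20 厦门瞳景智能科技有限公司 Blockchain-based information security management method and system
CN116911852A (zh) * 2023-07-21 2023-10-20 广州嘉磊元新信息科技有限公司 RPA user dynamic information monitoring method and system
CN116911852B (zh) * 2023-07-21 2024-01-26 广州嘉磊元新信息科技有限公司 RPA user dynamic information monitoring method and system
CN117149846A (zh) * 2023-08-16 2023-12-01 湖北中恒电测科技有限公司 Data fusion-based power data analysis method and system
CN117149846B (zh) * 2023-08-16 2024-05-24 上海永天科技股份有限公司 Data fusion-based power data analysis method and system
CN117271350A (zh) * 2023-09-28 2023-12-22 江苏天好富兴数据技术有限公司 Log analysis-based software quality evaluation system and method
CN117290380B (zh) * 2023-11-14 2024-02-06 华青融天(北京)软件股份有限公司 Abnormal dimension data generation method, apparatus, device and computer-readable medium
CN117290380A (zh) * 2023-11-14 2023-12-26 华青融天(北京)软件股份有限公司 Abnormal dimension data generation method, apparatus, device and computer-readable medium
CN117538910A (zh) * 2023-12-20 2024-02-09 广东邦盛北斗科技股份公司 AI-based BeiDou positioning signal test and analysis method and system
CN117538910B (zh) * 2023-12-20 2024-04-30 广东邦盛北斗科技股份公司 AI-based BeiDou positioning signal test and analysis method and system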

Also Published As

Publication number Publication date
CN111782472A (zh) 2020-10-16
CN111782472B (zh) 2022-04-26

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20912585

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20912585

Country of ref document: EP

Kind code of ref document: A1