CN112905380A - System anomaly detection method based on automatic monitoring log - Google Patents

System anomaly detection method based on automatic monitoring log

Info

Publication number
CN112905380A
Authority
CN
China
Prior art keywords
log
behavior
training
abnormal
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110300903.1A
Other languages
Chinese (zh)
Inventor
王书敏
任洪敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University filed Critical Shanghai Maritime University
Priority to CN202110300903.1A priority Critical patent/CN112905380A/en
Publication of CN112905380A publication Critical patent/CN112905380A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079Root cause analysis, i.e. error or fault diagnosis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/42Syntactic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a system anomaly detection method based on an automatic monitoring log, comprising the following steps: acquiring raw log data of a software system and extracting the useful information it contains according to a log template, to obtain an initial log set; normalizing the log information of the initial log set to obtain a normalized log set, analyzing the features of the normalized log set, and performing feature extraction to obtain a training log set; performing pattern training on the behavior sequences based on the training log set to generate corresponding behavior patterns; and performing abnormal behavior detection on the real-time log stream, calculating an abnormal index, and judging the system state by comparing the abnormal index with an abnormal threshold, to obtain a log anomaly detection result. The invention overcomes the low discrimination accuracy and weak generalization of existing anomaly detection methods, improves detection accuracy, addresses to a certain extent the problems of large log volumes and non-uniform log structures, and can predict possible system anomalies accurately and in time.

Description

System anomaly detection method based on automatic monitoring log
Technical Field
The invention relates to the technical field of anomaly detection and fault early warning of software systems, in particular to a system anomaly detection method based on an automatic monitoring log.
Background
With the rapid development of computer technology and the internet, we have entered a big-data era of abundant information and massive data volumes. Software systems are becoming ever larger and more complex, and exceptions and errors are difficult to avoid. Software defects arise at every stage of development and remain in the delivered product. Analyzing system logs has become the most important means of determining whether a system is behaving abnormally. The rich information in system logs helps developers and maintainers understand system behavior and detect and locate anomalies in production.
Automatic anomaly detection for software systems has been studied extensively in recent years, but it still struggles to meet the requirements of real application environments, specifically: (1) software systems differ greatly in behavior, input, output and other respects; the automatic anomaly detection methods proposed so far are often effective only for one class of system, and general-purpose methods rarely achieve good detection results on most systems; (2) modern software systems are built on a new generation of technologies represented by cloud computing and scale out readily, so that distributed systems comprising thousands of computing nodes with very high concurrency are not uncommon, whereas traditional automatic anomaly detection methods focus mainly on detecting the failure of a single node (service); (3) most current automatic anomaly detection methods are not intuitive, and few provide meaningful information about an anomaly, so they offer little help to operators diagnosing the anomaly after it is reported.
Log-based anomaly detection analyzes the unstructured data in raw log files. Existing research falls mainly into statistics-based methods, classification-based methods, cluster analysis methods, information-theory-based methods, and graph-model-based methods. Statistics-based log anomaly detection designs a statistical model: a model is first built for the data, and each object is evaluated by how well it fits the model; if an unsuitable model is chosen, however, objects are likely to be misjudged as anomalies. Classification-based log anomaly detection is mainly supervised: an optimal model is trained from existing training samples, that is, known data and their corresponding outputs; the model then maps every input to an output, and a simple decision on that output achieves the classification. The main drawback of supervised anomaly detection is not technical: it requires a large amount of labeled training data, which is expensive to obtain, and this greatly limits its range of application. Clustering-based log anomaly detection groups similar data instances into one class. Clustering is a classic unsupervised machine learning method, and it rests on the premise that normal log data instances belong to large, dense clusters while abnormal log data instances belong to small, sparse clusters.
Information-theory-based log anomaly detection assumes that abnormal log data introduce irregularities in the information content of the log data as a whole; different methods analyze the information content of the data set with different information-theoretic measures, such as Kolmogorov complexity, entropy, and relative entropy. For graph-model-based log anomaly detection, the key is to construct a finite state machine that represents normal behavior well and then match the log data against it; log data that the finite state machine cannot match are likely to be anomalous.
Disclosure of Invention
The invention aims to provide a system abnormity detection method based on an automatic monitoring log.
In order to achieve the purpose, the invention is realized by the following technical scheme:
a system anomaly detection method based on an automatic monitoring log comprises the following steps:
step S1: acquiring original log data of a software system, and extracting effective information contained in the original log data according to a log template to obtain an initial log set;
step S2: normalizing the log information of the initial log set to obtain a normalized log set, analyzing the generated characteristics of the normalized log set, extracting the characteristics to obtain a corresponding log characteristic set, and dividing the normalized log set into different types of behavior sequences according to the log characteristics, namely training log sets;
step S3: performing pattern training on the corresponding behavior sequence based on the training log set to generate a corresponding behavior pattern;
step S4: and carrying out abnormal behavior detection on the real-time log stream, calculating an abnormal index, and judging the system state by comparing the abnormal index with an abnormal threshold value to obtain a log abnormal detection result.
Further, in step S2, the manner of normalizing the log information of the initial log set includes at least one of:
rearranging the irregular log records;
parameterizing, and replacing the numerical data with placeholders;
adjusting a log structure to adjust records spanning multiple rows into one row;
removing redundant characters;
the log level is converted to a digital representation.
Further, in step S2, analyzing the features of the normalized log set, performing feature extraction to obtain a corresponding log feature set, and dividing the normalized log set into different types of behavior sequences according to the log features includes:
standardizing and reducing the dimensionality of the log data, and selecting the most effective log features;
performing data conversion on the selected log features to form a log feature set;
selecting a similarity criterion, and finding the distance function best suited to the feature types or constructing a new distance function;
and executing a clustering algorithm to divide the normalized log set into different types of behavior sequences.
Further, the clustering algorithm is to cluster the log data by using a hierarchical method, and adopts a bottom-up aggregation mode.
Further, in step S3, performing pattern training on each corresponding behavior sequence based on the training log set, and generating a corresponding behavior pattern, including:
assigning a type number to each log type, taking a training log set as input, sequentially reading each log record in the training log set, mapping the normalized log to the corresponding log type, finally outputting the corresponding type number, wherein a final result sequence comprises a log time stamp and the corresponding type number, and converting the result sequence into a frequency sequence;
traversing the frequency sequence by a sliding window technology, and extracting all frequency subsequences as behavior subsequences;
defining a similarity measurement standard for the behavior subsequences, counting the number of identical and similar frequency subsequences, and taking the shape characteristics and the occurrence frequency of different types of behavior subsequences as the behavior mode of the type of behavior subsequences.
Further, in step S4, the abnormal index is calculated by the following formula:
$$A(L) = \sum_{i=1}^{N} \left[ \mathrm{dis}(d_i, \hat{p}_i) + \beta \cdot O(\hat{p}_i) \right]$$
wherein $L$ represents the log sequence formed by the real-time log stream; $\mathrm{dis}(d_i, \hat{p}_i)$ represents the dissimilarity between the behavior sequence $d_i$ and $\hat{p}_i$; each behavior sequence $d_i$ corresponds to a behavior pattern set $P_i = \{p_i^1, p_i^2, \ldots, p_i^x\}$, in which $p_i^x$ represents the x-th behavior pattern; the behavior pattern in $P_i$ with the highest similarity to $d_i$ is denoted $\hat{p}_i$; $O(\hat{p}_i)$ represents the outlier value of the behavior pattern $\hat{p}_i$; and $\beta$ represents a balance factor.
Further, the update time of the behavior pattern is set, and when the update time is reached, the steps S1 to S3 are executed again to update the behavior pattern.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a log preprocessing method based on log normalization and hierarchical clustering. After preprocessing, the log data are divided into different types and are well structured, which facilitates the subsequent extraction of behavior patterns and the judgment of abnormal behavior. Based on a general log anomaly detection model built on behavioral anomalies, the system state is judged according to the magnitude of the abnormal index, and system anomalies that may occur are predicted. The method improves the accuracy of system anomaly detection and, to a certain extent, addresses log analysis problems such as large log volumes, complex and heterogeneous log structures, and diverse system fault types; it is a general-purpose log detection algorithm.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings used in the description will be briefly introduced, and it is obvious that the drawings in the following description are an embodiment of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts according to the drawings:
FIG. 1 is a flow chart of a system anomaly detection method based on an automated monitoring log according to the present invention;
FIG. 2 is a flowchart of the overall implementation of the present invention;
FIG. 3 is a flow chart of anomaly detection provided by the present invention;
FIG. 4 is a schematic diagram of the general architecture of system anomaly detection provided by the present invention.
Detailed Description
The technical solution proposed by the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. The advantages and features of the invention will become more apparent from the following description. Note that the drawings are greatly simplified and not drawn to precise scale; they serve only to illustrate the embodiments conveniently and clearly. Reference is made to the accompanying drawings to make the objects, features and advantages of the invention easier to understand. It should be understood that the structures, proportions, sizes, and the like shown in the drawings serve only to accompany the disclosure of the specification so that those skilled in the art can read and understand it. They do not limit the conditions under which the invention can be implemented and therefore have no substantive technical significance; any structural modification, change of proportion, or adjustment of size that does not affect the efficacy and purpose achievable by the invention still falls within the scope of the invention.
Analysis of the problems in the prior art shows the need for a real-time, efficient, general-purpose log monitoring system that can handle different log structures. Following research and experiments on these problems, a behavioral anomaly detection method based on system log monitoring is proposed. The method improves detection accuracy, addresses to a certain extent the problems of large data volumes and non-uniform log structures, is a general-purpose log detection algorithm, and can predict possible system anomalies accurately and in time. It overcomes the shortcomings of machine-learning-based anomaly detection methods, namely low discrimination accuracy, weak generalization, and the inability to give early warning of faults that do not appear in the training samples, as well as the high time and labor costs of knowledge-based anomaly detection methods.
As shown in fig. 1, an embodiment of the present invention provides a system anomaly detection method based on an automatic monitoring log, including:
step S1: acquiring original log data of a software system, and extracting effective information contained in the original log data according to a log template to obtain an initial log set;
step S2: normalizing the log information of the initial log set to obtain a normalized log set, analyzing the generated characteristics of the normalized log set, extracting the characteristics to obtain a corresponding log characteristic set, and dividing the normalized log set into different types of behavior sequences according to the log characteristics to obtain a training log set;
step S3: performing pattern training on the corresponding behavior sequence based on the training log set to generate a corresponding behavior pattern;
step S4: and acquiring real-time log data of the software system to detect abnormal behaviors, calculating an abnormal index, and comparing the abnormal index with an abnormal threshold value to judge the system state to obtain a log abnormal detection result.
The steps of the present invention are described in detail below with reference to fig. 2-4.
Step 1: collecting and parsing log data: the log data generated by each node are gathered together, the source code is analyzed through an abstract syntax tree, the unstructured data are converted into structured data, and the useful information they contain is extracted to obtain the initial log set, which serves as input to the subsequent log partitioning and feature mining.
The method comprises the following specific steps:
Step 1.1: log messages are produced in the source code in two ways, by string concatenation and by method calls, and the invention generates log templates differently for each.
For string concatenation, the log template is obtained by splitting the addition expression. For method calls, the template is generated from the abstract syntax tree (AST) of the program. The input to log template identification is the program source code: an abstract syntax tree is first constructed and then traversed to find the methods whose return type is String, and a regular expression for the return value of each such method is obtained, from which the corresponding log template is generated. Because methods with the same name may exist in different classes, each method must be identified by its fully qualified name. Finally, the log templates generated in the two ways are merged.
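The string-concatenation case can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it uses Python's `ast` module on a Python-style expression as a stand-in for the program's own abstract syntax tree, and the function name `template_from_concat` and the `<*>` placeholder are hypothetical.

```python
import ast

PLACEHOLDER = "<*>"  # hypothetical wildcard marker, not taken from the source


def template_from_concat(expr_source):
    """Build a log template by splitting a string-concatenation expression."""
    tree = ast.parse(expr_source, mode="eval")

    def flatten(node):
        # An addition expression is split into its left and right operands.
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            return flatten(node.left) + flatten(node.right)
        # String literals form the constant part of the template.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            return [node.value]
        # Anything else (variables, calls, numbers) is a runtime parameter.
        return [PLACEHOLDER]

    return "".join(flatten(tree.body))


template = template_from_concat('"User " + name + " logged in from " + str(ip)')
print(template)
```

The constant fragments survive unchanged while every runtime operand collapses to the placeholder, which is the property the later matching step relies on.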
Step 1.2: the useful information contained in the log data is extracted according to the log templates and used as input to the subsequent log partitioning and feature mining.
Step 2: preprocessing log data: the initial log set is normalized to obtain a normalized log set; the features of the log set are analyzed and feature extraction is performed on the set of log event sequences to obtain the corresponding feature set; and the normalized log set is divided into different types of behavior sequences according to the log features, giving the training log set.
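Template-based extraction could look like the sketch below, assuming templates are plain strings in which a hypothetical `<*>` marker stands for a parameter. Each template is turned into a regular expression, and the first matching template yields the template index and the captured parameter values. The names `template_to_regex` and `extract_fields` and the sample templates are illustrative only.

```python
import re

PLACEHOLDER = "<*>"  # hypothetical wildcard marker


def template_to_regex(template):
    """Escape the constant parts and capture each placeholder lazily."""
    parts = [re.escape(part) for part in template.split(PLACEHOLDER)]
    return re.compile("^" + "(.+?)".join(parts) + "$")


def extract_fields(line, templates):
    """Return the index of the first matching template and its parameters."""
    for template_id, template in enumerate(templates):
        match = template_to_regex(template).match(line)
        if match:
            return {"template_id": template_id, "params": list(match.groups())}
    return None  # no template recognizes this line


TEMPLATES = ["User <*> logged in from <*>", "Disk usage at <*>%"]
login = extract_fields("User alice logged in from 10.0.0.7", TEMPLATES)
disk = extract_fields("Disk usage at 91%", TEMPLATES)
```

Lines that match no template return `None`, so they can be set aside rather than silently forced into a wrong type.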
The method comprises the following specific steps:
Step 2.1, log normalization: during log normalization, irregular log records are rearranged, de-parameterization is performed so that numerical data are replaced by placeholders, the log structure is adjusted so that records spanning multiple lines are merged into one line, and redundant characters are deleted. The normalized data are stored in a relational database, with each attribute of a tuple representing one item of the log record.
The log level is converted into a numeric representation, which facilitates similarity measurement during log clustering.
When an error occurs at run time and an exception is thrown, the log records not only the time and location of the error but also the function that threw the exception and its call stack, so a single record may span multiple lines.
Because the parameter values in log messages differ, logs of the same type may be mistaken for different types during log analysis. To solve this problem, de-parameterization is used: the numerical parameters appearing in each log are replaced with placeholders.
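The normalization operations described above can be sketched as follows. Everything specific here is an assumption for illustration: the timestamp layout used to detect the start of a record, the `LEVELS` mapping, the `<NUM>` placeholder, and the choice of brackets and braces as "redundant characters".

```python
import re

# Assumed mapping from level names to numbers; the source gives no values.
LEVELS = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3, "FATAL": 4}
# Assumed record layout: "YYYY-MM-DD HH:MM:SS LEVEL message".
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ")


def merge_multiline(lines):
    """Fold continuation lines (e.g. stack traces) into the preceding record."""
    records = []
    for line in lines:
        if RECORD_START.match(line) or not records:
            records.append(line.rstrip())
        else:
            records[-1] += " " + line.strip()
    return records


def normalize(record):
    """Split one record into fields, de-parameterize it and clean it up."""
    timestamp = record[:19]
    level, _, message = record[20:].partition(" ")
    message = re.sub(r"\d+(?:\.\d+)?", "<NUM>", message)  # numbers to placeholders
    message = re.sub(r"[\[\]{}]", "", message)            # drop redundant characters
    message = re.sub(r"\s+", " ", message).strip()        # collapse whitespace
    return {"timestamp": timestamp, "level": LEVELS.get(level, -1), "message": message}


raw = [
    "2021-03-22 10:15:01 ERROR Request 4821 failed after 3.5 s",
    "    at com.example.Handler.run(Handler.java:42)",
    "2021-03-22 10:15:02 INFO Request 4822 ok",
]
normalized = [normalize(r) for r in merge_multiline(raw)]
```

After this step the two `Request … ok/failed` messages no longer differ by their request numbers, which is what allows clustering to group them by type rather than by parameter value.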
Step 2.2: the behavior sequences are divided as follows:
Feature selection: feature standardization and dimensionality reduction are performed on the log data, and the N most effective log features are selected;
Feature extraction: the selected N log features are transformed to form a log feature set; the result can be expressed as a matrix whose rows represent samples and whose columns represent feature variables;
Similarity selection: a similarity criterion is selected by finding the distance function best suited to the feature types or by constructing a new distance function;
Grouping: a clustering algorithm is executed to divide the normalized log set into different types of behavior sequences. The input of the algorithm is the sample matrix, and the output can be a dendrogram or a specific classification scheme, reflecting the classification at different granularities. A classification threshold is defined with the help of domain knowledge to obtain the final clustering result, and the validity of the clustering is evaluated.
Specifically, the log data are clustered with a hierarchical method in a bottom-up (agglomerative) manner. The log data are first divided into n clusters, each containing one data point; the distances between clusters are calculated to obtain an n × n similarity matrix; and the two clusters with the smallest distance in the matrix are found:
$$\mathrm{dist}(c_i, c_j) = \min_{m \neq l} \mathrm{dist}(c_m, c_l)$$
where $c_m$ and $c_l$ denote any two clusters in the matrix and $c_i$ and $c_j$ denote the two closest clusters. $c_i$ and $c_j$ are merged to form a new cluster, the similarity matrix is updated, and the above steps are repeated until the termination condition is met or all data points are in one cluster.
Complete linkage is adopted as the criterion for measuring the distance between clusters. Complete-linkage clustering uses the maximum distance $\mathrm{dist}_{\max}(c_i, c_j)$ as the distance criterion:
$$\mathrm{dist}_{\max}(c_i, c_j) = \max_{p \in c_i,\; p' \in c_j} |p - p'|$$
where $|p - p'|$ denotes the distance between points $p$ and $p'$; the maximum distance is the largest distance between any point of one cluster and any point of the other. The clustering process stops when the complete-link distance between the two closest clusters exceeds a distance threshold.
Similarity is calculated with an algorithm based on dynamic programming: the problem to be solved is decomposed into sub-problems, and the original problem is solved once the sub-problems of each stage have been solved.
Agglomerative hierarchical clustering ends either at a predefined termination condition or when all data points have been merged into one cluster. The termination condition must be chosen so that logs of the same type are clustered together as far as possible, different types are kept apart, and the original information in the data is preserved.
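The clustering procedure above can be sketched in a few lines. The source does not name the dynamic-programming similarity measure, so this sketch assumes a token-level Levenshtein (edit) distance; the function names and the threshold value are likewise illustrative. Clusters are merged bottom-up under complete linkage until the closest pair is farther apart than the threshold.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def complete_link(c1, c2, dist):
    """Complete linkage: the largest distance between members of two clusters."""
    return max(dist(p, q) for p in c1 for q in c2)


def agglomerate(points, dist, threshold):
    """Merge the two closest clusters until their distance exceeds threshold."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        d, i, j = min(
            (complete_link(clusters[i], clusters[j], dist), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


messages = [
    "Request <NUM> ok",
    "Request <NUM> failed",
    "Disk usage at <NUM> percent",
    "Disk usage at <NUM> percent high",
]
clusters = agglomerate([m.split() for m in messages], edit_distance, threshold=2)
```

This naive version recomputes all pairwise linkages on every pass, which is fine for illustration but would need a cached distance matrix for real log volumes.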
Step 3: pattern training: the log stream of the training log set is analyzed, and behavior patterns are generated for the different behavior sequences. When the time since the last training exceeds the length of the update period, or an update command is received, the whole pattern training process is executed again. The specific steps are as follows:
Step 3.1: converting the log stream into frequency sequences: the training log set is converted into frequency sequences according to its different log types.
A type number is assigned to each log type. With the training log set as input, each log record is read in turn, the normalized log is mapped to its log type, and the corresponding type number is output. A dictionary structure is built to speed up string lookup and thus improve search efficiency. The final result sequence contains the log timestamps and the corresponding type numbers, and it is then converted into the frequency sequences.
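A minimal sketch of this conversion follows. It assumes integer timestamps, uses the normalized message text itself as a stand-in for the log type, and counts records per type in fixed-width time bins; the function names and the bin width `unit` are hypothetical.

```python
from collections import Counter


def assign_type_numbers(records):
    """Map each (timestamp, log type) record to (timestamp, type number).

    A dictionary gives constant-time lookup of the number for a known type.
    """
    type_ids = {}
    result = [(ts, type_ids.setdefault(log_type, len(type_ids)))
              for ts, log_type in records]
    return result, type_ids


def to_frequency_sequences(result_sequence, num_types, start, end, unit=1):
    """Count records per type in consecutive bins of width `unit`."""
    n_bins = (end - start + unit - 1) // unit
    counts = Counter(
        ((ts - start) // unit, type_no)
        for ts, type_no in result_sequence
        if start <= ts < end
    )
    return [[counts[(b, t)] for b in range(n_bins)] for t in range(num_types)]


records = [(0, "A"), (0, "A"), (1, "B"), (2, "A"), (2, "B"), (2, "B"), (5, "A")]
result_sequence, type_ids = assign_type_numbers(records)
frequency = to_frequency_sequences(result_sequence, len(type_ids), start=0, end=6, unit=2)
```

Each inner list of `frequency` is one frequency sequence, one per log type, with one count per unit time interval.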
Step 3.2: generating behavior patterns: the previous step yields frequency sequences for the different log types, namely the behavior sequences. Let the set of frequency sequences consist of $T_i$, $i \in [1, N]$, where $N$ is the number of sequence types. A behavior pattern is extracted from each frequency sequence: a sliding window is used to traverse the frequency sequence
$$T_i = (t_1, t_2, \ldots, t_m)$$
(where $m$ is the sequence length), all frequency subsequences are extracted, and the behavior patterns are generated.
First, a sliding window of length $k$ is defined, and all frequency subsequences of the length-$m$ frequency sequence $T_i$ are extracted. Each frequency subsequence $s_i^j$ is referred to as a behavior subsequence:
$$s_i^j = (t_j, t_{j+1}, \ldots, t_{j+k-1}), \quad j \in [1, m-k+1]$$
Each frequency sequence $T_i$ of length $m$ contains $m-k+1$ behavior subsequences, and adjacent subsequences overlap by $k-1$ elements. In this way, $N$ behavior sequence sets $S_i$, $i \in [1, N]$, are obtained, each expressed as:
$$S_i = \{s_i^1, s_i^2, \ldots, s_i^{m-k+1}\}$$
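The sliding-window extraction can be written directly. In this sketch the function name `behavior_subsequences` is hypothetical; a window of length k slides one position at a time over a frequency sequence of length m, producing m − k + 1 subsequences.

```python
def behavior_subsequences(frequency_sequence, k):
    """Return every length-k window of the sequence, in order."""
    m = len(frequency_sequence)
    return [tuple(frequency_sequence[j:j + k]) for j in range(m - k + 1)]


windows = behavior_subsequences([2, 1, 1, 3, 0], k=3)
print(windows)
```

Tuples are used so that each subsequence can later serve as a dictionary key when occurrences are counted.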
Next, a similarity measure is defined for the behavior subsequences, the numbers of identical and similar frequency subsequences are counted, and the shape characteristics and occurrence frequencies of the different kinds of behavior subsequences are taken as the behavior patterns of that type of behavior sequence. Similar frequency subsequences can be grouped with the hierarchical clustering method used in log preprocessing. Because of the feature extraction performed earlier, the periodicity and waveform of the behavior subsequences are quite regular, so a simpler approach can be adopted, such as defining a similarity threshold for simple clustering, or using the subsequence vector as a key and counting its occurrences. Finally, an outlier value is defined for each behavior pattern in the behavior pattern set. The occurrence frequency of each behavior subsequence is used as the behavior pattern parameter; a behavior pattern that occurs less frequently has a higher outlier value. The reciprocal of the occurrence frequency of the behavior subsequence is therefore taken as the outlier value of the behavior pattern $p_i^x$, written as:
$$O(p_i^x) = \frac{1}{f(p_i^x)}$$
where $f(p_i^x)$ denotes the occurrence frequency of the behavior subsequence corresponding to $p_i^x$.
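The simpler counting variant mentioned above (subsequence vector as key) can be sketched as follows. It assumes that "occurrence frequency" means the relative frequency of a subsequence among all subsequences of its type, so that the outlier value is the reciprocal of that ratio; the source does not say whether a raw count or a ratio is intended, and the name `train_patterns` is hypothetical.

```python
from collections import Counter


def train_patterns(subsequences):
    """Map each distinct subsequence (a pattern) to its outlier value.

    Outlier value = 1 / relative frequency, so rare patterns score higher.
    """
    counts = Counter(subsequences)
    total = sum(counts.values())
    return {pattern: total / count for pattern, count in counts.items()}


patterns = train_patterns([(1, 1), (1, 1), (1, 1), (5, 0)])
print(patterns)
```

Grouping only identical subsequences is the crudest possible similarity measure; a similarity threshold would merge near-identical windows into one pattern before counting.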
Step 4: anomaly detection: abnormal behavior detection is performed on the real-time log stream, an abnormal index is calculated, and the system state is judged by comparing the abnormal index with an abnormal threshold to obtain the log anomaly detection result. The whole analysis flow is shown in FIG. 3, and the specific steps are as follows:
Step 4.1: the log stream is segmented according to a predefined time interval and time window, and the logs are converted into frequency subsequences of different types according to the clustering result.
According to a predefined unit time interval and a sliding time window of length $k$, the log sequence $L$ of the most recent $k$ unit times is intercepted in real time and divided by log type into $N$ log subsequences, $L = \{l_1, l_2, \ldots, l_N\}$. A conversion algorithm then converts the current log sequence into a behavior sequence set $D$ containing $N$ elements, denoted $D = \{d_1, d_2, \ldots, d_N\}$.
Step 4.2: with the behavior sequences of the log stream and the log behavior patterns as parameters, the log abnormal index is calculated according to the anomaly detection formula. If the behavior patterns have reached their update time, the update is performed first and the calculation is carried out after the update completes.
The current sequence set $D$ contains $N$ behavior sequences. Each behavior sequence $d_i$ corresponds to a behavior pattern set $P_i$, which consists of the $x$ behavior patterns $p_i^1, p_i^2, \ldots, p_i^x$; the behavior pattern with the highest similarity to $d_i$ is denoted $\hat{p}_i$. The abnormality of the log subsequence $l_i$ is determined jointly by $d_i$ and $\hat{p}_i$. The abnormal index of $l_i$ is defined as the sum of the dissimilarity between $d_i$ and $\hat{p}_i$ and the outlier value of the behavior pattern $\hat{p}_i$ itself, written as:
$$A(l_i) = \mathrm{dis}(d_i, \hat{p}_i) + \beta \cdot O(\hat{p}_i)$$
The abnormal index of the log sequence $L$ is defined as the sum of the abnormal indexes of all subsequences, i.e.
$$A(L) = \sum_{i=1}^{N} A(l_i)$$
The final abnormal index formula is expressed as:
$$A(L) = \sum_{i=1}^{N} \left[ \mathrm{dis}(d_i, \hat{p}_i) + \beta \cdot O(\hat{p}_i) \right]$$
When $d_i$ is far from its closest behavior pattern $\hat{p}_i$, i.e. when $\mathrm{dis}(d_i, \hat{p}_i)$ is large, the log subsequence $l_i$ is not similar to any behavior pattern, so the probability of an anomaly is high. The abnormality of $l_i$ is also related to the outlier value of $\hat{p}_i$ itself, because the nature of the closest behavior pattern largely represents the nature of the sequence. $\beta$ is a balance factor that can be tuned experimentally to obtain the best result. The abnormal index is compared with the abnormal threshold, and if it is larger than the threshold, an anomaly warning is issued.
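The detection step can be sketched as below. Several choices are assumptions made for illustration only: Euclidean distance stands in for the unspecified dissimilarity measure, the balance factor β is taken to weight the outlier term, each pattern set is a dictionary from pattern to outlier value, and the threshold and sample numbers are invented.

```python
import math


def dissimilarity(d, p):
    """Assumed dissimilarity: Euclidean distance between equal-length vectors."""
    return math.dist(d, p)


def abnormal_index(behavior_sequences, pattern_sets, beta):
    """Sum, over log types, of distance to the closest pattern plus
    beta times that pattern's outlier value."""
    total = 0.0
    for d_i, patterns_i in zip(behavior_sequences, pattern_sets):
        closest = min(patterns_i, key=lambda p: dissimilarity(d_i, p))
        total += dissimilarity(d_i, closest) + beta * patterns_i[closest]
    return total


# One behavior sequence and one pattern set per log type (illustrative values).
D = [(1, 1), (4, 0)]
P = [{(1, 1): 1.25, (5, 0): 5.0}, {(5, 0): 5.0, (0, 0): 1.0}]
index = abnormal_index(D, P, beta=0.5)
is_abnormal = index > 3.0  # hypothetical abnormal threshold
```

In this example the first sequence matches a common pattern exactly and contributes little, while the second sits near a rare pattern, so most of the index comes from that pattern's outlier value.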
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims (7)

1. A system anomaly detection method based on an automatic monitoring log, characterized by comprising the following steps:
step S1: acquiring original log data of a software system, and extracting effective information contained in the original log data according to a log template to obtain an initial log set;
step S2: normalizing the log information of the initial log set to obtain a normalized log set, analyzing the generated characteristics of the normalized log set, extracting the characteristics to obtain a corresponding log characteristic set, and dividing the normalized log set into different types of behavior sequences according to the log characteristics, namely training log sets;
step S3: performing pattern training on the corresponding behavior sequence based on the training log set to generate a corresponding behavior pattern;
step S4: carrying out abnormal behavior detection on the real-time log stream, calculating an abnormality index, and judging the system state by comparing the abnormality index with an abnormality threshold to obtain a log anomaly detection result.
2. The method for detecting system abnormality based on an automatic monitoring log according to claim 1, wherein in step S2, the manner of normalizing the log information of the initial log set includes at least one of:
rearranging irregular log records;
parameterizing by replacing numerical data with placeholders;
adjusting the log structure so that records spanning multiple lines are merged into one line;
removing redundant characters;
converting the log level to a numeric representation.
3. The method according to claim 1, wherein in step S2, analyzing the generated features of the normalized log set, performing feature extraction to obtain the corresponding log feature set, and dividing the normalized log set into different types of behavior sequences according to the log features comprises:
standardizing the log data and reducing its dimensionality, and selecting the most effective log features from the log data;
performing data conversion on the selected log features to form the log feature set;
selecting a similarity criterion by finding the distance function best suited to the feature type or by constructing a new distance function;
executing a clustering algorithm to divide the normalized log set into different types of behavior sequences.
4. The method according to claim 3, wherein the clustering algorithm is a bottom-up hierarchical clustering method applied to the log data.
5. The method for detecting system anomalies based on automated monitoring logs as claimed in claim 1, wherein in step S3, performing pattern training on each corresponding behavior sequence based on the training log set to generate the corresponding behavior patterns comprises:
assigning a type number to each log type, taking the training log set as input, reading each log record in the training log set in turn, mapping the normalized log to its log type, and outputting the corresponding type number, wherein the resulting sequence comprises the log timestamps and the corresponding type numbers, and converting the resulting sequence into a frequency sequence;
traversing the frequency sequence with a sliding window, and extracting all frequency subsequences as behavior subsequences;
defining a similarity measure for the behavior subsequences, counting the number of identical and similar frequency subsequences, and taking the shape characteristics and occurrence frequencies of the different types of behavior subsequences as the behavior patterns of those types.
6. The method for detecting system abnormality based on the automated monitoring log according to claim 1, wherein in step S4, the abnormality index is calculated by the formula shown in formula image FDA0002986229920000021 of the original publication,
wherein L represents a log sequence formed by the real-time log stream; the dissimilarity term represents the degree of dissimilarity between a behavior sequence d_i and the behavior pattern closest to it; each behavior sequence d_i corresponds to a behavior pattern set P_i, whose x-th element represents the x-th behavior pattern; the behavior pattern in P_i with the highest similarity to d_i is recorded as the closest behavior pattern; the outlier term represents the outlier value of that closest behavior pattern; and β represents a balance factor.
7. The method as claimed in claim 1, wherein an update time is set for the behavior patterns, and when the update time is reached, steps S1 to S3 are executed again to update the behavior patterns.
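The pattern-training step recited in claim 5 can be illustrated with a short Python sketch. It is not the claimed implementation: the fixed-width time bucket, the window width, the use of exact-match counting in place of the unspecified similarity measure, and the names `to_frequency_sequence`, `sliding_windows`, and `train_patterns` are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def to_frequency_sequence(events: Sequence[Tuple[int, int]],
                          type_id: int,
                          bucket: int) -> List[int]:
    """Count occurrences of one log type per time bucket.

    events: (timestamp_seconds, type_number) pairs, i.e. the result sequence.
    bucket: bucket width in seconds (an assumed parameter).
    """
    times = [t for t, k in events if k == type_id]
    if not times:
        return []
    start = min(t for t, _ in events)
    end = max(t for t, _ in events)
    freq = [0] * ((end - start) // bucket + 1)
    for t in times:
        freq[(t - start) // bucket] += 1
    return freq


def sliding_windows(freq: Sequence[int], width: int) -> List[Tuple[int, ...]]:
    """Extract every contiguous frequency subsequence of the given width."""
    return [tuple(freq[i:i + width]) for i in range(len(freq) - width + 1)]


def train_patterns(freq: Sequence[int],
                   width: int) -> Dict[Tuple[int, ...], float]:
    """Map each distinct subsequence shape to its relative occurrence frequency."""
    windows = sliding_windows(freq, width)
    counts = Counter(windows)
    return {shape: n / len(windows) for shape, n in counts.items()}


if __name__ == "__main__":
    # Toy result sequence: (timestamp, type number).
    events = [(0, 1), (1, 1), (10, 2), (12, 1), (25, 1), (26, 1), (31, 1)]
    freq = to_frequency_sequence(events, type_id=1, bucket=10)
    print(freq)                     # [2, 1, 2, 1]
    print(train_patterns(freq, 2))  # shape (2, 1) occurs in 2 of 3 windows
```

Counting only identical windows keeps the sketch short. A similarity measure that also groups near-identical shapes, as the claim describes, would replace the `Counter` step with a clustering or tolerance-based match.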
CN202110300903.1A 2021-03-22 2021-03-22 System anomaly detection method based on automatic monitoring log Pending CN112905380A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110300903.1A CN112905380A (en) 2021-03-22 2021-03-22 System anomaly detection method based on automatic monitoring log

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110300903.1A CN112905380A (en) 2021-03-22 2021-03-22 System anomaly detection method based on automatic monitoring log

Publications (1)

Publication Number Publication Date
CN112905380A true CN112905380A (en) 2021-06-04

Family

ID=76106322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110300903.1A Pending CN112905380A (en) 2021-03-22 2021-03-22 System anomaly detection method based on automatic monitoring log

Country Status (1)

Country Link
CN (1) CN112905380A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113656271A (en) * 2021-08-10 2021-11-16 上海浦东发展银行股份有限公司 Method, device and equipment for processing user abnormal behaviors and storage medium
CN114881112A (en) * 2022-03-31 2022-08-09 北京优特捷信息技术有限公司 System anomaly detection method, device, equipment and medium
CN116545650A (en) * 2023-04-03 2023-08-04 中国华能集团有限公司北京招标分公司 Network dynamic defense method
CN116545650B (en) * 2023-04-03 2024-01-30 中国华能集团有限公司北京招标分公司 Network dynamic defense method
CN117349478A (en) * 2023-10-08 2024-01-05 国网江苏省电力有限公司经济技术研究院 Resource data reconstruction integration system based on digital transformation enterprise
CN117349478B (en) * 2023-10-08 2024-05-24 国网江苏省电力有限公司经济技术研究院 Resource data reconstruction integration system based on digital transformation enterprise

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination