CN117389821A - Log abnormality detection method, device and storage medium - Google Patents
- Publication number: CN117389821A
- Application number: CN202210751478.2A
- Authority: CN (China)
- Prior art keywords: log, detected, log information, information, detection
- Legal status: Pending (as listed by Google Patents, which notes the status is an assumption and not a legal conclusion)
Classifications
- G06F11/3072: Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
- G06F11/30: Monitoring
- G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/36: Preventing errors by testing or debugging software
- G06F11/3668: Software testing
- G06N3/044: Recurrent networks, e.g. Hopfield networks
- G06N3/08: Learning methods
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a log abnormality detection method, device, and storage medium. The method comprises the following steps: acquiring log information to be detected, and extracting a log template to be detected from it; matching the log template to be detected against the preset log templates in a log template library to obtain a matching result indicating whether the log information to be detected belongs to known normal log information or known abnormal log information; when the matching result is that the log information to be detected belongs to neither known normal nor known abnormal log information, inputting the log template to be detected into each of a plurality of detection models corresponding to a plurality of features to obtain a plurality of predicted values; and performing a weighted combination of the plurality of predicted values and determining, from the result of the weighted combination, whether the log information to be detected is abnormal. The solution provided by the embodiments of the invention can cover a variety of application scenarios, improves detection accuracy, and is widely applicable in the computer field.
Description
Technical Field
The present invention relates to the field of computers, and in particular, to a method, an apparatus, and a storage medium for detecting log anomalies.
Background
Anomaly detection for a computer software system can be based on system metrics or on system logs. Log-based anomaly detection reflects the states the system has gone through more clearly, but existing log-based approaches rely on collecting cases and building keyword rules, which limits the application scenarios they can cover and leaves room to improve detection accuracy.
Disclosure of Invention
The embodiments of the invention provide a log abnormality detection method, device, and storage medium that can cover a variety of application scenarios and improve detection accuracy.
In order to achieve the above object, an embodiment of the present invention provides a log abnormality detection method, including the following steps: acquiring log information to be detected, and extracting a log template to be detected from it; matching the log template to be detected against the preset log templates in a log template library to obtain a matching result indicating whether the log information to be detected belongs to known normal log information or known abnormal log information; when the matching result is that the log information to be detected belongs to neither known normal nor known abnormal log information, inputting the log template to be detected into each of a plurality of detection models corresponding to a plurality of features to obtain a plurality of predicted values; and performing a weighted combination of the plurality of predicted values, and determining from the result of the weighted combination whether the log information to be detected is normal or abnormal log information.
Optionally, performing the weighted combination of the plurality of predicted values and determining, from its result, whether the log information to be detected is normal or abnormal log information specifically includes:
extracting a detection characteristic value of the log information to be detected;
determining a weight for the detection characteristic value and a weight for each detection model;
weighting the detection characteristic value and the plurality of predicted values by their corresponding weights to obtain a weighted value;
and determining, according to the weighted value, whether the log information to be detected is normal or abnormal log information.
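The weighted combination in the steps above can be sketched as follows. This is a minimal Python sketch, not the patent's implementation: it assumes that every predicted value and the detection characteristic value are anomaly scores in [0, 1] (higher means more likely abnormal), that the weights are normalised by their sum, and that a hypothetical threshold of 0.5 separates normal from abnormal. The function name `weighted_decision` and all numbers are illustrative.

```python
from typing import Sequence


def weighted_decision(
    predicted_values: Sequence[float],
    model_weights: Sequence[float],
    feature_value: float,
    feature_weight: float,
    threshold: float = 0.5,
) -> tuple[float, bool]:
    """Fuse detection-model outputs and one detection characteristic value.

    Assumes all inputs are anomaly scores in [0, 1]. Weights are normalised
    by their sum so the fused score also stays in [0, 1].
    Returns (weighted_value, is_abnormal).
    """
    if len(predicted_values) != len(model_weights):
        raise ValueError("one weight is required per detection model")
    total_weight = sum(model_weights) + feature_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(w * p for w, p in zip(model_weights, predicted_values))
    weighted_sum += feature_weight * feature_value
    score = weighted_sum / total_weight
    return score, score >= threshold


# Hypothetical example: a sequence model and a semantic model plus one characteristic value.
score, abnormal = weighted_decision([0.9, 0.6], [0.5, 0.3], feature_value=0.2, feature_weight=0.2)
print(round(score, 2), abnormal)  # → 0.67 True
```

Re-tuning `model_weights` per deployment is how such a sketch would mirror the adaptive adjustment to different application scenarios that the document describes.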
Optionally, the detection method further comprises:
when the matching result is that the log information to be detected belongs to known normal log information or known abnormal log information, taking the matching result as the detection result.
Optionally, the extracting the log template to be detected according to the log information to be detected specifically includes:
extracting time and log content of the log information to be detected to obtain extracted log information;
combining information belonging to the same type of log in the extracted log information to obtain combined log information;
and filtering variable information of the combined log information to obtain the log template to be detected.
Optionally, the extracting the log template to be detected according to the log information to be detected specifically includes:
combining information belonging to the same log in the log information to be detected to obtain combined log information;
extracting time and log content of the combined log information to obtain extracted log information;
and filtering variable information of the extracted log information to obtain the log template to be detected.
Optionally, the filtering variable information of the extracted log information to obtain the log template to be detected specifically includes:
performing word segmentation and morphological restoration (lemmatization) on the extracted log information to obtain morphologically restored log information;
performing readability filtering on the log information subjected to morphological restoration to obtain log information subjected to readability filtering;
and carrying out similarity comparison on the log information subjected to the readability filtering to obtain the log template to be detected.
Optionally, the plurality of detection models include a sequence detection model, the log sample template includes a sequence log sample template, and the sequence detection model is obtained through training by the following steps:
acquiring a plurality of pieces of log sample information, dividing each piece of log sample information according to a preset time window, and extracting a corresponding sequence log sample template from each piece of divided log sample information; the log sample information comprises normal log sample information and abnormal log sample information;
selecting a preset number of pieces of normal log sample information and adding them to a first training set, and adding the abnormal log sample information, together with the normal log sample information not included in the first training set, to a first test set;
and performing model training and testing by adopting the first training set and the first testing set to obtain the trained sequence detection model.
Optionally, the plurality of detection models include a semantic detection model, and the semantic detection model is obtained through training by the following steps:
taking a log template preset in a log template library as sample data;
adding the sample data to a second training set and a second test set, respectively;
and performing model training and testing by adopting the second training set and the second testing set to obtain the trained semantic detection model.
Optionally, the detection method further comprises:
updating the log template library with the detection result and the corresponding log template to be detected.
In order to achieve the above objective, the embodiment of the present invention further provides a log abnormality detection device, where the device includes a first module, a second module, a third module, and a fourth module; the first module is used for acquiring log information to be detected and extracting a log template to be detected according to the log information to be detected; the second module is used for matching the log template to be detected with a log template preset in a log template library to obtain a matching result of whether the log information to be detected belongs to known normal log information or known abnormal log information; the third module is used for inputting the log template to be detected into a plurality of detection models corresponding to a plurality of features to obtain a plurality of predicted values when the matching result is that the log information to be detected does not belong to the known normal log information and does not belong to the known abnormal log information; and the fourth module is used for carrying out weighted combination according to a plurality of predicted values and obtaining a detection result of the log information to be detected belonging to the normal log information or the abnormal log information according to a weighted combination result.
To achieve the above object, an embodiment of the present invention further provides a log abnormality detection device, the device including a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for implementing connection communication between the processor and the memory, the program implementing the steps of the foregoing method when executed by the processor.
To achieve the above object, the present invention provides a storage medium for computer-readable storage, the storage medium storing one or more programs executable by one or more processors to implement the steps of the foregoing method.
With the log abnormality detection method, device, and storage medium described above, a log template to be detected is extracted from the log information to be detected and matched against the log templates preset in the log template library to determine whether the log information belongs to known normal or known abnormal log information. When template matching cannot determine whether the log information is abnormal, the log template to be detected is input into each of several detection models of different types to obtain a plurality of predicted values, and these predicted values are combined with weights to predict whether the log information is abnormal. The embodiments of the invention therefore perform first-level detection through log template matching and, when matching cannot decide, second-level detection through a weighted combination of multiple detection models. This multi-level detection covers a variety of application scenarios, adjusting the weights allows adaptation to different application scenarios, and the accuracy of log anomaly detection is improved.
Drawings
FIG. 1 is a flow chart of steps of a method for detecting log anomalies provided by one embodiment of the present invention;
FIG. 2 is a flowchart illustrating the steps of extracting a log template to be detected from log information to be detected according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating steps for filtering variable information of extracted log information to obtain a log template to be detected according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating the steps of extracting a log template to be detected from log information to be detected according to another embodiment of the present invention;
FIG. 5 is a flow chart of steps of a training method for a sequence detection model according to one embodiment of the present invention;
FIG. 6 is a data flow diagram for LSTM as a sequence detection model provided by one embodiment of the present invention;
FIG. 7 is a flow chart of steps of a training method for a semantic detection model according to one embodiment of the present invention;
FIG. 8 is a flowchart illustrating steps for determining an anomaly of log information to be detected based on a plurality of predicted values according to one embodiment of the present invention;
FIG. 9 is a flowchart illustrating a method for detecting log anomalies according to another embodiment of the present invention;
FIG. 10 is a schematic diagram of a log anomaly detection device according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a log abnormality detection apparatus according to another embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described below clearly and in detail with reference to the accompanying drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the scope of protection of the present application.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims and drawings are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
In the following description, suffixes such as "module", "component" or "unit" used to denote elements serve only to facilitate the description of the present invention and have no particular meaning in themselves. Thus, "module", "component" and "unit" may be used interchangeably.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
As shown in FIG. 1, a log abnormality detection method includes the following steps S100 to S500.
S100, acquiring log information to be detected, and extracting a log template to be detected according to the log information to be detected.
System log information is generally produced by print statements in a program and consists of variable information and constant information. The variable part is filled in with the current variable values at print time, such as a session name or ID, a process or thread name or ID, an IP address, or an interface link; the constant part generally consists of sentences or words with complete semantics. Together, the constant and variable parts form one piece of log information.
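To make the constant/variable distinction concrete, the following Python sketch masks typical variable fields with a `<*>` placeholder so that only the constant part remains. The regular expressions, the placeholder, and the function name `split_constant_variable` are assumptions for illustration, not patterns given in the source; real log formats would need their own rules.

```python
import re

# Hypothetical patterns for common variable fields; order matters, so that an
# IP address is masked before its digits could be caught by the plain-number rule.
VARIABLE_PATTERNS = [
    re.compile(r"https?://\S+"),                           # interface links / URLs
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4 address, optional port
    re.compile(r"\b0x[0-9a-fA-F]+\b"),                     # hexadecimal IDs
    re.compile(r"\b\d+\b"),                                # plain numeric IDs
]


def split_constant_variable(message: str, mask: str = "<*>") -> tuple[str, list[str]]:
    """Return (constant part with variables masked, list of extracted variable values)."""
    variables: list[str] = []
    for pattern in VARIABLE_PATTERNS:
        variables.extend(pattern.findall(message))
        message = pattern.sub(mask, message)
    return message, variables


template, values = split_constant_variable("Session 42 opened by thread 0x1f from 10.0.0.1:8080")
print(template)  # → Session <*> opened by thread <*> from <*>
print(values)    # → ['10.0.0.1:8080', '0x1f', '42']
```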
Specifically, log information contains a large amount of content, much of which has no substantial bearing on whether an anomaly has occurred. Extracting only the information that does bear on anomaly detection and using it in the judgment reduces the amount of data and improves the accuracy of anomaly detection.
Referring to FIG. 2, in a specific embodiment, extracting the log template to be detected from the log information to be detected in step S100 specifically includes the following steps S110A to S130A.
S110A, combining information belonging to the same log in the log information to be detected to obtain combined log information.
A single log entry may be spread over multiple lines of the log information. Merging these lines into one entry reduces the volume of data to be processed and improves computational efficiency.
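A minimal sketch of merging one log entry that spans several lines, assuming (hypothetically) that each new entry starts with a `YYYY-MM-DD HH:MM:SS` timestamp and that any other line, such as a stack-trace frame, continues the previous entry. The function name `merge_multiline` is illustrative.

```python
import re

# Assumption: every new log entry begins with a "YYYY-MM-DD HH:MM:SS" timestamp.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def merge_multiline(lines: list[str]) -> list[str]:
    """Join continuation lines onto the entry they belong to."""
    entries: list[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line) or not entries:
            entries.append(line)  # start of a new log entry
        else:
            # Continuation line (e.g. a stack-trace frame): append to the current entry.
            entries[-1] += " " + line.strip()
    return entries


raw = [
    "2022-06-28 10:15:02 ERROR Request failed\n",
    "  at Handler.run(Handler.java:42)\n",
    "2022-06-28 10:15:03 INFO Retrying\n",
]
for entry in merge_multiline(raw):
    print(entry)
```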
S120A, extracting time and log content of the combined log information to obtain extracted log information.
It should be noted that the method for extracting the time, and which log content is extracted and how, depend on the practical application and are not specifically limited in this embodiment. For example, the time in the log information may be extracted with a time-extraction function or plug-in, and the log content may be extracted from the log file with a regular expression.
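As one possible instance of the regular-expression approach, the sketch below pulls the timestamp and message content out of a line. The line layout (`<date> <time> <LEVEL> <component>: <message>`) and the name `extract_time_and_content` are assumptions; the source does not specify a log format.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Hypothetical line layout: "<date> <time> <LEVEL> <component>: <message>".
LINE_FORMAT = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<component>[\w.]+):\s*(?P<content>.*)$"
)


def extract_time_and_content(line: str) -> Optional[Tuple[datetime, str]]:
    """Return (timestamp, log content), or None if the line does not fit the layout."""
    match = LINE_FORMAT.match(line.strip())
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("time"), "%Y-%m-%d %H:%M:%S")
    return timestamp, match.group("content")


print(extract_time_and_content("2022-06-28 10:15:02 ERROR db.pool: Connection timed out after 30 s"))
```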
S130A, variable information filtering is carried out on the extracted log information, and a log template to be detected is obtained.
It should be noted that the specific method for filtering variable information depends on the practical application and is not specifically limited in this embodiment.
Referring to FIG. 3, in a specific embodiment, the variable information filtering of step S130A, which yields the log template to be detected, specifically includes steps S131 to S133.
S131, performing word segmentation and morphological restoration on the extracted log information to obtain the log information subjected to morphological restoration.
It should be noted that the specific word segmentation method depends on the practical application and is not specifically limited in this embodiment; for example, the NLTK tokenizer may be used.
It should be noted that words in the log information may appear in various inflected forms, such as the past tense of a verb or the plural form of a noun.
Specifically, the extracted log information is first segmented into words, and the inflected word forms are then restored to their base forms. This unifies log information produced with different word forms in various application scenarios and appropriately reduces the data volume without affecting the accuracy of log anomaly detection.
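The word segmentation and morphological restoration of step S131 might look like the following. A real deployment could use NLTK's `word_tokenize` and `WordNetLemmatizer` (the latter needs the WordNet corpus and a part-of-speech hint to map, for example, "closed" to "close"). To keep this sketch self-contained, a small hypothetical lookup table stands in for the lemmatizer; it is not a general lemmatization algorithm.

```python
import re

# Toy lookup table standing in for a real lemmatizer; entries are illustrative only.
LEMMAS = {
    "connections": "connection",
    "closed": "close",
    "retries": "retry",
    "failed": "fail",
    "was": "be",
}


def tokenize(text: str) -> list[str]:
    """Lower-case alphabetic tokens; digits, punctuation and <*> placeholders are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def lemmatize(tokens: list[str]) -> list[str]:
    """Map each token to its base form, leaving unknown tokens unchanged."""
    return [LEMMAS.get(token, token) for token in tokens]


tokens = tokenize("Connections closed; 3 retries failed")
print(lemmatize(tokens))  # → ['connection', 'close', 'retry', 'fail']
```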
S132, performing readability filtering on the morphologically restored log information to obtain readability-filtered log information.
The purpose of readability filtering is to remove tokens in the log information that are not readable words, such as process names/IDs and session names/IDs. The specific readability filtering method depends on the practical application and is not specifically limited in this embodiment; for example, readability filtering may be performed using the WordNet dictionary.
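A sketch of the readability filtering of step S132, under the assumption that a token counts as "readable" if it appears in a dictionary. With NLTK, one plausible test is whether `nltk.corpus.wordnet.synsets(token)` returns a non-empty list; here a tiny hypothetical vocabulary replaces WordNet so the example runs without downloading any corpus.

```python
# Hypothetical vocabulary standing in for a dictionary lookup such as WordNet.
VOCABULARY = frozenset({"connection", "close", "retry", "fail", "session", "open", "thread", "user"})


def readability_filter(tokens: list[str], vocabulary: frozenset = VOCABULARY) -> list[str]:
    """Keep tokens that are dictionary words; drop user names, hosts and other unreadable tokens."""
    return [token for token in tokens if token in vocabulary]


print(readability_filter(["session", "jdoe", "open", "user", "srv"]))  # → ['session', 'open', 'user']
```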
S133, performing similarity comparison on the readability-filtered log information to obtain the log template to be detected.
After the log information to be detected has been merged, extracted, segmented, morphologically restored, and readability-filtered, some of the resulting content may be identical or similar. Similarity comparison fuses such identical or similar content, so that the log template to be detected covers the entire log content while content redundancy and the amount of data to be computed are reduced.
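One way to realize the similarity comparison of step S133 is a position-wise token comparison, as used by several template-mining log parsers: token lists of equal length whose share of matching positions reaches a threshold are fused, and the differing positions become a `<*>` wildcard. The 0.6 threshold and the helper names `similarity` and `merge_templates` are assumptions, not values from the source.

```python
WILDCARD = "<*>"


def similarity(a: list[str], b: list[str]) -> float:
    """Fraction of positions holding identical tokens; 0.0 when lengths differ."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def merge_templates(templates: list[list[str]], threshold: float = 0.6) -> list[list[str]]:
    """Fuse similar token lists; differing positions are replaced by a wildcard."""
    merged: list[list[str]] = []
    for tokens in templates:
        for i, existing in enumerate(merged):
            if similarity(existing, tokens) >= threshold:
                merged[i] = [x if x == y else WILDCARD for x, y in zip(existing, tokens)]
                break
        else:
            merged.append(list(tokens))  # no similar template yet: keep as a new one
    return merged


print(merge_templates([
    ["connection", "close", "by", "peer"],
    ["connection", "close", "by", "server"],
    ["disk", "full"],
]))  # → [['connection', 'close', 'by', '<*>'], ['disk', 'full']]
```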
Referring to FIG. 4, in another specific embodiment, extracting the log template to be detected from the log information to be detected in step S100 specifically includes the following steps S110B to S130B.
S110B, extracting time and log content of the log information to be detected to obtain extracted log information.
As in step S120A, the method for extracting the time and the log content depends on the practical application and is not specifically limited in this embodiment; for example, a time-extraction function or plug-in may be used for the time, and a regular expression for the log content.
S120B, combining the information belonging to the same type of log in the extracted log information to obtain combined log information.
It should be noted that the method for merging log information of the same type depends on the practical application and is not specifically limited in this embodiment. For example, information belonging to the same type of log in the extracted log information may be merged using a machine-learning clustering algorithm.
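The source leaves the clustering algorithm open. As a stand-in, the sketch below uses greedy single-pass grouping by Jaccard similarity of token sets. This is a simple substitute for a machine-learning clustering method, and the 0.5 threshold and the function names are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_same_type(messages: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Each message joins the first group whose representative (first member)
    is similar enough; otherwise it starts a new group."""
    groups: list[list[str]] = []
    representatives: list[set[str]] = []
    for message in messages:
        tokens = set(message.lower().split())
        for representative, group in zip(representatives, groups):
            if jaccard(tokens, representative) >= threshold:
                group.append(message)
                break
        else:
            groups.append([message])
            representatives.append(tokens)
    return groups


print(group_same_type(["Connection closed by peer", "Connection closed by server", "Disk quota exceeded"]))
```

Single-pass grouping depends on input order; a proper clustering algorithm would avoid that, at higher cost.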
S130B, variable information filtering is carried out on the combined log information, and the log template to be detected is obtained.
Specifically, the variable information filtering of the combined log information may employ the method of steps S131 to S133.
It should be noted that if, after the time and log content are extracted from the log information to be detected, the comprehensiveness or integrity of the extracted log information does not meet a preset requirement, the time and log content may be extracted again from the combined log information to improve the comprehensiveness and accuracy of the extracted log information.
S300, matching the log template to be detected with a preset log template in a log template library to obtain a matching result of whether the log information to be detected belongs to known normal log information or known abnormal log information.
It should be noted that the log template library contains log templates corresponding to known normal log information and to known abnormal log information. The library may be empty before log anomaly detection begins and is updated in real time with the results of log anomaly detection. When the library does not yet contain any preset log template, the log template to be detected is input directly into the subsequent detection models to predict whether the log is abnormal.
S400, when the matching result is that the log information to be detected belongs to known normal log information or known abnormal log information, taking the matching result as a detection result.
When matching the log template to be detected against the preset log templates in the log template library determines whether the corresponding log information is abnormal, the detection process ends and the matching result is used as the detection result. Resolving the abnormal status through template matching keeps the detection process concise and efficient.
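The first-level matching of steps S300 and S400, together with updating the library with later detection results, can be sketched with a minimal in-memory library. The sketch assumes exact string matching on masked templates; the class `TemplateLibrary` and the enum `Label` are hypothetical, and a real library could instead use a similarity comparison such as the one described for step S133.

```python
from enum import Enum
from typing import Optional


class Label(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


class TemplateLibrary:
    """Minimal in-memory log template library (hypothetical structure)."""

    def __init__(self) -> None:
        # The library may start empty and is filled as detection results arrive.
        self._labels: dict[str, Label] = {}

    def match(self, template: str) -> Optional[Label]:
        """Known label for the template, or None if it must go to the detection models (S500)."""
        return self._labels.get(template)

    def update(self, template: str, label: Label) -> None:
        """Record a detection result so identical templates are later resolved by matching."""
        self._labels[template] = label


library = TemplateLibrary()
print(library.match("Connection closed by <*>"))  # → None (unknown: fall through to S500)
library.update("Connection closed by <*>", Label.ABNORMAL)
print(library.match("Connection closed by <*>"))  # → Label.ABNORMAL
```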
S500, when the matching result is that the log information to be detected does not belong to the known normal log information and does not belong to the known abnormal log information, the log template to be detected is respectively input into a plurality of detection models corresponding to a plurality of features to obtain a plurality of predicted values.
When matching the log template to be detected against the preset log templates in the log template library cannot determine whether the corresponding log information is abnormal (for example, when no preset template identical or similar to the log template to be detected is found, or when the log template to be detected cannot be identified), the abnormal status of the log template to be detected must be further predicted by the detection models.
The detection models comprise prediction models corresponding to two or more different features. Prediction models for different features can cover different application scenarios, and adaptation to a given application scenario is tuned by adjusting the weights of the different prediction models. The detection models corresponding to the different features must be trained before use; their variable parameters are determined through the training process.
Referring to FIG. 5, in a specific embodiment, the plurality of detection models in step S500 include a sequence detection model, the log sample templates include sequence log sample templates, and the sequence detection model is trained through the following steps S011 to S013.
S011, acquiring a plurality of pieces of log sample information, dividing each piece of log sample information according to a preset time window, and extracting a corresponding sequence log sample template from each piece of divided log sample information; the log sample information includes normal log sample information and abnormal log sample information.
It should be noted that the sequence detection model depends on the practical application and is not specifically limited in this embodiment. For example, sequence detection models include, but are not limited to, LSTM (Long Short-Term Memory) networks and the bidirectional encoder BERT (Bidirectional Encoder Representations from Transformers).
Specifically, a fixed time window w is first preset. The log sample information appearing within the time window w is processed with the log template extraction method described above to obtain the corresponding sequence log sample template, in which each log template is labeled with its template number while the original order of the log sample information is preserved, for example: w1 = {m1, m2, m3, m4, m5, m6}.
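The fixed time window of step S011 can be sketched as non-overlapping (tumbling) windows over timestamped template IDs. The source does not say whether windows overlap, so tumbling windows, seconds as the time unit, and the name `window_sequences` are assumptions.

```python
from collections import defaultdict


def window_sequences(events: list[tuple[float, str]], window: float) -> list[list[str]]:
    """Split (timestamp_in_seconds, template_id) events into fixed, non-overlapping
    windows of length `window`, preserving event order inside each window."""
    if window <= 0:
        raise ValueError("window must be positive")
    buckets: dict[int, list[str]] = defaultdict(list)
    # sorted() is stable, so events with equal timestamps keep their original order.
    for timestamp, template_id in sorted(events, key=lambda e: e[0]):
        buckets[int(timestamp // window)].append(template_id)
    return [buckets[k] for k in sorted(buckets)]


events = [(12.0, "m5"), (0.5, "m1"), (1.2, "m2"), (3.9, "m3"), (10.1, "m4")]
print(window_sequences(events, window=10))  # → [['m1', 'm2', 'm3'], ['m4', 'm5']]
```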
S012, selecting a preset number of pieces of normal log sample information and adding them to the first training set, and adding the abnormal log sample information, together with the normal log sample information not included in the first training set, to the first test set.
Specifically, the sequence detection model is trained only on normal log sample information and tested on both normal and abnormal log sample information.
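The data split of step S012 (train on normal sequences only; test on the remaining normal sequences plus all abnormal ones) can be sketched as below. The label encoding (0 for normal, 1 for abnormal), the fixed random seed, and the function name `split_sequence_dataset` are illustrative choices, not details from the source.

```python
import random


def split_sequence_dataset(
    normal: list[list[str]],
    abnormal: list[list[str]],
    n_train: int,
    seed: int = 0,
) -> tuple[list[list[str]], list[tuple[list[str], int]]]:
    """Return (train, test): train holds n_train normal sequences only; test holds
    the remaining normal sequences (label 0) and all abnormal sequences (label 1)."""
    if not (0 < n_train <= len(normal)):
        raise ValueError("n_train must be between 1 and the number of normal samples")
    shuffled = list(normal)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = [(seq, 0) for seq in shuffled[n_train:]] + [(seq, 1) for seq in abnormal]
    return train, test


normal = [["m1", f"n{i}"] for i in range(10)]
abnormal = [["m9", f"a{i}"] for i in range(3)]
train, test = split_sequence_dataset(normal, abnormal, n_train=6)
print(len(train), len(test))  # → 6 7
```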
S013, performing model training and testing with the first training set and the first test set to obtain a trained sequence detection model.
Specifically, training ends when the prediction accuracy or the number of training iterations of the sequence detection model reaches the preset requirement. The resulting model is the trained sequence detection model.
For example, referring to FIG. 6, the sequence detection model uses an LSTM. The number of LSTM units is determined by the length of the sequence log template, and each template in the sequence is the input to a single LSTM unit. When the sequence log template is w1 = {M1, M2, M3, M4, M5, M6}, the number of LSTM units is set to 5, and these units take M1 to M5 as inputs. The unit for M1 passes its hidden state and cell state to the LSTM unit for M2, and so on. If the output of the unit for M5 assigns a probability close to 1 to M6 as the next template, the input log information is more likely to be normal log information. If that probability is close to 0, the log information is more likely to be abnormal log information.
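The decision rule above can be illustrated without a neural network. A high probability for the template that actually comes next indicates normal log information, and a near-zero probability indicates abnormal log information. The sketch below deliberately swaps the LSTM for a simple count-based model that estimates P(next template | previous 5 templates) from normal windows. The class `NextTemplateModel` is hypothetical and is not the patent's model. It is only a stand-in that exposes the same kind of probability.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class NextTemplateModel:
    """Count-based stand-in for the LSTM: estimates P(next | previous `history` templates).

    This is NOT an LSTM; it only reproduces the decision signal described in the
    text: a probability near 1 for the observed next template suggests normal logs,
    a probability near 0 suggests abnormal logs.
    """

    def __init__(self, history: int = 5):
        self.history = history
        self.counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)

    def fit(self, normal_windows: List[List[str]]) -> "NextTemplateModel":
        h = self.history
        for seq in normal_windows:
            # Slide over the window: h templates of context, then the template that follows.
            for i in range(h, len(seq)):
                self.counts[tuple(seq[i - h:i])][seq[i]] += 1
        return self

    def prob_next(self, hist: List[str], nxt: str) -> float:
        c = self.counts.get(tuple(hist[-self.history:]))
        if not c:
            return 0.0  # history never seen in normal data: treat as maximally surprising
        return c[nxt] / sum(c.values())
```

Suppose nine normal windows end in m6 and one ends in m7. Then the probability of m6 after m1 to m5 is 0.9, and an unseen continuation gets 0. An LSTM can generalize to histories it has not seen verbatim, which a count table cannot.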
Referring to FIG. 7, in a specific embodiment, the detection models in step S500 include a semantic detection model, which is trained through the following steps S021 to S023.
S021, taking log templates preset in the log template library as sample data.
It should be noted that the type of semantic detection model is chosen according to the practical application and is not specifically limited in this embodiment. For example, semantic detection models include, but are not limited to, BERT.
The sample data include samples of normal log information and samples of abnormal log information.
S022, distributing the sample data between a second training set and a second test set.
Both the second training set and the second test set contain samples of normal log information and samples of abnormal log information.
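A per-label (stratified) split keeps both classes in both sets, as S022 requires. The sketch assumes samples are `(template_text, label)` pairs with 0 for normal and 1 for abnormal. The 20% test ratio and the name `stratified_split` are hypothetical choices.

```python
import random
from typing import List, Tuple

Sample = Tuple[str, int]  # (log template text, label): 0 = normal, 1 = abnormal


def stratified_split(
    samples: List[Sample],
    test_ratio: float = 0.2,  # hypothetical share of each class held out for testing
    seed: int = 0,
) -> Tuple[List[Sample], List[Sample]]:
    """Split each label group separately so both sets contain normal and abnormal samples."""
    rng = random.Random(seed)
    train: List[Sample] = []
    test: List[Sample] = []
    for label in sorted({lbl for _, lbl in samples}):
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)
        # Hold out at least one sample per class when the class has two or more samples.
        n_test = max(1, round(len(group) * test_ratio)) if len(group) > 1 else 0
        test += group[:n_test]
        train += group[n_test:]
    return train, test
```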
S023, performing model training and testing with the second training set and the second test set to obtain a trained semantic detection model.
Specifically, training ends when the prediction precision or the number of training iterations of the semantic detection model reaches the preset requirement. The resulting model is the trained semantic detection model.
For example, the semantic detection model uses BERT, which comprises, in order, a preprocessing layer, an encoder layer, a dropout layer, and a classifier layer. The preprocessing layer converts each text into three vectors of 128 elements each: input_word_ids, input_mask, and input_type_ids. input_word_ids holds the word IDs and is padded with 0 when the text is shorter than 128 elements. input_mask is 0 at the padded positions of input_word_ids and 1 elsewhere. input_type_ids distinguishes different sentences, and in a classification problem all of its elements are 0. The encoder layer is the vectorization layer. Its complete output includes pooled_output (a vector with a preset number of elements for each text), sequence_output (a vector with a preset number of elements for each word in each text), and encoder_outputs (the outputs of the internal units). The dropout layer reduces overfitting, and the dropout probability is set to 0.1. The classifier layer is a fully connected layer that outputs, for each text, the probability of each classification label.
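The three preprocessing vectors can be illustrated in pure Python. This sketch assumes whitespace tokenization and a toy vocabulary in which `[PAD]` maps to 0. A real BERT preprocessor uses WordPiece tokenization and its own vocabulary, so the IDs here are hypothetical. The sketch also adds the usual `[CLS]` and `[SEP]` markers, which the description does not mention explicitly.

```python
from typing import Dict, List

SEQ_LEN = 128  # vector length stated in the description


def preprocess(text: str, vocab: Dict[str, int], seq_len: int = SEQ_LEN) -> Dict[str, List[int]]:
    """Build input_word_ids, input_mask and input_type_ids for one text.

    Assumptions: whitespace tokenization and a toy vocabulary containing
    "[PAD]" (= 0), "[UNK]", "[CLS]" and "[SEP]".
    """
    words = text.lower().split()[: seq_len - 2]  # leave room for [CLS] and [SEP]
    tokens = ["[CLS]"] + words + ["[SEP]"]
    ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]  # unknown words map to [UNK]
    pad = seq_len - len(ids)
    return {
        "input_word_ids": ids + [0] * pad,          # padded with 0 up to seq_len
        "input_mask": [1] * len(ids) + [0] * pad,    # 0 exactly where input_word_ids is padding
        "input_type_ids": [0] * seq_len,             # single-sentence classification: all zeros
    }
```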
S700, performing weighted combination of the plurality of predicted values, and obtaining, according to the result of the weighted combination, a detection result indicating whether the log information to be detected is normal log information or abnormal log information.
Specifically, the weighted terms include, but are not limited to, the plurality of predicted values. They may also include other detection features of the log information to be detected, and the weights of all weighted terms sum to 1. When the weights of the other detection features are set to 0, anomaly detection of the log information to be detected considers only the predicted values of the detection models. When those weights are not 0, anomaly detection considers both the predicted values of the detection models and the other detection features of the log information to be detected.
It should be noted that the weight of each weighted term is determined according to the practical application and is not specifically limited in this embodiment.
Referring to FIG. 8, in a specific embodiment, step S700 specifically includes steps S710 to S740. Step S700 performs weighted combination of the plurality of predicted values and obtains, according to the result, a detection result indicating whether the log information to be detected is normal or abnormal log information.
S710, extracting the detection feature values of the log information to be detected.
It should be noted that the detection features of the log information to be detected are determined according to the practical application and are not specifically limited in this embodiment. For example, they include, but are not limited to, the occurrence frequency or total number of occurrences of the log template to be detected within a given period, and variable values in the raw log data. The detection feature value is determined from the actual state of each detection feature. For example, the more frequently, or the more times in total, the log template to be detected appears within a given period, the higher the detection feature value.
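The sketch below illustrates a frequency-type detection feature from S710. It takes the share of recent log entries that match the template to be detected, so a template seen more often yields a higher value in [0, 1]. The look-back `period`, the relative-frequency normalization and the name `frequency_feature` are assumptions, because the description leaves the feature definition to the application.

```python
from typing import List, Tuple


def frequency_feature(
    entries: List[Tuple[float, str]],  # (timestamp, template ID) of recent log entries
    template_id: str,                  # the log template to be detected
    now: float,
    period: float,                     # hypothetical look-back period, same unit as timestamps
) -> float:
    """Share of log entries in (now - period, now] whose template is template_id."""
    recent = [tid for ts, tid in entries if now - period < ts <= now]
    if not recent:
        return 0.0  # no recent logs: no evidence that the template is common
    return recent.count(template_id) / len(recent)
```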
S720, determining the weight of the detection feature value and the weight of each detection model.
It should be noted that the weight of the detection feature value is determined according to the practical application and is not specifically limited in this embodiment. The greater the influence of the corresponding detection feature on log anomaly detection, the larger the weight of the detection feature value. The weight of each detection model is likewise determined according to the practical application and is not specifically limited in this embodiment.
S730, weighting the detection feature value and the plurality of predicted values by their respective weights to obtain a weighted value.
Specifically, each prediction score is computed from a predicted value and its weight, and the feature score is computed from the detection feature value and its weight. The prediction scores and the feature score are then summed to obtain the weighted value.
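Steps S720 and S730 reduce to a weighted sum. The sketch below assumes every term, whether a predicted value or a detection feature value, is a score in [0, 1] keyed by a name. It enforces the rule from S700 that the weights sum to 1. The function name and the dictionary layout are illustrative assumptions.

```python
import math
from typing import Dict


def weighted_combination(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of predicted values and detection feature values.

    Each term contributes weight * value (a prediction score or the feature
    score); the weights must cover exactly the same terms and sum to 1.
    """
    if set(values) != set(weights):
        raise ValueError("values and weights must name the same terms")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * values[name] for name in values)
```

Take predicted values of 0.95 and 0.9, a feature value of 0.8, and weights of 0.4, 0.4 and 0.2. The weighted value is then 0.38 + 0.36 + 0.16 = 0.9. Setting the feature weight to 0 (for example 0.5, 0.5, 0) makes the result depend on the model predictions alone.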
S740, obtaining, according to the weighted value, a detection result indicating whether the log information to be detected is normal log information or abnormal log information.
Specifically, normal and abnormal log information correspond to different ranges of the weighted value. For example, if the weighted value lies between 0.5 and 1, the log information to be detected is normal log information. If it lies between 0 and 0.5, the log information is abnormal log information. Thus, if the weighted value obtained from the detection feature value, the plurality of predicted values and their weights is 0.9, the log information to be detected is normal log information.
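The range test in S740 can be written as a single threshold. The two example ranges share the endpoint 0.5, so this sketch assumes that exactly 0.5 counts as normal. The name `classify` and the string labels are likewise illustrative.

```python
def classify(weighted_value: float, threshold: float = 0.5) -> str:
    """Map a weighted value in [0, 1] to a detection result.

    Assumption: values at or above the threshold are normal, values below it abnormal.
    """
    if not 0.0 <= weighted_value <= 1.0:
        raise ValueError("weighted value must lie in [0, 1]")
    return "normal" if weighted_value >= threshold else "abnormal"
```

`classify(0.9)` returns `"normal"`, matching the example above.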
In a specific embodiment, the log anomaly detection method further includes:
S800, updating the log template library with the detection result and the corresponding log template to be detected.
Specifically, there are cases where matching against the log templates in the log template library cannot determine whether the log information to be detected is abnormal. In such cases, the anomaly status is determined using the detection models and the weighted combination. The detection result, together with the corresponding log template to be detected, is then added to the log template library. This reduces the detection time for subsequent identical or similar log information and improves detection efficiency.
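A minimal in-memory sketch of the library lookup and the S800 update is shown below. It assumes templates are stored as exact strings mapped to `"normal"` or `"abnormal"`, so it only speeds up identical templates. Matching similar templates would need a similarity measure that this passage does not specify. The class name `TemplateLibrary` is hypothetical.

```python
from typing import Dict, Optional


class TemplateLibrary:
    """Illustrative in-memory log template library: template text -> "normal" / "abnormal"."""

    def __init__(self, known: Optional[Dict[str, str]] = None):
        self._known: Dict[str, str] = dict(known or {})

    def match(self, template: str) -> Optional[str]:
        # Exact-match lookup; None means "neither known normal nor known abnormal".
        return self._known.get(template)

    def update(self, template: str, result: str) -> None:
        # S800: store the model-based result so the same template is matched directly next time.
        if result not in ("normal", "abnormal"):
            raise ValueError("result must be 'normal' or 'abnormal'")
        self._known[template] = result
```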
Referring to FIG. 9, in a specific embodiment, the detection models include a sequence detection model and a semantic detection model. A log template to be detected is extracted from the log information to be detected. It is then matched against the known normal templates and known abnormal templates in the log template library. If the log template to be detected is a known normal or known abnormal template, the detection process ends. If matching cannot determine whether the log information is abnormal, the log template to be detected is input separately into the sequence detection model and the semantic detection model to obtain predicted values. In addition, the detection features of the log information to be detected are extracted and their detection feature values are determined. The predicted values of the detection models and the detection features are combined with weights to obtain a weighted value, from which the anomaly status of the log information to be detected is determined. Finally, through autonomous learning, the log template to be detected and its detection result are fed back to the log template library. Both the sequence detection model and the semantic detection model can use the log templates in the log template library as sample data for training and testing.
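The FIG. 9 flow can be sketched end to end as follows. The models are passed in as plain callables that return a normality score in [0, 1], and the library is a dictionary from template text to result. These interfaces, the 0.5 threshold and the name `detect` are assumptions for illustration, not the patent's implementation.

```python
import math
from typing import Callable, Dict, Tuple

Model = Callable[[str], float]  # hypothetical interface: template -> normality score in [0, 1]


def detect(
    template: str,
    library: Dict[str, str],     # template text -> "normal" / "abnormal"
    models: Dict[str, Model],    # e.g. sequence and semantic detection models
    feature_value: float,        # detection feature value from S710
    weights: Dict[str, float],   # one weight per model name plus "feature", summing to 1
    threshold: float = 0.5,
) -> Tuple[str, str]:
    """Two-level detection; returns (result, level) with level "match" or "models"."""
    known = library.get(template)
    if known is not None:                 # level 1: known normal or abnormal template
        return known, "match"
    scores = {name: model(template) for name, model in models.items()}
    scores["feature"] = feature_value
    if set(scores) != set(weights) or not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("need one weight per term, summing to 1")
    value = sum(weights[k] * scores[k] for k in scores)   # level 2: weighted combination
    result = "normal" if value >= threshold else "abnormal"
    library[template] = result            # feedback: update the template library
    return result, "models"
```

Calling `detect` twice on the same new template shows the two levels. The first call goes through the models and writes the result back to the library. The second call is answered by template matching alone.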
The log abnormality detection method, apparatus and storage medium described above work as follows. A log template to be detected is extracted from the log information to be detected and matched against the log templates preset in a log template library, to determine whether the log information is known normal or known abnormal log information. When matching cannot determine whether the log information is abnormal, the log template to be detected is input into several detection models of different types to obtain several predicted values. These values are combined with weights to predict whether the log information is abnormal. Embodiments of the invention therefore perform first-level detection through log template matching. When matching is inconclusive, they perform second-level detection through a weighted combination of several detection models. This multi-level detection covers a variety of application scenarios and allows adaptive adjustment to different scenarios by tuning the weights. It also improves the accuracy of log anomaly detection.
As shown in FIG. 10, an embodiment of the present invention provides a log abnormality detection apparatus, including:
a first module, configured to acquire log information to be detected and extract a log template to be detected from the log information to be detected;
a second module, configured to match the log template to be detected against the log templates preset in a log template library to obtain a matching result indicating whether the log information to be detected is known normal log information or known abnormal log information;
a third module, configured to input the log template to be detected into a plurality of detection models corresponding to a plurality of features to obtain a plurality of predicted values, when the matching result is that the log information to be detected is neither known normal log information nor known abnormal log information;
and a fourth module, configured to perform weighted combination of the plurality of predicted values and obtain, according to the result of the weighted combination, a detection result indicating whether the log information to be detected is normal log information or abnormal log information.
For the specific functional implementations of the first, second, third and fourth modules, refer to steps S100 to S400 in the embodiment corresponding to FIG. 1; they are not repeated here.
Referring to FIG. 11, a schematic diagram of another log abnormality detection apparatus provided in an embodiment of the present invention, an embodiment of the present invention further provides a data processing apparatus 100. The data processing apparatus 100 includes a memory 101, a processor 102, and a computer program stored in the memory 101 and executable on the processor 102.
The processor 102 and the memory 101 may be connected by a bus or other means.
The non-transitory software programs and instructions required to implement the data processing method of the above-described embodiments are stored in the memory 101, and when executed by the processor 102, perform the data processing method of the data processing apparatus in the above-described embodiments, for example, perform the method steps S100 to S400 in fig. 1, the method steps S110A to S130A in fig. 2, the method steps S131 to S133 in fig. 3, the method steps S110B to S130B in fig. 4, the method steps S011 to S013 in fig. 5, the method steps S021 to S023 in fig. 7, and the method steps S710 to S740 in fig. 8, which are described above.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions that are executed by a processor 102 or a controller, for example, by one of the processors 102 in the embodiment of the data processing apparatus 100, which may cause the processor 102 to execute the data processing method applied to the data processing apparatus 100 in the embodiment described above, for example, execute the method steps S100 to S400 in fig. 1, the method steps S110A to S130A in fig. 2, the method steps S131 to S133 in fig. 3, the method steps S110B to S130B in fig. 4, the method steps S011 to S013 in fig. 5, the method steps S021 to S023 in fig. 7, and the method steps S710 to S740 in fig. 8 described above. Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. 
Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media.
While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the above embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present invention, and these equivalent modifications and substitutions are intended to be included in the scope of the present invention as defined in the appended claims.
Claims (12)
1. A method for detecting log anomalies, the method comprising the steps of:
acquiring log information to be detected, and extracting a log template to be detected according to the log information to be detected;
matching the log template to be detected with a preset log template in a log template library to obtain a matching result of whether the log information to be detected belongs to known normal log information or known abnormal log information;
when the matching result is that the log information to be detected is neither known normal log information nor known abnormal log information, inputting the log template to be detected into each of a plurality of detection models corresponding to a plurality of features to obtain a plurality of predicted values;
and performing weighted combination of the plurality of predicted values, and obtaining, according to the result of the weighted combination, a detection result indicating whether the log information to be detected is normal log information or abnormal log information.
2. The detection method according to claim 1, wherein performing weighted combination of the plurality of predicted values and obtaining the detection result according to the result of the weighted combination specifically includes:
extracting a detection feature value of the log information to be detected;
determining a weight of the detection feature value and a weight of each detection model;
weighting the detection feature value and the plurality of predicted values by their respective weights to obtain a weighted value;
and obtaining, according to the weighted value, a detection result indicating whether the log information to be detected is normal log information or abnormal log information.
3. The method of detection according to claim 1, wherein the method of detection further comprises:
when the matching result is that the log information to be detected is known normal log information or known abnormal log information, taking the matching result as the detection result.
4. The detection method according to claim 1, wherein extracting the log template to be detected from the log information to be detected specifically includes:
extracting the time and log content of the log information to be detected to obtain extracted log information;
combining information belonging to the same log type in the extracted log information to obtain combined log information;
and filtering variable information out of the combined log information to obtain the log template to be detected.
5. The detection method according to claim 1, wherein extracting the log template to be detected from the log information to be detected specifically includes:
combining information belonging to the same log in the log information to be detected to obtain combined log information;
extracting the time and log content of the combined log information to obtain extracted log information;
and filtering variable information out of the extracted log information to obtain the log template to be detected.
6. The detection method according to claim 5, wherein filtering variable information out of the extracted log information to obtain the log template to be detected specifically includes:
performing word segmentation and morphological restoration (lemmatization) on the extracted log information to obtain morphologically restored log information;
performing readability filtering on the morphologically restored log information to obtain readability-filtered log information;
and performing similarity comparison on the readability-filtered log information to obtain the log template to be detected.
7. The detection method according to claim 1, wherein the plurality of detection models comprises a sequence detection model, the log sample templates comprise sequence log sample templates, and the sequence detection model is trained through the following steps:
acquiring a plurality of pieces of log sample information, dividing each piece of log sample information according to a preset time window, and extracting a corresponding sequence log sample template from each divided piece of log sample information, the log sample information comprising normal log sample information and abnormal log sample information;
selecting a preset number of pieces of normal log sample information and adding them to a first training set, and adding the abnormal log sample information and the normal log sample information not in the first training set to a first test set;
and performing model training and testing with the first training set and the first test set to obtain the trained sequence detection model.
8. The detection method according to claim 1, wherein the plurality of detection models comprises a semantic detection model, and the semantic detection model is trained through the following steps:
taking log templates preset in a log template library as sample data;
distributing the sample data between a second training set and a second test set;
and performing model training and testing with the second training set and the second test set to obtain the trained semantic detection model.
9. The detection method according to claim 1, wherein the detection method further comprises:
updating the log template library with the detection result and the corresponding log template to be detected.
10. A log abnormality detection apparatus, characterized by comprising:
a first module, configured to acquire log information to be detected and extract a log template to be detected from the log information to be detected;
a second module, configured to match the log template to be detected against the log templates preset in a log template library to obtain a matching result indicating whether the log information to be detected is known normal log information or known abnormal log information;
a third module, configured to input the log template to be detected into a plurality of detection models corresponding to a plurality of features to obtain a plurality of predicted values, when the matching result is that the log information to be detected is neither known normal log information nor known abnormal log information;
and a fourth module, configured to perform weighted combination of the plurality of predicted values and obtain, according to the result of the weighted combination, a detection result indicating whether the log information to be detected is normal log information or abnormal log information.
11. A log abnormality detection apparatus, characterized in that the apparatus comprises a memory, a processor, a program stored in the memory and executable on the processor, and a data bus enabling communication between the processor and the memory, wherein the program, when executed by the processor, implements the steps of the log abnormality detection method according to any one of claims 1 to 9.
12. A computer-readable storage medium, wherein the storage medium stores one or more programs executable by one or more processors to implement the steps of the log abnormality detection method according to any one of claims 1 to 9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210751478.2A CN117389821A (en) | 2022-06-29 | 2022-06-29 | Log abnormality detection method, device and storage medium |
PCT/CN2023/097504 WO2024001656A1 (en) | 2022-06-29 | 2023-05-31 | Method and device for detecting abnormal log, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117389821A (en) | 2024-01-12 |
Family ID: 89382809
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210751478.2A (Pending, published as CN117389821A) | Log abnormality detection method, device and storage medium | 2022-06-29 | 2022-06-29 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117389821A (en) |
WO (1) | WO2024001656A1 (en) |
- 2022-06-29: CN application CN202210751478.2A, published as CN117389821A (active, Pending)
- 2023-05-31: WO application PCT/CN2023/097504, published as WO2024001656A1 (status unknown)
Also Published As
Publication number | Publication date |
---|---|
WO2024001656A1 (en) | 2024-01-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||