CN117389821A - Log abnormality detection method, device and storage medium - Google Patents


Info

Publication number
CN117389821A
Authority
CN
China
Prior art keywords
log
detected
log information
information
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210751478.2A
Other languages
Chinese (zh)
Inventor
韩静
张百胜
龚子璨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by ZTE Corp
Priority to CN202210751478.2A
Priority to PCT/CN2023/097504 (WO2024001656A1)
Publication of CN117389821A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a log abnormality detection method, device and storage medium. The method comprises the following steps: acquiring log information to be detected, and extracting a log template to be detected from the log information to be detected; matching the log template to be detected against the preset log templates in a log template library to obtain a matching result indicating whether the log information to be detected belongs to known normal log information or known abnormal log information; when the matching result is that the log information to be detected belongs to neither the known normal log information nor the known abnormal log information, inputting the log template to be detected into a plurality of detection models corresponding to a plurality of features to obtain a plurality of predicted values; and performing a weighted combination of the plurality of predicted values, and obtaining, from the result of the weighted combination, a detection result indicating whether the log information to be detected is abnormal. The scheme provided by the embodiment of the invention can cover various application scenarios, improves detection accuracy, and can be widely applied in the computer field.

Description

Log abnormality detection method, device and storage medium
Technical Field
The present invention relates to the field of computers, and in particular, to a method, an apparatus, and a storage medium for detecting log anomalies.
Background
Anomaly detection for a computer software system may be based on system metrics or on system logs. Log-based anomaly detection reflects the states the system has passed through more clearly. However, existing log-based detection relies on collecting cases and building keyword rules, which limits the application scenarios it can cover and leaves room for improvement in detection accuracy.
Disclosure of Invention
The embodiments of the invention provide a log abnormality detection method, device and storage medium, which can cover various application scenarios and improve detection accuracy.
In order to achieve the above object, an embodiment of the present invention provides a method for detecting log anomalies, including the following steps: acquiring log information to be detected, and extracting a log template to be detected according to the log information to be detected; matching the log template to be detected with a preset log template in a log template library to obtain a matching result of whether the log information to be detected belongs to known normal log information or known abnormal log information; when the matching result is that the log information to be detected does not belong to the known normal log information and does not belong to the known abnormal log information, respectively inputting the log template to be detected into a plurality of detection models corresponding to a plurality of features to obtain a plurality of predicted values; and carrying out weighted combination according to the plurality of predicted values, and obtaining a detection result of the log information to be detected belonging to the normal log information or the abnormal log information according to the weighted combination result.
Optionally, the performing weighted combination according to the plurality of predicted values, and obtaining a detection result of the log information to be detected belonging to the normal log information or the abnormal log information according to a result of the weighted combination specifically includes:
extracting detection characteristic values of the log information to be detected;
determining the weight corresponding to the detection characteristic value and the weight corresponding to each detection model;
weighting the detection characteristic value and the plurality of predicted values by their corresponding weights to obtain a weighted value;
and obtaining a detection result that the log information to be detected belongs to normal log information or abnormal log information according to the weighted value.
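The weighted combination in the steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that the detection characteristic value and every predicted value are anomaly scores in [0, 1] where higher means more abnormal, that the weighted sum is normalized by the total weight, and that a threshold of 0.5 separates normal from abnormal. The function name, the normalization and the threshold are all hypothetical.

```python
from typing import Sequence, Tuple


def weighted_decision(
    feature_value: float,
    feature_weight: float,
    predictions: Sequence[float],
    model_weights: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the source
) -> Tuple[float, bool]:
    """Combine one detection feature value and several model predictions.

    Returns (weighted score, is_abnormal).
    """
    if len(predictions) != len(model_weights):
        raise ValueError("one weight is required per prediction")
    total_weight = feature_weight + sum(model_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = feature_weight * feature_value + sum(
        weight * value for weight, value in zip(model_weights, predictions)
    )
    # Assumption: normalize so the score stays in [0, 1].
    score = weighted_sum / total_weight
    return score, score >= threshold
```

Raising the weight of one model makes the combined score follow that model more closely, which is one way to read the adaptation to different application scenarios described in the source.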
Optionally, the detection method further comprises:
and when the matching result is that the log information to be detected belongs to known normal log information or known abnormal log information, taking the matching result as a detection result.
Optionally, the extracting the log template to be detected according to the log information to be detected specifically includes:
extracting time and log content of the log information to be detected to obtain extracted log information;
combining information belonging to the same type of log in the extracted log information to obtain combined log information;
and filtering variable information of the combined log information to obtain the log template to be detected.
Optionally, the extracting the log template to be detected according to the log information to be detected specifically includes:
combining information belonging to the same log in the log information to be detected to obtain combined log information;
extracting time and log content of the combined log information to obtain extracted log information;
and filtering variable information of the extracted log information to obtain the log template to be detected.
Optionally, the filtering variable information of the extracted log information to obtain the log template to be detected specifically includes:
performing word segmentation and morphological restoration on the extracted log information to obtain morphologically restored log information;
performing readability filtering on the log information subjected to morphological restoration to obtain log information subjected to readability filtering;
and carrying out similarity comparison on the log information subjected to the readability filtering to obtain the log template to be detected.
Optionally, the plurality of detection models include a sequence detection model, the log sample template includes a sequence log sample template, and the sequence detection model is obtained through training by the following steps:
acquiring a plurality of pieces of log sample information, dividing each piece of log sample information according to a preset time window, and extracting a corresponding sequence log sample template from each piece of divided log sample information; the log sample information comprises normal log sample information and abnormal log sample information;
selecting a preset number of normal log sample information to be added into a first training set, and adding the abnormal log sample information and the normal log sample information except the first training set into a first test set;
and performing model training and testing by adopting the first training set and the first testing set to obtain the trained sequence detection model.
Optionally, the plurality of detection models include a semantic detection model, and the semantic detection model is obtained through training by the following steps:
taking a log template preset in a log template library as sample data;
adding the sample data to a second training set and a second test set, respectively;
and performing model training and testing by adopting the second training set and the second testing set to obtain the trained semantic detection model.
Optionally, the detection method further comprises:
and updating the detection result and the corresponding log template to be detected to the log template library.
In order to achieve the above objective, the embodiment of the present invention further provides a log abnormality detection device, where the device includes a first module, a second module, a third module, and a fourth module; the first module is used for acquiring log information to be detected and extracting a log template to be detected according to the log information to be detected; the second module is used for matching the log template to be detected with a log template preset in a log template library to obtain a matching result of whether the log information to be detected belongs to known normal log information or known abnormal log information; the third module is used for inputting the log template to be detected into a plurality of detection models corresponding to a plurality of features to obtain a plurality of predicted values when the matching result is that the log information to be detected does not belong to the known normal log information and does not belong to the known abnormal log information; and the fourth module is used for carrying out weighted combination according to a plurality of predicted values and obtaining a detection result of the log information to be detected belonging to the normal log information or the abnormal log information according to a weighted combination result.
To achieve the above object, an embodiment of the present invention further provides a log abnormality detection device, the device including a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for implementing connection communication between the processor and the memory, the program implementing the steps of the foregoing method when executed by the processor.
To achieve the above object, the present invention provides a storage medium for computer-readable storage, the storage medium storing one or more programs executable by one or more processors to implement the steps of the foregoing method.
According to the log abnormality detection method, device and storage medium provided by the embodiments of the invention, a log template to be detected is extracted from the log information to be detected and matched against the log templates preset in the log template library to determine whether the log information to be detected belongs to known normal log information or known abnormal log information. When template matching cannot determine whether the log information to be detected is abnormal, the log template to be detected is input into a plurality of detection models of different types to obtain a plurality of predicted values, and the predicted values are combined by weighting to predict whether the log information to be detected is abnormal. The embodiments of the invention therefore first perform first-level detection through log template matching and, when matching cannot decide, perform second-level detection through a weighted combination of several detection models. This multi-level detection covers various application scenarios, and adjusting the weights allows adaptation to different application scenarios, which improves the accuracy of log abnormality detection.
Drawings
FIG. 1 is a flow chart of steps of a method for detecting log anomalies provided by one embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps for extracting log templates to be detected according to log information to be detected according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating steps for filtering variable information of extracted log information to obtain a log template to be detected according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a step of extracting a log template to be detected according to log information to be detected according to another embodiment of the present invention;
FIG. 5 is a flow chart of steps of a training method for a sequence detection model according to one embodiment of the present invention;
FIG. 6 is a data flow diagram for LSTM as a sequence detection model provided by one embodiment of the present invention;
FIG. 7 is a flow chart of steps of a training method for a semantic detection model according to one embodiment of the present invention;
FIG. 8 is a flowchart illustrating steps for determining an anomaly of log information to be detected based on a plurality of predicted values according to one embodiment of the present invention;
FIG. 9 is a flowchart illustrating a method for detecting log anomalies according to another embodiment of the present invention;
FIG. 10 is a schematic diagram of a log anomaly detection device according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a log abnormality detection apparatus according to another embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described in detail below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of the present application.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims and drawings are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
In the following description, suffixes such as "module", "part" or "unit" for representing elements are used only for facilitating the description of the present invention, and have no particular meaning in themselves. Thus, "module," "component," or "unit" may be used in combination.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those of skill in the art will understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
As shown in fig. 1, a log abnormality detection method includes the following steps S100 to S400.
S100, acquiring log information to be detected, and extracting a log template to be detected according to the log information to be detected.
System log information is generally produced by print statements in a program and contains variable information and constant information. The variable information is replaced by the current variable values at print time, for example a session name or ID, a process or thread name or ID, an IP address, or an interface link. The constant part generally consists of sentences or words with complete semantics. The constant part and the variable part together form one piece of log information.
Specifically, log information contains a large amount of content, some of which has no substantial effect on anomaly judgment. Extracting only the information that does have a substantial effect and using it in the judgment process reduces the data volume and improves the accuracy of anomaly detection.
Referring to fig. 2, in a specific embodiment, the step S100 of extracting the log template to be detected according to the log information to be detected specifically includes the following steps S110A to S130A.
S110A, combining information belonging to the same log in the log information to be detected to obtain combined log information.
It should be noted that a single log entry may be spread across several lines of the log information. Merging those lines into one entry reduces the volume of data to be processed and improves computational efficiency.
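Step S110A can be illustrated with a short sketch. It assumes, purely for illustration, that every new log entry starts with a timestamp of the form `YYYY-MM-DD HH:MM:SS` and that any other non-empty line (for example a stack-trace line) continues the previous entry. The source does not prescribe a log format, so the pattern and the function name are hypothetical.

```python
import re
from typing import Iterable, List

# Assumed entry prefix; real logs may use a different timestamp layout.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def merge_multiline(lines: Iterable[str]) -> List[str]:
    """Merge continuation lines into the log entry they belong to."""
    merged: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if ENTRY_START.match(line) or not merged:
            merged.append(line)  # a new entry begins here
        else:
            merged[-1] += " " + line.strip()  # continuation of the last entry
    return merged


raw_lines = [
    "2023-05-31 10:00:00 ERROR failed to open session",
    "  at module.connect",
    "  at module.run",
    "2023-05-31 10:00:01 INFO retrying",
]
merged_entries = merge_multiline(raw_lines)
```

Here the four raw lines collapse into two entries, so later steps process one record per log event.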
S120A, extracting time and log content of the combined log information to obtain extracted log information.
It should be noted that the specific method of extracting the time and the specific log content to be extracted are determined by the practical application and are not specifically limited in this embodiment. For example, the time in the log information may be extracted by a time extraction function or plug-in, or the log content in the log file may be extracted by a regular expression.
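As one possible reading of the regular-expression approach mentioned above, the sketch below splits a line into a timestamp and its content. The line layout (timestamp, upper-case level, message) and all names are assumptions chosen for the example; the source leaves the format open.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Assumed layout: "<date> <time> <LEVEL> <content>".
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<content>.*)$"
)


def extract_time_and_content(line: str) -> Optional[Tuple[datetime, str]]:
    """Return (timestamp, content), or None if the line does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("time"), "%Y-%m-%d %H:%M:%S")
    return timestamp, match.group("content")


example = extract_time_and_content(
    "2023-05-31 10:00:00 ERROR session 4821 closed by 10.0.0.7"
)
```

Lines that do not fit the assumed layout return `None`, so a caller can decide whether to drop them or route them to a different parser.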
S130A, variable information filtering is carried out on the extracted log information, and a log template to be detected is obtained.
It should be noted that, the specific method of filtering the variable information is determined according to practical application, and the embodiment is not limited specifically.
Referring to fig. 3, in a specific embodiment, variable information filtering is performed on the extracted log information in step S130A to obtain a log template to be detected, which specifically includes steps S131 to S133.
S131, performing word segmentation and morphological restoration on the extracted log information to obtain the log information subjected to morphological restoration.
It should be noted that, the specific method of word segmentation may be determined according to practical application, and the embodiment is not limited specifically, for example, NLTK word segmentation is adopted.
It should be noted that the words in the log information may appear in various inflected forms, such as the past tense of a verb or the plural form of a noun.
Specifically, the extracted log information is first segmented into words, and words in different inflected forms are then restored to their base form (lemmatization). This unifies log information whose wording varies across application scenarios, so the data volume is moderately reduced without affecting the accuracy of log anomaly detection.
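Step S131 can be sketched as below. The source names NLTK as one option for word segmentation; to keep the example self-contained, this sketch swaps in a regular-expression tokenizer and a tiny hand-written lookup table in place of a real lemmatizer. Both the table contents and the function names are hypothetical stand-ins.

```python
import re
from typing import Dict, List

# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS: Dict[str, str] = {
    "failed": "fail",
    "failing": "fail",
    "fails": "fail",
    "closed": "close",
    "connections": "connection",
    "sessions": "session",
}


def tokenize(text: str) -> List[str]:
    """Split lower-cased text into words and dotted numbers (e.g. IP addresses)."""
    return re.findall(r"[a-z]+|\d+(?:\.\d+)*", text.lower())


def lemmatize(tokens: List[str]) -> List[str]:
    """Map each token to its base form when the table knows it."""
    return [LEMMAS.get(token, token) for token in tokens]


lemmatized = lemmatize(tokenize("Connections to 10.0.0.7 FAILED"))
```

After this step, "failed", "fails" and "failing" all map to the same token, so messages that differ only in inflection produce the same template words.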
And S132, performing readability filtering on the log information subjected to morphological restoration to obtain the log information subjected to readability filtering.
The purpose of readability filtering is to remove tokens in the log information that are not human-readable words, such as a process name or ID or a session name or ID. The specific method of readability filtering is determined by the practical application and is not specifically limited in this embodiment; for example, readability filtering may be performed by using the WordNet dictionary.
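A minimal sketch of the readability filter in step S132 follows. The source suggests a WordNet dictionary lookup; here a small hard-coded vocabulary set stands in for it so the example runs without external data. The vocabulary and the function name are assumptions.

```python
from typing import Iterable, List, Set

# Hypothetical vocabulary standing in for a dictionary such as WordNet.
VOCABULARY: Set[str] = {"connection", "to", "fail", "session", "close", "by", "peer"}


def readability_filter(
    tokens: Iterable[str], vocabulary: Set[str] = VOCABULARY
) -> List[str]:
    """Keep only purely alphabetic tokens that appear in the vocabulary."""
    return [
        token for token in tokens if token.isalpha() and token in vocabulary
    ]


filtered = readability_filter(
    ["session", "sess4821", "close", "by", "10.0.0.7", "zxqv"]
)
```

Identifiers such as `sess4821` and addresses such as `10.0.0.7` are dropped as variable information, while dictionary words survive as the constant part of the template.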
And S133, performing similarity comparison on the log information subjected to the readability filtering to obtain a log template to be detected.
After the log information to be detected has been combined, extracted, segmented, morphologically restored and readability-filtered, some of the resulting content may be identical or similar. Similarity comparison fuses such identical or similar content, so that the log template to be detected covers the whole of the log content while content redundancy and the amount of data to be computed are reduced.
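The fusion of identical or similar content in step S133 might look like the sketch below. The source does not name a similarity measure; this example assumes the ratio computed by Python's `difflib.SequenceMatcher` over token lists and a hypothetical threshold of 0.8, and it keeps the first template seen among similar ones.

```python
from difflib import SequenceMatcher
from typing import List, Sequence


def merge_similar(
    templates: Sequence[Sequence[str]], threshold: float = 0.8
) -> List[List[str]]:
    """Drop templates that are near-duplicates of one already kept."""
    kept: List[List[str]] = []
    for candidate in templates:
        is_duplicate = any(
            SequenceMatcher(None, candidate, existing).ratio() >= threshold
            for existing in kept
        )
        if not is_duplicate:
            kept.append(list(candidate))
    return kept


candidates = [
    ["session", "close", "by", "peer"],
    ["session", "close", "by", "remote", "peer"],
    ["disk", "usage", "exceed", "limit"],
]
fused = merge_similar(candidates)
```

The second candidate shares four of its five tokens with the first (ratio 8/9), so it is fused into it; the unrelated third candidate is kept as its own template.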
Referring to fig. 4, in another specific embodiment, the step S100 of extracting the log template to be detected according to the log information to be detected specifically includes the following steps S110B to S130B.
S110B, extracting time and log content of the log information to be detected to obtain extracted log information.
It should be noted that the specific method of extracting the time and the specific log content to be extracted are determined by the practical application and are not specifically limited in this embodiment. For example, the time in the log information may be extracted by a time extraction function or plug-in, or the log content in the log file may be extracted by a regular expression.
S120B, combining the information belonging to the same type of log in the extracted log information to obtain combined log information.
It should be noted that, the method for merging log information of the same class is determined according to practical application, and the embodiment is not particularly limited. For example, the information belonging to the same type of log in the extracted log information is combined by a clustering algorithm in machine learning.
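The source mentions a machine-learning clustering algorithm for step S120B without naming one. As a simple stand-in, the sketch below groups messages in a single pass by Jaccard similarity of their token sets. The threshold of 0.5 and the choice of the first message as each cluster's representative are assumptions made for the example.

```python
from typing import List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_messages(
    messages: Sequence[str], threshold: float = 0.5
) -> List[List[str]]:
    """Greedy single-pass grouping of messages with similar token sets."""
    clusters: List[List[str]] = []
    representatives: List[Set[str]] = []
    for message in messages:
        tokens = set(message.lower().split())
        for index, representative in enumerate(representatives):
            if jaccard(tokens, representative) >= threshold:
                clusters[index].append(message)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append([message])
            representatives.append(tokens)
    return clusters


grouped = cluster_messages(
    [
        "session 4821 closed by peer",
        "session 9313 closed by peer",
        "disk usage at 91 percent",
    ]
)
```

The two session messages differ only in the session ID, so they land in one cluster; variable filtering can then be applied once per cluster rather than once per message.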
S130B, variable information filtering is carried out on the combined log information, and the log template to be detected is obtained.
Specifically, the variable information filtering of the combined log information may employ the method of steps S131 to S133.
It should be noted that if the comprehensiveness or integrity of the log information extracted from the log information to be detected does not meet a preset requirement, time and log content may be extracted again from the combined log information to improve the comprehensiveness and accuracy of the extracted log information.
And S300, matching the log template to be detected with a preset log template in a log template library to obtain a matching result of whether the log information to be detected belongs to known normal log information or known abnormal log information.
It should be noted that the log template library includes a plurality of log templates corresponding to known normal log information or known abnormal log information. The library may be empty before log abnormality detection begins and is updated in real time according to the detection results. When the log template library does not yet contain any preset log template, the log template to be detected is input directly into the subsequent detection models to predict whether the log is abnormal.
S400, when the matching result is that the log information to be detected belongs to known normal log information or known abnormal log information, taking the matching result as a detection result.
When matching the log template to be detected against the preset log templates in the log template library determines whether the corresponding log information to be detected is abnormal, the detection process ends and the matching result is used as the detection result. When the abnormal condition of the log information can be determined through log template matching alone, the detection process is concise and efficient.
S500, when the matching result is that the log information to be detected does not belong to the known normal log information and does not belong to the known abnormal log information, the log template to be detected is respectively input into a plurality of detection models corresponding to a plurality of features to obtain a plurality of predicted values.
When matching against the preset log templates in the log template library cannot determine whether the log information to be detected is abnormal, for example because no preset log template is the same as or similar to the log template to be detected, or because the log template to be detected cannot be identified, the abnormal condition of the log template to be detected must be further predicted by the prediction models.
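The two branches described in steps S400 and S500 can be sketched as a lookup followed by a fallback. This illustration assumes exact matching on a tuple of template tokens, whereas the source also allows similar templates to match, and it represents the detection models by a single hypothetical callback. Writing the result back into the library follows the source's statement that the library is updated with detection results.

```python
from typing import Callable, Dict, Optional, Tuple

Template = Tuple[str, ...]


def detect(
    template: Template,
    library: Dict[Template, bool],  # template -> True if known abnormal
    fallback: Callable[[Template], bool],  # stands in for the detection models
) -> Tuple[bool, str]:
    """Return (is_abnormal, source), where source is 'library' or 'models'."""
    known_label: Optional[bool] = library.get(template)
    if known_label is not None:
        return known_label, "library"  # first-level detection decides
    is_abnormal = fallback(template)  # second-level detection
    library[template] = is_abnormal  # feed the result back into the library
    return is_abnormal, "models"


template_library: Dict[Template, bool] = {("session", "close"): False}
first = detect(("session", "close"), template_library, lambda t: True)
second = detect(("disk", "fail"), template_library, lambda t: True)
third = detect(("disk", "fail"), template_library, lambda t: False)
```

The third call is answered from the library because the second call stored its result, which shows how an initially empty library fills up as detection proceeds.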
The detection models include prediction models corresponding to two or more different features. Prediction models for different features can cover different application scenarios, and the method is adapted to a given scenario by adjusting the weights of the different prediction models. Each detection model must be trained before use, and the variable parameters of the model are determined during training.
Referring to fig. 5, in a specific embodiment, the several detection models in step S500 include a sequence detection model, and the log sample template includes a sequence log sample template, and the sequence detection model is obtained through training in the following steps S011 to S013.
S011, acquiring a plurality of pieces of log sample information, dividing each piece of log sample information according to a preset time window, and extracting a corresponding sequence log sample template from each piece of divided log sample information; the log sample information includes normal log sample information and abnormal log sample information.
It should be noted that the sequence detection model is determined by the practical application and is not specifically limited in this embodiment. For example, sequence detection models include, but are not limited to, LSTM (Long Short-Term Memory network) or BERT (Bidirectional Encoder Representations from Transformers).
Specifically, a preset fixed time window w is defined first, and the log sample information appearing within the time window w is processed by the log template extraction method described above to obtain the corresponding sequence log sample templates. The sequence log sample templates are labeled with their template numbers while the order of the log sample information is preserved, for example: w1 = {m1, m2, m3, m4, m5, m6}.
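The windowing described above can be sketched as follows. The example assumes that each event has already been reduced to a (timestamp, template number) pair and that windows are aligned to the earliest event; both choices, and the names used, are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def window_sequences(
    events: Iterable[Tuple[datetime, str]], window_seconds: int
) -> List[List[str]]:
    """Group template numbers into fixed time windows, preserving order."""
    ordered = sorted(events, key=lambda event: event[0])
    if not ordered:
        return []
    start = ordered[0][0]  # assumption: windows start at the first event
    sequences: Dict[int, List[str]] = {}
    for timestamp, template_id in ordered:
        index = int((timestamp - start).total_seconds() // window_seconds)
        sequences.setdefault(index, []).append(template_id)
    return [sequences[index] for index in sorted(sequences)]


base = datetime(2023, 5, 31, 10, 0, 0)
sample_events = [
    (base, "m1"),
    (base + timedelta(seconds=5), "m2"),
    (base + timedelta(seconds=59), "m3"),
    (base + timedelta(seconds=60), "m4"),
    (base + timedelta(seconds=130), "m1"),
]
windows = window_sequences(sample_events, window_seconds=60)
```

With a 60-second window the five events fall into three sequences, each of which would become one training or test sample for the sequence detection model.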
S012, selecting a preset number of normal log sample information to be added into the first training set, and adding the abnormal log sample information and the normal log sample information except for the first training set into the first test set.
Specifically, the sequence detection model is trained by using normal log sample information, and the sequence detection model is tested by using normal log sample information and abnormal log sample information.
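The split in step S012 can be sketched as below: a preset number of normal samples form the training set, and the remaining normal samples together with all abnormal samples form the test set. The sample representation, the shuffling and the fixed seed are assumptions added for the example.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[str], bool]  # (template sequence, is_abnormal)


def split_for_sequence_model(
    samples: Sequence[Sample], train_size: int, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Train on normal samples only; test on leftover normal plus abnormal."""
    normal = [sample for sample in samples if not sample[1]]
    abnormal = [sample for sample in samples if sample[1]]
    if train_size > len(normal):
        raise ValueError("not enough normal samples for the training set")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(normal)
    train = normal[:train_size]
    test = normal[train_size:] + abnormal
    return train, test


labelled = [
    (["m1", "m2"], False),
    (["m1", "m3"], False),
    (["m2", "m3"], False),
    (["m1", "m4"], False),
    (["m9", "m9"], True),
    (["m8", "m7"], True),
]
train_set, test_set = split_for_sequence_model(labelled, train_size=3)
```

Training on normal samples only means the model learns what ordinary sequences look like, and abnormal samples appear only when the model is evaluated.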
And S013, performing model training and testing by adopting the first training set and the first testing set to obtain a trained sequence detection model.
Specifically, when the prediction accuracy or training times of the sequence detection model reach the preset requirement, the training is finished, and the output sequence detection model is the trained sequence detection model.
For example, referring to FIG. 6, the sequence detection model employs an LSTM whose number of units is determined by the length of the sequence log template, and each template in the sequence is the input to a single LSTM unit. When the sequence log template is w1 = {M1, M2, M3, M4, M5, M6}, the number of LSTM units is set to 5; the unit layer of M1 passes both the hidden-layer and lower-layer probabilities into the LSTM unit of M2, and so on. When the probability that the output of the unit layer of M5 is M6 is close to 1, the input log information is more likely to be normal log information; when that probability is close to 0, the log information is more likely to be abnormal log information.
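The decision rule in this example, scoring a sequence by the probability the model assigns to the template that actually follows, can be illustrated without a neural network. The sketch below deliberately swaps the LSTM for a bigram frequency table that conditions only on the previous template, so it is a toy stand-in for the idea and not the model the source describes. The class and method names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Sequence


class BigramNextTemplateModel:
    """Toy stand-in for the LSTM: estimates P(next template | previous one)."""

    def __init__(self) -> None:
        self._counts: Dict[str, Counter] = defaultdict(Counter)

    def fit(self, normal_sequences: Iterable[Sequence[str]]) -> None:
        """Count which template follows which in normal sequences."""
        for sequence in normal_sequences:
            for previous, following in zip(sequence, sequence[1:]):
                self._counts[previous][following] += 1

    def next_probability(self, history: Sequence[str], candidate: str) -> float:
        """Probability that `candidate` follows the last template in `history`."""
        if not history:
            return 0.0
        followers = self._counts.get(history[-1])
        if not followers:
            return 0.0
        return followers[candidate] / sum(followers.values())


model = BigramNextTemplateModel()
model.fit(
    [["M1", "M2", "M3", "M4", "M5", "M6"]] * 3
    + [["M1", "M2", "M3", "M4", "M5", "M7"]]
)
p_expected = model.next_probability(["M1", "M2", "M3", "M4", "M5"], "M6")
p_unseen = model.next_probability(["M1", "M2", "M3", "M4", "M5"], "M9")
```

A probability near 1 for the observed next template suggests a normal sequence and a probability near 0 suggests an abnormal one, mirroring the interpretation given for the M5 unit's output.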
Referring to fig. 7, in a specific embodiment, the several detection models in step S500 include a semantic detection model, and the semantic detection model is obtained through training in the following steps S021 to S023.
S021, taking a log template preset in a log template library as sample data.
It should be noted that the semantic detection model is determined according to the practical application and is not specifically limited in this embodiment; for example, the semantic detection model includes, but is not limited to, BERT.
The sample data includes sample data of normal log information and sample data of abnormal log information.
S022, adding the sample data to the second training set and the second test set, respectively.
The second training set and the second test set both contain sample data of normal log information and sample data of abnormal log information.
S023, performing model training and testing with the second training set and the second test set to obtain a trained semantic detection model.
Specifically, training ends when the prediction accuracy of the semantic detection model or the number of training iterations reaches the preset requirement, and the model output at that point is the trained semantic detection model.
For example, the semantic detection model employs a BERT model that includes, in order, a preprocessing layer, an encoder layer, a dropout layer, and a classifier layer. The preprocessing layer converts each text into three vectors of 128 elements each, namely input_word_ids, input_mask, and input_type_ids: input_word_ids holds the word IDs, with unused positions filled with 0; input_mask is 0 at the positions that are padding in input_word_ids and 1 elsewhere; input_type_ids distinguishes different sentences, and all of its elements are 0 in a classification problem. The encoder layer is a vectorization layer whose complete output includes mapped_output (a vector with a preset number of elements for each text), sequence_output (a vector with a preset number of elements for each word in each text), and encoder_output (the outputs of the internal units). The dropout layer reduces overfitting, and its dropout probability is set to 0.1. The classifier layer is a fully connected layer that outputs, for each text, the probability of each classification label.
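The three vectors produced by the preprocessing layer can be illustrated with a small sketch. It is a simplification under stated assumptions: the vocabulary is a toy dictionary, the text is already split into words, and the special tokens and subword splitting that a real BERT tokenizer applies are omitted. The function name `encode` is hypothetical.

```python
def encode(tokens, vocab, seq_length=128):
    """Build the three fixed-length vectors described for the preprocessing layer."""
    # Map words to IDs, falling back to the unknown-word ID; truncate if too long.
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens][:seq_length]
    pad = seq_length - len(ids)
    return {
        "input_word_ids": ids + [0] * pad,         # word IDs, padded with 0
        "input_mask": [1] * len(ids) + [0] * pad,  # 1 = real word, 0 = padding
        "input_type_ids": [0] * seq_length,        # all 0 for single-text classification
    }

# Toy vocabulary (an assumption; ID 0 is reserved for padding).
vocab = {"[PAD]": 0, "[UNK]": 1, "connection": 2, "timeout": 3}
encoded = encode(["connection", "timeout", "retry"], vocab)
```

Here "retry" is not in the toy vocabulary, so it maps to the unknown-word ID.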
S700, carrying out weighted combination according to a plurality of predicted values, and obtaining a detection result of the log information to be detected belonging to the normal log information or the abnormal log information according to a weighted combination result.
Specifically, the weighted terms include, but are not limited to, the plurality of predicted values; they may also include other detection features corresponding to the log information to be detected, and the weights of all weighted terms sum to 1. When the weight of the other detection features is set to 0, anomaly detection of the log information to be detected considers only the predicted values of the detection models; when that weight is not 0, anomaly detection considers both the predicted values of the detection models and the other detection features corresponding to the log information to be detected.
It should be noted that, in the process of performing weighted combination on the log information to be detected, the weights corresponding to the weighted terms are determined according to the actual application, and the embodiment is not particularly limited.
Referring to fig. 8, in a specific embodiment, in step S700, weighted combination is performed according to a plurality of predicted values, and a detection result of the log information to be detected belonging to the normal log information or the abnormal log information is obtained according to a result of the weighted combination, which specifically includes steps S710 to S740.
S710, extracting detection characteristic values of the log information to be detected.
It should be noted that the detection features of the log information to be detected are determined according to the practical application and are not specifically limited in this embodiment. For example, the detection features of the log information to be detected include, but are not limited to: the frequency or total number of occurrences of the log template to be detected within a certain time, the variable values in the original data, and the like. The detection feature value is determined according to the actual detection features of the log information to be detected. For example, the higher the frequency or total number of occurrences of the log template to be detected within a certain time, the higher the detection feature value.
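One possible way to turn occurrence frequency into a detection feature value is sketched below. The mapping to the range 0 to 1 and the `saturation` parameter are assumptions made so that the value can be combined with model predictions on the same scale; the embodiment only states that more occurrences give a higher value.

```python
def frequency_feature(timestamps, window_start, window_end, saturation=10):
    """Map how often a template occurs in [window_start, window_end) to [0, 1].

    `saturation` (hypothetical) is the count at which the value reaches 1.
    """
    count = sum(window_start <= ts < window_end for ts in timestamps)
    return min(count / saturation, 1.0)

# Hypothetical occurrence times (seconds) of one log template.
value = frequency_feature([1, 5, 9, 70], window_start=0, window_end=60)
```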
S720, determining the weight corresponding to the detection characteristic value and the weight corresponding to each detection model.
It should be noted that, the weight corresponding to the detection feature value is determined according to the practical application, and the embodiment is not limited specifically. The larger the influence of the detection characteristic corresponding to the detection characteristic value on the abnormal detection of the log, the larger the weight corresponding to the detection characteristic value. The weight corresponding to each detection model is also determined according to practical application, and the embodiment is not particularly limited.
S730, weighting the detection feature value and the plurality of predicted values by their corresponding weights to obtain a weighted value.
Specifically, each prediction score is calculated from a predicted value and its corresponding weight, the feature score is calculated from the detection feature value and its corresponding weight, and the prediction scores and the feature score are then added to obtain the weighted value.
S740, obtaining a detection result that the log information to be detected belongs to normal log information or abnormal log information according to the weighted value.
Specifically, normal log information and abnormal log information correspond to different numerical ranges of the weighted value. For example, when the weighted value is between 0.5 and 1, the log information to be detected belongs to normal log information; when the weighted value is between 0 and 0.5, the log information to be detected belongs to abnormal log information. If the weighted value obtained by weighting the detection feature value and the plurality of predicted values with their corresponding weights is 0.9, the log information to be detected belongs to normal log information.
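Steps S730 and S740 can be sketched as a weighted sum followed by a threshold test. The function name, the example weights, and the choice to treat a weighted value of exactly 0.5 as normal are assumptions; the embodiment leaves the weights to the application and does not say which side the boundary falls on.

```python
import math

def weighted_decision(predictions, feature_value, model_weights,
                      feature_weight, threshold=0.5):
    """Combine model predictions and a feature value into one weighted value."""
    # All weights, including the feature weight, must sum to 1.
    if not math.isclose(sum(model_weights) + feature_weight, 1.0):
        raise ValueError("weights must sum to 1")
    score = sum(w * p for w, p in zip(model_weights, predictions))
    score += feature_weight * feature_value
    label = "normal" if score >= threshold else "abnormal"
    return score, label

# Hypothetical values: sequence model 0.95, semantic model 0.9, feature value 0.8.
score, label = weighted_decision([0.95, 0.9], 0.8,
                                 model_weights=[0.4, 0.4], feature_weight=0.2)
```

Setting `feature_weight` to 0 reduces the decision to the model predictions alone, matching the case described for the weighted terms.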
In a specific embodiment, the method for detecting log anomalies further includes:
S800, updating the detection result and the corresponding log template to be detected to the log template library.
Specifically, when matching against the log templates in the log template library cannot determine whether the log information to be detected is abnormal, its abnormal condition is determined by the detection models and the weighted combination, and the detection result and the corresponding log template to be detected are then updated to the log template library. This reduces the detection time for subsequent identical or similar log information and improves detection efficiency.
Referring to fig. 9, in a specific embodiment, the detection models include a sequence detection model and a semantic detection model. A log template to be detected is extracted from the log information to be detected and is matched against the known normal templates and known abnormal templates in the log template library. If the log template to be detected belongs to a known normal template or a known abnormal template in the log template library, the detection process ends. If the abnormal condition of the log information cannot be determined through log template matching, the log template to be detected is input into the sequence detection model and the semantic detection model respectively to obtain predicted values; in addition, the detection features in the log information to be detected are extracted and the detection feature value is determined. The predicted values of the detection models and the detection feature value of the log information to be detected are combined by weighting to obtain a weighted value, and the abnormal condition of the log information to be detected is determined according to the weighted value. Finally, the log template to be detected corresponding to the log information to be detected and the detection result are fed back to the log template library through autonomous learning. The sequence detection model and the semantic detection model can be trained and tested using the log templates in the log template library as sample data.
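The two-level flow of fig. 9 can be summarized in a short sketch. The dictionary-based template library, the stub models, and the averaging `combine` function are all hypothetical placeholders for the trained models and the weighted combination of the embodiment.

```python
def detect(template, library, models, combine):
    """First match against the template library, then fall back to the models."""
    # Level 1: a known template ends the detection process immediately.
    if template in library:
        return library[template]
    # Level 2: query every detection model and combine the predicted values.
    predictions = [model(template) for model in models]
    result = combine(predictions)
    # Feed the new template and its result back into the library.
    library[template] = result
    return result

# Hypothetical library and stub models returning fixed predicted values.
library = {"user <*> logged in": "normal", "disk <*> failed": "abnormal"}
models = [lambda t: 0.2, lambda t: 0.3]
combine = lambda preds: "normal" if sum(preds) / len(preds) >= 0.5 else "abnormal"

known = detect("user <*> logged in", library, models, combine)
unknown = detect("kernel panic at <*>", library, models, combine)
```

After the second call the new template is stored, so a later identical template is resolved by the first-level match alone.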
According to the log abnormality detection method, device, and storage medium described above, a log template to be detected is extracted from the log information to be detected and is matched against the log templates preset in the log template library to determine whether the log information to be detected belongs to known normal log information or known abnormal log information. When log template matching cannot determine whether the log information to be detected is abnormal, the log template to be detected is input into a plurality of detection models of different types to obtain a plurality of predicted values, and the predicted values are combined by weighting to predict whether the log information to be detected is abnormal. The embodiment of the invention therefore first performs first-level detection on the log information to be detected through log template matching and, when matching cannot determine whether the log information is abnormal, performs second-level detection through the weighted combination of a plurality of detection models. This realizes multi-level detection of the log information to be detected so as to cover various application scenarios; at the same time, adjusting the weights allows adaptation to different application scenarios and improves the accuracy of log abnormality detection.
As shown in fig. 10, an embodiment of the present invention provides a log abnormality detection apparatus, including:
the first module is used for acquiring log information to be detected and extracting a log template to be detected according to the log information to be detected;
the second module is used for matching the log template to be detected with a log template preset in a log template library to obtain a matching result of whether the log information to be detected belongs to known normal log information or known abnormal log information;
the third module is used for inputting the log template to be detected into a plurality of detection models corresponding to a plurality of features to obtain a plurality of predicted values when the matching result is that the log information to be detected does not belong to known normal log information and does not belong to known abnormal log information;
and the fourth module is used for carrying out weighted combination according to a plurality of predicted values and obtaining a detection result of the log information to be detected belonging to normal log information or abnormal log information according to a weighted combination result.
The specific functional implementation manners of the first module, the second module, the third module, and the fourth module may refer to step S100 to step S400 in the corresponding embodiment of fig. 1, which are not described herein.
Referring to fig. 11, which is a schematic diagram of another log abnormality detection device provided in an embodiment of the present invention, an embodiment of the present invention further provides a data processing device 100, which includes: a memory 101, a processor 102, and a computer program stored on the memory 101 and executable on the processor 102.
The processor 102 and the memory 101 may be connected by a bus or other means.
The non-transitory software programs and instructions required to implement the data processing method of the above-described embodiments are stored in the memory 101, and when executed by the processor 102, perform the data processing method of the data processing apparatus in the above-described embodiments, for example, perform the method steps S100 to S400 in fig. 1, the method steps S110A to S130A in fig. 2, the method steps S131 to S133 in fig. 3, the method steps S110B to S130B in fig. 4, the method steps S011 to S013 in fig. 5, the method steps S021 to S023 in fig. 7, and the method steps S710 to S740 in fig. 8, which are described above.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions. When the instructions are executed by a processor 102 or a controller, for example by a processor 102 in the above embodiment of the data processing apparatus 100, they cause the processor 102 to execute the data processing method applied to the data processing apparatus 100 in the above embodiment, for example, the method steps S100 to S400 in fig. 1, the method steps S110A to S130A in fig. 2, the method steps S131 to S133 in fig. 3, the method steps S110B to S130B in fig. 4, the method steps S011 to S013 in fig. 5, the method steps S021 to S023 in fig. 7, and the method steps S710 to S740 in fig. 8 described above.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As known to those skilled in the art, the term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the above embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present invention, and these equivalent modifications and substitutions are intended to be included in the scope of the present invention as defined in the appended claims.

Claims (12)

1. A method for detecting log anomalies, the method comprising the steps of:
acquiring log information to be detected, and extracting a log template to be detected according to the log information to be detected;
matching the log template to be detected with a preset log template in a log template library to obtain a matching result of whether the log information to be detected belongs to known normal log information or known abnormal log information;
when the matching result is that the log information to be detected does not belong to the known normal log information and does not belong to the known abnormal log information, the log template to be detected is respectively input into a plurality of detection models corresponding to a plurality of features to obtain a plurality of predicted values;
and carrying out weighted combination according to a plurality of predicted values, and obtaining a detection result of the log information to be detected belonging to normal log information or abnormal log information according to a weighted combination result.
2. The detection method according to claim 1, wherein the performing weighted combination according to the plurality of predicted values, and obtaining the detection result of the log information to be detected belonging to the normal log information or the abnormal log information according to the result of the weighted combination specifically includes:
extracting detection characteristic values of the log information to be detected;
determining the weight corresponding to the detection characteristic value and the weight corresponding to each detection model;
weighting the weight corresponding to the detection characteristic value and the weight corresponding to the plurality of predicted values to obtain a weighted value;
and obtaining a detection result that the log information to be detected belongs to normal log information or abnormal log information according to the weighted value.
3. The method of detection according to claim 1, wherein the method of detection further comprises:
and when the matching result is that the log information to be detected belongs to known normal log information or known abnormal log information, taking the matching result as a detection result.
4. The detection method according to claim 1, wherein the extracting the log template to be detected according to the log information to be detected specifically includes:
extracting time and log content of the log information to be detected to obtain extracted log information;
combining information belonging to the same type of log in the extracted log information to obtain combined log information;
and filtering variable information of the combined log information to obtain the log template to be detected.
5. The detection method according to claim 1, wherein the extracting the log template to be detected according to the log information to be detected specifically includes:
combining information belonging to the same log in the log information to be detected to obtain combined log information;
extracting time and log content of the combined log information to obtain extracted log information;
and filtering variable information of the extracted log information to obtain the log template to be detected.
6. The detection method according to claim 5, wherein the variable information filtering is performed on the extracted log information to obtain the log template to be detected, and the method specifically includes:
performing word segmentation and morphological restoration on the extracted log information to obtain morphologically restored log information;
performing readability filtering on the log information subjected to morphological restoration to obtain log information subjected to readability filtering;
and carrying out similarity comparison on the log information subjected to the readability filtering to obtain the log template to be detected.
7. The method according to claim 1, wherein the plurality of detection models comprises a sequence detection model, the log sample template comprises a sequence log sample template, and the sequence detection model is obtained by training the following steps:
acquiring a plurality of pieces of log sample information, dividing each piece of log sample information according to a preset time window, and extracting a corresponding sequence log sample template from each piece of divided log sample information; the log sample information comprises normal log sample information and abnormal log sample information;
selecting a preset number of normal log sample information to be added into a first training set, and adding the abnormal log sample information and the normal log sample information except the first training set into a first test set;
and performing model training and testing by adopting the first training set and the first testing set to obtain the trained sequence detection model.
8. The detection method according to claim 1, wherein the plurality of detection models comprises a semantic detection model, the semantic detection model being trained by:
taking a log template preset in a log template library as sample data;
adding the sample data to a second training set and a second test set, respectively;
and performing model training and testing by adopting the second training set and the second testing set to obtain the trained semantic detection model.
9. The method of detection according to claim 1, wherein the method of detection further comprises:
and updating the detection result and the corresponding log template to be detected to the log template library.
10. A log abnormality detection apparatus, characterized by comprising:
the first module is used for acquiring log information to be detected and extracting a log template to be detected according to the log information to be detected;
the second module is used for matching the log template to be detected with a log template preset in a log template library to obtain a matching result of whether the log information to be detected belongs to known normal log information or known abnormal log information;
the third module is used for inputting the log template to be detected into a plurality of detection models corresponding to a plurality of features to obtain a plurality of predicted values when the matching result is that the log information to be detected does not belong to known normal log information and does not belong to known abnormal log information;
and the fourth module is used for carrying out weighted combination according to a plurality of predicted values and obtaining a detection result of the log information to be detected belonging to normal log information or abnormal log information according to a weighted combination result.
11. A log abnormality detection apparatus, characterized in that the apparatus includes a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for enabling connection communication between the processor and the memory, the program when executed by the processor implementing the steps of the log abnormality detection method according to any one of claims 1 to 9.
12. A storage medium for computer-readable storage, wherein the storage medium stores one or more programs executable by one or more processors to implement the steps of the method for detecting a log anomaly of any one of claims 1 to 9.
CN202210751478.2A 2022-06-29 2022-06-29 Log abnormality detection method, device and storage medium Pending CN117389821A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210751478.2A CN117389821A (en) 2022-06-29 2022-06-29 Log abnormality detection method, device and storage medium
PCT/CN2023/097504 WO2024001656A1 (en) 2022-06-29 2023-05-31 Method and device for detecting abnormal log, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210751478.2A CN117389821A (en) 2022-06-29 2022-06-29 Log abnormality detection method, device and storage medium

Publications (1)

Publication Number Publication Date
CN117389821A true CN117389821A (en) 2024-01-12

Family

ID=89382809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210751478.2A Pending CN117389821A (en) 2022-06-29 2022-06-29 Log abnormality detection method, device and storage medium

Country Status (2)

Country Link
CN (1) CN117389821A (en)
WO (1) WO2024001656A1 (en)

Also Published As

Publication number Publication date
WO2024001656A1 (en) 2024-01-04

Legal Events

Date Code Title Description
PB01 Publication