CN114064434A - Early warning method and device for log abnormity, electronic equipment and storage medium - Google Patents

Early warning method and device for log abnormity, electronic equipment and storage medium

Info

Publication number
CN114064434A
Authority
CN
China
Prior art keywords
template
log
identifier
classification
log data
Prior art date
Legal status
Pending
Application number
CN202111362830.5A
Other languages
Chinese (zh)
Inventor
简拥军
金勇�
吴泽君
雷发林
常冬冬
高阳
周明宏
王艳华
李国莹
苑志云
梁晓冬
Current Assignee
CCB Finetech Co Ltd
Original Assignee
CCB Finetech Co Ltd
Application filed by CCB Finetech Co Ltd
Priority to CN202111362830.5A
Publication of CN114064434A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a log anomaly early warning method and device, electronic equipment and a storage medium. The method comprises the following steps: classifying and serializing each piece of log data in sequence to obtain a template sequence corresponding to each classification identifier; counting the number of template identifiers included in the template sequence corresponding to each classification identifier and, for each classification identifier whose number of template identifiers equals a preset number, predicting from its template sequence to obtain a predicted template identifier of the next piece of log data corresponding to that classification identifier; according to the predicted template identifier and the actual template identifier of that next piece of log data, counting the prediction failure amount within a preset time period; and if the prediction failure amount is greater than a set threshold, performing log anomaly early warning. The device is used for executing the method. The log anomaly early warning method and device, electronic equipment and storage medium provided by the embodiments of the invention improve the reliability of log anomaly early warning.

Description

Early warning method and device for log abnormity, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a log abnormity early warning method and device, electronic equipment and a storage medium.
Background
At present, almost all backend services generate log data in real time during running, and events and system states of the backend services during running are recorded. The log data plays an important role in anomaly detection and early warning of the operation and maintenance monitoring system.
In the prior art, a large online service is usually developed and maintained by hundreds of developers and operation and maintenance personnel, each of whom tends to analyze log anomalies from a local view, which easily leads to inaccurate log analysis. Furthermore, as the number of logs proliferates, manually analyzing log data for anomalies becomes impractical. The traditional method for automatically detecting log data anomalies is keyword matching based on regular expressions, which requires a keyword library built from experience. Because the keywords cannot be enumerated exhaustively, keywords are easily omitted and noise is easily introduced, so the reliability of anomaly detection is low.
Disclosure of Invention
For solving the problems in the prior art, embodiments of the present invention provide a method and an apparatus for early warning of log anomaly, an electronic device, and a storage medium, which can at least partially solve the problems in the prior art.
On one hand, the invention provides a log abnormity early warning method, which comprises the following steps:
classifying and serializing each piece of log data in sequence to obtain a template sequence corresponding to each classification identifier; each piece of log data corresponds to one classification identifier;
counting the number of template identifiers included in the template sequence corresponding to each classification identifier, and, based on a log prediction model, predicting on the template sequence of each classification identifier whose number of template identifiers equals a preset number, to obtain the predicted template identifier of the next piece of log data corresponding to that classification identifier; the log prediction model is obtained by pre-training based on historical log data;
according to the predicted template identifier and the actual template identifier of the next piece of log data corresponding to each classification identifier whose number of template identifiers equals the preset number, counting the prediction failure amount for such next pieces of log data within a preset time period;
and if the prediction failure amount is determined to be greater than a set threshold, performing log anomaly early warning.
In another aspect, the present invention provides an early warning apparatus for log anomaly, including:
the obtaining module is used for sequentially classifying and serializing each piece of log data to obtain a template sequence corresponding to each classification identifier; each piece of log data corresponds to one classification identifier;
the prediction module is used for counting the number of template identifiers included in the template sequence corresponding to each classification identifier, and, based on a log prediction model, predicting on the template sequence of each classification identifier whose number of template identifiers equals a preset number, to obtain the predicted template identifier of the next piece of log data corresponding to that classification identifier; the log prediction model is obtained by pre-training based on historical log data;
the statistical module is used for counting, according to the predicted template identifier and the actual template identifier of the next piece of log data corresponding to each classification identifier whose number of template identifiers equals the preset number, the prediction failure amount for such next pieces of log data within a preset time period;
and the early warning module is used for performing log anomaly early warning after determining that the prediction failure amount is greater than a set threshold.
In another aspect, the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the log anomaly warning method according to any one of the above embodiments.
In still another aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the log anomaly warning method according to any one of the above embodiments.
In a further aspect, the present invention provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the log anomaly warning method according to any one of the above embodiments.
The log anomaly early warning method and device, the electronic equipment and the storage medium provided by the embodiments of the invention classify and serialize each piece of log data in sequence to obtain a template sequence corresponding to each classification identifier, and count the number of template identifiers included in each template sequence. For each classification identifier whose number of template identifiers equals the preset number, the template sequence is predicted on based on the log prediction model to obtain the predicted template identifier of the next piece of log data corresponding to that classification identifier. According to the predicted template identifier and the actual template identifier of that next piece of log data, the prediction failure amount within a preset time period is counted. If the prediction failure amount is determined to be greater than the set threshold, log anomaly early warning is performed, which improves the reliability of log anomaly early warning.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. In the drawings:
Fig. 1 is a schematic flowchart of a log anomaly warning method according to a first embodiment of the present invention.
Fig. 2 is a schematic flowchart of a log anomaly warning method according to a second embodiment of the present invention.
Fig. 3 is a flowchart illustrating a log anomaly warning method according to a third embodiment of the present invention.
Fig. 4 is a schematic flowchart of a log anomaly warning method according to a fourth embodiment of the present invention.
Fig. 5 is a flowchart illustrating an early warning method for log anomalies according to a fifth embodiment of the present invention.
Fig. 6 is a flowchart illustrating an early warning method for log anomalies according to a sixth embodiment of the present invention.
Fig. 7 is a flowchart illustrating an early warning method for log anomalies according to a seventh embodiment of the present invention.
Fig. 8 is a flowchart illustrating an early warning method for log anomalies according to an eighth embodiment of the present invention.
Fig. 9 is a flowchart illustrating an early warning method for log anomalies according to a ninth embodiment of the present invention.
Fig. 10 is a schematic structural diagram of an early warning apparatus for log abnormality according to a tenth embodiment of the present invention.
Fig. 11 is a schematic structural diagram of an early warning device for log abnormality according to an eleventh embodiment of the present invention.
Fig. 12 is a schematic structural diagram of an early warning device for log abnormality according to a twelfth embodiment of the present invention.
Fig. 13 is a schematic structural diagram of an early warning device for log abnormality according to a thirteenth embodiment of the present invention.
Fig. 14 is a schematic structural diagram of an early warning device for log abnormality according to a fourteenth embodiment of the present invention.
Fig. 15 is a schematic structural diagram of an early warning device for log abnormality according to a fifteenth embodiment of the present invention.
Fig. 16 is a schematic structural diagram of an early warning device for log abnormality according to a sixteenth embodiment of the present invention.
Fig. 17 is a schematic structural diagram of an early warning device for log abnormality according to a seventeenth embodiment of the present invention.
Fig. 18 is a schematic structural diagram of an early warning apparatus for log abnormality according to an eighteenth embodiment of the present invention.
Fig. 19 is a schematic physical structure diagram of an electronic device according to a nineteenth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
In order to facilitate understanding of the technical solutions provided in the present application, the following first describes relevant contents of the technical solutions in the present application. Log data is text data, and normal logs differ markedly from abnormal logs. Current deep learning techniques perform well in text data mining; in particular, models such as BERT (Bidirectional Encoder Representations from Transformers), the Gated Recurrent Unit (GRU), Bidirectional Long Short-Term Memory (BiLSTM) and the attention mechanism (Attention) are widely applied in the field of text analysis. Against the background of large-scale online services and massive log data processing, in order to find anomalies in log data more quickly and improve the automation of log data processing, the invention trains a log prediction model using historical log data and deep learning techniques, and then predicts on online log data based on the log prediction model so as to give early warning of abnormal log data.
The following describes a specific implementation process of the log anomaly early warning method provided by the embodiment of the present invention, taking a server as an execution subject as an example. It can be understood that the execution subject of the log anomaly early warning method provided by the embodiment of the invention is not limited to the server.
Fig. 1 is a schematic flow diagram of a method for early warning of log anomalies according to a first embodiment of the present invention, and as shown in fig. 1, the method for early warning of log anomalies according to the embodiment of the present invention includes:
S101, sequentially classifying and serializing each piece of log data to obtain a template sequence corresponding to each classification identifier; each piece of log data corresponds to one classification identifier;
Specifically, the server may obtain real-time log data and then classify and serialize each piece of log data in sequence to obtain a template sequence corresponding to each classification identifier. The template sequence includes a plurality of template identifiers arranged in order of the timestamps of the corresponding pieces of log data, and can be regarded as time series data. The classification identifier is preset and may be a process number or a thread number. Each piece of log data includes the classification identifier.
S102, counting the number of template identifications included in the template sequence corresponding to each classification identification, predicting the template sequences corresponding to the classification identifications with the number equal to the preset number based on the log prediction model, and obtaining the prediction template identifications of the next log data corresponding to the classification identifications with the number equal to the preset number; the log prediction model is obtained by pre-training based on historical log data;
Specifically, since the real-time log data is constantly changing, the number of template identifiers included in the template sequence corresponding to each classification identifier also changes. The server counts the number of template identifiers included in the template sequence corresponding to each classification identifier. When the number for a classification identifier equals the preset number, the server predicts on that template sequence based on the log prediction model and obtains the predicted template identifier of the next piece of log data corresponding to that classification identifier; the predicted template identifier may include multiple template identifiers. The log prediction model is obtained by pre-training based on historical log data. The preset number is set according to actual needs, and the embodiment of the invention is not limited.
It should be noted that, for a classification identifier whose number of template identifiers equals the preset number, the next piece of log data is the piece that follows the preset number of pieces of log data underlying its template sequence. For example, if the preset number is n, the next piece of log data is the (n+1)-th piece of log data. These n+1 pieces of log data share the same classification identifier and are ordered in time.
It can be understood that after the number of the template identifiers included in the template sequence corresponding to the classification identifier is equal to the preset number, the template sequence corresponding to the classification identifier is obtained again.
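The accumulate-then-predict behaviour of step S102 can be sketched as follows. This is a minimal illustration and not the patented implementation: the class name `SequenceBuffer`, the `predict` callable standing in for the log prediction model, and the choice to clear the sequence once it reaches the preset number are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Optional


class SequenceBuffer:
    """Accumulates template identifiers per classification identifier (hypothetical helper)."""

    def __init__(self, preset_number: int, predict: Callable[[List[str]], List[str]]):
        self.preset_number = preset_number
        self.predict = predict  # stand-in for the trained log prediction model
        self.sequences: Dict[str, List[str]] = {}

    def add(self, class_id: str, template_id: str) -> Optional[List[str]]:
        """Append one template identifier; return predictions once the sequence is full."""
        seq = self.sequences.setdefault(class_id, [])
        seq.append(template_id)
        if len(seq) == self.preset_number:
            predicted = self.predict(list(seq))
            # Assumption: the sequence for this classification identifier starts over.
            self.sequences[class_id] = []
            return predicted
        return None
```

With a preset number of 3, the first two calls to `add` for a given classification identifier return `None`, and the third returns whatever candidate template identifiers the supplied `predict` callable produces.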
S103, according to the predicted template identifications and the actual template identifications of the next log data corresponding to the classification identifications with the number equal to the preset number, counting the prediction failure amount corresponding to the next log data corresponding to the classification identifications with the number equal to the preset number in the preset time period;
Specifically, for a classification identifier whose number of template identifiers equals the preset number, the server may obtain the next piece of log data, perform template matching on it to obtain its template identifier, and use that template identifier as the actual template identifier of the next piece of log data. The server can then determine, from the predicted template identifier and the actual template identifier of the next piece of log data, whether the template identifier prediction for that piece of log data has failed. Within the preset time period, the server may obtain a plurality of such next pieces of log data, determine for each whether its template identifier prediction has failed, and count the total number of failed predictions within the preset time period as the prediction failure amount. The preset time period is set according to actual needs, and the embodiment of the invention is not limited.
For example, after obtaining the predicted template identifier and the actual template identifier of the next log data, the server compares the actual template identifier with each template identifier included in the predicted template identifiers, and if a template identifier identical to the actual template identifier exists in the template identifiers included in the predicted template identifiers, it may be determined that the template identifier of the next log data is predicted successfully. If the template identifier included in the predicted template identifier does not have the same template identifier as the actual template identifier, it may be determined that the template identifier of the next log data is unsuccessfully predicted.
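The comparison described above amounts to a membership test. A minimal sketch, in which the function name `prediction_failed` is a hypothetical choice for this example:

```python
from typing import Iterable


def prediction_failed(predicted_ids: Iterable[str], actual_id: str) -> bool:
    """Return True when the actual template identifier is not among the predicted ones."""
    return actual_id not in set(predicted_ids)
```

For instance, if the predicted template identifiers are `["T1", "T4"]`, an actual identifier of `"T4"` counts as a successful prediction and `"T9"` counts as a failed one.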
For example, within the preset time period, template identifier prediction fails 3 times for the next pieces of log data corresponding to classification identifier a, 2 times for classification identifier b, and 1 time for classification identifier c. The server therefore counts the prediction failure amount within the preset time period as 3 + 2 + 1 = 6.
And S104, if the prediction failure amount is judged to be larger than the set threshold value, performing log abnormity early warning.
Specifically, after obtaining the prediction failure amount within the preset time period, the server compares the prediction failure amount with a set threshold, and if the prediction failure amount is greater than the set threshold, log anomaly early warning can be performed by means of a system message, an email or the like. The set threshold is set according to actual needs, and the embodiment of the present invention is not limited.
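Steps S103 and S104 together can be sketched as a sum followed by a strict comparison. The function names and the dictionary layout (failure count per classification identifier within one preset time period) are assumptions made for illustration.

```python
from typing import Dict


def prediction_failure_amount(failures_by_class: Dict[str, int]) -> int:
    """Total failed predictions across all classification identifiers in the period."""
    return sum(failures_by_class.values())


def should_warn(failures_by_class: Dict[str, int], threshold: int) -> bool:
    """Warn only when the failure amount is strictly greater than the threshold."""
    return prediction_failure_amount(failures_by_class) > threshold
```

With failure counts of 3, 2 and 1 for three classification identifiers, the failure amount is 6, so a threshold of 5 triggers a warning while a threshold of 6 does not.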
The log anomaly early warning method provided by the embodiment of the invention classifies and serializes each piece of log data in sequence to obtain a template sequence corresponding to each classification identifier, and counts the number of template identifiers included in each template sequence. For each classification identifier whose number of template identifiers equals the preset number, the template sequence is predicted on based on the log prediction model to obtain the predicted template identifier of the next piece of log data corresponding to that classification identifier. According to the predicted template identifier and the actual template identifier of that next piece of log data, the prediction failure amount within the preset time period is counted. If the prediction failure amount is determined to be greater than the set threshold, log anomaly early warning is performed, which improves the reliability of log anomaly early warning.
Fig. 2 is a schematic flow chart of an early warning method for log anomalies according to a second embodiment of the present invention, and as shown in fig. 2, on the basis of the foregoing embodiments, further, the sequentially classifying and serializing each piece of log data to obtain a template sequence corresponding to each classification identifier includes:
S201, performing template matching on each piece of log data to obtain a template identifier corresponding to each piece of log data;
Specifically, the server may obtain log data, perform template matching on each piece of log data to obtain the log template matched with it, and use the template identifier of the matched log template as the template identifier corresponding to that piece of log data.
A log template library can be established in advance; it includes a plurality of log templates, each with a unique template identifier. Each piece of log data consists of fixed character strings and variables. Different log templates can be obtained by clustering historical log data, extracting the fixed character strings and replacing the variables with wildcards. Log templates correspond to template identifiers one to one. Each log template includes template content, and the template content includes a regular expression formed from the fixed character strings and wildcards, which is used to match log data. If a piece of log data matches the regular expression of a log template, the log data matches that log template, and the template identifier of that log template is used as the template identifier of the log data. The log templates can be extracted with the LogMine tool to establish the log template library.
When the log data are subjected to template matching, each log template in the log template library is traversed, whether a regular expression matched with the log data exists in each log template is judged, and if the regular expression matched with the log data exists, the log data are matched with the log template, and then the template identifier corresponding to the log template to which the regular expression belongs is used as the template identifier corresponding to the log data.
In addition, if the log template matched with the log data is not found in the log template library, the log template extraction is carried out on the log data of which the matched log template is not found, and a new log template is established.
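The template matching of step S201 can be sketched as a traversal of a small library of regular expressions. The two templates, their identifiers `T1` and `T2`, and the use of whole-message matching are hypothetical choices for this example; the source does not specify the template contents or the matching mode.

```python
import re
from typing import Optional

# Hypothetical template library: template identifier -> regular expression.
# Fixed strings are kept literally; variables are replaced by wildcard patterns.
TEMPLATE_LIBRARY = {
    "T1": re.compile(r"Connection from \S+ closed"),
    "T2": re.compile(r"User \S+ logged in after \d+ ms"),
}


def match_template(message: str) -> Optional[str]:
    """Return the identifier of the first template whose regex matches the whole message."""
    for template_id, pattern in TEMPLATE_LIBRARY.items():
        if pattern.fullmatch(message):
            return template_id
    # No template matched: the caller would extract a new template for this message.
    return None
```

A `None` result corresponds to the case in which no matching log template is found and a new log template has to be established.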
S202, classifying each piece of log data according to the classification identifier to obtain a classification identifier corresponding to each piece of log data;
Specifically, for each piece of log data, the server searches the log data for the classification identifier, and if the log data includes the classification identifier, that identifier is used as the classification identifier corresponding to the log data. There is no required order between step S201 and step S202. The classification identifier may be a process number or a thread number, and is set according to actual needs, which is not limited in the embodiments of the present invention.
S203, obtaining a template sequence corresponding to each classification identifier according to the classification identifier corresponding to each log data, the template identifier corresponding to each log data and the timestamp corresponding to each log data.
Specifically, the server sorts the template identifiers corresponding to the log data according to the sequence of the timestamps corresponding to the log data, and obtains a template sequence corresponding to each classification identifier. For real-time log data, after obtaining a classification identifier and a template identifier corresponding to the latest log data, whether a template sequence corresponding to the classification identifier is already established may be queried according to the classification identifier, and if the template sequence corresponding to the classification identifier is already established, the template identifier corresponding to the latest log data may be directly used as the last template identifier of the template sequence corresponding to the classification identifier. And if the template sequence corresponding to the classification identification is not established, establishing the template sequence corresponding to the classification identification, and taking the template identification corresponding to the latest log data as the first template identification of the template sequence corresponding to the classification identification.
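Step S203 can be sketched as grouping by classification identifier after sorting by timestamp. The record layout `(timestamp, classification_id, template_id)` and the function name are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (timestamp, classification_id, template_id).
LogRecord = Tuple[float, str, str]


def build_template_sequences(records: List[LogRecord]) -> Dict[str, List[str]]:
    """Group template identifiers by classification identifier in timestamp order."""
    sequences: Dict[str, List[str]] = defaultdict(list)
    for _, class_id, template_id in sorted(records, key=lambda r: r[0]):
        # Appending creates the sequence on first use, otherwise extends it at the end.
        sequences[class_id].append(template_id)
    return dict(sequences)
```

For real-time data the same append logic applies one record at a time: the newest template identifier becomes either the first element of a new sequence or the last element of an existing one.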
Fig. 3 is a schematic flow chart of an early warning method for log anomalies according to a third embodiment of the present invention, and as shown in fig. 3, on the basis of the foregoing embodiments, further, the predicting, based on the log prediction model, the template sequence corresponding to the classification identifiers whose number of template identifiers is equal to the preset number, and obtaining the predicted template identifier of the next log data corresponding to the classification identifier whose number of template identifiers is equal to the preset number includes:
S301, performing feature processing on the template sequences corresponding to the classification identifiers whose number of template identifiers equals the preset number, to obtain prediction feature data corresponding to the classification identifiers;
Specifically, after determining that the number of template identifiers included in the template sequence corresponding to a classification identifier equals the preset number, the server performs feature processing on that template sequence to obtain the prediction feature data corresponding to the classification identifier. The specific process of the feature processing is described below, and is not described herein again.
S302, according to the prediction characteristic data corresponding to the classification identification and the log prediction model, obtaining a prediction template identification of the next log data corresponding to the classification identification.
Specifically, after obtaining the predicted feature data corresponding to the classification identifier, the server inputs the predicted feature data corresponding to the classification identifier into a log prediction model, and the predicted template identifier of the next log data corresponding to the classification identifier can be output through processing of the log prediction model, where the predicted template identifier may include multiple template identifiers and is set according to actual needs, which is not limited in the embodiment of the present invention.
Fig. 4 is a schematic flow chart of an early warning method for log anomalies according to a fourth embodiment of the present invention. As shown in Fig. 4, on the basis of the foregoing embodiments, performing feature processing on the template sequence corresponding to the classification identifier whose number of template identifiers equals the preset number, to obtain the prediction feature data corresponding to the classification identifier, includes:
S401, obtaining a log template vector corresponding to each template identifier according to each template identifier included in the template sequence and a template vector library; the template vector library is pre-established and includes log template vectors in one-to-one correspondence with the template identifiers;
Specifically, the server queries the template vector library for the log template vector corresponding to each template identifier included in the template sequence, and thereby obtains the log template vector corresponding to each template identifier in the template sequence. The template vector library is pre-established and includes a plurality of log template vectors in one-to-one correspondence with the template identifiers.
S402, counting the number of occurrences of each identical template identifier in the template sequence as the frequency count corresponding to that template identifier;
Specifically, the server counts, for each distinct template identifier included in the template sequence, how many times it occurs, and uses that number as the frequency count corresponding to the template identifier. In this way the frequency count of every distinct template identifier in the template sequence is obtained.
For example, the template sequence X includes 10 template identifiers arranged in sequence, where there are 4 template identifiers a, 2 template identifiers b, 3 template identifiers c, and 1 template identifier d, then the server may count that the frequency of the template identifier a is 4, the frequency of the template identifier b is 2, the frequency of the template identifier c is 3, and the frequency of the template identifier d is 1.
S403, taking the log template vector corresponding to each template identifier in the template sequence and the corresponding frequency count as the prediction feature data corresponding to the classification identifier.
Specifically, the server uses the log template vector corresponding to each template identifier in the template sequence and the frequency count corresponding to each template identifier in the template sequence as the prediction feature data corresponding to the classification identifier corresponding to the template sequence.
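Steps S401 to S403 can be sketched as follows. The embodiment does not state how the log template vector and the frequency count are combined, so pairing them position by position is an assumption, as are the function name `build_prediction_features` and the two-dimensional example vectors. The template sequence reproduces the example above (4 a, 2 b, 3 c, 1 d).

```python
from collections import Counter
from typing import Dict, List, Tuple

Vector = List[float]


def build_prediction_features(
    template_sequence: List[str],
    vector_library: Dict[str, Vector],
) -> List[Tuple[Vector, int]]:
    """Pair each template identifier's vector with its frequency count."""
    # S402: frequency count of every distinct template identifier.
    counts = Counter(template_sequence)
    features = []
    for template_id in template_sequence:
        # S401: look up the pre-established log template vector.
        vector = vector_library[template_id]
        # S403: vector plus frequency count form the feature (assumed layout).
        features.append((vector, counts[template_id]))
    return features


# Hypothetical template vector library with 2-dimensional vectors.
vector_library = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.5, 0.5],
    "d": [0.2, 0.8],
}
sequence_x = ["a", "b", "a", "c", "d", "a", "c", "b", "c", "a"]
features = build_prediction_features(sequence_x, vector_library)
print(features[0])  # vector of "a" and its frequency count
```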
Fig. 5 is a schematic flow chart of an early warning method for log anomalies according to a fifth embodiment of the present invention. As shown in Fig. 5, on the basis of the foregoing embodiments, the step of establishing the template vector library includes:
S501, obtaining a word vector corresponding to each word in a log template according to the log template and a word vector library; the log templates are in one-to-one correspondence with the template identifiers; the word vector library is obtained in advance;
Specifically, for each word in the template content of the log template, the server queries the word vector library for the corresponding word vector. The word vector library is obtained in advance, and the log templates are in one-to-one correspondence with the template identifiers.
The word vector library may be obtained by training word vectors on historical log data with an LRWE model; a synonym library and an antonym library are also required as inputs of the LRWE model during training. The synonym library and the antonym library are built by extracting synonyms and antonyms from the historical log data, which may be done manually based on operation and maintenance experience.
S502, calculating the average of the word vectors corresponding to all words included in the log template as the corresponding log template vector;
Specifically, after obtaining the word vector corresponding to each word in the log template, the server sums the word vectors and divides the sum by the number of words to obtain the log template vector corresponding to the log template.
S503, correspondingly storing the log template vector corresponding to the log template and the corresponding template identifier.
Specifically, the server stores the template vector corresponding to the log template and the template identifier corresponding to the log template correspondingly. And the log template vectors and the corresponding template identifications form the template vector library.
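The construction of the template vector library in steps S501 to S503 can be sketched as below. Splitting the template content on whitespace and the example words and vectors are assumptions made for illustration; the sketch also assumes every word of a template is present in the word vector library.

```python
from typing import Dict, List

Vector = List[float]


def template_vector(template_text: str, word_vectors: Dict[str, Vector]) -> Vector:
    """S501-S502: average the word vectors of all words in one log template."""
    words = template_text.split()  # assumed tokenization: whitespace
    vectors = [word_vectors[word] for word in words]
    dim = len(vectors[0])
    # Sum the vectors component-wise, then divide by the number of words.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def build_template_vector_library(
    templates: Dict[str, str], word_vectors: Dict[str, Vector]
) -> Dict[str, Vector]:
    """S503: store each log template vector under its template identifier."""
    return {
        template_id: template_vector(text, word_vectors)
        for template_id, text in templates.items()
    }


# Hypothetical word vector library and log templates.
word_vectors = {
    "connection": [1.0, 0.0],
    "closed": [0.0, 1.0],
    "timeout": [1.0, 1.0],
}
templates = {"t1": "connection closed", "t2": "connection timeout"}
library = build_template_vector_library(templates, word_vectors)
print(library)
```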
Fig. 6 is a schematic flow chart of an early warning method for log anomalies according to a sixth embodiment of the present invention. As shown in Fig. 6, on the basis of the foregoing embodiments, counting, according to the predicted template identifier and the actual template identifier of the next log data corresponding to the classification identifier whose number of template identifiers equals the preset number, the prediction failure amount corresponding to that next log data within the preset time period includes:
S601, performing template matching on the next log data corresponding to the classification identifier to obtain the actual template identifier of the next log data corresponding to the classification identifier;
Specifically, the server may obtain the next log data corresponding to the classification identifier whose number of template identifiers equals the preset number, perform template matching on it to obtain the log template that matches it, and use the template identifier of that log template as the actual template identifier of the next log data corresponding to the classification identifier.
S602, if it is determined that the predicted template identifier of the next log data corresponding to the classification identifier does not include the actual template identifier of that next log data, recording that the prediction of the next log data corresponding to the classification identifier failed.
Specifically, the server compares the actual template identifier of the next log data corresponding to the classification identifier with each template identifier in the predicted template identifier of that next log data. If none of them is identical to the actual template identifier, the server records that the prediction of the next log data corresponding to the classification identifier failed. If one of them is identical to the actual template identifier, the server may record that the prediction of the next log data corresponding to the classification identifier succeeded.
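The comparison in step S602 amounts to a membership test, sketched below. The function name `prediction_failed` and the example identifiers are illustrative assumptions.

```python
from typing import Iterable


def prediction_failed(predicted_ids: Iterable[str], actual_id: str) -> bool:
    """Return True when the actual template identifier is not among the
    predicted template identifiers (prediction failure, step S602)."""
    return actual_id not in set(predicted_ids)


# Hypothetical predicted template identifiers for one classification identifier.
predicted_ids = ["a", "c"]
print(prediction_failed(predicted_ids, "c"))  # actual identifier was predicted
print(prediction_failed(predicted_ids, "d"))  # actual identifier was not predicted
```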
Fig. 7 is a schematic flow chart of an early warning method for log anomalies according to a seventh embodiment of the present invention, and as shown in fig. 7, on the basis of the foregoing embodiments, further, the step of obtaining a log prediction model based on historical log data training includes:
S701, classifying and serializing the historical log data to obtain a plurality of fixed template sequences; each fixed template sequence corresponds to one classification identifier;
Specifically, the server may classify and serialize the historical log data to obtain a plurality of fixed template sequences, where each fixed template sequence includes the same number of template identifiers and corresponds to one classification identifier. The number of template identifiers included in each fixed template sequence is set according to actual needs and is not limited in the embodiment of the present invention.
S702, performing sliding sampling on each fixed template sequence to obtain training data, and obtaining the template identifier of the next log data corresponding to each sample sequence as the label of that sample sequence; the training data includes a plurality of sample sequences, the length of each sample sequence is fixed, and each sample sequence corresponds to one classification identifier;
Specifically, the server performs sliding sampling on each fixed template sequence to obtain a plurality of sample sequences; each sample sequence includes the preset number of template identifiers and corresponds to one classification identifier. The sample sequences obtained from all fixed template sequences constitute the training data. The server may obtain the next log data corresponding to each sample sequence from the historical log data, perform template matching on it to obtain the matching log template, and use the template identifier of the matching log template as the label of the sample sequence.
The next log data corresponding to a sample sequence is the log data that follows the log data corresponding to the preset number of template identifiers included in the sample sequence: both have the same classification identifier, and the next log data occurs later than all of the log data corresponding to those template identifiers.
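Sliding sampling with automatically derived labels can be sketched as follows. The sketch assumes a step of one and takes each label from inside the same fixed template sequence, whereas the embodiment obtains the next log data from the historical log data; the function name `sliding_samples` and the example values are illustrative assumptions.

```python
from typing import List, Tuple


def sliding_samples(
    fixed_sequence: List[str], preset_number: int
) -> List[Tuple[List[str], str]]:
    """Slide a window of preset_number template identifiers over the
    sequence; the identifier right after each window is its label."""
    samples = []
    # Stop early enough that every window still has a following identifier.
    for start in range(len(fixed_sequence) - preset_number):
        window = fixed_sequence[start:start + preset_number]
        label = fixed_sequence[start + preset_number]
        samples.append((window, label))
    return samples


# Hypothetical fixed template sequence for one classification identifier.
fixed_sequence = ["a", "b", "c", "a", "d"]
samples = sliding_samples(fixed_sequence, preset_number=3)
for window, label in samples:
    print(window, "->", label)
```

Because every label is read from the log stream itself, no manual annotation is involved.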
S703, performing feature processing on each sample sequence in the training data to obtain training feature data; training sample data included in the training characteristic data corresponds to the sample sequence one by one;
Specifically, after obtaining the training data, the server may perform feature processing on each sample sequence in the training data to obtain the training sample data corresponding to each sample sequence; the training sample data corresponding to all sample sequences form the training feature data. Each sample sequence has a corresponding label, and the label of a sample sequence is also the label of the training sample data obtained from it. The labels are obtained automatically without manual annotation, which improves the training efficiency of the model.
S704, training according to the original combination model, training sample data included in the training characteristic data and corresponding labels to obtain a log prediction model.
Specifically, the training feature data may be divided into a training set and a validation set. Each piece of training sample data in the training set and its label are input into the original combination model for model training, yielding a log prediction model to be verified. The log prediction model to be verified is then verified on the validation set, and a model that passes verification is used as the log prediction model. The original combination model may be a combination of a Gated Recurrent Unit (GRU) and an Attention model, or a combination of Long Short-Term Memory (LSTM) and Attention; it is selected according to actual needs and is not limited in the embodiment of the present invention.
Fig. 8 is a schematic flow chart of an early warning method for log anomalies according to an eighth embodiment of the present invention. As shown in Fig. 8, on the basis of the foregoing embodiments, classifying and serializing the historical log data to obtain a plurality of fixed template sequences includes:
S801, acquiring each piece of historical log data, and performing template matching on each piece of historical log data to obtain the template identifier corresponding to each piece of historical log data;
Specifically, the server may read the historical log data piece by piece, perform template matching on each piece to obtain the log template that matches it, and use the template identifier of the matched log template as the template identifier corresponding to that piece of historical log data. The specific implementation of this step is similar to step S201 and is not repeated here.
S802, classifying the historical log data according to the classification identifier to obtain the classification identifier corresponding to each piece of historical log data;
Specifically, for each piece of historical log data, the server looks up the classification identifier in the historical log data; if the historical log data includes a classification identifier, that classification identifier is used as the classification identifier corresponding to the historical log data. Steps S801 and S802 may be performed in either order. The classification identifier may be a process number or a thread number; it is set according to actual needs and is not limited in the embodiment of the present invention.
S803, obtaining a plurality of fixed template sequences according to the classification identifier, the template identifier and the timestamp corresponding to each piece of historical log data.
Specifically, for the pieces of historical log data that share the same classification identifier, the server sorts their template identifiers in the order of their timestamps. For the sorting result of each classification identifier, a set number of template identifiers are taken in order, starting from the first template identifier, to form a fixed template sequence; a plurality of fixed template sequences can thus be obtained. The set number is set according to actual needs and is not limited in the embodiment of the present invention.
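Steps S802 and S803 can be sketched as grouping by classification identifier, sorting by timestamp and cutting the result into sequences of the set number. Cutting into consecutive non-overlapping chunks and discarding an incomplete tail is one possible reading and is an assumption here; the `LogRecord` type, the function name `build_fixed_sequences` and the example records are likewise illustrative, and template matching (S801) is assumed to have been done already.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class LogRecord(NamedTuple):
    timestamp: float
    classification_id: str  # e.g. a process number or thread number
    template_id: str        # result of template matching (S801)


def build_fixed_sequences(
    records: List[LogRecord], set_number: int
) -> Dict[str, List[List[str]]]:
    """Group by classification identifier, sort by timestamp, then cut
    into fixed template sequences of set_number identifiers each."""
    by_class: Dict[str, List[LogRecord]] = defaultdict(list)
    for record in records:
        by_class[record.classification_id].append(record)

    sequences: Dict[str, List[List[str]]] = {}
    for class_id, group in by_class.items():
        group.sort(key=lambda record: record.timestamp)
        ids = [record.template_id for record in group]
        # Assumed chunking: consecutive, non-overlapping, incomplete tail dropped.
        sequences[class_id] = [
            ids[i:i + set_number]
            for i in range(0, len(ids) - set_number + 1, set_number)
        ]
    return sequences


records = [
    LogRecord(3.0, "p1", "c"), LogRecord(1.0, "p1", "a"),
    LogRecord(2.0, "p2", "x"), LogRecord(2.0, "p1", "b"),
    LogRecord(4.0, "p1", "a"), LogRecord(5.0, "p2", "y"),
    LogRecord(6.0, "p1", "d"),
]
fixed = build_fixed_sequences(records, set_number=2)
print(fixed)
```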
Fig. 9 is a schematic flow chart of an early warning method for log anomalies according to a ninth embodiment of the present invention, and as shown in fig. 9, on the basis of the foregoing embodiments, further performing feature processing on each sample sequence in the training data to obtain training feature data includes:
S901, obtaining a log template vector corresponding to each template identifier according to each template identifier included in the sample sequence and a template vector library; the template vector library is pre-established and includes log template vectors in one-to-one correspondence with the template identifiers;
Specifically, the server queries the template vector library for the log template vector corresponding to each template identifier included in the sample sequence, and thereby obtains the log template vector corresponding to each template identifier in the sample sequence. The template vector library is pre-established and includes a plurality of log template vectors in one-to-one correspondence with the template identifiers.
S902, counting the number of occurrences of each identical template identifier in the sample sequence as the frequency count corresponding to that template identifier;
Specifically, the server counts, for each distinct template identifier included in the sample sequence, how many times it occurs, and uses that number as the frequency count corresponding to the template identifier. In this way the frequency count of every distinct template identifier in the sample sequence is obtained.
S903, taking the log template vector corresponding to each template identifier in the sample sequence and the frequency count corresponding to each template identifier as the training sample data corresponding to the sample sequence.
Specifically, the server uses the log template vector corresponding to each template identifier in the sample sequence and the frequency count corresponding to each template identifier in the sample sequence as training sample data corresponding to the sample sequence.
On the basis of the above embodiments, further, the classification identifier is a process number or a thread number.
In particular, the log data may include a process number and/or a thread number, and the process number or the thread number may be used as the classification identifier.
The early warning method for log anomalies provided by the embodiment of the present invention was tested with log data from a production environment. First, logs from a period of normal operation of a service system were used for training; logs from a normal period and logs from an abnormal period were then used separately for prediction analysis. In the tests, the log prediction failure amount in the normal period was very small and stable, whereas in the abnormal period it increased sharply. The tests indicate that the method is effective at detecting log anomalies.
Fig. 10 is a schematic structural diagram of an early warning apparatus for log anomalies according to a tenth embodiment of the present invention, and as shown in fig. 10, the early warning apparatus for log anomalies according to the embodiment of the present invention includes an obtaining module 1010, a predicting module 1020, a counting module 1030, and an early warning module 1040, where:
the obtaining module 1010 is configured to classify and serialize each piece of log data in sequence to obtain a template sequence corresponding to each classification identifier; each piece of log data corresponds to one classification identifier; the prediction module 1020 is configured to count the number of template identifiers included in the template sequence corresponding to each classification identifier, and predict, based on the log prediction model, the template sequence corresponding to the classification identifier whose number of template identifiers is equal to the preset number, to obtain a predicted template identifier of the next log data corresponding to the classification identifier whose number of template identifiers is equal to the preset number; the log prediction model is obtained by pre-training based on historical log data; the counting module 1030 is configured to count a prediction failure amount corresponding to the next log data corresponding to the classification identifiers, where the number of the template identifiers is equal to the preset number, in a preset time period according to the prediction template identifiers and the actual template identifiers of the next log data corresponding to the classification identifiers, where the number of the template identifiers is equal to the preset number; the early warning module 1040 is configured to perform log anomaly early warning after determining that the prediction failure amount is greater than a set threshold.
Specifically, the obtaining module 1010 may obtain real-time log data, and then sequentially classify and serialize each piece of log data, so as to obtain a template sequence corresponding to each classification identifier, where the template sequence includes a plurality of template identifiers arranged in sequence, and the arrangement sequence of each template identifier is arranged according to the sequence of the time stamps of each piece of log data corresponding to each template identifier, and may be regarded as time series data. The classification identifier is preset, and may be a process number or a thread number. Each log data will include the classification identification.
Because real-time log data arrives continuously, the number of template identifiers in the template sequence corresponding to each classification identifier also changes. The prediction module 1020 counts the number of template identifiers included in the template sequence corresponding to each classification identifier. When that number equals the preset number, the prediction module 1020 predicts the template sequence based on the log prediction model and obtains the predicted template identifier of the next log data corresponding to that classification identifier; there may be multiple predicted template identifiers. The log prediction model is obtained by pre-training based on historical log data. The preset number is set according to actual needs and is not limited in the embodiment of the present invention.
The counting module 1030 may obtain the next log data corresponding to the classification identifier whose number of template identifiers equals the preset number, perform template matching on the next log data to obtain its template identifier, and use that template identifier as the actual template identifier of the next log data. The counting module 1030 may determine, according to the predicted template identifier and the actual template identifier of the next log data, whether the prediction of the template identifier of the next log data failed. Within a preset time period, the counting module 1030 may obtain multiple pieces of next log data, determine for each piece whether the prediction failed, and count the number of failed predictions in the preset time period as the prediction failure amount corresponding to the next log data for the classification identifiers whose number of template identifiers equals the preset number. The preset time period is set according to actual needs and is not limited in the embodiment of the present invention.
After the prediction failure amount within the preset time period is obtained, the early warning module 1040 compares the prediction failure amount with a set threshold; if the prediction failure amount is greater than the set threshold, a log anomaly early warning may be issued by system message, e-mail or the like. The set threshold is set according to actual needs and is not limited in the embodiment of the present invention.
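The counting and threshold comparison can be sketched as below. The embodiment only speaks of a preset time period, so treating it as a sliding window that ends at the most recent log data is an assumption; the class name `FailureWindow`, the 60-second period and the threshold of 2 are illustrative.

```python
from collections import deque
from typing import Deque


class FailureWindow:
    """Counts prediction failures inside a sliding time window and reports
    whether the count exceeds the set threshold."""

    def __init__(self, period_seconds: float, threshold: int) -> None:
        self.period_seconds = period_seconds
        self.threshold = threshold
        self._failures: Deque[float] = deque()  # timestamps of failures

    def record(self, timestamp: float, failed: bool) -> bool:
        """Register one prediction result; return True if a warning is due."""
        if failed:
            self._failures.append(timestamp)
        # Drop failures that have fallen out of the preset time period.
        cutoff = timestamp - self.period_seconds
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()
        return len(self._failures) > self.threshold


window = FailureWindow(period_seconds=60.0, threshold=2)
events = [(0.0, True), (10.0, True), (20.0, False), (30.0, True), (100.0, False)]
alerts = [window.record(ts, failed) for ts, failed in events]
print(alerts)
```

In this run the third failure at 30 seconds pushes the count above the threshold, and by 100 seconds all earlier failures have left the window.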
The early warning apparatus for log anomalies provided by the embodiment of the present invention classifies and serializes each piece of log data in sequence to obtain the template sequence corresponding to each classification identifier, and counts the number of template identifiers included in the template sequence corresponding to each classification identifier. For a classification identifier whose number of template identifiers equals the preset number, the apparatus predicts the corresponding template sequence based on the log prediction model to obtain the predicted template identifier of the next log data corresponding to that classification identifier. According to the predicted template identifier and the actual template identifier of that next log data, the apparatus counts the corresponding prediction failure amount within the preset time period. If the prediction failure amount is determined to be greater than the set threshold, a log anomaly early warning is issued, which improves the reliability of log anomaly early warning.
Fig. 11 is a schematic structural diagram of an early warning apparatus for log anomalies according to an eleventh embodiment of the present invention, and as shown in fig. 11, on the basis of the foregoing embodiments, further, the obtaining module 1010 includes a first obtaining unit 1011, a first classifying unit 1012, and a second obtaining unit 1013, where:
the first obtaining unit 1011 is configured to perform template matching on each piece of log data to obtain a template identifier corresponding to each piece of log data; the first classification unit 1012 is configured to classify each piece of log data according to the classification identifier, and obtain a classification identifier corresponding to each piece of log data; the second obtaining unit 1013 is configured to obtain a template sequence corresponding to each classification identifier according to the classification identifier corresponding to each log data, the template identifier corresponding to each log data, and the timestamp corresponding to each log data.
Fig. 12 is a schematic structural diagram of an early warning apparatus for log anomalies according to a twelfth embodiment of the present invention, as shown in fig. 12, on the basis of the foregoing embodiments, further, the prediction module 1020 includes a third obtaining unit 1021 and a prediction unit 1022, where:
the third obtaining unit 1021 is configured to perform feature processing on template sequences corresponding to classification identifiers, where the number of the template identifiers is equal to a preset number, to obtain predicted feature data corresponding to the classification identifiers; the predicting unit 1022 is configured to obtain a prediction template identifier of the next piece of log data corresponding to the classification identifier according to the prediction feature data corresponding to the classification identifier and the log prediction model.
Fig. 13 is a schematic structural diagram of an early warning apparatus for log abnormality according to a thirteenth embodiment of the present invention, and as shown in fig. 13, on the basis of the foregoing embodiments, further, the third obtaining unit 1021 includes a first obtaining subunit 10211, a statistics subunit 10212, and a serving subunit 10213, where:
The first obtaining subunit 10211 is configured to obtain a log template vector corresponding to each template identifier according to each template identifier included in the template sequence and the template vector library; the template vector library is pre-established and includes log template vectors in one-to-one correspondence with the template identifiers; the statistics subunit 10212 is configured to count the number of occurrences of each identical template identifier in the template sequence as the frequency count corresponding to that template identifier; the serving subunit 10213 is configured to use the log template vector corresponding to each template identifier in the template sequence and the corresponding frequency count as the prediction feature data corresponding to the classification identifier.
Fig. 14 is a schematic structural diagram of an early warning apparatus for log anomalies according to a fourteenth embodiment of the present invention, as shown in fig. 14, on the basis of the foregoing embodiments, further, the early warning apparatus for log anomalies according to the present invention further includes a word vector obtaining module 1050, a calculating module 1060, and a storing module 1070, where:
the word vector obtaining module 1050 is configured to obtain a word vector corresponding to each word in the log template according to the log template and the word vector library; the log template corresponds to the template identification one by one; the calculating module 1060 is configured to calculate an average value of word vectors corresponding to each word included in the log template, as a corresponding log template vector; the storage module 1070 is configured to correspondingly store the log template vector corresponding to the log template and the corresponding template identifier.
Fig. 15 is a schematic structural diagram of an early warning apparatus for log anomalies according to a fifteenth embodiment of the present invention. As shown in Fig. 15, on the basis of the foregoing embodiments, the counting module 1030 includes a fourth obtaining unit 1031 and a recording unit 1032, where:
the fourth obtaining unit 1031 is configured to perform template matching on the next log data corresponding to the classification identifier, and obtain an actual template identifier of the next log data corresponding to the classification identifier; the recording unit 1032 is configured to record that prediction of the next log data corresponding to the classification identifier fails after it is determined that the prediction template identifier of the next log data corresponding to the classification identifier does not include the actual template identifier of the next log data corresponding to the classification identifier.
Fig. 16 is a schematic structural diagram of an early warning apparatus for log anomalies according to a sixteenth embodiment of the present invention, as shown in fig. 16, on the basis of the foregoing embodiments, further, the early warning apparatus for log anomalies according to the embodiment of the present invention further includes a classification and serialization module 1080, a sampling module 1090, a feature processing module 1100, and a training module 1110, where:
The classification and serialization module 1080 is configured to classify and serialize the historical log data to obtain a plurality of fixed template sequences, where each fixed template sequence corresponds to one classification identifier; the sampling module 1090 is configured to perform sliding sampling on each fixed template sequence to obtain training data, and obtain the template identifier of the next log data corresponding to each sample sequence as the label of that sample sequence; the training data includes a plurality of sample sequences, the length of each sample sequence is fixed, and each sample sequence corresponds to one classification identifier; the feature processing module 1100 is configured to perform feature processing on each sample sequence in the training data to obtain training feature data; the training sample data included in the training feature data are in one-to-one correspondence with the sample sequences; the training module 1110 is configured to train a log prediction model according to the original combination model, the training sample data included in the training feature data, and the corresponding labels.
Fig. 17 is a schematic structural diagram of an early warning apparatus for log anomalies according to a seventeenth embodiment of the present invention. As shown in Fig. 17, on the basis of the foregoing embodiments, the classification and serialization module 1080 includes a fifth obtaining unit 1081, a second classification unit 1082, and a sixth obtaining unit 1083, where:
the fifth obtaining unit 1081 is configured to obtain each piece of historical log data and perform template matching on it to obtain the template identifier corresponding to each piece of historical log data;
the second classification unit 1082 is configured to classify each piece of historical log data according to the classification identifier, so as to obtain the classification identifier corresponding to each piece of historical log data;
the sixth obtaining unit 1083 is configured to obtain a plurality of fixed template sequences according to the classification identifier, the template identifier, and the timestamp corresponding to each piece of historical log data.
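The three units above can be sketched as follows. This is only an illustration under stated assumptions: the regular-expression templates, the `(timestamp, classification identifier, message)` record layout, and the use of identifier 0 for unmatched logs are all hypothetical, since the embodiment does not fix a template-matching technique in this passage.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical templates: template identifier -> pattern.
TEMPLATES = {
    1: re.compile(r"^Connecting to \S+$"),
    2: re.compile(r"^Connection established$"),
    3: re.compile(r"^Request \d+ failed$"),
}
UNKNOWN_TEMPLATE = 0  # assumed fallback for logs that match no template


def match_template(message: str) -> int:
    """Return the identifier of the first template that matches the message."""
    for template_id, pattern in TEMPLATES.items():
        if pattern.match(message):
            return template_id
    return UNKNOWN_TEMPLATE


def serialize(records: List[Tuple[float, str, str]]) -> Dict[str, List[int]]:
    """Group logs by classification identifier and order each group by timestamp."""
    grouped = defaultdict(list)
    for timestamp, class_id, message in records:
        grouped[class_id].append((timestamp, match_template(message)))
    return {
        class_id: [tid for _, tid in sorted(items, key=lambda item: item[0])]
        for class_id, items in grouped.items()
    }


records = [
    (3.0, "thread-1", "Request 42 failed"),
    (1.0, "thread-1", "Connecting to db01"),
    (2.0, "thread-2", "Connecting to cache"),
    (2.5, "thread-1", "Connection established"),
]
sequences = serialize(records)
```

Here the classification identifier plays the role of a process number or thread number, and the timestamp decides the order of template identifiers inside each sequence.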
Fig. 18 is a schematic structural diagram of an early warning apparatus for log anomalies according to an eighteenth embodiment of the present invention. As shown in Fig. 18, on the basis of the foregoing embodiments, the feature processing module 1100 includes a seventh obtaining unit 1101, a counting unit 1102, and a unit 1103, where:
the seventh obtaining unit 1101 is configured to obtain, according to each template identifier included in the sample sequence, the log template vector corresponding to each template identifier; the log template vector corresponding to each template identifier is generated in advance;
the counting unit 1102 is configured to count the number of occurrences of each identical template identifier in the sample sequence as the frequency count corresponding to that template identifier;
the unit 1103 is configured to use the log template vector corresponding to each template identifier in the sample sequence, together with the frequency count corresponding to each template identifier, as the training sample data corresponding to the sample sequence.
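A minimal sketch of this feature processing is given below. The vector values in `TEMPLATE_VECTORS` are hypothetical, and pairing each position's vector with its frequency count in a list is only one assumed layout; the embodiment does not specify how the two are combined before being fed to the model.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical pre-generated log template vectors, keyed by template identifier.
TEMPLATE_VECTORS: Dict[int, List[float]] = {
    1: [0.1, 0.3],
    2: [0.5, 0.2],
    3: [0.9, 0.4],
}


def featurize(sample: List[int]) -> List[Tuple[List[float], int]]:
    """Pair each template identifier's vector with its frequency in the sample."""
    counts = Counter(sample)  # frequency count of each identical template identifier
    return [(TEMPLATE_VECTORS[tid], counts[tid]) for tid in sample]


features = featurize([1, 2, 2])
```

For the sample `[1, 2, 2]`, template identifier 2 appears twice, so both of its positions carry a frequency count of 2.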
The apparatus embodiments provided by the present invention may be used to execute the processing flows of the foregoing method embodiments. Their functions are not described again here; for details, refer to the foregoing method embodiments.
Fig. 19 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in Fig. 19, the electronic device may include a processor 1901, a communication interface 1902, a memory 1903, and a communication bus 1904, where the processor 1901, the communication interface 1902, and the memory 1903 communicate with one another via the communication bus 1904. The processor 1901 may call logic instructions in the memory 1903 to perform the following method: classifying and serializing each piece of log data in sequence to obtain a template sequence corresponding to each classification identifier, where each piece of log data corresponds to one classification identifier; counting the number of template identifiers included in the template sequence corresponding to each classification identifier and, for each classification identifier whose number of template identifiers equals a preset number, predicting on the corresponding template sequence based on a log prediction model to obtain the predicted template identifier of the next log data corresponding to that classification identifier, where the log prediction model is pre-trained on historical log data; counting, according to the predicted template identifier and the actual template identifier of the next log data corresponding to each such classification identifier, the prediction failure amount within a preset time period; and, if the prediction failure amount is determined to be greater than a set threshold, issuing a log anomaly early warning.
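The method executed by the processor can be sketched as a small online loop. Everything named here is an illustrative assumption: the class `LogAnomalyWarner`, the `predict` callable standing in for the trained log prediction model (assumed to return a set of candidate template identifiers), the sliding of each per-identifier sequence after a prediction, and the assumption that timestamps arrive in non-decreasing order.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Set


class LogAnomalyWarner:
    def __init__(
        self,
        predict: Callable[[List[int]], Set[int]],
        window_size: int,       # the preset number of template identifiers
        period_seconds: float,  # the preset time period
        threshold: int,         # the set threshold on prediction failures
    ) -> None:
        self.predict = predict
        self.window_size = window_size
        self.period_seconds = period_seconds
        self.threshold = threshold
        # One bounded template sequence per classification identifier.
        self.sequences: Dict[str, Deque[int]] = defaultdict(
            lambda: deque(maxlen=window_size)
        )
        self.failures: Deque[float] = deque()  # timestamps of failed predictions

    def observe(self, timestamp: float, class_id: str, template_id: int) -> bool:
        """Process one log entry; return True if an early warning should be raised."""
        seq = self.sequences[class_id]
        if len(seq) == self.window_size:
            candidates = self.predict(list(seq))
            # Prediction fails when the actual identifier is not among the predictions.
            if template_id not in candidates:
                self.failures.append(timestamp)
        seq.append(template_id)  # the oldest identifier is dropped once the deque is full
        # Discard failures that fall outside the preset time period.
        while self.failures and self.failures[0] <= timestamp - self.period_seconds:
            self.failures.popleft()
        return len(self.failures) > self.threshold


# Toy stand-in model: predicts that the next identifier is the last one plus 1.
warner = LogAnomalyWarner(
    predict=lambda seq: {seq[-1] + 1},
    window_size=2,
    period_seconds=60.0,
    threshold=1,
)
events = [(0.0, "t1", 1), (1.0, "t1", 2), (2.0, "t1", 3), (3.0, "t1", 9), (4.0, "t1", 7)]
alerts = [warner.observe(*event) for event in events]
```

With this toy model the fourth and fifth entries are both mispredicted, so the failure count exceeds the threshold of 1 only on the fifth entry.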
In addition, when the logic instructions in the memory 1903 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
This embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. When executed by a processor, the computer program enables a computer to perform the methods provided by the foregoing method embodiments, for example: classifying and serializing each piece of log data in sequence to obtain a template sequence corresponding to each classification identifier, where each piece of log data corresponds to one classification identifier; counting the number of template identifiers included in the template sequence corresponding to each classification identifier and, for each classification identifier whose number of template identifiers equals a preset number, predicting on the corresponding template sequence based on a log prediction model to obtain the predicted template identifier of the next log data corresponding to that classification identifier, where the log prediction model is pre-trained on historical log data; counting, according to the predicted template identifier and the actual template identifier of the next log data corresponding to each such classification identifier, the prediction failure amount within a preset time period; and, if the prediction failure amount is determined to be greater than a set threshold, issuing a log anomaly early warning.
This embodiment provides a computer-readable storage medium storing a computer program that causes a computer to execute the methods provided by the foregoing method embodiments, for example: classifying and serializing each piece of log data in sequence to obtain a template sequence corresponding to each classification identifier, where each piece of log data corresponds to one classification identifier; counting the number of template identifiers included in the template sequence corresponding to each classification identifier and, for each classification identifier whose number of template identifiers equals a preset number, predicting on the corresponding template sequence based on a log prediction model to obtain the predicted template identifier of the next log data corresponding to that classification identifier, where the log prediction model is pre-trained on historical log data; counting, according to the predicted template identifier and the actual template identifier of the next log data corresponding to each such classification identifier, the prediction failure amount within a preset time period; and, if the prediction failure amount is determined to be greater than a set threshold, issuing a log anomaly early warning.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the description herein, reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," "an example," "a particular example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (22)

1. An early warning method for log anomalies, characterized by comprising:
classifying and serializing each piece of log data in sequence to obtain a template sequence corresponding to each classification identifier, wherein each piece of log data corresponds to one classification identifier;
counting the number of template identifiers included in the template sequence corresponding to each classification identifier and, for each classification identifier whose number of template identifiers equals a preset number, predicting on the corresponding template sequence based on a log prediction model to obtain a predicted template identifier of the next log data corresponding to that classification identifier, wherein the log prediction model is pre-trained on historical log data;
counting, according to the predicted template identifier and the actual template identifier of the next log data corresponding to each classification identifier whose number of template identifiers equals the preset number, the prediction failure amount within a preset time period;
and if the prediction failure amount is determined to be greater than a set threshold, issuing a log anomaly early warning.
2. The method of claim 1, wherein the sequentially classifying and serializing each piece of log data to obtain a template sequence corresponding to each classification identifier comprises:
performing template matching on each log data to obtain a template identifier corresponding to each log data;
classifying each piece of log data according to the classification identifier to obtain a classification identifier corresponding to each piece of log data;
and obtaining a template sequence corresponding to each classification identifier according to the classification identifier corresponding to each log data, the template identifier corresponding to each log data and the timestamp corresponding to each log data.
3. The method of claim 1, wherein the predicting the template sequence corresponding to the classification identifiers with the number of the template identifiers equal to the preset number based on the log prediction model, and obtaining the predicted template identifier of the next log data corresponding to the classification identifiers with the number of the template identifiers equal to the preset number comprises:
performing feature processing on template sequences corresponding to classification identifications of which the number of the template identifications is equal to a preset number to obtain prediction feature data corresponding to the classification identifications;
and obtaining the prediction template identifier of the next log data corresponding to the classification identifier according to the prediction characteristic data corresponding to the classification identifier and the log prediction model.
4. The method according to claim 3, wherein the performing the feature processing on the template sequence corresponding to the classification identifiers with the number of the template identifiers equal to the preset number, and obtaining the predicted feature data corresponding to the classification identifiers comprises:
obtaining a log template vector corresponding to each template identifier according to each template identifier included in the template sequence and a template vector library; the template vector library is pre-established and comprises log template vectors in one-to-one correspondence with the template identifiers;
counting the number of the same template identifications included in the template sequence as the frequency numbers corresponding to the same template identifications;
and taking the log template vector corresponding to each template identifier in the template sequence and the corresponding frequency as the prediction characteristic data corresponding to the classification identifier.
5. The method of claim 4, wherein the step of building a library of template vectors comprises:
obtaining a word vector corresponding to each word in the log template according to the log template and a word vector library; the log template corresponds to the template identification one by one;
calculating the average value of word vectors corresponding to all words included in the log template to serve as corresponding log template vectors;
and correspondingly storing the log template vector corresponding to the log template and the corresponding template identifier.
6. The method of claim 1, wherein the counting the prediction failure amount corresponding to the next log data corresponding to the classification identifiers with the number of the template identifiers equal to the preset number within the preset time period according to the prediction template identifiers and the actual template identifiers of the next log data corresponding to the classification identifiers with the number of the template identifiers equal to the preset number comprises:
performing template matching on the next log data corresponding to the classification identifier to obtain an actual template identifier of the next log data corresponding to the classification identifier;
and if the prediction template identifier of the next log data corresponding to the classification identifier does not comprise the actual template identifier of the next log data corresponding to the classification identifier, recording that the prediction of the next log data corresponding to the classification identifier fails.
7. The method of claim 1, wherein the step of pre-training the log prediction model based on historical log data comprises:
classifying and serializing the historical log data to obtain a plurality of fixed template sequences; wherein each fixed template sequence corresponds to a classification identifier;
performing sliding sampling on each fixed template sequence to obtain training data, and obtaining the template identifier corresponding to the next log data following each sample sequence as the label corresponding to that sample sequence; the training data comprises a plurality of sample sequences, the length of each sample sequence is fixed, and each sample sequence corresponds to one classification identifier;
performing feature processing on each sample sequence in the training data to obtain training feature data; training sample data included in the training characteristic data corresponds to the sample sequence one by one;
and training to obtain a log prediction model according to the original combination model, training sample data included in the training characteristic data and corresponding labels.
8. The method of claim 7, wherein the classifying and serializing the historical log data to obtain a plurality of fixed template sequences comprises:
acquiring each piece of historical log data one by one, and performing template matching on each piece of historical log data to obtain a template identifier corresponding to each piece of historical log data;
classifying the historical log data according to the classification identification to obtain the classification identification corresponding to each piece of historical log data;
and obtaining a plurality of fixed template sequences according to the classification identification corresponding to each historical log data, the template identification corresponding to each historical log data and the timestamp corresponding to each historical log data.
9. The method of claim 7, wherein the performing feature processing on each sample sequence in the training data to obtain training feature data comprises:
obtaining a log template vector corresponding to each template identifier according to each template identifier included in the sample sequence; the log template vector corresponding to each template identifier is generated in advance;
counting the number of the same template identifications included in the sample sequence as the frequency corresponding to the same template identifications;
and taking the log template vector corresponding to each template identifier in the sample sequence and the frequency count corresponding to each template identifier as training sample data corresponding to the sample sequence.
10. The method of any of claims 1 to 9, wherein the classification identifier is a process number or a thread number.
11. An early warning device for log abnormality is characterized by comprising:
the obtaining module is used for sequentially classifying and serializing each piece of log data to obtain a template sequence corresponding to each classification identifier; each piece of log data corresponds to one classification identifier;
the prediction module is used for counting the number of the template identifications included in the template sequence corresponding to each classification identification, predicting the template sequence corresponding to the classification identification of which the number is equal to the preset number based on the log prediction model, and obtaining the prediction template identification of the next log data corresponding to the classification identification of which the number is equal to the preset number; the log prediction model is obtained by pre-training based on historical log data;
the statistical module is used for counting the prediction failure amount corresponding to the next log data corresponding to the classification identifications with the number equal to the preset number in the preset time period according to the prediction template identifications and the actual template identifications of the next log data corresponding to the classification identifications with the number equal to the preset number;
and the early warning module is used for carrying out log abnormity early warning after judging that the prediction failure amount is larger than a set threshold value.
12. The apparatus of claim 11, wherein the obtaining module comprises:
the first obtaining unit is used for carrying out template matching on each piece of log data to obtain a template identifier corresponding to each piece of log data;
the first classification unit is used for classifying each piece of log data according to the classification identifier to obtain a classification identifier corresponding to each piece of log data;
and the second obtaining unit is used for obtaining the template sequence corresponding to each classification identifier according to the classification identifier corresponding to each log data, the template identifier corresponding to each log data and the timestamp corresponding to each log data.
13. The apparatus of claim 11, wherein the prediction module comprises:
a third obtaining unit, configured to perform feature processing on a template sequence corresponding to a classification identifier whose number of template identifiers is equal to a preset number, so as to obtain predicted feature data corresponding to the classification identifier;
and the prediction unit is used for obtaining the prediction template identifier of the next log data corresponding to the classification identifier according to the prediction characteristic data corresponding to the classification identifier and the log prediction model.
14. The apparatus of claim 13, wherein the third obtaining unit comprises:
the first obtaining subunit is configured to obtain, according to each template identifier included in the template sequence and a template vector library, a log template vector corresponding to each template identifier; the template vector library is pre-established and comprises log template vectors in one-to-one correspondence with the template identifiers;
the counting subunit is configured to count the number of the same template identifiers included in the template sequence, where the number is used as the frequency number corresponding to the same template identifier;
and the sub-unit is used for taking the log template vector corresponding to each template identifier in the template sequence and the corresponding frequency as the prediction characteristic data corresponding to the classification identifier.
15. The apparatus of claim 14, further comprising:
the word vector obtaining module is used for obtaining a word vector corresponding to each word in the log template according to the log template and the word vector library; the log template corresponds to the template identification one by one;
the calculation module is used for calculating the average value of word vectors corresponding to all words included in the log template as corresponding log template vectors;
and the storage module is used for correspondingly storing the log template vector corresponding to the log template and the corresponding template identifier.
16. The apparatus of claim 11, wherein the statistics module comprises:
a fourth obtaining unit, configured to perform template matching on the next log data corresponding to the classification identifier, and obtain an actual template identifier of the next log data corresponding to the classification identifier;
and the recording unit is used for recording the prediction failure of the next log data corresponding to the classification identifier after judging that the prediction template identifier of the next log data corresponding to the classification identifier does not comprise the actual template identifier of the next log data corresponding to the classification identifier.
17. The apparatus of claim 11, further comprising:
the classification and serialization module is used for classifying and serializing the historical log data to obtain a plurality of fixed template sequences; wherein each fixed template sequence corresponds to a classification identifier;
the sampling module is used for performing sliding sampling on each fixed template sequence to obtain training data, and obtaining the template identifier corresponding to the next log data following each sample sequence as the label corresponding to that sample sequence; the training data comprises a plurality of sample sequences, the length of each sample sequence is fixed, and each sample sequence corresponds to one classification identifier;
the characteristic processing module is used for carrying out characteristic processing on each sample sequence in the training data to obtain training characteristic data; training sample data included in the training characteristic data corresponds to the sample sequence one by one;
and the training module is used for training to obtain a log prediction model according to the original combination model, the training sample data included in the training characteristic data and the corresponding label.
18. The apparatus of claim 17, wherein the classification and serialization module comprises:
a fifth obtaining unit, configured to obtain each piece of historical log data, perform template matching on each piece of historical log data, and obtain a template identifier corresponding to each piece of historical log data;
the second classification unit is used for classifying each piece of historical log data according to the classification identifier to obtain a classification identifier corresponding to each piece of historical log data;
and the sixth obtaining unit is used for obtaining a plurality of fixed template sequences according to the classification identifier corresponding to each piece of historical log data, the template identifier corresponding to each piece of historical log data and the timestamp corresponding to each piece of historical log data.
19. The apparatus of claim 17, wherein the feature processing module comprises:
a seventh obtaining unit, configured to obtain, according to each template identifier included in the sample sequence, a log template vector corresponding to each template identifier; the log template vector corresponding to each template identifier is generated in advance;
the counting unit is used for counting the number of the same template identifications included in the sample sequence as the frequency numbers corresponding to the same template identifications;
and the unit is used for taking the log template vector corresponding to each template identifier in the sample sequence and the frequency count corresponding to each template identifier as the training sample data corresponding to the sample sequence.
20. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 10 are implemented by the processor when executing the computer program.
21. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 10.
22. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 10 when executed by a processor.
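
By way of illustration only, the template vector library recited in claims 4 and 5 could be built as in the following sketch. The word vectors, the whitespace tokenization, the lower-casing, and the skipping of words missing from the word vector library are all assumptions made for the example; the claims only require averaging the word vectors of the words in each log template and storing the result with the template identifier.

```python
from typing import Dict, List

# Hypothetical word vector library.
WORD_VECTORS: Dict[str, List[float]] = {
    "connection": [1.0, 0.0],
    "failed": [0.0, 1.0],
    "closed": [0.5, 0.5],
}


def template_vector(template: str, word_vectors: Dict[str, List[float]]) -> List[float]:
    """Average the word vectors of the words in one log template."""
    # Assumption: words missing from the word vector library are skipped.
    vectors = [word_vectors[w] for w in template.lower().split() if w in word_vectors]
    if not vectors:
        raise ValueError(f"no known words in template: {template!r}")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def build_library(
    templates: Dict[int, str], word_vectors: Dict[str, List[float]]
) -> Dict[int, List[float]]:
    """Store each log template vector under its template identifier."""
    return {tid: template_vector(text, word_vectors) for tid, text in templates.items()}


library = build_library({1: "Connection failed", 2: "Connection closed"}, WORD_VECTORS)
```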
CN202111362830.5A 2021-11-17 2021-11-17 Early warning method and device for log abnormity, electronic equipment and storage medium Pending CN114064434A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111362830.5A CN114064434A (en) 2021-11-17 2021-11-17 Early warning method and device for log abnormity, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111362830.5A CN114064434A (en) 2021-11-17 2021-11-17 Early warning method and device for log abnormity, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114064434A true CN114064434A (en) 2022-02-18

Family

ID=80273412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111362830.5A Pending CN114064434A (en) 2021-11-17 2021-11-17 Early warning method and device for log abnormity, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114064434A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115062191A (en) * 2022-08-16 2022-09-16 国网智能电网研究院有限公司 Abnormal behavior detection method and device based on data interaction of abnormal graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination